
How should a vocal signal be represented for topologically-augmented classification?
Extracting robust topological characteristics from speech signals has recently emerged as a promising approach for improving classification performance. In this study, we compare the topological information extracted from several signal representations, including Takens’ embeddings, spectrograms, and spectrogram zeros. Using a publicly available dataset of 11,200 recorded vowel utterances, we conduct an empirical analysis demonstrating that these topological features provide additional discriminative information for both speaker and vowel classification. Moreover, features derived from different signal representations appear to be complementary. Interestingly, our results suggest that low-persistence topological features, often dismissed as “topological noise”, encode important information about speech.
Guillem Bonafos, Pierre Pudlo, Jean-Marc Freyermuth, Samuel Tronçon, and Arnaud Rey.
2026.
Speech Communication 178 (March): 103363. — @HAL