
What self-supervised speech models know about animal sounds: Deep transfer learning and the evolution of acoustic communication across species
October 2 @ 14:00 - 18:00
Thesis defense of Jules Cauzinille
Thesis supervisors: Benoit Favre (thesis director), Arnaud Rey, Ricard Marxer
Jury:
Benjamin Lecouteux (jury president)
Nicolas Farrugia (reviewer)
Marie Tahon (reviewer)
Emmanuel Chemla (examiner)
Abstract:
This thesis introduces a novel approach to the study of vocal communication and its evolution across species through deep transfer learning. We ask whether, and how, self-supervised models trained on speech data can provide suitable representations of bioacoustic information. Our initial focus is on non-human primates, motivated by the apparent gap between state-of-the-art speech processing methods and the way the vocalizations of our closest living relatives are handled in the recent computer science literature. We present experiments centred on transferring knowledge from speech to gibbon songs and propose a set of hypotheses on the evolution of acoustic communication and its phylogenetic ties. A second set of experiments challenges these initial hypotheses by extending the framework to a broader range of species and tasks, in which we explore speech models’ latent representations through innovative probing strategies. We conclude with theoretical perspectives on the convergent evolution of vocal communication across species, and advocate for the integration of speech in future research aimed at developing general bioacoustic foundation models.
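As a rough illustration of the kind of pipeline the abstract describes (not the thesis's actual method or data), the sketch below extracts embeddings from a pretrained self-supervised speech model and fits a linear probe on a bioacoustic classification task. The choice of Wav2Vec 2.0, mean pooling, and a logistic-regression probe are assumptions for illustration; `clips` and `labels` are hypothetical placeholders for animal recordings and their annotations.

```python
# Minimal sketch: probe a self-supervised speech model on bioacoustic clips.
# Assumes clips are 1-D numpy arrays resampled to 16 kHz and labels are
# clip-level annotations (e.g. call type or caller identity).
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def embed(waveform: np.ndarray) -> np.ndarray:
    """Mean-pool the model's last hidden layer into one vector per clip."""
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, n_frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# clips, labels: placeholder names for the bioacoustic dataset being probed.
X = np.stack([embed(w) for w in clips])
probe = LogisticRegression(max_iter=1000).fit(X, labels)
```

Probe accuracy on held-out clips can then be read as a measure of how much task-relevant bioacoustic information the speech representations already encode, without fine-tuning the speech model itself.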