PhD projects

2022 - 2025

High-level cerebral representations of vocalizations investigated in macaques using electrophysiology and deep neural networks

Josephine Raugel
Supervisors : Pascal Belin, Institut des Neurosciences de La Timone, Thierry Artières, Laboratoire d'Informatique et Systèmes

We propose to investigate higher-level cerebral representations of conspecific vocalizations in macaques, taking advantage of an exceptional conjunction of techniques and expertise offered by ILCB: measurement of the spiking activity of hundreds of single neurons in fMRI-localized macaque voice patches, in response to thousands of natural or synthetic macaque vocalizations, and modeling of the vocalizations and associated neural responses with advanced neural network approaches. Experiments will examine whether the representations of macaque vocalizations provided by the layers of deep convolutional networks trained on the vocalizations map linearly onto measures of neural activity in the macaque voice patches, explaining a large part of the variance in neural activity based on clearly identified high-level representations, and allowing demonstrations such as reconstructions of vocalization stimuli from the neural activity. Results will considerably advance our understanding of the cerebral mechanisms of voice information processing in one of our closest primate relatives, and open a unique comparative window by allowing direct comparison with similar data currently being obtained for voice in the human brain.


1/10/2022 - 31/10/2025

Towards a comparative approach to the complexity of communication signals: cross-contributions from ethology and linguistics

Lise Habib-Dassetto
Supervisors : Marie Montant, Laboratoire de Psychologie Cognitive & Cristel Portes, Laboratoire Parole et Langage

A growing number of researchers in linguistics, comparative psychology and ethology consider that a comparative approach to communication systems in human and non-human primates that centres on syntax and on the vocal/auditory channel is too restrictive to fully capture the complexity of communication systems and the evolutionary origins of human language. This project aims to study the complexity of multimodal communication during spontaneous social interactions in Guinea baboons by combining a quantitative approach borrowed from ethology (analysis of behavioural sequences, clustering, measurement of structural and contextual complexity based on Shannon entropy) with a qualitative method borrowed from interactional linguistics (analysis of adjacency pairs, turn-taking, interactional repair, etc.). Baboons are a model of choice for studying the evolutionary history of human language because they have a complex sociality and an open natural habitat, close to those encountered in the history of human societies. We aim to develop quantitative tools for comparing the complexity of the communication systems of different primate species that are not centred on defining characteristics of human language such as syntax. We also aim to transpose the methods and analytical tools of interactional linguistics to the study of non-human primate communication, and thereby identify more precisely what, in the complexity of interaction, constitutes a common foundation shared by human and non-human primates.
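The entropy-based complexity measures mentioned above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the function names, the first-order (one-step) conditional entropy, and the signal labels in the example sequence are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(signals):
    """Shannon entropy (in bits) of the empirical distribution of signal types."""
    counts = Counter(signals)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conditional_entropy(signals):
    """Entropy (in bits) of the next signal given the current one (first-order)."""
    pairs = Counter(zip(signals, signals[1:]))   # counts of (current, next) transitions
    firsts = Counter(signals[:-1])               # counts of each signal as "current"
    total = sum(pairs.values())
    return -sum(
        (n / total) * math.log2(n / firsts[current])
        for (current, _), n in pairs.items()
    )


# Hypothetical sequence of coded communicative signals (labels are invented).
sequence = ["grunt", "lipsmack", "grunt", "present",
            "grunt", "lipsmack", "grunt", "present"]

repertoire_entropy = shannon_entropy(sequence)      # diversity of the repertoire
sequential_entropy = conditional_entropy(sequence)  # unpredictability of transitions
```

A lower conditional entropy than repertoire entropy indicates that knowing the current signal makes the next one more predictable, which is one simple way of quantifying sequential structure without assuming a syntax.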


1/10/2022 - 31/10/2025

Self-supervised representation learning of primate vocalizations, from analysis to synthesis

Jules Cauzinille
Supervisors : Benoît Favre, Laboratoire d’Informatique et Systèmes, Ricard Marxer, Laboratoire d’Informatique et Systèmes, Thierry Legou, Laboratoire Parole et Langage, Arnaud Rey, Laboratoire de Psychologie Cognitive

Recently, deep learning models, with their progressive shift towards self-supervision and transfer learning, have started yielding impressive results in natural language processing applications and showing high potential as research tools for linguistics and the understanding of human communication. While these approaches have been fairly successful in the context of human language, their application to animal communication remains under-explored. The study of animal vocalizations could benefit from the latest advances in deep representation learning, especially when it comes to processing sound and images with little to no supervision. Recent advances in digital sound recording, and the passive acoustic monitoring approach now adopted in many ethology studies, are giving such projects an essential role in the evolution of bioacoustic research (Ness et al. [2013]). In addition, very similar topics are being explored with great success in speech processing, with non-textual unsupervised models for automatic unit discovery, speech synthesis and other tasks based on raw audio data (Hsu et al. [2021], Lakhotia et al. [2021], Borgholt et al. [2022]). We think that these approaches should be applied to animal vocalizations, given their inherently non-textual nature, and that the study and probing of self-supervised acoustic models, trained without the need for manual labeling of instances, is a necessary step towards the inclusion of animal communication studies in the deep learning revolution.
We want to explore the behavior and performance of unsupervised models trained on animal acoustic recordings. Can speech-based approaches even apply to this very distinct type of audio, and what would this teach us about the interface between human and non-human communication? To set up an innovative experimental framework aimed at answering these questions, we propose a PhD project at the crossroads of bioacoustics, speech processing, primatology and computer science. The idea is to address specific problems in the field of automatic animal communication processing and to build computational tools for the study of unsupervised representation learning on different types of acoustic data. The project will mostly focus on human and non-human primates but may also include experiments in processing soundscapes and multi-source noise. It will be organized as a series of experiments, ranging from the collection of new datasets to the implementation of non-textual audio-based representation learning models, their subsequent training on primate vocalizations, the probing of these computational models, and attempts at data-driven synthesis of animal vocalizations. It will also explore auxiliary questions regarding the pre-processing of animal acoustic datasets and a comparison between corpora recorded in captivity and in the wild. Finally, the project will lay the foundations for future methodological work in specific domains such as nonverbal human speech processing, the probing of deep representation learning models, cross-species acoustic transfer learning and deep learning architectures. All data and models produced within the project will be made available to the community.


1/11/2022 - 31/10/2025

The Spatiotemporal Dynamics of Syntax across Language Modalities

Bissera Ivanova
Supervisors : Kristof Strijkers, Laboratoire Parole et Langage, Benjamin Morillon, Institut de Neurosciences des Systèmes & Liina Pylkkänen, NYU, USA.

The goal of this project is to study the neurobiology of syntax, the brain’s ability to structure linguistic elements and create meaning, focusing on two understudied topics: (1) Spatiotemporal dynamics, namely ‘how do the neural sources of basic combinatorial syntax manifest over time?’; (2) Language modality, namely ‘do the perception and production of combinatorial syntax rely on the same neural machinery?’. Addressing these two questions is important because the rather disparate neurobiological models of syntax make different predictions with regard to the degree of spatiotemporal and language modality integration. Hence, investigating these two dimensions will provide insights to constrain brain language models from a novel perspective. More concretely, in this project we will focus on the two core brain areas routinely associated with syntactic processing, the left posterior temporal lobe (LPTL) and the left inferior frontal gyrus (LIFG), and test (1) whether they form a parallel integrated network or rather serve distinct functions over time; and (2) whether their dynamics are identical or not when perceiving vs producing syntax. To test these predictions, we adapt Pylkkänen and colleagues’ minimalistic paradigm to syntax and contrast the final word of simple three-word noun phrases (NPs) that differ in their syntactic tree and computation (e.g., a left-branching Adjective-NP ‘joli buisson fleuri’ vs. a right-branching Adverb-NP ‘buisson joliment fleuri’). This paradigm will be used in three Work Packages (WPs): In WP1 we will test the spatiotemporal dynamics (high-density EEG) in perception (exp1) and production (exp2) separately. In WP2 we will test (MEG) perception and production within the same brain (exp3). And finally, in WP3 we will test the two interlocutors simultaneously (EEG hyper-scanning) while they interact with each other using the minimal NPs (exp4).
In sum, this project will assess the degree of spatiotemporal integration of the syntactic parser across the language modalities, making it possible to constrain neurobiological models of syntax as a function of linguistic behavior and social interaction.


1/11/2021 - 31/10/2024

The Engine of Syntax (what's under the hood?)

Raphaël Py
Supervisors : Marie Montant, Marie-Hélène Grosbras, Laboratoire de Neurosciences Cognitives & Claudio Brozzoli, Centre de Recherche en Neurosciences de Lyon

A domain-general syntax appears to support both language and motor processes. Building on our previous studies, we aim to clarify this phenomenon and provide keys to understanding how it emerges over the course of ontogenetic and phylogenetic development.


1/10/2021 - 30/09/2024

Feedback in interaction: a natural model and its computational adaptation for human-machine interaction

Auriane Boudin
Supervisors : Philippe Blache, Laboratoire Parole et Langage & Magalie Ochs, Laboratoire d’Informatique et Systèmes

This thesis sets out to build the first multimodal model of conversational feedback, making it possible to predict its realisation (position and type) during a conversation. To build the model, we will use machine learning techniques applied to a set of annotated corpora. The resulting model will be implemented in a conversational agent, which will allow us to study and compare, for the first time, whether feedback production correlates with speakers' engagement in a conversation, in both human-human and human-machine interactions. This hypothesis is regularly put forward but has never been verified. The thesis will lead to a deeper understanding of the role of feedback in the interactional process.


1/10/2020 - 30/09/2023

Bridging communication in behavioural and neural dynamics

Isaïh Mohamed
Supervisors : Daniele Schön, Institut de Neurosciences des Systèmes & Leonardo Lancia, Laboratoire de Phonétique et Phonologie

The aim of this project is to bridge interpersonal verbal coordination and neural dynamics. In practice, we will collect neurophysiological data on individuals (mostly patients with intracranial recordings) performing different interactive language tasks. We will use natural language processing methods to estimate objective features of verbal coordination from speech/language signals. Then we will use machine learning and information-theory-driven approaches to link the dynamics of coordinative verbal behavior to spatio-temporal neural dynamics.
More precisely, we plan to use several tasks that have proven effective in the study of verbal interactions. Some tasks are rather constrained and controlled (allowing the coordinative dynamics to be manipulated) while others assess conversation in more natural conditions. Speech recordings make it possible to quantify coordination at different linguistic levels in a time-resolved manner. These metrics can then be used to interpret changes in neural dynamics as a function of verbal coordination. We plan to use two approaches: a machine learning approach (decoding the speech signal of the speaker based on the neural signal of the listener) and an information-theoretic approach (to model to what extent the relation between neural signals and upcoming speech is influenced by the current level of coordination, estimated by convergence for instance).
Overall, this project will provide a better understanding of the link between behavioural coordinative dynamics and neural dynamics. For instance, compared to simple coordinative dynamics, more difficult coordinative behaviour will probably require a change in the ratio between top-down and bottom-up connections between frontal regions and temporal regions in specific frequency bands (increase of top-down beta and decrease of bottom-up gamma).
The strength of this project is that it combines sophisticated coordination designs, advanced analysis of verbal coordination dynamics and cutting-edge neuroscience tools with unique neural data in humans.


1/10/2020 - 30/09/2023

Opening a window onto readers' minds: using TMS and EEG to determine the cortical network involved in oculomotor behaviour during reading

Régis Mancini
Supervisors : Françoise Vitu, Laboratoire de Psychologie Cognitive & Boris Burle, Laboratoire de Neurosciences Cognitives

Eye movements during reading have been studied for more than a century, revealing a very stereotyped behaviour, despite substantial variability in the amplitude of saccades and the positions of fixations on the lines of text. Most of the models proposed to account for this behaviour are based on cognitive guidance of the gaze, and therefore presuppose essentially top-down control. These top-down models are nevertheless contradicted by the recently reported fact that an illiterate model of saccade programming in the superior colliculus, a multi-integrative subcortical structure, predicts the oculomotor behaviour of readers fairly accurately, simply from early visual processing carried out from the retina onwards (luminance contrast). This result suggests instead that the neocortex plays a secondary role in oculomotor control during reading.
The thesis aims on the one hand to characterize the cortical network involved in oculomotor control during reading, and on the other hand to determine the temporal dynamics of activation of these different cortical areas. This research will first rely on transcranial magnetic stimulation (TMS), which transiently inactivates a given cortical area in healthy participants, in conjunction with the recording of eye movements during a sentence-reading task. The effect of inactivating a given cortical area on the classically observed oculomotor behaviours would therefore indicate its involvement in reading. In a second step, the TMS studies will be complemented by an approach based on electroencephalographic (EEG) recordings.

1/10/2020 - 30/09/2023

Development of Children's Communicative

Mitja Nikolaus
Supervisors : Abdellah Fourtassi, Laboratoire d'Informatique et Systèmes, Laurent Prévot, Laboratoire Parole et Langage

Research Lab: CoCoDev
The study of how the ability for coordinated communication emerges in development is both an exciting scientific frontier, at the heart of debates about the uniqueness of human cognition (Tomasello, 2014), and an important applied issue for AI (Antle, 2013).
Early signs of coordination (e.g., through gaze and smile) can be found in preverbal infants (Yale, 2003), but the ability to engage in coordinated verbal communication (Clark, H. & Brennan, 1991) takes years to mature.
Learning such coordination, especially with the caregivers, is crucial for the child's healthy cognitive development (Hoff, 2006; Gelman, 2009).
Very few studies have examined the nature of children's communicative coordination and its development in the natural environment (that is, outside controlled laboratory studies).
Further, existing naturalistic studies (e.g., Clark, E. 2015), though insightful, have been based on anecdotal observations, leading to rather qualitative conclusions.
Thus, previous work has not provided any theoretical model that could explain, quantitatively, the naturally occurring data, let alone provide a basis for theory-informed applications. This project will help fill this gap.
We will combine AI tools from NLP and Computer Vision to study the multimodal dynamics of children's communicative coordination with caregivers, laying the foundation for a data-driven model that would 1) provide us with a scientific understanding of the natural phenomena and 2) guide us through the design of child-computer interaction systems that can be used to test and evaluate the model.

Antle (2013). Research opportunities: Embodied child-computer interaction. International Journal of Child-Computer Interaction.
Clark, E. (2015) Common ground. The Handbook of Language Emergence.
Clark, H. & Brennan (1991). Grounding in communication. Perspectives on socially shared cognition.
Gelman (2009). Learning from others: Children's construction of concepts. Annual review of psychology.
Hoff (2006). How social contexts support and shape language development. Developmental Review.
Tomasello (2014). A natural history of human thinking. Cambridge, MA: Harvard University Press.
Yale, Messinger, Cobo-Lewis, & Delgado (2003). The temporal coordination of early infant communication. Developmental Psychology.

1/11/2019 - 31/10/2022

The emergence of social conventions in the categorisation of speech sounds

Elliot Huggett
Supervisors : Noël Nguyen, Laboratoire Parole et Langage, Nicolas Claidière, Laboratoire de Psychologie Cognitive

The ability to establish shared conventions is a fundamental part of linguistic categorisation behaviour. Across all levels of a language, speakers of that language must agree upon ways in which the world is divided up, in order for successful communication to be possible. This behaviour is often studied in the domain of semantics, with words for colours and kinship terms being prime examples of problems that need to be solved with categorisation, but that have many different possible solutions, as found in the world's languages. A categorisation problem with similarly diverse cross-linguistic solutions that remains unstudied, however, is the categorisation system that must be established for a language to have a shared, consistent phonology. The space of possible vowels is continuous, and must be divided up into discrete categories, not only as a function of the individual perceptions of one speaker, but taking into account the perceptions of all speakers of the language, so that a shared and consistent solution can be reached. This solution will be the result of generations of speakers interacting and constantly solving the problem. The solution of a given language is thus a culturally evolved behaviour: the categorisation system at any given time results not only from the dynamics between speakers of the language at that time, but also from the dynamics between speakers of that language stretching back generations.
In this project we aim to investigate the effects different group dynamics have upon the conventionalisation of speech sound categories. This will be done through a series of innovative speech categorisation experiments, in which participants are trained on categories at the extremes of an acoustic space bounded by sounds unfamiliar to them in their native language, and then asked to categorise other sounds existing within this space. Initially, this will be performed individually to establish their prior tendencies. Following this, they will be assigned to pairs, larger groups, or iteration chains, and the ways in which their categorisation becomes conventionalised under these different social dynamics will be studied. The results from these experiments will be used to inform the creation of multi-agent models, to help us better understand the dynamics at play and the ways in which interaction and transmission lead to the conventionalisation of categories in the unfamiliar acoustic space. Over the course of the project, new methods and analyses, drawing on and combining the literatures surrounding phonology, categorisation behaviour, and cultural evolution, will be developed, providing key insight and highlighting further areas of interest in a fundamental part of linguistic behaviour.

1/10/2019 - 30/09/2022

Wavelet-based multidimensional characterization of brain networks in language tasks

Clément Verrier
Supervisors : Bruno Torrésani, Institut de Mathématiques de Marseille, Christian Bénar, Institut de Neurosciences des Systèmes

Brain function involves complex interactions between cortical areas at different spatial and temporal scales. Thus, the spatio-temporal definition of brain networks is one of the main current challenges in neuroscience. With this objective in view, electrophysiological techniques such as electroencephalography (EEG) and magnetoencephalography (MEG) offer a fine temporal resolution that allows capturing fast changes (at the level of the millisecond) across a wide range of frequencies (up to 100 Hz).
However, the spatial aspects require solving a difficult (extremely ill-posed) inverse problem that projects the signals recorded at the level of surface sensors to the cortex. Current techniques for extracting spatio-temporal networks in MEG and EEG suffer from the inherent difficulties arising from solving the inverse problem. We propose to use a novel wavelet analysis approach in order to improve the extraction of language networks from MEG signals. The methods will be validated using simultaneous MEG-intracerebral EEG recordings. More precisely, the objective is to develop algorithms and data analysis procedures for spatio-temporal characterization of brain networks across multiple frequencies, for EEG and MEG signals, validate them on simulated and real signals, and apply the developed methodology on language protocols in the framework of ILCB.
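As background for the wavelet-based analysis mentioned above, the following Python sketch shows a standard complex Morlet time-frequency transform applied to a single sensor signal. It is a textbook technique, not the novel method the project proposes, and it does not address the inverse problem. The function name `morlet_power`, the `n_cycles` parameter, the truncation at three standard deviations and the synthetic 10 Hz test signal are all assumptions made for illustration.

```python
import cmath
import math


def morlet_power(signal, fs, freq, n_cycles=5.0):
    """Power over time at `freq` (Hz), via correlation with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * math.pi * freq)   # temporal width of the Gaussian (s)
    half = int(round(3.0 * sigma_t * fs))         # wavelet support: +/- 3 sigma
    wavelet = []
    for k in range(-half, half + 1):
        t = k / fs
        envelope = math.exp(-t * t / (2.0 * sigma_t ** 2))
        wavelet.append(cmath.exp(2j * math.pi * freq * t) * envelope)
    norm = sum(abs(w) for w in wavelet)           # unit-gain normalisation
    wavelet = [w / norm for w in wavelet]

    power = []
    for i in range(len(signal)):
        acc = 0j
        for k, w in enumerate(wavelet):
            j = i + k - half                      # sample aligned with wavelet tap k
            if 0 <= j < len(signal):              # zero-padding at the edges
                acc += signal[j] * w.conjugate()
        power.append(abs(acc) ** 2)
    return power


# Synthetic test signal (assumption): one second of a 10 Hz sine sampled at 200 Hz.
fs = 200.0
signal = [math.sin(2.0 * math.pi * 10.0 * n / fs) for n in range(200)]
power_10hz = morlet_power(signal, fs, 10.0)
power_40hz = morlet_power(signal, fs, 40.0)
```

Away from the edges, `power_10hz` is much larger than `power_40hz`, as expected for a 10 Hz oscillation. Repeating the call over a grid of frequencies yields a time-frequency map for one sensor.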

1/11/2018 - 31/10/2021

Implication of Subcortical Brain Structures and Cortico-Subcortical Loops in Early and Late Stages of Speech Motor Sequence Learning: a Within Subject fMRI/MEG Study

Snežana Todorović
Supervisors : Elin Runnqvist (Aix-Marseille Université, CNRS-LPL), Sonja Kotz (University of Maastricht)
Collaborator: Andrea Brovelli (Aix-Marseille Université, CNRS-INT)

The ability to interpret and produce structured sequences with meaning is undoubtedly at the core of human language. Most frequently, the learning of such sequences in language consists of auditory speech-input being used to learn articulatory speech-output. In my PhD project, I use MEG and fMRI to examine the implication and potential cooperation across different timescales of cortical and subcortical structures involved in skills that are related to, part of, or a prerequisite for the acquisition of new speech motor sequences, such as error monitoring, motor control, vocal learning in songbirds or motor sequence learning in humans and non-human primates. While MEG will enable us to investigate functional connectivity dynamics between brain regions across different timescales, which is especially relevant when studying an intrinsically dynamic process such as learning, which likely unfolds on several timescales, fMRI will give us the spatial resolution necessary for detecting activation and functional networks in subcortical regions.

1/11/2018 - 31/10/2021

Understanding the vocal brain using new deep learning approaches

Shinji Saget
Supervisors : Thierry ARTIERES (PR1 AMU), Pascal BELIN (INT)

1/10/2017 - 30/09/2020

Rational exploitation of available intonational cues: the case of the signaling relationship between the Initial Rise of F0 and (non-corrective) contrastive focus

Axel Barrault
Supervisors : James Sneed GERMAN (LPL), Pauline WELBY (LPL)

In languages where intonation has a post-lexical function, as in French, intonational contours extend beyond the word level and signal information structure and discourse relations (Ladd, 2008) in spite of pervasive variability. Despite contingencies, there is no one-to-one mapping between an intonational contour and a discourse function (e.g., Grice et al., 2017). Yet listeners make use of intonational cues to speed up the processing of continuous speech. This lack of invariance (Liberman et al., 1967) has previously led to a probabilistic view of perceptual processing. Listeners continuously integrate several parameters, linguistic or not, that interact incrementally. For some, prediction is a central mechanism in speech processing, as has been shown for the syntactic (Kleinschmidt & Jaeger, 2015; Kamide, 2012; Fine et al., 2013), pragmatic (Grodner & Sedivy, 2011; Yildirim et al., 2016), and speech processing (Clarke & Garrett, 2004; Bradlow & Bent, 2008; Creel et al., 2008) levels. Intonation processing too involves, on the one hand, the integration of 'bottom-up' acoustic cues interacting with 'top-down' predictions that accelerate speech processing (Ito & Speer, 2008; Ip & Cutler, 2017) and, on the other hand, a rational adaptation mechanism (Kurumada et al., 2014; Roettger & Franke, 2019).

In this project, we seek to determine which factors modulate listeners' evaluation of the evidential strength of intonational cues in order to infer communicative intent. In line with the literature on pragmatic reasoning, we assume that listeners' interpretation can be modeled in terms of Bayesian inferences (Goodman & Frank, 2016). We build on the hierarchical Bayesian model of the evidential strength of intonational cues proposed in Roettger & Franke's (2019) study on German. We attempt to refine the predictions of this probabilistic model by providing data in French on the role of factors contributing to the development of hierarchical prior beliefs about the speaker's production behavior, as well as on the interaction of factors conditioning their actualization through exposure.
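To make the Bayesian framing concrete, here is a deliberately simplified Python sketch. It is not the hierarchical model of Roettger & Franke (2019): it only shows a single application of Bayes' rule to infer intent from the presence or absence of one cue, plus a Beta-Bernoulli update standing in for adaptation through exposure. The function names, the intent labels and every probability value are assumptions chosen for illustration.

```python
def posterior_intent(prior, likelihood, cue_present):
    """Bayes' rule: P(intent | cue) from P(intent) and P(cue present | intent)."""
    unnormalised = {}
    for intent, p_intent in prior.items():
        p_cue = likelihood[intent] if cue_present else 1.0 - likelihood[intent]
        unnormalised[intent] = p_intent * p_cue
    evidence = sum(unnormalised.values())
    return {intent: value / evidence for intent, value in unnormalised.items()}


def update_beta(alpha, beta, cue_present):
    """Beta-Bernoulli update of the belief about P(cue | intent) after one observation."""
    return (alpha + 1, beta) if cue_present else (alpha, beta + 1)


# All numbers below are invented for illustration.
prior = {"contrastive": 0.5, "non_contrastive": 0.5}
likelihood = {"contrastive": 0.6, "non_contrastive": 0.4}  # a weak cue-intent association
posterior = posterior_intent(prior, likelihood, cue_present=True)

# Adaptation: after hearing the cue on a contrastive utterance, the listener's
# Beta(1, 1) belief about P(cue | contrastive) becomes Beta(2, 1).
alpha, beta = update_beta(1, 1, cue_present=True)
```

Even a weak association shifts the posterior towards the contrastive reading when the cue is heard, which illustrates how a probabilistic cue can remain informative when combined with other evidence.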

To this end, we have been studying in perception and in production experiments a case of probabilistic association between the so-called "initial" rise (IR) in fundamental frequency (F0) and contrastive (non-corrective) focus in French, reported in production (German & D'Imperio, 2016). Several factors influence the presence of IR (Jun & Fougeron, 2000; Welby, 2006). Yet, German and D’Imperio (2016) concluded that this “relatively weak association can nevertheless be informative in a model of interpretation that integrates multiple probabilistic inputs to initial rise occurrence”. In fact, the IR has been consistently shown to be used in word segmentation (Welby, 2007; Spinelli, Grimault, Meunier & Welby, 2010).

Because the prosodic system of French has its own particularities, it offers a distinctive angle from which to study the flexibility of abstract representations. Moreover, the associations between discourse functions and intonational patterns are less regular in French than in the Germanic languages from which most of the studies on listeners' adaptation to the variability of intonational cues originate. In other words, the nature of a reliable signaling relationship between intonational cues and discourse function in French differs from that in English. This research will allow a better understanding of the link between production and perception by providing information on the phonology of French intonation. However, despite the specificities of the French system, the case of prosodic implementation of contrastive focus examined in our study is observed in many, if not all, of the languages studied. The same is true for the adaptation mechanism observed at all levels of processing. This research will contribute to the discussion of the flexibility of abstract representations and their relationship with variability in interaction with other constraints. It thus contributes to the development of a model of the cognitive architecture of human communication.