Quantitative models of early language acquisition by Emmanuel Dupoux
The past 40 years of psycholinguistic research have shown that infants learn their first language at an impressive speed. During the first year of life, even before they start to talk, infants converge on the basic building blocks of the phonological structure of their language. Yet the mechanisms that they use to achieve this early phonological acquisition are still not well understood. We show that a modeling approach based on machine learning algorithms and speech technology, applied to large speech databases, can help shed light on the early pattern of development. First, we argue that because of acoustic variability, phonemes cannot be acquired directly from the acoustic signal; only highly context-dependent and talker-dependent phones or phone fragments can be extracted in a bottom-up way. Second, words cannot be acquired directly from the acoustic signal either, but a small number of protowords or sentence fragments can be extracted on the basis of repetition frequency. Third, these two kinds of protolinguistic units can interact with one another in order to converge on more abstract units. The proposal is therefore that the different levels of the phonological system are acquired in parallel, through increasingly precise approximations. This accounts for the largely overlapping development of lexical and phonological knowledge during the first year of life.
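
The idea of extracting protowords on the basis of repetition frequency can be illustrated with a toy sketch. This is not the model described in the abstract: the proposal concerns the acoustic signal, whereas the sketch assumes each utterance has already been reduced to an unsegmented string of symbols. The function name `frequent_fragments`, its parameters, and the example utterances are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def frequent_fragments(utterances, min_len=2, max_len=4, min_count=2):
    """Return contiguous fragments that recur at least min_count times.

    Assumption: utterances are unsegmented symbol strings, a stand-in
    for the acoustic signal, which this sketch does not model.
    """
    counts = Counter()
    for utt in utterances:
        for n in range(min_len, max_len + 1):
            # Slide a window of length n over the utterance.
            for i in range(len(utt) - n + 1):
                counts[utt[i:i + n]] += 1
    return {frag: c for frag, c in counts.items() if c >= min_count}


# Hand-made toy input with no word boundaries marked.
utterances = ["lookatthedog", "thedogisbig", "wheresthedog"]
fragments = frequent_fragments(utterances, min_len=6, max_len=6, min_count=3)
print(fragments)  # {'thedog': 3}
```

In this toy input the only six-symbol fragment shared by all three utterances is `thedog`, which is a sentence fragment rather than a word. That is in line with the claim that repetition frequency yields protowords or sentence fragments rather than words proper.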