Simone Fuscone, Benoit Favre & Laurent Prévot
The reproducibility of scientific studies grounded on language corpora requires approaching each step carefully, from data selection and pre-processing to significance testing. In this paper, we report on our reproduction of a recent study based on a well-known conversational corpus (Switchboard). The reproduced study Cohen Priva et al. (J Acoust Soc Am 141(5):2989–2996, 2017) focuses on speech rate convergence between speakers in conversation.
While our reproduction confirms the main result of the original study, it also shows interesting variations in the details. In addition, we tested the original study for the robustness of its data selection and pre-processing, as well as the underlying model of speech rate, the variable observed. Our analysis shows that another approach is needed to take into account the complex aspects of speech rate in conversations. Another benefit of reproducing previous studies is to take analysis a step further, testing and strengthening the results of other research teams and increasing the validity and visibility of interesting studies and results. In this line, we also created a notebook of pre-processing and analysis scripts which is available online.