Tour de CLARIN: Interview with Jose Pérez-Navarro

Submitted by Jakob Lenardič on 11 November 2020

Jose Pérez-Navarro is a PhD candidate at the University of the Basque Country who works in developmental cognitive neuroscience.

1. Please introduce yourself. Could you briefly describe your research background and current position?

I am a developmental cognitive neuroscientist, interested in the cognitive and neural underpinnings of language acquisition. Currently, I work as a predoctoral researcher at the Basque Center on Cognition, Brain and Language (BCBL), and I am a PhD candidate at the University of the Basque Country (UPV) in San Sebastian, Spain.

2. What is your involvement with the Spanish CLARIN K-Centre? Could you briefly describe the collaboration?

My collaboration with the Spanish CLARIN K-Centre is through Mikel Iruskieta, who is the coordinator of the Centre. I contacted him after reading the article by Otegi et al. (2017) about ANALHITZA, which is a tool for performing linguistic annotation of Basque, English and Spanish. I thought the tool could be used to enrich the corpus that my PhD supervisor Marie Lallier and I are working on, since to my knowledge it is the only tool that performs language annotation equally well in Basque and Spanish. Although it is not specifically designed for spoken language (it is primarily aimed at written registers), it does a great job of annotating our corpus, which includes speech productions of Basque-Spanish bilingual children and, consequently, a considerable proportion of Basque-Spanish code-switching.

3. How have you used the tool ANALHITZA? What made it beneficial for your research purposes?

Thus far I have benefited from the tool’s lemmatization component. We have relied on the expertise of Mikel Iruskieta and the IXA group, who helped us batch-analyse the corpus, specifically the subsets that correspond to the first two stages of bilingual language learning, which take place in children between four and five years old. ANALHITZA is also crucial for me because it takes into account both Spanish and Basque, and provides outputs with comparable indexes in each language. That is to say, the tool has allowed us to determine, in an equally robust way for both languages, whether linguistic aspects such as the lexical diversity or the morphosyntactic complexity of bilingual children's speech depend similarly on factors of the bilingual environment, such as the amount and quality of exposure to each language, rather than on the broader linguistic differences between the two languages.

4. Could you describe some of your ongoing work in child language acquisition and bilingualism?

My PhD work focuses on the amount of exposure to each language within bilingual contexts, and how it shapes language acquisition at a cognitive and neural level. To this end, my supervisor and I are collecting data from children who have not yet learnt to read, and analysing how different naturalistic measures of language acquisition, such as spontaneous speech productions or the brain’s ability to synchronize to speech in either Spanish or Basque, are influenced by the relative amount of exposure to each of these languages. Some preliminary results of this longitudinal project, in which we also use ANALHITZA, show that the amount of exposure and the children’s age are crucial factors for the acquisition of phonology, vocabulary and morphosyntactic proficiency. In other words, we show that, at least during early language learning, when the vast majority of linguistic input is oral, proficiency in a given language is highly dependent on the amount of exposure to it.

5. Why is the Spanish K-Centre especially important for researchers working with Basque?

I believe that the K-Centre is especially important because it provides knowledge on two languages, namely Basque and Spanish, that share some linguistic features but are very different in many other respects. Looking at the two languages contrastively can lead to relevant discoveries about how the amount of exposure to a language influences language learning: which aspects of that influence are more universally shared (e.g., phonological and speech comprehension abilities that allow us to understand speech as it unfolds over time) and which could be more specific to either Spanish or Basque (e.g., acquiring a proficient degree of morphosyntactic knowledge). Thanks to the expertise in corpus annotation provided by the IXA group, which leads the Spanish Knowledge Centre, we are able to analyse more efficiently how Spanish and Basque develop, using language tools like the aforementioned ANALHITZA, and thus get a precise snapshot of how they are used in everyday settings.

6. Are there any tools and resources that you would like to see developed in the future that either your field or the Basque language community could benefit from?

In the field of bilingual language learning there are already several useful tools that the IXA group has developed; apart from ANALHITZA, there is also Ixati, which provides morphosyntactic analysis and phrase chunking for Basque. I do not know if one is already in the catalogue of the Knowledge Centre, but a tool that accounts for code-switching between Basque and Spanish within a single utterance, and that disentangles whether it is a Basque utterance with certain Spanish nouns or the other way around, would be wonderful for working with corpora of natural language. Another tool that would be valuable for our current research is a syntactic parser trained on spoken Basque and Spanish, which to my knowledge does not yet exist.

7. What is in store for your future collaboration with the Spanish CLARIN Knowledge Centre?

Since we work on longitudinal projects, there are several linguistic aspects of our emergent corpora that we cannot predict very precisely, especially given that they have not been previously assessed in the Basque-Spanish bilingual combination. For instance, it is still unclear to what extent the morphosyntactic development of children remains independent in each language, and at what age children start to generalize their knowledge of one language to build increasingly complex syntactic structures in both. By finding the most suitable methods to account for all the linguistic variability that takes place in speech production during language learning, we can capture the emergence of structures of increasing complexity in both languages. Therefore, we plan to reassess whether the existing tools can be improved by the Knowledge Centre to account for such variability even more efficiently. I am also happy to say that thus far the Centre has always accommodated our requests.