Tour de CLARIN highlights prominent User Involvement (UI) activities of a particular CLARIN national consortium. This time the focus is on the DLU and Cora Pots, a PhD student in the Quantity and quality in linguistics project at the University of Leuven. The following interview took place via Skype and was conducted and transcribed by Jakob Lenardič.
1. Could you briefly describe your academic background and your current position?
My interest in linguistics started when I was studying for my Bachelor’s Degree at Utrecht University. It was a two-track programme in modern literature and linguistics. In the linguistics track, I was working in the generative framework, primarily child language acquisition and syntax. I did an internship where I researched speech perception and language development in younger children, which really sparked my interest in research and inspired me to pursue it as a career choice. In my Master’s Degree studies, I slightly changed the focus of my research and began investigating syntactic variation in Germanic languages. When I became a Research Assistant at the Meertens Institute under the supervision of Professor Sjef Barbiers, I began combining my formal background with a computational approach to linguistics and co-wrote the Educational Module for the MIMORE tool, which is used to investigate the morphosyntactic variation of Dutch dialects. After obtaining my Master’s Degree, I worked on various projects; for instance, I was a part-time lab manager at the Babylab at Utrecht Institute of Linguistics (Utrecht University), and I furthermore worked for the AnnCor project - a project to make CHILDES, a large collection of corpora of child language, syntactically searchable. In 2016, I started my current position, which is a 4-year PhD track at the University of Leuven, so I'm halfway through now.
2. How did you start collaborating with CLARIN DLU? Could you briefly describe the project you’re currently involved in, called Quantity and quality in linguistics: reverse dialectometry?
I started using the tools and services of CLARIN DLU when I became a PhD student at the KU Leuven (Catholic University of Leuven). In this project, I investigate the formal properties of Dutch dialects/regiolects as spoken both in Flanders and the Netherlands, for which I use tools and resources provided by the Dutch and the Flemish consortia (OpenSoNaR, MIMORE).
3. Why is investigating non-standard language valuable for linguistic theory?
I’d like to answer this question by exemplifying a syntactic phenomenon related to infinitives in verb clusters. In Dutch dialects, the position of the infinitival marker te, which is the equivalent of English to, varies in the sense that in some dialects it gets doubled in a verbal cluster (for instance, te zitten te werken, literally “to sit to work”), while in other dialects one of the markers gets dropped either in the first (_ zitten te werken) or in the second position (te zitten _ werken). This is a linguistic fact that you wouldn’t be able to observe if you studied only the official variant of Dutch, which is kept in check by the prescriptive rules, so such empirical data from the dialects actually give you a far more complex insight into the grammatical structure of Dutch infinitives. Additionally, MIMORE also shows the geographical distribution of the variation, which then allows you to investigate other possible grammatical phenomena that are also tied to the same pattern. Needless to say, without a tool like MIMORE, it would be impossible to attain such insights into the linguistic structure of the Dutch language.
4. Given that you are an early-stage researcher, could you share your experience on how CLARIN can support researchers who are just starting their research and career? Do you have any advice for your fellow novice researchers?
What is great about CLARIN is that it allows you to explore a wealth of data that have already been collected. Whenever you start working on a research project, you normally don’t have any idea what’s actually going on in the linguistic data. However, using a tool like MIMORE, you can quickly start working on a topic without having to do the field work yourself, which would of course be extremely time-consuming. What is more, such resources have already been parsed and annotated by experts, so this is another aspect of CLARIN that I find amazing; it allows you to start applying its tools and resources fairly quickly even if you don’t have a lot of technical skills or a computational background.
My advice is to simply start using the available tools and resources, no matter whether you’re a student or a more advanced researcher. The main problem, I think, is that not many people are aware of the research possibilities that CLARIN tools and resources afford. I know that there are many young researchers who study Dutch dialects, for example, and who would greatly benefit from using CLARIN resources like the MIMORE databases, which are a goldmine of data. In this respect, these databases are very valuable since they contain not only data on Dutch as spoken in the Netherlands, but also data on all the Flemish dialects.
5. What can the Flemish consortium offer researchers working in the generative tradition?
I believe that GrETEL, which is a tool developed by the Flemish consortium, has proven itself to be a very valuable service for a generative grammarian. What GrETEL primarily does is allow you to efficiently search for specific syntactic constructions in the MIMORE databases without having to rely on technical knowledge about complex query languages. Normally, you would have to spend a lot of time searching a database for all the variants of a specific syntactic construction, but with GrETEL, you only input an example of your own that conforms to the syntactic pattern you’re interested in, and you immediately get all the relevant data.
6. Is working with a research infrastructure an established practice in your research community?
Well, it depends on what you consider my research community. As far as my fellow PhD students are concerned, a lot of them indeed make use of research infrastructures. Within the formal framework, however, researchers use them infrequently. Though I don’t think it should be obligatory for generative grammarians to use corpus data in all their work, I believe that many non-empirical researchers would still find it very helpful to check their claims in corpora. For one thing, corpus data can show you that your intuition about a linguistic phenomenon isn’t really all that representative across dialects. On the other hand, using a corpus-based approach can provide a very good stepping stone for a beginner, since it quickly shows you what the relevant linguistic situation looks like, from which you can then move on to making formal claims.
7. Since you are also involved in teaching, which is a major priority of CLARIN’s user involvement initiative, can you tell us how you integrate the resources and tools provided by the Flemish consortium in your courses? Do you have any suggestions how the link between CLARIN and university curricula could be strengthened?
I teach a Master’s course on syntactic variation in Dutch and Flemish dialects with my supervisor. We usually spend one class showing the students how to use MIMORE. We ask them to pick a specific formal topic or problem and then investigate how the formal claims correspond to the data in MIMORE. We then show the students how to use such data in their writing assignments. The main goal of the course, which I think is an important one especially in terms of bridging the gap between the empirical and formal worlds, is to learn how to work with a large dataset (MIMORE comprises 267 dialects) and apply the empirical data to a formal analysis, which is far from a trivial problem.
In the end, students really like this approach because it often allows them to get fairly novel results even at the beginning stages. And many syntactic topics would be impossible to tackle were it not for these tools. We also use GrETEL to help students write their Bachelor’s and Master’s theses.
As for university curricula in general, I think the main problem is that only a few teachers have experience in combining both of these worlds – that is, formal analyses with empirical research. I think the first step that has to be made is to make professors, post-docs and PhD students who also teach aware of these tools and resources and to show them how to implement them in their courses. I think that guidelines like the Educational Module could really help in this regard.
8. Given that you have experience with two CLARIN consortia (apart from CLARIN DLU, you previously worked with CLARIN-NL), could you describe how the two consortia complement each other?
What must be understood is that the division between the Dutch speaking part of Belgium and the Netherlands is a political state of affairs that does not correspond to the division of dialects. That is, dialects do not know political borders. However, tools and resources like MIMORE and GrETEL, which were often developed in collaboration by researchers working with both consortia, also contain data from Flemish dialects along with the data from Dutch as spoken in the Netherlands. Consequently, such tools which transcend borders in this sense are really the only way to get an accurate linguistic representation of our language, and it’s for this very reason that in our courses/thesis supervision at the KU Leuven we use both GrETEL, which is “our” tool, and MIMORE, which was developed by CLARIAH-NL. Additionally, the Educational Module that showcases MIMORE and GrETEL is still being updated by Sjef Barbiers, who is from the University of Leiden, and Ineke Schuurman and Liesbeth Augustinus, who work at the KU Leuven, which I think is a great cross-border collaboration. Consequently, I see no reason why other consortia should not collaborate in a similar manner, especially if the languages in question are similar.
9. What would you say is the first thing CLARIN should do to be even more useful for researchers in your field?
I would find it really wonderful if researchers could use tools like GrETEL and MIMORE to search historical variants of Dutch (dialects). I also know that there is a lot of dialect material that cannot yet be accessed, which is something that I would like to see available through CLARIN one day, but I understand that this is often related to copyright problems.