Tour de CLARIN: Interview with Pilar Barbosa

Submitted by Karina Berger on 13 August 2021

Interview by Jakob Lenardič

Pilar Barbosa is an associate professor of general and Portuguese linguistics who has benefited from PORTULAN CLARIN tools in the annotation of a spoken-language corpus that she used in her syntactic research.

1. Please introduce yourself – your academic background and current position.

I am an associate professor of General and Portuguese Linguistics at the Department of Portuguese and Lusophone Studies, Universidade do Minho. I hold a PhD in theoretical linguistics from the Department of Linguistics and Philosophy at the Massachusetts Institute of Technology (MIT). My 1995 thesis, entitled Null Subjects, was written under the supervision of Noam Chomsky and Alec Marantz. My research interests are formal syntax, the interfaces between syntax and morpho-phonology and semantics, comparative syntax, experimental syntax, language variation and Portuguese and Romance linguistics. I am currently director of the master’s program in Linguistics at the University of Minho and coordinator of the research group Theoretical and Experimental Linguistics at the Centre for the Humanities at UMinho (CEHUM).

2. With Cristina Flores, you are a part of the so-called 'network of implementation partners' in the context of PORTULAN CLARIN. How did you get involved in the network? What are the goals of your involvement (e.g., what will you contribute to PORTULAN and how will you benefit from PORTULAN)?

I got involved through Professor António Horta Branco, whom I've known since 2000. In 2003, I participated in the project GramaXing, which he coordinated, and then later he invited CEHUM to join the PORTULAN CLARIN network. CEHUM is a humanities research centre and there are several ways in which the PORTULAN CLARIN tools can be helpful. As a linguist, I am particularly interested in spoken corpora, but as part of the projects developed by CEHUM researchers, there are a number of literature and theatre digital databases that will certainly benefit from the tools developed by the PORTULAN CLARIN network.

3. Which PORTULAN CLARIN tools have you used to annotate your corpora? Could you describe the annotation process? Which features made these tools especially valuable for your purposes?

I have used the LX-Tagger, developed by the NLX-Natural Language and Speech Group, which is the coordinating centre of PORTULAN and is led by Professor António Horta Branco. I had the transcriptions of the spoken corpus Sociolinguistic Profile of the Speech of Braga in the EXMARaLDA format and I wanted the text to be annotated so as to facilitate automatic searches for particular syntactic constructions. The NLX team ran the LX-Tagger and then we hired two (half-time) research assistants, who manually verified the part-of-speech annotation, here at the Universidade do Minho. Now the annotated corpus is available for public use in the repository of PORTULAN CLARIN.
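
To illustrate the kind of automatic search that part-of-speech annotation makes possible, here is a minimal sketch in Python. It is not the LX-Tagger output format or the project's actual tooling: the "word/TAG" layout, the tag names (DET, N, V, INF, PREP) and the helper functions parse_tagged and find_tag_sequence are all assumptions made for illustration.

```python
def parse_tagged(line):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs.

    The 'word/TAG' layout is an assumed, simplified format.
    Items without a slash are kept with tag None.
    """
    pairs = []
    for item in line.split():
        word, sep, tag = item.rpartition("/")
        if not sep:
            word, tag = item, None
        pairs.append((word, tag))
    return pairs


def find_tag_sequence(pairs, pattern):
    """Return every run of adjacent words whose tags match `pattern` in order."""
    n = len(pattern)
    hits = []
    for i in range(len(pairs) - n + 1):
        window = pairs[i:i + n]
        if [tag for _, tag in window] == list(pattern):
            hits.append([word for word, _ in window])
    return hits


# Hypothetical tagged line; the tags are illustrative, not a real tagset.
tagged = "o/DET autocarro/N vai/V deparar/INF com/PREP uma/DET rotunda/N"
print(find_tag_sequence(parse_tagged(tagged), ("V", "INF")))
```

A search of this kind retrieves candidate constructions (here, a finite verb immediately followed by an infinitive) that can then be checked by hand, which is much faster than reading untagged transcripts.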

4. Could you briefly present these corpora? For what kind of research are they (or will they be) intended?

The Sociolinguistic Profile of the Speech of Braga is a Portuguese speech corpus with eighty hours of recorded spontaneous speech, aligned with its transcription in the EXMARaLDA format. It is composed of one-hour interviews with speakers from Braga, Portugal, randomly selected and stratified according to sex, age and level of education. Thus constructed, the corpus is representative of contemporary Portuguese spoken in the city of Braga and allows for the study of variation and change in European Portuguese.

5. Your research primarily involves theoretical syntax, which does not often involve the exploration of linguistically annotated corpora. How do you think your field in general can benefit from syntactically parsed corpora?

In Principles and Parameters theory (Chomsky 1986), intra-linguistic and cross-linguistic variation are conceived in the same way, i.e., in terms of a differentiated application of parameters (pre)determined by universal principles. In more recent developments of this research program, such as the Minimalist Program (Chomsky 1995 and subsequent work), the parameters are in the functional lexicon, more particularly, in the feature content of the functional inventory of the language. This framework has been explored in the study of macro-parametric variation and micro-parametric variation within a range of different languages, including Portuguese. However, in order to study intra-linguistic variation and identify ongoing processes of language change, we need to carry out quantitative analyses, and these require the collection of data samples that may be considered representative of a linguistic community. This is why I have become interested in corpus data. Annotation enables faster searches for particular constructions.

6. Have you used such corpora or any other tools (developed by PORTULAN) in your syntactic research? Could you briefly present the main results?

The results of our research on the corpus Sociolinguistic Profile of the Speech of Braga have been published in a John Benjamins volume, entitled Studies on Variation in Portuguese. I have contributed two papers. In one paper, written in collaboration with Cristina Flores and Ana Bastos-Gee, we studied a particular case of variable syncretism found in the region of Minho in Portugal, involving the 1st and 3rd person singular forms of 'strong' preterites. In the speech of some speakers, these forms can be levelled, and levelling can be obtained in two ways: by shifting the 3rd person to the 1st person (as in the sentence Não sei se ele.3sg fiz.1sg aquilo... 'I don’t know if he did that ...', where the preterite verb fiz 'did' has first person singular features despite the third person subject pronoun ele 'he') or, alternatively, by shifting the 1st person to the 3rd (as in E então que fez.3sg eu.1sg? 'And then what did I do?', where the preterite verb fez 'did' has third person singular features whereas the pronoun subject eu 'I' has first person features).

Our statistical analysis of the corpus data identified education level among the predictors for levelling. In addition, we discovered that there is a consistent use of a given form per verb within the speech of the same speaker, which led us to the conclusion that variation is not random. In particular, there is inter-individual variation in the choice of the form used for paradigm levelling. Since each individual speaker alternates between the use of the standard form and syncretism, there are two different kinds of variation: intra- and inter-individual. We developed an account of these paradigm levelling effects that is based on the interaction between the internal syntax of strong preterites and the Late Insertion of underspecified functional Vocabulary Items, as proposed in the framework of Distributed Morphology (Halle and Marantz 1993). We proposed a derivation of the different forms in the standard dialect and then offered an analysis of levelling where intra-speaker variation is tied to the probabilistic application of feature-deleting Impoverishment operations along the lines of Nevins and Parrott (2010). Inter-speaker variation is attributed to different choices as to which feature sets are subject to Impoverishment: the features for Person or Tense. This paper is a good example of how corpus data can be used to inform formal theories of morphosyntax.

The other paper, written in collaboration with Maria da Conceição de Paiva and Kellen Cozine Martins, focused on clitic climbing, that is, structures in which clitic pronouns can (optionally) be attached to the highest verb. To briefly illustrate this phenomenon, let’s compare the placement of the clitic se in the corpus example

Passa essa ponte, vai deparar-se com outra via rápida e os sinais.

'You go past that bridge and then you will come across another highway and the signs.'

with its placement in the example

Ora, o autocarro sai de paragem, vai-se deparar com uma rotunda…

'Well, the bus leaves the bus stop, it will come across a roundabout…'

Both sentences contain the finite auxiliary vai 'will' complemented by the infinitival verb deparar 'come across'. In the first example, the clitic is attached to the lower infinitival verb, thus forming the complex head deparar-se. In the second, it is attached in a higher position, namely to the finite auxiliary, forming the complex head vai-se. It is this latter phenomenon that is called clitic climbing, the idea being that in such 'climbing' structures the clitic adjoins to the finite verb by moving out of the lower position next to the infinitival verb, in which it was originally inserted as a thematic argument of the infinitival.
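
The contrast between the two placements can be sketched as a simple surface check in Python. This is only an illustration, not the method used in the study: the function names, the short clitic list and the reliance on hyphenated spelling are assumptions, and the sketch ignores proclisis (a clitic written before the verb) and any material intervening between the auxiliary and the infinitive.

```python
import re

# Assumed, non-exhaustive list of European Portuguese clitic pronouns.
CLITICS = {"me", "te", "se", "o", "a", "lhe", "nos", "vos", "os", "as", "lhes"}


def split_enclitic(token):
    """Return (host, clitic) if the token ends in a hyphenated clitic, else (token, None)."""
    host, sep, last = token.rpartition("-")
    if sep and last in CLITICS:
        return host, last
    return token, None


def classify_clitic_placement(sentence, aux, infinitive):
    """Label an adjacent aux + infinitive pair by where the clitic is attached.

    Returns "climbing" if the clitic is hyphenated to the auxiliary,
    "no climbing" if it is hyphenated to the infinitive, and None otherwise.
    """
    tokens = re.findall(r"[\w-]+", sentence.lower())
    for first, second in zip(tokens, tokens[1:]):
        host1, clitic1 = split_enclitic(first)
        host2, clitic2 = split_enclitic(second)
        if host1 == aux and host2 == infinitive:
            if clitic1 and not clitic2:
                return "climbing"
            if clitic2 and not clitic1:
                return "no climbing"
    return None


examples = [
    "Passa essa ponte, vai deparar-se com outra via rápida e os sinais.",
    "Ora, o autocarro sai de paragem, vai-se deparar com uma rotunda…",
]
for sentence in examples:
    print(classify_clitic_placement(sentence, "vai", "deparar"))
```

Applied to the two corpus examples, the first is labelled as attachment to the infinitive and the second as climbing; counts of this kind are the raw material for a quantitative comparison across speakers or varieties.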

According to previous work on the topic (Magro 2005), clitic climbing is more productive in the northern varieties of European Portuguese. In this paper we discuss the findings of a comparative corpus analysis of Braga and Lisbon oral speech, and conclude that this is not the case. Our main claim is that clitic climbing is a case of stable variation in both varieties. By means of a multivariate analysis, we show that clitic climbing is more frequent than attachment of the clitic to the infinitive in the two varieties. Moreover, we discuss evidence in favour of the claim that this syntactic variation presents the same configuration in both varieties: it is lexically constrained (as shown by the fact that not all verbs that take infinitival complements allow for clitic climbing) and not socially marked. These results indicate that the phenomenon of clitic climbing is a stable property of the grammar of European Portuguese and should be studied as such.

7. What are your hopes for PORTULAN in the near future (e.g., what can PORTULAN do to help your research community)?

I believe that PORTULAN CLARIN will be of great help to linguists interested in modelling variation and change in Portuguese. Hopefully, corpora from other contemporary varieties of Portuguese will be included, as well as texts covering the diachrony of the language. But scholars in other fields will also certainly benefit from what PORTULAN CLARIN has to offer to research in the humanities.


