Goals and Objectives
While corpus methods are widely used in linguistics, including gender analysis, this tutorial shows the potential of richly annotated language corpora for research into the socio-cultural context and changes over time that are reflected in language use. The tutorial encourages students and scholars of modern languages, as well as users from other fields of digital humanities and social sciences who are interested in the study of socio-cultural phenomena through language, to engage with user-friendly digital tools for the analysis of large text collections. The tutorial is designed to take full advantage of both linguistic annotations and the available speaker and text metadata to formulate powerful quantitative queries, which are then extended with manual qualitative analysis to ensure adequate framing and interpretation of the results.
The tutorial demonstrates the potential of parliamentary corpora research via concordancers without the need for programming skills. No prior experience in using language corpora and corpus querying tools is required in order to follow this tutorial. While the same analysis could be carried out on any parliamentary corpus with similar annotations and metadata, in this tutorial we will use the siParl 2.0 corpus which contains parliamentary debates of the National Assembly of the Republic of Slovenia from 1990 to 2018. Knowledge of Slovenian is not required to follow the tutorial. To reproduce the analyses in other languages, we invite you to explore a parliamentary corpus of your choice from those available through CLARIN.
Department of Translation, Faculty of Arts, University of Ljubljana
Description of the Training Materials
|(Sub)discipline, topic, language(s)||Modern languages, digital humanities, social sciences, corpus query|
|Keywords||parliamentary proceedings, parliamentary corpora, language and gender, digital humanities|
The tutorial demonstrates the potential of parliamentary corpora research via concordancers without the need for programming skills. No prior experience in using language corpora and corpus querying tools is required in order to follow this tutorial.
The tutorial encourages students and scholars of modern languages, as well as users from other fields of digital humanities and social sciences who are interested in the study of socio-cultural phenomena through language, to engage with user-friendly digital tools for the analysis of large text collections.
|Structure and duration||
This tutorial starts with a brief introduction to corpora and corpus analysis, followed by an introduction of the characteristics of specialised corpora of parliamentary debates and an overview of research into language and gender.
The second part of the tutorial is hands-on and demonstrates the potential of some of the best-known corpus analysis techniques, such as concordances, frequency lists, keywords and collocations, to explore the topics female MPs debate in the Slovenian Parliament over time and to compare and contrast their language use with that of their male counterparts.
The resources and tools used in this tutorial are online and available under an open licence. Corpus querying is demonstrated with the NoSketchEngine concordancer, while additional manual analysis and visualisation of the results are performed in a spreadsheet editor (e.g., Google Sheets or MS Excel). Screencasts, explanations of corpus querying procedures and links to the results are provided in blue boxes for anyone who wishes to reproduce the searches on their own.
The siParl 2.0 corpus can be queried online through the NoSketchEngine or KonText concordancers at CLARIN.SI, the Slovenian consortium of CLARIN, the European research infrastructure for language resources and technology. The siParl 2.0 corpus can also be downloaded from the CLARIN.SI repository and then further analysed with other corpus or text mining tools.
This tutorial is an updated version of the original tutorial, which was based on the previous version of the siParl corpus. In comparison to siParl 1.0, the siParl 2.0 corpus contains richer and cleaner speaker and session metadata, which makes it possible to distinguish between members of parliament and other speakers. In addition, speeches have been labelled with parliamentary terms, which simplifies comparative analysis across different legislative periods. Additional linguistic annotation layers, such as Universal Dependency features, syntactic parses and named entities, have also been added to the corpus, but since these are not used in this tutorial, we do not elaborate on them further.
Based on students' feedback, it takes about 5 hours to work through the tutorial.
|Course(s) in which the training material was used||
Assoc. Prof. Darja Fišer, Ph.D.,
Postgraduate school of the Research Centre of the Slovenian Academy of Sciences and Arts
|Licence and (re)use||Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0)|
|Creation date||June 2021|
|Last modification date||June 2021|
Experience with Using CLARIN Resources in Teaching and Further Instructions
This particular corpus was a great resource for teaching a heterogeneous audience from the social sciences and humanities. The developers of the corpus were very responsive in explaining some unclear details about the structure of the corpus. They very quickly corrected a couple of small errors we identified in the process of creating the tutorial (e.g. wrong gender assignment for a few MPs due to their ambiguous names).
Developing a tutorial requires a lot of time. In our case, a new version of the corpus was released just after the first version of the tutorial had been published. The journal where the first version was published does not offer an option to publish an updated version, so we had to publish it on another platform. It would be great if CLARIN could provide a platform that supports version control of training materials.
For the updated version of the tutorial, we had to redo the entire analysis (and subsequently also the interpretation of the results) from scratch, as the corpus had changed substantially. We also had to re-record the screencasts and create new images and visualisations. Now that the updated tutorial has been published, we have been informed that a major update of the user interface of the NoSketchEngine concordancer has been released and that the old version will no longer be maintained. This means the tutorial will need to be updated once again. These are very serious bottlenecks for the development of sustainable training materials, which requires substantial resources not only for the initial development but also for ongoing maintenance.
This is why it would be ideal for CLARIN to pool resources and ensure systematic and coordinated development of training materials that can then be maximally reused.
The tutorial guides us through basic corpus analysis techniques to show how they can be employed to uncover complex traces of gender stereotypes and inequality in the parliament. By following this tutorial, the users can gain insights into explicit and implicit gender-related discriminatory language practices and explore concrete examples showing how gender shapes language. The tutorial presents an innovative approach to researching gender-related topics in parliamentary discourse by combining corpus and sociohistorical analysis, and thus promotes cross-disciplinary cooperation, which makes it an especially valuable and timely contribution for the research community.
Tjaša Cankar, research assistant at the University of Ljubljana
This multimedia tutorial enables the users to acquire fundamental methodological and terminological knowledge needed for conducting high-quality corpus analysis. It also introduces the users to discourse analysis which can be at the core of the study or used only as a complementary analytical technique. The strong points of this tutorial are its systematic outline and accessible, yet highly informative descriptions that provide enough details also for the advanced users. The tutorial enables students and researchers to go from learning to doing basic corpus analysis in no time.
Mladen Zobec, doctoral student at the University of Graz
Additional Information and Resources
The tutorial is largely self-contained and is intended for self-study. If they wish, students can start directly with the hands-on exercises in section 6. Sections 3, 4 and 5 are dedicated to a general theoretical overview and an introduction to the basic terminology. Even though students can perform the tasks in section 6 without studying these three introductory sections, we strongly encourage them to read the sections before finishing the tutorial. They provide the theoretical foundations needed for a comprehensive understanding and independent use of the demonstrated analytical procedures and for adequate interpretation of the results.
The previous version of the tutorial is available here.
Related materials are available on the PARTHENOS training portal.