A workshop on a proposed standard format for parliamentary data is organised by the CLARIN Interoperability Committee in close collaboration with Tomaž Erjavec and Andrej Pančur from CLARIN Slovenia.
1. Workshop goal
The goal of the workshop is to propose an outline of a standard format for parliamentary data to the research community, to assess the support for it and to identify potential or real obstacles to its development and wide adoption. It is intended as preparation for the work on the CLARIN-endorsed proposal for a standard encoding of parliamentary data. If the workshop shows sufficient support for the proposed standard format, it may be followed in the future by a shared task and an accompanying workshop.
The standard format will be proposed, elaborated and documented by Tomaž Erjavec and Andrej Pančur. It is based on their treatment of the Slovenian parliamentary data, as reported at the ParlaCLARIN workshop at LREC in Japan in May 2018. Erjavec & Pančur used the TEI/Drama and TEI/Speech formats and created a converter from the former to the latter. The encoded corpus is available from CLARIN.SI and will serve as the basis for the proposed format. Tomaž and Andrej will draft and present the overall architecture of the standard and its ecosystem, and will carry out the subsequent work of fleshing out the draft into a well-documented and exemplified proposal that can be easily used and further developed by other researchers.
The workshop participants will be invited to comment on the proposal from the perspective of the parliamentary data they work with and their research goals, so that we get a good picture of the support for this standard and of any real or potential obstacles to its use and adoption.
2. Link with CLARIN’s strategic priorities
The topic of the proposed workshop is linked to CLARIN’s strategic priority to increase interoperability among datasets and tools, to the strategic priority of creating more visibility for specific families of resources, and to CLARIN’s User Involvement work plan for 2018 and 2019. Parliamentary data were the focus of a recent workshop organised by CLARIN as part of this strategic priority, but standardisation of the formats used was not a topic at that workshop. The workshop proposed here is thus a natural follow-up to the ParlaCLARIN workshop and to earlier workshops on parliamentary data organised by CLARIN (e.g. in Sofia, 2017).
The focus on parliamentary data is also very timely: parliamentary corpora have been chosen as one of the CLARIN Resource Families, the LREC workshop was a success, and the research community working on parliamentary data is very active and is organising itself, as witnessed by the wide participation in several workshops on parliamentary data in recent years.
3. The proposed Standard
The envisaged standard, here called teiParla, is based on the TEI Guidelines, as this recommendation has the following advantages:
- it is long-lived, mature, and regularly maintained, with well-established governance
- it is human-readable and editable (unlike e.g. LOD), an important factor for humanities scholars
- it is general, meaning that researchers can easily add further TEI elements to the basic teiParla schema
- it is accompanied by the TEI Stylesheets, a collection of XSLT scripts that transform various legacy formats to TEI and TEI to display-oriented formats
- parliamentary corpora encoded in TEI already exist, in particular those by the proposers, giving a good baseline encoding
TEI is composed of modules that cover different text types and analytic purposes. The teiParla schema will use the following modules:
- Transcriptions of Speech (base text type)
- Names, Dates, People, and Places (for detailed markup of these entities)
- Language Corpora (for forming a collection of records as a language corpus)
- Linking, Segmentation, and Alignment (for interconnecting elements)
- Simple Analytic Mechanisms (for automatic linguistic markup)
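To make the role of these modules more concrete, the following Python sketch builds a small TEI fragment that draws one or two elements from each of them. It is an illustration only, not the teiParla schema, which is still to be drafted: the choice of elements (`teiCorpus`, `particDesc`, `person`, `u`, `seg`, `w`), the identifier `spk1`, the speaker name and the sample words are all assumptions taken from the general TEI Guidelines, and the fragment omits the mandatory header content that a valid TEI document requires.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Return the tag name qualified with the TEI namespace."""
    return "{%s}%s" % (TEI_NS, tag)


def build_sample():
    """Build an illustrative (not schema-valid) TEI fragment for one speech."""
    # Language Corpora: <teiCorpus> groups several <TEI> documents.
    corpus = ET.Element(tei("teiCorpus"))
    doc = ET.SubElement(corpus, tei("TEI"))

    # Names, Dates, People, and Places: describe the speaker once.
    header = ET.SubElement(doc, tei("teiHeader"))
    profile = ET.SubElement(header, tei("profileDesc"))
    partic = ET.SubElement(profile, tei("particDesc"))
    people = ET.SubElement(partic, tei("listPerson"))
    person = ET.SubElement(people, tei("person"))
    person.set(XML_ID, "spk1")  # hypothetical speaker identifier
    ET.SubElement(person, tei("persName")).text = "Hypothetical Speaker"

    # Transcriptions of Speech: <u> holds one utterance.
    text = ET.SubElement(doc, tei("text"))
    body = ET.SubElement(text, tei("body"))
    div = ET.SubElement(body, tei("div"))
    utterance = ET.SubElement(div, tei("u"))
    # Linking: the who attribute points back to the <person> entry.
    utterance.set("who", "#spk1")

    # Linking, Segmentation, and Alignment: <seg> marks a segment.
    seg = ET.SubElement(utterance, tei("seg"))
    # Simple Analytic Mechanisms: <w> carries word-level annotation.
    for form, lemma in [("Thank", "thank"), ("you", "you")]:
        word = ET.SubElement(seg, tei("w"))
        word.set("lemma", lemma)
        word.text = form
    return corpus


if __name__ == "__main__":
    print(ET.tostring(build_sample(), encoding="unicode"))
```

Running the script prints the fragment as a single line of XML. Recording the speaker once in the header and pointing to that entry from each utterance is one common TEI pattern; whether teiParla adopts it is a matter for the proposal itself.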
The proposed standard briefly described above will be presented at the workshop. Based on the feedback from the participants, a revised proposal will be made in the months following the workshop. Once a stable version is available, we will try to organise a shared task, in which participants convert their own data into the proposed standard, and an accompanying workshop.
4. Participation
Participation is limited to 30 persons. Funded participation is limited to around 20 participants from CLARIN member and observer countries.
Researchers who work with parliamentary data and are interested in the workshop can submit a request for participation. The request should describe, in at most two A4 pages, their interest, the data they work with and the format(s) they use, and it should show a willingness to participate actively in the workshop. The request must be submitted via Easychair before the Submission Deadline. The Programme Committee will evaluate the submissions and will consult the national CLARIN coordinator of the researcher’s country before deciding whether to accept or reject the request and whether participation will be funded or unfunded. (See https://www.clarin.eu/content/participating-consortia for an overview of the national coordinators per CLARIN member / observer.)
If a participant is funded by CLARIN, the costs of travel (cheapest option, up to 225 euro) and of at most one night of accommodation (up to 125 euro) will be reimbursed.
5. Important Dates
Submission Deadline: Sunday March 17, 2019, 23:00 CET
Notification of Acceptance: Tuesday April 2, 2019
Workshop Dates: May 23 & May 24, 2019
Workshop Times: May 23, 13:00-17:00 hrs and May 24, 9:00-13:00 hrs
Workshop Venue: Utrecht University
Workshop City: Utrecht
Easychair Submission URL: https://easychair.org/conferences/?conf=parlaformat2019
6. Programme Committee
The Programme Committee is identical to the CLARIN Interoperability Committee and consists of the following members:
Krister Lindén, University of Helsinki, FIN-CLARIN
Nicolas Larrousse, CNRS, CLARIN FR
Monica Monachini, Institute for Computational Linguistics A. Zampolli, Italian National Research Council, CLARIN IT
Jan Odijk, Utrecht University, CLARIAH NL (Chair)
Dieter Van Uytvanck, CLARIN ERIC
7. Further Information
If you have any questions about this workshop, please contact the Programme Committee by email at email@example.com, with ParlaFormat in the Subject field.