ParlaFormat Workshop

Thursday, 23 May, 2019 - 13:00 to Friday, 24 May, 2019 - 13:00

A workshop on a proposed standard format for parliamentary data was organised by the CLARIN Interoperability Committee in close collaboration with Tomaž Erjavec and Andrej Pančur from CLARIN Slovenia. 

You can read more about the workshop in the blog post written by Jan Odijk (Utrecht University, CLARIAH NL).


Agenda and presentation slides

Time Thursday 23 May
13:00-13:15 Jan Odijk: ParlaFormat Workshop: Introduction
12:00-13:30 Tomaž Erjavec & Andrej Pančur: Introduction to the proposed annotation scheme
14:00-14:15

Presentations and discussion:

Maciej Ogrodniczuk: The Polish Parliamentary Corpus. Encoding format

Maarten Marx: Political Mashup @ ParlaFormat Workshop

Petya Osenova: Parliamentary Speech: the case of Bulgarian Parliamentary Corpus

Dorte Haltrup Hansen: A TEI-encoding of Danish Parliament Corpus 

14:15-14:30 Coffee break
14:30-17:00

Presentations and discussion:

Stian Rødven Eide: Teaching computers to understand politicians, or: Semantic Analyses of Swedish Parliamentary Data

Roberts Darģis: Annotation of the Corpus of the Saeima with Multilingual Standards

Andreas Blätte & Christoph Leonhardt: The Framework For Parsing Plenary Protocols (frappp). Why parlaTEI matters

Arjan van Hessen & Henk van den Heuvel: Improving the workflow: A faster and better way to make the Hansard. A Hansard that has more options than just reading the text

Tanja Wissik: ParlAT: Corpus of Austrian Parliamentary Records

19:00 Dinner
   
Time Friday 24 May
09:00-10:30

Presentations and discussion:

Csaba Molnár: Legislative data on Hungarian

Giancarlo Luxardo: Enhancing interchange of annotated parliamentary debates

Adrien Barbaresi: German political speeches from the 21st c.

Monica Palmirani: Akoma Ntoso for Parliamentary Documents

10:30-11:00 Coffee break
11:00-11:30

Presentations and discussion:

Pjotr Bański: In search of the bedrock for teiParla: a view from the ISO-TEI perspective

Tomas Kralevičius & Vaidas Morkevičius: Working with Lithuanian Parliamentary Data

11:30-12:00 Tomaž Erjavec & Andrej Pančur: Response to the Presentations
12:00-13:00 General discussion & Conclusions + Sandwich Lunch
   

Overview of the workshop

1.     Workshop goal

The goal of the workshop is to propose an outline of a standard format for parliamentary data to the research community, to assess the support for it, and to identify potential or real obstacles to its development and wide adoption. It is intended as preparation for the work on the CLARIN-endorsed proposal for a standard encoding of parliamentary data. If the workshop shows sufficient support for the proposed standard format, it may be followed by a shared task and an accompanying workshop.
The standard format will be proposed, elaborated and documented by Tomaž Erjavec and Andrej Pančur. It is based on their treatment of the Slovenian parliamentary data, as reported at the ParlaCLARIN workshop at LREC in Japan in May 2018. Erjavec & Pančur used the TEI/Drama and TEI/Speech formats and created a converter from the former to the latter. The encoded corpus is available from CLARIN.SI and will serve as the basis for the proposed format. Tomaž and Andrej will draft and present the overall architecture of the standard and its ecosystem, and will then flesh out the draft into a well-documented and exemplified proposal that other researchers can easily use and develop further.
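
To give a rough idea of what a conversion from a drama-style to a speech-style TEI encoding could involve, here is a minimal Python sketch. It is not the converter written by Erjavec & Pančur, whose details are not described here. The input fragment, the speaker identifier and the mapping (each `<sp>` becomes a `<u>`, each `<p>` becomes a `<seg>`, and the `<speaker>` label is dropped) are assumptions made purely for illustration, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical drama-style fragment; the speaker ID and the text are invented.
# Real TEI documents use the namespace http://www.tei-c.org/ns/1.0, left out here.
DRAMA_FRAGMENT = """
<div type="debate">
  <sp who="#speaker1">
    <speaker>Speaker One:</speaker>
    <p>First paragraph of the speech.</p>
    <p>Second paragraph of the speech.</p>
  </sp>
</div>
"""


def drama_to_speech(div):
    """Return a new element in which every <sp> child of `div` becomes a <u>.

    Assumed mapping (illustrative only): the who attribute is copied,
    each <p> becomes a <seg>, and the <speaker> label is dropped.
    """
    out = ET.Element(div.tag, dict(div.attrib))
    for child in div:
        if child.tag != "sp":
            continue  # this sketch ignores everything except speeches
        u = ET.SubElement(out, "u")
        if "who" in child.attrib:
            u.set("who", child.get("who"))
        for p in child.findall("p"):
            seg = ET.SubElement(u, "seg")
            # Flatten any inline markup inside the paragraph to plain text.
            seg.text = "".join(p.itertext()).strip()
    return out


converted = drama_to_speech(ET.fromstring(DRAMA_FRAGMENT))
print(ET.tostring(converted, encoding="unicode"))
```

A real converter would also have to handle the TEI header, notes, interruptions and other non-speech material, which this sketch deliberately skips.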

The workshop participants will be invited to comment on the proposal from the perspective of the parliamentary data they work with and their research goals, so that we get a good picture of the support for this standard and of any real or potential problems that could hinder its use and adoption.

2.     Link with CLARIN’s strategic priorities

The topic of the proposed workshop is linked to CLARIN’s strategic priority to increase interoperability among datasets and tools, to the strategic priority of creating more visibility for specific families of resources, and to CLARIN’s User Involvement work plan for 2018 and 2019. Parliamentary data were the focus of a recent workshop organised by CLARIN as part of this strategic priority, but standardisation of the formats used was not a topic at that workshop. Thus, the workshop proposed here is a natural follow-up to the ParlaCLARIN workshop and to earlier workshops on parliamentary data organised by CLARIN (e.g. in Sofia, 2017).
The focus on parliamentary data is also very timely. Parliamentary corpora have been chosen as one of the CLARIN Resource Families, the LREC workshop was a success, and the research community working on parliamentary data is organising itself and is very active, as witnessed by the wide participation in several workshops on parliamentary data in recent years.

3.     The proposed Standard

The envisaged standard, here called teiParla, is based on the TEI Guidelines, as this recommendation has the following advantages:
it is long-lived, mature, and regularly maintained, with well-established governance;
it is human-readable and editable (unlike e.g. LOD), an important factor for humanities scholars;
it is general, meaning that researchers can easily add further TEI elements to the basic teiParla schema;
it is accompanied by the TEI Stylesheets, a collection of XSLT scripts that transform various legacy formats to TEI and TEI to display-oriented formats;
parliamentary corpora encoded in TEI already exist, in particular those of the proposers, giving a good baseline encoding.

TEI is composed of modules that cover different text types and analytic purposes. The teiParla schema will use the following modules:

The proposed standard briefly described above will be presented at the workshop. Based on the feedback from the participants, a revised proposal will be made in the months following the workshop. When a stable version is available, we will try to organise a shared task, in which participants convert their data into the proposed standard, and an accompanying workshop.

4.     Participation

Participation is limited to 30 persons. Funded participation is limited to around 20 participants from CLARIN member and observer countries.
Researchers who work with parliamentary data and are interested in the workshop can submit a request for participation. In the request, of at most two A4 pages, they must describe their interest, the data they work with and the format(s) they use, and show their willingness to participate actively in the workshop. The request must be submitted via Easychair before the Submission Deadline. The Programme Committee will evaluate the submissions and will consult the national CLARIN coordinator of the researcher’s country before deciding on acceptance or rejection of the request and on funded or unfunded participation. (See https://www.clarin.eu/content/participating-consortia for an overview of the national coordinators per CLARIN member / observer.)
 
If a participant is funded by CLARIN, the costs of travel (cheapest option, up to a maximum of 225 euro) and of at most one night of accommodation (up to a maximum of 125 euro) will be reimbursed.

5.     Important Dates:

Submission Deadline: Sunday March 17, 2019, 23:00 CET
Notification of Acceptance: Tuesday April 2, 2019
Workshop Dates: May 23 & May 24, 2019
Workshop Times: May 23, 13:00-17:00 hrs and May 24, 9:00-13:00 hrs  
Workshop Venue:  Rijksdienst voor het Cultureel Erfgoed (RCE)
Workshop City: Amersfoort
Easychair Submission URL: https://easychair.org/conferences/?conf=parlaformat2019 

6.     Programme Committee

The Programme Committee is identical to the CLARIN Interoperability Committee and consists of the following members:

Krister Lindén, University of Helsinki, FIN-CLARIN
Nicolas Larrousse, CNRS, CLARIN FR
Monica Monachini, Institute for Computational Linguistics A. Zampolli, Italian National Research Council, CLARIN IT
Jan Odijk, Utrecht University, CLARIAH NL (Chair)
Dieter Van Uytvanck, CLARIN ERIC

7.     Further Information

If you have any questions about this workshop, please contact the Programme Committee by email at events@clarin.eu, with ParlaFormat in the subject field.

Agenda

Time Thursday 23 May
13:00-13:15 Introduction

Jan Odijk: ParlaFormat Workshop: Introduction

13:15-14:15 Introduction to the proposed standard format

Tomaž Erjavec & Andrej Pančur: Introduction to the proposed annotation scheme

14:15-15:15 Presentations and discussion:

Maciej Ogrodniczuk: The Polish Parliamentary Corpus. Encoding format

Maarten Marx: Political Mashup @ ParlaFormat Workshop

Petya Osenova: Parliamentary Speech: the case of Bulgarian Parliamentary Corpus

Dorte Haltrup Hansen: A TEI-encoding of Danish Parliament Corpus

15:15-15:45 Coffee break
15:45-17:00 Presentations and discussion:

Stian Rødven Eide: Teaching computers to understand politicians, or: Semantic Analyses of Swedish Parliamentary Data

Roberts Darģis: Annotation of the Corpus of the Saeima with Multilingual Standards

Andreas Blätte & Christoph Leonhardt: The Framework For Parsing Plenary Protocols (frappp). Why parlaTEI matters

Arjan van Hessen & Henk van den Heuvel: Improving the workflow: A faster and better way to make the Hansard. A Hansard that has more options than just reading the text

Tanja Wissik: ParlAT: Corpus of Austrian Parliamentary Records

19:00 Dinner

Address: 
Rijksdienst voor het Cultureel Erfgoed (RCE)
Smallepad 5
3811 MG
Amersfoort
Netherlands