Picture courtesy of Cecília Magalhães
Thorsten Trippel (CLARIN-D) has written a detailed report about ESU2018, which took place in Leipzig, Germany, from 17 to 27 July 2018.
The goal of the event
The goal of the Summer School was to provide an intensive course programme in which students and early-career scholars in the digital humanities could receive additional training in digital methods and tools. The event took place in Leipzig, Germany.
Contributions of the event to strategic goals of CLARIN ERIC
CLARIN provides data, services and competences in the field of digital humanities and wants to reach out to users in that field so that they apply its NLP tools and data and make use of its services. Additionally, CLARIN wants to support networks of scholars, especially transnational ones; this summer school fosters such networks.
Detailed description of the content of the workshops
Workshop Asking questions to data in the humanities: right, correct, efficient (Introducing and comparing XQuery, SQL, SPARQL for data from the humanities)
offered by Christoph Draxler (CLARIN centre in Munich) and Thorsten Trippel (CLARIN centre Tübingen)
The amount and complexity of data in the digital humanities are growing continuously. Modern database storage and access technologies are needed to handle this data. This course gave an introduction to three relevant technologies: relational databases and SQL, XQuery for XML-formatted data, and graph databases for highly interconnected data.
Relational databases organize their data in simple tables. SQL is the standard query language to search for and extract data from the database. The technology is mature, many excellent database systems are available, and most programming languages and application programs provide easy access to relational databases.
XML is a language to describe the structure of documents as a hierarchy. XQuery is the standard language to query single XML documents or document collections in order to search for and extract content from them. XML is used for large text corpora and is supported by many programming languages. There are numerous application programs and editors for XML data, and it is often used in web-based environments.
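As an illustration of the table-and-query idea, here is a minimal sketch using Python's built-in sqlite3 module. The letter data, the table name and the column names are invented for this example and are not the sample data used in the workshop.

```python
import sqlite3

# Hypothetical sample data (sender, recipient, year), invented for illustration.
LETTERS = [
    ("Anna", "Bert", 1794),
    ("Bert", "Anna", 1795),
    ("Anna", "Clara", 1801),
]

def letters_per_sender(rows):
    """Load the rows into an in-memory table and count letters per sender."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE letter (sender TEXT, recipient TEXT, year INTEGER)")
    conn.executemany("INSERT INTO letter VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT sender, COUNT(*) FROM letter GROUP BY sender ORDER BY sender"
    ).fetchall()
    conn.close()
    return result

counts = letters_per_sender(LETTERS)
print(counts)
```

The question "who wrote how many letters?" is expressed declaratively in a single SELECT statement; the database decides how to compute the answer.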
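The hierarchical view can be sketched in Python as well. The standard library has no XQuery processor, so the example below walks the tree with xml.etree.ElementTree instead; a roughly equivalent XQuery expression is given in a comment. The element names and the data are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, invented for illustration.
XML_TEXT = """<letters>
  <letter year="1794"><sender>Anna</sender><recipient>Bert</recipient></letter>
  <letter year="1795"><sender>Bert</sender><recipient>Anna</recipient></letter>
  <letter year="1801"><sender>Anna</sender><recipient>Clara</recipient></letter>
</letters>"""

def recipients_of(xml_text, sender):
    """Return the recipients of all letters written by the given sender.

    A roughly equivalent XQuery expression would be:
        for $l in //letter[sender = "Anna"] return $l/recipient/text()
    """
    root = ET.fromstring(xml_text)
    return [
        letter.findtext("recipient")
        for letter in root.findall("letter")
        if letter.findtext("sender") == sender
    ]

annas_recipients = recipients_of(XML_TEXT, "Anna")
print(annas_recipients)
```

Here the query follows the nesting of the document (letters, then letter, then sender and recipient) rather than the rows and columns of a table.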
Graph databases are a relatively new development. In this workshop, data was seen as information nodes linked via named arcs. Graphs are highly dynamic and thus well suited to the exploration phase when working with corpora.
To introduce these technologies, sample data was used to compare the different query methods, with the same underlying information represented in all paradigms. In course projects, participants worked with sample data provided by us or with their own data. By looking at the data, we discussed ways of asking questions of the data and then tried to express them in the query language(s).
After introducing the basic concepts and syntax of the query languages SQL (for relational databases) and XQuery (for XML data), the workshop looked at advanced constructions in both languages and at graph databases. Following this introduction, we examined how to derive questions from the data, select the appropriate formalism and express the question in the query language. We also looked at applying XQuery to TEI documents. Key concepts of this workshop were inserting, updating and deleting data; SQL's stored procedures and XQuery's user-defined functions; graph databases; SPARQL; and the application of query languages to participants' own research questions.
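The node-and-arc model can be approximated with a plain set of (subject, predicate, object) triples. This is only a sketch: the data and the helper function match are hypothetical, and a real graph database or SPARQL endpoint offers far richer pattern matching.

```python
# Hypothetical graph as a set of (subject, predicate, object) triples,
# invented for illustration.
TRIPLES = {
    ("Anna", "wrote_to", "Bert"),
    ("Bert", "wrote_to", "Anna"),
    ("Anna", "wrote_to", "Clara"),
}

def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

# A new kind of arc can be added at any time without changing a schema.
TRIPLES.add(("Clara", "lives_in", "Leipzig"))

annas_letters = match(TRIPLES, subject="Anna", predicate="wrote_to")
residents = match(TRIPLES, predicate="lives_in")
print(annas_letters)
print(residents)
```

Adding the lives_in arc required no change to the existing data, which is one reason graphs suit exploratory work.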
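The data-modifying statements and the idea of a user-defined function can be sketched with Python's sqlite3 module. Two assumptions apply: the table and data are invented, and SQLite has no stored procedures, so a function registered from Python via create_function stands in for that concept here. The function name decade is hypothetical.

```python
import sqlite3

def run_modifications():
    """Insert, update and delete rows, then query with a user-defined function."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE letter (sender TEXT, recipient TEXT, year INTEGER)")

    # Register a Python function so that SQL queries can call decade(year).
    conn.create_function("decade", 1, lambda year: year // 10 * 10)

    # Inserting data (hypothetical rows).
    conn.executemany(
        "INSERT INTO letter VALUES (?, ?, ?)",
        [("Anna", "Bert", 1794), ("Bert", "Anna", 1795), ("Anna", "Clara", 1801)],
    )
    # Updating data: correct the year of Bert's letter.
    conn.execute("UPDATE letter SET year = 1796 WHERE sender = 'Bert'")
    # Deleting data: remove the letter addressed to Clara.
    conn.execute("DELETE FROM letter WHERE recipient = 'Clara'")

    rows = conn.execute(
        "SELECT sender, decade(year) FROM letter ORDER BY sender"
    ).fetchall()
    conn.close()
    return rows

remaining = run_modifications()
print(remaining)
```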
Workshop Reflected Text Analysis in the Digital Humanities
offered by the CLARIN centre in Stuttgart and taught by Sarah Schulz and Nils Reiter.
The course introduced the concept of reflected text analytics and covered various relevant topics to that end. The core idea was to "lift the veil": the participants learned both theoretical concepts and their practical implementation in real code, so that they would be able to apply the learned concepts to their own research questions. Topics of the workshop were annotation and concept development through annotation, programming with Python for text processing and machine learning, and machine learning in theory and application. Participants worked on their own programs and data, wrote code and trained models by themselves (under guidance).
Workshop The humanities scholar's perspective on rule based machine translation
held by Tommi A. Pirinen (CLARIN centre Hamburg)
Rule-based machine translation is an interesting application for natural language processing as well as digital humanities because it spans so many of the topics and concepts of NLP and DH. It can therefore be included in many DH and NLP workflows, producing necessary resources, such as dictionaries and digital grammars, as a by-product of research work on digital texts and corpora. During this course the participants created a simple rule-based machine translation system (based on Apertium) that is capable of translating one short text from one language to another. They learned to write the necessary dictionaries in an XML-based format, to use version control software, and to participate in an open source development community. The schedule contained the following topics:
- Intro to Machine Translation
- Installing the platform and tools
- Working with the tools
- XML basics
- Digital lexicography and morphology
- Word-based translations
- Phrase parsing / chunking
- Re-ordering and grammatical changes
- Evaluation and Quality Assurance
- Comparative grammars
- Connecting to large coverage dictionaries
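The word-based translation step from the schedule above can be illustrated with a toy lookup. This is a minimal sketch and not how Apertium is implemented: the English-to-German word list is invented, the function name is hypothetical, and marking unknown words with an asterisk is simply the convention chosen for this example.

```python
# Hypothetical English-to-German word list, invented for illustration.
BILINGUAL = {"the": "die", "cat": "Katze", "sleeps": "schläft"}

def translate_words(sentence, dictionary):
    """Translate word by word; unknown words are kept and marked with '*'."""
    output = []
    for word in sentence.split():
        target = dictionary.get(word.lower())
        output.append(target if target is not None else "*" + word)
    return " ".join(output)

translation = translate_words("The cat sleeps today", BILINGUAL)
print(translation)
```

Pure word-for-word lookup ignores agreement and word order, which is why the schedule moves on to chunking, re-ordering and grammatical changes.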
Summary of the event
CLARIN-D has repeatedly been approached to contribute classes to the international summer school in digital humanities "Culture & Technology" – The European Summer University in Digital Humanities. CLARIN-D accepted the invitation and supported international participation. The 60-70 participants were from the humanities and in their qualification phase, some being advanced master's students and others graduate students, usually with an international background. This was therefore an outstanding opportunity for CLARIN to create visibility in the user community, receive practical feedback and educate talented young scholars in the use of CLARIN tools, services and data. As the target group is international, we applied for additional funding from CLARIN ERIC to support the travel costs of the experts sent by CLARIN.
Five participants have submitted blog posts about their experience at the ESU 2018. These have been published on the CLARIN-D blog:
- Viviana Pezzullo: https://www.clarin-d.net/de/blog-clarin-d/70-esu-viviana
- Cecília Magalhães: https://www.clarin-d.net/de/blog-clarin-d/69-erfahrungsbericht-cecilia-magalhaes
- Laura Ivaska: https://www.clarin-d.net/de/blog-clarin-d/68-erfahrungsbericht-laura-ivaska
- Linda Brandt: https://www.clarin-d.net/de/blog-clarin-d/67-esu-linda-brandt
- Erdal Ayan: https://www.clarin-d.net/de/blog-clarin-d/66-esu-erdal-ayan
Links to training materials developed for the event
- Christoph Draxler (Ludwig-Maximilians-Universität Munich, Germany) / Thorsten Trippel (Eberhard Karls Universität Tübingen, Germany): Asking questions to data in the humanities: right, correct, efficient (Introducing and comparing XQuery, SQL, SPARQL for data from the humanities) [http://hdl.handle.net/11022/0000-0007-CAED-B]
- Nils Reiter / Sarah Schulz (Universität Stuttgart, Germany): Reflected Text Analysis in the Digital Humanities [http://www.culingtec.uni-leipzig.de/ESU_C_T/node/940]
- Tommi A. Pirinen (Universität Hamburg, Germany): The humanities scholar's perspective on rule based machine translation [http://www.culingtec.uni-leipzig.de/ESU_C_T/node/947]
Potential impact and next steps
CLARIN wants to continue the tradition of this summer school. This includes a stronger involvement of European CLARIN partners, CLARIN involvement in the programme committee through representatives from CLARIN-D and CLARIN ERIC, a broader focus that covers additional CLARIN tools and services, and use of the summer school for training DH researchers.
Plans for organizing similar events
Due to the success of this event and of the past Summer Schools, CLARIN-D intends to contribute to next year’s summer school as well, possibly with different courses. Classes that address core features of CLARIN, such as data management, metadata creation and development, reuse of existing data, and CLARIN web services, need to be embedded into an issue description that demonstrates their relevance for DH researchers.
Information about the organizing team
Summer school chair: Elisabeth Burr, Leipzig, Germany
CLARIN involvement coordinated by: Erhard Hinrichs (National Coordinator, Germany), Thorsten Trippel (Liaison Coordinator, Germany)
Elisabeth Burr and her team have been organizing this event on an almost yearly basis since 2009. CLARIN-D has been involved since 2014, providing courses for the Summer School and offering organizational support.
List of CLARIN-D lecturers (with courses taught)
- Christoph Draxler (Ludwig-Maximilians-Universität Munich, Germany) / Thorsten Trippel (Eberhard Karls Universität Tübingen, Germany): Asking questions to data in the humanities: right, correct, efficient (Introducing and comparing XQuery, SQL, SPARQL for data from the humanities)
- Nils Reiter / Sarah Schulz (Universität Stuttgart, Germany): Reflected Text Analysis in the Digital Humanities
- Tommi A. Pirinen (Universität Hamburg, Germany): The humanities scholar's perspective on rule based machine translation
The following workshops had to be cancelled:
- Isabel Fuhrmann (Berlin-Brandenburgische Akademie der Wissenschaften Berlin, Germany) / Erhard Hinrichs / Yana Strakatova (Universität Tübingen, Germany): Collocations from a multilingual perspective: theory, tools, and applications (1st week)
- Jochen Tiepmar (ScaDS, University of Leipzig / University of Dresden, Germany): Text Mining with Canonical Text Services (2nd week)
ESU website: http://www.culingtec.uni-leipzig.de/ESU_C_T/