SSHOC Workshop: Using Corpora for Implementing Validation. Workflows that combine quantity and quality



Prof. Dr. Andreas Blätte and Christoph Leonhardt

Description & aims of the workshop

The call to "validate, validate, validate" findings from quantitative corpus analysis (Grimmer/Stewart 2013) has become a staple in discussions of the potentials and perils of quantitative approaches to text, and it is also central to the SSHOC project (Social Sciences & Humanities Open Cloud). Technically, however, the barriers to implementing validation remain quite high: dedicated resources for setting up and maintaining a server-based environment with a graphical user interface are usually still required. Lowering the costs of integrating quantitative and qualitative steps in corpus-based workflows in social science research designs is a major objective of research infrastructures such as CLARIN, and of polmineR, an open source R package available from the Comprehensive R Archive Network (CRAN).

In this workshop, we introduce the polmineR package and explore three basic scenarios using it:

  1. Validating the results obtained from dictionary-based sentiment analysis and classification, 
  2. Validating the results of LDA topic modelling,
  3. Giving substantive meaning to the results of cooccurrence analyses.

We will discuss whether the scenarios can be combined with semi-supervised learning, and how to leverage machine learning (ML) approaches. We will use the polmineR R package in combination with a multilingual corpus of the UN General Assembly.


The workshop is intended for political and social scientists who are interested in using large text collections in their research. No programming skills are needed, but a general familiarity with basic statistical operations on texts will be helpful. Please bring your own laptop for the hands-on session.

SSHOC involvement

This workshop addresses the challenges that specific user communities experience when contributing to SSHOC, the availability of procedures, tools and services to address these challenges, and the extent to which these procedures, tools and services are applicable for those communities. Examining these questions is one of the major goals of the SSHOC project and is tackled within WP9.

Practical details

  • Registration: The number of workshop participants is limited to 25. We kindly ask you to register as soon as possible. Registration will close when the limit is reached.
  • Workshop attendance is free of charge. Refreshments (coffee breaks and lunch) are provided by the organisers. 
  • This workshop is co-located with the CLARIN Annual Conference. Workshop participants who are interested in attending the CLARIN Annual Conference should contact the conference organisers to check whether this would be feasible.

University of Leipzig
The Paulinum – Assembly Hall and University Church of St. Paul, Augustusplatz 10