service

Nederlab, online laboratory for humanities research on Dutch text collections

Nederlab is a user-friendly and tool-enriched open access web interface that aims to contain all digitized texts relevant to Dutch national heritage and the history of Dutch language and culture (c. 800 - present).

The Nederlab project aims to bring together all digitized texts relevant to Dutch national heritage and the history of Dutch language and culture (c. 800 - present) in one user-friendly and tool-enriched open access web interface, allowing scholars to simultaneously search and analyze data from texts spanning the full recorded history of the Netherlands, its language and culture. The project builds on various initiatives: for corpora, Nederlab collaborates with the scientific libraries and institutions; for infrastructure, with CLARIN (and CLARIAH); for tools, with eHumanities programmes such as Catch, IMPACT and CLARIN (TICCL, frog).

Nederlab allows researchers to search and refine its content on the basis of metadata, text and several layers of annotations for this text, such as lemmata, part-of-speech tags, named entities or syntactic annotations. These enrichments are added during a preprocessing stage that also applies automatic spelling normalization. Search results can be inspected one by one, via lists or keyword-in-context concordances, but also in several aggregated forms. For example, results can be grouped simultaneously on the basis of publication date and genre and then displayed as visualisations or exported, or they can be presented as collocations. Statistics about the result set are available as well, as are frequency lists over any subcollection. Search results can be stored as virtual collections in the researcher’s personal workspace. A range of tools will be available in this workspace to analyse the collections or to compare them to each other.
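
To make the keyword-in-context and grouping ideas concrete, here is a minimal Python sketch. It is not Nederlab's implementation or API: the `DOCS` records, the field names `year`, `genre` and `text`, and the functions `kwic` and `group_hits` are all hypothetical, and grouping by century stands in for whatever date granularity the real interface offers.

```python
from collections import Counter

# Hypothetical toy records; the real Nederlab metadata schema is not
# described in this text, so these field names are assumptions.
DOCS = [
    {"year": 1650, "genre": "poetry", "text": "de zee is wijd en de zee is diep"},
    {"year": 1655, "genre": "prose", "text": "hij voer over de zee naar huis"},
    {"year": 1890, "genre": "prose", "text": "de stad lag stil aan de zee"},
]


def kwic(docs, keyword, window=2):
    """Return (left context, keyword, right context, doc) for every hit."""
    hits = []
    for doc in docs:
        tokens = doc["text"].split()
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, token, right, doc))
    return hits


def group_hits(hits):
    """Count hits per (century, genre) pair."""
    counts = Counter()
    for _left, _keyword, _right, doc in hits:
        century = doc["year"] // 100 + 1
        counts[(century, doc["genre"])] += 1
    return counts


hits = kwic(DOCS, "zee")
grouped = group_hits(hits)

# One concordance line per hit, then the aggregated view.
for left, keyword, right, doc in hits:
    print(f"{left:>12} [{keyword}] {right:<12} ({doc['year']}, {doc['genre']})")
for (century, genre), count in sorted(grouped.items()):
    print(f"century {century}, {genre}: {count} hit(s)")
```

The same list of hits feeds both views, which mirrors the description above: results can be read line by line or collapsed into counts per metadata combination.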

The first version of Nederlab was launched in early 2015 and will be expanded until the end of 2017.

CLARIN Centre

Meertens Instituut in collaboration with Huygens ING and the Institute for Dutch Lexicology.

Principal Investigator

prof. dr. Hans Bennis

Project Leader

Hennie Brugman

Country

Netherlands

Language

Dutch

Contact email

hennie.brugman@meertens.knaw.nl

Links

Acknowledgements

Nederlab is financed by NWO, KNAW, CLARIAH and CLARIN-NL.

OpenSoNaR

OpenSoNaR is an online system for searching and analyzing the large-scale Dutch reference corpus SoNaR. Due to the size of the corpus (500 million words), accessing the information contained in the dataset has proven difficult for less technically inclined researchers. OpenSoNaR facilitates the use of the SoNaR corpus by providing a user-friendly online interface.

Background

SoNaR is a 500-million-word reference corpus of contemporary written Dutch for use in different types of linguistic (incl. lexicographic) and HLT research and in the development of applications. The STEVIN-funded SoNaR project (2008-2011) built on the results obtained in the D-Coi and Corea projects, which were awarded funding in the first call for proposals within the STEVIN programme.

SoNaR contains over 500 million words (i.e. word tokens) of full texts from a wide variety of text types, including both texts from conventional media and texts from the new media. All texts except those from social media (Twitter, Chat, SMS) have been tokenized, tagged for part of speech and lemmatized, and in the same set the Named Entities have been labelled. All annotations were produced automatically; no manual verification took place.

OpenSoNaR is an online application for exploring and searching the SoNaR corpus. In the Exploration (Dutch: verken) interface you can look into the corpus distributions, request statistics from sub-corpora, retrieve n-grams from sub-corpora and search for specific documents using the SoNaR document ID. In the Search (Dutch: zoek) interface you can use four different search strategies: simple (simpel), extended (uitgebreid), advanced (geavanceerd) or expert (expert).
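
As an illustration of what retrieving n-grams from a sub-corpus means, the following Python sketch counts n-grams over documents filtered by text type. It is only a sketch under assumptions, not how OpenSoNaR works internally: the `SAMPLE` records, the document IDs, the `text_type` field and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical miniature corpus; IDs and field names are invented for
# illustration and do not follow the real SoNaR document ID scheme.
SAMPLE = [
    {"id": "doc-1", "text_type": "newspaper",
     "tokens": ["de", "kat", "zit", "op", "de", "mat"]},
    {"id": "doc-2", "text_type": "newspaper",
     "tokens": ["de", "kat", "slaapt"]},
    {"id": "doc-3", "text_type": "blog",
     "tokens": ["de", "kat", "zit"]},
]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_counts(documents, n, text_type=None):
    """Count n-grams, optionally restricted to one text type (a sub-corpus)."""
    counts = Counter()
    for doc in documents:
        if text_type is not None and doc["text_type"] != text_type:
            continue
        counts.update(ngrams(doc["tokens"], n))
    return counts


newspaper_bigrams = ngram_counts(SAMPLE, 2, text_type="newspaper")
all_bigrams = ngram_counts(SAMPLE, 2)

for gram, count in newspaper_bigrams.most_common(3):
    print(" ".join(gram), count)
```

Restricting the count to one text type is the sub-corpus step; dropping the `text_type` argument counts over the whole sample instead.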

Due to the size of the SoNaR corpus, the number of hits shown in OpenSoNaR is limited to 8 million. If the results of your query exceed this limit, only the first 8,000,000 hits will be shown.

In OpenSoNaR, click the green question mark in the upper left corner for a guided tour (in Dutch).

CLARIN Centre

INL

Project leader

Dr. Martin Reynaert (Tilburg University)

Contact email

reynaert@uvt.nl

Links

Website and toolservice link

Manual

Acknowledgements

The SoNaR project was carried out by Katholieke Universiteit Leuven (CCL), Hogeschool Gent (Dept. Vertaalkunde, LT3), Radboud University Nijmegen (CLST), Tilburg University (TiCC/ILK), Twente University (HMI), and Utrecht University (UiL-OTS). It was coordinated by Radboud University.
