Natural Language Processing Methods

  

Goals and Objectives  

Natural Language Processing (NLP) is an interdisciplinary field whose goal is to create machines that understand natural languages.

This training material features: (i) an introduction to the key concepts and techniques of NLP; (ii) hands-on activities on some NLP tasks, such as lemmatization, part-of-speech tagging, and named entity recognition, using CLARIN tools. The use case covered by the hands-on activities is particularly suited to trainees in the field of Digital Humanities, given that the text is taken from a corpus of historical travel writings.

Learning Outcomes

After the first part of the lesson, trainees will acquire a basic knowledge of the key concepts and methods of NLP (e.g., pipelines, linguistic tasks, text annotation, machine learning, and evaluation metrics). 

After the second part, trainees will be able to 1) automatically analyze texts with NLP tools; 2) choose the NLP tool that best fits their specific purposes among those provided by CLARIN-ERIC; 3) create a geographical visualization starting from a text processed with a Named Entity Recognition tool.

Author(s)

Rachele Sprugnoli

Researcher
Dipartimento di Discipline Umanistiche, Sociali e delle Imprese Culturali – Università degli Studi di Parma, Via D’Azeglio 85, 43125 Parma, Italy
 

Description of the Training Materials

(Sub)discipline / topics & language(s)

Computational Linguistics, Natural Language Processing, Language Technology, Digital Humanities, Linguistics | Language: English

Keywords

NLP, computational linguistics, linguistic annotation, language technology, digital humanities

Workload

Designed for a full-day workshop of 6 hours, divided into two parts: theory (3 hours) and a hands-on session (3 hours).

Project URL

https://zenodo.org/record/6798390#.YsQ3_i8QM04

CLARIN language resources and tools used

  • Tools available in the Language Resource Switchboard: WebLicht, UDPipe, NameTag, NLPHub;
  • Repository: ILC4CLARIN;
  • Treebank interface: TüNDRA;
  • Geolocation and visualization: DARIAH-DE Geo-Browser (accessible with CLARIN-ERIC credentials).

Target audience

Students of humanities or general trainees interested in NLP. Given the introductory nature of the course, no specific prior computer skills are required beyond the basics (e.g., using a web browser and spreadsheets).

Facilities required

All tools are accessed through web interfaces, so Internet access is essential.

As for software, a text editor and a spreadsheet editor (e.g., Excel, Google Sheets, Numbers, LibreOffice Calc) are required. Suggested text editors are Sublime Text, jEdit, and Atom.

During the hands-on part of the class, some CLARIN-ERIC services are used. Teachers and trainees can check if they can log in to CLARIN services using their institutional account by accessing WebLicht.

  • Here is a tutorial on logging in with institutional credentials:
  • If teachers and trainees cannot log in with any institutional account, they can ask for new CLARIN credentials.

The dataset used in the hands-on activity is available in the ILC4CLARIN repository.

Format

PDF, PPTX, ODP, txt | The material includes slides and a file to be used during the hands-on activity.

Course(s) in which the materials have been used

Summer school “Digital Tools for Humanists”, organized in 2022 by the University of Pisa: http://digitaltools.labcd.unipi.it/.

Licence and (re)use

The course materials are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. We ask that original materials are acknowledged in any reuse.

Creation date

10 June 2022

Last modification date

5 July 2022
 

Experience with Using CLARIN Resources in Teaching 

One of the most positive aspects of CLARIN services is the fact that they can be used online: this means that the tools are easily usable even by those without programming or command line skills. The interfaces are user-friendly and visual, suitable for non-expert users. In addition, installation problems that can slow down the lesson can be avoided.

By loading a text in the "Language Resource Switchboard" it is possible to immediately see the list of available tools: the display of the tools divided into tasks is particularly useful. 

Numerous tools are available for English; for other languages, such as Italian, the list is currently more limited. The hope is that more resources will soon be made available for languages other than English.

Reusability Notes 

The theoretical part has an introductory nature and covers various aspects: depending on the time available to the teacher, it is possible to divide it into shorter lessons or to add material to deepen one or more topics. The practical part can be applied to a different text chosen by the teacher, or each student can apply the tools to a text of their choice.

Cite this Work

Rachele Sprugnoli. (2022, July 5). Natural Language Processing Methods. Zenodo. https://doi.org/10.5281/zenodo.6798390
 

Contact Information

Teachers who reuse and adapt this training material are invited to share their feedback via training@clarin.eu.