Goals and Objectives
Natural Language Processing (NLP) is an interdisciplinary field whose goal is to create machines that understand natural languages.
This training material features: (i) an introduction to the key concepts and techniques of NLP; (ii) hands-on activities on some NLP tasks, such as lemmatization, part-of-speech tagging, and named entity recognition, using CLARIN-ERIC tools. The use case covered by the hands-on activities is particularly suited to trainees in the field of Digital Humanities, given that the text is taken from a corpus of historical travel writings.
Learning Outcomes
After the first part of the lesson, trainees will have a basic knowledge of the key concepts and methods of NLP (e.g., pipelines, linguistic tasks, text annotation, machine learning, and evaluation metrics).
After the second part, trainees will be able to 1) automatically analyze texts with NLP tools; 2) choose the NLP tool that best fits their specific purposes among those provided by CLARIN-ERIC; 3) create a geographical visualization starting from a text processed with a Named Entity Recognition tool.
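For readers who want to see the step from Named Entity Recognition output to map-ready data in code, the following is a minimal Python sketch, not the procedure used in the hands-on session (which relies on web interfaces and a spreadsheet editor). It assumes the NER tool's output has already been reduced to (token, tag) pairs in the common BIO scheme with a `LOC` label; the sample sentence, the tag names, and the function name `extract_locations` are all hypothetical.

```python
from collections import Counter


def extract_locations(tagged_tokens):
    """Group BIO-tagged tokens into multi-word location names.

    Assumes tags "B-LOC" (start of a location), "I-LOC" (continuation)
    and anything else for non-location tokens. An "I-LOC" that does not
    follow a "B-LOC" is ignored.
    """
    locations = []
    current = []
    for token, tag in tagged_tokens:
        if tag == "B-LOC":
            # A new location starts: close the previous one, if any.
            if current:
                locations.append(" ".join(current))
            current = [token]
        elif tag == "I-LOC" and current:
            # Continuation of the location currently being built.
            current.append(token)
        else:
            # Any other token ends the current location.
            if current:
                locations.append(" ".join(current))
            current = []
    if current:
        locations.append(" ".join(current))
    return locations


# Hypothetical NER output for an invented travel-writing sentence.
tagged = [
    ("We", "O"), ("left", "O"), ("Pisa", "B-LOC"), ("for", "O"),
    ("Monte", "B-LOC"), ("Carlo", "I-LOC"), ("and", "O"),
    ("returned", "O"), ("to", "O"), ("Pisa", "B-LOC"), (".", "O"),
]

# Count how often each place is mentioned.
counts = Counter(extract_locations(tagged))

# Rows ready to paste into a spreadsheet (place name, number of mentions).
rows = [("place", "mentions")] + sorted(counts.items())
for row in rows:
    print(",".join(str(cell) for cell in row))
```

The resulting place/mentions table still needs coordinates for each place before it can be plotted on a map; that step is left out here because it depends on the mapping tool chosen.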
Author(s)
Description of the Training Materials
(Sub)discipline / topics & language(s) | Computational Linguistics, Natural language processing, Language Technology, Digital Humanities, Linguistics | Language: English |
Keywords | NLP, computational linguistics, linguistic annotation, language technology, digital humanities |
Workload | Designed for a full-day workshop: 6 hours. Divided into two parts: theory (3 hours) and hands-on session (3 hours). |
Project URL | https://zenodo.org/record/6798390#.YsQ3_i8QM04 |
CLARIN language resources and tools used |
|
Target audience | Students of humanities or general trainees interested in NLP. Given the introductory nature of the course, no specific computer skills are required beyond the basics (e.g. using a web browser and a spreadsheet). |
Facilities required |
All tools are accessed through web interfaces, so Internet access is essential. As for software, a text editor and a spreadsheet editor (e.g. Excel, Google Sheets, Numbers, LibreOffice Calc) are required. Suggested text editors are Sublime Text, jEdit, and Atom. Some CLARIN-ERIC services are used during the hands-on part of the class. Teachers and trainees can check whether they can log in to CLARIN services with their institutional account by accessing WebLicht.
The dataset used in the hands-on activity is available in the ILC4CLARIN repository. |
Format | PDF, PPTX, ODP, txt | The material includes slides and a file to be used during the hands-on activity. |
Course(s) in which the materials have been used | Summer school “Digital Tools for Humanists” organized in 2022 by the University of Pisa: http://digitaltools.labcd.unipi.it/. |
Licence and (re)use | The course materials are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. We ask that original materials are acknowledged in any reuse. |
Creation date | 10 June 2022 |
Last modification date | 5 July 2022 |
Experience with Using CLARIN Resources in Teaching
A major advantage of CLARIN services is that they can be used online, so the tools are accessible even to those without programming or command-line skills. The interfaces are visual and user-friendly, suitable for non-expert users. Working online also avoids installation problems that can slow down a lesson.
Loading a text into the "Language Resource Switchboard" immediately shows the list of available tools; the grouping of the tools by task is particularly useful.
Many tools are available for English; for other languages, such as Italian, the list is currently more limited. The hope is that more resources will soon be made available for languages other than English.
Reusability Notes
The theoretical part has an introductory nature and covers various aspects: depending on the time available to the teacher, it is possible to divide it into shorter lessons or to add material to deepen one or more topics. The practical part can be applied to a different text chosen by the teacher, or each student can apply the tools to a text of their choice.