Goals and Objectives
The training materials are all based around teaching the use of GATE, a freely available open-source toolkit for Natural Language Processing that has been widely used in both academia and industry for many different tasks.
The modules provide instruction on how to get to grips with the GATE toolkit for basic language processing, as well as more advanced techniques, and cover a number of different scenarios, such as processing social media and detecting hate speech and misinformation. They include modules both for programmers who want to further develop their own tools within the toolkit, and for non-programmers who just want to make use of existing tools. The modules teach not only the use of GATE itself, but also how to adapt it to one’s own needs (for example, how to adapt English tools to a different language, or how to customise existing tools). They also cover the basic concepts behind a number of language processing tasks, ranging from low-level (tokenisation, POS tagging, parsing) to more sophisticated (information extraction, social media analysis, hate speech detection, misinformation detection), as well as how to interpret and integrate the results of the processing. Finally, they teach programmers how to extend the toolkit itself, by adding new tools or integrating it into other systems.
- Diana Maynard, Senior Research Fellow/course organiser
- Prof. Kalina Bontcheva, course co-organiser
- Ian Roberts, Xingyi Song, Mark A. Greenwood, Mehmet Bakir, Johann Petrak, Ye Jiang (additional course material providers)
Description of the Training Materials
|(Sub)discipline, topic, language(s)||Natural Language Processing; social sciences; digital humanities; computer science; corpus linguistics. Training materials are in English.|
|Keywords||Natural Language Processing; Machine Learning; GATE; social media analysis; disinformation; online abuse detection; Python; Deep Learning; information extraction; digital humanities; corpus linguistics; annotation|
|CLARIN resources||GATE toolkit for language processing|
|Structure and duration||The course comprises eleven modules. Each module consists of slides and practical exercises. Hands-on materials (corpora, tools, ready-made applications etc.) and slides are provided on the website for download. The course is designed for both real-time teaching and for self-driven e-learning. Most modules are three hours long when taught by an instructor, though some are shorter. The modules can be mixed and matched depending on relevance, though some have prerequisites that are clearly explained. The real-time course is taught by members of the GATE team and some time is allocated for the participants to try out the exercises, while further exercises are designed to be carried out by the participants in their own time. These further exercises are optional and are graded so that participants can choose depending on their skills and interests. The split between teaching and exercises varies depending on the module, but is typically around 60% teaching and 40% practical exercises. Videos of the real-time course are made available to participants for later self-study. The course can also be followed as open access without formal teaching, as all materials are designed for self-study.|
|Target audience||The course includes modules both for programmers who want to further develop their own tools within the toolkit and for non-programmers who want to just make use of existing tools, e.g. social scientists, humanities researchers. It is designed equally for students, academic researchers and researchers from industry who may want to use the tools in their work or business. For the introductory modules, no specific skills are required beyond basic IT competence. For some programming modules, knowledge of Java and/or Python is required.|
All materials are provided, though students can additionally use their own corpora or tools. The software is freely available for download and needs to be installed as part of the course. GATE will work with any operating system (Windows/Linux/Mac).
1. Java 8 or later - we recommend AdoptOpenJDK (best choice for Windows users) or Azul Zulu (in particular for new Apple Silicon Macs) but any compatible OpenJDK or Oracle JDK should work
3. A text editor. Please note that Word and Windows Notepad are NOT suitable (they are not always able to handle files that were created on Linux/Mac); the jEdit Programmer's Text Editor is a good Java-based cross-platform alternative.
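As a quick way to confirm the Java requirement above before installing GATE, the following is a minimal sketch rather than part of the course materials. The class name `JavaVersionCheck` and its helper methods are hypothetical names chosen for illustration. It relies only on the standard `java.specification.version` system property, which reports `1.8` for Java 8 and the plain major number (for example `11`) for Java 9 and later.

```java
// Hypothetical helper, not part of GATE: checks that the running JVM
// meets the "Java 8 or later" requirement listed above.
final class JavaVersionCheck {
    // Minimum major release named in the course requirements.
    static final int MIN_MAJOR = 8;

    // Converts a specification version such as "1.8" or "11" to its major number.
    static int parseMajor(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Releases up to Java 8 use the "1.x" scheme; later releases report the major number directly.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // True when the given specification version is at least MIN_MAJOR.
    static boolean isSupported(String specVersion) {
        return parseMajor(specVersion) >= MIN_MAJOR;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java specification version: " + spec);
        if (isSupported(spec)) {
            System.out.println("This Java version meets the Java 8 or later requirement.");
        } else {
            System.out.println("This Java version is older than Java 8; please install a newer JDK.");
        }
    }
}
```

Saved as `JavaVersionCheck.java`, this can be compiled with `javac` and run with `java JavaVersionCheck` on any of the supported operating systems.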
Online training course comprising eleven modules. Each module consists of slides and practical exercises. Hands-on materials (corpora, tools, ready-made applications etc.) and slides are provided on the website for download. The course is designed for both real-time teaching and for self-driven e-learning.
|Course(s) in which the training material was used||
In addition to the annual GATE training course, parts of the materials have been used in:
|Licence and (re)use||
The course materials are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike Licence. Users are free to:
We ask that the GATE team is acknowledged in any reuse of the material for teaching purposes.
|Creation date||20 July 2009|
|Last modification date||15 February 2021|
Experience with Using CLARIN Resources in Teaching
Additional Information and Resources
We recommend our accompanying book, which gives additional context around the materials and tools, provides further background NLP knowledge, and explains how GATE compares with alternative tools:
- D. Maynard, K. Bontcheva, I. Augenstein. Natural Language Processing for the Semantic Web. Morgan and Claypool, December 2016. ISBN: 9781627059091