GATE Training Course

Goals and Objectives  

The training materials teach the use of GATE, a freely available open-source toolkit for Natural Language Processing that has been widely used in both academia and industry for many different tasks.

The modules show how to get to grips with the GATE toolkit for basic language processing as well as more advanced techniques, and cover a number of scenarios, such as processing social media and detecting hate speech and misinformation. They include modules both for programmers who want to further develop their own tools within the toolkit and for non-programmers who just want to make use of existing tools. The modules teach not only the use of GATE itself but also how to adapt it to one’s own needs (for example, adapting English tools to a different language, or customising existing tools). They also cover the basic concepts behind a number of language processing tasks, ranging from low-level (tokenisation, POS tagging, parsing) to more sophisticated (information extraction, social media analysis, hate speech detection, misinformation detection), and explain how to interpret and integrate the results of the processing. Finally, they teach programmers how to extend the toolkit itself by adding new tools, or to integrate it into other systems.

 

Author(s)

  • Diana Maynard, Senior Research Fellow/course organiser
  • Prof. Kalina Bontcheva, course co-organiser
  • Ian Roberts, Xingyi Song, Mark A. Greenwood, Mehmet Bakir, Johann Petrak, Ye Jiang (additional course material providers)
Department of Computer Science, Faculty of Engineering
University of Sheffield
Sheffield, UK
 

Description of the Training Materials

(Sub)discipline, topic, language(s)

Natural Language Processing; social sciences; digital humanities; computer science; corpus linguistics

Training materials are in English

Keywords

Natural Language Processing; Machine Learning; GATE; social media analysis; disinformation; online abuse detection; Python; Deep Learning; information extraction; digital humanities; corpus linguistics; annotation

CLARIN resources

GATE toolkit for language processing

Course URL

https://gate.ac.uk/wiki/TrainingCourseFeb2021/

Structure and duration

The course comprises eleven modules. Each module consists of slides and practical exercises. Hands-on materials (corpora, tools, ready-made applications, etc.) and slides are provided on the website for download. Most modules are three hours long when taught by an instructor, though some are shorter. The modules can be mixed and matched depending on relevance, though some have prerequisites, which are clearly explained.

The course is designed both for real-time teaching and for self-driven e-learning. The real-time course is taught by members of the GATE team. Some time is allocated for participants to try out the exercises during the sessions, while further exercises are designed to be carried out in the participants' own time. These further exercises are optional and are graded so that participants can choose according to their skills and interests. The split between teaching and exercises varies by module, but is typically around 60% teaching and 40% practical exercises. Videos of the real-time course are made available to participants for later self-study.

The course can also be followed as open access without formal teaching, as all the materials provided are designed for self-study.

Target audience

The course includes modules both for programmers who want to further develop their own tools within the toolkit and for non-programmers who just want to make use of existing tools (e.g. social scientists and humanities researchers). It is designed equally for students, academic researchers and researchers from industry who may want to use the tools in their work or business. The introductory modules require no specific skills beyond basic IT competence. Some programming modules require knowledge of Java and/or Python.

Facilities required

All materials are provided, though students can also use their own corpora or tools. The software is freely available for download and needs to be installed as part of the course. GATE works on all major operating systems (Windows, Linux, Mac).

1. Java 8 or later. We recommend AdoptOpenJDK (the best choice for Windows users) or Azul Zulu (in particular for new Apple Silicon Macs), but any compatible OpenJDK or Oracle JDK should work.
3. A text editor. Please note that Word and Windows Notepad are NOT suitable, as they are not always able to handle files that were created on Linux/Mac. The jEdit Programmer's Text Editor is a good Java-based cross-platform alternative.
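
Participants who want to confirm that an installed Java meets the "Java 8 or later" requirement could use something like the following Python sketch. It is an illustration only and not part of the course materials: the function names are hypothetical, and it assumes that `java -version` writes a line containing `version "..."` to standard error, as OpenJDK and Oracle JDK builds normally do.

```python
import re
import subprocess


def java_major_version(version_string: str) -> int:
    """Return the major Java version from a string such as '1.8.0_292' or '17.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string.strip())
    if match is None:
        raise ValueError(f"Unrecognised Java version string: {version_string!r}")
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_292),
    # so the major version is the second number in that case.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def meets_minimum(version_string: str, minimum: int = 8) -> bool:
    """Check a version string against the minimum major version (8 for this course)."""
    return java_major_version(version_string) >= minimum


def installed_java_version() -> str:
    """Ask the `java` found on the PATH for its version string (assumed output format)."""
    # `java -version` prints its report to standard error, not standard output.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    found = re.search(r'version "([^"]+)"', result.stderr)
    if found is None:
        raise RuntimeError("Could not find a version in the `java -version` output")
    return found.group(1)
```

Calling `meets_minimum(installed_java_version())` on a machine with Java on the PATH indicates whether the requirement is satisfied; if no `java` command is found, `subprocess.run` raises `FileNotFoundError`.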

 
Format

Online training course comprising eleven modules. Each module consists of slides and practical exercises. Hands-on materials (corpora, tools, ready-made applications, etc.) and slides are provided on the website for download. The course is designed both for real-time teaching and for self-driven e-learning.

Course(s) in which the training material was used

In addition to the annual GATE training course, parts of the materials have been used in:

Licence and (re)use

The course materials are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike Licence. Users are free to:

  • share – copy and redistribute the material in any medium or format
  • adapt – remix, transform, and build upon the material

We ask that the GATE team is acknowledged in any reuse of the material for teaching purposes.

Creation date

20 July 2009

Last modification date

15 February 2021
 

Experience with Using CLARIN Resources in Teaching 

The CLARIN resource we use is the one we have developed ourselves, the GATE NLP toolkit. We have found it to be a fantastic way to explain NLP in simple terms to people from other communities, such as social scientists, humanities researchers and computer scientists. In particular, it offers an easy visual way to explain tools and techniques, and lets people use NLP without having to program or see inside the box if they don't want to. It also offers programmers plenty of scope for modifying the tools themselves. Many people come to the course with their own ideas of how to use GATE and find that, with a little bit of guidance, they are easily able to integrate their own ideas, data and even tools and make real progress. The feedback has been very positive. There has been increasing demand for more training videos, which we are working on to supplement our existing materials.
 

Students' Testimonials

I have a fair amount of experience in NLP, but there were gaps in my knowledge, and I wanted to switch to GATE from the current tools I was using. A custom GATE training course designed around my specific needs, using my own data sets, turned out to be the perfect solution. My instructor did a top-notch job, with excellent in-advance preparation. I am a very satisfied student! -- Larry Rafsky (January 2021)

Download Information

All the materials can be downloaded from the course website.
 

Additional Information and Resources

We recommend our accompanying book, which gives additional context for the materials and tools, provides further background NLP knowledge, and explains how GATE compares with alternative tools:

  • D. Maynard, K. Bontcheva, I. Augenstein. Natural Language Processing for the Semantic Web. Morgan and Claypool, December 2016. ISBN: 9781627059091

Cite this Work

D. Maynard, K. Bontcheva, I. Roberts, X. Song, M. A. Greenwood, M. Bakir, J. Petrak, Y. Jiang (2021). GATE Training Course, https://gate.ac.uk/wiki/TrainingCourseFeb2021/.
 

Contact Information

Teachers who reuse and adapt this training material are invited to share their feedback via training@clarin.eu