This online course offers basic programming knowledge and skills for text processing and analysing tabular data related to linguistics and language studies.
The course is aimed mainly at undergraduate students in the Humanities and Social Sciences who want to acquire hands-on knowledge and skills in working with language data. No previous programming experience is required, but elementary practical skills in using a computer, keyboard, web browser and file system are presupposed, as is some basic knowledge of linguistics and mathematics.
The core of the course consists of a series of Jupyter notebooks that combine examples of Python programs with explanatory text. The notebooks demonstrate simple language processing tasks, some of which reuse data from real linguistics projects, CLARINO, or other sources. They also suggest exercises that should be solvable on the basis of the given examples and explanations. Ideally, the course should be presented by a teacher, and the exercises should be supervised, but the modules are also designed for self-study.
Learning Outcomes
By the end of the course, learners will be able to:
- Write and interpret small programs that search, tokenize, count, sort, visualize, or otherwise process textual data and tabular data related to language research.
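As an illustration of the kind of small program this outcome refers to, here is a minimal sketch in Python. It is not taken from the course notebooks: the sample sentence, the `tokenize` helper and the variable names are assumptions made for this example, and the tokenizer is deliberately naive (it keeps only runs of the letters a to z).

```python
import re
from collections import Counter

# Hypothetical sample text; a real exercise would load a corpus file instead.
text = "The cat sat on the mat. The dog sat on the log."


def tokenize(s):
    """Lowercase the text and return its runs of letters as word tokens."""
    return re.findall(r"[a-z]+", s.lower())


# Tokenize: split the text into lowercase word tokens.
tokens = tokenize(text)

# Count: build a frequency table of the tokens.
freq = Counter(tokens)

# Sort: descending frequency first, then alphabetically to break ties.
ranked = sorted(freq.items(), key=lambda pair: (-pair[1], pair[0]))

# Search: collect the distinct tokens that end in "at".
at_words = sorted({t for t in tokens if t.endswith("at")})

# Show the three most frequent tokens and the search result.
for word, count in ranked[:3]:
    print(word, count)
print(at_words)
```

Sorting on the pair `(-count, word)` keeps the ranking deterministic when several tokens share the same frequency, which makes the output easier to compare across runs.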
Author: Koenraad De Smedt
University: LLE, University of Bergen, Norway
(Sub)discipline(s): Linguistics and language studies, or any discipline that uses textual materials
Topic(s): Jupyter Notebooks, Python, text processing
Language(s): English
Keywords: Python, text processing
Workload: About 24 teaching hours plus 24 exercise hours when used in a classroom setting.
Resource Type: Web-based module which can be used in the classroom or for self-study.
Resource URL Type: URL
CLARIN Language Resources used in the course:
Target Audience: Bachelor’s level students with a basic knowledge of linguistics and basic computer literacy. Previous programming experience is not necessary.
Expertise (Skills) Level: Beginner/intermediate
Facilities Required: Web browser and platform for running Jupyter notebooks. The course materials are optimized for Google Colaboratory but can easily be adapted to other platforms.
Format: Web-based modules, including Google Colab notebooks.
University Course(s) in which the materials have been used: These online materials are used in Language and Computers (https://www.uib.no/en/course/LING123) in the Bachelor’s program in Linguistics at the University of Bergen.
Creation Date: June 1, 2022
Last Modification Date: May 25, 2023 (The course may be updated at any time.)
Experience with Using CLARIN Infrastructure in Teaching: CLARIN resources are suitable as example data for learning how to search, tokenize, count, sort, visualize and otherwise process textual and tabular data.
Reusability Notes for Teachers and Trainers: When teaching inexperienced students, these materials are best used under the guidance of a teacher who lectures, demonstrates the programs, answers questions and supervises the exercises. More experienced students can use the materials for self-study.
Contact Information: If you use this learning resource either for self-study or as a classroom tutorial, please share your experience at training[at]clarin[dot]eu.
Related Learning Resources