The UPSKILLS Learning Content

The essence of machine learning for linguists in tech

      🕑   5 ECTS

Description and scope

This learning block is a guide to acquiring the core notions in machine learning that students of language and linguists need if they plan to work with engineers and scientists. It is equally suited to anyone with a similar background and interests.

Its intended use is supervised study, whereby a student learns actively under the supervision of a teacher. To complete all the steps in the guide, students will need to invest around 150 hours of active learning, which includes various activities: reading, watching videos, installing and setting up programs, taking quizzes and writing programs. Note that basic knowledge of Python programming (such as that covered in our own Start programming with Python in 10 steps) is a prerequisite for this guide.

The teacher should give students feedback on their progress according to a mutual agreement. At a minimum, the teacher should provide solutions to some of the tasks. We will share these solutions with the teacher on request. It is in students' best interest not to see the solutions before solving the tasks themselves.

Block outline

(the overall workload associated with this block amounts to 5 ECTS distributed among the following units)

  1. Machine learning needs linguists
  2. Things in mathematical space
  3. Data points need labels
  4. Setting boundaries with the perceptron algorithm
  5. Linguists need neural networks
  6. Meanings are vectors
  7. Learning meanings with (large) language models
  8. NLP tasks for happy users
  9. How good is an NLP model?
  10. The practice and ethics of large language models (LLMs)

Learning outcomes

Overall, the materials and activities present in this block will allow students to:

  • explain language representation in mathematical terms;
  • test and evaluate machine learning algorithms;
  • explain (intuitively) how neural networks process natural language;
  • argue about advantages and disadvantages of computational modelling of natural language.

Target audience

The primary target audience is lecturers who (want to) teach machine learning to students of linguistics, translation and other language-related areas. Students can also use the materials autonomously, but should be aware that this is not a typical self-study course.

Creative Commons License

This UPSKILLS learning content block is licensed under a Creative Commons Attribution 4.0 International License.

Block designers

Tanja Samardžić

Lonneke van der Plas

Marc Tanti