The UPSKILLS Learning Content

Automatic speech recognition and forced alignment

🕑 3 ECTS (+ a student project amounting to 3 extra ECTS)

Description and scope

This learning block aims to build a bridge between students with a linguistic background and automatic speech recognition (ASR) as a technical topic. It is designed to present ideas in a transparent and balanced way, at appropriate levels of detail. That said, users should be aware that the ASR field is progressing rapidly, which means that the increasingly substantial role of AI in ASR since 2020 is not covered in detail here.

The philosophy behind the block is that students should take an active approach to learning. To that end, the learning goals are aligned with the activities, assessment and learning materials, which have been designed to stimulate students' interest and curiosity.

The block’s units are designed to be modular, so lecturers and learners can take (or adapt) them either in sequence or as self-standing content, depending on their needs. They come with a Moodle implementation that includes quizzes with multiple-choice questions. This question bank is never final: lecturers are invited to refresh the questions whenever new assessments are needed and as the research field progresses.

Block outline

(the overall workload associated with the first 9 units of this block amounts to 3 ECTS)

The speech signal
Acoustic features
Bayes and Viterbi
Architectures of ASR (I)
Architectures of ASR (II)
Forced alignment as a special case of ASR
Data selection criteria / justification
Dialogue
Language models
Student project (3 ECTS)

Learning outcomes

Overall, the materials and activities present in this block will allow students to:

explain the basics of automatic speech recognition;
identify and define problems, critically examine them and break them down into manageable parts;
reason about possible approaches in specific conditions;
extract essential information to develop workable solutions to test.

Target audience

The primary target audience is lecturers who teach, or want to teach, automatic speech recognition as a technical topic. Students can also use the materials autonomously, but should be aware that this is not a typical self-study course.

Access on Moodle

Download