The UPSKILLS Learning Content

Introduction to language data: Standards and repositories

Workload: 6 ECTS, plus a student project worth 1 or 2 additional ECTS

Description and scope

The aim of this learning block is to provide lecturers in BA/MA language-related programmes with a pool of learning resources and activities for the classroom. These materials introduce students to research data repositories and their role in the linguistic research data lifecycle, in the context of Open Science and the FAIR data principles.

The philosophy behind the block is that students should take an active approach to learning. We discuss key concepts and pair them with examples and practical activities that guide students towards a deeper understanding.

The block’s units are modular, so lecturers and learners can take (or adapt) them either in sequence or as stand-alone units, depending on their needs. The units comprise interactive presentations, learning activities, sample assignments and hands-on tutorials that demonstrate how research data repositories can be used to discover, process, analyse, share, publish and archive language research data.

Block outline

(the overall workload of the first five units of this block amounts to 6 ECTS)

  1. Introduction to the Language Resource Lifecycle and Management
  2. How Research Data Repositories Help Make Language Data FAIR
  3. Finding and (Re)using Language Resources in the CLARIN repositories
  4. Citing Language and Linguistic Data
  5. Legal and Ethical Issues in Language Data Collection, Sharing and Archiving
  6. Student project – Designing, compiling and archiving a corpus of bank bulletins (1 or 2 ECTS)
  7. Glossary

Learning outcomes

Overall, the materials and activities in this block will enable students to:

  • explain the main concepts related to research data repositories and the role they play in the linguistic research data lifecycle in the context of Open Science and the FAIR principles;
  • find and use certified research data repositories to discover, share, publish, and archive language and linguistic resources and datasets;
  • find and use integrated repository services and tools to process, annotate, and analyse different types of corpora according to standards and formats used by the community;
  • identify potential legal and ethical issues when collecting, sharing and reusing language data and resources.

Target audience

The primary target audience is lecturers who teach, or want to teach, about standards and repositories for linguistic data. Students can also use the materials on their own, but should be aware that this is not a typical self-study course.

Creative Commons License

This UPSKILLS learning content block is licensed under a Creative Commons Attribution 4.0 International License.

Block designers

Iulianna van der Lek

Darja Fišer

(with additional contributions from Francesca Frontini, Pawel Kamocki, Alexander König and Willem Elbers)