The UPSKILLS Learning Content

A glimpse into language data science

🕑 6 ECTS (+ a student project amounting to 1 extra ECTS)

Description and scope

This learning block is dedicated to language data science, understood here as the statistical analysis of data about language. It covers the foundational statistical concepts involved in describing and visualising data, and it introduces statistical hypothesis testing and inference.

Compared to other existing teaching and learning materials on statistics, this block focuses more on explaining the core concepts and the statistical way of thinking, including aspects that are often taken for granted. No prior knowledge of data analysis is required, but a basic understanding of research methods (as covered in detail in the learning block First steps into scientific research) is expected.

The philosophy behind the block is that students should take an active approach to learning. We provide definitions of important concepts, accompanied by examples and practical activities that guide students towards a deeper understanding of those concepts.

The block’s units are designed to be modular, so lecturers and learners can take (or adapt) them either in sequence or as standalone content, depending on their needs. Each unit has a theoretical component presented in Moodle books, coupled with a set of activities delivered through Moodle pages, Moodle labels and Moodle quizzes. Materials for practical exercises in R are also included; the use of R is not compulsory, and these exercises can be skipped or replaced with other tools.

Block outline

Welcome to statistics – populations and samples, variables, measurement (1 ECTS)
Working with R – installing R, functions, objects, data preparation (1 ECTS)
Calculating summary numbers – frequencies, mean, median, measures of variability (1 ECTS)
Showing data on graphs – barplots, boxplots, scatterplots, mosaic plots (1 ECTS)
The logic behind inferential statistics – hypothesis testing, statistical significance, inference (1 ECTS)
Some simple statistical tests – Chi-square, correlation, t-tests, Wilcoxon tests (1 ECTS)
Student project – putting the pieces together by studying the frequency and familiarity of words (1 ECTS)

Learning outcomes

Overall, the materials and activities in this block will allow students to:

explain the basic concepts involved in statistical data analysis, from data types to probability and statistical inference;
identify the most appropriate methods to describe and visualise different types of quantitative language data;
choose the appropriate statistical test among those commonly used for simple research setups;
implement methods and techniques for statistical data analysis using dedicated statistical software such as R.

Target audience

The primary target audience is lecturers who teach (or want to teach) quantitative analysis in linguistics, translation and other language-related areas. Students can also use the materials autonomously, but should be aware that this is not a typical self-study course.
