# Syllabus for Introduction to Data Science

Introduktion till dataanalys

## Syllabus

• 10 credits
• Course code: 1MS041
• Education cycle: Second cycle
• Main field(s) of study and in-depth level: Mathematics A1N, Data Science A1N, Computer Science A1N
• Grading system: Fail (U), Pass (3), Pass with credit (4), Pass with distinction (5)
• Established: 2020-02-27
• Established by:
• Revised: 2022-02-02
• Revised by: The Faculty Board of Science and Technology
• Applies from: Autumn 2022
• Entry requirements:

120 credits including 80 credits in computer science and mathematics, of which at least 15 credits in computer science including programming, and at least 30 credits in mathematics including probability and statistics, linear algebra and analysis. Proficiency in English equivalent to the Swedish upper secondary course English 6.

• Responsible department: Department of Mathematics

## Learning outcomes

On completion of the course the student shall be able to:

• find publicly available data sets and evaluate their usefulness for given purposes,
• process data and transform it for analysis,
• use common clustering and dimension reduction methods to explore data sets and argue on mathematical grounds for the relevance of the methods to the data set and the purpose in question,
• choose among common probabilistic models for the analysis of a data set,
• when choosing a model, take into account limitations in computational capacity and complexity,
• evaluate the reliability of a solution by applying appropriate theoretical principles, including finite sample bounds,
• consistently take into account aspects of ethics, law and integrity,
• present the conclusions of an analysis / end product of an application.

## Content

Basics of axiomatic probability theory, including measure theory and concentration inequalities; common probability models and risk minimisation problems, such as regression/classification and hypothesis testing; typical applications in data science, such as prediction, recommendation and A/B testing, as well as common algorithms for their solution; modeling of dependence in underlying probability distributions arising from temporal, spatial and network structure, including Markov chains; elementary data processing (ELT/Extract-Load-Transform): transformation/cleaning of data for later processing by combining data from different sources and using dimension reduction and clustering methods, including PCA and random projection; use of visualization for exploratory data analysis and for communicating results; legal and ethical aspects of collecting, processing and storing data; case studies using real data involving relevant data processing, modeling and inference.
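As an illustration of the concentration inequalities and finite sample bounds named above, the following minimal sketch computes a confidence interval for a mean from Hoeffding's inequality. The syllabus does not state which inequalities are covered, so the choice of Hoeffding's inequality is an assumption, and the function names are hypothetical.

```python
import math
import random


def hoeffding_half_width(n: int, delta: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Half-width t with P(|sample mean - true mean| >= t) <= delta.

    Assumes n i.i.d. observations bounded in [lo, hi]. Obtained by solving
    2 * exp(-2 * n * t**2 / (hi - lo)**2) = delta for t.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def hoeffding_interval(samples, delta, lo=0.0, hi=1.0):
    """Return (lower, upper) bounds that cover the true mean with probability >= 1 - delta."""
    mean = sum(samples) / len(samples)
    t = hoeffding_half_width(len(samples), delta, lo, hi)
    # Clip to [lo, hi], since the true mean cannot lie outside the range of the data.
    return max(lo, mean - t), min(hi, mean + t)


if __name__ == "__main__":
    # Illustrative data only: simulated clicks with an assumed click rate of 0.3.
    rng = random.Random(0)
    clicks = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(1000)]
    print(hoeffding_interval(clicks, delta=0.05))
```

The bound holds for any sample size and any distribution on the stated range, at the cost of being wider than an asymptotic normal interval. Quadrupling the sample size halves the width.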

## Instruction

Lectures and labs.

## Assessment

Written examination (7.5 credits). Assignments, written presentation and active participation in labs (2.5 credits).

If there are special reasons for doing so, an examiner may make an exception from the method of assessment indicated and allow a student to be assessed by another method. An example of special reasons might be a certificate regarding special pedagogical support from the disability coordinator of the university.

## Syllabus Revisions

Applies from: Autumn 2022

Some titles may be available electronically through the University library.

• Wasserman, Larry. All of Statistics: A Concise Course in Statistical Inference

New York: Springer, cop. 2004

• Compendium and notes

Department of Mathematics