Syllabus for Data Engineering II

A revised version of the syllabus is available.


  • 7.5 credits
  • Course code: 1TD075
  • Education cycle: Second cycle
  • Main field(s) of study and in-depth level: Computer Science A1F, Data Science A1F, Computational Science A1F

    Explanation of codes

    The code indicates the education cycle and in-depth level of the course in relation to other courses within the same main field of study according to the requirements for general degrees:

    First cycle

    • G1N: has only upper-secondary level entry requirements
    • G1F: has less than 60 credits in first-cycle course/s as entry requirements
    • G1E: contains specially designed degree project for Higher Education Diploma
    • G2F: has at least 60 credits in first-cycle course/s as entry requirements
    • G2E: has at least 60 credits in first-cycle course/s as entry requirements, contains degree project for Bachelor of Arts/Bachelor of Science
    • GXX: in-depth level of the course cannot be classified

    Second cycle

    • A1N: has only first-cycle course/s as entry requirements
    • A1F: has second-cycle course/s as entry requirements
    • A1E: contains degree project for Master of Arts/Master of Science (60 credits)
    • A2E: contains degree project for Master of Arts/Master of Science (120 credits)
    • AXX: in-depth level of the course cannot be classified

  • Grading system: Fail (U), Pass (3), Pass with credit (4), Pass with distinction (5)
  • Established: 2020-02-27
  • Established by: The Faculty Board of Science and Technology
  • Applies from: Autumn 2020
  • Entry requirements:

    120 credits including Data Engineering I. Proficiency in English equivalent to the Swedish upper secondary course English 6.

  • Responsible department: Department of Information Technology

Learning outcomes

On completion of the course the student shall be able to:

  • describe pros and cons of modern systems for handling data streams and use them in practice to address application needs;
  • analyse properties of data-intensive applications that rely on streaming data and apply the analysis to propose suitable solution architectures, including combinations of batch and streaming data;
  • implement software that uses the analysis from the previous point and the technologies addressed in the course;
  • account for and handle practical aspects related to putting machine learning models into production;
  • use frameworks for large-scale distributed machine learning;
  • critically analyse, discuss and present solutions and implementations in writing and orally.


Content

The aim of this course is to gain advanced knowledge of technology used for scalable analysis of streaming data, to understand processes and technologies for large-scale distributed machine learning, and to gain practical knowledge of how to architect and automate pipelines and workflows that handle the chain from data ingestion to machine learning models in production. The course covers:

  • advanced concepts in cloud computing, such as container orchestration and automation;
  • theory and frameworks for streaming data, such as Apache Spark and Apache Kafka;
  • deployment and use of frameworks for distributed machine learning;
  • software and systems for continuous analytics, monitoring and model serving;
  • lifecycle management of machine learning models.


Instruction

Lectures, guest lectures, laboratory work, seminars and group supervision.


Assessment

Active participation in seminars. Written and oral presentation of assignments, a software project and research papers.

If there are special reasons for doing so, an examiner may make an exception from the method of assessment indicated and allow a student to be assessed by another method. An example of special reasons might be a certificate regarding special pedagogical support from the disability coordinator of the university.

Reading list

Applies from: Autumn 2020

Some titles may be available electronically through the University library.

Research papers, reports and tutorials.