Syllabus for Data Engineering I

  • 5 credits
  • Course code: 1TD169
  • Education cycle: Second cycle
  • Main field(s) of study and in-depth level: Computer Science A1N, Data Science A1N, Technology A1N, Computational Science A1N
  • Grading system: Fail (U), Pass (3), Pass with credit (4), Pass with distinction (5)
  • Established: 2020-02-27
  • Established by: The Faculty Board of Science and Technology
  • Revised: 2022-10-18
  • Revised by: The Faculty Board of Science and Technology
  • Applies from: Autumn 2023
  • Entry requirements:

    120 credits in science/engineering, including 50 credits in computer science and mathematics, of which at least 20 credits are in computer science and 20 credits in mathematics. The computer science credits must include at least 10 credits in programming as well as participation in Database Design I. The mathematics credits must include linear algebra and probability and statistics. Proficiency in English equivalent to the Swedish upper secondary course English 6.

  • Responsible department: Department of Information Technology

Learning outcomes

On completion of the course the student shall be able to:

  • use public and private cloud infrastructure;
  • discuss key concepts in cloud computing such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS);
  • apply cloud security best practices in solutions;
  • use modern systems for handling massive datasets;
  • analyse properties of data-intensive applications and based on this suggest suitable strategies and architectures to meet application needs;
  • implement software based on analysis as in the previous point and using technology presented in the course;
  • critically analyse, discuss and present solutions and implementations in writing and orally.

Content

The course is an application-oriented introduction to cloud computing and data engineering. Basic concepts in cloud computing, such as virtualization, service layers, and basic security. Practical use of cloud infrastructure. Different storage management solutions and their advantages and disadvantages, including cloud-based dynamic allocation of volumes, object storage, distributed file systems and SQL and NoSQL databases. Design and development of batch analysis pipelines for large datasets. The MapReduce programming model and applications based on frameworks such as Apache Hadoop and Apache Spark. Evaluation and analysis of scalability, including concepts such as horizontal and vertical scaling, and strong and weak scaling.

Instruction

Lectures, seminars, guest lectures and laboratory work. Participants work both in groups and individually.

Assessment

Oral and written presentation of assignments. Written report on a software project. Active participation in seminars.

If there are special reasons for doing so, an examiner may make an exception to the method of assessment indicated and allow a student to be assessed by another method. An example of a special reason might be a certificate regarding special pedagogical support from the disability coordinator of the university.

Reading list

Applies from: Autumn 2023

Some titles may be available electronically through the University library.

Research papers, reports and tutorials.

Last modified: 2022-04-26