Data Engineering I
Syllabus, Master's level, 1TD069
- Education cycle: Second cycle
- Main field(s) of study and in-depth level: Computational Science A1N, Computer Science A1N, Data Science A1N, Technology A1N
- Grading system: Fail (U), Pass (3), Pass with credit (4), Pass with distinction (5)
- Finalised by: The Faculty Board of Science and Technology, 25 February 2020
- Responsible department: Department of Information Technology
Entry requirements
120 credits in science/engineering, including 80 credits in computer science and mathematics, of which at least 20 credits in computer science and 30 credits in mathematics. Computer science is to include at least 10 credits in programming, 5 credits in scientific computing (or numerical methods/numerical analysis), and Database Design I. Mathematics is to include linear algebra and probability and statistics. Proficiency in English equivalent to the Swedish upper secondary course English 6.
Learning outcomes
On completion of the course, the student shall be able to:
- Use public and private cloud infrastructure;
- Discuss key concepts in cloud computing such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS);
- Apply cloud security best practices in solutions;
- Use modern systems for handling massive datasets;
- Analyze the properties of data-intensive applications and, based on this analysis, suggest suitable strategies and architectures to meet application needs;
- Implement software based on such an analysis, using technology presented in the course;
- Use container technology for automated deployment and continuous integration;
- Critically analyze, discuss and present solutions and implementations, both in writing and orally.
Content
The course is an application-oriented introduction to cloud computing and data engineering. Topics include:
- Basic concepts in cloud computing, such as virtualization, service layers, and basic security.
- Practical use of cloud infrastructure.
- Storage management solutions and their advantages and disadvantages, including cloud-based dynamic allocation of volumes, object storage, distributed file systems, and SQL and NoSQL databases.
- Design and development of batch analysis pipelines for large datasets.
- The MapReduce programming model and applications based on frameworks such as Apache Hadoop and Apache Spark.
- Evaluation and analysis of scalability, including horizontal and vertical scaling, and strong and weak scaling.
- Deployment strategies using container technologies, and an introduction to continuous integration and deployment.
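As a minimal illustration of the MapReduce programming model covered in the course, the three phases (map, shuffle, reduce) can be sketched in plain Python on a single machine; the frameworks used in the course (Apache Hadoop, Apache Spark) distribute these same phases across a cluster. The word-count task and function names below are illustrative, not part of the syllabus.

```python
from collections import defaultdict
from functools import reduce

def map_phase(documents):
    """Map: emit (word, 1) pairs from each input document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts emitted for each word."""
    return {word: reduce(lambda a, b: a + b, counts)
            for word, counts in grouped.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

Word count is the classic introductory MapReduce example; in Spark the same computation is typically a one-liner over an RDD or DataFrame.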
Instruction
Lectures, seminars, guest lectures, and laboratory work. Participants work both in groups and individually.
Assessment
Oral and written presentations of assignments. Written report on a software project. Active participation in seminars.
If there are special reasons for doing so, an examiner may make an exception from the method of assessment indicated and allow a student to be assessed by another method. An example of special reasons might be a certificate regarding special pedagogical support from the disability coordinator of the university.