Large Datasets for Scientific Applications
Syllabus, Master's level, 1TD267
This course has been discontinued.
- Code
- 1TD267
- Education cycle
- Second cycle
- Main field(s) of study and in-depth level
- Computational Science A1F, Computer Science A1F, Technology A1F
- Grading system
- Pass with distinction (5), Pass with credit (4), Pass (3), Fail (U)
- Finalised by
- The Faculty Board of Science and Technology, 13 March 2014
- Responsible department
- Department of Information Technology
Entry requirements
120 credits including Computer Programming II or the equivalent (programming in Java or Python), as well as Database Design II and Scientific Computing I or the equivalent.
Learning outcomes
To pass, the student should be able to
- use state-of-the-art software platforms for management and processing of massive scientific datasets.
- analyse the characteristics of a data-intensive scientific application and propose suitable strategies to handle the data analytics aspects of the application.
- implement software to address an application’s data analysis needs using the technology presented in the course.
- critically analyse, discuss and present solutions and implementations in writing and orally.
Content
How to develop scientific applications utilizing methods and key concepts of large-scale data processing platforms. Distributed file systems and cloud containers such as OpenStack Swift. Batch data processing with MapReduce-based infrastructures such as Hadoop. Effective use of query languages such as Hive for scientific applications. Effective use of indexing. Array databases such as SciDB and data stream processing platforms such as Storm. Overview of techniques, concepts and tools for analysing massive data, such as NoSQL, NoDB, flat namespaces and ontology-based data.
Instruction
Lectures, guest lectures, laboratory work and group supervision. Assignments with oral and written presentation.
Assessment
Mandatory assignments and the completion of a software project. Active participation in seminars. Written and oral discussion of assignments and research papers.