David Black-Schaffer
Professor at the Department of Information Technology, Division of Computer Systems
- Telephone:
- +46 18 471 68 30
- Mobile phone:
- +46 76 824 20 17
- E-mail:
- david.black-schaffer@it.uu.se
- Visiting address:
- Hus 10, Lägerhyddsvägen 1
- Postal address:
- Box 337
751 05 UPPSALA
Short presentation
My research focuses on approaches for moving data more efficiently in computer systems, using both software and hardware techniques. Our results have been commercialized through a startup and incorporated into industry standards. Before joining Uppsala, I contributed to the OpenCL standard while working at Apple, Inc. I have received multiple teaching awards and led a startup that helped over 80,000 students.
As of June 2023 I am the Dean of Research for the Faculty of Science and Technology.
Keywords
- active teaching
- commercialization
- computer architecture
- efficiency
- memory systems
- performance
- runtimes
- scheduling
- simulation
Biography
I received my PhD in Electrical Engineering from Stanford University in 2008. My PhD thesis, completed in the Concurrent VLSI Architecture Group working with William Dally, was on programming for real-time embedded processing on many-core processors. After my PhD I worked at Apple on the development of the first OpenCL implementation for heterogeneous parallel processing across CPUs and GPUs, and then as a postdoctoral researcher in computer architecture in the Dept. of Information Technology at Uppsala University. In 2010 I was appointed assistant professor in the architecture research group at Uppsala, working on parallel programming systems and optimizations as part of the UPMARC research center. I received the docent title in 2014 and was promoted to full professor in 2017.
At Uppsala University, I was the Research Responsible Professor for the Computer Architecture and Communications Systems program from 2020 to 2022. I have been head of the Division of Computer Systems since 2022 and the department representative to the faculty Advisory Committee for Research since 2021.
I have been very active in flipped-classroom teaching. In particular, I led the ScalableLearning project from 2012 to 2020, which developed an online system to support at-home and in-class flipped-classroom teaching that has been used by over 80,000 students. My active teaching techniques have been recognized by the Uppsala Engineering and Science Student Union Pedagogical Prize (2012), the Uppsala University Pedagogical Prize (2016), and the Uppsala Technical Physics Students' Teaching Award (2019).
I have also worked to bring my research results into industry, both through startups and through industrial collaboration. Together with my colleague Erik Hagersten, I commercialized our new power-efficient memory system designs, which were subsequently acquired by a major international corporation. I have also worked with my colleague Chang Hyun Park and collaborators at Arm Ltd. in the UK to get our memory system designs into the specification for future Arm processors.
Grants and Awards
- Knut and Alice Wallenberg Foundation, Wallenberg Academy Fellowship Prolongation (2020-2025)
- Swedish Research Council (VR) Project Grant (2019-2024)
- European Research Council ERC Starting Grant (2017-2022)
- Uppsala University Pedagogical Prize (2016)
- Swedish Foundation for Strategic Research (SSF), Smart Systems Framework Grant (Co-PI, 2016-2021) Automating System SpEcific Model-Based LEarning (ASSEMBLE)
- Knut and Alice Wallenberg Foundation, Wallenberg Academy Fellow (2016-2021)
- Swedish Research Council (VR), Young Researcher Project Grant (2015-2018)
- Swedish Foundation for Strategic Research (SSF), Future Research Leaders (2013-2018)
- EU FP7, Addressing Energy in Parallel Technologies (Co-PI, 2013-2016)
- Uppsala University, Pedagogical Development Grant for Flipped Classroom (2013)
- Swedish Research Council (VR), Framework Grant (Co-PI, 2012-2017)
- Uppsala Union of Engineering and Science Students, Teaching Award (2012)
- Stanford University, Centennial Teaching Assistant Award (2004)
- Stanford University, Hugh Hildreth Skilling Teaching Assistant Award (2003)
Teaching
- Computer Architecture 1 (To view the interactive online course lectures, register at ScalableLearning and join with the enrollment key YRLRX-25436.)
- Sample: Introduction to Digital Logic Design (88 min)
- Sample: Introduction to Virtual Memory (70 min)
- Parallel Programming for Efficiency (MSc level)
- Sample: Power and Energy in Computer Systems (52 min)
- Introduction to Computer Architecture Research (PhD level)
Presentations
- Predicting Next-Generation Multicore Performance in a Fraction of a Second (Keynote, SICS Multicore Day, 2015)
- GPUs: The Hype, The Reality, and the Future (Keynote, SICS Multicore Day, 2013) PDF (2011)
- Flipped Classroom Teaching in an Introductory CS Course (KTH, 2013) PDF
- Resource Sharing in Multicore Processors (Keynote, Ericsson Software Research Day 2011)
- Introduction to OpenCL PDF
- Optimizing OpenCL PDF
- GPU Architectures for Non-Graphics People PDF
Research
My research focuses on improving efficiency in computers by making the memory system more intelligent. Our work includes smarter ways of moving and placing data in the memory system, integrating data movement with the processor core itself, adapting runtime schedules to improve data movement, and analyzing and modeling data movement.
Publications
Recent publications
- Mark-Scavenge (2024)
- Mutator-Driven Object Placement using Load Barriers (2024)
- Faster Functional Warming with Cache Merging (2023)
- Large-scale Graph Processing on Commodity Systems (2023)
All publications
Articles
- Mark-Scavenge (2024)
- Exploring the Latency Sensitivity of Cache Replacement Policies (2023)
- Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores (2022)
- Early Address Prediction (2021)
- A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006 (2021)
- Maximizing limited resources (2019)
- Analyzing performance variation of task schedulers with TaskInsight (2018)
- Exploring scheduling effects on task performance with TaskInsight (2017)
- Analytical Processor Performance and Power Modeling Using Micro-Architecture Independent Characteristics (2016)
- Universiteten som försvann [The Universities That Disappeared] (2013)
Conferences
- Mutator-Driven Object Placement using Load Barriers (2024)
- Faster Functional Warming with Cache Merging (2023)
- Large-scale Graph Processing on Commodity Systems (2023)
- Protean (2023)
- Every Walk's a Hit (2022)
- Delay and Bypass (2020)
- Efficient temporal and spatial load to load forwarding (2020)
- Architecturally-independent and time-based characterization of SPEC CPU 2017 (2020)
- Perforated Page (2020)
- Modeling and Optimizing NUMA Effects and Prefetching with Machine Learning (2020)
- FIFOrder MicroArchitecture (2019)
- Filter caching for free (2019)
- Freeway (2019)
- Efficient thread/page/parallelism autotuning for NUMA systems (2019)
- Dynamically Disabling Way-prediction to Reduce Instruction Replay (2018)
- Tail-PASS (2018)
- Behind the Scenes (2018)
- Addressing energy challenges in filter caches (2017)
- Adaptive cache warming for faster simulations (2017)
- TaskInsight (2017)
- Understanding the interplay between task scheduling, memory and performance (2017)
- Analyzing Graphics Workloads on Tile-based GPUs (2017)
- POSTER (2017)
- A graphics tracing framework for exploring CPU+GPU memory systems (2017)
- A split cache hierarchy for enabling data-oriented optimizations (2017)
- Spatial and Temporal Cache Sharing Analysis in Tasks (2016)
- Characterizing Task Scheduling Performance Based on Data Reuse (2016)
- Formalizing data locality in task parallel applications (2016)
- Partitioning GPUs for Improved Scalability (2016)
- Data placement across the cache hierarchy (2016)
- StatTask (2015)
- AREP (2015)
- Full speed ahead (2015)
- Long Term Parking (LTP) (2015)
- Micro-Architecture Independent Analytical Processor Performance and Power Modeling (2015)
- Fix the code. Don't tweak the hardware (2014)
- The Direct-to-Data (D2D) Cache (2014)
- Shared Resource Sensitivity in Task-Based Runtime Systems (2013)
- Bandwidth Bandit (2013)
- Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models (2013)
- Towards more efficient execution (2013)
- Modeling performance variation due to cache sharing (2013)
- TLC (2013)
- Bandwidth bandit (2012)
- Efficient techniques for predicting cache sharing and throughput (2012)
- Phase Guided Profiling for Fast Cache Modeling (2012)
- Phase Behavior in Serial and Parallel Applications (2012)
- Fast modeling of shared caches in multicore systems (2011)
- Cache Pirating (2011)
- A simple model for tuning tasks (2011)
- Using hardware transactional memory for high-performance computing (2011)
- A simple statistical cache sharing model for multicores (2011)
- Block-Parallel Programming for Real-time Embedded Applications (2010)
- StatCC (2010)
Reports
- Faster Functional Warming with Cache Merging (2022)
- Minimizing Replay under Way-Prediction (2019)
- Perf-Insight (2015)
- Full Speed Ahead (2014)
- Quantitative Characterization of Memory Contention (2012)
- Computing Systems: Research Challenges Ahead (2011)
- Cache Pirating (2011)