David Black-Schaffer
Professor at the Department of Information Technology; Computer Systems
- Telephone:
- 018-471 68 30
- Mobile phone:
- 076-824 20 17
- Email:
- david.black-schaffer@it.uu.se
- Visiting address:
- Hus 10, Regementsvägen 10
- Postal address:
- Box 337
751 05 UPPSALA
- Academic merits:
- Docent, Excellent Teacher
- ORCID:
- 0000-0002-8250-8574
Short presentation
My research focuses on approaches for moving data more efficiently in computer systems, using both software and hardware techniques. Our results have been commercialized through a startup and incorporated into industry standards. Prior to joining Uppsala, I contributed to the OpenCL standard while working at Apple, Inc. I have received multiple teaching awards and led a startup that helped over 80,000 students.
As of June 2023 I am the Dean of Research for the Faculty of Science and Technology.
Keywords
- performance
- computer architecture
- memory systems
- simulation
- runtimes
- scheduling
- efficiency
- active teaching
- commercialization
Biography
I received my PhD in Electrical Engineering from Stanford University in 2008. My research was in programming for real-time embedded processing on many-core processors in the Concurrent VLSI Architecture Group working with William Dally. After my PhD I worked at Apple on the development of the first OpenCL implementation for heterogeneous parallel processing across CPUs and GPUs, and then as a postdoc researcher in computer architecture in the Dept. of Information Technology at Uppsala University. I was appointed assistant professor in 2010 in the architecture research group at Uppsala looking at parallel programming systems and optimizations as part of the UPMARC research project. In 2014 I was promoted to Associate Professor (docent, lektor) and in 2017 to Professor. I was the Research Program Responsible Professor for the Computer Architecture and Communications Research Program from 2020 through 2022 and the Head of the Division of Computer Systems from 2022.
In addition to research, I led the ScalableLearning project (2012-2020), which brought the benefits of active, flipped-classroom teaching to nearly one hundred thousand students in Sweden and abroad. I was a co-founder of Green Cache AB (2014-2018), which developed and sold advanced memory systems technology. I also worked as the head of interactivity design at Collegial AB from 2018 to 2019.
Grants and Awards
- Wallenberg Academy Fellowship Prolongation (2020-2025)
- The Lilly and Sven Thuréus Prize (2020, The Royal Society of Sciences, Uppsala)
- Swedish Research Council (VR), Project Grant (2019-2023)
- Uppsala Technical Physics Students' Teaching Award (2019)
- European Research Council ERC Starting Grant (2017-2022)
- Uppsala University Pedagogical Prize (2016)
- Swedish Foundation for Strategic Research (SSF), Smart Systems Framework Grant (Co-PI, 2016-2021)
- Knut and Alice Wallenberg Foundation, Wallenberg Academy Fellow (2016-2021)
- Swedish Research Council (VR), Young Researcher Project Grant (2015-2018)
- Swedish Foundation for Strategic Research (SSF), Future Research Leaders (2013-2018)
- EU FP7, Addressing Energy in Parallel Technologies (Co-PI 2013-2016)
- Uppsala University, Pedagogical Development Grant for Flipped Classroom (2013)
- Swedish Research Council (VR), Framework Grant (Co-PI, 2012-2017)
- Uppsala Union of Engineering and Science Students, Teaching Award (2012)
- Stanford University, Centennial Teaching Assistant Award (2004)
- Stanford University, Hugh Hildreth Skilling Teaching Assistant Award (2003)
Teaching
- Computer Architecture 1 (To view the interactive online course lectures, register at ScalableLearning and join with the enrollment key YRLRX-25436.)
- Sample: Introduction to Digital Logic Design (88 min)
- Sample: Introduction to Virtual Memory (70 min)
- Parallel Programming for Efficiency (MSc level)
- Sample: Power and Energy in Computer Systems (52 min)
- Introduction to Computer Architecture Research (PhD level)
Presentations
- Predicting Next-Generation Multicore Performance in a Fraction of a Second (Keynote, SICS Multicore Day, 2015)
- GPUs: The Hype, The Reality, and the Future (Keynote, SICS Multicore Day, 2013) PDF (2011)
- Flipped Classroom Teaching in an Introductory CS Course (KTH, 2013) PDF
- Resource Sharing in Multicore Processors (Keynote, Ericsson Software Research Day 2011)
- Introduction to OpenCL PDF
- Optimizing OpenCL PDF
- GPU Architectures for Non-Graphics People PDF
Research
My research focuses on improving efficiency in computers by making the memory system more intelligent. Our work includes more clever ways of moving and placing data in the memory system, integrating data movement with the processor core itself, adapting runtime schedules for better data movement, and the analysis and modeling of data movement.

Publications
All publications
Journal articles
- Second-level Caches: Not for Instructions
  In ACM Transactions on Architecture and Code Optimization (TACO), 2025
- Mark-Scavenge: Waiting for Trash to Take Itself Out
  In Proceedings of the ACM on Programming Languages, 2024
- Exploring the Latency Sensitivity of Cache Replacement Policies
  In IEEE Computer Architecture Letters, pp. 93-96, 2023
- Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores
  In ACM Transactions on Architecture and Code Optimization (TACO), 2022
- A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006
  In ACM Transactions on Architecture and Code Optimization (TACO), 2021
- Early Address Prediction: Efficient Pipeline Prefetch and Reuse
  In ACM Transactions on Architecture and Code Optimization (TACO), 2021
- Maximizing limited resources: A limit-based study and taxonomy of out-of-order commit
  In Journal of Signal Processing Systems, pp. 379-397, 2019
- Analyzing performance variation of task schedulers with TaskInsight
  In Parallel Computing, pp. 11-27, 2018
- Exploring scheduling effects on task performance with TaskInsight
  In Supercomputing frontiers and innovations, pp. 91-98, 2017
- In IEEE Transactions on Computers, pp. 3537-3551, 2016
- In Svenska Dagbladet, 2013
Book chapters, parts of anthologies
- Efficient cache modeling with sparse data
  In Processor and System-on-Chip Simulation, pp. 193-209, Springer, 2010
Conference papers
- CoGraf: Fully Accelerating Graph Applications with Fine-Grained PIM
  In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2026
- 2026
- Hiding Page Fault Latencies in Graph Processing Applications that Cannot Fit in Memory
  In 40th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2026, 2026
- Mutator-Driven Object Placement using Load Barriers
  In Proceedings of the 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes (MPLR 2024), pp. 14-27, 2024
- Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping
  In The International Symposium on Memory Systems (MEMSYS '23), pp. 1-11, 2023
- Protean: Resource-efficient Instruction Prefetching
  In The International Symposium on Memory Systems (MEMSYS '23), pp. 1-13, 2023
- Faster Functional Warming with Cache Merging
  In Proceedings of System Engineering for Constrained Embedded Systems, DroneSE and RAPIDO 2023, pp. 39-47, 2023
- Every Walk's a Hit: Making Page Walks Single-Access Cache Hits
  In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28 – March 4, 2022, Lausanne, Switzerland, 2022
- Architecturally-independent and time-based characterization of SPEC CPU 2017
  In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 107-109, 2020
- Modeling and Optimizing NUMA Effects and Prefetching with Machine Learning
  In ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing, 2020
- Perforated Page: Supporting Fragmented Memory Allocation for Large Pages
  In Proceedings of the 47th Annual ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 913-925, 2020
- Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors
  In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 424-434, 2020
- Efficient temporal and spatial load to load forwarding
  In Proc. 26th International Symposium on High-Performance Computer Architecture, 2020
- Efficient thread/page/parallelism autotuning for NUMA systems
  In ICS '19, pp. 342-353, 2019
- Filter caching for free: The untapped potential of the store-buffer
  In Proc. 46th International Symposium on Computer Architecture, pp. 436-448, 2019
- FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors
  In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 716-721, 2019
- Freeway: Maximizing MLP for Slice-Out-of-Order Execution
  In 2019 25th IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 558-569, 2019
- Behind the Scenes: Memory Analysis of Graphical Workloads on Tile-based GPUs
  In Proc. International Symposium on Performance Analysis of Systems and Software, pp. 1-11, 2018
- Tail-PASS: Resource-based Cache Management for Tiled Graphics Rendering Hardware
  In Proc. 16th International Conference on Parallel and Distributed Processing with Applications, pp. 55-63, 2018
- Dynamically Disabling Way-prediction to Reduce Instruction Replay
  In 2018 IEEE 36th International Conference on Computer Design (ICCD), pp. 140-143, 2018
- Adaptive cache warming for faster simulations
  In Proc. 9th Workshop on Rapid Simulation and Performance Evaluation, 2017
- A split cache hierarchy for enabling data-oriented optimizations
  In Proc. 23rd International Symposium on High Performance Computer Architecture, pp. 133-144, 2017
- Addressing energy challenges in filter caches
  In Proc. 29th International Symposium on Computer Architecture and High Performance Computing, pp. 49-56, 2017
- Understanding the interplay between task scheduling, memory and performance
  In Proc. Companion 8th ACM International Conference on Systems, Programming, Languages, and Applications, pp. 21-23, 2017
- TaskInsight: Understanding task schedules effects on memory and performance
  In Proc. 8th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 11-20, 2017
- Analyzing Graphics Workloads on Tile-based GPUs
  In Proc. 20th International Symposium on Workload Characterization, pp. 108-109, 2017
- A graphics tracing framework for exploring CPU+GPU memory systems
  In Proc. 20th International Symposium on Workload Characterization, pp. 54-65, 2017
- POSTER: Putting the G back into GPU/CPU Systems Research
  In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 130-131, 2017
- Characterizing Task Scheduling Performance Based on Data Reuse
  In Proc. 9th Nordic Workshop on Multi-Core Computing, 2016
- Formalizing data locality in task parallel applications
  In Algorithms and Architectures for Parallel Processing, pp. 43-61, 2016
- Spatial and Temporal Cache Sharing Analysis in Tasks
  2016
- Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement
  In Proc. 34th International Conference on Computer Design, pp. 117-124, 2016
- Partitioning GPUs for Improved Scalability
  In Proc. 28th International Symposium on Computer Architecture and High Performance Computing, pp. 42-49, 2016
- Long Term Parking (LTP): Criticality-aware Resource Allocation in OOO Processors
  In Proc. 48th International Symposium on Microarchitecture, pp. 334-346, 2015
- StatTask: Reuse distance analysis for task-based applications
  In Proc. 7th Workshop on Rapid Simulation and Performance Evaluation, pp. 1-7, 2015
- Full speed ahead: Detailed architectural simulation at near-native speed
  In Proc. 18th International Symposium on Workload Characterization, pp. 183-192, 2015
- Micro-Architecture Independent Analytical Processor Performance and Power Modeling
  In 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 32-41, 2015
- AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance
  In Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, pp. 367-378, 2015
- Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling
  In Proc. 12th International Symposium on Code Generation and Optimization, pp. 262-272, 2014
- The Direct-to-Data (D2D) Cache: Navigating the cache hierarchy with a single lookup
  In Proc. 41st International Symposium on Computer Architecture, pp. 133-144, 2014
- Bandwidth Bandit: Quantitative Characterization of Memory Contention
  In Proc. 11th International Symposium on Code Generation and Optimization, pp. 99-108, 2013
- Shared Resource Sensitivity in Task-Based Runtime Systems
  In Proc. 6th Swedish Workshop on Multi-Core Computing, 2013
- TLC: A tag-less cache for reducing dynamic first level cache energy
  In Proceedings of the 46th International Symposium on Microarchitecture, pp. 49-61, 2013
- Towards more efficient execution: a decoupled access-execute approach
  In Proc. 27th ACM International Conference on Supercomputing, pp. 253-262, 2013
- Modeling performance variation due to cache sharing
  In Proc. 19th IEEE International Symposium on High Performance Computer Architecture, pp. 155-166, 2013
- Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models
  In PARMA 2013, 4th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2013
- Phase Guided Profiling for Fast Cache Modeling
  In International Symposium on Code Generation and Optimization (CGO'12), pp. 175-185, 2012
- Phase Behavior in Serial and Parallel Applications
  In International Symposium on Workload Characterization (IISWC'12), 2012
- Efficient techniques for predicting cache sharing and throughput
  In Proc. 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 305-314, 2012
- Bandwidth bandit: Quantitative characterization of memory contention
  In Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, pp. 457-458, 2012
- Cache Pirating: Measuring the Curse of the Shared Cache
  In Proc. 40th International Conference on Parallel Processing, pp. 165-175, 2011
- A simple statistical cache sharing model for multicores
  In Proc. 4th Swedish Workshop on Multi-Core Computing, pp. 31-36, 2011
- A simple model for tuning tasks
  In Proc. 4th Swedish Workshop on Multi-Core Computing, pp. 45-49, 2011
- Using hardware transactional memory for high-performance computing
  In Proc. 25th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp. 1660-1667, 2011
- Fast modeling of shared caches in multicore systems
  In Proc. 6th International Conference on High Performance and Embedded Architectures and Compilers, pp. 147-157, 2011
- StatCC: a statistical cache contention model
  In Proc. 19th International Conference on Parallel Architectures and Compilation Techniques, pp. 551-552, 2010
- Block-Parallel Programming for Real-time Embedded Applications
  In Proc. 39th International Conference on Parallel Processing, pp. 297-306, 2010
Reports
- Faster Functional Warming with Cache Merging
  2022
- Minimizing Replay under Way-Prediction
  2019
- Perf-Insight: A Simple, Scalable Approach to Optimal Data Prefetching in Multicores
  2015
- Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed
  2014
- Quantitative Characterization of Memory Contention
  2012
- Cache Pirating: Measuring the curse of the shared cache
  2011
- Computing Systems: Research Challenges Ahead: The HiPEAC Vision 2011/2012
  2011