Erik Hagersten
Professor emeritus at the Department of Information Technology, Division of Computer Systems
- Mobile phone: +46 79 321 67 71
- E-mail: Erik.Hagersten@it.uu.se
- Visiting address: Hus 10, Regementsvägen 10
- Postal address: Box 337, 751 05 UPPSALA
Publications
Recent publications
Directed Statistical Warming through Time Traveling
Part of MICRO'52, p. 1037-1049, 2019
Delorean: Virtualized Directed Profiling for Cache Modeling in Sampled Simulation
2018
Tail-PASS: Resource-based Cache Management for Tiled Graphics Rendering Hardware
Part of Proc. 16th International Conference on Parallel and Distributed Processing with Applications, p. 55-63, 2018
2017
A split cache hierarchy for enabling data-oriented optimizations
Part of Proc. 23rd International Symposium on High Performance Computer Architecture, p. 133-144, 2017
All publications
Articles in journal
Exploring scheduling effects on task performance with TaskInsight
Part of Supercomputing frontiers and innovations, p. 91-98, 2017
Part of IEEE Transactions on Computers, p. 3537-3551, 2016
Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
Part of ACM Transactions on Architecture and Code Optimization (TACO), 2016
The effects of granularity and adaptivity on private/shared classification for coherence
Part of ACM Transactions on Architecture and Code Optimization (TACO), 2015
Reconsidering algorithms for iterative solvers in the multicore era
Part of International Journal of Computational Science and Engineering, p. 270-282, 2009
Fast Data-Locality Profiling of Native Execution
Part of ACM SIGMETRICS Performance Evaluation Review, p. 169-180, 2005
Parallella program ger paradigmskifte
Part of Elektroniktidningen, 2005
Chapters in book
Efficient cache modeling with sparse data
Part of Processor and System-on-Chip Simulation, p. 193-209, Springer, 2010
Timestamp-based Selective Cache Allocation
Part of High Performance Memory Systems, Springer-Verlag, 2003
Conference papers
Directed Statistical Warming through Time Traveling
Part of MICRO'52, p. 1037-1049, 2019
Tail-PASS: Resource-based Cache Management for Tiled Graphics Rendering Hardware
Part of Proc. 16th International Conference on Parallel and Distributed Processing with Applications, p. 55-63, 2018
A split cache hierarchy for enabling data-oriented optimizations
Part of Proc. 23rd International Symposium on High Performance Computer Architecture, p. 133-144, 2017
Understanding the interplay between task scheduling, memory and performance
Part of Proc. Companion 8th ACM International Conference on Systems, Programming, Languages, and Applications, p. 21-23, 2017
A graphics tracing framework for exploring CPU+GPU memory systems
Part of Proc. 20th International Symposium on Workload Characterization, p. 54-65, 2017
POSTER: Putting the G back into GPU/CPU Systems Research
Part of 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), p. 130-131, 2017
Approximation: A New Paradigm also for Wireless Sensing
2016
CoolSim: Statistical Techniques to Replace Cache Warming with Efficient, Virtualized Profiling
Part of Proceedings of 2016 International Conference on Embedded Computer Systems, p. 106-115, 2016
Formalizing data locality in task parallel applications
Part of Algorithms and Architectures for Parallel Processing, p. 43-61, 2016
CoolSim: Eliminating Traditional Cache Warming with Fast, Virtualized Profiling
Part of 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2016), p. 149-150, 2016
Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement
Part of Proc. 34th International Conference on Computer Design, p. 117-124, 2016
Long Term Parking (LTP): Criticality-aware Resource Allocation in OOO Processors
Part of Proc. 48th International Symposium on Microarchitecture, p. 334-346, 2015
Effects of Granularity/Adaptivity on Private/Shared Classification for Coherence
2015
StatTask: Reuse distance analysis for task-based applications
Part of Proc. 7th Workshop on Rapid Simulation and Performance Evaluation, p. 1-7, 2015
Full speed ahead: Detailed architectural simulation at near-native speed
Part of Proc. 18th International Symposium on Workload Characterization, p. 183-192, 2015
Micro-Architecture Independent Analytical Processor Performance and Power Modeling
Part of 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), p. 32-41, 2015
AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance
Part of Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, p. 367-378, 2015
An efficient, self-contained, on-chip directory: DIR1-SISD
Part of Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, p. 317-330, 2015
Cost-effective speculative scheduling in high performance processors
Part of Proc. 42nd International Symposium on Computer Architecture, p. 247-259, 2015
Extending statistical cache models to support detailed pipeline simulators
Part of 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), p. 86-95, 2014
A software based profiling method for obtaining speedup stacks on commodity multi-cores
Part of 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), p. 148-157, 2014
A case for resource efficient prefetching in multicores
Part of Proc. International Symposium on Performance Analysis of Systems and Software, p. 137-138, 2014
A case for resource efficient prefetching in multicores
Part of Proc. 43rd International Conference on Parallel Processing, p. 101-110, 2014
The Direct-to-Data (D2D) Cache: Navigating the cache hierarchy with a single lookup
Part of Proc. 41st International Symposium on Computer Architecture, p. 133-144, 2014
Resource conscious prefetching for irregular applications in multicores
Part of Proc. International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), p. 34-43, 2014
The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence
2014
Bandwidth Bandit: Quantitative Characterization of Memory Contention
Part of Proc. 11th International Symposium on Code Generation and Optimization, p. 99-108, 2013
TLC: A tag-less cache for reducing dynamic first level cache energy
Part of Proceedings of the 46th International Symposium on Microarchitecture, p. 49-61, 2013
Modeling performance variation due to cache sharing
Part of Proc. 19th IEEE International Symposium on High Performance Computer Architecture, p. 155-166, 2013
Low Overhead Instruction-Cache Modeling Using Instruction Reuse Profiles
Part of International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'12), p. 260-269, 2012
Phase Guided Profiling for Fast Cache Modeling
Part of International Symposium on Code Generation and Optimization (CGO'12), p. 175-185, 2012
Phase Behavior in Serial and Parallel Applications
Part of International Symposium on Workload Characterization (IISWC'12), 2012
Efficient techniques for predicting cache sharing and throughput
Part of Proc. 21st International Conference on Parallel Architectures and Compilation Techniques, p. 305-314, 2012
Bandwidth bandit: Quantitative characterization of memory contention
Part of Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, p. 457-458, 2012
Cache Pirating: Measuring the Curse of the Shared Cache
Part of Proc. 40th International Conference on Parallel Processing, p. 165-175, 2011
Fast modeling of shared caches in multicore systems
Part of Proc. 6th International Conference on High Performance and Embedded Architectures and Compilers, p. 147-157, 2011
A simple statistical cache sharing model for multicores
Part of Proc. 4th Swedish Workshop on Multi-Core Computing, p. 31-36, 2011
Efficient software-based online phase classification
Part of International Symposium on Workload Characterization (IISWC'11), p. 104-115, 2011
StatStack: Efficient modeling of LRU caches
Part of Proc. International Symposium on Performance Analysis of Systems and Software, p. 55-65, 2010
Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses
Part of Proc. International Conference for High Performance Computing, Networking, Storage and Analysis, p. 11, 2010
StatCC: a statistical cache contention model
Part of Proc. 19th International Conference on Parallel Architectures and Compilation Techniques, p. 551-552, 2010
A Software Technique for Reducing Cache Pollution
Part of Proc. 3rd Swedish Workshop on Multi-Core Computing, p. 59-62, 2010
Improving cache utilization using Acumem VPE
Part of Tools for High Performance Computing, p. 115-135, 2008
Conserving Memory Bandwidth in Chip Multiprocessors with Runahead Execution
Part of 21st International Parallel and Distributed Processing Symposium, 2007
A case for low-complexity MP architectures
Part of Proc. Conference on Supercomputing, p. 559-570, 2007
A Statistical Multiprocessor Cache Model
Part of Proc. International Symposium on Performance Analysis of Systems and Software, p. 89-99, 2006
Modeling cache sharing on chip multiprocessor architectures
Part of Proc. International Symposium on Workload Characterization, p. 160-171, 2006
Multigrid and Gauss-Seidel smoothers revisited: Parallelization on chip multiprocessors
Part of Proc. 20th ACM International Conference on Supercomputing, p. 145-155, 2006
Exploiting Locality: A Flexible DSM Approach
Part of Proc. 20th IEEE International Parallel and Distributed Processing Symposium, 2006
TMA: A Trap-based Memory Architecture
Part of Proc. 20th ACM International Conference on Supercomputing, p. 259-268, 2006
Vasa: A Simulator Infrastructure with Adjustable Fidelity
Part of Proceedings of the 17th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2005), Phoenix, Arizona, USA, November 2005, 2005
Exploring Processor Design Options for Java Based Middleware
Part of Proceedings of the 2005 International Conference on Parallel Processing (ICPP-05), 2005
Skewed Caches from a Low-Power Perspective
Part of Proceedings of Computing Frontiers, Ischia, Italy, May 2005, 2005
Exploiting Spatial Store Locality through Permission Caching in Software DSMs
Part of Proceedings of the 10th International Euro-Par Conference, p. 551, 2004
Bundling: Reducing the Overhead of Multiprocessor Prefetchers
Part of 18th International Parallel and Distributed Processing Symposium, 2004
StatCache: A Probabilistic Approach to Efficient and Accurate Data Locality Analysis
Part of 2004 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2004), 2004
Hierarchical Backoff Locks for Nonuniform Communication Architectures
Part of Proceedings of the Ninth International Symposium on High Performance Computer Architecture (HPCA-9), Anaheim, California, USA, February 2003, 2003
THROOM — Supporting POSIX Multithreaded Binaries on a Cluster
Part of Euro-Par 2003, p. 760-769, 2003
Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors
Part of Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France, 2003
Memory System Behavior of Java-Based Middleware
Part of Proceedings of the Ninth International Symposium on High Performance Computer Architecture, 2003
RH Lock: A Scalable Hierarchical Spin Lock
Part of Proceedings of the 2nd Annual Workshop on Memory Performance Issues (WMPI 2002), held in conjunction with the 29th International Symposium on Computer Architecture (ISCA29), Anchorage, Alaska, USA, 2002
Efficient Synchronization for Non-Uniform Communication Architectures
Part of Proceedings of Supercomputing 2002, Baltimore, Maryland, USA, 2002
WildFire: A Scalable Path for SMPs
Part of Proc. Fifth Int. Symp. on High-Performance Computer Architecture, p. 172-181, 1999
Patents
Multiprocessing computer system employing capacity prefetching
2007
Computer system employing bundled prefetching
2007
System and method for reducing shared memory write overhead in multiprocessor systems
2006
Computer system including a promise array
2006
Multiprocessing computer system employing capacity prefetching
2006
Multiprocessing systems employing hierarchical back-off locks
2006
2005
2005
Multi-node computer system employing multiple memory response states
2005
2004
Multi-node computer system employing a reporting mechanism for multi-node transactions
2004
Multi-node computer system implementing global access state dependent transactions
2004
Multiprocessing computer system employing capacity prefetching
2004
Multi-node system with split ownership and access right coherence mechanism
2004
Performing virtual to global address translation in processing subsystem
2004
Multi-node system with global access states
2004
Multiprocessing systems employing hierarchical back-off locks
2004
2004
System and method for reducing shared memory write overhead in multiprocessor systems
2004
Computer system employing bundled prefetching
2004
Computer system including a promise array
2004
Multi-node computer system with proxy transaction to read data from a non-owning memory device
2004
Selective address translation in coherent memory replication
2003
2003
2003
2003
Multiprocessing systems employing hierarchical spin locks
2003
Communication error reporting mechanism in a multiprocessing computer system
2003
2002
Hybrid memory access protocol in a distributed shared memory computer system
2002
Hierarchical SMP computer system
2002
Skewed finite hashing function
2002
Selective address translation in coherent memory replication
2002
Shared memory system for symmetric microprocessor systems
2001
Multiprocessing system configured to perform efficient block copy operations
2001
2001
Multiprocessing system configured to perform efficient block copy operations
2001
Skewed finite hashing function
2001
Shared memory system for symmetric multiprocessor systems
2001
Communication error reporting mechanism in a multiprocessing computer system
2001
Selective address translation in coherent memory replication
2001
Communication error reporting mechanism in a multiprocessing computer system
2001
Multiprocessing system configured to perform efficient block copy operations
2001
Skewed finite hashing function
2001
Cache-less address translation
2001
Hybrid memory access protocol in a distributed shared memory computer system
2001
Method for increasing the speed of data processing in a computer system
2000
2000
2000
Reports
Delorean: Virtualized Directed Profiling for Cache Modeling in Sampled Simulation
2018
2017
The best of both works: A hybrid data-race-free cache coherence scheme
2017
A unified DVFS-cache resizing framework
2016
Perf-Insight: A Simple, Scalable Approach to Optimal Data Prefetching in Multicores
2015
Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed
2014
Quantitative Characterization of Memory Contention
2012
Cache Pirating: Measuring the curse of the shared cache
2011
Multigrid and Gauss-Seidel smoothers revisited: Parallelization on chip multiprocessors
2006
TMA: A Trap-Based Memory Architecture
2005
Flexibility Implies Performance
2005
Adaptive Coherence Batching for Trap-Based Memory Architectures
2005
Low Power and Conflict Tolerant Cache Design
2004
Evaluation, Implementation and Performance of Write Permission Caching in the DSZOOM System
2004
StatCache: A Probabilistic Approach to Efficient and Accurate Data Locality Analysis
2003
The Elbow Cache: A Power-Efficient Alternative to Highly Associative Caches
2003
Low-Overhead Spatial and Temporal Data Locality Analysis
2003
THROOM: Running POSIX Multithreaded Binaries on a Cluster
2003
Bundling: Reducing the Overhead of Multiprocessor Prefetchers
2003
Latency-hiding and Optimizations of the DSZOOM Instrumentation System
2003