Stefanos Kaxiras
Professor at Department of Information Technology; Division of Computer Systems
- Telephone:
- +46 18 471 29 74
- Mobile phone:
- +46 70 425 03 94
- E-mail:
- stefanos.kaxiras@it.uu.se
- Visiting address:
- Hus 10, Regementsvägen 10
- Postal address:
- Box 337
751 05 UPPSALA
Short presentation
IEEE Fellow, for contributions to high-performance and power-efficient memory hierarchies.
I am currently developing novel techniques and approaches in several computer architecture areas: non-speculative architectures to reduce the reliance on speculation while maintaining performance benefits; security at the architectural level; memory systems and memory hierarchies for novel computing paradigms.
Biography
Stefanos Kaxiras, IEEE Fellow, is Professor at Uppsala University, Sweden. He holds a PhD degree in Computer Science from the University of Wisconsin. In 1998, he joined the Computing Sciences Center at Bell Labs (Lucent) and later Agere Systems. In 2003 he joined the faculty of the ECE Department of the University of Patras, Greece, and in 2010 became a full professor at Uppsala University, Sweden. Kaxiras’ research interests are in the areas of memory systems and multiprocessor/multicore systems, with a focus on power efficiency. He has co-authored more than 90 research papers and 18 US patents, received three Swedish VR grants (main PI of a VR-Frame grant), participated in six major European research projects, and currently receives funding from Sweden’s business incubator and innovation agency VINNOVA. He is a Fellow of the IEEE (for contributions to high-performance and power-efficient memory hierarchies) and an ACM Distinguished Scientist.
Research
IEEE Fellow (2021)
ACM Distinguished Scientist (2009)
Research Interests & Contributions: Memory Systems (Highly-Scalable Cache Coherence, VIPS & Racer, Cache Management using Reuse Distances), Power (Cache Decay), Instruction-based prediction, Network processors (IPStash IP-Lookup memories), Memory/Processor Integration (Datascalar/Distributed Vector Architectures)
My most cited contribution, with Margaret Martonosi, is Cache Decay.
It is the most cited paper (by a wide margin) of ISCA 2001 (755 citations as of Apr. 2016).
I am currently working on VIPS coherence (12 papers in the period 2012-2016) with Alberto Ros and on Decoupled Access-Execute with Alexandra Jimborean. We have expanded into software distributed shared memory for HPC and Big Data with Kostis Sagonas.
Startups
Eta Scale manages distribution and dissemination of our research results: VIPS, ArgoDSM, and the DAE (Decoupled Access-Execute) compiler tools (Daedal).
Publications
Google scholar (sorted by citations)
dblp (sorted by year--partial list)
Researchgate (articles with text)
Linkedin (professional network and other info)
Recent papers (2015-2016)
2017
Check here for my 2017 papers (2 CGO papers accepted)
2016
1. M. F. Gonzalez-Zalba, F. Remacle, R.D. Levine, S. Rogge, S. Kaxiras, M. Sanquer, "Single Electron Devices and Circuits." ICT-Energy Letters, 2016.
2. Alberto Ros and Stefanos Kaxiras, "Racer: TSO Consistency via Race Detection." To appear: MICRO, 2016.
3. Alberto Ros, Carl Leonardsson, Chris Sakalis, and Stefanos Kaxiras, "POSTER: Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics." To appear: PACT, 2016.
4. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Stefanos Kaxiras, Carl Leonardsson, Alberto Ros and Yunyun Zhu, "Fencing Programs with Self-Invalidation and Self-Downgrade." 11th International Federated Conference on Distributed Computing Techniques, FORTE, 2016. Best Paper Award.
5. Magnus Själander, Gustaf Borgström, Stefanos Kaxiras, Mykhailo V. Klymenko and Françoise Remacle, "Techniques for Modulating Error Resilience in Emerging Multi-Value Technologies." ACM International Conference on Computing Frontiers, 2016.
6. Christos Sakalis, Alberto Ros, Carl Leonardsson, Stefanos Kaxiras, "Splash-3: A Properly Synchronized Benchmark Suite for Contemporary Research." In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2016. Open Source Software.
7. Konstantinos Koukos, Per Ekemark, Georgios Zacharopoulos, Vasileios Spiliopoulos, Stefanos Kaxiras, Alexandra Jimborean, "Multiversioned decoupled access-execute: the key to energy-efficient compilation of general-purpose programs." International Symposium on Compiler Construction, 2016. Best Paper Award.
8. T. Voigt, M. Själander, Frederik Hermans, Alexandra Jimborean, Erik Hagersten, Per Gunningberg, and Stefanos Kaxiras, "Poster: Approximation: A New Paradigm also for Wireless Sensing." Proceedings of the International Conference on Embedded Wireless Systems and Networks (EWSN), Graz, Austria, 15-17 Feb. 2016.
9. Jonatan Waern, Per Ekemark, Konstantinos Koukos, Stefanos Kaxiras and Alexandra Jimborean, "Profiling-Assisted Decoupled Access-Execute." HIP3ES: High Performance Energy Efficient Embedded Systems, 2016.
10. M. Själander, G. Borgström, and S. Kaxiras, "Improving Error-Resilience of Emerging Multi-Value Technologies." Workshop On Approximate Computing (WAPCO), 20 Jan. 2016.
11. Konstantinos Koukos, Alberto Ros, Erik Hagersten, Stefanos Kaxiras, "Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead." ACM Transactions on Architecture and Code Optimization (TACO), 2016. 13(1):1-22. DOI: 10.1145/2889488
2015
1. Mahdad Davari, A. Ros, E. Hagersten, S. Kaxiras, "An Efficient, Self-Contained, On-Chip Directory: DIR1-SISD." In IEEE Computer Society Parallel Architectures and Compilation Techniques (PACT), (pp. 317-330), 2015.
2. S. Kaxiras, D. Klaftenegger, M. Norgren, K. Sagonas, "Turning Centralized Coherence and Distributed Critical-Section Execution on their Head: A New Approach for Scalable Distributed Shared Memory." 24th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC-24), 2015. Nominated for Best Paper Award (1 out of 4 top papers). Open Source Software.
3. T. Carlson, S. Kaxiras, W. Heirman, L. Eeckhout, "The Load-Slice Core Microarchitecture." 42nd International Symposium on Computer Architecture (ISCA-42), 2015.
4. A. Ros, S. Kaxiras, "Callback: Efficient Synchronization without Invalidation with a Directory Just for Spin-Waiting." 42nd International Symposium on Computer Architecture (ISCA-42), 2015.
5. A. Ros, M. Davari, S. Kaxiras, "Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies." IEEE 21st High Performance Computer Architecture (HPCA-21), 2015.
6. A. Ros, S. Kaxiras, "Fast&Furious: A Tool for Detecting Covert Racing." PARMA-DITAM '15 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 4th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2015.
7. Andreas Sandberg, Nikos Nikoleris, Trevor E. Carlson, Erik Hagersten, Stefanos Kaxiras, David Black-Schaffer, "Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed." IEEE International Symposium on Workload Characterization (IISWC), 2015: 183-192.
8. Mahdad Davari, Alberto Ros, Erik Hagersten, Stefanos Kaxiras, "The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence." ACM Transactions on Architecture and Code Optimization (TACO), 2015.
Book
NEW! Our new book is out: "Power-Efficient Computer Architectures: Recent Advances" Paperback, Morgan and Claypool Publishers, January 1, 2015 by Magnus Sjalander, Margaret Martonosi, Stefanos Kaxiras.
Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras and Margaret Martonosi
Synthesis Lectures on Computer Architecture
- Mark D. Hill Series Editor
- Paperback: 220 pages
- Publisher: Morgan and Claypool Publishers; 1 edition (June 13, 2008)
- ISBN-10: 1598292080 ISBN-13: 978-1598292084
PhD Students
- Mahdad Davari: Multicore Coherence
- Mehdi Alipur: Efficient cores
- Magnus Norgren: Software Coherence
Co-advising:
- Nikos Nikoleris: Cache modeling, fast simulation
- Ricardo Alves: Cache management
- David Klaftenegger: Efficient synchronization
Graduated
- Vasileios Spiliopoulos: Power, DVFS modeling, Power Tools, Cache management for power
- Konstantinos Koukos: Decoupled Access Execute, GPU Coherence
- Georgios Keramidas (TEI Messolonghi, Greece)
- Pavlos Petoumenos (Research Associate, University of Edinburgh)

Publications
Selection of publications
A unified DVFS-cache resizing framework
2016
Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs
Part of Proc. 25th International Conference on Compiler Construction, p. 121-131, 2016
Profiling-Assisted Decoupled Access-Execute
Part of Proc. 4th International Workshop on High Performance Energy Efficient Embedded Systems, 2016
Approximation: A New Paradigm also for Wireless Sensing
2016
Splash-3: A properly synchronized benchmark suite for contemporary research
Part of Proc. International Symposium on Performance Analysis of Systems and Software, p. 101-111, 2016
Fencing programs with self-invalidation and self-downgrade
Part of Formal Techniques for Distributed Objects, Components, and Systems, p. 19-35, 2016
Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
Part of ACM Transactions on Architecture and Code Optimization (TACO), 2016
Techniques for modulating error resilience in emerging multi-value technologies
Part of Proc. 13th International Conference on Computing Frontiers, p. 55-63, 2016
Full speed ahead: Detailed architectural simulation at near-native speed
Part of Proc. 18th International Symposium on Workload Characterization, p. 183-192, 2015
The effects of granularity and adaptivity on private/shared classification for coherence
Part of ACM Transactions on Architecture and Code Optimization (TACO), 2015
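Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies
Part of Proc. 21st International Symposium on High Performance Computer Architecture, p. 186-197, 2015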
An efficient, self-contained, on-chip directory: DIR1-SISD
Part of Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, p. 317-330, 2015
A tunable cache for approximate computing
Part of Proc. 10th International Symposium on Nanoscale Architectures, p. 88-89, 2014
Managing power constraints in a single-core scenario through power tokens
Part of Journal of Supercomputing, p. 414-442, 2014
Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling
Part of Proc. 12th International Symposium on Code Generation and Optimization, p. 262-272, 2014
Power-Efficient Computer Architectures: Recent Advances
Morgan & Claypool Publishers, 2014
Efficient inter-core power and thermal balancing for multicore processors
Part of Computing, p. 537-566, 2013
Introducing DVFS-Management in a Full-System Simulator
Part of Proc. 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2013
Towards more efficient execution: a decoupled access-execute approach
Part of Proc. 27th ACM International Conference on Supercomputing, p. 253-262, 2013
A New Perspective for Efficient Virtual-Cache Coherence
Part of Proceedings of the 40th Annual International Symposium on Computer Architecture, p. 535-546, 2013
Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models
Part of PARMA 2013, 4th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2013
Complexity-effective multicore coherence
Part of Proc. 21st International Conference on Parallel Architectures and Compilation Techniques, p. 241-251, 2012
Power-Sleuth: A Tool for Investigating your Program's Power Behavior
Part of International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'12), p. 241-250, 2012
Efficient, snoopless, System-on-Chip coherence
Part of SOC Conference (SOCC), 2012 IEEE International, p. 230-235, 2012
Green governors: A framework for continuously adaptive DVFS
Part of Proc. International Green Computing Conference and Workshops, p. 1-8, 2011
Power-performance adaptation in Intel core i7
Part of Proc. 2nd Workshop on Computer Architecture and Operating System co-design, p. 10, 2011
Power Token Balancing: Adapting CMPs to power constraints for parallel multithreaded workloads
Part of Proc. 25th International Parallel and Distributed Processing Symposium, p. 431-442, 2011
Leakage-efficient design of value predictors through state and non-state preserving techniques
Part of Journal of Supercomputing, p. 28-50, 2011
SARC coherence: Scaling directory cache coherence in performance and power
Part of IEEE Micro, p. 54-65, 2010
Parallelizing multicore cache simulations on GPUs
Part of Proc. 3rd Swedish Workshop on Multi-Core Computing, p. 3-8, 2010
Interval-based models for run-time DVFS orchestration in superscalar processors
Part of Proc. 7th International Conference on Computing Frontiers, p. 287-296, 2010
Where replacement algorithms fail: a thorough analysis
Part of Proc. 7th International Conference on Computing Frontiers, p. 141-150, 2010
MLP-aware instruction queue resizing: The key to power-efficient performance
Part of Architecture of Computing Systems – ARCS 2010, p. 113-125, 2010
Improving Error-Resilience of Emerging Multi-Value Technologies
Recent publications
Bounding Speculative Execution of Atomic Regions to a Single Retry
2025
TaDA: Task Decoupling Architecture for the Battery-less Internet of Things
Part of SenSys '24, p. 409-421, 2024
Hardware Cache Locking for All Memory Updates
Part of 2024 IEEE 42nd International Conference on Computer Design (ICCD), p. 566-574, 2024
JANUS: A Simple and Efficient Speculative Defense using Reinforcement Learning
Part of 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), p. 25-36, 2024
A First Exploration of Fine-Grain Coherence for Integrity Metadata
Part of 2024 International Symposium on Secure and Private Execution Environment Design, SEED 2024, p. 62-72, 2024
All publications
Articles in journal
Speculative inter-thread store-to-load forwarding in SMT architectures
Part of Journal of Parallel and Distributed Computing, p. 94-106, 2023
Analysing software prefetching opportunities in hardware transactional memory
Part of Journal of Supercomputing, p. 919-944, 2022
Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks
Part of ACM Transactions on Architecture and Code Optimization (TACO), 2022
Part of IEEE Computer Architecture Letters, p. 162-165, 2021
Early Address Prediction: Efficient Pipeline Prefetch and Reuse
Part of ACM Transactions on Architecture and Code Optimization (TACO), 2021
Evaluating the Potential Applications of Quaternary Logic for Approximate Computing
Part of ACM Journal on Emerging Technologies in Computing Systems, 2020
Understanding Selective Delay as a Method for Efficient Secure Speculative Execution
Part of IEEE Transactions on Computers, p. 1584-1595, 2020
Maximizing limited resources: A limit-based study and taxonomy of out-of-order commit
Part of Journal of Signal Processing Systems, p. 379-397, 2019
Mending fences with self-invalidation and self-downgrade
Part of Logical Methods in Computer Science, 2018
Automatic Detection of Large Extended Data-Race-Free Regions with Conflict Isolation
Part of IEEE Transactions on Parallel and Distributed Systems, p. 527-541, 2018
Static instruction scheduling for high performance on limited hardware
Part of IEEE Transactions on Computers, p. 513-527, 2018
Non-Speculative Load Reordering in Total Store Ordering
Part of IEEE Micro, p. 48-57, 2018
Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics
Part of IEEE Transactions on Parallel and Distributed Systems, p. 3413-3425, 2017
Transcending hardware limits with software out-of-order processing
Part of IEEE Computer Architecture Letters, p. 162-165, 2017
Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
Part of ACM Transactions on Architecture and Code Optimization (TACO), 2016
The effects of granularity and adaptivity on private/shared classification for coherence
Part of ACM Transactions on Architecture and Code Optimization (TACO), 2015
Managing power constraints in a single-core scenario through power tokens
Part of Journal of Supercomputing, p. 414-442, 2014
Efficient inter-core power and thermal balancing for multicore processors
Part of Computing, p. 537-566, 2013
Leakage-efficient design of value predictors through state and non-state preserving techniques
Part of Journal of Supercomputing, p. 28-50, 2011
SARC coherence: Scaling directory cache coherence in performance and power
Part of IEEE Micro, p. 54-65, 2010
Books
Power-Efficient Computer Architectures: Recent Advances
Morgan & Claypool Publishers, 2014
Conference papers
Bounding Speculative Execution of Atomic Regions to a Single Retry
2025
TaDA: Task Decoupling Architecture for the Battery-less Internet of Things
Part of SenSys '24, p. 409-421, 2024
Hardware Cache Locking for All Memory Updates
Part of 2024 IEEE 42nd International Conference on Computer Design (ICCD), p. 566-574, 2024
JANUS: A Simple and Efficient Speculative Defense using Reinforcement Learning
Part of 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), p. 25-36, 2024
A First Exploration of Fine-Grain Coherence for Integrity Metadata
Part of 2024 International Symposium on Secure and Private Execution Environment Design, SEED 2024, p. 62-72, 2024
ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage
Part of 56th IEEE/ACM International Symposium on Microarchitecture, MICRO 2023, p. 828-842, 2023
Doppelganger Loads: A Safe, Complexity-Effective Optimization for Secure Speculation Schemes
Part of ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Part of 2023 IEEE International Symposium on Workload Characterization, IISWC, p. 223-225, 2023
Silent Stores in the Battery-less Internet of Things: A Good Idea?
2023
Splash-4: A Modern Benchmark Suite with Lock-Free Constructs
Part of 2022 IEEE International Symposium on Workload Characterization (IISWC), p. 51-64, 2022
Clueless: A Tool Characterising Values Leaking as Addresses
Part of Proceedings of the 11th International Workshop on Hardware and Architectural Support for Security And Privacy, HASP 2022, p. 27-34, 2022
Free Atomics: Hardware Atomic Operations without Fences
Part of Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22), p. 14-26, 2022
Data-Out Instruction-In (DOIN!): Leveraging Inclusive Caches to Attack Speculative Delay Schemes
Part of 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED 2022), p. 49-60, 2022
Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations
Part of Proceedings of 54th Annual IEEE/ACM International Symposium on Microarchitecture, Micro 2021, p. 337-349, 2021
ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading
Part of Proceedings of 54th Annual IEEE/ACM International Symposium on Microarchitecture, Micro 2021, p. 1296-1308, 2021
TSOPER: Efficient Coherence-Based Strict Persistency
Part of 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), p. 125-138, 2021
Part of 2021 International Symposium on Secure and Private Execution Environment Design (SEED), p. 89-100, 2021
Part of 2021 International Symposium on Secure and Private Execution Environment Design (SEED), p. 101-107, 2021
Splash-4: Improving Scalability with Lock-Free Constructs
Part of 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2021), p. 235-236, 2021
Speculative Enforcement of Store Atomicity
Part of 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), p. 555-567, 2020
Boosting Store Buffer Efficiency with Store-Prefetch Bursts
Part of 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), p. 568-580, 2020
Part of PACT ’20, p. 241-254, 2020
Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors
Part of 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), p. 424-434, 2020
Efficient temporal and spatial load to load forwarding
Part of Proc. 26th International Symposium on High-Performance Computer Architecture, 2020
Ghost Loads: What is the cost of invisible speculation?
Part of Proceedings of the 16th ACM International Conference on Computing Frontiers, p. 153-163, 2019
Efficient invisible speculative execution through selective delay and value prediction
Part of Proc. 46th International Symposium on Computer Architecture, p. 723-735, 2019
Filter caching for free: The untapped potential of the store-buffer
Part of Proc. 46th International Symposium on Computer Architecture, p. 436-448, 2019
FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors
Part of 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), p. 716-721, 2019
SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores
Part of Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, p. 328-343, 2018
Dynamically Disabling Way-prediction to Reduce Instruction Replay
Part of 2018 IEEE 36th International Conference on Computer Design (ICCD), p. 140-143, 2018
Non-Speculative Store Coalescing in Total Store Order
Part of Proc. 45th International Symposium on Computer Architecture, p. 221-234, 2018
Part of 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), p. 95-107, 2018
Exploring the performance limits of out-of-order commit
Part of Proc. 14th Computing Frontiers Conference, p. 211-220, 2017
Automatic detection of extended data-race-free regions
Part of Proc. 15th International Symposium on Code Generation and Optimization, p. 14-26, 2017
Decoupled Access-Execute on ARM big.LITTLE
Part of Proc. 5th Workshop on High Performance Energy Efficient Embedded Systems, 2017
A Taxonomy of Out-of-Order Instruction Commit
Part of 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), p. 135-136, 2017
Clairvoyance: Look-ahead compile-time scheduling
Part of Proc. 15th International Symposium on Code Generation and Optimization, p. 171-184, 2017
Addressing energy challenges in filter caches
Part of Proc. 29th International Symposium on Computer Architecture and High Performance Computing, p. 49-56, 2017
Non-speculative load-load reordering in TSO
Part of Proc. 44th International Symposium on Computer Architecture, p. 187-200, 2017
Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs
Part of Proc. 25th International Conference on Compiler Construction, p. 121-131, 2016
Profiling-Assisted Decoupled Access-Execute
Part of Proc. 4th International Workshop on High Performance Energy Efficient Embedded Systems, 2016
Approximation: A New Paradigm also for Wireless Sensing
2016
Splash-3: A properly synchronized benchmark suite for contemporary research
Part of Proc. International Symposium on Performance Analysis of Systems and Software, p. 101-111, 2016
Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics
Part of Proc. International Conference on Parallel Architectures and Compilation, p. 433-434, 2016
Fencing programs with self-invalidation and self-downgrade
Part of Formal Techniques for Distributed Objects, Components, and Systems, p. 19-35, 2016
Racer: TSO Consistency via Race Detection
Part of 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016
Techniques for modulating error resilience in emerging multi-value technologies
Part of Proc. 13th International Conference on Computing Frontiers, p. 55-63, 2016
Effects of Granularity/Adaptivity on Private/Shared Classification for Coherence
2015
The Load Slice Core Microarchitecture
Part of 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), p. 272-284, 2015
Callback: Efficient Synchronization without Invalidation with a Directory Just for Spin-Waiting
Part of 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), p. 427-438, 2015
Full speed ahead: Detailed architectural simulation at near-native speed
Part of Proc. 18th International Symposium on Workload Characterization, p. 183-192, 2015
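Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies
Part of Proc. 21st International Symposium on High Performance Computer Architecture, p. 186-197, 2015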
An efficient, self-contained, on-chip directory: DIR1-SISD
Part of Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, p. 317-330, 2015
A tunable cache for approximate computing
Part of Proc. 10th International Symposium on Nanoscale Architectures, p. 88-89, 2014
Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling
Part of Proc. 12th International Symposium on Code Generation and Optimization, p. 262-272, 2014
The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence
2014
Introducing DVFS-Management in a Full-System Simulator
Part of Proc. 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2013
Towards more efficient execution: a decoupled access-execute approach
Part of Proc. 27th ACM International Conference on Supercomputing, p. 253-262, 2013
A New Perspective for Efficient Virtual-Cache Coherence
Part of Proceedings of the 40th Annual International Symposium on Computer Architecture, p. 535-546, 2013
Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models
Part of PARMA 2013, 4th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2013
Complexity-effective multicore coherence
Part of Proc. 21st International Conference on Parallel Architectures and Compilation Techniques, p. 241-251, 2012
Power-Sleuth: A Tool for Investigating your Program's Power Behavior
Part of International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'12), p. 241-250, 2012
Efficient, snoopless, System-on-Chip coherence
Part of SOC Conference (SOCC), 2012 IEEE International, p. 230-235, 2012
Green governors: A framework for continuously adaptive DVFS
Part of Proc. International Green Computing Conference and Workshops, p. 1-8, 2011
Power-performance adaptation in Intel core i7
Part of Proc. 2nd Workshop on Computer Architecture and Operating System co-design, p. 10, 2011
Power Token Balancing: Adapting CMPs to power constraints for parallel multithreaded workloads
Part of Proc. 25th International Parallel and Distributed Processing Symposium, p. 431-442, 2011
Parallelizing multicore cache simulations on GPUs
Part of Proc. 3rd Swedish Workshop on Multi-Core Computing, p. 3-8, 2010
Interval-based models for run-time DVFS orchestration in superscalar processors
Part of Proc. 7th International Conference on Computing Frontiers, p. 287-296, 2010
Where replacement algorithms fail: a thorough analysis
Part of Proc. 7th International Conference on Computing Frontiers, p. 141-150, 2010
MLP-aware instruction queue resizing: The key to power-efficient performance
Part of Architecture of Computing Systems – ARCS 2010, p. 113-125, 2010