Stefanos Kaxiras
Professor of Computer Engineering, specializing in computer architecture, at the Department of Information Technology; Computer Engineering
- Telephone:
- 018-471 29 74
- Mobile phone:
- 070-425 03 94
- Email:
- stefanos.kaxiras@it.uu.se
- Visiting address:
- Hus 10, Regementsvägen 10
- Postal address:
- Box 337
751 05 UPPSALA
Biography
Stefanos Kaxiras is a full professor at Uppsala University, Sweden. He holds a PhD in Computer Science from the University of Wisconsin. In 1998, he joined the Computing Sciences Center at Bell Labs (Lucent) and later Agere Systems. In 2003, he joined the faculty of the ECE Department of the University of Patras, Greece, and in 2010 he became a full professor at Uppsala University. Kaxiras’ research interests are in memory systems and multiprocessor/multicore systems, with a focus on power efficiency. He has co-authored more than 90 research papers and 18 US patents, received two Swedish VR grants (main PI of a VR-Frame grant), participated in five major European research projects, and currently receives funding from Sweden’s business incubator and innovation agency VINNOVA. He is an ACM Distinguished Scientist and an IEEE member.
Research
ACM Distinguished Scientist (2009)
Research Interests & Contributions: Memory Systems (Highly-Scalable Cache Coherence, VIPS & Racer, Cache Management using Reuse Distances), Power (Cache Decay), Instruction-based prediction, Network processors (IPStash IP-Lookup memories), Memory/Processor Integration (Datascalar/Distributed Vector Architectures)
My most cited contribution, with Margaret Martonosi, is Cache Decay.
It is the most cited paper (by a wide margin) of ISCA 2001 (755 citations as of Apr. 2016).
I am currently working on VIPS coherence (12 papers in the period 2012-2016) with Alberto Ros and on Decoupled Access-Execute with Alexandra Jimborean. We have expanded into software distributed shared memory for HPC and Big Data with Kostis Sagonas.
Startups
Eta Scale manages distribution and dissemination of our research results: VIPS, ArgoDSM, and the DAE (Decoupled Access-Execute) compiler tools (Daedal).
Publications
Google Scholar (sorted by citations)
dblp (sorted by year, partial list)
ResearchGate (articles with text)
LinkedIn (professional network and other info)
Recent papers (2015-2016)
2017
Check here for my 2017 papers (2 CGO papers accepted)
2016
1. M. F. Gonzalez-Zalba, F. Remacle, R.D. Levine, S. Rogge, S. Kaxiras, M. Sanquer, "Single Electron Devices and Circuits." ICT-Energy Letters, 2016.
2. Alberto Ros and Stefanos Kaxiras, "Racer: TSO Consistency via Race Detection." To appear: MICRO, 2016.
3. Alberto Ros, Carl Leonardsson, Chris Sakalis, and Stefanos Kaxiras, "POSTER: Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics." To appear: PACT, 2016.
4. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Stefanos Kaxiras, Carl Leonardsson, Alberto Ros and Yunyun Zhu, "Fencing Programs with Self-Invalidation and Self-Downgrade." 11th International Federated Conference on Distributed Computing Techniques, FORTE, 2016. Best Paper Award.
5. Magnus Själander, Gustaf Borgström, Stefanos Kaxiras, Mykhailo V. Klymenko and Françoise Remacle, "Techniques for Modulating Error Resilience in Emerging Multi-Value Technologies." ACM International Conference on Computing Frontiers, 2016.
6. Christos Sakalis, Alberto Ros, Carl Leonardsson, Stefanos Kaxiras, "Splash-3: A Properly Synchronized Benchmark Suite for Contemporary Research." In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2016. Open Source Software.
7. Konstantinos Koukos, Per Ekemark, Georgios Zacharopoulos, Vasileios Spiliopoulos, Stefanos Kaxiras, Alexandra Jimborean, "Multiversioned decoupled access-execute: the key to energy-efficient compilation of general-purpose programs." International Conference on Compiler Construction, 2016. Best Paper Award.
8. T. Voigt, M. Själander, Frederik Hermans, Alexandra Jimborean, Erik Hagersten, Per Gunningberg, and Stefanos Kaxiras, "Poster: Approximation: A New Paradigm also for Wireless Sensing." Proceedings of the International Conference on Embedded Wireless Systems and Networks (EWSN), Graz, Austria, 15-17 Feb. 2016.
9. Jonatan Waern, Per Ekemark, Konstantinos Koukos, Stefanos Kaxiras and Alexandra Jimborean, "Profiling-Assisted Decoupled Access-Execute." HIP3ES: High Performance Energy Efficient Embedded Systems, 2016.
10. M. Själander, G. Borgström, and S. Kaxiras, "Improving Error-Resilience of Emerging Multi-Value Technologies." Workshop On Approximate Computing (WAPCO), 20 Jan. 2016.
11. Konstantinos Koukos, Alberto Ros, Erik Hagersten, Stefanos Kaxiras, "Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead." ACM Transactions on Architecture and Code Optimization (TACO), 2016. 13(1):1-22. DOI: 10.1145/2889488
2015
1. Mahdad Davari, A. Ros, E. Hagersten, S. Kaxiras, "An Efficient, Self-Contained, On-Chip Directory: DIR1-SISD." In International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 317-330, 2015.
2. S. Kaxiras, D. Klaftenegger, M. Norgren, K. Sagonas, "Turning Centralized Coherence and Distributed Critical-Section Execution on their Head: A New Approach for Scalable Distributed Shared Memory." 24th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC-24), 2015. Nominated for Best Paper Award (1 out of 4 top papers). Open Source Software.
3. T. Carlson, S. Kaxiras, W. Heirman, L. Eeckhout, "The Load-Slice Core Microarchitecture." 42nd International Symposium on Computer Architecture (ISCA-42), 2015.
4. A. Ros, S. Kaxiras, "Callback: Efficient Synchronization without Invalidation with a Directory Just for Spin-Waiting." 42nd International Symposium on Computer Architecture (ISCA-42), 2015.
5. A. Ros, M. Davari, S. Kaxiras, "Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies." 21st IEEE International Symposium on High Performance Computer Architecture (HPCA-21), 2015.
6. A. Ros, S. Kaxiras, "Fast&Furious: A Tool for Detecting Covert Racing." PARMA-DITAM '15: 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 4th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2015.
7. Andreas Sandberg, Nikos Nikoleris, Trevor E. Carlson, Erik Hagersten, Stefanos Kaxiras, David Black-Schaffer, "Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed." IEEE International Symposium on Workload Characterization (IISWC), 2015, pp. 183-192.
8. Mahdad Davari, Alberto Ros, Erik Hagersten, Stefanos Kaxiras, "The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence." ACM Transactions on Architecture and Code Optimization (TACO), 2015.
Book:
NEW! Our new book is out: "Power-Efficient Computer Architectures: Recent Advances," paperback, Morgan and Claypool Publishers, January 1, 2015, by Magnus Själander, Margaret Martonosi, and Stefanos Kaxiras.
Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras and Margaret Martonosi
Synthesis Lectures on Computer Architecture
- Mark D. Hill Series Editor
- Paperback: 220 pages
- Publisher: Morgan and Claypool Publishers; 1 edition (June 13, 2008)
- ISBN-10: 1598292080 ISBN-13: 978-1598292084
PhD Students
- Mahdad Davari: Multicore Coherence
- Mehdi Alipur: Efficient cores
- Magnus Norgren: Software Coherence
Co-advising:
- Nikos Nikoleris: Cache modeling, fast simulation
- Ricardo Alves: Cache management
- David Klaftenegger: Efficient synchronization
Graduated
- Vasileios Spiliopoulos: Power, DVFS modeling, Power Tools, Cache management for power
- Konstantinos Koukos: Decoupled Access Execute, GPU Coherence
- Georgios Keramidas (TEI Messolonghi, Greece)
- Pavlos Petoumenos (Research Associate, University of Edinburgh)

Publications
Selected publications
A unified DVFS-cache resizing framework
2016
Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs
In Proc. 25th International Conference on Compiler Construction, pp. 121-131, 2016
Profiling-Assisted Decoupled Access-Execute
In Proc. 4th International Workshop on High Performance Energy Efficient Embedded Systems, 2016
Approximation: A New Paradigm also for Wireless Sensing
2016
Splash-3: A properly synchronized benchmark suite for contemporary research
In Proc. International Symposium on Performance Analysis of Systems and Software, pp. 101-111, 2016
Fencing programs with self-invalidation and self-downgrade
In Formal Techniques for Distributed Objects, Components, and Systems, pp. 19-35, 2016
Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
In ACM Transactions on Architecture and Code Optimization (TACO), 2016
Techniques for modulating error resilience in emerging multi-value technologies
In Proc. 13th International Conference on Computing Frontiers, pp. 55-63, 2016
Full speed ahead: Detailed architectural simulation at near-native speed
In Proc. 18th International Symposium on Workload Characterization, pp. 183-192, 2015
The effects of granularity and adaptivity on private/shared classification for coherence
In ACM Transactions on Architecture and Code Optimization (TACO), 2015
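Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies
In Proc. 21st International Symposium on High Performance Computer Architecture, pp. 186-197, 2015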
An efficient, self-contained, on-chip directory: DIR1-SISD
In Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, pp. 317-330, 2015
Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling
In Proc. 12th International Symposium on Code Generation and Optimization, pp. 262-272, 2014
Managing power constraints in a single-core scenario through power tokens
In Journal of Supercomputing, pp. 414-442, 2014
A tunable cache for approximate computing
In Proc. 10th International Symposium on Nanoscale Architectures, pp. 88-89, 2014
Power-Efficient Computer Architectures: Recent Advances
Morgan & Claypool Publishers, 2014
Efficient inter-core power and thermal balancing for multicore processors
In Computing, pp. 537-566, 2013
Introducing DVFS-Management in a Full-System Simulator
In Proc. 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2013
Towards more efficient execution: a decoupled access-execute approach
In Proc. 27th ACM International Conference on Supercomputing, pp. 253-262, 2013
A New Perspective for Efficient Virtual-Cache Coherence
In Proceedings of the 40th Annual International Symposium on Computer Architecture, pp. 535-546, 2013
Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models
In PARMA 2013, 4th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2013
Power-Sleuth: A Tool for Investigating your Program's Power Behavior
In International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'12), pp. 241-250, 2012
Complexity-effective multicore coherence
In Proc. 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 241-251, 2012
Efficient, snoopless, System-on-Chip coherence
In 2012 IEEE International SOC Conference (SOCC), pp. 230-235, 2012
Green governors: A framework for continuously adaptive DVFS
In Proc. International Green Computing Conference and Workshops, pp. 1-8, 2011
Power-performance adaptation in Intel Core i7
In Proc. 2nd Workshop on Computer Architecture and Operating System co-design, p. 10, 2011
Power Token Balancing: Adapting CMPs to power constraints for parallel multithreaded workloads
In Proc. 25th International Parallel and Distributed Processing Symposium, pp. 431-442, 2011
Leakage-efficient design of value predictors through state and non-state preserving techniques
In Journal of Supercomputing, pp. 28-50, 2011
SARC coherence: Scaling directory cache coherence in performance and power
In IEEE Micro, pp. 54-65, 2010
Parallelizing multicore cache simulations on GPUs
In Proc. 3rd Swedish Workshop on Multi-Core Computing, pp. 3-8, 2010
Interval-based models for run-time DVFS orchestration in superscalar processors
In Proc. 7th International Conference on Computing Frontiers, pp. 287-296, 2010
Where replacement algorithms fail: a thorough analysis
In Proc. 7th International Conference on Computing Frontiers, pp. 141-150, 2010
MLP-aware instruction queue resizing: The key to power-efficient performance
In Architecture of Computing Systems – ARCS 2010, pp. 113-125, 2010
Improving Error-Resilience of Emerging Multi-Value Technologies
Latest publications
TaDA: Task Decoupling Architecture for the Battery-less Internet of Things
In SenSys '24, pp. 409-421, 2024
Hardware Cache Locking for All Memory Updates
In 2024 IEEE 42nd International Conference on Computer Design (ICCD), pp. 566-574, 2024
JANUS: A Simple and Efficient Speculative Defense using Reinforcement Learning
In 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 25-36, 2024
A First Exploration of Fine-Grain Coherence for Integrity Metadata
In 2024 International Symposium on Secure and Private Execution Environment Design (SEED 2024), pp. 62-72, 2024
ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage
In 56th IEEE/ACM International Symposium on Microarchitecture, MICRO 2023, pp. 828-842, 2023
All publications
Journal articles
Speculative inter-thread store-to-load forwarding in SMT architectures
In Journal of Parallel and Distributed Computing, pp. 94-106, 2023
Analysing software prefetching opportunities in hardware transactional memory
In Journal of Supercomputing, pp. 919-944, 2022
Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks
In ACM Transactions on Architecture and Code Optimization (TACO), 2022
In IEEE Computer Architecture Letters, pp. 162-165, 2021
Early Address Prediction: Efficient Pipeline Prefetch and Reuse
In ACM Transactions on Architecture and Code Optimization (TACO), 2021
Evaluating the Potential Applications of Quaternary Logic for Approximate Computing
In ACM Journal on Emerging Technologies in Computing Systems, 2020
Understanding Selective Delay as a Method for Efficient Secure Speculative Execution
In IEEE Transactions on Computers, pp. 1584-1595, 2020
Maximizing limited resources: A limit-based study and taxonomy of out-of-order commit
In Journal of Signal Processing Systems, pp. 379-397, 2019
Mending fences with self-invalidation and self-downgrade
In Logical Methods in Computer Science, 2018
Automatic Detection of Large Extended Data-Race-Free Regions with Conflict Isolation
In IEEE Transactions on Parallel and Distributed Systems, pp. 527-541, 2018
Static instruction scheduling for high performance on limited hardware
In IEEE Transactions on Computers, pp. 513-527, 2018
Non-Speculative Load Reordering in Total Store Ordering
In IEEE Micro, pp. 48-57, 2018
Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics
In IEEE Transactions on Parallel and Distributed Systems, pp. 3413-3425, 2017
Transcending hardware limits with software out-of-order processing
In IEEE Computer Architecture Letters, pp. 162-165, 2017
Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
In ACM Transactions on Architecture and Code Optimization (TACO), 2016
The effects of granularity and adaptivity on private/shared classification for coherence
In ACM Transactions on Architecture and Code Optimization (TACO), 2015
Managing power constraints in a single-core scenario through power tokens
In Journal of Supercomputing, pp. 414-442, 2014
Efficient inter-core power and thermal balancing for multicore processors
In Computing, pp. 537-566, 2013
Leakage-efficient design of value predictors through state and non-state preserving techniques
In Journal of Supercomputing, pp. 28-50, 2011
SARC coherence: Scaling directory cache coherence in performance and power
In IEEE Micro, pp. 54-65, 2010
Books
Power-Efficient Computer Architectures: Recent Advances
Morgan & Claypool Publishers, 2014
Conference papers
TaDA: Task Decoupling Architecture for the Battery-less Internet of Things
In SenSys '24, pp. 409-421, 2024
Hardware Cache Locking for All Memory Updates
In 2024 IEEE 42nd International Conference on Computer Design (ICCD), pp. 566-574, 2024
JANUS: A Simple and Efficient Speculative Defense using Reinforcement Learning
In 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 25-36, 2024
A First Exploration of Fine-Grain Coherence for Integrity Metadata
In 2024 International Symposium on Secure and Private Execution Environment Design (SEED 2024), pp. 62-72, 2024
ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage
In 56th IEEE/ACM International Symposium on Microarchitecture, MICRO 2023, pp. 828-842, 2023
Doppelganger Loads: A Safe, Complexity-Effective Optimization for Secure Speculation Schemes
In ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
In 2023 IEEE International Symposium on Workload Characterization, IISWC, pp. 223-225, 2023
Silent Stores in the Battery-less Internet of Things: A Good Idea?
2023
Splash-4: A Modern Benchmark Suite with Lock-Free Constructs
In 2022 IEEE International Symposium on Workload Characterization (IISWC), pp. 51-64, 2022
Clueless: A Tool Characterising Values Leaking as Addresses
In Proceedings of the 11th International Workshop on Hardware and Architectural Support for Security and Privacy, HASP 2022, pp. 27-34, 2022
Free Atomics: Hardware Atomic Operations without Fences
In Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22), pp. 14-26, 2022
Data-Out Instruction-In (DOIN!): Leveraging Inclusive Caches to Attack Speculative Delay Schemes
In 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED 2022), pp. 49-60, 2022
Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations
In Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2021, pp. 337-349, 2021
ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading
In Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2021, pp. 1296-1308, 2021
TSOPER: Efficient Coherence-Based Strict Persistency
In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 125-138, 2021
In 2021 International Symposium on Secure and Private Execution Environment Design (SEED), pp. 89-100, 2021
In 2021 International Symposium on Secure and Private Execution Environment Design (SEED), pp. 101-107, 2021
Splash-4: Improving Scalability with Lock-Free Constructs
In 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2021), pp. 235-236, 2021
Speculative Enforcement of Store Atomicity
In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 555-567, 2020
Boosting Store Buffer Efficiency with Store-Prefetch Bursts
In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 568-580, 2020
In PACT ’20, pp. 241-254, 2020
Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors
In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 424-434, 2020
Efficient temporal and spatial load to load forwarding
In Proc. 26th International Symposium on High-Performance Computer Architecture, 2020
Ghost Loads: What is the cost of invisible speculation?
In Proceedings of the 16th ACM International Conference on Computing Frontiers, pp. 153-163, 2019
Efficient invisible speculative execution through selective delay and value prediction
In Proc. 46th International Symposium on Computer Architecture, pp. 723-735, 2019
Filter caching for free: The untapped potential of the store-buffer
In Proc. 46th International Symposium on Computer Architecture, pp. 436-448, 2019
FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors
In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 716-721, 2019
SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores
In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 328-343, 2018
Dynamically Disabling Way-prediction to Reduce Instruction Replay
In 2018 IEEE 36th International Conference on Computer Design (ICCD), pp. 140-143, 2018
Non-Speculative Store Coalescing in Total Store Order
In Proc. 45th International Symposium on Computer Architecture, pp. 221-234, 2018
In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 95-107, 2018
Exploring the performance limits of out-of-order commit
In Proc. 14th Computing Frontiers Conference, pp. 211-220, 2017
Automatic detection of extended data-race-free regions
In Proc. 15th International Symposium on Code Generation and Optimization, pp. 14-26, 2017
Decoupled Access-Execute on ARM big.LITTLE
In Proc. 5th Workshop on High Performance Energy Efficient Embedded Systems, 2017
A Taxonomy of Out-of-Order Instruction Commit
In 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 135-136, 2017
Clairvoyance: Look-ahead compile-time scheduling
In Proc. 15th International Symposium on Code Generation and Optimization, pp. 171-184, 2017
Addressing energy challenges in filter caches
In Proc. 29th International Symposium on Computer Architecture and High Performance Computing, pp. 49-56, 2017
Non-speculative load-load reordering in TSO
In Proc. 44th International Symposium on Computer Architecture, pp. 187-200, 2017
Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs
In Proc. 25th International Conference on Compiler Construction, pp. 121-131, 2016
Profiling-Assisted Decoupled Access-Execute
In Proc. 4th International Workshop on High Performance Energy Efficient Embedded Systems, 2016
Approximation: A New Paradigm also for Wireless Sensing
2016
Splash-3: A properly synchronized benchmark suite for contemporary research
In Proc. International Symposium on Performance Analysis of Systems and Software, pp. 101-111, 2016
Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics
In Proc. International Conference on Parallel Architectures and Compilation Techniques, pp. 433-434, 2016
Fencing programs with self-invalidation and self-downgrade
In Formal Techniques for Distributed Objects, Components, and Systems, pp. 19-35, 2016
Racer: TSO Consistency via Race Detection
In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016
Techniques for modulating error resilience in emerging multi-value technologies
In Proc. 13th International Conference on Computing Frontiers, pp. 55-63, 2016
Effects of Granularity/Adaptivity on Private/Shared Classification for Coherence
2015
The Load Slice Core Microarchitecture
In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 272-284, 2015
Callback: Efficient Synchronization without Invalidation with a Directory Just for Spin-Waiting
In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 427-438, 2015
Full speed ahead: Detailed architectural simulation at near-native speed
In Proc. 18th International Symposium on Workload Characterization, pp. 183-192, 2015
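Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies
In Proc. 21st International Symposium on High Performance Computer Architecture, pp. 186-197, 2015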
An efficient, self-contained, on-chip directory: DIR1-SISD
In Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, pp. 317-330, 2015
Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling
In Proc. 12th International Symposium on Code Generation and Optimization, pp. 262-272, 2014
A tunable cache for approximate computing
In Proc. 10th International Symposium on Nanoscale Architectures, pp. 88-89, 2014
The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence
2014
Introducing DVFS-Management in a Full-System Simulator
In Proc. 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2013
Towards more efficient execution: a decoupled access-execute approach
In Proc. 27th ACM International Conference on Supercomputing, pp. 253-262, 2013
A New Perspective for Efficient Virtual-Cache Coherence
In Proceedings of the 40th Annual International Symposium on Computer Architecture, pp. 535-546, 2013
Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models
In PARMA 2013, 4th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2013
Power-Sleuth: A Tool for Investigating your Program's Power Behavior
In International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'12), pp. 241-250, 2012
Complexity-effective multicore coherence
In Proc. 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 241-251, 2012
Efficient, snoopless, System-on-Chip coherence
In 2012 IEEE International SOC Conference (SOCC), pp. 230-235, 2012
Green governors: A framework for continuously adaptive DVFS
In Proc. International Green Computing Conference and Workshops, pp. 1-8, 2011
Power-performance adaptation in Intel Core i7
In Proc. 2nd Workshop on Computer Architecture and Operating System co-design, p. 10, 2011
Power Token Balancing: Adapting CMPs to power constraints for parallel multithreaded workloads
In Proc. 25th International Parallel and Distributed Processing Symposium, pp. 431-442, 2011
Parallelizing multicore cache simulations on GPUs
In Proc. 3rd Swedish Workshop on Multi-Core Computing, pp. 3-8, 2010
Interval-based models for run-time DVFS orchestration in superscalar processors
In Proc. 7th International Conference on Computing Frontiers, pp. 287-296, 2010
Where replacement algorithms fail: a thorough analysis
In Proc. 7th International Conference on Computing Frontiers, pp. 141-150, 2010
MLP-aware instruction queue resizing: The key to power-efficient performance
In Architecture of Computing Systems – ARCS 2010, pp. 113-125, 2010