Stefanos Kaxiras
Professor at the Department of Information Technology, Division of Computer Systems
- Telephone: +46 18 471 29 74
- Mobile phone: +46 70 425 03 94
- E-mail: stefanos.kaxiras@it.uu.se
- Visiting address: Hus 10, Lägerhyddsvägen 1
- Postal address: Box 337, 751 05 UPPSALA
Short presentation
IEEE Fellow, for contributions to high-performance and power-efficient memory hierarchies.
I am currently developing novel techniques and approaches in several computer architecture areas: non-speculative architectures to reduce the reliance on speculation while maintaining performance benefits; security at the architectural level; memory systems and memory hierarchies for novel computing paradigms.
Biography
Stefanos Kaxiras, IEEE Fellow, is a Professor at Uppsala University, Sweden. He holds a PhD in Computer Science from the University of Wisconsin. In 1998, he joined the Computing Sciences Center at Bell Labs (Lucent) and later joined Agere Systems. In 2003, he joined the faculty of the ECE Department of the University of Patras, Greece, and in 2010 he became a full professor at Uppsala University, Sweden. His research interests are in memory systems and multiprocessor/multicore systems, with a focus on power efficiency. He has co-authored more than 90 research papers and 18 US patents, received three Swedish VR grants (main PI of a VR-Frame grant), participated in six major European research projects, and currently receives funding from Sweden’s business incubator and innovation agency VINNOVA. He is a Fellow of the IEEE (for contributions to high-performance and power-efficient memory hierarchies) and an ACM Distinguished Scientist.
Research
IEEE Fellow (2021)
ACM Distinguished Scientist (2009)
Research Interests & Contributions: Memory Systems (Highly-Scalable Cache Coherence, VIPS & Racer, Cache Management using Reuse Distances), Power (Cache Decay), Instruction-based prediction, Network processors (IPStash IP-Lookup memories), Memory/Processor Integration (Datascalar/Distributed Vector Architectures)
My most cited contribution, with Margaret Martonosi, is Cache Decay.
It is the most cited paper of ISCA 2001 by a wide margin (755 citations as of Apr. 2016).
I am currently working on VIPS coherence (12 papers in the period 2012-2016) with Alberto Ros and on Decoupled Access-Execute with Alexandra Jimborean. We have expanded into software distributed shared memory for HPC and Big Data with Kostis Sagonas.
Startups
Eta Scale manages distribution and dissemination of our research results: VIPS, ArgoDSM, and the DAE (Decoupled Access-Execute) compiler tools (Daedal).
Publications
Google scholar (sorted by citations)
dblp (sorted by year; partial list)
ResearchGate (articles with text)
LinkedIn (professional network and other info)
Recent papers (2015-2016)
2017
Check here for my 2017 papers (2 CGO papers accepted)
2016
1. M. F. Gonzalez-Zalba, F. Remacle, R.D. Levine, S. Rogge, S. Kaxiras, M. Sanquer, "Single Electron Devices and Circuits." ICT-Energy Letters, 2016.
2. Alberto Ros and Stefanos Kaxiras, "Racer: TSO Consistency via Race Detection." To appear: MICRO, 2016.
3. Alberto Ros, Carl Leonardsson, Chris Sakalis, and Stefanos Kaxiras, "POSTER: Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics." To appear: PACT, 2016.
4. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Stefanos Kaxiras, Carl Leonardsson, Alberto Ros and Yunyun Zhu, "Fencing Programs with Self-Invalidation and Self-Downgrade." 11th International Federated Conference on Distributed Computing Techniques, FORTE, 2016. Best Paper Award.
5. Magnus Själander, Gustaf Borgström, Stefanos Kaxiras, Mykhailo V. Klymenko and Françoise Remacle, "Techniques for Modulating Error Resilience in Emerging Multi-Value Technologies." ACM International Conference on Computing Frontiers, 2016.
6. Christos Sakalis, Alberto Ros, Carl Leonardsson, Stefanos Kaxiras, "Splash-3: A Properly Synchronized Benchmark Suite for Contemporary Research." In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2016. Open Source Software.
7. Konstantinos Koukos, Per Ekemark, Georgios Zacharopoulos, Vasileios Spiliopoulos, Stefanos Kaxiras, Alexandra Jimborean, "Multiversioned decoupled access-execute: the key to energy-efficient compilation of general-purpose programs." International Symposium on Compiler Construction, 2016. Best Paper Award.
8. T. Voigt, M. Själander, Frederik Hermans, Alexandra Jimborean, Erik Hagersten, Per Gunningberg, and Stefanos Kaxiras, "Poster: Approximation: A New Paradigm also for Wireless Sensing." Proceedings of the International Conference on Embedded Wireless Systems and Networks (EWSN), Graz, Austria, 15-17 Feb. 2016.
9. Jonatan Waern, Per Ekemark, Konstantinos Koukos, Stefanos Kaxiras and Alexandra Jimborean, "Profiling-Assisted Decoupled Access-Execute." HIP3ES: High Performance Energy Efficient Embedded Systems, 2016.
10. M. Själander, G. Borgström, and S. Kaxiras, "Improving Error-Resilience of Emerging Multi-Value Technologies." Workshop On Approximate Computing (WAPCO), 20 Jan. 2016.
11. Konstantinos Koukos, Alberto Ros, Erik Hagersten, Stefanos Kaxiras, "Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead." ACM Transactions on Architecture and Code Optimization (TACO), 2016. 13(1):1-22. DOI: 10.1145/2889488
2015
1. Mahdad Davari, A. Ros, E. Hagersten, S. Kaxiras, "An Efficient, Self-Contained, On-Chip Directory: DIR1-SISD." In IEEE Computer Society Parallel Architectures and Compilation Techniques (PACT), pp. 317-330, 2015.
2. S. Kaxiras, D. Klaftenegger, M. Norgren, K. Sagonas, "Turning Centralized Coherence and Distributed Critical-Section Execution on their Head: A New Approach for Scalable Distributed Shared Memory." 24th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC-24), 2015. Nominated for Best Paper Award (1 out of 4 top papers). Open Source Software.
3. T. Carlson, S. Kaxiras, W. Heirman, L. Eeckhout, "The Load-Slice Core Microarchitecture." 42nd International Symposium on Computer Architecture (ISCA-42), 2015.
4. A. Ros, S. Kaxiras, "Callback: Efficient Synchronization without Invalidation with a Directory Just for Spin-Waiting." 42nd International Symposium on Computer Architecture (ISCA-42), 2015.
5. A. Ros, M. Davari, S. Kaxiras, "Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies." IEEE 21st High Performance Computer Architecture (HPCA-21), 2015.
6. A. Ros, S. Kaxiras, "Fast&Furious: A Tool for Detecting Covert Racing." PARMA-DITAM '15: 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 4th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2015.
7. Andreas Sandberg, Nikos Nikoleris, Trevor E. Carlson, Erik Hagersten, Stefanos Kaxiras, David Black-Schaffer, "Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed." IEEE International Symposium on Workload Characterization (IISWC), 2015: 183-192.
8. Mahdad Davari, Alberto Ros, Erik Hagersten, Stefanos Kaxiras, "The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence." ACM Transactions on Architecture and Code Optimization (TACO), 2015.
Books
NEW! Our new book is out: "Power-Efficient Computer Architectures: Recent Advances" Paperback, Morgan and Claypool Publishers, January 1, 2015
by Magnus Själander, Margaret Martonosi, Stefanos Kaxiras.
Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras and Margaret Martonosi
Synthesis Lectures on Computer Architecture
* Mark D. Hill, Series Editor
* Paperback: 220 pages
* Publisher: Morgan and Claypool Publishers; 1 edition (June 13, 2008)
* ISBN-10: 1598292080 ISBN-13: 978-1598292084
PhD Students
Mahdad Davari: Multicore Coherence
Mehdi Alipur: Efficient cores
Magnus Norgren: Software Coherence
Co-advising:
Nikos Nikoleris: Cache modeling, fast simulation
Ricardo Alves: Cache management
David Klaftenegger: Efficient synchronization
Graduated
Vasileios Spiliopoulos: Power, DVFS modeling, Power Tools, Cache management for power
Konstantinos Koukos: Decoupled Access Execute, GPU Coherence
Georgios Keramidas (TEI Messolonghi, Greece)
Pavlos Petoumenos (Research Associate, University of Edinburgh)
Publications
Selection of publications
- Fencing programs with self-invalidation and self-downgrade (2016)
- Multiversioned decoupled access-execute (2016)
- Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead (2016)
- Splash-3 (2016)
- Techniques for modulating error resilience in emerging multi-value technologies (2016)
- A unified DVFS-cache resizing framework (2016)
- Approximation: A New Paradigm also for Wireless Sensing (2016)
- Profiling-Assisted Decoupled Access-Execute (2016)
- An efficient, self-contained, on-chip directory (2015)
- The effects of granularity and adaptivity on private/shared classification for coherence (2015)
- Hierarchical private/shared classification (2015)
- Full speed ahead (2015)
- Managing power constraints in a single-core scenario through power tokens (2014)
- Fix the code. Don't tweak the hardware (2014)
- Power-Efficient Computer Architectures (2014)
- A tunable cache for approximate computing (2014)
- Efficient inter-core power and thermal balancing for multicore processors (2013)
- A New Perspective for Efficient Virtual-Cache Coherence (2013)
- Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models (2013)
- Towards more efficient execution (2013)
- Introducing DVFS-Management in a Full-System Simulator (2013)
- Efficient, snoopless, System-on-Chip coherence (2012)
- Complexity-effective multicore coherence (2012)
- Power-Sleuth (2012)
- Leakage-efficient design of value predictors through state and non-state preserving techniques (2011)
- Power Token Balancing (2011)
- Green governors (2011)
- Power-performance adaptation in Intel core i7 (2011)
- SARC coherence (2010)
- Where replacement algorithms fail (2010)
- Interval-based models for run-time DVFS orchestration in superscalar processors (2010)
- MLP-aware instruction queue resizing (2010)
- Parallelizing multicore cache simulations on GPUs (2010)
- Improving Error-Resilience of Emerging Multi-Value Technologies
Recent publications
- ReCon (2023)
- How addresses are made (2023)
- Speculative inter-thread store-to-load forwarding in SMT architectures (2023)
- Doppelganger Loads (2023)
- Silent Stores in the Battery-less Internet of Things: A Good Idea? (2023)
All publications
Articles
- Speculative inter-thread store-to-load forwarding in SMT architectures (2023)
- Delay-on-Squash (2022)
- Analysing software prefetching opportunities in hardware transactional memory (2022)
- Reorder Buffer Contention (2021)
- Early Address Prediction (2021)
- Evaluating the Potential Applications of Quaternary Logic for Approximate Computing (2020)
- Understanding Selective Delay as a Method for Efficient Secure Speculative Execution (2020)
- Maximizing limited resources (2019)
- Mending fences with self-invalidation and self-downgrade (2018)
- Automatic Detection of Large Extended Data-Race-Free Regions with Conflict Isolation (2018)
- Non-Speculative Load Reordering in Total Store Ordering (2018)
- Static instruction scheduling for high performance on limited hardware (2018)
- Transcending hardware limits with software out-of-order processing (2017)
- Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics (2017)
- Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead (2016)
- The effects of granularity and adaptivity on private/shared classification for coherence (2015)
- Managing power constraints in a single-core scenario through power tokens (2014)
- Efficient inter-core power and thermal balancing for multicore processors (2013)
- Leakage-efficient design of value predictors through state and non-state preserving techniques (2011)
- SARC coherence (2010)
- Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks
- Improving Error-Resilience of Emerging Multi-Value Technologies
- Clearing the Shadows: Recovering Lost Performance for Invisible Speculative Execution through HW/SW Co-Design
Books
Conferences
- ReCon (2023)
- How addresses are made (2023)
- Doppelganger Loads (2023)
- Silent Stores in the Battery-less Internet of Things: A Good Idea? (2023)
- Data-Out Instruction-In (DOIN!) (2022)
- Free Atomics (2022)
- Clueless (2022)
- Splash-4 (2022)
- TSOPER (2021)
- ITSLF (2021)
- Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations (2021)
- Splash-4 (2021)
- Do Not Predict – Recompute! (2021)
- Seeds of SEED (2021)
- Delay and Bypass (2020)
- Efficient temporal and spatial load to load forwarding (2020)
- Boosting Store Buffer Efficiency with Store-Prefetch Bursts (2020)
- Speculative Enforcement of Store Atomicity (2020)
- Clearing the Shadows (2020)
- FIFOrder MicroArchitecture (2019)
- Filter caching for free (2019)
- Ghost Loads (2019)
- Efficient invisible speculative execution through selective delay and value prediction (2019)
- Dynamically Disabling Way-prediction to Reduce Instruction Replay (2018)
- Non-Speculative Store Coalescing in Total Store Order (2018)
- The Superfluous Load Queue (2018)
- SWOOP (2018)
- A Taxonomy of Out-of-Order Instruction Commit (2017)
- Exploring the performance limits of out-of-order commit (2017)
- Addressing energy challenges in filter caches (2017)
- Automatic detection of extended data-race-free regions (2017)
- Non-speculative load-load reordering in TSO (2017)
- Clairvoyance (2017)
- Decoupled Access-Execute on ARM big.LITTLE (2017)
- Fencing programs with self-invalidation and self-downgrade (2016)
- Multiversioned decoupled access-execute (2016)
- Racer (2016)
- Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics (2016)
- Splash-3 (2016)
- Techniques for modulating error resilience in emerging multi-value technologies (2016)
- Approximation: A New Paradigm also for Wireless Sensing (2016)
- Profiling-Assisted Decoupled Access-Execute (2016)
- The Load Slice Core Microarchitecture (2015)
- An efficient, self-contained, on-chip directory (2015)
- Effects of Granularity/Adaptivity on Private/Shared Classification for Coherence (2015)
- Hierarchical private/shared classification (2015)
- Callback (2015)
- Full speed ahead (2015)
- The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence (2014)
- Fix the code. Don't tweak the hardware (2014)
- A tunable cache for approximate computing (2014)
- A New Perspective for Efficient Virtual-Cache Coherence (2013)
- Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models (2013)
- Towards more efficient execution (2013)
- Introducing DVFS-Management in a Full-System Simulator (2013)
- Efficient, snoopless, System-on-Chip coherence (2012)
- Complexity-effective multicore coherence (2012)
- Power-Sleuth (2012)
- Power Token Balancing (2011)
- Green governors (2011)
- Power-performance adaptation in Intel core i7 (2011)
- Where replacement algorithms fail (2010)
- Interval-based models for run-time DVFS orchestration in superscalar processors (2010)
- MLP-aware instruction queue resizing (2010)
- Parallelizing multicore cache simulations on GPUs (2010)