Eduardo José Gómez Hernández: Advancements towards non-speculative concurrent execution of critical sections
- Date: 3 June 2025, 16:00
- Location: Salón de Grados, Facultad de Informatica (Building 32), University of Murcia, Murcia (Spain)
- Type: Thesis defence
- Thesis author: Eduardo José Gómez Hernández
- External reviewer: Daniel Sorin
- Supervisors: Stefanos Kaxiras, Alberto Ros Bardisa
- Research subject: Computer Science
- DiVA
Abstract
Parallel programs require, besides cache orchestration, another mechanism that guarantees synchronization among the threads of the same program. These synchronization mechanisms induce overheads, for example by slowing down certain operations and stalling threads, in order to comply with the requirements established by the programmer.
The objective of this thesis is the efficient execution of critical sections, that is, regions of code that must be executed atomically. The most efficient method is the concurrent, non-speculative execution of these sections. To achieve this, we present the three steps we have taken: 1) single-address atomic instructions can be used to implement non-speculative critical sections; therefore, we develop an updated version of the well-known Splash benchmark suite that uses single-address atomic instructions to implement most of its critical sections (Splash-4); 2) a new set of multi-address atomic instructions, together with a methodology for implementing them efficiently, that can be used for small critical sections (MADs); 3) a more generic method that, without the direct intervention of the programmer, limits the retries required to execute contended critical regions (CLEAR).
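As an illustration of the first step, a small critical section that updates a single memory location can be expressed as one atomic read-modify-write instead of a lock-protected region. The following is a minimal sketch under stated assumptions, not code taken from Splash-4: the names `LockedCounter`, `AtomicCounter`, and `run` are hypothetical and introduced only for this example.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Lock-based critical section: threads serialize on the mutex.
struct LockedCounter {
    std::mutex m;
    std::uint64_t value = 0;

    void add(std::uint64_t x) {
        std::lock_guard<std::mutex> guard(m);  // enter critical section
        value += x;                            // protected update
    }                                          // lock released here

    std::uint64_t get() {
        std::lock_guard<std::mutex> guard(m);
        return value;
    }
};

// The same update as one single-address atomic read-modify-write.
// No lock is taken, and nothing is executed speculatively.
struct AtomicCounter {
    std::atomic<std::uint64_t> value{0};

    void add(std::uint64_t x) {
        value.fetch_add(x, std::memory_order_relaxed);
    }

    std::uint64_t get() const {
        return value.load(std::memory_order_relaxed);
    }
};

// Hypothetical driver: `threads` workers each perform `iters` increments.
template <typename Counter>
std::uint64_t run(unsigned threads, unsigned iters) {
    Counter c;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&c, iters] {
            for (unsigned i = 0; i < iters; ++i) c.add(1);
        });
    }
    for (auto& th : pool) th.join();  // join orders all updates before get()
    return c.get();
}
```

Both `run<LockedCounter>(4, 10000)` and `run<AtomicCounter>(4, 10000)` return the same total; the difference lies in how the threads contend for the shared location. This replacement only applies when the critical section touches a single address, which is the limitation that the multi-address instructions of the second step are meant to address.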
For the evaluation of the results, we have used the most up-to-date tools available in each case and, when possible, real machines instead of simulations. For the simulations, we have used the gem5 simulator, always performing multiple runs. The simulator has been configured to model, as faithfully as possible, processors based on the latest Intel generations.
In our first step, Splash-4, we reduce execution time by 50% when using 64 cores, while maintaining the original structure and algorithms. In the second step (MADs), the new atomic instructions reduce execution time by 80% compared to the classical lock mechanism, and by 60% compared to a transactional memory technique similar to Intel TSX, while adding only 68 bytes per core. Finally, CLEAR limits the number of re-executions of critical sections executed under speculative methods, increasing by 35% the number of sections that complete on the first retry and reducing from 37% to 15% the number of sections that need to reach the fallback. Altogether, this improves execution time by 35% against an Intel TSX implementation and by 23% against PowerTM.