Engineering and technology
- Computer architecture and organisation
- Performance evaluation, testing and simulation of reliability
- Processor architectures
This research proposal aims to develop time-proportional speedup stacks, which attribute the causes of limited performance scalability of multi-threaded software on multi-core hardware to individual instructions, so that software developers can act on them and optimize performance effectively. We will focus specifically on synchronization-heavy multi-threaded workloads, which are the most challenging to optimize and which require instruction-level granularity and precision to fully understand why performance does not scale as expected. Many critical workloads, including operating system kernels, multi-tenant virtual machines, and database management systems, are synchronization-heavy, so their performance scalability depends crucially on synchronization performance. Analyzing and understanding how different synchronization primitives (locks) behave paves the way towards selecting the best lock implementation for a given software application on a given hardware platform.