The Eighteenth International Conference on Parallel Architectures and Compilation Techniques (PACT)
Raleigh, North Carolina, September 12-16, 2009.
Improving Hardware Cache Performance Through Software-Controlled Object-Level Cache Partitioning
Qingda Lu, Jiang Lin, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang and P. Sadayappan
Exploiting data locality in on-chip caches has become increasingly challenging. In this paper, we leverage software and operating system utilities to identify the locality patterns of data objects and to allocate cache space to them with correspondingly different priorities. This locality-guided caching strategy mainly targets two weaknesses of LRU replacement: it cannot effectively handle memory-intensive programs with weak locality (such as streaming accesses), and it cannot prevent contention in the cache among data objects with strong locality. Addressing both avoids sub-optimal replacement decisions. To this end, we present a system software framework. We first collect object-relative stack histograms and inter-object interference histograms via memory trace sampling, and from several low-cost training runs we determine the locality patterns of data objects. In actual runs, we categorize data objects into different locality types and partition the cache space among them with a heuristic algorithm, reducing cache misses by segregating contending objects. The object-level cache partitioning framework has been implemented by modifying a Linux kernel and tested on a commodity multi-core processor. Experimental results show that, compared with a standard L2 cache using the LRU policy, our software method provides significant speedups and L2 cache miss reductions across inputs for a set of single- and multi-threaded programs from the SPEC CPU2000 benchmark suite, the NAS benchmarks, and a computational kernel set.
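To make the idea of an object-relative stack histogram concrete, the sketch below computes, for each data object, a histogram of LRU stack distances measured only among that object's own cache blocks, from a toy trace of (object, address) pairs. Because the distances are object-relative, they approximate how the object would behave in a private, fully associative LRU partition: an access with stack distance d hits whenever the partition holds more than d blocks. The trace format, the 64-byte block size, and all function names are hypothetical, and the sampling-based collection described in the abstract is not reproduced. For the partitioning step, the sketch swaps in a generic lookahead greedy allocation in the style of utility-based cache partitioning (Qureshi and Patt); it is not the paper's heuristic and ignores inter-object interference.

```python
from collections import Counter, defaultdict

BLOCK_SIZE = 64           # assumed cache block size in bytes (hypothetical)
INF = float("inf")        # stack distance recorded for a first (cold) access


def object_stack_histograms(trace, block_size=BLOCK_SIZE):
    """Per-object LRU stack-distance histograms.

    trace: iterable of (object_id, byte_address) pairs (hypothetical format).
    The distance of an access is the number of distinct blocks of the SAME
    object touched since the previous access to this block (0 = MRU block).
    """
    stacks = defaultdict(list)     # object_id -> LRU stack, most recent first
    hists = defaultdict(Counter)   # object_id -> Counter{distance: count}
    for obj, addr in trace:
        block = addr // block_size
        stack = stacks[obj]
        if block in stack:
            depth = stack.index(block)   # position in this object's LRU stack
            stack.pop(depth)
        else:
            depth = INF                  # never seen before: compulsory miss
        stack.insert(0, block)           # the block becomes most recently used
        hists[obj][depth] += 1
    return hists


def hits_with_blocks(hist, blocks):
    """Hits an object would get with `blocks` blocks of private, fully
    associative LRU space: every access with stack distance < blocks hits."""
    return sum(n for d, n in hist.items() if d < blocks)


def lookahead_partition(hists, total_blocks):
    """Lookahead greedy allocation (UCP-style, swapped in for illustration):
    repeatedly give some object the chunk of k blocks with the highest hit
    gain per block, which avoids stalling on objects that only benefit once
    their whole working set fits."""
    alloc = {obj: 0 for obj in hists}
    remaining = total_blocks
    while remaining > 0:
        best = None  # (gain_per_block, object_id, chunk_size)
        for obj, hist in hists.items():
            base = hits_with_blocks(hist, alloc[obj])
            for k in range(1, remaining + 1):
                rate = (hits_with_blocks(hist, alloc[obj] + k) - base) / k
                if rate > 0 and (best is None or rate > best[0]):
                    best = (rate, obj, k)
        if best is None:
            break  # no object gains from more space; leave the rest unassigned
        _, obj, k = best
        alloc[obj] += k
        remaining -= k
    return alloc


if __name__ == "__main__":
    # Object "A" reuses two blocks in a loop; object "S" streams and never reuses.
    trace = [("A", 0), ("A", 64), ("A", 0), ("A", 64),
             ("S", 0), ("S", 64), ("S", 128), ("S", 192)]
    hists = object_stack_histograms(trace)
    print(lookahead_partition(hists, 4))  # prints {'A': 2, 'S': 0}
```

In this toy run the streaming object gains nothing from extra space, so all useful capacity goes to the object with reuse, which mirrors the abstract's motivation for segregating weak-locality data. A real system still has to place every object somewhere, so a practical scheme would reserve some minimum share (for example, a small region for streaming data) rather than literally zero blocks.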