Using Aggressor Thread Information to Improve Shared Cache Management for CMPs

Wanli Liu and Donald Yeung

Shared cache allocation policies play an important role in determining CMP performance. The simplest policy, LRU, allocates cache implicitly as a consequence of its replacement decisions. But under high cache interference, LRU performs poorly because memory-intensive threads, or aggressor threads, allocate more cache than they can efficiently use. Techniques like cache partitioning can address this problem by performing explicit allocation to prevent aggressor threads from taking over the cache.

Whether implicit or explicit, the key factor controlling cache allocation is victim thread selection. The choice of victim thread relative to the cache-missing thread determines each cache miss’s impact on cache allocation: if the two are the same, allocation doesn’t change, but if the two are different, then one cache block shifts from the victim thread to the cache-missing thread. In this paper, we study an omniscient policy, called ORACLE-VT, that uses off-line information to always select the best victim thread, and hence, maintain the best per-thread cache allocation at all times. We analyze ORACLE-VT, and find it victimizes aggressor threads about 80% of the time. To see if we can approximate ORACLE-VT, we develop AGGRESSOR-VT, a policy that probabilistically victimizes aggressor threads with strong bias. Our results show AGGRESSOR-VT comes close to ORACLE-VT’s miss rate, achieving three-quarters of its gain over LRU and roughly half of its gain over an ideal cache partitioning technique.

To make AGGRESSOR-VT feasible for real systems, we develop a sampling algorithm that “learns” the identity of aggressor threads via runtime performance feedback. We also modify AGGRESSOR-VT to permit adjusting the probability for victimizing aggressor threads, and use our sampling algorithm to learn the per-thread victimization probabilities that optimize system performance (e.g., weighted IPC). We call this policy AGGRESSORpr-VT. Our results show AGGRESSORpr-VT outperforms LRU, UCP, and an ideal cache partitioning technique by 4.86%, 3.15%, and 1.09%, respectively.
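
The role of victim thread selection described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `select_victim` and `handle_miss`, the representation of a cache set as a list of owner thread IDs ordered from LRU to MRU, and the fallback to global LRU are all assumptions made for illustration. The sketch only shows the general idea of victimizing an aggressor thread's block with some probability, and how a miss shifts one block from the victim thread to the cache-missing thread.

```python
import random


def select_victim(cache_set, aggressors, p_aggressor, rng):
    """Pick the index of the block to evict from one cache set.

    cache_set   -- list of owner thread IDs, ordered LRU first, MRU last
                   (hypothetical representation)
    aggressors  -- set of thread IDs currently treated as aggressors
    p_aggressor -- probability of victimizing an aggressor thread
    rng         -- random.Random instance, passed in for reproducibility
    """
    if rng.random() < p_aggressor:
        # Biased choice: evict the least recently used block that an
        # aggressor thread owns, if the set holds one.
        for index, owner in enumerate(cache_set):
            if owner in aggressors:
                return index
    # Assumed fallback: plain LRU over the whole set.
    return 0


def handle_miss(cache_set, missing_thread, aggressors, p_aggressor, rng):
    """Evict one block and install the missing thread's block as MRU.

    Returns the victim thread ID. If it equals missing_thread, the
    per-thread allocation is unchanged; otherwise one block moves from
    the victim thread to the cache-missing thread.
    """
    victim_index = select_victim(cache_set, aggressors, p_aggressor, rng)
    victim_thread = cache_set.pop(victim_index)
    cache_set.append(missing_thread)
    return victim_thread


if __name__ == "__main__":
    rng = random.Random(0)
    blocks = [1, 0, 1, 0]  # thread 0 is the assumed aggressor
    victim = handle_miss(blocks, missing_thread=1, aggressors={0},
                         p_aggressor=1.0, rng=rng)
    print("victim thread:", victim, "set after miss:", blocks)
```

Setting `p_aggressor` to 0 reduces the sketch to plain LRU, while values near 1 correspond to the strong bias the abstract describes; how the actual policy chooses among blocks and learns the probabilities is left to the paper.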
