The Eighteenth International Conference on
Raleigh, North Carolina. September 12-16, 2009.
DDCache: Decoupled and Delegable Cache Data and Metadata
Hemayet Hossain, Sandhya Dwarkadas and Michael C. Huang
In order to harness the full compute power of many-core processors, future designs must focus on effective utilization of on-chip cache and bandwidth resources for both truly parallel and multiprogrammed workloads. For truly parallel workloads, support for fine-grained sharing between subsets of processors is essential. For multiprogrammed workloads, interference across applications must be minimized to achieve performance isolation.

In this paper, we address the dual goals of (1) reducing on-chip communication overheads and (2) improving on-chip cache space utilization, resulting in larger effective cache capacity. We present a new cache coherence protocol that decouples the logical binding between data and metadata in a cache set. This decoupling allows data and metadata for a cache line to be independently delegated to any location on chip. By delegating metadata to the current owner/modifier of a cache line, communication overhead for metadata maintenance is avoided and communication can be effectively localized between interacting processes. By decoupling metadata from data, data space in the cache can be more efficiently utilized by avoiding data replication. Using full system simulation, we demonstrate that our decoupled protocol achieves a 22% (29% with microbenchmarks) performance (speedup) improvement on average (geometric mean) over a base statically mapped directory-based non-uniform cache access protocol, with a 31% reduction in on-chip bandwidth, a 26% reduction in off-chip bandwidth, and a 26% reduction in energy (5% power) consumption.
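
The idea of delegating metadata to the current owner can be illustrated with a toy model. The sketch below is not the protocol described in the paper: the class `ToyChip`, the modulo home mapping, and the message-counting rule (request to home, optional forward to the delegate, one reply) are all assumptions made purely for illustration. It shows only why repeated writes by the same tile need no metadata messages once that tile holds the metadata itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Metadata:
    """Coherence state for one cache line (hypothetical toy model)."""
    owner: Optional[int] = None
    sharers: Set[int] = field(default_factory=set)


class ToyChip:
    """Tiles with a static home per line and optional metadata delegation."""

    def __init__(self, num_tiles: int) -> None:
        self.num_tiles = num_tiles
        self.meta: Dict[int, Metadata] = {}
        # line -> tile currently keeping its metadata; absent means "at home".
        self.delegate: Dict[int, int] = {}

    def home_tile(self, line: int) -> int:
        # Assumed static mapping: line address modulo the tile count.
        return line % self.num_tiles

    def metadata_tile(self, line: int) -> int:
        return self.delegate.get(line, self.home_tile(line))

    def write(self, tile: int, line: int) -> int:
        """Write `line` from `tile`; return metadata messages in this toy model."""
        home = self.home_tile(line)
        keeper = self.metadata_tile(line)
        if keeper == tile:
            messages = 0  # the writer already holds the metadata
        else:
            # Assumed cost: request to home (if remote), forward to the
            # delegate (if metadata is not at home), plus one reply.
            messages = int(tile != home) + int(keeper != home) + 1
        entry = self.meta.setdefault(line, Metadata())
        entry.owner = tile
        entry.sharers = {tile}
        # Delegate the metadata to the new owner (or return it to home).
        if tile == home:
            self.delegate.pop(line, None)
        else:
            self.delegate[line] = tile
        return messages


chip = ToyChip(num_tiles=4)
first = chip.write(tile=3, line=5)   # metadata starts at the home tile
second = chip.write(tile=3, line=5)  # metadata now sits with the writer
print(first, second)
```

In this simplified accounting the cost of updating the home tile's delegation pointer, invalidating sharers, and moving the data itself is ignored; a real protocol has to pay for each of these.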