Programming Models and Compiler Optimizations for GPUs and Multi-Core Processors

J. Ramanujam, Louisiana State University
P. Sadayappan, The Ohio State University

On-chip parallelism with multiple cores is now ubiquitous. Because of power and cooling constraints, recent performance improvements in both general-purpose and special-purpose processors have come primarily from increased on-chip parallelism rather than increased clock rates. Parallelism is therefore of considerable interest to a much broader group than developers of parallel applications for high-end supercomputers. Several programming environments have recently emerged in response to the need to develop applications for GPUs, the Cell processor, and multi-core processors from AMD, IBM, Intel, and others.

As commodity computing platforms all go parallel, programming these platforms to attain high performance has become an extremely important issue. There has been considerable recent interest in two complementary approaches: programming models for these architectures, and compiler optimizations that target them. This tutorial will provide an introductory survey covering both aspects. In contrast to conventional multi-core architectures, GPUs and the Cell processor must exploit parallelism while managing the physical memory on the processor: because the on-chip memory is software-managed rather than cache-based, the movement of data between large off-chip memories and the limited on-chip memory must be explicitly orchestrated. This tutorial will address the issue of explicit memory management in detail.