Research
A high-performance microprocessor is at the heart of every general-purpose computer, from servers, to desktop and laptop PCs, to open cell-phone platforms such as the iPhone. Its job is to execute software programs correctly and as quickly as possible, within challenging cost and power constraints.
Research in microprocessor architecture investigates ways to increase the speed at which the microprocessor executes programs. All approaches have in common the goal of exposing and exploiting parallelism hidden within programs. A program consists of a long sequence of instructions. The microprocessor maintains the illusion of executing one instruction at a time, but under the covers it attempts to overlap the execution of hundreds of instructions at a time. Overlapping instructions is challenging due to interactions among them (data and control dependencies). A prevailing theme, speculation, encompasses a wide range of approaches for overcoming the performance-debilitating effects of instruction interactions. They include branch prediction and speculation for expanding the parallelism scope of the microprocessor to hundreds or thousands of instructions, dynamic scheduling for extracting instructions that may execute in parallel and overlapping their execution with long-latency memory accesses, caching and prefetching to collapse the latency of memory accesses, and value prediction and speculation for parallelizing the execution of data-dependent instructions, to mention a few.
Within this speculation framework, there is room for exposing and exploiting different styles of parallelism. Instruction-level parallelism (ILP) pertains to concurrency among individual instructions. Such fine-grained parallelism is the most flexible but not necessarily the most efficient. Data-level parallelism (DLP) pertains to performing the same operation on many data elements at once. This style of fine-grained parallelism is very efficient, but only applies when such regularity exists in the application. Thread-level parallelism (TLP) involves identifying large tasks within the program, each comprised of many instructions, that are conjectured to be independent or semi-independent and whose parallel execution may be attempted speculatively. Such coarse-grained parallelism is well-suited to emerging multi-core microprocessors (multiple processing cores on a single chip). With the advent of multi-core microprocessors, robust mixtures of ILP, DLP, and TLP are likely.
Microprocessor architecture research has always been shaped by underlying technology trends, making it a rapidly changing and vigorous field. As technology advances, previously discarded approaches are revisited with dramatic commercial success (e.g., superscalar processing became possible with ten-million transistor integration). By the same token, technology limitations cause a rethinking of the status quo (e.g., deeper pipelinining seems unsustainable due to increasing power consumption).
The latest trend, multi-core microprocessors, challenges a new generation of researchers to accelerate sequential programs by harnessing multiple heterogeneous and homogeneous cores. Current NC State research projects along these lines include:
pipelining, caches, branch and value prediction, static and dynamic scheduling, speculation, instruction-level parallelism, data-level parallelism, thread-level parallelism, microarchitecture, superscalar processor, VLIW processor, vector processor, multithreading, multi-core and many-core, heterogeneous multi-core, workload characterization, processor performance modeling