Microprocessor Architecture

Computer Architecture and Systems

A high-performance microprocessor is at the heart of every general-purpose computer, from servers to desktop and laptop PCs to open cell-phone platforms such as the iPhone. Its job is to execute software programs correctly and as quickly as possible within challenging cost and power constraints.





Research in microprocessor architecture investigates ways to increase the speed at which the microprocessor executes programs. All approaches share the goal of exposing and exploiting parallelism hidden within programs. A program consists of a long sequence of instructions. The microprocessor maintains the illusion of executing one instruction at a time, but under the covers it attempts to overlap the execution of hundreds of instructions. Overlapping instructions is challenging because of the interactions among them (data and control dependencies). A prevailing theme, speculation, encompasses a wide range of approaches for overcoming the performance-debilitating effects of these interactions. These approaches include branch prediction and speculation, which expand the parallelism scope of the microprocessor to hundreds or thousands of instructions; dynamic scheduling, which extracts instructions that may execute in parallel and overlaps their execution with long-latency memory accesses; caching and prefetching, which collapse the latency of memory accesses; and value prediction and speculation, which parallelize the execution of data-dependent instructions, to mention a few.
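As a concrete illustration of one technique named above, branch prediction, the following is a minimal sketch of a predictor built from two-bit saturating counters. It is a textbook scheme used here for illustration only, not a description of any particular processor or NC State project; the class name, the table size, the initial counter value, and the branch trace are all assumptions made for the example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters.

    Counter values 0-1 predict "not taken"; values 2-3 predict "taken".
    """

    def __init__(self, entries=16):
        self.entries = entries
        # Assumption: every counter starts at 1 (weakly not taken).
        self.table = [1] * entries

    def _index(self, pc):
        # Index the table with the low bits of the branch address.
        return pc % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor, trace):
    """Fraction of branches in the trace that the predictor guesses correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical trace: a loop branch at address 0x40 that is taken nine times
# and then falls through once, with the whole loop executed ten times.
loop_trace = [(0x40, i % 10 != 9) for i in range(100)]
print(accuracy(TwoBitPredictor(), loop_trace))
```

On this trace the predictor mispredicts once at every loop exit, plus once while warming up. The two-bit hysteresis keeps a single fall-through from flipping the next prediction, which is why this style of counter suits loop branches.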

Within this speculation framework, there is room for exposing and exploiting different styles of parallelism. Instruction-level parallelism (ILP) pertains to concurrency among individual instructions. Such fine-grained parallelism is the most flexible but not necessarily the most efficient. Data-level parallelism (DLP) pertains to performing the same operation on many data elements at once. This style of fine-grained parallelism is very efficient, but it applies only when such regularity exists in the application. Thread-level parallelism (TLP) involves identifying large tasks within the program, each composed of many instructions, that are conjectured to be independent or semi-independent and whose parallel execution may be attempted speculatively. Such coarse-grained parallelism is well suited to emerging multi-core microprocessors (multiple processing cores on a single chip). With the advent of multi-core microprocessors, robust mixtures of ILP, DLP, and TLP are likely.
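To make the notion of ILP more concrete, the sketch below estimates how much instruction-level parallelism a short instruction sequence exposes: the instruction count divided by the length of its longest chain of data dependencies. It is an idealized model under stated assumptions (every instruction takes one cycle, execution resources are unlimited, and only true read-after-write dependencies matter); the function name and the register names in the example are hypothetical.

```python
def dataflow_ilp(instructions):
    """Estimate ILP for a straight-line instruction sequence.

    instructions: list of (dest, sources) pairs in program order.
    Returns (critical_path_cycles, ilp).

    Assumptions: unit latency, unlimited execution resources, and only
    true (read-after-write) dependencies constrain the schedule.
    """
    if not instructions:
        return 0, 0.0

    ready = {}  # register name -> cycle in which its newest value is available
    depth = 0   # length of the longest dependency chain seen so far
    for dest, sources in instructions:
        # An instruction may start once all of its source values are ready.
        # Values never written in this sequence are treated as ready at cycle 0.
        start = max((ready.get(s, 0) for s in sources), default=0)
        finish = start + 1
        ready[dest] = finish
        depth = max(depth, finish)
    return depth, len(instructions) / depth


# Hypothetical six-instruction sequence: two independent chains that merge.
program = [
    ("r1", ["a"]),
    ("r2", ["b"]),
    ("r3", ["r1", "r2"]),
    ("r4", ["c"]),
    ("r5", ["r4"]),
    ("r6", ["r3", "r5"]),
]
print(dataflow_ilp(program))
```

Here six instructions form a dependency chain three deep, so this idealized machine could sustain an average of two instructions per cycle. A fully serial chain of the same length would yield an ILP of 1, which is the situation that value prediction attempts to break.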

Microprocessor architecture research has always been shaped by underlying technology trends, making it a rapidly changing and vigorous field. As technology advances, previously discarded approaches are revisited with dramatic commercial success (e.g., superscalar processing became possible with ten-million-transistor integration). By the same token, technology limitations cause a rethinking of the status quo (e.g., deeper pipelining seems unsustainable due to increasing power consumption).

The latest trend, multi-core microprocessors, challenges a new generation of researchers to accelerate sequential programs by harnessing multiple heterogeneous and homogeneous cores. Current NC State research projects along these lines include:

  • FabScalar Project. A promising way to increase performance and reduce power is to integrate multiple differently designed cores on a chip, each customized to a different class of programs. Heterogeneity poses some unprecedented challenges: (1) designing, verifying, and fabricating many different cores with one design team, (2) architecting the optimal heterogeneous multi-core chip when faced with boundless design choices and imperfect knowledge of the workload space, (3) quickly evaluating core designs in a vast design space, and (4) automatically steering applications and phases of applications to the most suitable cores at run-time. The FabScalar Project comprehensively meets these challenges with a Verilog toolset for automatically assembling arbitrary superscalar cores, guiding principles for architecting heterogeneous multi-core chips, analytical methods for quickly customizing cores to workloads, and novel approaches for steering application phases to the most suitable cores.
  • MemoryFlow Project. This project explores a new microarchitecture that distributes a program's data and corresponding computation among many cores.
  • Slipstream Project. This mature project pioneered the use of dual threads/cores in a leader-follower arrangement for improving performance and providing fault tolerance.
KEYWORDS:

pipelining, caches, branch and value prediction, static and dynamic scheduling, speculation, instruction-level parallelism, data-level parallelism, thread-level parallelism, microarchitecture, superscalar processor, VLIW processor, vector processor, multithreading, multi-core and many-core, heterogeneous multi-core, workload characterization, processor performance modeling

Associated Courses