PhD Prelim: Timely Dependence-Based Prefetcher Design

Speaker: Chungsoo Lim
Organization: NC State University
Location: Partners I, Room 2311
Date: December 7, 2007, 9:00 AM

This work proposes an architecture that efficiently prefetches for loads whose effective addresses are linearly dependent on previously loaded values.  This dependence-based prefetching scheme covers most of the frequently missed loads.  To make the dependence identification step efficient, a profile-assisted compiler generates the dependence information, which is loaded into a memory-mapped prefetch table during the application's initialization phase.
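
The Python sketch below shows one possible shape for the prefetch table described above. It is not the proposed hardware design. It assumes that "linearly dependent" means a consumer's address has the form scale * value + offset, where value is the producer's loaded value. It also assumes the table is keyed by the producer load's PC. The names Consumer, PrefetchTable, load_profile and on_producer_value are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Consumer:
    pc: int      # PC of the consumer load (hypothetical field)
    scale: int   # assumed linear form: address = scale * value + offset
    offset: int  # e.g. a field offset inside the pointed-to structure


@dataclass
class PrefetchTable:
    # Producer-load PC -> consumer loads whose addresses depend on its value.
    entries: Dict[int, List[Consumer]] = field(default_factory=dict)

    def load_profile(self, profile: Dict[int, List[Consumer]]) -> None:
        """Stand-in for copying compiler-generated dependence information
        into the memory-mapped table when the application starts."""
        for producer_pc, consumers in profile.items():
            self.entries[producer_pc] = list(consumers)

    def on_producer_value(self, pc: int, value: int) -> List[int]:
        """Return the consumer addresses to prefetch once the load at `pc`
        has returned `value`; loads that are not producers yield nothing."""
        return [c.scale * value + c.offset for c in self.entries.get(pc, [])]
```

Consider a producer at PC 0x400 whose consumers are (scale 1, offset 8) and (scale 4, offset 0). When that producer returns 0x1000, the sketch yields prefetch addresses 0x1008 and 0x4000.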

The dependence relationship is identified first, and producer loads then issue prefetches for their consumer loads.  To make these prefetches timely, the memory access patterns of producer loads are learned and exploited dynamically.  A producer load may be a pointer load in a linked data structure (LDS), or its effective address may follow a stride pattern.  For a pointer load, the address of a future instance can be prefetched once the relation between the address of the current instance and the address of that future instance has been captured.  For a load whose effective address follows a stride pattern, future addresses can be computed by multiplying the stride by a small constant and adding the result to the current effective address.  Each producer load is associated with multiple consumer loads, so a single pointer or stride can drive prefetches for several consumers.
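
The following Python sketch illustrates the stride case. For a producer instance `lookahead` iterations ahead, it predicts the address as current address + lookahead * stride. It then derives addresses for every consumer of that producer from the value stored there. Several details are assumptions rather than details from the work itself. A stride counts as confirmed after the same nonzero delta appears twice in a row. A dictionary named `memory` stands in for the data a real prefetch would return. The names StrideProducer, run_ahead and lookahead are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Consumer:
    pc: int      # PC of the consumer load (hypothetical field)
    scale: int   # assumed form: consumer address = scale * value + offset
    offset: int


class StrideProducer:
    """Hypothetical stride detector for one producer load."""

    def __init__(self, lookahead: int):
        self.lookahead = lookahead          # small constant multiplied with the stride
        self.last_addr: Optional[int] = None
        self.stride = 0
        self.confirmed = False              # same nonzero stride seen twice in a row

    def observe(self, addr: int) -> Optional[int]:
        """Record the current effective address; return the predicted
        address of the instance `lookahead` iterations ahead, if confirmed."""
        if self.last_addr is not None:
            new_stride = addr - self.last_addr
            self.confirmed = new_stride != 0 and new_stride == self.stride
            self.stride = new_stride
        self.last_addr = addr
        if not self.confirmed:
            return None
        return addr + self.lookahead * self.stride


def run_ahead(producer: StrideProducer, addr: int, memory: Dict[int, int],
              consumers: List[Consumer]) -> List[int]:
    """Prefetch the producer's future instance, then all of its consumers.

    `memory` stands in for the value the producer prefetch returns; a real
    engine would compute consumer addresses only after that prefetch completes.
    """
    future = producer.observe(addr)
    if future is None:
        return []
    targets = [future]
    if future in memory:
        value = memory[future]
        targets += [c.scale * value + c.offset for c in consumers]
    return targets
```

Take an 8-byte stride and a lookahead of 4. The access at 0x1010 is the first one with a confirmed stride, so the sketch prefetches the producer instance at 0x1030. It then prefetches both consumer addresses derived from the pointer stored there. This is how one stride serves several consumers.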

The proposed prefetcher provides a general framework for dependence-based prefetching that can exploit any access pattern of producer loads to let the prefetch engine run ahead of the processor core.  We also examine how to capture pointers in LDS with a pure hardware implementation.  We found that selectively recording patterns can reduce the space requirement compared to previous work.  We explore the design space of the pattern storage to find reasonable tradeoff points between storage requirements and performance.
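
One way to picture hardware pointer capture is a jump-pointer-style table. This is a technique from the LDS prefetching literature, swapped in here for illustration. It is not necessarily the mechanism used in this work. The table maps the address of a producer instance to the address observed `distance` instances later. Two parts of the sketch are assumptions. Selective recording is modeled as tracking only a given set of producer PCs. Storage is bounded by a capacity with LRU replacement, which is the knob a design-space sweep would vary. The class and parameter names are hypothetical.

```python
from collections import OrderedDict, deque
from typing import Deque, Dict, Optional, Set


class JumpPointerTable:
    """Hypothetical bounded table of captured pointer patterns."""

    def __init__(self, capacity: int, distance: int, tracked_pcs: Set[int]):
        assert capacity >= 1 and distance >= 1
        self.capacity = capacity
        self.distance = distance            # how many instances ahead to jump
        self.tracked_pcs = tracked_pcs      # assumed selection criterion
        self.table: "OrderedDict[int, int]" = OrderedDict()
        self.history: Dict[int, Deque[int]] = {}  # recent addresses per PC

    def access(self, pc: int, addr: int) -> Optional[int]:
        """Train on a producer access; return a predicted future address."""
        if pc not in self.tracked_pcs:
            return None  # selective recording: untracked loads use no storage
        hist = self.history.setdefault(pc, deque(maxlen=self.distance))
        # In a full history, the oldest address is `distance` instances back.
        if len(hist) == self.distance:
            self._record(hist[0], addr)
        hist.append(addr)
        target = self.table.get(addr)
        if target is not None:
            self.table.move_to_end(addr)  # refresh LRU position on a hit
        return target

    def _record(self, src: int, dst: int) -> None:
        self.table[src] = dst
        self.table.move_to_end(src)
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # evict the least recently used entry
```

Suppose a list is traversed with nodes at 0x100, 0x200, 0x300, and so on, with a distance of 2. The first traversal only trains the table. On the second traversal, an access to 0x100 predicts 0x300. Shrinking `capacity` trades prediction coverage for storage.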

We go one step further to eliminate the additional storage needed for pointers: we propose a mechanism that uses the L2 cache to store them.  With this mechanism, the impractically large dedicated on-chip pointer storage, which sometimes wastes silicon, can be removed.

 
