NC State University ECE Dept.
Architecture Research for PErformance, Reliability, and Security

Research
For a more up-to-date overview of current research activities at ARPERS, see the list of abstracts from recent publications.
Intelligent Memory Hierarchy Optimization
FAIR CACHING: Hardware/software support that enforces fairness when several threads in a chip multiprocessor (CMP) system share a cache [PACT'04].
PRIME CACHE INDEXING: We reduce conflict misses in applications by using cache hashing functions based on prime numbers. Prime modulo indexing uses a prime number of cache sets. We show that prime modulo indexing can be computed quickly, without integer division, using a small set of narrow addition and shift operations. We also propose prime displacement indexing, in which the cache index is the traditional index plus a displacement, and the displacement is a prime number multiplied by some tag bits from the address. What is unique about these prime hashing functions is that, while they eliminate conflict misses, they do not cause extra conflict misses in applications with sequential access patterns, unlike competing techniques. This is important because, although a non-trivial fraction of applications suffer from conflict misses, the majority of applications do not suffer many conflict misses and should not be penalized by an alternative cache hashing function [TC'05, HPCA'04].
INTELLIGENT MEMORY PREFETCHING: Our approach focuses on distributing the computation across the main processor and processors in memory (PIM). It relies on embedded DRAM technology, which has recently been introduced into high-volume products such as the Sony PlayStation 2 and Nintendo GameCube. The advantage of PIM is low-latency, high-bandwidth access to memory. Here, the memory processor runs a software handler that observes and learns the cache miss patterns of the main processor. It uses a correlation table to predict future cache misses and prefetches those blocks into the cache of the main processor. Because the main processor then finds the data in its cache, it does not need to access memory for it [ISCA'02, TPDS'03].
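As a rough illustration of the PRIME CACHE INDEXING entry above, the sketch below shows how the two index functions could be computed. All parameters are assumptions chosen for the example and are not taken from the papers: 64-byte blocks, 128 physical sets, the Mersenne prime 127 as the prime number of sets, and 17 as the displacement prime. The function names are hypothetical. The shift-and-add folding shown here only works for a prime of the form 2^k - 1; other primes would need a different reduction.

```python
# Illustrative sketch only. Every constant below is an assumption made
# for this example, not a value taken from the cited papers.
BLOCK_OFFSET_BITS = 6                    # assumed 64-byte cache blocks
INDEX_BITS = 7                           # assumed 2**7 = 128 physical sets
INDEX_MASK = (1 << INDEX_BITS) - 1
PRIME_SETS = (1 << INDEX_BITS) - 1       # 127, a Mersenne prime
DISPLACEMENT_PRIME = 17                  # assumed prime multiplier


def traditional_index(addr):
    """Conventional index: low-order bits of the block address."""
    return (addr >> BLOCK_OFFSET_BITS) & INDEX_MASK


def prime_modulo_index(addr):
    """Block address modulo 127 using only shifts, masks, and adds.

    Because 2**7 is congruent to 1 modulo 127, the 7-bit chunks of the
    block address can simply be summed until the result fits in 7 bits.
    """
    x = addr >> BLOCK_OFFSET_BITS
    while x > PRIME_SETS:
        x = (x & INDEX_MASK) + (x >> INDEX_BITS)
    return 0 if x == PRIME_SETS else x   # 127 itself is congruent to 0


def prime_displacement_index(addr):
    """Traditional index plus (prime * some tag bits), wrapped to 128 sets."""
    block = addr >> BLOCK_OFFSET_BITS
    index = block & INDEX_MASK
    tag_bits = (block >> INDEX_BITS) & INDEX_MASK   # low-order tag bits
    return (index + DISPLACEMENT_PRIME * tag_bits) & INDEX_MASK


if __name__ == "__main__":
    # Addresses 8 KB apart all collide under traditional indexing.
    for addr in (i * 8192 for i in range(4)):
        print(addr, traditional_index(addr),
              prime_modulo_index(addr), prime_displacement_index(addr))
```

With these assumed parameters, addresses spaced 8 KB apart all fall into the same set under traditional indexing, while both prime-based functions spread them across different sets.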
MEMORY CO-EXECUTION: Application code is partitioned into sections. We propose a partitioning algorithm and a scheduling algorithm that schedules the compute-intensive code sections to run on the main processor, and the memory-intensive code sections to run on the memory processor [HPCA'01, TC'01].
CMP CACHE CONTENTION PREDICTION: Sharing the L2 or L3 cache among multiple cores in a CMP is common because it saves precious die area and increases cache utilization. Unfortunately, such sharing can also lead to severe performance degradation for some applications because they cannot obtain sufficient cache space for their computation. However, we found that the impact of cache sharing on performance is highly application-specific as well as thread mix-specific. To understand precisely when an application suffers from cache sharing, we create a cache contention model that captures the contention behavior of shared caches. The model gives deep insight into how the performance impact of cache sharing relates to the temporal reuse behavior of the affected applications [HPCA'05].
SCAL-TOOL: A tool to pinpoint and quantify scalability bottlenecks of shared memory programs. It breaks the execution time of a program into four components: actual computation, and extra time due to synchronization, load imbalance, and insufficient cache size. Scal-tool was released in 1999 and is now part of NCSA's software repository [SC'99].
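To make the phrase "temporal reuse behavior" in the CMP CACHE CONTENTION PREDICTION entry above more concrete, the sketch below computes an LRU stack distance profile, one standard way to quantify temporal reuse. It is an assumption that this is a useful stand-in here; the sketch is not the contention model itself. It handles a single cache set (or a fully associative cache), and the function names are hypothetical.

```python
def stack_distance_profile(block_trace, assoc):
    """Count hits per LRU stack position for one set, plus misses.

    hits[d] is the number of accesses that found their block at depth d
    (0 = most recently used). Accesses to blocks never seen before, or
    found deeper than `assoc`, are counted as misses.
    """
    stack = []                 # most recently used block at index 0
    hits = [0] * assoc
    misses = 0
    for block in block_trace:
        if block in stack:
            depth = stack.index(block)
            stack.pop(depth)
            if depth < assoc:
                hits[depth] += 1
            else:
                misses += 1
        else:
            misses += 1
        stack.insert(0, block)  # accessed block becomes most recent
    return hits, misses


def misses_with_ways(hits, misses, ways):
    """Misses if the thread effectively keeps only `ways` ways per set.

    Hits at stack depth >= ways turn into misses when less space is left.
    """
    return misses + sum(hits[ways:])


if __name__ == "__main__":
    hits, misses = stack_distance_profile(["A", "B", "A", "C", "B", "A"], 4)
    print(hits, misses, misses_with_ways(hits, misses, 2))
```

The profile shows why the effect of sharing is application-specific: a thread whose hits cluster near the top of the stack loses little when a co-runner takes cache space, while a thread with many deep-stack hits sees those hits become misses.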
Architecture Support for Computer Security and Software Reliability
HEAPMON: HeapMon is a helper thread that performs heap bug detection similar to Purify. Because the helper thread runs in parallel with the application, it offloads much of the overhead of run-time bug detection that would otherwise be interleaved with application execution. In addition, efficient filtering mechanisms significantly reduce the workload of the helper thread, resulting in an average slowdown of less than 5% [IBM Journal'06].
MEMORY ENCRYPTION: more to come...