Abstract


Optimal Clocking of Wave Pipelined Systems and CMOS Applications

Wave pipelining (also known as maximal rate pipelining) is a timing methodology used in digital systems to increase the number of effective pipeline stages without increasing the number of physical registers in the pipeline. Using this technique, new data is applied to the inputs of a combinational logic block before the previous outputs are available, thus effectively pipelining the combinational logic and maximizing its utilization. Realization of practical systems using this technique requires accurate timing analysis at both the system level and the circuit level. At the system level, generalized timing constraints for the correct clocking of wave pipelined circuits are presented. Both single stage and multiple stage systems, including systems with feedback, are considered. This work shows that the sizes of the valid regions of operation depend on the clock period, the intentional clock skew, and the global clock latency. The minimum clock period is obtained by clock skew optimization formulated as a linear program. In addition to the generalized system, important special cases are examined, and their relative performance limits are analyzed.
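
The dependence of the valid operating region on the clock period and the intentional skew can be illustrated with a minimal sketch. It assumes a simplified single stage model, not the generalized constraints of this work: the output register samples at `waves * T + skew`, which must fall after the latest arrival of the current wave and before the earliest arrival of the next one. The function name, parameter names, and numbers are hypothetical.

```python
import math

def valid_period_range(d_max, d_min, setup, hold, skew, waves):
    """Return (t_lo, t_hi) for the clock period T, or None if no T works.

    Assumed simplified model (illustrative only):
      waves * T + skew >= d_max + setup      (current wave has settled)
      waves * T + skew <= T + d_min - hold   (next wave has not yet arrived)
    """
    if waves < 1:
        raise ValueError("waves must be >= 1")
    # Lower bound on T from the latest-arrival constraint.
    t_lo = max((d_max + setup - skew) / waves, 0.0)
    if waves == 1:
        # The next-wave constraint no longer involves T.
        if skew > d_min - hold:
            return None
        t_hi = math.inf
    else:
        # Upper bound on T from the earliest-arrival constraint.
        t_hi = (d_min - hold - skew) / (waves - 1)
    if t_hi <= 0 or t_lo > t_hi:
        return None
    return (t_lo, t_hi)

# Hypothetical delays in arbitrary time units.
for n in (2, 3, 4):
    print(n, valid_period_range(10, 8, 0.5, 0.5, 0, n))
```

With these illustrative numbers the valid window narrows as more waves are placed in the logic and disappears at four waves, which is consistent with the period being bounded below by the delay difference plus the register overheads.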

At the circuit level, since performance is determined by the maximum circuit delay difference, highly accurate estimates of both maximum and minimum delays are needed. It is shown that, for design methodologies such as wave pipelining where tight control is required on worst case circuit delays, traditional timing analysis based on gate delay models that assume a single delay value per gate is not sufficient. For example, the delay of a two input CMOS NAND gate can vary by as much as a factor of two depending on whether one input or both inputs are changing. This implies that, for accurate detection of maximum and minimum overall delay, data dependent gate delay models must be used. To determine whether multiple inputs of an individual gate can actually change simultaneously, simultaneous sensitization of multiple paths is performed. While this problem is NP-complete for general circuits, efficient heuristics are demonstrated for exact analysis of small to medium sized circuits. The algorithm presented is implemented in a prototype timing analyzer, XTV, and results are given for a set of benchmark circuits.
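
A small sketch can show why a single delay value per gate hides the delay difference that matters here. It is not the XTV algorithm: it simply sums per-gate minimum and maximum delays along one path and ignores whether the simultaneous input changes are feasible, which is the question that path sensitization answers. The gate name and delay values are hypothetical, chosen only to reflect the factor-of-two variation mentioned above.

```python
def path_delay_bounds(gates, delay_model):
    """Sum per-gate (min, max) delays along a path of gate types.

    Feasibility of the input transitions is NOT checked, so the result is
    only an optimistic/pessimistic envelope for the path delay.
    """
    lo = sum(delay_model[g][0] for g in gates)
    hi = sum(delay_model[g][1] for g in gates)
    return (lo, hi)

# Hypothetical models, arbitrary time units: (min delay, max delay) per gate.
single_value = {"nand2": (0.75, 0.75)}    # one number per gate
data_dependent = {"nand2": (0.5, 1.0)}    # factor-of-two spread

path = ["nand2"] * 4
print(path_delay_bounds(path, single_value))
print(path_delay_bounds(path, data_dependent))
```

Under the single-value model the path appears to have no delay spread at all, while the data dependent model exposes a spread that directly limits the achievable clock period in a wave pipelined design.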

Using the presented theoretical base, practical techniques for the design of high speed wave pipelined circuits in CMOS are given. This includes a discussion of noise, power, process variation, and the unique problems of data dependent delays in standard CMOS gates. Techniques to combat these problems are discussed and a CMOS logic family is presented that reduces data dependency of delay. The validity of these techniques and of the concept of wave pipelining in CMOS is shown in two circuits that have been implemented and tested. These are a 250 MHz wave pipelined 16-bit adder in MOSIS 2 µm CMOS and a digital sampling circuit with 1 Gbit/s bandwidth and 25 ps resolution in MOSIS 1.2 µm CMOS.