25.5 Multi-GigaHertz Low-Power Low-Skew Rotary Clock Scheme

John Wood, Steve Lipa, Paul Franzon, Michael Steer

*Multigig Corporation Ltd.
*North Carolina State University

On-chip clock frequencies in the gigaHertz range require generators with low skew and low jitter to avoid timing problems. Traditional approaches to clock distribution become untenable in this range. For example, H-trees require careful balancing and are difficult to implement for multi-gigaHertz operation, even in submicron CMOS processes. Other systems, such as asynchronous distribution [1] and distributed amplifiers [2], provide a sinusoidal clock, making fast edge rates difficult to achieve. The rotary clock distribution architecture presented here provides low-skew, low-jitter, gigaHertz-rate clocking with high edge rates and low power consumption, works over a wide power supply range, and is completely scalable. The frequency is limited only by the f_T of the integrated circuit technology used; an f_T of approximately 30GHz produces square waves with 20ps transition times. In addition, there is no limit to the size of the chip that can be clocked, and both multiphase and non-overlapping noise-immune differential clocking are supported.

The basic architecture is shown in Figure 25.5.1. This is a layout of a 2.5GHz rotary clock with 25 interconnected rings. Each ring consists of a differential line driven by shunt-connected anti-parallel inverters, which are distributed around the ring. This arrangement produces a clock wave that rotates around the ring at a rate that depends primarily on the electrical length of the ring. Rotation is locked and amplitude is maintained by the switching transistors, in spite of conductor losses.

Unlike in a ring oscillator, the energy that goes into charging and discharging the inverter inputs becomes transmission-line energy, which is recirculated in the closed electromagnetic path. This provides significant power savings, as losses are due only to I^2R dissipation in the wires and not to CV^2-related dissipation. The power savings are further enhanced when copper metallization is used.

Figure 25.5.2 illustrates the theory behind the rotary clock architecture. Figure 25.5.2a shows an open loop of differential conductors connected to a battery through an ideal switch. When the switch is closed, a voltage wave begins to travel counter-clockwise around the loop. Figure 25.5.2b shows a similar loop, with the voltage source replaced by a cross-connection of the inner and outer conductors. If there are no losses, a wave travelling on this ring will continue indefinitely, providing a full clock cycle every other round trip of the edge. The inversion occurs at the crossover. To overcome losses and provide a start-up signal, at least one anti-parallel inverter pair is required. Power supply ramp up or any other noise event initiates start-up of the rotary wave. Once the wave is established it takes little power to sustain it. Also, since there is exactly 180° phase shift for each rotation around the ring, the relative phase and therefore clock skew at any point on the ring is well known.
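The phase relationship described above can be sketched numerically. The following is a minimal illustration, not the authors' tooling: the function names and the `other_conductor` flag are hypothetical, and it assumes an ideal lossless ring in which the edge advances 180° per lap and the second conductor of the differential pair carries the complementary (180° shifted) signal.

```python
def clock_phase_deg(distance, ring_length, other_conductor=False):
    """Clock phase in degrees at a tap on an ideal rotary ring.

    distance        -- path length travelled by the edge from the reference tap
    ring_length     -- length of one lap (same units as distance)
    other_conductor -- hypothetical flag: tap the complementary conductor
                       of the differential pair at the same location
    """
    if ring_length <= 0:
        raise ValueError("ring_length must be positive")
    # One lap corresponds to 180 degrees, so a full 360-degree clock
    # cycle takes two laps of the edge.
    phase = 180.0 * distance / ring_length
    if other_conductor:
        # Assumption: the differential partner is exactly inverted.
        phase += 180.0
    return phase % 360.0


def skew_seconds(phase_a_deg, phase_b_deg, clock_freq_hz):
    """Time offset of tap B relative to tap A for a given clock frequency."""
    diff = (phase_b_deg - phase_a_deg) % 360.0
    return diff / 360.0 / clock_freq_hz
```

Under these assumptions, a tap halfway around the lap sits at 90°, and a tap one full lap further along is equivalent to the complementary conductor at the starting point.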

Interconnected rings, as in Figure 25.5.1a, must run in lock step. This ensures that the same signal appears on each ring and that the relative phase at all points on all the rings is well known. Thus by choosing the correct pick-off point on each ring, it is possible to use a large array of interconnected rings to distribute a clock signal over an arbitrarily large die area with minimal clock skew. For example, referring to Figure 25.5.1a, all the points marked with the equals sign (=) have the same relative phase. By choosing a pick-off point that is diametrically opposite to a given pick-off point, it is possible to obtain the opposite phase, and in principle an arbitrary number of phases can be extracted.

The rotary clock is modelled as short lengths of transmission line between inverter pairs, which present substantial capacitive loading. Figure 25.5.3a shows the transmission line model consisting of the L_{m,in} inductance and the C_{m,lin} line-to-line capacitance surrounding the inverters. Figure 25.5.3b shows the full model of the transmission line element with all of the transistor capacitances broken out. Given that L and C are the inductance and capacitance per unit length of the differential line, C_i is the total input capacitance of each inverter, and that there are N inverters per unit length around the ring, the effective parameters describing the loaded ring are: C_{eff} = C + N C_i; Z_{eff} = \sqrt{L/C_{eff}}; v_p = 1/\sqrt{L C_{eff}}. Thus the clock frequency is approximately f_c = v_p/(2\ell), where \ell is the length of the ring; the factor of 2 arises because a full clock cycle requires two round trips of the edge. Nominal clock frequency is selected by varying L and C, which can be accomplished by adjusting the dimensions of the lines and by adding gate-channel capacitances along the lines.
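As a rough illustration of the loaded-line model, the sketch below evaluates the effective capacitance, impedance, phase velocity and clock frequency. It assumes the standard periodically loaded line relations (inverter capacitance adds to the line capacitance per unit length, and one clock cycle takes two laps). The function name is hypothetical, and the L, C and C_i values in the usage lines are placeholders chosen only for illustration, not figures from the paper; only the ring length and inverter count come from the prototype description.

```python
import math


def loaded_ring(l_per_m, c_per_m, c_inv, n_per_m, ring_length_m):
    """Effective parameters of a capacitively loaded rotary ring.

    l_per_m       -- line inductance per unit length, L (H/m)
    c_per_m       -- line capacitance per unit length, C (F/m)
    c_inv         -- input capacitance of each inverter, C_i (F)
    n_per_m       -- inverters per unit length, N (1/m)
    ring_length_m -- length of one lap of the ring (m)
    """
    c_eff = c_per_m + n_per_m * c_inv        # loaded capacitance per unit length
    z_eff = math.sqrt(l_per_m / c_eff)       # loaded characteristic impedance
    v_p = 1.0 / math.sqrt(l_per_m * c_eff)   # loaded phase velocity
    f_c = v_p / (2.0 * ring_length_m)        # two laps per clock cycle
    return c_eff, z_eff, v_p, f_c


if __name__ == "__main__":
    # Ring length and inverter count follow the prototype's large ring
    # (12800 um, 128 inverter pairs); the electrical values are
    # placeholders, NOT measured or published data.
    length = 12800e-6
    n = 128 / length
    c_eff, z_eff, v_p, f_c = loaded_ring(4e-7, 1e-10, 5e-14, n, length)
    print(f"Z_eff = {z_eff:.1f} ohm, f_c = {f_c / 1e9:.2f} GHz")
```

The sketch makes the design trade-off visible: adding inverter capacitance lowers both the impedance and the frequency, so the same ring geometry can be retuned through loading alone.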

Figure 25.5.4 shows a die micrograph of a prototype built using a 0.25µm 2.5V CMOS process with 1µm AlCu. The prototype features a large ring that is completely independent of five interconnected smaller rings. The 12800µm outer ring uses 66µm conductors on a 128µm pitch, with 128 62µm/25µm inverter pairs distributed along its length. Interconnect segments are modeled using a 20-pole equivalent LR matrix generated using FASTHENRY [3]. Inverters are modeled using BSIM3v3 non-quasi-static transistor models. Simulations predict a clock frequency of approximately 925MHz. Measurements of the actual performance of the large ring with Vdd=2.5V vs. simulation results are shown in Figure 25.5.5. The measured oscillation frequency is 965MHz. Jitter is measured at 5.5ps rms using a Tektronix 11801A oscilloscope with an SD-26 sampling head. Figure 25.5.6 shows that the oscillation frequency is flat over a wide Vdd range and that total chip power consumption is low. Clock generator simultaneous switching transients are eliminated by the distributed switching times of each inverter, allowing operation with just 10pF of on-chip capacitance and no off-chip decoupling while driving multiple 10Ω impedance lines. Figure 25.5.7 shows the measured waveform on one of the smaller rings, which is not yet fully characterized. Its oscillation frequency is 3.38GHz vs. a simulated frequency of 3.42GHz.
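The agreement between simulation and measurement can be put in relative terms with a few lines of arithmetic. This is only a convenience sketch using the frequencies and jitter quoted above; the helper names are hypothetical.

```python
def percent_error(simulated, measured):
    """Simulation error relative to the measured value, in percent."""
    return 100.0 * (simulated - measured) / measured


def jitter_fraction(jitter_rms_s, clock_freq_hz):
    """RMS jitter expressed as a fraction of the clock period."""
    return jitter_rms_s * clock_freq_hz


# Values quoted in the text for the large ring and one smaller ring.
large_ring_err = percent_error(925e6, 965e6)
small_ring_err = percent_error(3.42e9, 3.38e9)
large_ring_jitter = jitter_fraction(5.5e-12, 965e6)
```

With the quoted numbers, simulation underestimates the large-ring frequency by roughly 4% and overestimates the smaller ring by roughly 1%, while the 5.5ps rms jitter is about 0.5% of the large ring's clock period.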

Acknowledgements:
This work was supported by Multigig Corp. Ltd., and partially supported by the NSF under award EIA-31332.

References:
Figure 25.5.1: Basic rotary clock architecture. The equals signs (=) denote points with equivalent phase.

Figure 25.5.2: Genesis of the wave on the ring.

Figure 25.5.3: Development of the rotary clock model. (a) Shows the macromodel of a transmission line segment. (b) Shows capacitances of the transistors broken out.

Figure 25.5.4: Die micrograph.

Figure 25.5.5: Measurement vs. simulation of large ring.

Figure 25.5.6: Clock frequency vs. Vdd for the large ring and Idd vs. Vdd for the entire chip with all six rings.

Figure 25.5.7: Measured output on the 3.42612GHz ring.