
  1. (Department of Electronic Engineering, Kwangwoon University, 615, Bima, 20, Gwangun-ro, Nowon-gu, Seoul 139-701, Korea)



CDR, CMOS, data rate, HDMI, high-speed, integrated circuit, IO, loop, majority vote

I. INTRODUCTION

To meet the demand for high per-pin data rates in chip-to-chip communication, clock and data recovery (CDR) circuits generate a clock whose timing is aligned to incoming random data with a small unit interval (UI). CDR loops have traditionally been designed with charge pump-based loop filters and analog voltage-controlled oscillators (VCOs) (1-3). Unless the supply node is designed carefully, the control voltage can couple to the supply and be converted into phase noise at the VCO output. Digital-friendly circuits mitigate process-voltage-temperature variation and the effect of supply noise, and they improve portability to newer processes as well (4). The digital CDRs in (5-7), which operate with digital loop filters (DLFs), offer good portability. IO schemes for many applications, such as DP, HDMI and PCI Express, must support multi-channel timing recovery in receivers where the frequency offsets between channels are zero. Implementing one phase interpolator (PI) per channel while sharing a single oscillator saves power and area (8,9). A PI-based CDR should be able to track instantaneous frequency offsets between the incoming data and the recovered clock so that each channel maintains resilient timing alignment. The integral path gain of a CDR with a bang-bang phase detector (PD) determines the frequency catch-up speed (10). In this paper, we propose a 6 Gbit/s PI-based all-digital CDR that supports multi-channel implementation for the HDMI 2.0 standard. The proposed majority voting logic updates the digital loop filter without wasting deserialized edge information, and the PI control signal moves smoothly. As a result, low jitter is measured when the loop locks. Because the voted data use two's complement format, counter blocks that only accumulate the votes are unnecessary. A binary-shifting gain extender at the end of the integration accumulators reduces the adder size and provides a wide gain control range, whereas previous schemes accumulate the data using counters (11,12). At the circuit level, we introduce a new slew rate control approach for the PI to improve the differential nonlinearity (DNL) of the output clock phases. Better linearity between the digital control input of the PI and the output phase steps contributes to a stable loop gain. Gray mapping removes abrupt code transitions.

The rest of the paper is organized as follows. Section II reviews the overall architecture of the proposed PI-based digital CDR. Section III presents the majority voting logic used for the deserialized data, together with our DLF scheme, the gain range extending structure, the circuit diagram of our PI and the slew rate control scheme. Section IV shows the measurement results of our IP and Section V concludes this paper.

II. ARCHITECTURE

Fig. 1(a) presents the proposed 6 Gbit/s majority voting-assisted CDR architecture. As shown in Fig. 1(b), three half-rate strong-arm latches, triggered by 3 GHz clocks at 0°, 180° and 90°, sample the 3 Gbit/s ODD, EVEN and EDGE data, respectively. The early/late information is updated from the EDGE data when there is a transition in the DATAIN signal. In that case, either the ODD or the EVEN data is sampled as 1, and thus the XOR of these two signals is one. The 3:24 de-serializers transform the 3×3 Gbit/s data into 24×375 Mbit/s parallel data, which reduces the power consumption and eases the timing alignment difficulty of high-speed operation. The low-speed parallel CDR logic generates 8 EARLY and 8 LATE bits from the previously sampled and deserialized ODD, EVEN and EDGE data. Integrating EARLY [7:0] and LATE [7:0] without losing any early/late information would require 8 independent DLFs, which would increase power and chip area considerably. The majority voter instead reduces the parallel early/late data to 1-bit early/late data, so that the effect of the parallel early/late data from the parallel CDR logic is statistically reflected in the 1-bit early/late signal (11,12). In the DLF, the programmable gains KP and KI control the closed-loop bandwidth and jitter tolerance of the CDR. The 5 least significant bits (LSBs) of the 7-bit binary DLF output are encoded in thermometer format by the binary-to-thermometer (B2T) block. The phase interpolators rotate the recovered clock phases from 0° to 360° for optimal timing alignment. In front of the PI, slew control blocks set the slew rate of the clock signals coming from the oscillator, which significantly improves the linearity of the phase shifting steps. For real-time measurement, the 7-bit DAC monitors the DLF output in analog form. The PRBS checker with a BER calculator confirms whether the received data match the transmitted data.
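The transition detection and early/late decision described above can be sketched behaviourally as follows. This is a minimal Python model rather than the authors' logic: the function name `parallel_cdr_logic`, the assumption that EDGE[n] is sampled between ODD[n] and EVEN[n], and the polarity convention (an EDGE sample equal to the older ODD bit means the clock is early) are illustrative assumptions.

```python
def parallel_cdr_logic(odd, even, edge):
    """Derive 8 EARLY and 8 LATE flags from deserialized ODD/EVEN/EDGE bits.

    Each list holds the 8 bits that one sampler contributes to a 375 MHz word.
    EDGE[n] is assumed to be sampled between ODD[n] and EVEN[n].
    """
    early, late = [], []
    for d_odd, d_even, d_edge in zip(odd, even, edge):
        transition = d_odd ^ d_even  # XOR is 1 only when the data toggles
        # Assumed polarity: edge sample still equal to the older bit -> clock early.
        early.append(transition & (1 if d_edge == d_odd else 0))
        # Edge sample already equal to the newer bit -> clock late.
        late.append(transition & (1 if d_edge == d_even else 0))
    return early, late


# Rate bookkeeping from the text: 6 Gbit/s -> half-rate samplers -> 1:8 per stream.
LINE_RATE = 6e9
SAMPLER_RATE = LINE_RATE / 2      # 3 Gbit/s per sampler
PARALLEL_RATE = SAMPLER_RATE / 8  # 375 Mbit/s per deserialized bit
```

Without a data transition, both flags stay at 0, so runs of identical bits contribute no phase information to the loop.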

Fig. 1. (a) Architecture of the proposed PI-based CDR, (b) Illustration of the waveforms for 3 half-rate sampler latches at the front-end.

../../Resources/ieie/JSTS.2021.21.3.199/fig1.png

III. CIRCUIT DESCRIPTION

1. Voting Logics

Fig. 2(a) shows the logic circuit that decides the updating sign and Fig. 2(b) shows the majority voting logic. The EARLY [n] and LATE [n] signals are transformed into a voting number (+1: UP / 0: HOLD / -1: DOWN) by the Sign [n] and Mag [n] signals, where n is an integer from 0 to 7. When EARLY [n] and LATE [n] are identical, in which case the clock and data signals are aligned, the output Sign [n] Mag [n] = 00 and no vote is contributed to the Sum [4:0] signal. In the majority voting logic, the eight voting numbers (+1 / 0 / -1) are summed and the result spans the range from +8 to -8, which requires the Sum signal to have a 5-bit two's complement representation. Sums of +1 to +8, 0, and -1 to -8 set VEARLY, VLATE to [10], [00] and [01], respectively. When the CDR loop locks and Sum [4:0] stays at [00000], no update is made at the input of the following DLF. The EARLY [7:0] and LATE [7:0] signals are the recovered deserialized data [15:0] in our half-rate scheme. The built-in PRBS checker can measure BER down to the 2^-40 ≈ 10^-12 level (13).
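The voting procedure can be illustrated with a short behavioural model. The sketch below is written under stated assumptions: the function name `majority_vote` is hypothetical, and the polarity of Sign [n] (1 marks a -1 vote) is assumed, since the text only specifies that identical EARLY/LATE bits give Sign Mag = 00.

```python
def majority_vote(early, late):
    """Reduce 8 EARLY/LATE pairs to a signed sum and a 1-bit VEARLY/VLATE pair."""
    votes = []
    for e, l in zip(early, late):
        mag = e ^ l      # Mag[n] = 0 when EARLY[n] == LATE[n] (no vote)
        sign = l & mag   # assumed polarity: Sign[n] = 1 marks a -1 (DOWN) vote
        votes.append(-mag if sign else mag)

    total = sum(votes)                      # ranges from -8 to +8
    sum_bits = format(total & 0x1F, "05b")  # Sum[4:0] in 5-bit two's complement
    vearly = 1 if total > 0 else 0
    vlate = 1 if total < 0 else 0
    return total, sum_bits, (vearly, vlate)
```

Five bits are needed because a 4-bit two's complement word covers only -8 to +7 and cannot represent the +8 case in which all eight votes agree.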

Fig. 2. Majority voting logic (a) binary mapping, (b) logic diagram.

../../Resources/ieie/JSTS.2021.21.3.199/fig2.png

2. Digital Loop Filter Scheme

Fig. 3 presents our DLF circuit schematic. The VEARLY and VLATE bits, mapped into two's complement form, are scaled by the proportional gain (KP) and the integral gain (KI), and both gains are set via binary shifting as shown in Fig. 3. Expanding the gain range of KP and KI provides a wide choice of catch-up speeds for both phase offset and frequency offset in the presence of a non-linear bang-bang PD (14). Increasing KI enables the loop to track a large frequency offset, but it also widens the closed-loop bandwidth of the CDR and weakens the suppression of input jitter. The gain extender, located at a suitable position along the path, performs additional binary shifting to increase the KI range without concentrating large adders in the first accumulator alone. If the data and the recovered clock have a frequency offset, the amount of shifting in the gain extender also controls the follow-up speed. When the input has a low frequency offset and large jitter, the loop bandwidth is reduced by decreasing KI. As in the schemes shown in (15,16), the control signal of the PI (PI_CODE [6:0]) is monitored via the 7-bit DAC, and the control patterns are sent to the measurement equipment in analog form through an on-chip pad.
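The role of the shift-based gains can be sketched with a generic second-order digital loop filter. The model below is not a transcription of Fig. 3: the class name, the word widths (`frac_bits`), the placement of the gain extender after the integral accumulator, and the use of left shifts are all assumptions chosen only to show how binary shifting scales the proportional and integral paths.

```python
class DigitalLoopFilter:
    """Behavioural proportional-integral loop filter with shift-based gains."""

    def __init__(self, kp_shift, ki_shift, ext_shift=0, frac_bits=8):
        self.kp_shift = kp_shift    # proportional gain = 2**kp_shift
        self.ki_shift = ki_shift    # integral gain = 2**ki_shift
        self.ext_shift = ext_shift  # extra shift applied by the assumed gain extender
        self.frac_bits = frac_bits  # fractional bits below the 7-bit PI code
        self.freq_acc = 0           # integral (frequency) accumulator
        self.phase_acc = 0          # phase accumulator driving the PI

    def update(self, vote):
        """Apply one voted input (+1, 0 or -1) and return the 7-bit PI code."""
        self.freq_acc += vote << self.ki_shift
        integral = self.freq_acc << self.ext_shift
        self.phase_acc += (vote << self.kp_shift) + integral
        # The PI rotates through 360 degrees, so the phase word wraps modulo 128 codes.
        self.phase_acc %= 128 << self.frac_bits
        return self.phase_acc >> self.frac_bits


dlf = DigitalLoopFilter(kp_shift=6, ki_shift=0)
codes = [dlf.update(+1) for _ in range(4)]  # repeated UP votes advance the code
```

Raising `ki_shift` or `ext_shift` in this model lets the phase word ramp faster under a constant vote, which mirrors the trade-off described above between frequency catch-up speed and loop bandwidth.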

Fig. 3. Digital loop filter (DLF) circuit diagram and illustration of gain control procedure.

../../Resources/ieie/JSTS.2021.21.3.199/fig3.png

3. Phase Interpolator and Slew Rate Control

Fig. 4(a) shows the circuit schematic of our 7-bit PI, which recovers the required timing by interpolating between quadrature input clocks from the 3 GHz oscillator. A current-steering PI can provide a highly linear phase shift (17,18). The resolution of a phase interpolator depends on the number of bits in PI_CODE [6:0], and increasing the number of bits increases the circuit complexity. As illustrated in Fig. 4(b), our PI has a resolution of 2.8 degree/LSB (360°/128). To reduce glitches, QUAD [1:0] is generated from the 2 MSBs of PI_CODE [6:0] using Gray mapping. The remaining bits, PI_CODE [4:0], are binary LSB codes that are transformed into a thermometer code THERM [30:0] for fine DAC switching (IODD, IEVEN). Fig. 4(c) shows a block diagram of the overall PI circuits for the half-rate CDR. The 3 GHz quadrature clock signals for phase interpolation are provided by an oscillator. The slew control blocks improve the linearity by increasing the slew rate, at the cost of additional power. The half-rate CDR uses CLK 0° / CLK 180° for ODD/EVEN data sampling and CLK 90° for EDGE data sampling. Since one PI can generate only two clock phases (0°/180° or 90°/270°), two PIs must operate concurrently to generate all the required recovered clocks (0°, 180°, 90°). For initial CDR lock, CLK 0° and CLK 90° move simultaneously, and the 90° phase difference between 0°/180° and 90°/270° is maintained by adding 32 to the code in the PI mapping block at the front.
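The code mapping can be made concrete with a small sketch. It assumes a standard binary-reflected Gray code for QUAD [1:0] and a thermometer code whose number of ones equals PI_CODE [4:0]; the function names are hypothetical and the ideal phase ignores any circuit nonlinearity.

```python
LSB_DEG = 360 / 128  # 2.8125 degrees per PI_CODE step


def decode_pi_code(pi_code):
    """Split a 7-bit PI code into Gray-mapped quadrant, thermometer bits and ideal phase."""
    pi_code &= 0x7F
    quad = pi_code >> 5           # 2 MSBs select the quadrant
    quad_gray = quad ^ (quad >> 1)  # assumed binary-reflected Gray mapping
    fine = pi_code & 0x1F         # 5 LSBs select the step inside the quadrant
    therm = [1 if i < fine else 0 for i in range(31)]  # THERM[30:0]
    return quad_gray, therm, pi_code * LSB_DEG


def edge_pi_code(pi_code):
    """Code for the 90/270-degree PI: offset by 32 codes, wrapping at 128."""
    return (pi_code + 32) & 0x7F
```

Because 32 × 2.8125° = 90°, adding 32 to the code keeps the EDGE clock in quadrature with the data clock wherever the loop settles. With Gray mapping, only one QUAD bit changes at each quadrant boundary (00, 01, 11, 10).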

Fig. 4. (a) Circuit schematic of 7-bit phase interpolator, (b) Quadrature mapping of phase interpolator, (c) Description of a scheme for 2 phase interpolators for 0°, 180° and 90° clock timing recovery.

../../Resources/ieie/JSTS.2021.21.3.199/fig4.png

Fig. 5(a) presents the architecture of the 2 slew rate control blocks for the CLK 0°/180° PI and the CLK 90°/270° PI. So that the slew rate control blocks see equivalent loading and timing delays, the clock signals CLK 0°/90°/180°/270° drive equally distributed gate loads at the PI inputs. The PI control code and the shifted phase at the PI output have a non-linear relation due to the non-linear characteristics of the devices and signals. Maintaining good linearity over the whole range of the control code results in a constant loop gain and stabilizes the loop transfer function. As shown in Fig. 5(b), 2-bit slew rate control blocks have been designed and placed at the input of the PI block. The slew rate is controlled by turning the segment units on and off. When a segment unit is disabled, the current mirror is switched off by S1 and the loads by S2 and S3. As the number of enabled units increases, gm grows, and thus the output clock signal makes sharper transitions. The graph in Fig. 5(c) shows the simulated DNL of the shifted output phase, in LSB units, versus the mapped PI_CODE [6:0] for the various slew options, where the code index N ranges from 0 to 127. As SLEW_THERM [3:0] increases from 0001 to 1111, the standard deviation of the DNL improves to 0.95, 0.86, 0.75 and 0.54 LSB, respectively. Using a fast slew for the incoming quadrature clocks improves the linearity. However, the power increases with the number of enabled units. In our CDR, each segment unit consumes 0.285 mW. To achieve a DNL standard deviation of 0.86 LSB at about 0.6 mW, we enable 2 of the 4 segment units in the slew rate control blocks; since the block input SLEW_THERM [3:0] is thermometer coded, this corresponds to two bits being set.
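The trade-off between linearity and power can be tabulated directly from the numbers quoted above. In this sketch the DNL standard deviations and the per-unit power are taken from the text, while the helper names and the conventional DNL definition (actual step divided by ideal step, minus one) are assumptions.

```python
LSB_DEG = 360 / 128
UNIT_POWER_MW = 0.285                              # power per enabled segment unit
DNL_STD_LSB = {1: 0.95, 2: 0.86, 3: 0.75, 4: 0.54}  # simulated values quoted in the text


def dnl_lsb(phases_deg, lsb_deg=LSB_DEG):
    """DNL of each phase step in LSB: (actual step / ideal step) - 1."""
    return [(b - a) / lsb_deg - 1 for a, b in zip(phases_deg, phases_deg[1:])]


def slew_options():
    """List (enabled units, SLEW_THERM[3:0], power in mW, DNL std in LSB)."""
    options = []
    for units in range(1, 5):
        therm = format((1 << units) - 1, "04b")  # 0001, 0011, 0111, 1111
        power = round(units * UNIT_POWER_MW, 3)
        options.append((units, therm, power, DNL_STD_LSB[units]))
    return options


for row in slew_options():
    print(row)
```

With two units enabled the model gives 0.57 mW, which matches the roughly 0.6 mW operating point selected in the text.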

Fig. 5. (a) Circuit description of 2 slew rate control blocks and CLK 0°/180° PI, CLK 90°/270° PI, (b) Schematic for slew rate control, (c) DNL simulation results of phase shifting linearity.

../../Resources/ieie/JSTS.2021.21.3.199/fig5.png

IV. MEASUREMENT RESULTS

Fig. 1(a) shows the measurement nodes of our proposed CDR. To test our CDR, a 6 Gbit/s 2^31-1 PRBS NRZ signal with 2.05 ps RMS jitter and 1 Vdpp swing is generated at the input by a Synthesis Research BERT 7500B, as shown in Fig. 6(a). Fig. 6(b) presents the recovered clock jitter when the loop is locked, measured with a Tektronix TDS 6154C oscilloscope. The measured peak-to-peak and RMS jitter of the recovered clock (divided by 16) are 12.2 ps and 1.826 ps, respectively. The phase noise of the recovered clock, measured with an HP E4401B, is -114.72 dBc/Hz at 1 MHz, as shown in Fig. 6(c). The lock pattern of the digital PI control input is measured at the 7-bit DAC output through an on-chip pad while the loop is initially turned on and finds the lock position, as shown in Fig. 6(d). The measured lock time of the loop is 54.5 ns. The built-in PRBS checker shows a BER below 10^-12 at 6 Gbit/s at the centre of the data eye. Fig. 7 shows the die photograph of the proposed PI-based all-digital CDR. The prototype has been fabricated in a 65 nm CMOS process and occupies 0.073 mm^2 of chip area (excluding the PRBS checker). Table 1 summarizes the measurement results of the proposed CDR and compares them with prior art. (3) presents a CDR scheme with a charge pump-based loop filter that uses an analog control voltage; its chip area is comparatively large in order to reduce the ripple on the control voltage. (6,8) present digital-type CDRs with digital loop filters, and (8) generates the clock source from an analog charge pump PLL with a ring VCO. Our CDR is digital filter-based and aligns the linearized PI output phase to the data timing with low jitter, assisted by the majority voting logic. The proposed CDR consumes 17.4 mW at 6 Gbit/s and shows the best jitter performance among the compared works.
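A few of the reported figures can be put in context with simple arithmetic. The sketch below uses only the measured values quoted above; expressing the jitter in UI and estimating the time needed to observe 2^40 bits are illustrative additions, and the assumption that the checker counts 2^40 bits is inferred from the quoted 2^-40 level rather than stated by the authors.

```python
DATA_RATE_BPS = 6e9
POWER_MW = 17.4
PP_JITTER_PS = 12.2
RMS_JITTER_PS = 1.826

ui_ps = 1e12 / DATA_RATE_BPS               # one unit interval: about 166.7 ps
pp_jitter_ui = PP_JITTER_PS / ui_ps        # about 0.073 UI
rms_jitter_ui = RMS_JITTER_PS / ui_ps      # about 0.011 UI
fom_mw_per_gbps = POWER_MW / (DATA_RATE_BPS / 1e9)  # 2.9 mW/Gbit/s

ber_floor = 2.0 ** -40                     # about 9.1e-13, i.e. the 10^-12 level
seconds_for_2_40_bits = 2 ** 40 / DATA_RATE_BPS  # about 183 s at 6 Gbit/s

print(f"UI = {ui_ps:.1f} ps, p-p jitter = {pp_jitter_ui:.3f} UI, "
      f"RMS jitter = {rms_jitter_ui:.3f} UI, FoM = {fom_mw_per_gbps:.1f} mW/Gbit/s")
```

The figure of merit computed this way agrees with the 2.9 mW/Gbit/s entry for this work in Table 1.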

Fig. 6. (a) 6 Gbit/s input NRZ signal used for the measurement, (b) Recovered output clock jitter during steady state, (c) Measured output phase noise of the recovered clock, (d) Measured lock time of the CDR loop.

../../Resources/ieie/JSTS.2021.21.3.199/fig6.png

Fig. 7. Die photograph of the proposed CDR.

../../Resources/ieie/JSTS.2021.21.3.199/fig7.png

Table 1. Performance comparison table

| | (3) | (6) | (8) | This work |
|---|---|---|---|---|
| Technology | 65 nm CMOS | 130 nm CMOS | 180 nm CMOS | 65 nm CMOS |
| Architecture | Analog filter-based CDR | Digital filter-based CDR | Digital filter-based CDR | Digital filter-based CDR |
| Supply (V) | 1.0 | 1.2 | 1.4 | 1.0 |
| Data rate (Gbit/s) | 0.75-3.0 | 1.0-4.0 | 0.2-4.0 | 6.0 |
| Peak-to-peak jitter (ps) | 37.2 (@ 3.0 Gbit/s) | 29.2 (@ 3.0 Gbit/s) | 115.1 (@ 2.0 Gbit/s) | 12.2 (@ 6 Gbit/s) |
| RMS jitter (ps) | 5.69 (@ 3.0 Gbit/s) | 3.58 (@ 3.0 Gbit/s) | 28 (@ 2.0 Gbit/s) | 1.83 (@ 6 Gbit/s) |
| BER | < 10^-12 | < 10^-14 | < 10^-12 | < 10^-12 |
| Power (mW) | 15.5 | 11.4 | 14 | 17.4 |
| FoM (mW/Gbit/s) | 5.1 | 3.8 | 7 | 2.9 |
| Area (mm^2) | 0.35 | 0.074 | 0.8 | 0.073 |

V. CONCLUSIONS

A half-rate PI-based all-digital CDR has been proposed. The segmented slew rate scheme improves the linearity of the phase steps. The proposed CDR is designed with an all-digital scheme and can be ported to other processes with reduced engineering effort. Our CDR consumes 17.4 mW from a 1.0 V supply at 6 Gbit/s. The prototype CDR occupies 0.073 mm^2 of chip area and has been fabricated in a 65 nm CMOS process.

ACKNOWLEDGMENTS

The work reported in this paper was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2020R1F1A1057497), and the present research was conducted under the Excellent Researcher Support Project of Kwangwoon University in 2021. The EDA tool was supported by the IC Design Education Center.

REFERENCES

1. Shivnaraine R., July 2014, An 8-11 Gb/s Reference-Less Bang-Bang CDR Enabled by "Phase Reset", IEEE Trans. on Circuits and Systems-I, Vol. 61, pp. 2129-2138
2. Kiaei A., Sep. 2009, A 10 Gb/s NRZ receiver with feedforward equalizer and glitch-free phase-frequency detector, Proceedings of European Solid-State Circuits Conference, pp. 372-375
3. Jin J., Oct. 2018, A 0.75-3.0-Gb/s Dual-Mode Temperature-Tolerant Referenceless CDR With a Deadzone-Compensated Frequency Detector, IEEE J. Solid-State Circuits, Vol. 53, pp. 2994-3003
4. Elshazly A., A 0.4-to-3 GHz Digital PLL With PVT Insensitive Supply Noise Cancellation Using Deterministic Background Calibration, IEEE J. Solid-State Circuits, Vol. 46, No. 12, pp. 2759-2771
5. Shu G., Feb. 2014, A 4-to-10.5 Gb/s 2.2 mW/Gb/s continuous rate digital CDR with automatic frequency acquisition in 65nm CMOS, in IEEE Int. Solid-State Circuits Conf. Tech. Dig., San Francisco, CA, pp. 150-151
6. Song H., Oct. 2010, A 1.0-4.0-Gb/s all-digital CDR with 1.0-ps resolution DCO and adaptive proportional gain control, IEEE J. Solid-State Circuits, Vol. 46, pp. 424-434
7. Sonntag J.L., Stonick J., July 2006, A Digital Clock and Data Recovery Architecture for Multi-Gigabit/s Binary Links, IEEE J. Solid-State Circuits, Vol. 41, pp. 1867-1875
8. Hanumolu P. K., Wei G., Moon U., Jan. 2008, A Wide-Tracking Range Clock and Data Recovery Circuit, IEEE J. Solid-State Circuits, Vol. 43, pp. 425-439
9. Kromer C., Nov. 2006, A 25-Gb/s CDR in 90-nm CMOS for High-Density Interconnects, IEEE J. Solid-State Circuits, Vol. 41, pp. 2921-2929
10. Yin W., A 0.7-to-3.5 GHz 0.6-to-2.8 mW Highly Digital Phase-Locked Loop With Bandwidth Tracking, IEEE J. Solid-State Circuits, Vol. 46, pp. 1870-1880
11. Bueren R., Holzer D., Schmatz M., Nov. 2008, 5.75 to 44 Gb/s quarter rate CDR with data rate selection in 90nm bulk CMOS, European Solid-State Circuits Conference, Edinburgh, pp. 166-169
12. Chen M., Dec. 2011, A Fully-Integrated 40-Gb/s Transceiver in 65-nm CMOS Technology, IEEE J. Solid-State Circuits, Vol. 47, pp. 627-640
13. Piplani S., Nov. 2017, Test and Debug Strategy for High Speed JESD204B Rx PHY, IEEE 26th Asian Test Symp., Taipei, pp. 184-188
14. Yin W., Sept. 2010, A 1.6mW 1.6ps-rms-jitter 2.5GHz digital PLL with 0.7-to-3.5GHz frequency range in 90nm CMOS, in IEEE Custom Integrated Circuits Conference 2010 San Francisco
15. Tokonami K., Kohira K., Ishikuro H., Aug. 2015, Wave monitor for glitch detection and skew adjusting in high-speed DAC, IEEE Int. Symp. on Radio Frequency Integration Technology, Sendai, pp. 175-177
16. Huang S., Cao J., Green M. M., Feb. 2014, An 8.2-to-10.3 Gb/s Full-Rate Linear Reference-less CDR Without Frequency Detector in 0.18 μm CMOS, Int. Solid-State Circuits Conf. Tech. Dig., San Francisco, CA, pp. 152-153
17. Francese P. A., Aug. 2014, A 16 Gb/s 3.7 mW/Gb/s 8-Tap DFE Receiver and Baud-Rate CDR With 31 kppm Tracking Bandwidth, IEEE J. Solid-State Circuits, Vol. 49, pp. 2490-2502
18. Gangasani G. R., July 2012, A 16-Gb/s Backplane Transceiver With 12-Tap Current Integrating DFE and Dynamic Adaptation of Voltage Offset and Timing Drifts in 45-nm SOI CMOS Technology, IEEE J. Solid-State Circuits, Vol. 47, pp. 1828-1841

Author

Kyunghwan Min
../../Resources/ieie/JSTS.2021.21.3.199/au1.png

Kyunghwan Min received the Bachelor of Science (B.S.) degree in 2017. His research focuses on high-speed wireline interface circuit design. During his M.S. studies, he has been researching clock generation circuits such as phase-locked loops, clock and data recovery schemes and IO interface transceivers related to the HDMI standard.

Sanggeun Lee
../../Resources/ieie/JSTS.2021.21.3.199/au2.png

Sanggeun Lee received the B.S. degree from the Department of Electronic Engineering, Kwangwoon University, Korea, in 2020. He is currently pursuing the M.S. degree at Kwangwoon University, Korea. His research interests include PLLs, clock recovery and high-speed IO circuits.

Taehyoun Oh
../../Resources/ieie/JSTS.2021.21.3.199/au3.png

Taehyoun Oh (S’05) received the Bachelor of Science (B.S.) and Master of Science (M.S.) degrees in Electrical Engineering from Seoul National University in 2005 and 2007, respectively. He received his Ph.D. degree in Electrical Engineering from the University of Minnesota, Minneapolis, under the supervision of Dr. Ramesh Harjani. His doctoral research focused on high-speed I/O circuits and architectures. During the summer of 2010, he worked on I/O channel modeling at AMD Boston Design Center, MA. In the fall semester of 2011, he researched I/O architecture and jitter budgeting of the link at Intel Corp., CA. In the fall of 2012, he joined the IBM system technology group, NY, and worked on performance verification of high-speed decision feedback equalizers for server processors. In the spring of 2013, he joined the Department of Electronic Engineering at Kwangwoon University in Seoul, Korea, as an assistant professor. His current research interest is focused on clock generation and high-speed interface IC design.