

I. INTRODUCTION

  Analog-to-digital converters (ADCs) are widely used in IoT sensors and communication systems [1]-[3]. In particular, ADCs operating at tens of GS/s are essential for PAM4-based wireline and 5G wireless communication systems [4]-[7]. However, because the digital logic at the back end of an ADC typically operates at only a few GHz, a single-channel ADC is limited in its ability to reach sampling rates beyond the gigahertz range. Therefore, time-interleaved (TI) ADC architectures, which use multiple ADCs in parallel and merge their outputs to achieve higher sampling rates, are often used [8]-[10].
Fig. 1 shows a conventional architecture using decimation, assuming N channels and a sampling frequency of fs [11]-[14]. Each sub-ADC samples at a rate of fs/N, while the MUX that merges their outputs operates at a rate of fs. The output, $D_{OUT}$, can also be processed by a decimation circuit at a frequency of fdeci, which is in the megahertz range, allowing for real-time measurements. Note that the TI ADC performs its digital operations by parallelizing the outputs of each channel and merging them into one using a multiplexer (MUX). Therefore, the MUX must output data at the same rate as the sampling frequency of the TI ADC. As the sampling rate of the TI ADC increases, the speed burden on the MUX also increases, making implementation challenging.
To address this, high-speed TI ADCs use measurement memory to store the digital output signals from each channel and then output the stored signals externally at a slower clock rate, as shown in Fig. 2 [15]-[20]. The digital output of each sub-ADC is stored in the subsequent memory block at a rate of fs/N. The stored signals are then transmitted off-chip through a parallel-to-serial memory interface operating at a slower speed. The memory requires a certain capacity (e.g., 6144 bits for a 1024-sample fast Fourier transform (FFT) at 6-bit resolution) to ensure an accurate evaluation of ADC performance. As a result, it demands substantial area, and real-time performance evaluation is more challenging than with a decimation-circuit-based measurement technique.
In this paper, we propose a two-rank decimation technique where the digital outputs of each channel are first decimated to a speed corresponding to the number of channels in the TI ADC before being merged using the MUX. The output of the MUX is then decimated once more to a speed that can be accommodated by the external measurement equipment. This approach aims to reduce the speed burden on the MUX to a level comparable to the conversion speed of the sub-ADC, minimize the area requirements, and enable real-time measurements. We validated this technique by applying it to a 6-bit, 20GS/s time-interleaved two-step Flash ADC designed using a 40nm CMOS process [21]. This paper discusses the same IC presented in [21] to validate the proposed technique but primarily focuses on explaining its architecture and operating principle. The proposed technique can also be applied to TI ADCs using different types of sub-ADCs, reducing the design burden on the MUX of the TI ADC.
This paper is organized as follows. Section II describes the proposed decimation architecture and its timing diagram. Section III details the implementation of the MUX and the decimation circuits. Section IV presents the measurement results, including the die photograph, the measurement setup, and the FFT results. Finally, Section V concludes the paper.

Fig. 1.

Conventional decimation logic-based digital output technique in TI ADCs [11]-[14].

Fig. 2.

Conventional memory-based digital output technique in TI ADCs [15]-[20].

Fig. 3.

Proposed two-rank decimation technique.

II. ARCHITECTURE

 Fig. 3 shows the proposed two-rank decimation technique. Before the digital outputs of the multi-channel ADCs are multiplexed, the digital output of each sub-ADC, which has a frequency of fs/N, passes through a 1/(N+1) decimation filter. The digital outputs from the decimation filters are then merged in a MUX operating at a frequency of fs/(N+1). The conventional decimation technique illustrated in Fig. 1 requires the MUX to operate at a speed of fs, whereas the proposed technique in Fig. 3 allows it to operate at a speed of fs/(N+1). Moreover, even if the operating speed of the TI ADC is increased by using a greater number of interleaving channels, the proposed technique ensures that the MUX always operates at a frequency reduced by a factor of N+1. This keeps the MUX at the operating speed level of a single-channel ADC, thereby significantly reducing the speed burden compared to the conventional architecture.
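The rate relationships above can be checked with a short numerical sketch. The function name and the choice of channel counts are illustrative assumptions; only the 20GS/s, 16-channel case corresponds to the prototype described in this paper, and the 1.25 GS/s sub-ADC rate follows from fs/N for that case.

```python
def readout_rates(fs_hz: float, n_channels: int) -> dict:
    """Clock rates in Hz for the conventional and two-rank readout paths."""
    return {
        "sub_adc": fs_hz / n_channels,             # each sub-ADC samples at fs/N
        "conventional_mux": fs_hz,                 # Fig. 1: MUX runs at fs
        "proposed_mux": fs_hz / (n_channels + 1),  # Fig. 3: MUX runs at fs/(N+1)
    }


# Hold the sub-ADC rate at 1.25 GS/s and add channels to raise fs
# (the 4- and 64-channel cases are hypothetical).
for n in (4, 16, 64):
    r = readout_rates(1.25e9 * n, n)
    print(n, r["conventional_mux"] / 1e9, round(r["proposed_mux"] / 1e9, 3))
```

In this sketch the conventional MUX rate grows linearly with the channel count, while the proposed MUX rate stays just below the sub-ADC rate, since fs/(N+1) = (fs/N)·N/(N+1).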
Fig. 4 shows the conceptual block diagram of the proposed two-rank decimation technique. The architecture actually verified is a 16-channel interleaved ADC; to simplify the explanation, however, a 4-channel architecture is assumed here. The overall architecture consists of 4-channel ADCs, a DFF array, a 1/5 clock divider for the first stage decimation, a timing aligner that determines the start time of the 1/5 divider, and a second stage decimation circuit consisting of DECI, MUX, and MUX CLKGEN. The overall circuit operates as follows. The ADCs that sample the input signal $V_{IN}$ operate at a frequency of fs/4, according to their respective ADC clocks ($Φ_{1~4}$), to generate the digital outputs $D_{A1~A4}$. The outputs $D_{A1~A4}$ pass through the DFF array, are merged in the MUX, and pass through the second stage decimation circuit, which outputs a 6-bit signal $D_{OUT}$ slow enough to be measured. The timing aligner, 1/5 clock divider, and MUX CLKGEN circuits, which generate the clocks used by the DFF array and MUX in this process, are the core circuits of the proposed architecture.
Fig. 5 shows the timing diagram of the circuit in Fig. 4. The clock operation is described for four channels; it may vary depending on the number of channels and the division ratio. While RST is high, the circuit remains in the reset state. After RST goes low, EN goes high on the second falling edge of $Φ_1$. This synchronizes EN to the falling edge of $Φ_1$, ensuring that, after EN goes high, the first falling edge of $Φ_3$ always precedes the first falling edge of $Φ_1$. In this way, the clocks are output in the order $Φ_{D3}$[2], $Φ_{D3}$[1], $Φ_{D1}$[3], and $Φ_{D1}$[4], starting with the data on channel 2. EN must be synchronized with RST and $Φ_1$, while the 1/5 clock divider is synchronized by EN to divide the clock. Section III describes the detailed circuit implementation and operation of each block.

Fig. 4.

Conceptual block diagram of the proposed two-rank decimation technique illustrated with a 4-channel TI ADC example.

Fig. 5.

Fig. 6.

Block and timing diagram of the timing aligner.

III. CIRCUIT IMPLEMENTATION

1. First Stage Decimation

  The first stage decimation consists of the timing aligner and the DFF array. Fig. 6 shows a block diagram of the timing aligner. The timing aligner operates by receiving two inputs: the reset signal, RST, which controls the on-off state of the test mode, and one of the sampling clocks from the sub-ADCs, designated as the reference clock for the decimation circuit, $Φ_1$. It outputs an enable signal, EN, which is synchronized with $Φ_1$. When RST transitions to a low state, EN is generated after two falling edges of $Φ_1$. This EN signal is critical in the proposed two-rank decimation technique, as it determines the digital output sequence.
Fig. 7 shows the block diagram of the 1/5 clock divider circuit, which uses the EN signal shown in Fig. 6. The clock divider is implemented using 5-bit ring counters. It uses five DFFs, with $Φ_1$ and $Φ_3$ as input clock signals and EN as the set and reset signal. It consists of two slices that receive $Φ_1$ and $Φ_3$ as inputs and generate the clocks $Φ_{D1}$[3], $Φ_{D1}$[4], $Φ_{D3}$[1], and $Φ_{D3}$[2] used for division. Because four channels are assumed here, the clock is divided by a factor of 1/5; for N channels, the circuit is designed as a 1/(N+1) clock divider. As shown in Fig. 6, because EN is synchronized with $Φ_1$, whenever EN goes high, the falling edge of $Φ_3$ occurs first, followed by the falling edge of $Φ_1$. The divider is synchronized to each of these falling edges to produce $Φ_{D1}$[3], $Φ_{D1}$[4], $Φ_{D3}$[1], and $Φ_{D3}$[2]. These clocks are used in the DFF array to decimate the outputs of the ADCs.
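The dividing action of a ring counter can be illustrated with a small behavioral model. This is a minimal sketch, not the circuit of Fig. 7: it assumes a one-hot ring in which EN presets a single '1', models only one slice, and ignores the real clock edges and delays. The function and parameter names are hypothetical.

```python
def ring_counter_divider(n_stages: int, n_edges: int, tap: int = 0) -> list:
    """Behavioral model of a one-hot ring counter used as a clock divider.

    Returns the logic level of stage `tap` after each input clock edge.
    The level is high once every `n_stages` edges.
    """
    # Assumption: the enable signal presets exactly one '1' in the ring.
    state = [1] + [0] * (n_stages - 1)
    waveform = []
    for _ in range(n_edges):
        # Each clock edge shifts the single '1' to the next stage.
        state = [state[-1]] + state[:-1]
        waveform.append(state[tap])
    return waveform


print(ring_counter_divider(5, 10, tap=0))  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
print(ring_counter_divider(5, 10, tap=1))  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

Each tap pulses once per five input edges, and adjacent taps are offset by one input clock period, which is the property a 1/(N+1) divider needs for N = 4.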
Note that slice 1 shown in Fig. 7 is operated first by the EN signal synchronized with $Φ_1$ to generate $Φ_{D3}$[1] and $Φ_{D3}$[2]. This allows the sub-ADC outputs to be multiplexed sequentially. If EN goes high after the falling edge of $Φ_3$, as shown in Fig. 5, $D_M$ is output as samples 5, 6, 11, and 20 instead of 1, 6, 11, and 16. Therefore, sufficient clearance was designed between EN and $Φ_3$ to ensure that EN always transitions first. Let $T_d$ be the interval between EN and $Φ_1$, $T_s$ the interval between the falling edge of $Φ_1$ and the falling edge of $Φ_3$, and $T_{set}$ the setup time of the DFF. The circuit is designed to satisfy the condition $T_d + T_{set} < T_s/2$. For this reason, the design synchronizes EN with two falling edges of $Φ_1$ to ensure a more stable generation of the EN signal.
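The intended output sequence 1, 6, 11, 16 can be reproduced with a short sketch. It assumes a global, 1-based sample numbering in which channel c takes samples c, c+N, c+2N, and so on; this numbering is inferred from the sequence quoted above and is not stated explicitly in the text. The function name is hypothetical.

```python
def first_stage_mux_order(n_channels: int, n_outputs: int) -> list:
    """List (sample index, channel) pairs appearing at the MUX output D_M."""
    step = n_channels + 1  # overall decimation by N+1
    picks = []
    for k in range(n_outputs):
        sample = 1 + k * step                    # 1-based global sample index
        channel = (sample - 1) % n_channels + 1  # sub-ADC that took this sample
        picks.append((sample, channel))
    return picks


print(first_stage_mux_order(4, 4))  # [(1, 1), (6, 2), (11, 3), (16, 4)]
```

Because N and N+1 share no common factor, stepping through the sample stream in increments of N+1 visits the channels in turn, so every sub-ADC contributes to $D_M$.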

Fig. 7.

Block and timing diagram of the 1/5 clock divider.

Fig. 8.

Block and timing diagrams for the MUX CLKGEN.

Fig. 9.

2. MUX CLKGEN and MUX

  Fig. 8 shows the block and timing diagrams of the MUX CLKGEN. The MUX CLKGEN generates $Φ_M$, which is used as the master clock for the MUX. As shown in Fig. 4, the MUX sequentially merges $D_{D1~D4}$, the outputs of the DFF array obtained by the first stage decimation filter, into $D_M$ in synchronization with $Φ_M$. As shown in Fig. 8, $Φ_M$ is generated from the $Φ_{D1}$ and $Φ_{D3}$ signals, which are obtained from $Φ_1$ and $Φ_3$ by the 1/5 clock divider shown in Fig. 7. Because $Φ_M$ contains a region with a different period every four cycles, the MUX outputs a wider digital code every four cycles. This characteristic does not cause issues as long as the digital code, $D_M$, and the capture clock, $Φ_M$, are synchronized.
Fig. 9 shows a block diagram of the 4-to-1 MUX. In the actual verification, the conversion speed of the TI ADC is 20GS/s, while the proposed two-rank decimation technique allows the MUX to operate at around 1 GHz. The design comprises four slices, each of which receives the output of one of the 4 channels and a selection signal as inputs. The outputs of the four slices are connected to an OR gate, which generates the final output $D_M$[6:1].
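A slice-and-OR multiplexer of this kind can be modeled behaviorally as follows. This is a sketch under assumptions: the text does not specify how each slice uses its selection signal, so the model assumes a one-hot select that gates each 6-bit code (an AND-OR structure). The function name and the example codes are hypothetical.

```python
def and_or_mux(codes: list, select_onehot: list, width: int = 6) -> int:
    """Behavioral model of a slice-based MUX with a shared OR output.

    codes:         one digital code per channel
    select_onehot: one select bit per channel, exactly one of them set
    """
    mask = (1 << width) - 1
    out = 0
    for code, sel in zip(codes, select_onehot):
        gated = (code & mask) if sel else 0  # slice passes its code only when selected
        out |= gated                         # unselected slices contribute zeros to the OR
    return out


# Example: four 6-bit codes, channel 2 selected.
print(and_or_mux([5, 42, 63, 0], [0, 1, 0, 0]))  # 42
```

With a one-hot select, the unselected slices output all zeros, so the OR gate simply forwards the selected code.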

Fig. 10.

Block and timing diagrams of the second stage decimation circuit (DECI).

Fig. 11.

Block diagram of the proposed two-rank decimation technique illustrated with a 16-channel TI ADC.

3. Second Stage Decimation

  The second stage decimation circuit can use the same method as conventional TI ADC architectures because the operating speed of the MUX is greatly reduced by the first stage decimation. Fig. 10 shows the block and timing diagrams of the second stage decimation circuit (DECI) used in this design. As shown in Fig. 10, $D_M$ is transferred to $D_{OUT}$ by the capture clock $Φ_{CAP}$, which is divided by a factor of 1/27. The decimation factor of 1/27 can be changed according to the desired frequency of $Φ_{CAP}$. However, the decimation factor must be chosen so that the digital output codes of all sub-ADCs can be obtained. In summary, the proposed two-rank decimation technique, comprising digital blocks 1 through 3 as described in Section III, reduces the operating speed of the MUX to a level comparable to that of the sub-ADC. This improves channel scalability and facilitates real-time measurements with a smaller area than memory.
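The constraint on the second stage decimation factor can be explored with a small sketch. It assumes that the MUX output cycles through the channels in a fixed round-robin order and that $Φ_{CAP}$ is derived from the MUX clock; under that assumption, a factor covers every channel exactly when it shares no common divisor with the channel count. The function names are hypothetical, and the coprimality criterion is an inference rather than a rule stated in the text.

```python
from math import gcd


def channels_seen(n_channels: int, factor: int) -> set:
    """Channels that appear when every `factor`-th MUX output is captured."""
    return {(k * factor) % n_channels for k in range(n_channels)}


def covers_all_channels(n_channels: int, factor: int) -> bool:
    return len(channels_seen(n_channels, factor)) == n_channels


def output_rate_hz(fs_hz: float, n_channels: int, factor: int) -> float:
    """D_OUT rate, assuming the capture clock is the MUX clock divided by `factor`."""
    return fs_hz / (n_channels + 1) / factor


print(covers_all_channels(16, 27))  # True: gcd(16, 27) == 1
print(covers_all_channels(16, 24))  # False: only 2 of the 16 channels appear
print(output_rate_hz(20e9, 16, 27) / 1e6)  # roughly 43.6 (MHz)
```

For the 16-channel prototype, the factor 27 satisfies this check, and the resulting output rate falls in the megahertz range.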

Fig. 12.

IV. MEASUREMENT RESULTS

 Fig. 11 shows the prototype 6-bit, 20GS/s, 16-channel TI two-step flash ADC designed in a 40nm CMOS process [21]. Comparing Fig. 11 with Fig. 4, the actual verification circuit uses 16 sub-ADCs, 1/17 clock dividers, and four clocks for the clock divider. Therefore, the condition described with Fig. 6 is applied as $T_d + T_{set} < T_s/4$ rather than $T_d + T_{set} < T_s/2$. $T_d$ was designed to be about 37ps, $T_s$/4 about 200ps, and the setup time $T_{set}$ of the DFF about 163ps. The circuit is thus designed to have sufficient timing margin. The remainder of this section describes the measurement results of the 16-channel ADC.
Fig. 12 shows the die photograph. Both the conventional DFF-based memory and the proposed two-rank decimation circuit are implemented on-chip for performance evaluation of the 16-channel TI ADC. The memory occupies an active area of 0.09mm$^2$ and can store the data for a 1024-point FFT. In contrast, the proposed two-rank decimation logic occupies an active area of 0.007mm$^2$, which is much smaller than that of the memory-based data acquisition.
Fig. 13 shows the measurement setup. The input and clock signals are applied differentially. The digital output of the ADC is measured in two ways so that the results obtained with the decimator and with the memory can be compared. With decimation, the data are decimated in real time and transmitted to the PC through the NI-PCI-6552 board. With memory, the data are first stored at the operating frequency of the memory and then transmitted to the PC through the same NI-PCI-6552 board to measure the final performance.
Fig. 14 shows the FFT results measured using the two-rank decimation technique. The signal-to-noise ratio (SNR), spurious-free dynamic range (SFDR), and signal-to-noise and distortion ratio (SNDR) are measured to be 31.02, 40.23, and 30.12 dB, respectively, at a sampling rate of 20GS/s and an input frequency of 9.042GHz. The SNR shown in Fig. 14 is limited by sampling clock jitter. Considering the designed noise contributions, such as comparator noise, quantization noise, and kT/C noise, the sampling clock jitter is estimated to be approximately 360 fs rms.
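The jitter estimate can be put in context with the standard aperture-jitter bound for a full-scale sine input, SNR = -20·log10(2π·f_in·σ_j), together with the ideal quantization limit 6.02·B + 1.76 dB. This is a sketch using textbook formulas; the text does not state how the 360 fs value was derived, and the comparator and kT/C noise values are not given, so they are omitted here.

```python
import math


def jitter_limited_snr_db(f_in_hz: float, jitter_rms_s: float) -> float:
    """SNR bound set by sampling-clock jitter for a full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * jitter_rms_s)


def quantization_snr_db(bits: int) -> float:
    """Ideal quantization-limited SNR of a B-bit converter."""
    return 6.02 * bits + 1.76


def combine_snr_db(*snrs_db: float) -> float:
    """Combine uncorrelated noise sources by adding their noise powers."""
    noise_power = sum(10.0 ** (-s / 10.0) for s in snrs_db)
    return -10.0 * math.log10(noise_power)


snr_jitter = jitter_limited_snr_db(9.042e9, 360e-15)
snr_quant = quantization_snr_db(6)
print(round(snr_jitter, 2), round(snr_quant, 2))
print(round(combine_snr_db(snr_jitter, snr_quant), 2))
```

Under these formulas, jitter alone limits the SNR to roughly 34 dB at a 9.042GHz input, below the 6-bit quantization limit of about 38 dB. Jitter and quantization together leave the SNR somewhat above the measured 31.02 dB, which is consistent with additional comparator and kT/C noise.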
Fig. 15 shows the power breakdown. The proposed circuit consumes 0.78mW in total. The DECI, MUX, MUX CLKGEN, 1/17 clock divider, DFF array, and timing aligner consume 0.29mW, 0.1mW, 0.05mW, 0.17mW, 0.11mW, and 0.06mW, respectively.
Table 1 summarizes the performance of the TI ADC to which the proposed two-rank decimation technique is applied and compares it with memory-based 6-to-8-bit TI ADC architectures with measured sampling frequencies greater than 20GS/s. In the structures using memory, the memory area ranges from 0.156mm$^2$ to 0.994mm$^2$. The proposed two-rank decimation technique can also be applied to TI ADC architectures with 64 or more channels [17]-[20]. When the proposed technique is applied to a 64-channel implementation, the condition $T_d + T_{set} < T_s/8$ is satisfied, ensuring stable generation of the EN signal. Additionally, each circuit can be designed according to the number of channels, enabling sufficient scalability. By comparison, the two-rank decimation occupies 0.013mm$^2$, which is much smaller than the memory area. When the prototype 6-bit TI two-step flash ADC was implemented with both the memory and the decimation methods, the memory area was 0.09mm$^2$ and the decimation area was 0.007mm$^2$, about 8% of the memory area.

Fig. 13.

Measurement setup for the 6-bit 20GS/s 16-channel TI ADC.

Fig. 14.

Measured FFT spectra using the proposed two-rank decimation technique at 20GS/s sampling rate with a 9.042GHz input.

Fig. 15.

Power breakdown of the proposed two-rank decimator.

Table 1.

V. CONCLUSIONS

  In this paper, we proposed a two-rank decimation technique that performs decimation in two stages to alleviate the speed burden on the MUX required for performance evaluation of TI ADCs and to facilitate real-time measurement. The proposed technique is verified using a 6-bit 20GS/s 16-channel TI ADC implemented in a 40-nm CMOS process. Unlike memory-based data acquisition methods, which require data accumulation and storage, the proposed decimation technique enables real-time data analysis. As a result, the proposed two-rank decimation technique is implemented in an area of 0.007mm$^2$, compared to 0.09mm$^2$ for the memory-based data acquisition method implemented on the same chip.

ACKNOWLEDGMENTS

  This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-00274028). The EDA tool was supported by the IC Design Education Center (IDEC), Korea. This work was also supported by the Korea Basic Science Institute (National Research Facilities and Equipment Center) grant funded by the Korea government (MSIT) (No. RS-2024-00404783).

References

[1]

E.-S. Lee, S.-H. Lee, C.-H. Pyo, H.-S. Kim, and J.-D. Han, “A 2-GS/s 6-bit Single-channel Speculative Loop-unrolled SAR ADC with Low-overhead Comparator Offset Calibration in 28-nm CMOS,” J. Semiconductor Technology and Science, Vol. 24, No. 4, pp. 355-364, May 2024. [CrossRef]

[2]

H.-Y. Jung, W.-K. Do, C.-W. Park, J.-H. Ko, and Y.-C. Jang, “A 12-bit 10-MS/s Pipelined SAR ADC Sharing Flash ADC and Residue Amplifier of Multiplying DAC,” J. Semiconductor Technology and Science, Vol. 24, No. 2, pp. 128-137, Jan. 2024. [CrossRef]

[3]

K.-H. Kim, J.-H. Baek, J.-H. Kim, and H.-I. Chae, “Time-interleaved Noise-shaping SAR ADC based on CIFF Architecture with Redundancy Error Correction Technique,” J. Semiconductor Technology and Science, Vol. 21, No. 5, pp. 297-303, Oct. 2021. [CrossRef]

[4]

Y. Krupnik, Y. Perelman, I. Levin, Y. Sanhedrai, R. Eitan, and A. Khairi, “112-Gb/s PAM4 ADC-based SERDES receiver with resonant AFE for long-reach channels,” IEEE Journal of Solid-State Circuits, vol. 55, no. 4, pp. 1077-1085, April 2020. [CrossRef]

[5]

A. Khairi, Y. Krupnik, A. Laufer, Y. Segal, M. Cusmai, and I. Levin, “A 1.41-pJ/b 224-Gb/s PAM4 6-bit ADC-based SerDes receiver with hybrid AFE capable of supporting long reach channels,” IEEE Journal of Solid-State Circuits, vol. 58, no. 1, pp. 8-18, January 2023. [CrossRef]

[6]

J. Im, K. Zheng, C.-H. A. Chou, L. Zhou, J. W. Kim, and S. Chen, “A 112-Gb/s PAM-4 long-reach wireline transceiver using a 36-way time-interleaved SAR ADC and inverter-based RX analog front-end in 7-nm FinFET,” IEEE Journal of Solid-State Circuits, vol. 56, no. 1, pp. 7-18, January 2021. [CrossRef]

[7]

B.-J. Yoo, D.-H. Lim, H. Pang, J.-H. Lee, S.-Y. Baek, and N. Kim, “6.4 A 56Gb/s 7.7mW/Gb/s PAM-4 wireline transceiver in 10 nm FinFET using MM-CDR-based ADC timing skew control and low-power DSP with approximate multiplier,” Proc. of 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, pp. 122-124, 2020. [CrossRef]

[8]

V. H.-C. Chen and L. Pileggi, “22.2 A 69.5mW 20GS/s 6b time-interleaved ADC with embedded time-to-digital calibration in 32nm CMOS SOI,” Proc. of 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, USA, pp. 380-381, 2014. [CrossRef]

[9]

Y. Cao, M. Zhang, Y. Zhu, R. P. Martins, and C.-H. Chan, “A 12-GS/s 12-b 4× time-interleaved ADC using input-independent timing skew calibration with global dither injection and linearized input buffer,” IEEE Journal of Solid-State Circuits, vol. 59, no. 12, pp. 4211-4224, December 2024. [CrossRef]

[10]

B. Xu, Y. Zhou, and Y. Chiu, “A 23-mW 24-GS/s 6-bit voltage-time hybrid time-interleaved ADC in 28-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 52, no. 4, pp. 1091-1100, April 2017. [CrossRef]

[11]

D.-R. Oh, M.-J. Seo, and S.-T. Ryu, “A 7-bit two-step flash ADC with sample-and-hold sharing technique,” IEEE Journal of Solid-State Circuits, vol. 57, no. 9, pp. 2791-2801, September 2022. [CrossRef]

[12]

D.-R. Oh, J.-I. Kim, M.-J. Seo, J.-G. Kim, and S.-T. Ryu, “A 6-bit 10-GS/s 63-mW 4× TI time-domain interpolating flash ADC in 65-nm CMOS,” Proc. of ESSCIRC Conference 2015 - 41st European Solid-State Circuits Conference (ESSCIRC), Graz, Austria, 2015. [CrossRef]

[13]

Y. Zhu, T. Liu, S. K. Kaile, S. Kiran, I.-M. Yi, and R. Liu, “A 38-GS/s 7-bit pipelined-SAR ADC with speed-enhanced bootstrapped switch and output level shifting technique in 22-nm FinFET,” IEEE Journal of Solid-State Circuits, vol. 58, no. 8, pp. 2300-2313, August 2023. [CrossRef]

[14]

S. Cai, E. Z. Tabasy, A. Shafik, S. Kiran, S. Hoyos, and S. Palermo, “A 25 GS/s 6b TI two-stage multi-bit search ADC with soft-decision selection algorithm in 65 nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 52, no. 8, pp. 2168-2179, August 2017. [CrossRef]

[15]

Y. Duan and E. Alon, “A 6b 46GS/s ADC with > 23GHz BW and sparkle-code error correction,” Proc. of 2015 Symposium on VLSI Circuits (VLSI Circuits), Kyoto, Japan, pp. C162-C163, 2015. [CrossRef]

[16]

K. Sun, G. Wang, Q. Zhang, S. Elahmadi, and P. Gui, “A 56-GS/s 8-bit time-interleaved ADC with ENOB and BW enhancement techniques in 28-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 54, no. 3, pp. 821-833, March 2019. [CrossRef]

[17]

A. S. Yonar, P. A. Francese, M. Brändli, M. Kossel, T. Morf, and J. E. Proesel, “An 8-bit 56GS/s 64× time-interleaved ADC with bootstrapped sampler and class-AB buffer in 4nm CMOS,” Proc. of 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Honolulu, HI, USA, pp. 168-169, 2022. [CrossRef]

[18]

L. Kull, T. Toifl, M. Schmatz, P. A. Francese, C. Menolfi, and M. Braendli, “22.1 A 90GS/s 8b 667mW 64× interleaved SAR ADC in 32nm digital SOI CMOS,” Proc. of 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, USA, pp. 378-379, 2014. [CrossRef]

[19]

L. Kull, D. Luu, C. Menolfi, M. Braendli, P. A. Francese, and T. Morf, “A 24-to-72GS/s 8b time-interleaved SAR ADC with 2.0-to-3.3pJ/conversion and > 30dB SNDR at nyquist in 14nm CMOS FinFET,” Proc. of 2018 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, pp. 358-360, 2018. [CrossRef]

[20]

D.-S. Jo, B.-R.-S. Sung, M.-J. Seo, W.-C. Kim, and S.-T. Ryu, “A 40-nm CMOS 7-b 32-GS/s SAR ADC with background channel mismatch calibration,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 4, pp. 610-614, April 2020. [CrossRef]

[21]

D.-R. Oh, “A 6-bit 20 GS/s time-interleaved two-step flash ADC in 40 nm CMOS,” Electronics, vol. 11, no. 19, 3052, 2022. [CrossRef]

Author Biographies

Sang-Won Oh

received his B.S. degree in electronic engineering from Jeju National University, Jeju, South Korea, in 2024. He is currently pursuing a master’s degree at Jeju National University, Faculty of Applied Energy System, Major of Electronic Engineering. His current research interests include high-speed analog-to-digital converters (ADCs) and digital calibration for time-interleaved ADCs.

Dong-Ryeol Oh

received his B.S. degree in electronic engineering from Soongsil University, Seoul, South Korea, in 2013, and a Ph.D. degree in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2019. From 2019 to 2023, he was with Samsung Electronics, Hwaseong, South Korea, where he was a Staff Engineer, focusing on the design of high-speed analog front end (AFE) including data converters [analog-to-digital converter (ADC)/digital-to-analog converter (DAC)] for the development of wireless communication systems. Since 2023, he has been with the Department of Electronic Engineering and the Faculty of Applied Energy System, Major of Electronic Engineering at Jeju National University, Jeju, South Korea, where he is currently an Assistant Professor. His research interests include analog and mixed-signal IC design with a focus on power-efficient and high-speed data converters.