20Gb/s Energy Efficient and Linearity Enhanced Integrated Summer Latch-based PAM-4
DFE
Seung-Heon An1
Jin-Ku Kang1
(Department of Electronics Engineering, Inha University, Korea (the Republic of))
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
Integrated summer latch (ISL), decision feedback equalizer (DFE), linearity, low power, PAM-4
I. INTRODUCTION
The increasing demand for high-speed and power-efficient receivers necessitates effective
solutions to improve signal integrity. Decision Feedback Equalizers (DFE) are employed
to alleviate frequency-dependent channel losses. To meet the timing constraints of
the first tap and reduce power consumption in DFE, an integrated summer latch (ISL)
architecture has been proposed, which eliminates the transconductance cell used as
a summer in closed-loop DFEs [1].
Various ISL DFE architectures have been introduced to minimize loop delay and power
consumption. For high-speed operation, an ISL DFE utilizing current mode logic (CML)
latches has been proposed [2]. Additionally, a DFE employing dynamic CML comparators has been suggested to reduce
$T_{ckq}$ by varying the resistive load during the track and regenerate phases [3]. To further decrease power consumption, ISL structures replacing CML comparators
with dynamic comparators, such as strong-arm or double-tail latches, have been proposed
[4, 5]. Moreover, a dynamic comparator-based ISL architecture for PAM-4 signals has been introduced
to support high-speed operation [5].
Fig. 1. (a) Closed-loop DFE, and (b) integrated summer latch DFE (both in half data
rate architecture) [1].
Figs. 1(a) and 1(b) illustrate the half-rate architecture of a 1-tap closed-loop DFE and a 1-tap integrated
summer latch DFE, respectively.
In a conventional closed-loop DFE, the output of the transconductance cell serves as
the slicer input, so the ISI at that node must be cancelled before the slicer takes
its next sample.
To allow the "minimizing ISI" and "sampling" processes to occur simultaneously, an
integrated summer latch architecture has been proposed. The time constraints of the
closed-loop DFE can be expressed as follows.
$T_{ckq}$ is the clock-to-Q delay of the slicer, $T_{setup}$ is the setup time of
the slicer, and $T_{settle}$ is the time required for the summation node to be stabilized.
However, since the ISL DFE performs settling and amplification simultaneously, $T_{setup}$
and $T_{settle}$ can be merged, thereby relaxing the time constraint.
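For reference, a form of the closed-loop constraint consistent with these definitions can be written as below. This is a sketch that assumes the usual first-tap budget of one unit interval (UI); the single merged term $T_{merged}$ for the ISL case is a hypothetical symbol introduced only for illustration.

$$T_{ckq} + T_{settle} + T_{setup} < 1\,\mathrm{UI} \quad \text{(closed-loop DFE)}$$

$$T_{ckq} + T_{merged} < 1\,\mathrm{UI}, \qquad T_{merged} < T_{settle} + T_{setup} \quad \text{(ISL DFE)}$$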
The major difference between a slicer for NRZ input signals and one for PAM-4 input signals
lies in the reference differential pairs used for each level decision. These differential
pairs cause variations in the preamplifier outputs (summing nodes) of the high, middle,
and low slicers. Consequently, for the same tap weight and previous decision values,
the high, middle, and low slicers each exhibit a different tap current. This
leads to linearity degradation.
To address this issue, this paper proposes a novel PAM-4 ISL DFE architecture.
The proposed architecture removes the reference differential pairs that affect the
summing node output in conventional PAM-4 ISL DFEs, and instead applies the reference
differential pairs at the latch stage. This approach minimizes equalization differences
among slicers at each level while integrating the preamplifiers into a single unit,
thereby reducing area and power consumption.
This paper is organized as follows: Section II describes the architecture and operation
of the proposed DFE and compares it with the conventional structures. Section III
presents the simulation results and summarizes the performance of the proposed design.
Section IV concludes.
II. PROPOSED PAM-4 INTEGRATED SUMMER LATCH DFE
1. Proposed PAM-4 ISL DFE
Fig. 2 shows the top block diagram of the proposed architecture. The differential input
($V_{in}$) and the feedback voltages X, X_B, Y, and Y_B generated by the FIR taps
are applied to the preamplifier of the slicer. The RZ signal determined by the slicer
and the NRZ signal output from the SR latch are fed into the FIR taps to generate
the feedback voltages. In the preamplifier, both the feedback voltages and the input
signal are applied simultaneously to equalize the summing node output. The 3-bit NRZ
output from the SR latch is then decoded into the LSB and MSB by the thermometer-to-binary
(T2B) converter.
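The T2B decoding step can be sketched as follows. This is a minimal illustration, assuming the 3-bit thermometer code is ordered (high, mid, low) and that the output uses natural binary weighting; the paper does not state the bit mapping, and the function name `t2b` is hypothetical.

```python
# Valid 3-bit thermometer codes, ordered (high, mid, low) -- ordering is an assumption.
VALID_CODES = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}


def t2b(high: int, mid: int, low: int) -> tuple[int, int]:
    """Decode a thermometer code into (MSB, LSB), assuming natural binary weighting."""
    code = (high, mid, low)
    if code not in VALID_CODES:
        raise ValueError(f"not a valid thermometer code: {code}")
    level = high + mid + low  # PAM-4 level index 0..3 = number of slicers asserted
    return level >> 1, level & 1


if __name__ == "__main__":
    for code in sorted(VALID_CODES, key=sum):
        print(code, "->", t2b(*code))
```

A Gray-coded link would need a different LSB mapping; only the level count is independent of that choice.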
Fig. 2. Top block diagram of proposed PAM-4 ISL DFE.
Fig. 3 shows the schematic of the preamplifier and latches of the proposed DFE slicer. The proposed
slicer is based on a double-tail latch [6]. The FIR tap feedback voltages are applied through M15 and M16 to the summing nodes (A
and B). The preamplifier equalizes these nodes by combining the input signal with
the FIR feedback voltages.
Fig. 3. Schematic of preamplifier and latches of proposed PAM-4 ISL DFE slicer.
The common-mode voltage of Vth+ and Vth- is the same as the input common-mode voltage, and the
differential amplitude of Vth+ and Vth- for the high, middle, and low latches is set to +$V_{A,REF}$, 0, and -$V_{A,REF}$, respectively.
The reference amplitude ($V_{A,REF}$) should be set appropriately.
M9 and M10 receive Vth+ and Vth- as inputs. M11 and M12 receive the same preamplifier
output in all three latches, but at the start of the evaluation phase, both M9 and M10 are in the triode
region, where their resistances differ depending on the reference voltage
levels. This causes different currents to flow out of the latch output nodes, so each of the
three latches resolves the same preamplifier output against its own threshold.
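As a first-order illustration of this mechanism (a textbook deep-triode approximation, not an expression taken from the paper), the on-resistance of a MOS device with a small drain-source voltage is approximately

$$R_{on} \approx \frac{1}{\mu C_{ox}\,\dfrac{W}{L}\,\left(|V_{GS}| - |V_{T}|\right)}$$

Here $\mu$, $C_{ox}$, $W/L$, and $V_T$ are generic device parameters introduced only for this sketch. Under this approximation, M9 and M10 present unequal resistances whenever Vth+ and Vth- differ, and equal resistances in the middle latch, where the reference amplitude is 0.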
M13 and M14 are used to reset OUTP and OUTN during the pre-charge phase.
Since the preamplifier drives three latches, the preamplifier of the proposed DFE may be
larger than that of the conventional PAM-4 ISL DFE. However, because the proposed
DFE is implemented in a half-rate architecture, the sampling interval is longer and
kickback noise does not cause any functional problem.
2. Comparison of Conventional PAM-4 ISL DFE with Proposed DFE
In conventional PAM-4 ISL DFEs, FIR taps must be implemented separately for each slicer,
which increases hardware complexity. Figs. 4(a) and 4(b) show the conventional PAM-4 ISL DFE structure and the proposed structure, respectively.
The conventional design requires three preamplifiers, FIR taps for each preamplifier,
and three latches. In contrast, the proposed architecture integrates the preamplifiers
into a single unit, allowing the FIR tap to also be implemented only once. As a result,
the number of transistors added for each additional tap is reduced
by a factor of approximately 3.6 compared to the conventional PAM-4 ISL DFE, as listed in
Table 1. Integrating the preamplifiers also reduces the overall area.
Fig. 4. (a) Conventional PAM-4 ISL DFE, and (b) proposed DFE structure (full data rate
cases for clarity).
Table 1. Number of transistors and area per tap in the PAM-4 ISL DFE.
| | 1 tap: # of TR | 1 tap: area | 2 tap: # of TR | 2 tap: area |
|---|---|---|---|---|
| Conv. PAM-4 ISL DFE [5] | 87 | 105 $\mu$m$^2$ | 120 | 129 $\mu$m$^2$ |
| Proposed PAM-4 ISL DFE | 49 | 95 $\mu$m$^2$ | 58 | 107 $\mu$m$^2$ |
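The per-tap scaling implied by Table 1 can be checked with a short calculation. The sketch below only re-uses the table values; the dictionary layout and the helper name `added_for_second_tap` are illustrative choices.

```python
# Values copied from Table 1: {design: {number_of_taps: value}}
transistors = {
    "conventional": {1: 87, 2: 120},
    "proposed": {1: 49, 2: 58},
}
area_um2 = {
    "conventional": {1: 105, 2: 129},
    "proposed": {1: 95, 2: 107},
}


def added_for_second_tap(per_tap: dict[int, float]) -> float:
    """Increment in a quantity when going from one tap to two taps."""
    return per_tap[2] - per_tap[1]


conv_tr = added_for_second_tap(transistors["conventional"])  # 33 transistors
prop_tr = added_for_second_tap(transistors["proposed"])      # 9 transistors
tr_ratio = conv_tr / prop_tr                                 # about 3.67

conv_area = added_for_second_tap(area_um2["conventional"])   # 24 um^2
prop_area = added_for_second_tap(area_um2["proposed"])       # 12 um^2

if __name__ == "__main__":
    print(f"transistors added per tap: conv={conv_tr}, proposed={prop_tr}, ratio={tr_ratio:.2f}")
    print(f"area added per tap (um^2): conv={conv_area}, proposed={prop_area}")
```

The resulting ratio of 33 to 9 added transistors is close to the factor of approximately 3.6 quoted in the text.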
Fig. 5 shows the schematics of the preamplifiers in the conventional PAM-4 ISL DFE and the
proposed DFE structure. Unlike the conventional PAM-4 ISL DFE preamplifier in Fig. 5(a), which employs a differential pair that receives the reference voltages (Vth), the
proposed preamplifier in Fig. 5(b) removes this differential pair.
Fig. 5. (a) Conventional PAM-4 ISL DFE preamplifier, and (b) proposed DFE preamplifier.
Fig. 5(b) shows the preamplifier and FIR tap of the proposed DFE. The preamplifier receives
the input and the feedback voltage generated by the FIR tap. The FIR tap has a CML structure
and consists of load resistors ($R_L$) and differential pairs that receive the previous
decision value and its complement ($D(n-1)$, $\overline{D(n-1)}$) as inputs.
$V_{TAP}$ is close to the common-mode voltage ($V_{CM}$) of $V_{in}$. FIR_B
is the voltage used for FIR tap weighting. $I_{TAP+}$ and $I_{TAP-}$ denote the currents
drawn from the summing node by the DFE tap.
In the conventional PAM-4 ISL DFE preamplifier, the FIR tap structure increases the
capacitive load of summing nodes A and B as the number of taps increases, which increases
the preamplifier's delay and power consumption [5, 7].
Fig. 6 shows the power consumption of the conventional and proposed slicers as $C_{TAP}$
is increased, assuming an increase in the number of FIR taps. In the proposed structure,
power consumption remains constant at 650 $\mu$W with increasing $C_{TAP}$, whereas
in the conventional structure it varies from 350 $\mu$W to approximately 3 mW.
Fig. 6. Comparison of power consumption of PAM-4 ISL DFE with variation of $C_{TAP}$.
This demonstrates the advantage of the reduced tap loading structure in terms of power
consumption as the number of taps increases. The power consumption of the preamplifier
is proportional to $C_A V_{DD}^2$, where $C_A$ is the total capacitance at summing
node A. Increasing the number of FIR taps that load nodes A and
B is therefore undesirable for power consumption, so a reduced tap loading structure
is applied to lower power consumption.
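The $C_A V_{DD}^2$ dependence can be illustrated with a simple dynamic-power estimate. This is a rough sketch, not the authors' model: it assumes $P = C_A V_{DD}^2 f$ with a 5 GHz half-rate clock, and the base and per-tap capacitance values are hypothetical numbers chosen only to show the trend.

```python
def dynamic_power_w(c_farad: float, vdd_volt: float, f_hz: float) -> float:
    """First-order dynamic power of a node charged and discharged once per clock cycle."""
    return c_farad * vdd_volt ** 2 * f_hz


VDD_V = 1.1          # supply listed for this work in Table 3
F_CLK_HZ = 5e9       # assumed half-rate clock for 10 GBaud PAM-4
C_BASE_F = 10e-15    # hypothetical summing-node capacitance without taps
C_PER_TAP_F = 5e-15  # hypothetical extra capacitance per directly attached tap

if __name__ == "__main__":
    for n_taps in range(1, 5):
        c_total = C_BASE_F + n_taps * C_PER_TAP_F
        p_uw = dynamic_power_w(c_total, VDD_V, F_CLK_HZ) * 1e6
        print(f"{n_taps} tap(s): C_A = {c_total * 1e15:.0f} fF, P ~ {p_uw:.1f} uW")
```

In this simplified picture, power grows linearly with every tap that loads the summing node directly, which is the effect the reduced tap loading structure is meant to avoid.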
In the conventional PAM-4 ISL preamplifier, the differential pair with the reference
voltage produces three different summing node (A, B) outputs for the high, middle,
and low levels. This results in different current differences ($I_{TAP+}
- I_{TAP-}$) at the summing node in the high, middle, and low slicers for the same input,
decision values, and tap weights. Fig. 7 shows the variation of the current difference ($I_{TAP+} - I_{TAP-}$) as a function
of the differential reference voltage (Vth). In the conventional PAM-4 ISL DFE structure,
when the reference voltages applied to the high, middle, and low slicers
are set to -150 mV, 0 mV, and +150 mV, respectively, the differential currents drawn
from the summing node by the FIR tap are 24 $\mu$A, 21 $\mu$A, and 18 $\mu$A
under identical previous decision values, tap weights, and input conditions. The current
mismatch grows with the differential reference voltage in the conventional
PAM-4 ISL DFE. Consequently, the equalization is applied differently for each level,
degrading linearity. In contrast, the proposed structure removes the differential
pair and integrates the preamplifier into a single unit, eliminating the effect of
the differential reference voltage on the tap current difference.
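To put the quoted tap currents in relative terms, the short sketch below computes the deviation of the high and low slicers from the middle slicer. The choice of the middle slicer as the reference is an assumption made for illustration.

```python
# Differential tap currents quoted for the conventional PAM-4 ISL DFE (microamperes).
tap_current_ua = {"high": 24.0, "mid": 21.0, "low": 18.0}

reference = tap_current_ua["mid"]  # assumed reference for the comparison

# Percentage deviation of each slicer's tap current from the middle slicer.
deviation_pct = {
    name: 100.0 * (value - reference) / reference
    for name, value in tap_current_ua.items()
}
# Total spread between the largest and smallest tap current, relative to the middle.
spread_pct = 100.0 * (max(tap_current_ua.values()) - min(tap_current_ua.values())) / reference

if __name__ == "__main__":
    for name, dev in deviation_pct.items():
        print(f"{name}: {dev:+.1f} %")
    print(f"total spread: {spread_pct:.1f} %")
```

With these numbers the outer slicers deviate by roughly 14% in either direction, so the same tap weight produces visibly different equalization per level.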
Fig. 7. Tap current comparison between conventional PAM-4 ISL DFE and proposed DFE.
3. Equalization Nonlinearity Improvement
The decision thresholds in PAM-4 are denoted as $V_{TH,High}$, $V_{TH,Mid}$, and $V_{TH,Low}$.
Changes in the reference voltage affect the summing node outputs, and the resulting
variations in the tap current differences influence the shift of the threshold voltages.
The threshold voltage change is proportional to the current flowing through the FIR
taps from the summing node [8].
To compare the linearity of equalization, the nonlinearity of equalization is defined
as follows.
This is illustrated as a function of middle threshold variation in Fig. 8. In PAM-4, the previously decided values can be 000, 001, 011, or 111, and the current
difference varies according to the tap weights. For a 1-tap configuration, the conventional
PAM-4 ISL DFE and the proposed structure show four nonlinearity values corresponding
to each decision value. The absolute value of equalization nonlinearity is largest
when the decision value is 111 or 000, and relatively smaller when the decision value
is 001 or 011. The proposed DFE reduces equalization nonlinearity by about a factor of three
compared to the conventional PAM-4 ISL DFE, as shown in Fig. 8.
Fig. 8. Equalization nonlinearity comparison between conventional PAM-4 ISL DFE [5] and proposed PAM-4 ISL DFE.
In contrast to the conventional PAM-4 ISL DFE, where both the input and the reference
differential pair affect the summing node, the proposed DFE removes the influence
of the reference differential pair, resulting in improved linearity. Section III demonstrates
the improvement in equalization nonlinearity through a comparison of the horizontal
eye openings between the conventional PAM-4 ISL DFE and the proposed DFE.
III. EXPERIMENTAL RESULTS
1. Timing Diagram and Time Constraints
Fig. 9 shows the timing diagram of the proposed structure. $OUT_H$, $OUT_M$,
and $OUT_L$ are the high, middle, and low output signals of the slicer, respectively.
Y and Y_B represent the feedback voltages. The timing constraint for the first loop
is given as follows.
$T_{ckq}$ is the clock-to-Q delay of the slicer, and $T_{FIR}$ is the time required
for the feedback voltage to settle. In the simulation, the maximum clock-to-Q delay
of the slicer is 44 ps. When the resistance of the FIR tap is about 200 $\Omega$,
the delay for the feedback voltage to settle is approximately 15 ps. For the 20 Gb/s
PAM-4 half-rate design, 1 UI (one symbol period at 10 GBaud) corresponds to 100 ps, resulting in a worst-case timing margin
of approximately 41 ps.
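The quoted margin can be reproduced with a small budget calculation. The sketch assumes the first-loop constraint has the form $T_{ckq} + T_{FIR} < 1\,\mathrm{UI}$, which is consistent with the 41 ps figure above; the function name is hypothetical.

```python
def timing_margin_ps(ui_ps: float, t_ckq_ps: float, t_fir_ps: float) -> float:
    """Slack left in one UI after the slicer clock-to-Q delay and FIR settling time."""
    return ui_ps - (t_ckq_ps + t_fir_ps)


DATA_RATE_BPS = 20e9   # 20 Gb/s
BITS_PER_SYMBOL = 2    # PAM-4 carries two bits per symbol

symbol_rate_baud = DATA_RATE_BPS / BITS_PER_SYMBOL  # 10 GBaud
ui_ps = 1e12 / symbol_rate_baud                     # 100 ps per symbol

margin_ps = timing_margin_ps(ui_ps, t_ckq_ps=44.0, t_fir_ps=15.0)

if __name__ == "__main__":
    print(f"UI = {ui_ps:.0f} ps, worst-case margin = {margin_ps:.0f} ps")
```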
Fig. 9. Timing diagram and simulated timing constraint.
2. Performance Comparison between Conventional PAM-4 ISL DFE and Proposed DFE
The AC response of the channel used in the simulation exhibits a loss of 6.2 dB at
the Nyquist frequency of 5 GHz (Fig. 10(a)). The signal after passing through the channel has a single-ended swing of
300 mV with an almost closed eye (Fig. 10(b)).
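For orientation, the channel figures can be converted into linear terms. This is a minimal sketch using the standard decibel definition for amplitude; the helper name is illustrative.

```python
def db_loss_to_amplitude_ratio(loss_db: float) -> float:
    """Convert a loss in dB to an output/input voltage amplitude ratio."""
    return 10 ** (-loss_db / 20.0)


DATA_RATE_BPS = 20e9
BITS_PER_SYMBOL = 2  # PAM-4

symbol_rate_baud = DATA_RATE_BPS / BITS_PER_SYMBOL  # 10 GBaud
nyquist_hz = symbol_rate_baud / 2                   # 5 GHz
ratio_at_nyquist = db_loss_to_amplitude_ratio(6.2)  # roughly 0.49

if __name__ == "__main__":
    print(f"Nyquist frequency: {nyquist_hz / 1e9:.1f} GHz")
    print(f"Amplitude ratio at Nyquist: {ratio_at_nyquist:.3f}")
```

A 6.2 dB loss therefore leaves about half of the signal amplitude at the Nyquist frequency.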
Fig. 10. (a) channel AC response, and (b) eye-diagram after channel.
Fig. 11 shows the eye diagrams of the average input plus feedback voltage and the summing
node for the proposed PAM-4 ISL DFE in 1-tap and 2-tap configurations. In the eye
diagram of the average input and feedback voltage, the vertical eye opening is 38 mV
for the 1-tap structure and 50 mV for the 2-tap structure (Figs. 11(a) and 11(b)). This difference is also reflected in the vertical eye opening of the summing node,
which is 180 mV and 240 mV for the 1-tap and 2-tap structures, respectively, indicating
a larger eye opening for the 2-tap configuration (Figs. 11(c) and 11(d)).
Fig. 11. Eye diagram of diff. average of input + feedback voltage with (a) 1 FIR tap
(b) 2 FIR taps, and eye diagram of summing node (c) with 1 FIR tap (d) with 2 FIR taps.
Fig. 12 shows the simulated BER bathtub plots for the conventional PAM-4 ISL DFE and the
proposed DFE structures with one- and two-tap operation. Due to time limitations, the
bathtub plot was evaluated down to a BER of $10^{-4.5}$. The simulations used a PRBS-15 pattern of $2^{15} - 1$ cycles with the same channel-transmitted input signal. A comparison
of the horizontal eye opening at a BER of $10^{-4.5}$ between the conventional PAM-4 ISL DFE and the proposed DFE demonstrates an improvement
in linearity for the proposed structure.
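A PRBS-15 pattern of this length can be generated with a 15-bit linear-feedback shift register. The sketch below assumes the common $x^{15} + x^{14} + 1$ polynomial, since the paper does not state which generator was used. It also shows that one error in a single pass of the pattern corresponds to a BER near $10^{-4.5}$, which may explain the reported floor; that interpretation is an assumption.

```python
import math

MASK_15 = 0x7FFF  # 15-bit register


def lfsr_step(state: int) -> tuple[int, int]:
    """Advance the LFSR by one bit; feedback is the XOR of register stages 15 and 14."""
    new_bit = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | new_bit) & MASK_15, new_bit


def prbs15_bits(n: int, seed: int = MASK_15) -> list[int]:
    """Return the first n bits of the sequence for a non-zero seed."""
    if seed & MASK_15 == 0:
        raise ValueError("seed must be non-zero")
    state, bits = seed & MASK_15, []
    for _ in range(n):
        state, bit = lfsr_step(state)
        bits.append(bit)
    return bits


def prbs15_period(seed: int = MASK_15) -> int:
    """Count steps until the register returns to its starting state."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr_step(state)
        steps += 1
        if state == seed:
            return steps


period = prbs15_period()                    # 2**15 - 1
single_error_log10 = math.log10(1 / period)  # about -4.52

if __name__ == "__main__":
    print("period:", period)
    print(f"log10(BER) for one error per pass: {single_error_log10:.3f}")
    print("first 16 bits:", prbs15_bits(16))
```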
Fig. 12. BER bathtub plot of conventional PAM-4 ISL DFE and proposed DFE (2 tap).
Table 2 shows the power consumption and the horizontal eye opening at a BER of $10^{-4.5}$ for the conventional PAM-4 ISL DFE [5] and the proposed structure with one and two taps. The proposed structure consumes
about 14% (one tap) and 18% (two taps) less total power and widens the horizontal eye opening by about 39% and 26%, respectively,
compared to the conventional design.
Table 2. Power and horizontal eye opening of conventional and proposed PAM-4 ISL DFE.
| | Conventional PAM-4 ISL DFE [5] (1 tap) | Conventional PAM-4 ISL DFE [5] (2 tap) | Proposed DFE (1 tap) | Proposed DFE (2 tap) |
|---|---|---|---|---|
| Slicer | 691 $\mu$W | 759 $\mu$W | 496.8 $\mu$W | 497.1 $\mu$W |
| FIR tap | - | - | 99.92 $\mu$W | 122.1 $\mu$W |
| Total | 691 $\mu$W | 759 $\mu$W | 596.7 $\mu$W | 619.2 $\mu$W |
| Horizontal eye opening for BER of LSB = $10^{-4.5}$ | 0.23 UI | 0.34 UI | 0.32 UI | 0.43 UI |
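The relative changes implied by Table 2 can be worked out directly from the table entries. The sketch below uses only those values; the dictionary keys and the helper name `pct_change` are illustrative.

```python
# Values copied from Table 2, keyed by number of taps.
total_power_uw = {
    "conventional": {1: 691.0, 2: 759.0},
    "proposed": {1: 596.7, 2: 619.2},
}
eye_opening_ui = {
    "conventional": {1: 0.23, 2: 0.34},
    "proposed": {1: 0.32, 2: 0.43},
}


def pct_change(new: float, old: float) -> float:
    """Signed percentage change of new relative to old."""
    return 100.0 * (new - old) / old


power_change = {
    taps: pct_change(total_power_uw["proposed"][taps], total_power_uw["conventional"][taps])
    for taps in (1, 2)
}
eye_change = {
    taps: pct_change(eye_opening_ui["proposed"][taps], eye_opening_ui["conventional"][taps])
    for taps in (1, 2)
}

if __name__ == "__main__":
    for taps in (1, 2):
        print(f"{taps} tap: power {power_change[taps]:+.1f} %, eye opening {eye_change[taps]:+.1f} %")
```

The slicer and FIR tap rows for the proposed design also sum to the listed totals (496.8 + 99.92 and 497.1 + 122.1 $\mu$W).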
The advantages of the proposed DFE, which incorporates a reduced tap loading structure
and integrates the preamplifier, can be summarized as follows. First, it simplifies
the hardware.
Second, it reduces power consumption compared to the conventional PAM-4 ISL DFE. Third,
it improves linearity, resulting in a wider horizontal eye opening in terms of BER.
As the number of DFE taps increases, the benefits in hardware simplification and power
reduction are expected to become even more significant compared to the conventional
PAM-4 ISL DFE.
Fig. 13 shows the layout of the proposed DFE. Each DFE slicer consists of a preamplifier,
FIR tap, three latches, and three SR latches.
Fig. 13. Layout of proposed DFE.
The simulated eye diagrams of the average input + feedback voltage with the DFE turned
OFF and ON are shown in Figs. 14(a) and 14(b), respectively. When the DFE is ON, the vertical eye opening is 44 mV for the high, middle, and low
levels. In addition, the eye diagram of the summing node exhibits a vertical eye opening
of 197 mV (Fig. 14(d)).
Fig. 14. Post-layout simulated eye diagram of diff. average of input + feedback voltage
(a) without equalization (b) with equalization and eye diagram of summing node (c)
without equalization (d) with equalization.
Fig. 15 shows the BER bathtub plots with the DFE turned ON and OFF. The proposed structure
exhibits a horizontal eye opening of 0.4 UI when the DFE is ON.
Fig. 15. Post-layout simulated BER bathtub plot of proposed PAM-4 ISL DFE (2 tap).
Fig. 16 shows the Monte Carlo simulation results for the MSB offset and LSB offset of the proposed DFE slicer.
Both Monte Carlo simulations were performed using 200
samples. The standard deviation of the offset is 60.67 mV for the LSB and 36.24 mV for
the MSB.
Fig. 16. Monte-Carlo simulation results for MSB and LSB offset of proposed DFE slicer.
Table 3 compares various ISL DFEs. In this work, the horizontal eye opening of the ISL DFE
receiving 20 Gb/s PAM-4 signals has been improved to approach that of an NRZ ISL DFE.
The measured horizontal eye-opening data from the references in Table 3 were adjusted to a BER of $10^{-4.5}$. Furthermore, this work demonstrates superior power efficiency.
Table 3. Performance comparison.
| Reference | This work (simulated) | JSSC'17 [5] | JSSC'21 [10] | JSSC'19 [2] | TCASI'22 [3] | TCASII'24 [4] | TCASII'20 [9] |
|---|---|---|---|---|---|---|---|
| Signaling | PAM-4 | PAM-4 | PAM-4 | PAM-4 | PAM-4 | NRZ | NRZ |
| Technology | 45 nm | 45 nm | 28 nm | 65 nm | 65 nm | 65 nm | 65 nm |
| Clocking | Half-rate | Half-rate | Half-rate | Quarter-rate | Quarter-rate | Half-rate | Quarter-rate |
| Equalization | 2-FIR | 1-FIR | CTLE/2-FIR | CTLE/1-FIR/1-IIR | CTLE/4-FIR | 2-FIR | 1-FIR/1-IIR |
| Supply (V) | 1.1 | 1.1 | 0.9 | 1 | 1.2 | 1.2 | 1 |
| Efficiency (pJ/bit) | 0.16 | 0.18 | 0.43 | 3.6 | 2.85 | 0.23 | 0.43 |
| Data rate (Gbps) | 20 | 20 | 60 | 56 | 56 | 4 | 10.8 |
| Channel loss | 6 dB @ 5G | 6 dB @ 5G | 8.2 dB @ 15G | 16.1 dB @ 14G | 20 dB @ 14G | 18 dB @ 2G | 26 dB @ 5.4G |
| PRBS length | 15 | 15 | 31 | 15 | 7 | 15 | 7 |
| Horizontal eye opening for BER of LSB = $10^{-4.5}$ | 0.4 UI | 0.21 UI | 0.36 UI | 0.38 UI | 0.16 UI | 0.44 UI | 0.44 UI |
IV. CONCLUSIONS
The proposed PAM-4 DFE structure addresses the issue in conventional PAM-4 ISL DFEs
where the differential pair receiving the reference voltage affects the summing node,
causing variations in tap currents and degrading linearity. By relocating the differential
pair to the latch stage, the proposed structure ensures a constant tap current regardless
of the differential value of the reference voltage.
Additionally, the preamplifiers of the slicers are integrated into a single unit,
simplifying the hardware. The reduced tap loading structure is also applied to PAM-4,
mitigating the power increase associated with additional taps and improving power
efficiency.
Simulation results demonstrate an improvement in the horizontal eye opening in the
BER bathtub plot, along with reduced power consumption.
ACKNOWLEDGEMENTS
This work was supported by Inha University. The authors also thank the IDEC program
for its hardware and software assistance for the design and simulation.
REFERENCES
[1] Lu Y., Alon E., 2013, Design techniques for a 66 Gb/s 46 mW 3-tap decision feedback equalizer in 65 nm CMOS, IEEE Journal of Solid-State Circuits, Vol. 48, No. 12, pp. 3243-3257

[2] Roshan-Zamir A., Iwai T., Fan Y.-H., Kumar A., Yang H.-W., Sledjeski L., Hamilton J., Chandramouli S., Aude A., Palermo S., 2019, A 56-Gb/s PAM4 receiver with low-overhead techniques for threshold and edge-based DFE FIR- and IIR-tap adaptation in 65-nm CMOS, IEEE Journal of Solid-State Circuits, Vol. 54, No. 3, pp. 672-684

[3] Wang D., Wang Z., Xu H., Wang J., Zhao Z., Zhang C., Wang Z., Chen H., 2022, A 56-Gbps PAM-4 wireline receiver with 4-tap direct DFE employing dynamic CML comparators in 65 nm CMOS, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 69, No. 3, pp. 1027-1040

[4] Prusty S. K., Surya V. K., Wary N., 2024, Energy efficient integrated summer and latch-based DFE with reduced tap loading, IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 71, No. 4, pp. 1779-1783

[5] Roshan-Zamir A., Elhadidy O., Yang H.-W., Palermo S., 2017, A reconfigurable 16/32 Gb/s dual-mode NRZ/PAM4 SerDes in 65-nm CMOS, IEEE Journal of Solid-State Circuits, Vol. 52, No. 9, pp. 2430-2447

[6] Schinkel D., Mensink E., Klumperink E. A. M., van Tuijl E., Nauta B., 2007, A double-tail latch-type voltage sense amplifier with 18 ps setup+hold time, Proc. of 2007 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 314-315

[7] Lee D., Lee D., Kim Y.-H., Kim L.-S., 2019, A 0.9-V 12-Gb/s two-FIR-tap direct DFE with feedback-signal common-mode control, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 27, No. 3, pp. 724-728

[8] Son S., Kim H., Park M.-J., Kim K.-H., Chen E.-H., Leibowitz B. S., Kim J., 2013, A 2.3-mW, 5-Gb/s low-power decision-feedback equalizer receiver front-end and its two-step, minimum bit-error-rate adaptation algorithm, IEEE Journal of Solid-State Circuits, Vol. 48, No. 11, pp. 2693-2704

[9] Lee D., Lee D., Kim Y.-H., Jeon H.-K., Kim B.-G., Kim L.-S., 2020, A 10.8 Gb/s quarter-rate 1 FIR 1 IIR direct DFE with non-time-overlapping data generation for 4:1 CMOS clockless multiplexer, IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 67, No. 1, pp. 67-71

[10] Chen K.-C., Kuo W. W.-T., Emami A., 2021, A 60-Gb/s PAM4 wireline receiver with 2-tap direct decision feedback equalization employing track-and-regenerate slicers in 28-nm CMOS, IEEE Journal of Solid-State Circuits, Vol. 56, No. 3, pp. 750-762

Seung-Heon An received the B.S. degree in electronic engineering from Inha University,
Incheon, South Korea, in 2024. He is currently pursuing an M.S. degree in electrical
and computer engineering with Inha University. His research interests include PLL/CDR,
equalizers, high-speed serial interfaces, and transceiver design for PAM signaling.
Jin-Ku Kang received the Ph.D. degree in electrical and computer engineering from
North Carolina State University, Raleigh, NC, USA, in 1996. From 1983 to 1988, he
was with Samsung Electronics, Inc., South Korea, where he was involved in memory design.
In 1988, he was with Texas Instruments, South Korea. From 1996 to 1997, he was with
Intel Corp., Portland, OR, USA, as a Senior Design Engineer, where he was involved
in high-speed I/O and timing circuits for processors. Since 1997, he has been with
Inha University, Incheon, South Korea, where he is currently a professor and leads
the System IC Design Laboratory in the Department of Electronics Engineering. His
research interests include high-speed/low-power mixed-mode circuit design for high-speed
serial interfaces.