A time-based low-power transceiver is proposed for a short-reach PCB interconnect that connects two chips placed close together on a printed circuit board (PCB). Low power is achieved by increasing the on-die termination (ODT) resistance, which reduces the I/O signaling power of the transmitter (TX). Because of the large ODT resistance, the short-reach PCB interconnect is approximated as a single-time-constant RC channel. The increase in inter-symbol interference (ISI) caused by the larger RC time constant of the channel is compensated by a 1-tap infinite-impulse-response (IIR) decision-feedback equalizer (DFE) at the receiver (RX). The RX circuit is a cascade of a voltage-to-time converter, an IIR DFE, an FIR DFE and a time comparator. The transceiver chip was implemented in a 65 nm CMOS technology; in tests with a 1.6-mm micro-strip line channel, the transceiver achieved a maximum data-rate of 12 Gb/s at a 0.8 V supply and a minimum energy efficiency of 0.37 pJ/b at 8 Gb/s and a 0.75 V supply.

## I. INTRODUCTION

Demand for a low-power, high-bandwidth interface between DRAM and processor keeps increasing with the advent of cloud computing and deep learning. Recently, high-bandwidth memory (HBM) was introduced to meet this demand ^{(1)}. HBM increases the memory bandwidth by using hundreds of parallel data lines, each running at a data-rate of a few Gb/s. Through-silicon-via (TSV) technology is used in HBM to build a short-reach interconnect with a small capacitive load on each data line. No termination resistance is used in HBM, which reduces the signaling power of the transmitter (TX) driver.

However, because the HBM technology is costly, a short-reach PCB interconnect between DRAM and ASIC chips can be used to implement an inexpensive deep-learning accelerator; a DRAM chip is placed on one side of a printed circuit board (PCB), an ASIC chip is placed on the opposite side, and the PCB with the two chips is then placed in a package for system-in-package (SIP) application ^{(2)}. This work was initiated to implement such a short-reach PCB interconnect economically without using HBM. In HBM, only CMOS inverters are used for the transmitter and receiver because the TSV capacitance is very small. The short-reach PCB interconnect of this work has a larger capacitance, to which the bonding-wire pad capacitance and the PCB transmission line capacitance both contribute. Because of the larger interconnect capacitance, this work requires termination and equalization, which are not necessary with HBM. The short-reach PCB interconnect can also be used for a general chip-to-chip interface on a PCB if the two signal pins to be connected are located very close to each other (Fig. 1(a)). When the interconnect length is much shorter than the wavelength of the highest signal frequency (0.35/T_{R}), it can be approximated as an RLC lumped circuit rather than as a transmission line ^{(3)}; T_{R} is the 10%-to-90% signal rise time. The short-reach PCB interconnect can use an on-die-termination (ODT) resistance much larger than 50 Ω; the ODT resistance at TX and receiver (RX) reduces the ringing caused by inductive components such as the bonding wire. The short-reach PCB interconnect with ODT is modeled in Fig. 1(b). R_{TX} and R_{RX} represent the ODT resistance at TX and RX, respectively; the total inductance (L) includes the inductance of the interconnect channel (L_{CH}) and the bonding wire (L_{BOND}), and the total capacitance (C) includes the capacitance of the interconnect channel (C_{CH}) and the chip pin (C_{PAD}).

Fig. 1. (a) Short-reach interface implemented on PCB, (b) Lumped equivalent circuit of short-reach PCB interconnect.

The TX driver (V_{TX.IN} and R_{TX}) in Fig. 1(b) is a voltage mode driver, which swings between 0 and V_{DD} (supply voltage). As V_{DD} is reduced, the TX power is reduced but the RX power is increased, because the voltage swing at the RX input is also reduced proportionally and hence a higher-gain RX circuit is required to recover digital data from the reduced-swing RX input. While the RX gain is high at high V_{DD} in the conventional voltage-based circuits, it is high at low V_{DD} in the time-based (TB) circuits ^{(4)}; this is due to the voltage-to-time converter (VTC) used as the pre-amplifier of the TB RX circuit. The VTC gain is proportional to C/g_{m}, which increases as V_{DD} is reduced. This property of the VTC helps to reduce the TX power as well as the RX power in the TB circuits. Several TB RX circuits have been published for low-power operation ^{(4-6)}.

When the transfer function of the interconnect has a single time constant, its single-bit response is an exponentially decaying waveform and hence its ISI can be compensated completely by a 1-tap IIR DFE. However, the single-bit response of the short-reach PCB interconnect (Fig. 1(a)) is a combination of an exponentially decaying waveform with a single time constant and a ringing waveform. As R_{TX} and R_{RX} in Fig. 1(b) are increased, the single-bit response approaches an exponentially decaying waveform with small ringing terms. In addition, large R_{TX} and R_{RX} reduce the TX power. In this work, a 1-tap IIR DFE is used at RX with large R_{TX} and R_{RX}.

Section II explains the transmission channel for the short-reach PCB interconnect used in this work. Section III describes the operating principle of the proposed TB IIR DFE circuit. Section IV presents the circuit implementation. Section V shows the measurement results. Section VI concludes this work.

## II. 1-TAP IIR DFE FOR SHORT-REACH PCB INTERCONNECT

To achieve a short-reach PCB interconnect, TX and RX chips are placed on a PCB by
using a chip-on-board (COB) package technique; TX and RX chips are connected through
a 1.6 mm micro-strip transmission line on the PCB (Fig. 1(a)). The 10%-to-90% rise time (T_{R}) of the signal applied to the micro-strip line is ~ 60 ps; the highest frequency
with significant energy is 5.83 GHz (0.35/T_{R}) with the wavelength of 25.7 mm ^{(3)}. Because the micro-strip line is shorter than one eighth the wavelength of the highest
frequency (3.2 mm), the transmission channel can be approximated as a lumped circuit
(Fig. 1(b)). R_{TX} and R_{RX} are the termination resistance of TX and RX, respectively. The capacitance (C) is
2.0 pF, which includes the chip pin capacitance (2×0.9 pF) and the micro-strip line
capacitance (0.2 pF). The inductance (L) is 1 nH, which is the sum of the series inductance
of two 0.5-mm long double bonding wires (0.5 nH) and the micro-strip line inductance
(0.5 nH). The ratio of R_{TX} to R_{RX} is set to 3 to 1 to keep the V_{RX.IN} swing between 0 and V_{DD}/4. The small channel swing reduces the dynamic power consumption of TX driver, but
requires a high-gain high-power RX front-end circuit in conventional transceiver circuits.
In the time-based RX of this work, the RX front-end circuit (VTC: voltage-to-time
converter) can achieve a high gain with low power compared to the conventional circuit,
and consumes the same power independent of input voltage swing. Therefore, in this
paper, the ratio of R_{TX} and R_{RX} was fixed at 3:1 to achieve 200 mV swing at V_{DD} = 800 mV. Besides, R_{TX} and R_{RX} were increased together to reduce static power as well; R_{RX} ranges from 80 Ω to 480 Ω in this work.
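The lumped-circuit criterion and the divider swing quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the authors' procedure: the effective permittivity `EPS_EFF = 4.0` is an assumption chosen because it reproduces the quoted 25.7 mm wavelength, and all function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def knee_frequency(rise_time_s):
    """Highest frequency with significant energy, 0.35 / T_R."""
    return 0.35 / rise_time_s


def wavelength(freq_hz, eps_eff):
    """Wavelength on the line for an assumed effective permittivity."""
    return C0 / math.sqrt(eps_eff) / freq_hz


def rx_swing(vdd, r_tx, r_rx):
    """Resistive-divider swing at the RX input for a 0-to-VDD voltage-mode driver."""
    return vdd * r_rx / (r_tx + r_rx)


T_R = 60e-12          # 10%-to-90% rise time from the text
EPS_EFF = 4.0         # assumption: not stated in the text
LINE_LENGTH = 1.6e-3  # micro-strip length from the text

f_knee = knee_frequency(T_R)
lam = wavelength(f_knee, EPS_EFF)
critical_length = lam / 8            # one eighth of the wavelength
is_lumped = LINE_LENGTH < critical_length
swing = rx_swing(0.8, 3 * 120.0, 120.0)  # R_TX = 3 * R_RX, V_DD = 0.8 V

print(f"f_knee = {f_knee / 1e9:.2f} GHz, wavelength = {lam * 1e3:.1f} mm")
print(f"lambda/8 = {critical_length * 1e3:.1f} mm, lumped: {is_lumped}")
print(f"RX swing = {swing * 1e3:.0f} mV")
```

Because the swing depends only on the ratio R_{TX}:R_{RX}, scaling both resistances together leaves the 200 mV swing unchanged while reducing the static current.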

The short-reach PCB interconnect (Fig. 1(b)) can be approximated as an RC channel with a single time constant of (R_{RX}||R_{TX})·C; the L/(R_{RX}+R_{TX}) time constant is not included because it is much smaller than the RC time constant. Using this property of the short-reach PCB interconnect, a 1-tap IIR DFE was used in this work to compensate for ISI.
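To see how much smaller the L/R time constant is, the two time constants can be tabulated for the R_{RX} settings used in this work. This is a minimal sketch using the lumped values quoted in this section (C = 2.0 pF, L = 1 nH, R_{TX} = 3·R_{RX}); the helper names are hypothetical.

```python
C_TOTAL = 2.0e-12   # total capacitance from the text, F
L_TOTAL = 1.0e-9    # total inductance from the text, H
R_RX_OPTIONS = [80.0, 96.0, 120.0, 160.0, 240.0, 480.0]  # ohms


def parallel(r1, r2):
    """Parallel combination of two resistances."""
    return r1 * r2 / (r1 + r2)


def time_constants(r_rx, ratio=3.0):
    """Return (RC time constant, L/R time constant) for R_TX = ratio * R_RX."""
    r_tx = ratio * r_rx
    tau_rc = parallel(r_tx, r_rx) * C_TOTAL
    tau_lr = L_TOTAL / (r_tx + r_rx)
    return tau_rc, tau_lr


table = {r_rx: time_constants(r_rx) for r_rx in R_RX_OPTIONS}
for r_rx, (tau_rc, tau_lr) in table.items():
    print(f"R_RX = {r_rx:5.0f} ohm: RC = {tau_rc * 1e12:6.1f} ps, "
          f"L/R = {tau_lr * 1e12:5.2f} ps")
```

For R_{RX} = 120 Ω this gives an RC time constant of 180 ps, the same value used in the behavior-level simulation of Section III.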

In this work, there are three limiting factors in increasing the data-rate. One is the linearity limit; the RX input eye opening must be located inside the linear input range of RX front-end circuit for proper DFE compensation. Another is the sensitivity limit; a non-zero eye opening is required at RX input. The other is the LC resonance limit; the data-rate must be lower than the LC resonance frequency of channel. To identify the three limiting factors, the channel single-bit response, ISI and the RX eye opening are derived in the following paragraphs.

The single-bit response of the short-reach PCB interconnect (Fig. 2) is a combination of an exponentially decaying term with a single time constant and an exponentially decaying ringing term. The single-time-constant term corresponds to the approximated RC channel, and the ringing term originates from the LC resonance at ~7.38 GHz. To verify the RC channel approximation, the channel single-bit response is compared between the original channel model, which includes a 1.6 mm-long lossy transmission line model, and the approximated RC channel (Fig. 2). A unit pulse is applied at TX; the rise and fall times are 60 ps each. The single-bit response is used instead of the impulse response to model the more realistic situation where the signals have non-zero rise and fall times. In Fig. 2, the red x is the single-bit response of V_{RX.IN} for the original channel and the blue dot is that for the approximated RC channel. Let G(z) be the single-bit response of the original channel and H(z) be the approximated single-bit response of the channel.

Fig. 2. Normalized single-bit response of original channel (red x) and approximated
RC channel (blue dot) with (a) R_{RX} = 240 Ω at 8 Gb/s, (b) R_{RX} = 120 Ω at 10 Gb/s.

##### (1)

$$ \mathrm{G}(\mathrm{z})=g_{0}+g_{1} \cdot \mathrm{z}^{-1}+g_{2} \cdot \mathrm{z}^{-2}+g_{3} \cdot \mathrm{z}^{-3} \cdots $$

##### (2)

$$ \mathrm{H}(\mathrm{z})=h_{0}+h_{1} \cdot \mathrm{z}^{-1}+h_{2} \cdot \mathrm{z}^{-2}+h_{3} \cdot \mathrm{z}^{-3} \cdots $$

Let g_{n} be the time-domain single-bit response of the short-reach PCB interconnect channel. h_{n} is the time-domain single-bit response of the channel with no inductance; h_{n} is an exponentially decaying waveform with a single time constant of (R_{TX}||R_{RX})·C. g_{n} is a combination of h_{n} and a ringing waveform. The difference between the real channel (g_{n}) and the approximated channel (h_{n}) is characterized by a ringing factor (f_{ringing}).

##### (3)

$$ f_{\text {ringing }}=\sum_{\mathrm{n}=1}^{\infty}\left|g_{\mathrm{n}}-h_{\mathrm{n}}\right| / \sum_{\mathrm{n}=1}^{\infty}\left|g_{\mathrm{n}}\right| $$

Fig. 3. Contour plot of ringing factor (f_{ringing}, Eq. (3)) for different values of data-rate and R_{RX}.

f_{ringing} represents the remaining ISI ratio after the 1-tap IIR DFE operation. With the increase of R_{TX} and R_{RX}, f_{ringing} is reduced and g_{n} approaches h_{n}. The f_{ringing} values were calculated from g_{n} and h_{n} values generated by simulation for different values of R_{RX} and data-rate; R_{TX} was set to 3·R_{RX} (Fig. 3). In the range of R_{RX} > 120 Ω and data-rate < 12 Gb/s, where f_{ringing} < 0.6, the transceiver of this paper was measured to work successfully with bit-error-rate < 10^{-12}.
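Eq. (3) is straightforward to evaluate once the two single-bit responses are available as sampled sequences. The sketch below is illustrative only: it assumes both sequences are truncated to the same finite length with the main cursor at index 0, and the example numbers are hypothetical, not simulation data from this work.

```python
def ringing_factor(g, h):
    """Eq. (3): sum over n >= 1 of |g_n - h_n|, divided by sum over n >= 1 of |g_n|.

    g and h are cursor-first lists (index 0 is the main cursor g_0 / h_0),
    truncated to the same finite length.
    """
    if len(g) != len(h):
        raise ValueError("g and h must have the same length")
    residual = sum(abs(gn - hn) for gn, hn in zip(g[1:], h[1:]))
    tail = sum(abs(gn) for gn in g[1:])
    return residual / tail


# Hypothetical three-tap responses, for illustration only.
g_example = [0.5, 0.3, 0.1]
h_example = [0.5, 0.2, 0.1]
f_example = ringing_factor(g_example, h_example)
f_matched = ringing_factor(g_example, g_example)

print(f"f_ringing (example) = {f_example:.3f}")
print(f"f_ringing (perfect RC match) = {f_matched:.3f}")
```

A value of 0 means the RC model reproduces the tail exactly, so the 1-tap IIR DFE would remove all post-cursor ISI; larger values indicate a larger share of ISI that the equalizer cannot cancel.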

The worst case ISI after the 1-tap IIR DFE operation (ISI_{EQ}) is the sum of absolute values of single-bit response errors from |g_{1} - h_{1}| to |g_{∞} - h_{∞}|.

##### (4)

$$ \mathrm{ISI}_{\mathrm{EQ}}=\sum_{\mathrm{n}=1}^{\infty}\left|g_{\mathrm{n}}-h_{\mathrm{n}}\right| $$

Without equalization, the eye opening at the RX input (V_{RX.IN}) is g_{0} - ISI_{NOEQ} ^{(7)}; ISI_{NOEQ} is the worst case ISI, which is the sum of absolute values of the tail single-bit responses from |g_{1}| to |g_{∞}|. With the 1-tap IIR DFE, ISI is reduced to ISI_{EQ} (Eq. (4)) and the RX eye is widened to g_{0} - ISI_{EQ}.

##### (5)

$$ \mathrm{EYE}_{\mathrm{EQ}}=g_{0}-\sum_{\mathrm{n}=1}^{\infty}\left|g_{\mathrm{n}}-h_{\mathrm{n}}\right| $$

g_{0} - 0.5 must be larger than - 0.5·LR so that ‘1’ can be located within the linear RX input range. The linearity limit is stated by Eq. (6).

To meet the sensitivity limit, the RX input eye must be larger than the RX input sensitivity (V_{RX.SENSITIVITY}) to recover the correct digital data at the RX output within 1 UI; the normalized V_{RX.SENSITIVITY} is estimated to be 0.12 in this work.
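The eye openings with and without the IIR DFE, and the comparison against the sensitivity of 0.12, can be illustrated numerically. The sketch below rests on simplifying assumptions that are not from the text: the single-bit response is that of an ideal rectangular pulse through a first-order RC channel, sampled at the end of each UI, and the real channel is taken to have no ringing (g_{n} = h_{n}). The bit time of 100 ps and RC of 180 ps correspond to the 10 Gb/s, R_{RX} = 120 Ω case.

```python
import math

V_SENS = 0.12  # normalized RX sensitivity quoted in the text


def rc_single_bit_response(bit_time, tau, n_taps):
    """Cursor-first samples of an ideal RC single-bit response (assumed model)."""
    a = math.exp(-bit_time / tau)   # decay per UI
    h0 = 1.0 - a                    # level reached at the end of the pulse
    return [h0 * a ** n for n in range(n_taps)]


def eye_without_eq(g):
    """g_0 - ISI_NOEQ, with ISI_NOEQ the sum of |g_n| for n >= 1."""
    return g[0] - sum(abs(gn) for gn in g[1:])


def eye_with_iir_dfe(g, h):
    """Eq. (5): g_0 minus the sum of |g_n - h_n| for n >= 1."""
    return g[0] - sum(abs(gn - hn) for gn, hn in zip(g[1:], h[1:]))


h = rc_single_bit_response(100e-12, 180e-12, 64)
g = list(h)  # hypothetical ringing-free channel

eye_noeq = eye_without_eq(g)
eye_eq = eye_with_iir_dfe(g, h)

print(f"eye without EQ  = {eye_noeq:+.3f}")
print(f"eye with IIR DFE = {eye_eq:+.3f}, meets sensitivity: {eye_eq > V_SENS}")
```

Under these assumptions the unequalized eye is closed (negative opening), while the equalized eye equals the main cursor g_{0} and exceeds the sensitivity limit; any ringing in a real channel would reduce the equalized eye by ISI_{EQ}.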

To meet the LC resonance limit, the data-rate must be much lower than twice the LC resonance frequency (7.38 GHz) to avoid a large ISI due to the ringing of the channel single-bit pulse response g_{n}.

## III. OPERATING PRINCIPLE OF TIME-BASED IIR DFE

The ISI of short-reach PCB interconnect is compensated by 1-tap IIR DFE because the
short-reach PCB interconnect can be approximated as a single-time-constant (RC) channel
if f_{ringing} is small. The loop delay of the IIR DFE is required to be < 1 UI in this work for
efficient implementation. TB RX circuits are more suitable to include the IIR DFE
than the voltage-based RX, because of the small loop delay; this is because the equalization
operation is performed in the TB RX using simple digital circuits with small capacitive
loading, such as inverters and NAND SR-latch ^{(4)}.

Because of the feedback loop delay in the 1-tap IIR DFE circuit, the 1^{st} tap ISI (h_{1}) cannot be compensated completely by the IIR DFE alone ^{(8, 9)}. As shown in Fig. 4, due to the feedback loop delay (T_{delay}), at t = 1 T the RC-filtered voltage (V_{IIR}) reaches only h_{1.IIR}, which is smaller than h_{1}. The remaining part (h_{1} ‒ h_{1.IIR}) is compensated by a 1-tap FIR DFE (h_{1.FIR} = h_{1} ‒ h_{1.IIR}).

The TB RX is implemented by cascading VTC, TB DFE and time comparator (TCMP) (Fig. 5). The VTC converts the RX input voltage (V_{RX.IN}) into a clock-like signal pair (CP, CN) which are two return-to-zero (RZ) signals
with the rising edges separated by a time interval of T_{C}; T_{C} is defined to be the time interval from the rising edge of CP to the rising edge
of CN that occurs after the falling edge of CK, and is proportional to (V_{RX.IN} - V_{REF}). For V_{RX.IN} = ‘1’, the rising edge of CP comes before CN; for V_{RX.IN} = ‘0’, the rising edge of CN comes before CP. ISI reduces |T_{C}|, the time interval between the two rising edges. The TCMP recovers a voltage level
RZ digital data (FP, FN) by identifying whose rising edge comes first. The equalization
in the time-domain (TB FIR and IIR DFE) generates another clock-like RZ signal pair
(OP, ON) such that the time interval (|T_{O}|) between the rising edges of OP and ON is large enough for TCMP to easily identify
‘1’ or ‘0’; T_{O} is defined to be the time interval from the rising edge of OP to the rising edge
of ON that occurs after the falling edge of CK. This basic operation of TB RX is described
in detail in ^{(4)}. The TB DFE of this work consists of a cascaded connection of a 1-tap IIR DFE and a 1-tap FIR DFE between VTC and TCMP; the FIR DFE compensates for the residual component (h_{1} - h_{1.IIR}). The FIR DFE is placed closer to the TCMP to ensure that the FIR loop-delay is < 1 UI; the FIR DFE fails if its loop-delay is > 1 UI, whereas the IIR DFE only suffers degraded performance if its loop-delay is > 1 UI. The TB IIR DFE accepts a clock-like signal pair (CP, CN) and a differential analog voltage (IIRP, IIRN) as input and generates another clock-like signal pair (DP, DN) as output, such that T_{D} (time interval from the rising edge of DP to that of DN) can be written as Eq. (8).

where V_{IIR} = IIRP - IIRN and A_{IIR} is the gain of the TB IIR DFE block. V_{IIR} is generated by an IIR filter that accepts the RZ decision data (FP, FN) as input; the IIR filter is a differential RC filter that has the same RC time constant as the channel. The IIR DFE widens the time difference T_{D} between the clock-like signal pair (DP, DN) such that |T_{D}| > |T_{C}| (Fig. 5(b)). The 1-tap FIR DFE further widens the time interval between the rising edges by generating a clock-like signal pair (OP, ON) such that |T_{O}| > |T_{D}|; T_{O} is the time interval from the rising edge of OP to that of ON.
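The voltage-to-time conversion and the time comparison can be summarized in a small behavioral model. This is a sketch under stated assumptions rather than the circuit behavior: A_{VTC} = 0.4 ps/mV and the 50 mV to 150 mV linear range are the values quoted elsewhere in this paper, V_{REF} = 100 mV is assumed to sit at the middle of the 0 to 200 mV swing, and hard clipping outside the linear range is a simplification.

```python
A_VTC_PS_PER_MV = 0.4            # VTC gain quoted in this paper
V_REF_MV = 100.0                 # assumption: mid-point of the 0-200 mV swing
LINEAR_RANGE_MV = (50.0, 150.0)  # linear input range quoted in this paper


def vtc(v_rx_in_mv):
    """Return T_C in ps, proportional to (V_RX.IN - V_REF) inside the linear range.

    Inputs outside the linear range are clipped (simplifying assumption).
    """
    lo, hi = LINEAR_RANGE_MV
    v = min(max(v_rx_in_mv, lo), hi)
    return A_VTC_PS_PER_MV * (v - V_REF_MV)


def tcmp(t_interval_ps):
    """Time comparator: 1 if the P edge leads (positive interval), else 0."""
    return 1 if t_interval_ps > 0 else 0


t_c_high = vtc(130.0)   # 30 mV above V_REF
t_c_low = vtc(90.0)     # 10 mV below V_REF
t_c_clip = vtc(200.0)   # beyond the linear range

print(t_c_high, tcmp(t_c_high))
print(t_c_low, tcmp(t_c_low))
print(t_c_clip, tcmp(t_c_clip))
```

In this picture the DFE stages act on the time interval before it reaches `tcmp`, enlarging its magnitude so that the sign is easier to resolve.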

The mathematical model of TB RX (Fig. 5) is summarized in Fig. 6; G(z) models the single-bit response of the real channel which connects TX and RX. H(z) is the single-bit response of an RC channel with a single time constant (RC). The combination of a 1-tap IIR DFE and a 1-tap FIR DFE (shaded part of Fig. 6) generates a feedback gain of H(z) - h_{0}; this gives the loop gain of {H(z) - h_{0}}·A_{VTC}·A_{CMP} in Fig. 6. The forward gain is G(z)·A_{VTC}·A_{CMP}. Because both V_{TX.IN} and V_{OUT} have a normalized range of [-1.0, 1.0], the VTC gain (A_{VTC}) and the time comparator gain (A_{CMP}) are adjusted to satisfy (9)

By using the loop gain, the forward gain and (9), the transfer function of the proposed transceiver can be derived as

##### (10)

$$ \frac{V_{\text {OUT }}(z)}{V_{\text {TX.IN }}(z)}=\frac{g_{0}+g_{1} \cdot z^{-1}+g_{2} z^{-2}+g_{3} z^{-3}+\cdot \cdot \cdot}{g_{0}+h_{1} \cdot z^{-1}+h_{2} \cdot z^{-2}+h_{3} \cdot z^{-3}+\cdot \cdot \cdot} $$
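
As a consistency check on Eq. (10), the closed-loop relation can be worked out from the loop gain and the forward gain stated above. The sketch below assumes that the DFE feedback is subtracted from the forward path (negative feedback) and writes A = A_{VTC}·A_{CMP} as shorthand; both are assumptions made for illustration, not notation taken from Fig. 6.

$$ V_{\text{OUT}}(z)=A \cdot\left[G(z) \cdot V_{\text{TX.IN}}(z)-\left\{H(z)-h_{0}\right\} \cdot V_{\text{OUT}}(z)\right] $$

$$ \frac{V_{\text{OUT}}(z)}{V_{\text{TX.IN}}(z)}=\frac{G(z)}{1 / A+H(z)-h_{0}} $$

Under these assumptions, the denominator equals g_{0} + h_{1}·z^{-1} + h_{2}·z^{-2} + ··· only when 1/A = g_{0}, that is, A_{VTC}·A_{CMP} = 1/g_{0}. If in addition h_{n} = g_{n} for all n ≥ 1, the ratio reduces to 1 and V_{OUT} reproduces V_{TX.IN}.
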

A behavior-level simulation is presented in Fig. 7(a) to explain the operation of the proposed TB IIR DFE. TX sends a 10 Gb/s digital signal of ‘10000000111111011111001’ (a part of PRBS-7) to RX through the channel; the channel refers to the short-reach PCB interconnect with R_{TX} = 360 Ω, R_{RX} = 120 Ω, L = 0.5 nH, C = 1.8 pF and a 1.6 mm micro-strip line. The RX input waveform (V_{RX.IN}) is obtained with HSPICE simulation. V_{RX.IN} is converted into a T_{C} signal by the VTC with A_{VTC} = 0.4 ps/mV; large ISI can be observed in V_{RX.IN} and the T_{C} signal. The 1-tap TB IIR DFE with the single-bit response of H(z) in Eq. (2) is used to generate a T_{O} signal; T = 10^{-10} sec (10 Gb/s) and RC = 180·10^{-12} sec.

Fig. 7. Behavior simulation of proposed short-reach interface (a) Timing diagram of
each signal node in Fig. 5, (b) Eye diagram of clock-like signals before EQ (T_{C}) and after EQ (T_{O}).

In the T_{O} signal, a clear separation into two groups can be observed (Fig. 7(a)). The effect of the IIR DFE operation can be observed more clearly in the eye patterns of the T_{C} and T_{O} signals (Fig. 7(b)); although the eye is almost closed in the T_{C} signal, the T_{O} signal has a clear eye opening of 12 ps, which is large enough for the TCMP to separate V_{OUT} into a ‘1’ or ‘0’ signal.
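The effect of the 1-tap IIR DFE on this bit pattern can be mimicked with a short discrete-time model. This is a simplified sketch, not the HSPICE-based simulation of Fig. 7: it works in a normalized voltage domain instead of the time domain, maps bits to ±1, assumes an ideal ringing-free RC channel sampled at the end of each UI, and ignores loop delay, so the FIR tap is not needed. All function names are hypothetical.

```python
import math

BITS = "10000000111111011111001"  # pattern quoted in the text
T_BIT = 100e-12                   # 10 Gb/s
TAU = 180e-12                     # RC time constant quoted in the text
N_TAPS = 40                       # longer than the pattern, so no truncation

a = math.exp(-T_BIT / TAU)        # decay per UI
h0 = 1.0 - a                      # main cursor of the assumed RC response
h = [h0 * a ** n for n in range(N_TAPS)]


def channel_output(symbols, taps):
    """Convolve the symbol sequence with the single-bit response."""
    out = []
    for k in range(len(symbols)):
        acc = 0.0
        for n, tap in enumerate(taps):
            if k - n >= 0:
                acc += tap * symbols[k - n]
        out.append(acc)
    return out


def slicer(x):
    """Decide +1 or -1 from the sign of the input."""
    return 1 if x > 0 else -1


def iir_dfe(received, main_cursor, decay):
    """1-tap IIR DFE: the feedback state is a first-order filter of past decisions."""
    decisions = []
    feedback = 0.0
    for r in received:
        d = slicer(r - feedback)
        decisions.append(d)
        # Recursion that yields sum over n >= 1 of h_n * d[k - n].
        feedback = decay * (feedback + main_cursor * d)
    return decisions


symbols = [1 if b == "1" else -1 for b in BITS]
received = channel_output(symbols, h)
raw = [slicer(r) for r in received]
equalized = iir_dfe(received, h0, a)

raw_errors = sum(1 for s, d in zip(symbols, raw) if s != d)
dfe_errors = sum(1 for s, d in zip(symbols, equalized) if s != d)
print(f"errors without EQ: {raw_errors}, with IIR DFE: {dfe_errors}")
```

The isolated ‘1’ after the run of seven ‘0’s is the kind of bit that an unequalized slicer misses in this model, while the recursive feedback removes the accumulated tail.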

## IV. CIRCUIT IMPLEMENTATION

The TB RX circuit (Fig. 6) was implemented in a quarter-rate architecture (Fig. 8) to increase the VTC gain as in the previous design ^{(4)}. The same circuit is used in this work as in ^{(4)} except for the IIR DFE and the IIR filter. A 1.6-mm micro-strip line connects TX and RX as a short-reach PCB interconnect channel. The characteristic impedance (Z_{O}) of the micro-strip line is ~50 Ω, but a relaxed termination is used at TX and RX to save TX power (R_{TX}, R_{RX} > Z_{O}); R_{RX} is set to one of six values (80, 96, 120, 160, 240, 480 Ω) by connecting up to six 480-Ω resistors in parallel. A voltage-mode TX driver is used with R_{TX} = 3·R_{RX}. Because reducing R_{RX} increases the maximum data-rate but also increases the TX power, R_{RX} is set to the largest possible value at a given data rate to save power. To keep R_{TX} = 3·R_{RX}, R_{TX} is implemented with a set of six 1440 Ω resistors; R_{TX} of each TX driver is set to 1440 Ω by two analog voltages (VBP, VBN) at V_{RX.IN} = 0.2 V and V_{DD} = 0.8 V. The same number of parallel resistors is turned on for R_{TX} and R_{RX} to keep R_{TX} = 3·R_{RX}. Four VTCs generate four clock-like signal pairs (CP0, CN0), (CP90, CN90), (CP180, CN180), (CP270, CN270) by using quarter-rate clocks (CK0, CK90, CK180, CK270), respectively. Each VTC is composed of two comparators with opposite offsets (±V_{OS}) as in ^{(4)}; V_{OS} = 100 mV, the linear input range is from 50 mV to 150 mV and A_{VTC} is 0.4 ps/mV in this work. The clock-like VTC output pairs are applied to the corresponding IIR DFE of the quarter-rate architecture; the IIR DFE generates another four pairs of clock-like signals (DP0, DN0), (DP90, DN90), (DP180, DN180), (DP270, DN270) by using a differential analog signal (V_{IIR} = IIRP - IIRN; the IIR filter output). The following FIR DFE accepts the IIR DFE output and generates another four pairs of clock-like signals (OP0, ON0), (OP90, ON90), (OP180, ON180), (OP270, ON270) by using the TCMP output pairs (FN270, FP270), (FN0, FP0), (FN90, FP90), (FN180, FP180), respectively. The FIR DFE output is applied to the corresponding TCMP input. The TCMP output pairs are also applied to the IIR filter, which generates the differential analog signal (V_{IIR} = IIRP - IIRN) used for the IIR DFE operation.
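The six termination settings follow directly from switching on one to six unit resistors in parallel. The sketch below tabulates them; the static-current column is a rough estimate that assumes R_{RX} is terminated to ground and the driver output sits at V_{DD} while sending a ‘1’, which is an illustrative assumption rather than a measured quantity. Function names are hypothetical.

```python
UNIT_R_RX = 480.0    # ohms, unit resistor at RX (from the text)
UNIT_R_TX = 1440.0   # ohms, unit resistor at TX (from the text)
V_DD = 0.8           # volts


def bank_resistance(unit_ohm, n_on):
    """Resistance of n_on identical unit resistors connected in parallel."""
    if not 1 <= n_on <= 6:
        raise ValueError("n_on must be between 1 and 6")
    return unit_ohm / n_on


settings = []
for n_on in range(1, 7):
    r_rx = bank_resistance(UNIT_R_RX, n_on)
    r_tx = bank_resistance(UNIT_R_TX, n_on)
    # Assumed static current while a '1' is driven: V_DD across R_TX + R_RX.
    i_static_ma = V_DD / (r_tx + r_rx) * 1e3
    settings.append((n_on, r_rx, r_tx, i_static_ma))

r_rx_values = [r_rx for _, r_rx, _, _ in settings]
ratios = [r_tx / r_rx for _, r_rx, r_tx, _ in settings]

for n_on, r_rx, r_tx, i_static_ma in settings:
    print(f"n = {n_on}: R_RX = {r_rx:5.0f} ohm, R_TX = {r_tx:6.0f} ohm, "
          f"I('1') ~ {i_static_ma:.2f} mA")
```

Turning on the same number of unit resistors on both sides keeps the 3:1 ratio, so the RX input swing stays at V_{DD}/4 for every setting.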

The IIR filter (Fig. 9(a)) multiplexes the four pairs of the TCMP output to generate an RC-filtered differential analog signal (V_{IIR} = IIRP - IIRN) ^{(10)}. The multiplexing is required because of the infinite range of the IIR DFE operation; the equalization operation of an IIR DFE block of the quarter-rate architecture is affected by all the previous decision data, including the nearest three preceding data, which are the TCMP outputs of other branches. The multiplexing is done by using gated NMOS input differential pairs with the quadrature signal used as the gating signal; the FP90 and FN90 signals are used for the FP0 and FN0 input signals. Because the TCMP outputs (FP0, FN0, ···) are active-low RZ signals, they are inverted to be used as the input signals of the NMOS input differential pairs. The gating signals (W90, W180, W270, W0) are generated by passing the quadrature signals through AND gates; clock signals are not used for gating because the time delay from the clock sampling (falling) edge at the VTC to the falling edge of the TCMP output (FPn, FNn) is not constant. Due to the series connection of the differential pair and the gating transistor, the current IH or IL is injected into each branch of the RF-CF low-pass filter during the conduction time interval of around 1 UI; when FPn - FNn < 0 (V_{RX.IN} < V_{REF}) as in Fig. 9(b), the differential output (V_{IIR} = IIRP - IIRN) decreases with time. The conduction time interval changes from 1 UI - jitter to 1 UI + jitter; the jitter refers to the TCMP output jitter. To compensate for the channel RC time constant of around 1 ns, the RC time constant of the RF-CF filter can be varied from 70 ps to 1.1 ns. RF is either a 1 kΩ or a 3 kΩ poly resistor and CF is a 4-b binary-weighted NMOS capacitor. By adjusting IH and IL, IIRP and IIRN swing between 0.5 V and 0.7 V to match the linear input voltage range of the IIR DFE circuit.

Fig. 9. (a) Circuit diagram of proposed IIR filter, (b) Timing diagram of gating signal and differential input signal.

The IIR DFE of Fig. 8 uses current-starved inverters as in the FIR DFE ^{(4)}, as shown in Fig. 10. T_{D} (time interval from the rising edge of DP to that of DN) must be linearly proportional to the differential analog voltage V_{IIR} as stated in Eq. (8); IIRP and IIRN swing differentially between 0.5 V and 0.7 V with the common mode voltage (VCM) of 0.6 V. By the operation of the IIR DFE circuit (Fig. 10), T_{D} can be derived as Eq. (11).

##### (11)

$$ T_{\mathrm{D}}=T_{\mathrm{C}}-\left(R_{\mathrm{ON} . \mathrm{P}}-R_{\mathrm{ON} . \mathrm{N}}\right) \cdot C_{\mathrm{M}} $$

T_{D} - T_{C} is linearly proportional to V_{IIR} (= IIRP - IIRN) for the V_{IIR} range of [- 0.2 V, + 0.2 V] within the error bound of 2.5 %. A_{IIR} of (8) can be derived as (12).

##### (12)

$$ A_{I I R}=\frac{1}{\mu_{\mathrm{n}} C_{\mathrm{ox}} \mathrm{W}_{1} / \mathrm{L}_{1}} \cdot \frac{C_{\mathrm{M}}}{\left\{V_{\mathrm{CM}}-V_{\mathrm{TH}}+\alpha \cdot\left(\mathrm{V_{DD}}-V_{\mathrm{TH}}\right)\right\}^{2}} $$

Fig. 11. Comparison of differential delay time between equation (lines, Eq. (11)) and simulation (symbols).

A_{IIR} of Fig. 11 ranges from 0.04 ps/mV to 0.16 ps/mV. VBIIR is set by a 7-bit DAC. Comparison of T_{D} − T_{C} between Eq. (11) (lines) and circuit simulation (symbols) shows an average relative error of 4.6 % (Fig. 11). VBFIR, VBIIR and the RC time constant of the RF-CF filter are manually controlled in this work.

## V. MEASUREMENT RESULTS

The proposed TB RX was implemented in a 65-nm CMOS process (Fig. 12(a)). Chip areas of TX and RX are 1000 µm^{2} and 7700 µm^{2}, respectively. The TX and RX chips are placed on PCB through COB (Fig. 12(b)); a double-bonding technique was used between a die chip pad and channel to reduce
the bonding-wire inductance by half. The TX chip receives full rate data from a BER
tester and transmits the data to the RX chip through a 1.6 mm micro-strip line (Fig. 12(c)). The RX returns quarter-rate data to the BER tester. The channel eye diagrams were
measured by using an active probe (0.35 pF, 25 kΩ) for R_{RX} =160 Ω, 120 Ω, 96 Ω (Fig. 13); smaller R_{RX} gives larger eye opening.

Fig. 14. Measured maximum data-rate vs. 1/R_{RX} with (red circle) and without (blue X) IIR DFE. VTC linearity limit (dotted line),
and RX sensitivity limit (broken line) are derived from Eq. (6), (7) with simulated single-bit responses. V_{DD} = 0.8 V.

Fig. 15. Measured energy efficiency of proposed TX and RX circuits vs. 1/R_{RX}; V_{DD} = 0.8 V (red circles), V_{DD} = 0.75 V (blue X).

The maximum data-rate is increased (BER < 10^{-12}) by the 1-tap IIR DFE for the entire range of R_{RX} (Fig. 14) at V_{DD} = 0.8 V; red circles (O) are the measured highest data-rates achieved with the 1-tap IIR DFE and blue crosses (X) are those without equalization. As discussed in Section II, the measured maximum data-rates with the 1-tap IIR DFE are limited by either the VTC linearity limit (Eq. (6)) or the RX sensitivity limit (Eq. (7)). The measured data rate was limited to 12 Gb/s because of the BER tester limit. The minimum energy efficiency of 0.367 pJ/b was achieved at R_{RX} = 240 Ω, data-rate = 8 Gb/s and V_{DD} = 0.75 V. The maximum data-rate of 12 Gb/s was achieved at R_{RX} = 120 Ω; the energy efficiency was 0.446 pJ/b. The energy efficiency ranges from 0.367 pJ/b to 0.49 pJ/b (Fig. 15); the six red circles are the same as those in Fig. 14 with V_{DD} = 0.8 V.
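The quoted energy-efficiency figures can be converted into the total transceiver power they imply, since energy per bit multiplied by data-rate gives power. This is plain arithmetic on the numbers above, offered as a sketch; the resulting power values are derived here and are not separately reported measurements.

```python
def total_power_mw(energy_pj_per_bit, data_rate_gbps):
    """Power implied by an energy-per-bit figure: pJ/b times Gb/s gives mW."""
    return energy_pj_per_bit * data_rate_gbps


# Operating points quoted in the text.
p_min_energy = total_power_mw(0.367, 8.0)    # R_RX = 240 ohm, V_DD = 0.75 V
p_max_rate = total_power_mw(0.446, 12.0)     # R_RX = 120 ohm

print(f"8 Gb/s point:  {p_min_energy:.3f} mW")
print(f"12 Gb/s point: {p_max_rate:.3f} mW")
```

The 12 Gb/s point draws more power than the 8 Gb/s point both because of the higher data-rate and because the smaller termination resistance increases the signaling current.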

The measured bathtub curves demonstrate the successful operation of the IIR DFE at
R_{RX} = 240 Ω and 8 Gb/s (Fig. 17(a)) and R_{RX} = 120 Ω and 12 Gb/s (Fig. 17(b)). With the help of relaxed termination, the power consumption of TX (pre-driver +
main-driver) is greatly reduced to 23 % of the total transceiver power (Fig. 16); the TX power is reduced to 44 % of the 50-Ω terminated TX ^{(5)}.

The proposed TB transceiver is compared with the previous works with the same technology node of 65 nm (Table 1). The TX power is 37 % of the best previous value.

Fig. 17. Measured bathtub curves (a) R_{RX} = 240 Ω, data-rate = 8 Gb/s and V_{DD} = 0.75 V, (b) R_{RX} = 120 Ω, data-rate = 12 Gb/s and V_{DD} = 0.8 V.

## VI. CONCLUSIONS

To improve the energy efficiency of a time-based transceiver with a short-reach PCB interconnect, the RX termination resistance was increased to 120 Ω or 240 Ω, with the TX termination resistance being three times the RX termination resistance. A voltage mode driver is used at TX for low power. A 1-tap IIR DFE was used at RX to compensate for the increased channel ISI due to the increased termination resistance. A 1.6 mm micro-strip transmission line was modeled as an RC channel with a single time constant because it is shorter than the critical length for transmission line analysis, which is 3.2 mm for a pulse signal with a 10 %-to-90 % rise time of 60 ps. The 1-tap IIR DFE was added to the time-based transceiver with a 1-tap FIR DFE at RX ^{(4)}. The proposed time-based transceiver chip fabricated in a 65 nm CMOS process achieved the minimum energy efficiency of 0.367 pJ/b at 8 Gb/s with an RX termination resistance of 240 Ω, and the maximum data rate of 12 Gb/s with an energy efficiency of 0.446 pJ/b and an RX termination resistance of 120 Ω. The TX and RX chip areas are 1000 µm^{2} and 7700 µm^{2}, respectively.

### ACKNOWLEDGMENTS

This work was supported in part by Institute of Information & Communications Technology Planning & Evaluation (IITP) Grant funded by the Ministry of Science and ICT (MSIT), Korea (No. 2019001394, Automatic Design Generation of Ultra-High-Speed I/O Circuit to support Intelligent Semiconductor Devices) and in part by Samsung Electronics.

## Author

Min-Kyun Chae received the B.S. and M.S. degrees in electronic and electrical engineering from the Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2012 and 2014, respectively, where he is currently pursuing the Ph.D. degree in electronic and electrical engineering.

His current research interests include high-speed low-power I/O circuits.

Seung-Jun Bae received the B.S. and Ph.D. degrees in electrical engineering from the Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2000 and 2005, respectively.

In 2005, he joined Samsung Electronics, Hwaseong, South Korea, where he was involved in the design of high-bandwidth DRAM such as GDDR5, LPDDR4/4X, DDR4, HBM2, and GDDR6.

From 2013 to 2014, he was a Visiting Scientist with the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA.

He is currently a Vice President of the Mobile/Graphic DRAM Design Group.

His current research interests include high-speed interface circuits, signal/power integrity, high-speed analog-to-digital converters, and next-generation memory architecture.

Dr. Bae has served on the Technical Program Committees of the IEEE International Solid-State Circuits Conference (ISSCC) since 2016.

Jung-Hwan Choi was born in Daegu, South Korea, in 1968.

He received the B.S. degree in electrical engineering from Kyungpook National University, Daegu, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, South Korea, in 1992 and 1997, respectively.

In 1997, he joined Samsung Electronics, Hwaseong, South Korea, where he was involved in the design of Rambus, XDR DRAM, and high-speed I/O interface for memory applications.

He is currently a Master with Samsung Electronics, where he is responsible for the design of DRAM interface and the development of high-speed DRAM interfaces for the next generation, including LPDDRx and DDRx.

His current research interests include the design of monolithic microwave IC, high-speed memory, and high-frequency measurement.

Kwang-Il Park received the B.S., M.S., and Ph.D. degrees in electrical and electronic engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 1993, 1995, and 1999, respectively.

He joined LG Semicon Corporation Ltd., Seoul, South Korea, in 1999, where he was involved in the Rambus DRAM and PLL.

Since 2003, he has been with Samsung Electronics, Hwaseong, South Korea.

He is currently a Senior Vice President with the DRAM Design Division.

His current research interests include high-speed, high-density, and low-power DRAM and interface design.

Jung-Bae Lee was born in Seoul, South Korea, in 1967.

He received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, in 1989, 1991, and 1995, respectively.

He joined the DRAM Design Team, Samsung Electronics, Hwaseong, South Korea, in 1995, as a Circuit Design Engineer, where he participated in the development of various DRAM products, including DDR, DDR2, DDR3, GDDR, LPDDR2, and LPDDR3.

He became the Head of the DRAM Design Team in 2012, the Memory Product Planning and Application Engineering Team in 2014, and Quality Assurance in 2017.

His leadership through various backgrounds, including design, product planning, and quality assurance, enhances the overall completeness of Samsung memory products.

Since 2019, he has been leading DRAM product and technology.

His research interests include the design of high-speed low-power architecture for the next-generation memory and noise phenomena in devices.

Hong-June Park (M’88-SM’13) received the B.S. degree in electronic engineering from Seoul National University, Seoul, South Korea, in 1979, the M.S. degree from the Korea Advanced Institute of Science and Technology, Daejeon, South Korea, in 1981, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, CA, USA, in 1989.

From 1981 to 1984, he was a CAD engineer with ETRI, Daejeon.

From 1989 to 1991, he was a Senior Engineer with the TCAD Department of INTEL, USA.

In 1991, he joined the Electronic and Electrical Engineering Department as a Faculty Member, Pohang University of Science and Technology, Pohang, South Korea, where he is currently a Professor.

His current research interests include CMOS analog circuit design such as high-speed interface circuits, ROIC of touch sensors, and analog/digital beamformer circuits for ultrasound medical imaging.

Prof. Park is a member of IEEK.

He served as the Editor-in-Chief of the Journal of Semiconductor Technology and Science, an SCIE journal from 2009 to 2012, as the Vice President of IEEK in 2012, and as a Technical Program Committee Member of ISSCC, SOVC, and A-SSCC for several years.