KongByeong Yong1,2
-
(Division of Electrical, Electronic, and Control Engineering, Kongju Na- tional University,
Cheonan 31080, South Korea)
-
(Institute of IT Convergence Technology, Kongju National University, Cheonan 31080,
South Korea)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
Fault tolerance, interleave division multiple access (IDMA), interleaver, multiuser detection (MUD), nonorthogonal multiple access (NOMA)
I. INTRODUCTION
Interleave division multiple access (IDMA) [1] is one of nonorthogonal multiple access (NOMA) techniques. It has accommodated multiple
users in various communication systems including the 5G and beyond, the Internet of
Things (IoT), and unmanned aerial vehicle (UAV) networks [2,3].
Fig. 1 illustrates an IDMA system for ${U}$ users [1]. It consists of ${U}$ transmitters associated with ${U}$ respective users, the single
receiver that accommodates all the ${U}$ users, and the multiple access channel that
bridges the transmitters and the receiver. While all the transmitters employ the same
spreader, each of them adopts a distinct interleaver with a unique interleaving pattern
different from the others, which is the key principle of enabling the IDMA. The subscript
in the name of a block, e.g., interleaver${}_{u}$, indicates that the component is
solely for the ${u}$th user. The receiver or the multiuser detector is subdivided
into the elementary signal estimator (ESE) and ${U}$ user-specific processing blocks
(UPBs). The ESE first calculates the log-likelihood ratios (LLRs) for the chips sent
from the users, and UPB${}_{u}$ refines the LLRs associated with the ${u}$th user
and feeds them back to the ESE. The ESE then recalculates the LLRs for UPBs based
on the statistics of all the users' LLRs from UPBs. In such a manner, the ESE and
UPBs exchange their LLRs several times to improve the reliability.
As stated above, to enable simultaneous accesses, every user transmits its data by
employing its own interleaver of a distinct interleaving pattern. Then, the iterative
multiuser detector reconstructs the original data from ${U}$ users by employing ${U}$
interleaver-deinterleavers pairs, each of which corresponds to the distinct pattern.
As emphasized with gray in Fig. 1, the interleavers are integral in implementation of the systems.
Fig. 1. IDMA system for U users.
There have been numerous interleaver architectures in the literature. In [4], efficient methods to generate the interleavers were presented. The group random
permutation was used in [5] to constitute interleaving patterns. In [6,7], partially parallel interleavers were developed. Common subexpressions were eliminated
in [8]. The algebraic interleaving formula and its multi-stage architecture were proposed
in [9,10], and have been widely employed [3,7,10,11,12,13,14,15,20]. Recently, [16] replaced the multiple stages by a fixed-latency look-up table (LUT), and [17] eliminated the conversion to further mitigate the latency.
All the above architectures, however, may not be suitable for harsh environments [3,15], where they may experience a single-event upset that causes the bits in registers
to flip [18]. For example, tactical surveillance UAVs are usually subjected to jamming and cosmic
rays [3]. Since the multiuser detection is performed chip-by-chip according to an interleaving
pattern, it is of paramount importance to keep track of the current index at all times.
Under such circumstances, the following contributions are showcased in this paper.
• For the first time in the literature, we explicitly quantify how problematic the
errors in interleavers are, and propose a fault-tolerant interleaver to efficiently
deal with them.
• More specifically, to protect the interleaving index in the register from soft
errors, an encoder and a decoder associated with an error-correcting code (ECC) are
added.
• Furthermore, to mitigate the latency overhead of such an architecture, the encoder
is merged with an essential logic that precomputes the next index.
• As a result, the proposed interleaver can tolerate soft errors while suppressing
the latency overhead.
The remainder of this paper is organized as follows. Section II reviews the existing
interleavers. In Sections III and IV, the proposed fault-tolerant interleaver is presented
and evaluated, respectively. Concluding remarks are made in Section V.
II. EXISTING ARCHITECTURES
The multi-stage algebraic interleaver [10], which have been widely used in the literature [3,7,10,11,12,13,14,15,20], is drawn in Fig. 2. The up-counter first puts a sequence of ${j} = 0$, $1$, ..., ${J}-1$ at each clock
cycle, where ${J}$ is the interleaving length. The subsequent ${S}$-stage interleaver
converts ${j}$ into the interleaved index $\pi_{uS}(j)$ for the ${u}$th user, where
$\pi_{uS}(j)$ is the ${S}$-time composition of $f_{us}(j)$ in [9] for ${s} = 1$, $2$, ..., ${S}$. More precisely,
where $K_{us}$ is a randomly chosen odd number for the ${u}$th user in the ${s}$th
stage, and ${J}$ is a power of two. Each stage realizes (2) with two multipliers and
one adder.
The ${S}$-time composition of $f_{us}(j)$ in Fig. 2 may suffer from a high latency, as it calculates the intermediate values of $\pi_{uS}(j)$
for all $1 \le s < S$ on the fly. Fig. 3 depicts the fixed-latency architecture [16] in which the multiple stages are replaced by a single LUT that readily returns $\pi_{uS}(j)$
corresponding to $j$. While the overall latency of Fig. 2 increases with ${S}$, that of Fig. 3 remains the same.
Fig. 2. Multi-stage architecture [10].
Fig. 3. LUT-based architecture [16].
Both of the above architectures [10,16] first generate $\{j\}$, and then convert it into $\{ \pi_{uS}(j)\}$. Such an approach
may not be favorable in low-latency IDMA systems, as several operations of the system
cannot be started until $\pi_{uS}(j)$ is available [1,12]. Accordingly, the conversion-less structure [17] in Fig. 4 was developed. It consists of a register that holds the current index $\pi_{uS}(j)$
and the next-index (NI) logic that precomputes the next index $\pi_{uS}(j+1)$ for
the following cycle. In such a configuration, the output of the register is immediately
$\{\pi_{uS}(j)\}$ without any further manipulations.
Fig. 4. Conversion-less architecture [17].
In summary, the key features of the interleaver architectures are compared in Table 1. The dependence on ${S}$ of [10], which elongates the overall latency, is resolved in [16]. The sequential-to-interleaved domain conversion, which defers the availability of
$\pi_{uS}(j)$, is removed in [17]. The vulnerability to errors, which may corrupt the entire multiuser detection procedure,
will be remedied by the proposed fault-tolerant interleaver to be unveiled in Section
III. Note that the proposed interleaver inherits all the advantages of the prior arts,
and serves as the only one with the fault tolerance.
Table 1. Key features of interleaver architectures.
III. PROPOSED ARCHITECTURE
This section first derives the baseline fault-tolerant interleaver. Subsequently,
the overhead of empowering the fault tolerance is alleviated by integrating the two
components therein.
1. Baseline Fault-Tolerant Architecture
Although the existing interleavers may operate well in gentle environments, none of
them is prepared for soft errors that may occur in harsh surroundings. Hence, we develop
the fault-tolerant interleaver by renovating the state-of-the-art structure in Fig. 4 with the minimal interleaving latency [17]. Since the register in Fig. 4 is prone to soft errors, we protect it by employing an error-correcting code. In
other words, the next index $\pi_{uS}(j+1)$ is encoded before being stored into the
register, and decoded after being retrieved. The corresponding architecture is drawn
in Fig. 5, where the encoder and the decoder are associated with a linear block code of generator
matrix $\mathbf{G}$ and parity-check matrix $\mathbf{H}$. Let us trace the datapath
starting from the current index $\pi_{uS}(j)$. $\mathbf{b}(j)$ is the binary representation
of $\pi_{uS}(j)$. Given $\mathbf{b}(j) = \pi_{uS}(j)$, the next-index (NI) logic [17] puts $\mathbf{b}({j}+1) = \pi_{uS}({j}+1)$. The encoder is a cluster of XOR gates
[19] that computes the codeword $\mathbf{c}(j+1) = \mathbf{b}({j}+1)\mathbf{G}$ for the
register. While the register in Fig. 4 holds $\pi_{uS}(j+1)$ of $\log_{2} J$ bits, the register in Fig. 5 holds $\mathbf{c}(j+1)$ of $\log_{2} J + p$ bits, where $p$ is the number of parity
bits. When the clock ticks, a new cycle starts. $\mathbf{r}({j})$ is a retrieved codeword
that may have bit errors. The decoder is of a common structure [19] that first calculates the syndrome $\mathbf{s}(j) = \mathbf{r}(j)\mathbf{H}^{T}$,
detects the error pattern $\mathbf{e}(j)$ corresponding to $\mathbf{s}(j)$, and restores
the error-corrected codeword $\mathbf{c}(j) = \mathbf{r}(j) \oplus \mathbf{e}(j)$,
where $\oplus$ denotes the Galois-field addition or the XOR operation. Due to the
systematic property of the code, a $\log_{2} J$-bit subset of $\mathbf{c}(j)$ is equal
to $\mathbf{b}(j) = \pi_{uS}(j)$.
Fig. 5. Fault-tolerant architecture with separate NI logic and encoder.
2. Integration of NI Logic and Encoder
While the inclusion of the encoder and decoder makes Fig. 5 tolerant of errors, the overall latency is inevitably increased. To alleviate such
an overhead, we integrate the functionalities of the NI logic and the encoder, and
the resulting architecture is depicted in Fig. 6. As its name implies, the encoded NI logic therein puts $\mathbf{c}(j+1)$ for given
$\mathbf{b}(j)$. The construction of the logic is exemplified for ${J} = 8$, ${S}
= 3$, $K_{u1} = 3$, $K_{u2} = 5$, $K_{u3} = 7$, and
in Fig. 7, where $\pi_{uS}({j})$, $\mathbf{b}({j})$, and $\mathbf{c}({j})$ are enumerated for
all ${j} = 0$, $1$, ..., ${J}-1$. Note that the systematic property of $\mathbf{G}$
ensures that the $\log_{2} {J} = 3$ rightmost bits of $\mathbf{c}({j})$ are the same
as $\mathbf{b}({j})$. When the logic takes $\mathbf{b}(0) = 000$, for instance, it
returns $\mathbf{c}(1) = \mathbf{b}(1)\mathbf{G} = 100011$, as paired by the red segment.
For $\mathbf{b}(1) = 011$, it puts $\mathbf{c}(2) = \mathbf{b}(2)\mathbf{G} = 111001$,
as bound by the blue segment. In the same manner, all the pairs of $\mathbf{b}({j})$
and $\mathbf{c}({j}+1)$ are tabulated as the input and the output of the combinational
logic, respectively.
Fig. 6. Fault-tolerant architecture with the encoded NI logic.
Fig. 7. Construction of the encoded NI logic.
Note that all the interleavers in Figs. 2-6 provide the same output $\pi_{uS}(j)$ for $j = 0$, $1$, ..., $J-1$. This indicates
that the error-correcting capability of the proposed architecture is completely transparent,
making it possible to replace the error-prone interleavers in IDMA systems without
any modification of the interface.
IV. EVALUATION
This section assesses how robust the proposed interleaver is in harsh environments
in terms of bit-error rates (BERs), and how practical it is in terms of implementation
results.
1. BER Performance
Fig. 8 exhibits the BERs of the iterative multiuser detector in Fig. 1, where $p_{e}$ denotes the bit-error probability of the register in an interleaver.
The simulations were conducted for the most prevalent set of parameters in the literature
[3,7,8,12,13,14,15,16,17,20,21], i.e., the interleaving length $J = 8192$, the spreading factor $P = 16$, the number
of users $U = 16$, and the number of maximally allowed iterations $I = 6$. Fig. 8(a) corresponds to the case when no ECC is employed. As shown, $p_{e} \ge 10^{-4}$ severely
deteriorates the BER, hindering the practical use. $p_{e} \le 10^{-5}$ can be regarded
as near-optimal. Figs. 8(b) and 8(c) are the cases when at most one and two errors among $\log_{2} J$ bits
in the register can be corrected, respectively. In Figs. 8(b) and 8(c), only $p_{e} \ge 10^{-2}$ remains problematic, while $p_{e} \le 10^{-3}$
is fully operational. By virtue of one additional error-correcting capability, $p_{e}
= 10^{-2}$ in Fig. 8(c) demonstrates a significant improvement over the counterpart in Fig. 8(b), despite still not robust enough to be used in practice. Although the interleavers
with higher error-correcting capabilities may tolerate the extreme environments of
$p_{e}\ge 10^{-2}$, such topologies will draw too many resources in exchange for the
marginal gains. Accordingly, in Section IV.2, we will consider a single-error-correcting
code of the minimal overhead yet with a significant fault tolerance.
Fig. 8. Bit-error rates (BERs) when (a) zero, (b) one, and (c) two errors in the register
of an interleaver can be corrected.
Fig. 9. Orders of processing when errors occur (a) zero, (b) one, and (c) two times
during $J = 16$ cycles.
Before moving on to the next analysis, an interesting phenomenon that contradicts
the expectation can be observed from Figs. 8(a) and 8(b), e.g., the BER of $p_{e} =10^{-1}$ is slightly lower than that of $p_{e}
= 10^{-2}$. The underlying reason is exemplified for $J = 16$ in Fig. 9, where the hexadecimal numbers are the cycle counts. Fig. 9(a) is an error-free case that the cycle counts from the left to the right are perfectly
in an order of the sequential indices ${j} = 0$, $1$, ..., $J- 1$. Fig. 9(b) exemplifies that an error occurs after the 4th cycle, and ${L} = 3$ indices are bypassed.
As the index reaches ${J}- 1$, it starts back from 0 due to the overflow, and wastes
${L}$ cycles to redundantly process the ${E}$ indices at the front. After all ${J}$
cycles, the interleaver fails to secure the data of ${L}$ indices, degrading the BER.
On the other hand, Fig. 9(c) corresponds to the case that errors occur twice. After the 4th cycle, the first error
bypasses ${L} = 3$ indices in the same manner as Fig. 9(b). After the 11th cycle, the second error fortunately returns back to the middle of
the bypassed indices, reducing the number of missing data to be less than ${L}$. Then,
the BER deterioration can be slightly alleviated. As exemplified, while the case of
only one error has no way to recover ${L}$ lost indices, a few more errors may grant
us chances to make up for some previous mistakes.
In summary, as shown in Fig. 8, the BER of the multiuser detector is drastically deteriorated in harsh environments
if the interleavers are not fault-tolerant. On the other hand, the error-correcting
capability of the proposed architecture makes it possible to endure the harsh environments
in practice, sustaining the BER to be near-optimal.
Table 2 summarizes the implementation results of the interleavers in a 65-nm CMOS process
for ${S} = 3$ and ${J} = 2^{9}$, $2^{10}$, $2^{11}$, $2^{12}$, $2^{13}$. Table 3 compiles the results for ${J} = 2^{13}$ and $S = 3$, $4$, $5$, $6$, $7$. The parameters
have been widely adopted in the literature [3,7,8,10,11,12,13,14,15,16,17,20,21]. Synopsys Design Compiler was used to synthesize the designs and to measure the following
metrics. $t_{\pi1}$, $t_{\pi2}$, and $t_{\rm FP}$ are important latency metrics to
be defined and scrutinized soon. Equivalent gate counts (EGCs) were calculated by
regarding a two-input NAND gate as one. The switching activities were first extracted
from the testbenches of practical operation scenarios in the switching activity interchange
format (SAIF). Then, the SAIF file was back-annotated when measuring power consumptions
in Synopsys Design Compiler. All the metrics were averaged over 16 different sets
of $\{K_{us}\}$. As a matter of course, only the proposed ones in Figs. 5 and 6 are fault-tolerant as they are protected by the single-error-correcting Hamming code
for $\log_{2} J$ bits.
Table 2. Implementation results for $S = 3$ and $J = 2^9$, $2^{10}$, $2^{11}$, $2^{12}$,
$2^{13}$.
Table 3. Implementation results for $J = 2^{13}$ and $S = 3$, $4$, $5$, $6$, $7$.
2. Implementation Results
To analyze the latencies, let $t_{\rm C2Q}$ and $t_{\rm SU}$ denote the clock-to-Q
delay and the setup time of the register, respectively. Let $t_{\rm ENC}$, $t_{\rm
DEC}$, $t_{\rm NIL}$, and $t_{\rm ENIL}$ indicate the delays of the encoder, the decoder,
the NI logic, and the encoded NI logic, respectively. $t_{\pi1}$ is the latency from
the clock edge to the time that error-prone $\pi_{uS}(j)$ becomes ready. Since Figs. 4-6 are all conversion-less [17], $t_{\pi1}$ of them are equal to the momentary tC2Q, and are much lower than those
of Figs. 2 and 3. Note that the output of the register in Figs. 5 and 6, $\mathbf{r}(j)$, consists of error-prone $\pi_{uS}(j)$ and the parity bits, as the
code is systematic. $t_{\pi2}$ is the latency from the clock edge to the time that
error-corrected $\pi_{uS}(j)$ becomes available. $t_{\pi2}$ can be measured only for
the fault-tolerant ones, and $t_{\rm DEC} = t_{\pi2} - t_{\pi1}$. Since the decoders
in Figs. 5 and 6 are not different from each other, $t_{\pi2}$ of them are almost the same. $t_{\rm
FP}$ is the latency of the entire feedback path. $t_{\rm FP}$ of Figs. 4, 5, and 6 can be formulated as $t_{\rm C2Q} + t_{\rm NIL} + t_{\rm SU}$, $t_{\rm C2Q} + t_{\rm
DEC} + t_{\rm NIL} + t_{\rm ENC} + t_{\rm SU}$, and $t_{\rm C2Q} + t_{\rm DEC} + t_{\rm
ENIL} + t_{\rm SU}$, respectively. The inevitable increase of $t_{\rm FP}$ due to
the adoption of the ECC is indeed suppressed by the encoded NI logic that substitutes
the series of the NI logic and the encoder, as $t_{\rm FP}$ of Fig. 6 is shorter than that of Fig. 5. In exchange for such a benefit, Fig. 6 occupies a larger area and dissipates a higher power than Fig. 5. Compared with the overall complexity of the entire multiuser detector, however,
the overhead is not significant, as the detector includes hundreds of kilo-gates and
even those gates are overwhelmed by the vast area of internal memories [7,10,11,12,13,14,15]. Accordingly, the proposed scheme can be a promising and practical solution to effectively
overcome the harsh environments.
V. CONCLUSIONS
The fault-tolerant interleaver combined with the error-correcting code has been presented
for IDMA systems operating in harsh environments. The latency overhead has been alleviated
by merging the encoder with the NI logic. While the proposed architecture has been
exemplified for the Hamming codes only, any linear block code can be adopted to further
enhance the fault tolerance.
ACKNOWLEDGMENTS
This work was supported by the National Research Foundation (NRF) of Korea under
Grant 2022R1I1A3 064200 funded by the Ministry of Education, and Grant 2020R1G1A1102620
funded by the Ministry of Science and ICT. Electronic design automation (EDA) tools
were provided by the IC Design Education Center (IDEC), South Korea.
References
L. Ping, L. Liu, K. Wu, and W. K. Leung, ``Interleave-division multiple-access,''
IEEE Transactions on Wireless Communications, vol. 5, no. 4, pp. 938-947, Apr. 2006.

C. Zhang, Y.-H. Huang, F. Sheikh, and Z. Wang, ``Advanced baseband processing algorithms,
circuits, and implementations for 5G communication,'' IEEE Journal on Emerging and
Selected Topics in Circuits and Systems, vol. 7, no. 4, pp. 477-490, Dec. 2017.

B. Y. Kong, ``Bitwise early termination of multiuser detection for IDMA systems,''
IEEE Communications Letters, vol. 25, no. 9, pp. 2998-3002, Sep. 2021.

I. Pupeza, A. Kavčić, and L. Ping, ``Efficient generation of interleavers for IDMA,''
Proc. of IEEE International Conference on Communications (ICC), Istanbul, Turkey,
pp. 1508-1513, Jun. 2006.

W. Belaoura, M. Djeddou, and K. Ghanem, ``GRP-based interleaver for IDMA systems over
frequency selective channel,'' Electronics Letters, vol. 51, no. 18, pp. 1462-1464,
Sep. 2015.

S. Wu, X. Chen, and S. Zhou, ``A parallel interleaver design for IDMA systems,'' Proc.
of International Conference on Wireless Communications and Signal Processing (WCSP),
Nanjing, China, pp. 1-5, Nov. 2009.

B. Y. Kong and I.-C. Park, ``Parallel IDMA architecture based on interleaving with
replicated subpatterns,'' Proc. of IEEE International Conference on Communications
(ICC), Shanghai, China, pp. 1-6, May 2019.

B. Y. Kong and I.-C. Park, ``Efficient implementation of multiple interleavers in
IDMA for 5G,'' Proc. of International SoC Design Conference (ISOCC), Daegu, South
Korea, pp. 119-120, Nov. 2018.

O. Y. Takeshita and D. J. Costello, ``New classes of algebraic interleavers for turbo-codes,''
Proc. of IEEE International Symposium on Information Theory (ISIT), Cambridge, MA,
USA, p. 419, Aug. 1998.

S. Yoshizawa, M. Nozaki, and H. Tanimoto, ``VLSI implementation of an interference
canceller using dual-frame processing for OFDM-IDMA systems,'' IEICE Transactions
on Fundamentals of Electronics, Communications and Computer Sciences, vol. E98-A,
no. 3, pp. 811-819, Mar. 2015.

T. T. T. Nguyen, L. Lanante, S. Yoshizawa, and H. Ochi, ``Low latency IDMA with interleaved
domain architecture for 5G communications,'' IEEE Journal on Emerging and Selected
Topics in Circuits and Systems, vol. 7, no. 4, pp. 582-593, Dec. 2017.

B. Y. Kong and I.-C. Park, ``A memory-efficient IDMA architecture based on on-the-fly
dispreading,'' IEEE Journal of Solid-State Circuits, vol. 53, no. 11, pp. 3327-3337,
Nov. 2018.

B. Y. Kong and I.-C. Park, ``A 120-mW 0.16-ms-latency connectivity-scalable multiuser
detector for interleave division multiple access,'' IEEE Transactions on Circuits
and Systems II: Express Briefs, vol. 67, no. 3, pp. 470-474, Mar. 2020.

B. Y. Kong and I.-C. Park, ``Improved parallel-IDMA architecture with low-complexity
elementary signal estimators,'' Proc. of IEEE International Symposium on Circuits
and Systems (ISCAS), Seville, Spain, pp. 1-4, Oct. 2020.

B. Y. Kong, ``A 97-mW bitwise-early-terminating multiuser detector for IDMA systems,''
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 8, pp.
3390-3394, Aug. 2022.

B. Y. Kong, ``Fixed-latency architecture for multi-stage algebraic interleavers in
interleave division multiple access systems,'' Electronics Letters, vol. 58, no. 22,
pp.

B. Y. Kong, ``Conversion-less algebraic interleaver architecture for low-latency IDMA
systems,'' Electronics Letters, vol. 59, no. 13, pp. 1-3, Jul. 2023.

R. C. Baumann, ``Radiation-induced soft errors in advanced semiconductor technologies,''
IEEE Transactions on Device and Materials Reliability, vol. 5, no. 3, pp. 305-316,
Sep. 2005.

S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications, 2nd
ed., Pearson Prentice Hall, Upper Saddle River, NJ, USA, 2004.

B. Y. Kong, ``Statistical analysis of bitwise early termination for iterative multiuser
detection in IDMA systems,'' Proc. of International Conference on Electronics, Information
and Communications (ICEIC), Taipei, Taiwan, pp. 1-3, Jan. 2024.

B. Y. Kong, ``Low-complexity address generation for multiuser detectors in IDMA systems,''
Electronics, vol. 9, no. 12, 2069, Dec. 2020.

Byeong Yong Kong received his B.S., M.S., and Ph.D. degrees in electrical engineering
from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South
Korea, in 2011, 2013, and 2017, respectively. From 2017 to 2018, he was a Senior Researcher
with the Agency for Defense Development, Daejeon, where he was involved in the research
of guided missile systems. From 2018 to 2019, he was a Postdoctoral Researcher and
a Research Assistant Professor with KAIST. Since 2019, he has been with the Division
of Electrical, Electronic, and Control Engineering, Kongju National University, Cheonan,
South Korea, where he is currently an Associate Professor. His research interests
include digital circuit design and very large scale integration (VLSI) signal processing
for communications. Dr. Kong was a recipient of the 1st Place Award at the Altera
FPGA Design Contest in 2015.