Mobile QR Code QR CODE

Main Menu

The Journal of Semiconductor Technology and Science (JSTS) is an international, peer-reviewed, and open-access journal that is published bimonthly.
- Scope: semiconductor processes, devices, circuits, and MEMS.
- Editor-in-Chief: Prof. Woo Young Choi (ECE, Seoul National University)
- Indexed within Science Citation Index Expanded (SCIE), SCOPUS, Korea Citation Index (KCI), and other databases.

Journal Search

[

Research article

]

JSTS(Journal of Semiconductor Technology and Science)

IEIE Vol. 20, No. 2, p.220-223

ISSN (print) :

1598-1657

ISSN (online) :

2233-4866

Received : 9 October 2019Accepted : 17 March 2020

DOI :

https://doi.org/10.5573/JSTS.2020.20.2.220

High-efficiency Low-latency NTT Polynomial Multiplier for Ring-LWE Cryptography

Tan Tuy Nguyen¹ Nguyen Tram Thi Bao² Lee Hanho²^*

(Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam)
(Dept. of Information and Communication Eng., Inha University, Incheon, 22212, Korea)

^* E-mail: hhlee@inha.ac.kr

License :

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.(www.theieie.org).

Abstract

This paper presents a novel architecture to perform polynomial multiplication in ring learning with errors (ring-LWE) cryptosystems. By employing number theoretic transform (NTT) of the input polynomials simultaneously, the multiplication latency is significantly reduced. In addition, a multiple-path delay feedback (MDF) architecture is used to speed up the multiplication process. As a result, the proposed NTT multiplier offers a better value of area-latency product compared with that of previous studies. The simulation results for the security parameters n = 512 and q = 12,289 on Xilinx Virtex-7 FPGA show that the proposed multiplier uses only about 8.69% of the number of clock cycles required by previous works to completely perform the polynomial multiplication. Furthermore, the obtained area-latency product value of the proposed architecture is less than 45.3% of that of previous works.

Index Terms

Cryptography, multiple-path delay feedback, NTT polynomial multiplier, ring-LWE

I. Introduction

Conventional cryptosystems such as RSA and elliptic curve cryptography (ECC) can be solved in polynomial time with the algorithms proposed by Shor using a quantum computer ⁽¹⁾. To replace these classic cryptosystems, post-quantum cryptosystems offering higher levels of security have been researched recently. Among the existing post-quantum cryptosystems, ring-LWE cryptography is a promising candidate because its security is based on the worst-case hardness problem ⁽²⁾. The basic concept of ring-LWE cryptography can be found in ^(2,³⁾. The block diagram of a ring-LWE cryptosystem is shown in Fig. 1. The input message m is encrypted into ciphertext $\left(c_{1}, \quad c_{2}\right)$ using arithmetic computation on public key (a, p) and error polynomials e1, e2, and e3. Original message m can be recovered from ciphertext $\left(c_{1}, \quad c_{2}\right)$ and private key $r_2$ through the decryption operation. The arithmetic operations in ring-LWE cryptography include polynomial addition, modulus, and polynomial multiplication, in which polynomial multiplication is the basic and most computationally intensive operation ⁽⁴⁾.

Several architectures to perform polynomial multiplication have been introduced in literatures. Among the existing approaches used to conduct polynomial multiplication, the number theoretic transform (NTT) studied in ^(4-⁶⁾ is an efficient method. Let $a(x)$ be a polynomial over the ring $\left.R_{q}=Z_{q} / \underline{(} x^{n}+1\right)$. Then, the NTT transform of each coefficient of $a(x)$ and its inverse NTT (INTT) with a primitive n-th root of unity ${\omega}$ can be expressed as follows:

Fig. 2. Block diagram of the proposed NTT polynomial multiplier.

Fig. 3. Timing diagram of the proposed NTT polynomial multiplier.

(1)

$A_{i}=\mathrm{NTT}_{\omega}^{n}(a(x))_{i}=\sum_{j=0}^{n-1} a_{j} \omega^{i j} \bmod q$

(2)

$a_{i}=\operatorname{INTT}_{\omega^{-1}}^{n}\left(A_{i}\right)=n^{-1} \sum_{j=0}^{n-1} A_{j} \omega^{-i j} \bmod q$

The multiplication of two polynomials $a(x)$ and $b(x)$ is computed as (3), where ∘ denotes a point-wise multiplication, and two polynomials $a^{\prime}(x)$ and $b^{\prime}(x)$ are constructed as shown in (4).

(3)

$c(x)=a(x) \cdot b(x)=\operatorname{INNT}_{\omega}^{2 n}\left(\mathrm{NTT}_{\omega}^{2 n}\left(a^{\prime}\right) \circ \mathrm{NTT}_{\omega}^{2 n}\left(b^{\prime}\right)\right.$

(4)

$a^{\prime}=\left(a_{0}, \varphi a_{1}, \dots, \varphi a_{n-1}\right), b^{\prime}=\left(b_{0}, \varphi b_{1}, \dots, \varphi b_{n-1}\right)$

In ⁽⁴⁾, authors implemented an n-point NTT-core architecture using a systolic array to speed up the multiplication process compared with previous studies. However, this architecture is deployed using a single-path delay feedback (SDF), which offers a low throughput and efficiency. In addition, the SDF and multiple-path delay commutator (MDC) architectures deployed in recent researches ^(5,⁶⁾ are not sufficient to process the polynomial multiplication.

In this work, a multiple-path delay feedback (MDF) based NTT polynomial multiplier offering high efficiency with low latency is presented. By employing the NTT process of two polynomials $a(x)$ and $b(x)$ concurrently, the multiplication latency is reduced considerably. Additionally, the NTT core is designed using MDF architecture to achieve a better performance compared with previous polynomial multipliers.

II. Proposed NTT Polynomial Multiplier

The block diagram of the proposed NTT polynomial multiplication is shown in Fig. 2. To conduct polynomial multiplication, the coefficients of input polynomials $a(x)$ and $b(x)$ are initially multiplied by a factor ${\varphi}$, where ${\varphi}$ ${\equiv}$ ${\omega}$ mod q, to get the polynomials $a^{\prime}(x)$ and $b^{\prime}(x)$. These multiplication results are then transformed using NTT cores, followed by a point-wise multiplication and an INTT computation.

The polynomial multiplication result $c(x)$ is completely computed by multiplying the output of INTT with the factor ${\varphi}$-1. The entire operations of the multiplier are directed by a controller. To speed up the multiplication of the two polynomials $a(x)$ and $b(x)$, negative wrapped convolutions as well as NTT operations are controlled so that the multiplication latency is minimized. Specifically, the two NTT operations NTT($a^{\prime}$) and NTT($b^{\prime}$) are processed simultaneously, resulting in the reduction of one NTT operation compared with the architecture in ⁽⁵⁾. The negative wrapped convolutions of the two input polynomials $a(x)$ and $b(x)$ are also conducted in parallel to decrease the processing time. Fig. 3 shows the timing diagram of the proposed NTT polynomial multiplier. As can be seen from Fig. 3, two computations a ${\times}$ ${\varphi}$ and b ${\times}$ ${\varphi}$ are scheduled to work simultaneously. Two NTT cores transforms NTT($a^{\prime}$) and NTT($b^{\prime}$) also work in parallel. Compared with the conventional architecture presented in ⁽⁵⁾, the proposed architecture offers a reduction in the processing time of one NTT computation and one negative wrapped convolution.

Fig. 4 presents the proposed NTT core using an 8-datapath MDF architecture to process the input data in parallel. Specifically, the coefficients of the input polynomial $a(x)$ are divided into 8 datapaths, which are processed by different processing elements (PE1s, PE2s). Each PE consists of modular reduction units (MR1, MR2), constant multiplier, butterfly unit (BU), and one first-in-first-out (FIFO) register.

III. Results and Comparisons

Table 1. Comparison of the proposed NTT polynomial multiplier architecture with other works for parameter n=512, q=12,289

	Proposed	[4]		[5]
Devices	Virtex-7	Spartan-6		Stratix
LUTs	63,522	6,119$^{(1)}$	3,750$^{(2)}$	2,064
Slices	13,588	2,062$^{(1)}$	1,348$^{(2)}$	1,819
Cycles	226	3,618$^{(1)}$	3,622$^{(2)}$	2,601
Frequency (MHz)	353	224	254	239
Time ($\mu$s)	0.64	16.11$^{(1)}$	14.26$^{(2)}$	10.85
Area-latency product (slice × $\mu$s)	8,696	33,218$^{(1)}$	19,222$^{(2)}$	19,736

$^{(1), (2)}$ Design 1 and design 2 in [4].

Fig. 4. (a) Proposed 8-datapath MDF-based NTT core architecture, (b) PE1, (c) PE2

(a)

(b)

(c)

The proposed NTT polynomial multiplier architecture is synthesized and implemented on a Xilinx Virtex-7 FPGA board. The simulation results and performance comparison are provided in Table 1. As can be seen from Table 1, the number of clock cycles necessary to complete a polynomial multiplication of the proposed architecture is only 226 clock cycles, which is about 6.25% and 8.69% of that of design 1 in ⁽⁴⁾ and the architecture in ⁽⁵⁾, respectively. In addition, the proposed architecture can operate at a higher clock frequency than others, resulting in a reduction in the multiplication latency. To compare the balance between area and latency of the proposed architecture with that of others, a parameter named area-latency product described in ⁽⁴⁾ is used. As presented in Table 1, the proposed polynomial multiplier architecture offers a much lower area-latency product compared with that of other architectures.

IV. Conclusions

An MDF-based NTT polynomial multiplier for ring-LWE cryptography is presented in this paper. By processing the NTT operation of the input polynomials concurrently, the multiplication process is speeded up. The implementation results prove that the proposed NTT polynomial multiplier achieves better performance than previous approaches. Therefore, it can be applied in ring-LWE and lattice-based cryptosystems to boost polynomial multiplication and improve encryption and decryption processes.

ACKNOWLEDGMENTS

This work was supported by the Basic Science Research Program (2019R1F1A1061926) through the NRF funded by the MSIT and by Inha University, Incheon, Korea.

REFERENCES

Shor P. W., Feb 1997, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SILAM J. Computing, Vol. 26, No. 5, pp. 1484-1509

Regev O., May 2005, On lattices, learning with errors, random linear codes, and cryptography, Annual ACM Symp. Theory of Computing, pp. 84-93

Tan T. N., Lee H., Feb 2019, High-secure fingerprint authentication system using ring-LWE crypto-graphy, IEEE Access, Vol. 7, No. 1, pp. 23379-23387

Chen D. D., Jan 2015, High-speed polynomial multiplication architecture for ring-LWE and SHE cryptosystems, IEEE Trans. Circuits Syst. I, Reg. Papers,, Vol. 62, No. 1, pp. 157-166

Rentería-Mejía C. P., Velasco-Medina J., 2014, Hardware design of an NTT-based polynomial multiplier, 2014 IX Southern Conf. Programmable Logic, Buenos Aires, pp. 1-5

Du C., Bai G., 2016, Efficient polynomial multiplier architecture for ring-LWE based public key cryptosystem, 2016 IEEE Int. Symp. Circuits Syst. (ISCAS2016), Montreal, QC, pp. 1162-1165

Author

Tuy Nguyen Tan

Tuy Nguyen Tan received the M.S. and Ph.D. degrees in Information and Communication Engineering from Inha University, South Korea, in 2016 and 2019, respectively.

From 2009 to 2013, he had been a Telecommunication engineer at GTel Mobile JSC., Danang Branch, Vietnam.

Since 2019, he has been a researcher at Faculty of Information Technology, Ton Duc Thang University, Vietnam.

His research interest includes algorithm and VLSI architecture design for cryptosystems and error correction codes.

Tram Thi Bao Nguyen

Tram Thi Bao Nguyen received the M.S. and Ph.D. degrees in Information and Communication Engineering from Inha University, South Korea, in 2016 and 2020, respectively.

Since 2013, she has been with Danang University, Vietnam where she is a lecturer.

Her research interests include VLSI architecture design for digital signal processing and communication systems.

Hanho Lee

Hanho Lee received the M.Sc. and Ph.D. degrees in electrical and computer engineering from the University of Minnesota, Minnea-polis, in 1996 and 2000, respectively.

From 2000 to 2002, he was a Member of the Technical Staff with Lucent Technologies (Bell Labs Innovations), Allentown.

From 2002 to 2004, he was an Assistant Professor with the Department of Electrical and Computer Engineering, University of Connecticut, USA.

Since 2004, he has been with the Department of Information and Communication Engineering, Inha University, where he is currently a Professor.

From 2010 to 2011, he was a Visiting Scholar with Bell Labs, Alcatel-Lucent, Murray Hill, NJ, USA.

His research interests include algorithm and architecture design for cryptographic, forward error correction coding, and digital signal processing.