I. INTRODUCTION
Physical Unclonable Functions (PUFs) ^{[1]} have attracted a great deal of research interest since they are expected to provide
inexpensive security primitives as digital fingerprints. By definition, a PUF is an
original function mapping challenges to responses, which is determined by the unique
physical characteristics most often resulted from manufacturing process variation.
Major applications of PUFs include device authentication and cryptographic key generation
for various kinds of digital rights management (DRM) ^{[2}^{7]}. PUFs are considered as one of the key technologies in internet of things (IoT) environment.
Since billions of devices need to identify and communicate to each other over unsafe
public networks, PUFs as lightweight hardware security primitives under the resourceconstrained
conditions are required.
Previous work has presented various structures to make ideal PUFs. They are fallen
into one of the two primary categories of weak PUFs and extensive PUFs. A weak PUF
supports a small number of challenge response pairs (CRPs) linearly related to its
hardware resource. It is also known as an obfuscating PUF or physically obfuscated
keys (POKs) since the primary application is data obfuscation for key generation.
Recently, PUFs exploiting memorybased cell arrays ^{[8}^{14]} have been presented with very high reliability performance. One of the limitations
of weak PUFs, however. is that they accompany computation overhead to implement highcomplexity
encryption and decryption algorithms for authentication. For each lowcost device
over IoT networks, it is not feasible to use computational resource and energy for
heavy hash functions. And also, weak PUFs are vulnerable to invasive attacks because
it is possible to read out the entire CRPs in relatively short time. For instance,
a semiinvasive, singletrace, backside readout of logic states has been successfully
applied to obtain the responses of SRAM PUFs ^{[15]}. Physical attacks such as probing, destruction, and reconstruction are very high
threats particularly to weak PUFs ^{[16]}. It is expected that adversaries can break the other various types of weak PUFs using
the similar techniques. Further research on tamper evidence is necessary to withstand
such invasive attacks.
An extensive PUF, on the other hand, supports a large enough number of CRPs exponentially
scalable with the amount of circuit components or the number of challenge bits. Attempts
to read out the entire CRPs in a limited time simply cannot be successful. Extensive
PUFs are considered as an excellent choice for IoT devices since they can implement
lowcost authentication without additional hash functions. Most extensive PUFs, however,
suffer from defending modeling attacks using machine learning (ML) technology. Previous
works showed that the responses of several types of PUF are exactly predicted with
probability higher than 90% ^{[17}^{19]}. In recent years, various techniques for extensive PUFs have been proposed to strengthen
ML attack resistance. Most of them employ arbiter PUFs, ring oscillator PUFs, or bistable
ring PUFs as basic elements and apply the proposed obfuscation techniques. Unfortunately,
the common problem has been that adversaries can make a linear model from even a small
amount of correlative information between CRPs. This means that it is interesting
and necessary to exploit nonlinear circuits and operations to eliminate or drastically
reduce the correlations.
This article presents a new extensive PUF with the secure structure using a feedforward
(FF) chain of bistable rings (BRs) as a parallel PUF. We employ feed forward operations
to every challenge bit not only to make each bit nonlinear but also to lengthen the
response time. The proposed PUF with 64bit challenge and 32bit response has been
implemented on an FPGA device with the suggested optimization and automation process.
We also built a support vector machine (SVM) to apply ML modeling attacks on the PUF.
Experimental results show that the prediction accuracy remains around 50% after training
104 CRPs. Note that the proposed PUF is a promising candidate for a new extensive
PUF with enhanced ML attack resistance.
The rest of this paper is organized as follows. Section II explains the overall lightweight
secure architecture of the proposed PUF as nonlinear circuit components of interconnect,
input, and output networks. Section III describes the detailed circuit implementation
of parallel PUF using feedforward chain structure of BR PUFs with actual optimization
methods. Section IV provides the major metrics and hardware setup for performance
evaluation with the results that shows the enhanced ML attack resistance. Section
V concludes the paper.
II. LIGHTWEIGHT SECURE PUF ARCHITECTURE
The proposed PUF has an overall architecture as depicted in Fig. 1, which consists of four components: interconnect, input network, BR FF chains, and
output network. Along the path from an nbit vector challenge C[n1:0] to an rbit
vector response R[r1:0], each component functions as follows. First, the interconnect
generates different challenges for parallel rows to reduce direct controllability
on the internal operations. After the input network applies nonlinear transform, each
BR FF chain then takes nbit challenge to generate a 1bit response. The output network
then transforms the intermediate response into the final rbit response.
Fig. 1. Overall architecture of the proposed PUF.
We adopted the design methodology introduced in Lightweight Secure (LS) PUF ^{[20]} since it has multiple nonlinear operations so that it is shown to be more resistant
to ML attacks. While an arbiter PUF is used as the parallel PUF in ^{[20]}, the proposed PUF employs a feed forward chain of bistable rings as discussed in
section III. Note that we present circuit implementation in Verilog hardware description
language (HDL) instead of general theoretical notions. To be clear and easy to understand,
all the expressions and equations are written in Verilog code throughout this paper.
We will explain each component in more detail.
Interconnect conducts circular shift operation to make each row have unique characteristics.
It generates shifted challenges for q rows, D^{[0]}[n1:0], D^{[1]}[n1:0], D^{[2]}[n1:0], {\ldots}, D[q1][n1:0], or a qelement array of nbit vectors, D[q1:0][n1:0].
This is to distribute unique local challenge for each row. It is remarkable that such
a simple operation improves security a lot since it prevents adversaries from bypassing
all the input network easily. Onetoone permutation function can be expressed formally
as follows.
where D[j][i] is the ith challenge bit in the jth row and % denotes the modulus
operator. Even if an attacker tried to bypass input network using an inverse transformation,
only one input network is bypassed, but the others cannot be bypassed.
Input network transforms nbit input D[j][n1:0] into nbit output E[j][n1:0] using
XOR gates. The input network is derived to guarantee strict avalanche criterion (SAC)
such that whenever a challenge bit flips, another challenge bit at n+1/2 locations
apart also flip. When one challenge bit flips, each output flips with a probability
of 0.5.
Fig. 2(a) shows an input network implementation for the abovementioned operation.
Fig. 2. (a) Input network implementation using XOR gates; (b) Output network example for q=5, r=4, x=4, s=1.
A parallel PUF calculate a 1bit output Q[j] from the nbit local challenge input
E[j][n1:0]. Actual PUF operation that transforms physical characteristics into a
digital data is done here. Detailed circuit implementation and FPGA optimization are
presented in section III. The parallel PUFs on the q rows make a qbit result vector
Q[q1:0].
Output network generates the rbit final response R[r1:0] by a lossy transformation
from Q[q1:0] as shown in Fig. 2(b) with an example. Each bit of R is generated by XORing xadjacent inputs where sets
are circularly shifted by s bits with respect to each other. The mapping is defined
by
where ^ denotes the unary reduction operation, which performs a bitwise XOR operation
on every bit to produce a single bit result. {Q,Q} gives a 2qbit concatenated result
so that we can express the circular shift operation easily. Note that r and x are
equal to or less than q. The example for q=5, r=4, x=4, s=0 is demonstrated in Fig. 2(b). This means that Q[4:0], 5bit response output from 5row parallel PUFs, are mapped
into R[3:0] by XORing 4adjacent responses circularly shifted by 1bit.
III. CIRCUIT IMPLEMENTATION OF A BR FEEDFORWARD CHAIN
Arbiter PUF is one of the most popular types of extensive PUFs implemented in silicon.
The reasons are that it only takes minimum hardware resource of delay elements to
provide exponential scalability on number of CRPs and that it generates very reliable
responses. Also, it is easy to understand and analyze its behaviors with a simple
additive delay model. This is, unfortunately, why arbiter PUF is inherently vulnerable
to modeling attacks. Previous works showed that the entire CRPs can be predicted with
99% probability through machine learning algorithms using just a small number of CRPs.
An extensive PUF based on pure arbiter PUFs can be an easy victim for ML attacks.
To make arbiter PUFs resilient to ML attacks, plenty of variants has been proposed.
Feedforward arbiter scheme ^{[21]} is one of the attempts to add unpredictable nonlinearity to fortify the security
of the original arbiter PUF. Racing computation results in previous stages are forwarded
to arbitrary next stages to replace the original challenges with the new nonlinear
ones. Note that this technique also lengthens response time to prevent adversaries
from massive CRP gathering in a short time.
BR PUF ^{[22]} is one of the extensive PUFs based on crosscoupled memory structures including SRAM
PUF ^{[23]}, butterfly PUF ^{[24]}, and flipflop PUF ^{[25]}. BR or cross coupled inverter pairs are one of the smallest circuits that provides
a form of challenge response pair. It also is easy to understand the behavioral model.
Fig. 3 depicts the schematic of a basic BR PUF and the proposed implementation in a practical
FPGA device. The smallest BR can be built with 2 crosscoupled inverters as shown
in Fig. 3(a). To make it a PUF with 2bit challenge, inverters are replaced with NAND gates with
one input of a common negative reset. The other input of each NAND is connected to
a MUX output so that one challenge bit selects which one out of the two identical
inputs is chosen. To implement the BR PUF onto an FPGA device, a NAND and a MUX can
be mapped into a single LUT4, which is a logical lookup table with four inputs. Each
output of LUT is fed back to the 2 inputs of the other LUT, so that they work as a
BR PUF. A unique response Q[j]^{[0]} is determined by 2bit challenge E[j][1:0]. A negative reset, rstn forces NAND gates
to output logic 1 whenever the PUF needs to be restarted.
Fig. 3. (a) The smallest BR; (b) A basic BR PUF with 2bit challenge; (c) A practical implementation of the BR on 2 LUTs in a Xilinx FPGA device.
Since BR PUF exhibits extensive and inherent nonlinearities, it requires much more
complex model to emulate its behaviors. In addition, BR PUF oscillates for a long
period of time before settlement under specific conditions. It usually takes relatively
long period of time to read out responses due to nonuniform distribution of settling
times. That makes collection of enough number of CRPs for modeling attacks in timely
manner very hard or impossible. As a result, we can consider BR PUF as more secure
type of extensive PUFs inherently.
A parallel PUF is implemented as a feedforward chain of BR PUFs. Fig. 4 shows a parallel PUF with an nbit challenge and its variation using a feedforward
XOR chain. A BR PUF consists of two LUTs as depicted in Fig. 3(c). BRs on the top and bottom respond with 1bit response of Q[j]^{[0]} from 2bit challenge of E[j][1:0] and with Q[j][n/21] from E[j][n1:n2], respectively.
First, nbit PUF can be built simply by making an array of n /2 ${\times}$ q BRs.
In order to add more nonlinearity, we then propose a new PUF with feedforward XOR
gates as shown Fig. 4. BR on the leftmost column gets 2bit XORed challenges of E[j][1:0] ^ Q[j][n/21]
instead of the original challenge of E[j][1:0]. BR on the next column gets E[j][3:2]
^ Q[j]^{[0]} as the same way. The PUF feedforward response from the previous stage to the next
stage to XOR the original challenges. As a consequence, the last stage BR PUF responds
as a final 1bit response. The parallel PUF is working with nbit challenge E[j][n1:0]
to make 1bit response Q[j].
Fig. 4. A parallel BR PUF with feedforward XOR chains.
We implemented the proposed PUF on a Xilinx Spartan 7 device, which is one of the
most popular costefficient FPGA products. Fig. 5 shows the placement of the PUF cells and detailed components inside each logic slice
and lookup table (LUT). A PUF with 64bit challenge and 32bit response occupies
2,144 LUTs. We can estimate the equivalent gate count of 8,576. Since the proposed
circuit consumes 36.82 nJ to generate a 32bit response, the energy per bit is 1.15
nJ approximately. Note that 1,024 BRs are regularly placed along the form of a 32x32
array on the bottom left corner of the FPGA as highlighted with blue dots. As a consequence,
routing wires between adjacent BRs have similar length to result in evenly distributed
delays. We can see the wires connecting two inverter pairs are routed symmetrically
in the logic slice. A design constraint has been generated automatically to apply
on the placement and routing phase. As we zoom in one blue dot, a logic slice contains
various components including 4 LUTs, two of which are allocated to build up a BR.
Zooming again in one LUT, two inverters, a MUX, and an AND gate are implemented into
a 4input LUT.
Fig. 5. Device utilization of the PUF cells and logic elements.
IV. PERFORMANCE EVALUATION
A lot of publications have discussed performance metrics of PUFs ^{[26}^{28]}. Among them, uniqueness and reliability are the two most common figures of merits
to characterize PUFs in general. We evaluate the proposed PUF with the two characteristics
at first, and then with ML attack resistance as one of the most critical characteristics
for an extensive PUF. Uniqueness represents how different the responses of a PUF instance
are from the responses of the other PUF instances with the identical design for the
same challenge. Average interdie Hamming distance or simply interHD is used to quantify
the uniqueness, which is expressed as follows.
where m is the number of PUF instances and n is the number of response bits. For 8
instances and 32bit responses, the proposed PUF shows the interHD of 48.31%. Each
response was measured 10 times.
Reliability represents how consistent the responses of a PUF are over varying temperature,
voltage, and time for the same challenge. Average intradie Hamming distance or intraHD
is used to quantify the reliability in general, which is expressed as follows.
where k is the number of nbit response samples, R_{i} is the reference response, and R’_{i,j} is the jth sample of R’_{i} in a different operating condition. As the temperature changes from 0℃ to 50℃ and
as the supply voltage changes from 90% to 110% of the nominal, reliability of 98.15%
was measured for the samples taken from each operating condition.
ML attack resistance represents how difficult for adversaries to build a model based
on ML technology. An extensive PUF should guarantee that any attempts allowed by available
stateoftheart resources to predict the responses with significant accuracy is not
feasible. We run a support vector machine (SVM) classifier ^{[29]} with a linear kernel as an arbitrary ML attack. SVM has been proven to be very efficient
for classification and regression analysis with representative works in computer vision,
natural language processing, neuroimaging, and bioinformatics. Note that SVM can perform
powerful modeling attacks against extensive PUFs with nonlinear operations using
kernel tricks which are mapping the challenge inputs into highdimensional feature
spaces. After training periods with different numbers of random CRPs from 10^{1} to 10^{4}, the SVM attempts to mimic the operation of the proposed PUF. We evaluate ML attack
resistance of the proposed PUF as prediction accuracy for the test set of 10^{5} CRPs. Intel Quadcore Q9300 was used as a commercial processor to run SVM.
Table 1. Comparison of prior arts of extensive PUFs

This work

BR

XOR

RO

Arbiter

Uniqueness
= interHD

48.31

44.10

46.15

47.24

7.20

Reliability
= 100 – intraHD

98.15

97.81

99.52

99.14

99.76

Uniformity
or randomness

49.49





50.56

55.69

Fig. 6 shows the prediction accuracy versus the number of CRPs for training. Note that we
trained the SVM with up to 10^{4} CRP instances and tested its prediction to the entire 10^{5} CRPs. While an ideal PUF stays completely unclonable with prediction accuracy of
0.5, practical PUFs show increasing prediction accuracy as the number of CRPs increases.
Prediction is correct when the entire 32bit value predicted perfectly match to the
actual PUF response. We can see that the accuracy for XOR PUF reaches 93.19% with
10^{3} training instances, FF PUF shows 92.88% with 10^{2} training instances. Note that BR reaches 95.71% for even 10 training instances, which
means not secure at all. On the other hand, it is remarkable that the prediction accuracy
for the proposed PUF remains around steady 50% regardless of the number of CRPs.
Fig. 6. Prediction accuracy vs. training CRP instances.
Bit error rate (BER) and unstable bit ratio (UBR) are 1.85 % and 9.17 %, respectively.
Those metrics are not as good as stateoftheart PUFs. Since heavy use of XOR gates
are known to be the cause of degradation on the stability of PUF responses, it is
necessary to study the quantitative impact of XOR gates and to find out the optimized
architecture with minimum use of XOR operations. We also expect the stability or reliability
to be improved by applying simple techniques such as temporal majority voting (TMV)
in the future work.
Randomness represents how close the proportions of ‘0’s and ‘1’s in the response bits
are. For the ideal PUF, this must be 50%. Percentage Hamming weight is used to quantify
the randomness in general, which is expressed as follows.
where R_{i,l} is lth bit of an nbit response from a ith PUF.
When the two LUTs in a BR have a significant difference in their load conditions,
the BR state will be biased to one direction. While we cannot guarantee the pair of
LUTs have exactly the same routing nets as implemented in an FPGA device, we can see
the wires between them are routed symmetrically as shown in the logic slice of Fig. 5. The feedforward paths and output XOR network also improve the randomness of the
final response. As a consequence, the randomness of 49.49% was observed for 10^{5} responses. Table 1 summarizes uniqueness, reliability, and randomness data to compare
the proposed PUF and prior arts of extensive PUFs.
V. CONCLUSIONS
We have introduced a new extensive PUF, which builds up a secure architecture of multiple
rows of BR feedforward chains. According to the experimental results, the proposed
PUF shows uniqueness of 48.31%, reliability of 98.15%, respectively. It also remains
secure against ML attack with training 10^{4} CRPs. Note that ML attack resistance has been enhanced compared to the existing PUFs
including arbiter PUF, feedforward, BR, and Lightweight Secure. While we applied
one of the most efficient ML attacks using normal computation resources and basic
algorithms, ML attacks are on their own evolutions. Thorough study on stateoftheart
attacks including reliabilitybased CMAES is necessary. Several parameters in the
proposed PUF will also be optimized in the future research.
ACKNOWLEDGMENTS
This work was supported by the Sun Moon Research Grant of 2021.
References
B. Gassend, et al, “Silicon physical random functions,” Computer and Communications
Security, 2002. CCS 2002, Proceedings of the ACM Conference on, pp. 148160, Nov.,
2002.
G. Vaidya and T. Prabhakar, “Hardware based identification for Intelligent Electronic
Devices,” 2022 IEEE/ACM Seventh International Conference on InternetofThings Design
and Implementation (IoTDI), 2022.
C. Herder, et al, “Physical Unclonable Functions and Applications: A Tutorial,” IEEE,
Proceedings of, Vol. 102, No. 8, pp. 11261141, May, 2014.
L. Bolotnyy and G. Robins, “Physically unclonable functionbased security and privacy
in RFID systems,” Pervasive Computing and Communications, 2007. PERCOM, 2007, IEEE
International Conference on, pp. 211220, Mar., 2007.
J. Guajardo, et al, “FPGA Intrinsic PUFs and Their Use for IP Protection,” Cryptographic
Hardware and Embedded Systems, 2007. CHES, 2007, Lecture Notes in Computer Science,
pp. 6380, 2007.
G. Suh and S. Devadas, “Physical Unclonable Functions for Device Authentication and
Secret Key Generation,” Design Automation Conference, 2007. DAC, 2007, Proceedings
of, pp. 914, Jun., 2007.
F. Koushanfar and M. Potkonjak, “CADbased Security, Cryptography, and Digital Rights
Management,” Design Automation Conference, 2007. DAC, 2007, Proceedings of, pp. 268269,
Jun., 2007.
J. Lee, et al, “A 20F2/Bit CurrentIntegrationBased Differential nandStructured
PUF for Stable and V/T VariationTolerant LowCost IoT Security,” SolidState Circuits,
IEEE Journal of, Vol. 57, No. 10, pp. 29572968, Oct., 2022.
J. Park, B. Kim, and J. Sim, “A BERSuppressed PUF With an Amplification of Process
Mismatch Effect in an Oscillator Collapse Topology,” SolidState Circuits, IEEE Journal
of, Vol. 57, No. 7, pp. 22082219, Jul., 2022.
L. Lu, T. Yoo, and T. Kim, “A 6T SRAM Based TwoDimensional Configurable ChallengeResponse
PUF for Portable Devices,” Circuits and Systems I: Regular Papers, IEEE Transactions
on, Vol. 69, No. 6, pp. 25422552, Jun., 2022.
D. Jeon, et al, “A 325 F2 Physical Unclonable Function Based on Contact Failure Probability
with Bit Error Rate < 0.43 ppm After Preselection With 0.0177% Discard Ratio,” SolidState
Circuits, IEEE Journal of, Vol. 57, No. 7, pp. 112, Jul., 2022.
Q. Zhao, et al, “A 108 F2/Bit Fully Reconfigurable RRAM PUF Based on Truly Random
Dynamic Entropy of Jitter Noise,” Circuits and Systems I: Regular Papers, IEEE Transactions
on, Vol. 67, No. 11, pp. 38663879, Nov., 2020.
Y. Choi, et al, “Physically unclonable function in 28nm fdsoi technology achieving
high reliability for aecq 100 grade 1 and iso 26262 asilb,” SolidState Circuits
Conference, 2020. IEEE International, ISSCC, 2020. Feb., 2020.
D. Li and K. Yang, “A SelfRegulated and Reconfigurable CMOS Physically Unclonable
Function Featuring ZeroOverhead Stabilization,” SolidState Circuits, IEEE Journal
of, Vol. 55, No. 1, pp. 98107, Jan., 2020.
D. Nedospasov and C. Helfmeier, “Invasive PUF Analysis,” Workshop on Fault Diagnosis
and Tolerance in Cryptography, 2013, Aug., 2013.
J. Zhang, et al, “A PUFFSM Binding Scheme for FPGA IP Protection and PayperDevice
Licensing,” Information Forensics and Security, IEEE Trans. on, vol. 14, no. 8, pp.
11371150, Jun., 2016.
J. Delvaux, “MachineLearning Attacks on PolyPUFs, OBPUFs, RPUFs, LHSPUFs, and PUF–FSMs,”
IEEE Trans. Inf. Forensic Secur., vol. 14, no. 8, pp. 20432058, Aug. 2019.
W. Z. Yu and Y. M. Wen, “Efficient hybrid sidechannel/machine learning attack on
XOR PUFs,” Electronics Letters, vol. 55, no. 20, pp. 10801082, Oct. 2019.
U. Rührmair, et al, “PUF modeling attacks on simulated and silicon data,” Information
Forensics and Security, IEEE Transaction on, vol. 8, pp. 18761891, Nov., 2013.
M. Majzoobi, F. Koushanfar, and M. Potkonjak, “Lightweight secure PUFs,” ComputerAided
Design, 2008. ICCAD, 2008, IEEE/ACM International Conference on, pp. 670673, Nov.,
2008.
D. Lim, et al, “Extracting secret keys from integrated circuits,” Very Large Scale
Integration (VLSI) Systems IEEE Transactions on, vol. 13, pp. 12001205, Oct., 2005.
Q. Chen, et al, “The Bistable Ring PUF: A new architecture for strong Physical Unclonable
Functions,” HardwareOriented Security and Trust, 2011. HOST, 2011, IEEE International
Symposium on, pp. 134141, Jun., 2011.
J. Guajardo, et al, “Physical Unclonable Functions and PublicKey Crypto for FPGA
IP Protection,” FPGA, 2007, Field Programmable Logic and Applications, International
Conf. on, Aug., 2007.
S. Kumar, et al, “The butterfly PUF: protecting IP on every FPGA,” HardwareOriented
Security and Trust, 2008. HOST, 2008, IEEE International Symposium on, pp. 6770,
Jun., 2008.
R. Maes, P. Tuyls, I. Verbauwhede, “Intrinsic PUFs from flipflops on reconfigurable
devices,” Information and System Security, 2008. WISSec, 2008, Benelux Workshop on,
pp. 117, 2008.
R. Maes and I. Verbauwhede, “PUF: a study on the state of the art and future research
directions,” Toward Hardwareintrinsic security, springerVerlag, pp. 317, 2010.
M. Majzoobi, F. Koushanfar, and M. Potkonjak, “Testing techniques for hardware security,”
ITC, 2008, IEEE International Test Conference, pp. 110, Oct. 2008.
A. Maiti, et al, “A Large Scale Characterization of ROPUF,” HardwareOriented Security
and Trust, 2010. HOST, 2010, IEEE International Symposium on, pp. 9499, Jun., 2010.
C. Chang and C. Lin, “LIBSVM: A library for support vector machines,” Intelligent
Systems and Technology, ACM Transactions on, Vol. 21, No. 3, pp. 127, Apr., 2011.
Jiho Han received the BS, MS, and PhD degrees in Electrical Engineering and Computer
Science from Seoul National Univ. in 2002, 2004, and 2009, respectively. From 2009
to 2014, he was with Samsung Electronics. He has been an associate professor in Sun
Moon Univ. since 2014. His research interests include hardware security and clock
synchronization.
Changyong Shin received the BS and MS degrees from Yonsei Univ. in 1993 and 1995,
respectively, and the PhD degree from the Univ. of Texas at Austin, in 2006, all in
Electrical Engineering. From 1995 to 2001 and from 2007 to 2014, he was with LG Electronics
and with SAIT, respectively. Since 2014, he has been with the Department of Smart
Information and Communications Engineering at Sun Moon Univ. His research interests
include wireless communications and signal processing for communications.