ML-Driven Optimization of Standard Cell Performance and Timing in Advanced Nodes
HyunJoon Jeong¹, Junha Suk², Jeong-Taek Kong³, SoYoung Kim³
- ¹ Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- ² PDK Development Team, Foundry Division, Samsung Electronics Company Ltd., Giheung 17113, Republic of Korea
- ³ Department of Semiconductor Systems Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
Standard cell, nanosheet field-effect transistor (NSFET), buried power rail (BPR), timing, performance, multi-objective Bayesian optimization (MOBO), artificial neural network (ANN)
I. INTRODUCTION
As technology nodes scale below 3 nm, gate control over the channel weakens significantly
[1]. To overcome this limitation, nanosheet field-effect transistors (NSFETs) with
a gate-all-around (GAA) structure have been introduced [2-6]. They provide a higher drive
current per area [8]. However, the increased structural complexity of NSFETs intensifies the impact of
parasitic components, not only within the device but also from the middle of line
(MOL) and the back end of line (BEOL), which can significantly degrade both digital
and analog/RF circuit performance [7]. As a result, accurate prediction and effective minimization of parasitic effects
have become critical.
Previous works have proposed analytical models to predict parasitic components in
NSFETs while accounting for structural variations [9, 10]. These models, compatible with
the Berkeley short-channel IGFET model (BSIM), enable
circuit performance evaluation through SPICE simulations. By incorporating actual
device structures, they facilitate accurate prediction and effective minimization
of parasitics, leading to performance optimization.
With the advancement of technology nodes, maintaining the timing balance in standard
cells has become increasingly challenging. Standard cell timing has been optimized
by adjusting the channel width and length of P- and N-type transistors [11-15]. However,
since channel length scaling has stagnated in advanced nodes, timing optimization
primarily depends on the channel width, which is limited by the fixed cell height [12, 15].
Recently, machine learning (ML)-based optimization methodologies have been proposed
to improve device and circuit performance under such constraints. However, most prior
studies have primarily focused on device-level parameter optimization using conventional
evolutionary algorithms such as genetic algorithms (GA) [16] and NSGA-II [17], while a few works have attempted to extend the scope toward design-technology co-optimization
(DTCO) [18] or optimization for specific applications such as SRAM [19]. Although these early DTCO approaches are useful for optimizing device parameters,
they generally do not explicitly incorporate circuit-level timing balance into the
optimization objectives. They also tend to ignore design rule constraints imposed
by the fixed cell height, which are essential in standard cell layout design. In addition,
they often overlook the size of the datasets required before optimization. This paradoxically
leads to situations where the time required to generate training datasets can exceed
the computational cost of exploring optimal design candidates.
In this paper, we propose a novel standard cell optimization methodology that addresses
the challenges in performance optimization while simultaneously achieving timing balance.
The artificial neural network (ANN) model is trained to take structural parameters
as input and the key performance metrics as output. Once trained, an optimization
algorithm is applied to identify structural configurations that maximize performance.
To ensure design feasibility and improve computational efficiency, a constraint function
is introduced during the optimization process. This function limits the design space
to only those configurations that satisfy the given constraints. A novel sizing method
is incorporated to maintain the timing balance between P- and N-type transistors under
a limited cell height, achieving a 1:1 rise-fall delay ratio. As shown in Table 1, the proposed approach outperforms existing methods by optimizing both aspects without
being constrained by a fixed cell height. Moreover, instead of altering routing or
interconnect parameters, it selectively optimizes structural parameters that most
significantly impact performance. We employ an ANN-based model to predict standard
cell performance metrics (rise delay, fall delay, propagation delay, and total power)
as a function of structural parameter variations, and achieve equal or better prediction
accuracy and speed with significantly fewer training samples than simulation-based
approaches, thereby maximizing data efficiency in the optimization process. Our main
contributions are as follows:
Table 1. Comparison of prior related works and the proposed ML-driven optimization
methodology.
| | [16] | [17] | [18] | [19] | Proposed method |
|---|---|---|---|---|---|
| Optimization method | GA | NSGA-II | - | BO | MOBO |
| Optimization scope | Device-only | Device-only | Device-circuit | SRAM-only | Device-circuit |
| Design parameters | $L_g$, $W_{sheet}$, $T_{sheet}$, $R_{sheet}$, $L_{sp}$ | $N_{sd}$, $N_{sub}$, $L_g$, $WF$, $W_{sheet}$, $T_{sheet}$, $T_{sus}$ | Contact size, contact position | Tr. width, length, $NF$ | $W_{sheet}$, P/N spacing, $NF$ |
| Objectives | $V_{th}$, SS, $I_{on}$, $I_{off}$ | Delay, power, gain, $f_T$ | Delay, power, yield | Power, access time | Delay, power |
| Timing balance | x | x | x | o | o |
| Design rule constraint | x | x | x | x | o |
- We propose the ANN-based model for standard cell performance prediction that achieves 99% accuracy with 7.56$\times$ fewer training samples than SPICE simulation-based approaches.
- Applying the multi-objective Bayesian optimization (MOBO) for standard cell performance and timing optimization, we reduce the propagation delay by 23.2% in high performance designs, and the total power by 10.3% in low power designs.
- The proposed method improves timing symmetry (i.e., balance between rise and fall delays) by more than 15%.
Section II defines the device and standard cell structures and the parameters used
in this study. Section III explains how the ANN model is trained to serve as the
objective function and describes the MOBO process used to optimize total power and
propagation delay. Section IV evaluates the performance of test circuits built from
the optimized structures. Conclusions are given in Section V.
II. DEVICE AND STANDARD CELL STRUCTURES
1. Device Definition and Simulation Conditions
The NSFET was implemented using Synopsys Sentaurus TCAD [20]. Fig. 1 shows the 3-D and cross-sectional views of a typical 3-sheet NSFET structure. The
3-sheet 3 nm NSFET structural parameters were determined based on [2, 6, 21-24]. The equivalent oxide thickness (EOT) reported in the international roadmap for devices
and systems (IRDS) was adopted, and the gate height was modeled independently from
the source/drain regions to accurately reflect actual device structures. In addition,
the channel doping concentrations are $10^{16}$ cm$^{-3}$ for N-type and $10^{15}$ cm$^{-3}$ for P-type, while the source/drain doping concentrations are $5\times10^{20}$ cm$^{-3}$ for N-type and $10^{21}$ cm$^{-3}$ for P-type. The structural parameters that are used in this study are shown in Table 2. As shown in Fig. 2, calibration was performed on the I-V measurement data of the IBM 3 nm NSFET [2] to verify the physical models applied in TCAD. The calibration error was within 1%.
Fig. 1. NSFET device structure: (a) 3D view, (b) Y-axis cross-section, and (c) X-axis
cross-section.
Table 2. Device structural parameters of 3 nm NSFET.
| Geometrical parameters | Value [nm] |
|---|---|
| Contacted gate pitch ($CGP$) | 44 |
| Oxide thickness ($T_{ox}$) | 1.5 |
| Source/Drain (S/D) length ($L_{sd}$) | 8.5 |
| Gate length ($L_g$) | 16 |
| Spacer length ($L_{sp}$) | 4 |
| Sheet width ($W_{sheet}$) | 25 |
| Sheet thickness ($T_{sheet}$) | 8 |
Fig. 2. Calibrated I-V curves with measurement data [2] in (a) linear scale and (b) log scale.
Using the calibrated TCAD dataset, the BSIM-CMG model parameters were extracted, and
the analytical parasitic capacitance models including contact parasitics [9] were incorporated into BSIM-CMG to construct the NSFET SPICE model. While the core
I-V models are preserved, geometry-dependent parasitic components are explicitly modeled,
enabling the proposed framework to achieve high accuracy in circuit-level performance
prediction with respect to layout-related parameters such as $W_{sheet}$ and contact
configurations.
2. Standard Cell Definition and Optimization Parameters
The standard cells were implemented using Synopsys Custom Compiler [27]. All layers used in the 3 nm NSFET layout were referenced from FreePDK3 [28]. Based on FreePDK3, the interconnect technology file (ITF) and a subset of design
rule parameters were modified to reflect our 3 nm NSFET and to satisfy the 5-track
cell height. In this study, the standard cell height is defined as the product of
the metal pitch and the number of tracks, and the power and ground rails adopt the
buried power rail (BPR) scheme provided by FreePDK3. The design rules are as follows:
The BPR is connected to M0A (MOL) through a VBPR (via), and to M0B (BEOL) through
a V0A (via). Additionally, M0B is connected to M1 (BEOL) through a V0B (via). The
standard cell is composed of sub-metal layers routed in both horizontal and vertical
directions. The cells are designed to meet these requirements and to match
the 5-track cell height [29-31]. The INV layout using the 3 nm NSFET is shown in Fig. 3
and the design parameters are shown in Table 3. Within the cell height constrained by the number of metal tracks, design parameters
that significantly affect parasitic component variations and their ranges were selected.
The width and height of interconnects, vias, and contacts, which can affect the performance
of standard cells, are fixed to minimum values for each process node and are difficult
to modify. Therefore, designers should select design parameters that can be adjusted
and have a significant impact on the parasitic components and key performance metrics
of the standard cell. The selected design parameters are sheet width ($W_{sheet}$),
P/N spacing, and the number of gate fingers ($NF$). First, in the design rules of
the developed standard cells, the minimum spacing between BPR and the sheet is 10
nm, and the minimum spacing between P-type and N-type transistors is 22 nm. Therefore,
within this range, $W_{sheet}$ can be set to 20-50 nm, and P/N spacing can be set
to 22-82 nm. Next, the driving strength is determined by the output load connected
during circuit operation, and the output load is set according to fan-out [14]. Fan-out 1-4 (FO 1-4) is a value that reflects the typical load condition in general
designs, and represents a reasonable and realistic load that is neither too small
nor too large. Moreover, it maintains a similar trend even when technology nodes change,
making it a suitable key metric for performance predictions for technology scaling
and high performance circuit designs. Therefore, the $NF$, which determines the driving
strength depending on the output load of the NSFET, is set to 1-4 [32].
Fig. 3. INV layout using 3 nm node NSFETs [28].
Table 3. Description of key layout design parameters used for standard cell construction.
| Design parameters | Value [nm] |
|---|---|
| Cell height | 173.5 |
| BPR width | 31.5 |
| M0A width | 15 |
| V0A area | 13 $\times$ 13 |
| M0B width | 12 |
| M0B spacing | 12 |
| V0B area | 10 $\times$ 14 |
| M1 width | 14 |
| M1 spacing | 14 |
To apply the proposed standard cell optimization method, three basic types of cells
(i.e., INV, NAND, and NOR) were selected as representatives [33]. For standard cells with more than two connected P- or N-type transistors, an increase
in channel width leads to a corresponding increase in the total effective capacitance.
As shown in Fig. 4(a), the NAND consists of N-type transistors connected in series. In this configuration,
the drain of the first NMOS and the source of the second NMOS form an internal node.
Due to the overlapping diffusion and junction capacitances of the two NMOS transistors
at this node, the fall delay increases during switching operations [34]. While the on-current increases with the channel width, it has been observed that
when $W_{sheet}$ exceeds 40 nm, the increase in total effective capacitance outweighs
the delay reduction effect. Therefore, the $W_{sheet}$ range of NAND is between 20-40
nm. In contrast, as shown in Fig. 4(b), the NOR cell has P-type transistors connected in series, leading to similar issues
to NAND. However, since the diffusion capacitance of P-type transistors is larger
than that of N-type transistors [6], the delay reduction effect in NOR is more significantly offset than that in NAND
[34]. Therefore, the $W_{sheet}$ range of NOR is smaller than NAND, which is between 20-35
nm. The range of design parameters for each standard cell is shown in Table 4.
Fig. 4. Schematic diagrams of (a) NAND and (b) NOR.
Table 4. Variation ranges of standard cells for 3 nm NSFET parameters.
| | $W_{sheet}$ [nm] | P/N spacing [nm] | $NF$ |
|---|---|---|---|
| Value ranges for INV | 20-50 | 22-82 | 1-4 |
| Value ranges for NAND | 20-40 | 22-82 | 1-4 |
| Value ranges for NOR | 20-35 | 22-82 | 1-4 |
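The ranges in Table 4 define the design space that any candidate structure must lie in. As a minimal sketch of such a range check (the dictionary layout and the function name `in_design_space` are illustrative assumptions, not part of the proposed flow), the table can be encoded as follows:

```python
# Parameter ranges copied from Table 4 (nm for widths and spacings, count for NF).
DESIGN_SPACE = {
    "INV":  {"w_sheet": (20.0, 50.0), "pn_spacing": (22.0, 82.0), "nf": (1, 4)},
    "NAND": {"w_sheet": (20.0, 40.0), "pn_spacing": (22.0, 82.0), "nf": (1, 4)},
    "NOR":  {"w_sheet": (20.0, 35.0), "pn_spacing": (22.0, 82.0), "nf": (1, 4)},
}


def in_design_space(cell, w_sheet, pn_spacing, nf):
    """Return True if the candidate lies inside the Table 4 ranges for `cell`."""
    ranges = DESIGN_SPACE[cell]
    w_lo, w_hi = ranges["w_sheet"]
    s_lo, s_hi = ranges["pn_spacing"]
    nf_lo, nf_hi = ranges["nf"]
    # NF is a whole number of gate fingers, so non-integers are rejected.
    nf_ok = isinstance(nf, int) and nf_lo <= nf <= nf_hi
    return w_lo <= w_sheet <= w_hi and s_lo <= pn_spacing <= s_hi and nf_ok


print(in_design_space("INV", 45.0, 60.0, 2))  # True
print(in_design_space("NOR", 45.0, 60.0, 2))  # False: W_sheet exceeds 35 nm
```

The same 45 nm sheet width is accepted for INV but rejected for NOR, reflecting the narrower $W_{sheet}$ range chosen for cells with series-connected P-type transistors.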
III. PROPOSED ML-BASED OPTIMIZATION METHOD
1. ANN Model for Objective Function
To generate the datasets, the Sobol sampling method was used [35]. This method is known to efficiently explore the design space with a minimal number
of samples, and it helps in obtaining datasets with various parameters. The performance
of the standard cell was evaluated using 135, 256, 512, and 1024 samples extracted
through Sobol sampling. After performing parasitic extraction (PEX) using Synopsys
StarRC, post-layout simulations were conducted using Synopsys HSPICE to extract rise
delay ($t_{pLH}$), fall delay ($t_{pHL}$), propagation delay ($t_p$), and total power
($P$) [36, 37]. Here, total power is expressed as the sum of static power and dynamic power. The
ANN model was trained using 135 samples generated by Sobol sampling. After training,
the ANN model created a multi-objective function, and the relationship between input
$X$ and output $Y$ was expressed using the transpose matrix of the weights ($W$) and
the biases ($b$) as follows:
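For the network described in this section (two hidden layers with hyperbolic tangent activation), one consistent way to write this mapping is sketched below. The layer subscripts and the linear output layer are notational assumptions; the symbols $X$, $Y$, $W$, and $b$ follow the text.

$$
Y = W_3^{T}\,\tanh\!\left(W_2^{T}\,\tanh\!\left(W_1^{T} X + b_1\right) + b_2\right) + b_3
$$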
The ANN objective function model consists of one input layer, two hidden layers with
20 hidden neurons each, and one output layer, as shown in Fig. 5. The number of neurons in each hidden layer was set to 20 because this configuration
achieves approximately 99% test accuracy while maintaining a modest model size, as
shown in Fig. 6. The input layer is composed of $W_{sheet}$ , P/N spacing, and $NF$, which represent
the standard cell structure parameters, while the output layer consists of the performance
metrics of the standard cell to be optimized, namely $t_{pHL}$, $t_{pLH}$, $t_p$,
and $P$. When determining the size of the ANN model, the amount of training data must
be considered, because the model size grows with the amount of training data,
which in turn raises the computational cost and the risk of overfitting.
During data preprocessing, the inputs were scaled using a Min-Max scaler,
and the outputs were logarithmically scaled for learning efficiency. The hyperbolic tangent
was used as the activation function, and the model was trained with the Adam optimizer. In addition,
the mean squared error loss function was used to evaluate the model accuracy, and
early stopping was implemented to prevent overfitting caused by an excessive number of epochs.
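The preprocessing and layer structure described above can be sketched as a plain forward pass. This is only an illustration under stated assumptions: the weights are random placeholders rather than trained values, the natural logarithm is assumed for the output scaling, the output layer is assumed to be linear, and the INV bounds from Table 4 are used for the Min-Max scaling. All function names are hypothetical.

```python
import math
import random

# Input bounds for the INV cell from Table 4: W_sheet [nm], P/N spacing [nm], NF.
LOWER = [20.0, 22.0, 1.0]
UPPER = [50.0, 82.0, 4.0]


def min_max_scale(x):
    """Map each input to [0, 1] using the lower and upper bounds."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(x, LOWER, UPPER)]


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer; weights[j] holds the input weights of neuron j."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * v for w, v in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a trained model would load fitted values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def predict(x, layers):
    """Forward pass 3 -> 20 -> 20 -> 4, with the outputs modeled in log space."""
    h = min_max_scale(x)
    for weights, biases in layers[:-1]:
        h = dense(h, weights, biases, activation=math.tanh)
    log_y = dense(h, *layers[-1])        # linear output layer (assumption)
    return [math.exp(v) for v in log_y]  # undo the logarithmic output scaling


rng = random.Random(0)
layers = [init_layer(3, 20, rng), init_layer(20, 20, rng), init_layer(20, 4, rng)]
y = predict([30.0, 41.0, 3.0], layers)   # ordered as [t_pHL, t_pLH, t_p, P]
print(len(y), all(v > 0 for v in y))
```

Because the outputs are exponentiated after the linear layer, every predicted delay and power value is positive by construction, which is one practical benefit of training on logarithmically scaled targets.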
Fig. 5. Architecture of the proposed ANN-based objective function model.
Fig. 6. Test error according to the ANN model size.
To evaluate the test accuracy, we gradually increased the training data from 8 to
135 samples using the Sobol sampling method. The results are shown in Fig. 7(a), where the accuracy reached 99% when 135 samples were used. The training and test
datasets consisted of 135 Sobol sampling data points and 30 random samples, respectively,
and the simulation ran for 30,000 iterations [38]. Fig. 7(b) shows the average loss incurred when the trained ANN model makes predictions on a
validation dataset that was not used in the training process. This is used to evaluate
whether the model is overfitting the training data. It can be observed that as the
number of iterations increases, the validation loss decreases, indicating that the
ANN model used in this study achieves higher accuracy on the validation dataset. To
accelerate the training, an NVIDIA Titan Xp GPU was used, and the ANN model completed
training in about 2 minutes. Fig. 8 compares the ANN predictions with the results obtained through
simulation, showing an accuracy of over 98.5%. In this study, standard cells that
satisfy a rise and fall delay ratio of 1:1 were designed with the objective of minimizing
the propagation delay (high performance) and total power (low power) based on the
design goals. This objective can be expressed as follows.
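With $x$ denoting the structural parameters ($W_{sheet}$, P/N spacing, and $NF$), one way to write this objective is sketched below. The tolerance $\epsilon$ on the delay ratio and the symbol $\mathcal{X}$ for the parameter ranges of Table 4 are notational assumptions.

$$
\min_{x \in \mathcal{X}} \; \left\{\, t_p(x),\; P(x) \,\right\}
\quad \text{subject to} \quad
\left| \frac{t_{pLH}(x)}{t_{pHL}(x)} - 1 \right| \le \epsilon
$$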
Fig. 7. (a) Test accuracy according to the number of training data with Sobol sampling
and (b) validation loss according to epoch.
Fig. 8. Accuracy of the ANN-based objective function model for NAND cells: (a) propagation
delay and (b) total power, compared against SPICE simulation.
2. MOBO Model for Optimization
We optimized the standard cell performance using the MOBO model, which efficiently
balances multiple performance metrics with high prediction accuracy using a small
number of samples. By leveraging a Gaussian process-based probabilistic model and
an acquisition function, MOBO selects informative samples to accelerate convergence. It effectively
predicts the Pareto frontier and iteratively refines designs.
The MOBO model uses a Gaussian process regression (GPR) model [38]. The GPR model represents the relationship between the standard cell structure parameters
and performance, with the uncertainty of the objective function expressed by
the predictive variance at new data points. Furthermore, it models the uncertainty between
the initial training dataset and the currently explored dataset, and after selecting
new samples, it identifies the standard cell structure that maximizes the hypervolume
using the acquisition function. Finally, the constraint function is applied to filter
out candidate structures that do not satisfy the 1:1 rise and fall
delay ratio. The algorithm uses the parallel noisy expected hypervolume improvement
(qNEHVI) acquisition function of the BoTorch package for parallel processing of MOBO
to determine the next candidate [39]. The outputs corresponding to the explored input points are evaluated by the ANN
objective function model, and the GPR model is then updated with the new input-output
pairs. This process is repeated for 80 optimization iterations.
The acquisition function selects the next data point that is expected to most improve the
hypervolume measured from the reference point. Here, the hypervolume is used
as a performance metric for Bayesian optimization, which allows for a wider exploration
of the objective function space [38].
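To make the hypervolume metric concrete, the following sketch computes it for two minimized objectives (for example, propagation delay and total power). It is a deterministic toy calculation on made-up points, not the qNEHVI acquisition function itself, which evaluates the expected improvement under the GPR posterior; the function names and sample values are assumptions for illustration.

```python
def pareto_front(points):
    """Keep the points not dominated by any other (both objectives minimized)."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front and bounded by the reference point."""
    front = [p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    prev_y = ref[1]
    for x, y in front:  # sorted by increasing x, so y decreases along the front
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area


# Toy (delay, power) points in arbitrary units and a reference point.
explored = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
reference = (4.0, 4.0)

hv_before = hypervolume_2d(explored, reference)
hv_after = hypervolume_2d(explored + [(1.5, 1.5)], reference)
print(hv_before, hv_after, hv_after - hv_before)  # 6.0 7.25 1.25
```

A candidate is attractive when adding it increases the dominated area; here the new point raises the hypervolume from 6.0 to 7.25 and removes the previously non-dominated point (2.0, 2.0) from the front.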
IV. RESULTS AND DISCUSSIONS
1. Standard Cell Results
We compared the performance of standard cells optimized using the Sobol sampling method
with that optimized using the MOBO model. Conventional performance optimization methods,
based on different design rules and layer structures, are excluded from comparison [14, 40].
Fig. 9 shows the hypervolume for the acquisition function, indicating that the optimization
process converges sufficiently after 80 iterations. Fig. 10 shows the change in the optimal values explored within the design space as the number
of simulation samples increases. A larger number of samples is needed to explore the
optimal values. However, as the number of samples increases beyond 1024, the optimal
point almost converges. Therefore, we set the optimal number of simulation samples
to 1024 using the Sobol sampling method. The constraint was set to ensure a rise and
fall delay ratio of 1:1. Data that did not satisfy the constraint were excluded from
the candidate group for the MOBO process iterations.
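The exclusion step can be sketched as a simple filter on the rise-to-fall delay ratio. The candidate values and the 5% tolerance below are hypothetical and chosen only for illustration; the tolerance actually used in the flow is not stated in the text.

```python
def ratio_error(t_plh, t_phl):
    """Relative deviation of the rise/fall delay ratio from the 1:1 target."""
    return abs(t_plh / t_phl - 1.0)


def filter_balanced(candidates, tol):
    """Keep only candidates whose delay ratio is within `tol` of 1:1."""
    return [c for c in candidates if ratio_error(c["t_plh"], c["t_phl"]) <= tol]


# Hypothetical candidates (delays in ps); the values are illustrative only.
candidates = [
    {"name": "A", "t_plh": 10.0, "t_phl": 10.2},
    {"name": "B", "t_plh": 12.0, "t_phl": 15.0},
    {"name": "C", "t_plh": 9.0, "t_phl": 8.9},
]
TOL = 0.05  # assumed tolerance on the ratio, not taken from the paper

kept = [c["name"] for c in filter_balanced(candidates, TOL)]
print(kept)  # ['A', 'C']
```

Candidate B is dropped because its rise delay is 20% shorter than its fall delay, so it never enters the pool from which the next MOBO iteration draws.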
Fig. 9. Hypervolume evolution of the MOBO acquisition function for (a) high performance
and (b) low power applications.
Fig. 10. Optimal values according to the number of simulation samples: (a) Propagation
delay and (b) total power.
Fig. 11 shows the distribution of optimal points satisfying the constraints for each design
objective of NAND obtained using Sobol sampling-based simulation and the MOBO model.
While the simulation-based approach required 1024 samples to reach optimal solutions,
the MOBO model achieved better results with only 135 samples. As a result, MOBO identified
6, 4, 5, 2, 4, and 2 additional valid design points within the optimal space for INV
HP/LP, NAND HP/LP, and NOR HP/LP, respectively. Tables 5 and 6 show the optimal structural parameters, performance, and required sample numbers.
First, the structures optimized by the proposed MOBO framework generally exhibit smaller
$W_{sheet}$ and appropriately adjusted P/N spacing compared with the simulation-based
optimal structural parameters, while the required drive strength tends to be compensated
by adjusting $NF$. For example, in the HP INV case, the PMOS and NMOS $W_{sheet}$
values are reduced from 30/30 nm to 20/20 nm, and the P/N spacing is increased from
41 nm to 76 nm, whereas the NMOS $NF$ is increased from 3 to 4. This configuration
reduces the gate, diffusion, and junction capacitances at the P/N boundary by increasing
the P/N spacing, while maintaining sufficient drive current through the increased
$NF$, thereby reducing the propagation delay of the HP INV by approximately 23.2%.
In the LP application, the proposed MOBO framework mainly selects structures that
minimize power by shrinking the $W_{sheet}$ of transistors connected in series in
NAND/NOR cells to suppress diffusion capacitance at internal nodes, and by adjusting
the P/N spacing. As a result, it achieves up to 10.3% reduction in power for the LP
NAND case compared with the simulation-based optimal structural parameters.
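The quoted improvements follow directly from the Table 6 entries. As a quick check, the sketch below recomputes the relative reductions from the tabulated HP propagation delays and LP total powers; the helper name `reduction_percent` is an assumption for illustration.

```python
def reduction_percent(baseline, optimized):
    """Relative reduction of `optimized` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - optimized) / baseline


# (simulation-based, MOBO-based) values taken from Table 6.
hp_delay_ps = {"INV": (8.35, 6.41), "NAND": (13.40, 11.65), "NOR": (12.20, 10.08)}
lp_power_nw = {"INV": (696.29, 625.49), "NAND": (718.47, 644.17), "NOR": (684.56, 622.11)}

hp_gain = {cell: reduction_percent(*v) for cell, v in hp_delay_ps.items()}
lp_gain = {cell: reduction_percent(*v) for cell, v in lp_power_nw.items()}

print(f"HP INV delay reduction: {hp_gain['INV']:.1f}%")   # 23.2%
print(f"LP NAND power reduction: {lp_gain['NAND']:.1f}%")  # 10.3%
```

Among the three cells, the INV shows the largest HP delay reduction and the NAND the largest LP power reduction, which matches the two headline figures.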
Fig. 11. Comparison of optimized NAND cell performance: (a) high performance mode
and (b) low power applications.
Table 5. Comparison of optimal structural parameters obtained from simulation-based
and MOBO-based optimization.
| Cell | Method | PMOS $W_{sheet}$ [nm] (HP) | PMOS $W_{sheet}$ [nm] (LP) | PMOS $NF$ (HP) | PMOS $NF$ (LP) | NMOS $W_{sheet}$ [nm] (HP) | NMOS $W_{sheet}$ [nm] (LP) | NMOS $NF$ (HP) | NMOS $NF$ (LP) | P/N spacing [nm] (HP) | P/N spacing [nm] (LP) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| INV | Sim. | 30 | 40 | 4 | 1 | 30 | 30 | 3 | 1 | 41 | 32 |
| INV | MOBO | 20 | 26 | 4 | 1 | 20 | 34 | 4 | 1 | 76 | 52 |
| NAND | Sim. | 30 | 30 | 4 | 2 | 20 | 30 | 3 | 2 | 62 | 62 |
| NAND | MOBO | 26 | 26 | 3 | 2 | 20 | 20 | 4 | 2 | 70 | 58 |
| NOR | Sim. | 20 | 25 | 4 | 3 | 25 | 25 | 2 | 1 | 72 | 62 |
| NOR | MOBO | 20 | 20.5 | 4 | 2 | 23.5 | 20.5 | 2 | 1 | 52 | 52 |
Table 6. Performance comparison of optimal structures obtained using simulation-based
and MOBO-based optimization.
| Cell | Method | $t_{pLH}$ [ps] (HP) | $t_{pLH}$ [ps] (LP) | $t_{pHL}$ [ps] (HP) | $t_{pHL}$ [ps] (LP) | $t_p$ [ps] (HP) | $t_p$ [ps] (LP) | $P$ [nW] (HP) | $P$ [nW] (LP) | Required samples |
|---|---|---|---|---|---|---|---|---|---|---|
| INV | Sim. | 8.19 | 17.50 | 8.51 | 17.08 | 8.35 | - | - | 696.29 | 1024 |
| INV | MOBO | 6.45 | 18.35 | 6.35 | 18.26 | 6.41 | - | - | 625.49 | 135 |
| NAND | Sim. | 12.30 | 23.60 | 14.40 | 25.60 | 13.40 | - | - | 718.47 | 1024 |
| NAND | MOBO | 11.58 | 18.94 | 11.71 | 18.93 | 11.65 | - | - | 644.17 | 135 |
| NOR | Sim. | 11.10 | 14.10 | 13.20 | 16.20 | 12.20 | - | - | 684.56 | 1024 |
| NOR | MOBO | 10.07 | 17.13 | 10.08 | 17.14 | 10.08 | - | - | 622.11 | 135 |
We further performed a Spearman correlation analysis to investigate how the explored
directions of the optimal structural parameters obtained from the proposed MOBO framework
affect delay and power. The Spearman correlation is a metric that can quantify the
importance of nonlinear relationships between input and output variables [41]. As shown in Table 7, the propagation delay of the three cells (INV, NAND, and NOR) exhibits a positive
correlation with the PMOS and NMOS $W_{sheet}$ , whereas it shows a negative correlation
with P/N spacing, PMOS $NF$, and NMOS $NF$. This quantitatively supports that an excessive
increase in $W_{sheet}$ leads to an increased cell propagation delay, while widening
the P/N spacing or appropriately adjusting $NF$ is more effective in reducing propagation
delay. For power, the correlation coefficients between power and NMOS $NF$ are 0.67,
0.73, and 0.85 for the INV, NAND, and NOR cells, respectively, indicating that an increase
in NMOS $NF$ is a dominant factor that increases total power consumption.
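For reference, the Spearman coefficient is the Pearson correlation computed on ranks, so it reaches ±1 for any strictly monotonic relationship, linear or not. The following is a minimal pure-Python sketch on made-up data; the function names and sample values are assumptions, and the coefficients in Table 7 come from the explored design data, not from this example.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: power rises and delay falls monotonically with NF.
nf = [1, 2, 3, 4]
power = [10.0, 14.0, 30.0, 31.0]
delay = [20.0, 12.0, 9.0, 8.5]

rho_power = spearman(nf, power)
rho_delay = spearman(nf, delay)
print(round(rho_power, 6), round(rho_delay, 6))  # 1.0 -1.0
```

Values between the two extremes, such as those in Table 7, indicate a monotonic trend that is partly masked by the influence of the other parameters.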
Table 7. Spearman correlation coefficients between structural design parameters and
performance metrics.
| | INV $\rho(t_p)$ | INV $\rho(P)$ | NAND $\rho(t_p)$ | NAND $\rho(P)$ | NOR $\rho(t_p)$ | NOR $\rho(P)$ |
|---|---|---|---|---|---|---|
| PMOS $W_{sheet}$ | 0.66 | 0.38 | 0.59 | 0.46 | 0.34 | -0.19 |
| PMOS $NF$ | -0.36 | 0.23 | -0.36 | 0.22 | -0.63 | 0.49 |
| NMOS $W_{sheet}$ | 0.50 | 0.52 | 0.62 | 0.33 | 0.10 | -0.04 |
| NMOS $NF$ | -0.27 | 0.67 | -0.27 | 0.73 | -0.64 | 0.85 |
| P/N spacing | -0.50 | -0.35 | -0.36 | -0.26 | -0.06 | -0.03 |
Fig. 12 shows the rise and fall delays of the final standard cells designed using the optimized
INV, NAND, and NOR. In the cases of NAND and NOR HP, the rise and fall delay ratios
are improved by 15.8% and 18.7%, respectively. Since complex standard cells (e.g.,
XOR, AOI, OAI, and DFF) consist of combinations of INV, NAND, and NOR, they inherit
the optimized structures, and their rise and fall delay ratios are improved as well.
Therefore, MOBO can optimize the rise and fall delay
ratios of various standard cells to be much closer to 1:1 compared to simulation-based
approaches. Fig. 13 compares the performance of various standard cells. MOBO-based
optimization yielded better propagation delay and total power than simulation-based
optimization across these cells. The simulation
took 40 minutes for 80 iterations on a computer with an Intel Core i9-7900X CPU.
Fig. 12. Comparison of standard cell rise and fall delays.
Fig. 13. Comparison of standard cell performance across optimization methods: (a)
propagation delay and (b) total power.
2. Test Circuit Results
To evaluate the optimized standard cells, a 7-stage ring oscillator (RO) and a 4-bit
ripple carry adder (RCA) were used as test circuits [12]. Total power (LP) and propagation delay and frequency (HP) were measured for the
RO, while total power (LP) and propagation delay (HP) were measured for the RCA. The
frequency means the oscillation frequency measured during operation [42]. The SPICE model
was adopted from previous research [9, 25, 26, 43, 44]. As shown in Table 8, the RO achieved
a rise-fall delay ratio close to 1:1, which is about 12% better
than the simulation-based design. In addition, for HP, the propagation delay was reduced
by 22.9% and the frequency was improved by 29.9%. For LP, the total power was reduced
by 19.2%. The performance of the 4-bit RCA designed using various standard cells optimized
by simulation-based approach and the MOBO model was also evaluated. For HP, the propagation
delay was reduced by 18.6% and for LP, the total power was reduced by 11.9%. Based
on these results, we can conclude that not only can timing be symmetrically improved
in complex circuits but also overall performance can be significantly improved. Furthermore,
performance and power consumption improvements are expected at larger complex circuit
levels.
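The RO entries in Table 8 can be cross-checked with the standard first-order relation between stage delay and oscillation frequency, $f = 1/(2 N t_p)$ for an $N$-stage ring. This relation is not stated in the text and is used here as an assumption; the function name is hypothetical.

```python
def ring_oscillator_frequency_ghz(n_stages, t_p_ps):
    """First-order estimate f = 1 / (2 * N * t_p), with t_p in ps and f in GHz."""
    period_ps = 2 * n_stages * t_p_ps
    return 1e3 / period_ps  # 1/ps equals 1e12 Hz, i.e. 1e3 GHz


# HP propagation delays of the 7-stage RO from Table 8 (ps).
f_sim = ring_oscillator_frequency_ghz(7, 10.42)
f_mobo = ring_oscillator_frequency_ghz(7, 8.03)

print(f"{f_sim:.2f} GHz, {f_mobo:.2f} GHz")  # 6.85 GHz, 8.90 GHz
```

Both estimates agree with the tabulated frequencies of 6.85 GHz and 8.9 GHz, which suggests that the reported frequency gain is essentially the reciprocal of the propagation delay reduction.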
Table 8. Performance comparison of 7-stage RO and 4-bit RCA.
| Circuit | Metric | HP Sim. | HP MOBO | LP Sim. | LP MOBO | Diff. [%] |
|---|---|---|---|---|---|---|
| 7-stage RO | $t_{pLH}$ [ps] | 11.06 | 8.08 | 22.83 | 24.10 | - |
| 7-stage RO | $t_{pHL}$ [ps] | 9.78 | 7.97 | 20.35 | 23.49 | - |
| 7-stage RO | $P$ [nW] | - | - | 42.76 | 34.56 | 19.2 |
| 7-stage RO | $t_p$ [ps] | 10.42 | 8.03 | - | - | 22.9 |
| 7-stage RO | Frequency [GHz] | 6.85 | 8.9 | - | - | 29.9 |
| 4-bit RCA | $P$ [nW] | - | - | 10.26 | 9.03 | 11.9 |
| 4-bit RCA | $t_p$ [ps] | 18.92 | 15.41 | - | - | 18.6 |
V. CONCLUSIONS
In this work, we presented an ML-driven framework for standard cell performance optimization
that integrates Sobol sampling, an ANN-based objective function model, and a MOBO engine.
Key structural parameters that critically affect propagation delay and total power
were selected. Under a constraint function based on the fixed cell height and technology
design rules, the ANN objective function was defined to jointly minimize propagation
delay and total power.
The performance optimization was performed within the MOBO framework, using an objective
function and a constraint function with a sizing method to maintain rise and fall
delay balance. Both propagation delay and power consumption were reduced while using
7.56$\times$ fewer samples than simulation-based approaches. Validation on various
standard cells and on representative benchmark circuits (7-stage RO and 4-bit RCA)
demonstrated up to a 22% reduction in propagation delay.
Beyond performance optimization, the framework is readily extensible to statistical
variability analysis. By redefining the ANN objective to model variation-related metrics
(e.g., threshold-voltage mismatch, layout dependent effects (LDE), or MOL/BEOL RC
variability), the same MOBO framework can efficiently explore statistical tail regions
of device and circuit variability distributions. This capability provides a promising
alternative to Monte Carlo simulations, enabling accelerated statistical characterization
with significantly fewer samples.
Future work will extend this framework to more complex standard cells and larger logic
structures, such as AES blocks and high-fanout combinational paths. In addition, integrating
the methodology with automated cell layout generation and explicitly modeling MOL/BEOL
parasitics will further enhance its applicability to full-chip timing and power optimization
in advanced semiconductor nodes.
ACKNOWLEDGMENTS
This work was supported in part by the National Research Foundation (NRF) of Korea
grant funded by the Korean Government (MSIT) under Grants RS-2020-NR049544 and RS-2025-16067451,
and in part by Samsung Electronics Co., Ltd. (IO250225-12099-01). The EDA tool was
supported by the IC Design Education Center (IDEC), South Korea.
REFERENCES
[1] Jang D., Yakimets D., Eneman G., Schuddinck P., Bardon M. G., Raghavan P., Spessot A., Verkest D., Mocuta A., 2017, Device exploration of nanosheet transistors for sub-7-nm technology node, IEEE Transactions on Electron Devices, Vol. 64, No. 6, pp. 2707-2713
[2] Loubet N., Hook T., Montanini P., Yeung C.-W., Kanakasabapathy S., Guillorn M., 2017, Stacked nanosheet gate-all-around transistor to enable scaling beyond FinFET, Proc. of 2017 Symposium on VLSI Technology, pp. T230-T231
[3] Kim S., Guillorn M., Lauer I., Oldiges P., Hook T., Na M., 2015, Performance trade-offs in FinFET and gate-all-around device architectures for 7-nm node and beyond, Proc. of 2015 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference, pp. 1-3
[4] Seon Y., Chang J., Yoo C., Jeon J., 2021, Device and circuit exploration of multi-nanosheet transistor for sub-3 nm technology node, Electronics, Vol. 10, No. 2
[5] Wang M., Sun Y., Li X., Shi Y., Hu S., Shang E., Chen S., 2020, Design technology co-optimization for 3 nm gate-all-around nanosheet FETs, Proc. of Symposium on VLSI Technology, pp. 230-231
[6] Yoon J., Jeong J., Lee S., Baek R., 2018, Systematic DC/AC performance benchmarking of sub-7-nm node FinFETs and nanosheet FETs, IEEE Journal of the Electron Devices Society, Vol. 6, pp. 942-947
[7] Choi M., Park J.-H., Choi S., Kwon K., Lee Y., Jang W., Jeon J., 2023, Study on the circuit performance of various interconnect metal materials in the latest process nodes, Journal of Semiconductor Technology and Science, Vol. 23, No. 4, pp. 215-227
[8] Loubet N., Kal S., Alix C., Pancharatnam S., Zhou H., Durfee C., 2019, A novel dry selective etch of SiGe for the enablement of high performance logic stacked gate-all-around nanosheet devices, Proc. of 2019 IEEE International Electron Devices Meeting, pp. 11.4.1-11.4.4
[9] Suk J., Kim Y., Do J., Kim G., Baek S., Kye J., Kim S., 2023, Analytical parasitic resistance and capacitance models for nanosheet field-effect transistors, IEEE Transactions on Electron Devices, Vol. 70, No. 6, pp. 2941-2946
[10] Suk J., Kim Y., Do J., Kim G., Rim W., Baek S., Yoon S., Kim S., 2024, A process-aware analytical gate resistance model for nanosheet field effect transistors, IEEE Journal of the Electron Devices Society, Vol. 12, pp. 898-904
[11] Gupta P., Mandadapu H., Gourishetty S., Abbas Z., 2019, Robust transistor sizing for improved performances in digital circuits using optimization algorithms, Proc. of 20th International Symposium on Quality Electronic Design, pp. 85-91
[12] Chen Y., Jiao H., 2019, Standard cell optimization for ultra-low-voltage digital circuits, Proc. of 2019 International Conference on IC Design and Technology, pp. 1-4
[13] Nishizawa S., Ishihara T., Onodera H., 2012, A flexible structure of standard cell and its optimization method for near-threshold voltage operation, Proc. of 2012 IEEE 30th International Conference on Computer Design, pp. 235-240
[14] Song T., Jung H., Yang G., 2022, 3 nm gate-all-around (GAA) design technology co-optimization (DTCO) for succeeding PPA by technology, Proc. of IEEE Custom Integrated Circuits Conference, pp. 1-7
[15] Yakimets D., Bhuwalka K. K., Wu H., Rzepa G., Karner M., Liu C., 2024, Inflection
points in GAA NS-FET to C-FET scaling considering impact of DTCO boosters, IEEE Transactions
on Electron Devices, Vol. 71, No. 4, pp. 2309-2314

Xu H. , Gan W. , Cao L. , Yin H. , Wu Z. , 2022, Prediction of key metrics of stacked
nanosheet nFETs using genetic algorithm-based neural networks, Proc. of 2022 IEEE
International Conference on Integrated Circuits, Technologies and Applications, pp.
3-4

Ehteshamuddin M. , Sheelvardhan K. , Kumar A. , Guglani S. , Roy S. , Dasgupta A.
, 2024, Machine learning-assisted multiobjective optimization of advanced node gate-all-around
transistor for logic and RF applications, IEEE Transactions on Electron Devices, Vol.
71, No. 2, pp. 976-982

Kwon U. , Okagaki T. , Song Y.-S. , 2018, Intelligent DTCO (iDTCO) for next generation
logic path-finding, Proc. of 2018 International Conference on Simulation of Semiconductor
Processes and Devices, pp. 49-52

Lee J. , Park J. , Kim S. , Jeong H. , 2023, Bayesian learning automated SRAM circuit
design for power and performance optimization, IEEE Transactions on Circuits and Systems
I: Regular Papers, Vol. 70, No. 12, pp. 4949-4961

2019, Sentaurus Device User Guide, Version P-2019.03

2021, International Roadmap for Devices and Systems (IRDS) 2021 Edition

Yoon J. , Baek R. , 2020, Device design guideline of 5-nm-node FinFETs and nanosheet
FETs for analog/RF applications, IEEE Access, Vol. 8, pp. 189395-189403

Sun Y. , Thompson S. E. , Nishida T. , 2007, Physics of strain effects in semiconductors
and metal-oxide-semiconductor field-effect transistors, Journal of Applied Physics,
Vol. 101, No. 10

Reboh S. , Coquand R. , Augendre E. , 2016, An analysis of stress evolution in stacked
GAA transistors, Proc. of IEEE Silicon Nanoelectronics Workshop, pp. 206-207

Woo S. , Jeong H. , Choi J. , Cho H. , Kong J.-T. , Kim S. , 2022, Machine learning-based
compact modeling for sub-3-nm-node emerging transistor, Electronics, Vol. 11

Jeong H. , Woo S. , Choi J. , Cho H. , Kim Y. , Kong J.-T. , Kim S. , 2023, Fast and
expandable ANN-based compact model and parameter extraction for emerging transistors,
IEEE Journal of the Electron Devices Society, Vol. 11, pp. 153-160

2020, Synopsys Custom Compiler User Guide, Version P-2020.12

Sushant S. , FreePDK3: A Novel PDK for Physical Verification at the 3nm Node, Ph.D.
dissertation

Kim T. , Jeong J. , Woo S. , 2023, NS3K: A 3-nm nanosheet FET standard cell library
development and its impact, IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, Vol. 31, No. 2, pp. 163-176

Yang G. , Jung H. , Lim J. , 2022, Standard cell design optimization with advanced
MOL technology in 3 nm GAA process, Proc. of IEEE Symposium on VLSI Technology and
Circuits, pp. 363-364

Shaji S. M. , Zhu L. , Yoon J. , Lim S. K. , 2023, A comparative study on front-side,
buried and backside power rail topologies in 3nm technology node, Proc. of IEEE/ACM
International Symposium on Low Power Electronics and Design, pp. 1-6

Lee Y. M. , Na M. H. , Chu A. , 2017, Accurate performance evaluation for the horizontal
nanosheet standard cell design space beyond 7nm technology, Proc. of IEEE International
Electron Devices Meeting, pp. 29.3.1-29.3.4

Chang K. , Kim T. , 2022, Analysis of impacting multi-stack standard cells on chip
implementation, Proc. of 19th International SoC Design Conference, pp. 119-120

Merino J. L. , Bota S. A. , Samitier J. , Analysis of series-connected MOSFETs for
gate delay optimization, Proc. of International Workshop on Power and Timing Modeling,
Optimization and Simulation, pp. 1-10

Sobol I. M. , 1967, On the distribution of points in a cube and the approximate evaluation
of integrals, USSR Computational Mathematics and Mathematical Physics, Vol. 7, No.
4, pp. 86-112

2021, StarRC User Guide, Version E2021.06

2020, HSPICE User Guide, Version E2020.12

Jeong H. , Choi J. , Cho H. , Woo S. , Kim Y. , Kong J.-T. , Kim S. , 2024, MOBO-driven
advanced sub-3-nm device optimization for enhanced PDP performance, IEEE Transactions
on Electron Devices, Vol. 71, No. 5, pp. 2881-2887

Daulton S. , Balandat M. , Bakshy E. , 2021, Parallel Bayesian optimization of multiple
noisy objectives with expected hypervolume improvement, Advances in Neural Information
Processing Systems, Vol. 34, pp. 2187-2200

Jeong J. , Ko J. , Song T. , 2022, A study on optimizing pin accessibility of standard
cells in the post-3 nm node, Proc. of ACM/IEEE International Symposium
on Low Power Electronics and Design, pp. 1-6

Yoon J.-S. , Lee S. , Yun H. , Baek R.-H. , 2021, Digital/analog performance optimization
of vertical nanowire FETs using machine learning, IEEE Access, Vol. 9, pp. 29071-29077

Gaddemane G. , 2025, Exploring GAA-nanosheet, forksheet and GAA-forksheet architectures:
A TCAD-DTCO study at 90 nm and 120-nm cell height, IEEE Journal of the Electron Devices
Society, Vol. 13, pp. 769-782

Choi J. , Jeong H. , Woo S. , Cho H. , Kim Y. , Kong J.-T. , Kim S. , 2024, Enhancement
and expansion of the neural network-based compact model using a binning method, IEEE
Journal of the Electron Devices Society, Vol. 12, pp. 65-73

Jeong H. , Choi J. , Kim Y. , Kong J.-T. , Kim S. , 2024, Efficient neural network
compact modeling for novel device structure using multi-fidelity model and active
learning, Electronics, Vol. 13, No. 23, pp. 4840

HyunJoon Jeong received his B.S. degree in electrical engineering from Myongji University
in 2020. He is currently pursuing a Ph.D. degree at the Department of Electrical and
Computer Engineering, Sungkyunkwan University, Suwon, South Korea. His research interests
include DTCO, device modeling, cell optimization, and machine learning.
Junha Suk received his B.S. degree in electronic engineering from Korea National University
of Transportation in 2017 and his Ph.D. degree in electrical and computer engineering
from Sungkyunkwan University in 2025. In 2025, he joined the Samsung Electronics Foundry
Business, Giheung, Korea, where he works in the SPICE Group of the PDK Development
Team. His research interests include gate-all-around transistor modeling, device-circuit
interaction, and PDK.
Jeong-Taek Kong received his B.S. degree in electronics engineering from Hanyang University,
Seoul, Republic of Korea in 1981, an M.S. degree in electronics engineering from Yonsei
University, Seoul, Republic of Korea in 1983, and a Ph.D. degree in electrical engineering
from Duke University, Durham, NC in 1994. From 1983 to 2014, he was with Semiconductor
Business, Samsung Electronics Co., as VP of CAE Team, Senior VP of Intellectual Property
Team, and Vice Chancellor of Samsung Institute of Technology. In 2014, he became a
Professor with the Department of Electronic Engineering, Hanyang University, Seoul,
Republic of Korea. He is currently a Professor with the Department of Semiconductor
Systems Engineering, College of Information and Communication Engineering, Sungkyunkwan
University, Suwon, Republic of Korea. His research interests include various EDA tools
and design methodologies.
SoYoung Kim received her B.S. degree in electrical engineering from Seoul National
University, Seoul, Korea, in 1997 and her M.S. and Ph.D. degrees in electrical engineering
from Stanford University, Stanford, CA, in 1999 and 2004, respectively. From 2004
to 2008, she was with Intel Corporation, Santa Clara, CA, where she worked on parasitic
extraction and simulation of on-chip interconnects. From 2008 to 2009, she was with
Cadence Design Systems, San Jose, CA, where she worked on developing IC power analysis
tools. She is currently a Professor with the Department of Semiconductor Systems Engineering,
College of Information and Communication Engineering, Sungkyunkwan University, Suwon,
Korea. Her research interests include VLSI computer-aided design, signal integrity,
power integrity, and electromagnetic interference in electronic systems.