The Journal of Semiconductor Technology and Science (JSTS) is an international, peer-reviewed, and open-access journal that is published bimonthly.
- Scope: semiconductor processes, devices, circuits, and MEMS.
- Editor-in-Chief: Prof. Woo Young Choi (ECE, Seoul National University)
- Indexed within Science Citation Index Expanded (SCIE), SCOPUS, Korea Citation Index (KCI), and other databases.

[Research article]

JSTS(Journal of Semiconductor Technology and Science)

IEIE Vol. 26, No. 2, p.130-140

ISSN (print) : 1598-1657

ISSN (online) : 2233-4866

Received : 22 Sep. 2025, Revised : 3 Dec. 2025, Accepted : 7 Dec. 2025

DOI : 10.5573/JSTS.2026.26.2.130

ML-Driven Optimization of Standard Cell Performance and Timing in Advanced Nodes

HyunJoon Jeong¹ Junha Suk² Jeong-Taek Kong³ SoYoung Kim³

(Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea)
(PDK Development Team, Foundry Division, Samsung Electronics Company Ltd., Giheung 17113, Republic of Korea)
(Department of Semiconductor Systems Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea)

^*Corresponding Author : E-mail: jtkong@skku.edu, ksyoung@skku.edu

License :

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.(www.theieie.org).

Abstract

Standard cell performance and timing optimization becomes increasingly challenging in advanced technology nodes such as sub-3 nm nanosheet FET (NSFET) with buried power rails (BPRs). In this paper, we propose a novel standard cell optimization methodology based on machine learning (ML) that simultaneously achieves performance improvement and timing balance while reducing simulation overhead. For INV/NAND2/NOR2 cell layouts designed with 3 nm NSFETs, we perform post-layout simulations using parasitic component extraction (PEX) to compute delays and power and generate a dataset. Using this dataset, we train an artificial neural network (ANN) model as an objective function and perform multi-objective Bayesian optimization (MOBO) under explicit design rules and cell height constraints to achieve 1:1 rise-fall delay symmetry across the cells. Within this framework, high performance (HP) applications target minimum propagation delay with 1:1 symmetry, while low power (LP) applications target minimum total power with the same symmetry. For 3 nm and beyond NSFET technology, delay is reduced by up to 23.2% for HP INV cells, and power is reduced by 10.3% for LP NAND2 cells. For NAND2/NOR2 cells, the rise-fall delay balance is improved by more than 15%. To evaluate the performance of the optimized standard cells, a 7-stage ring oscillator (RO) and a 4-bit ripple carry adder (RCA) were used as test circuits. The results show significant improvements in both delay and power efficiency.

Index Terms

Standard cell, nanosheet field-effect transistor (NSFET), buried power rail (BPR), timing, performance, multi-objective Bayesian optimization (MOBO), artificial neural network (ANN)

I. INTRODUCTION

As technology nodes scale below 3 nm, gate control over the channel weakens significantly ^[1]. To overcome this limitation, nanosheet field-effect transistors (NSFETs) with a gate-all-around (GAA) structure have been introduced ^[2-6]. They provide a higher drive current per area ^[8]. However, the increased structural complexity of NSFETs intensifies the impact of parasitic components, not only within the device but also from the middle of line (MOL) and the back end of line (BEOL), which can significantly degrade both digital and analog/RF circuit performance ^[7]. As a result, accurate prediction and effective minimization of parasitic effects have become critical.

Previous works have proposed analytical models to predict parasitic components in NSFETs while accounting for structural variations ^[9, 10]. These models, compatible with the Berkeley short-channel IGFET model (BSIM), enable circuit performance evaluation through SPICE simulations. By incorporating actual device structures, they facilitate accurate prediction and effective minimization of parasitics, leading to performance optimization.

With the advancement of technology nodes, maintaining the timing balance in standard cells has become increasingly challenging. Standard cell timing has been optimized by adjusting the channel width and length of P- and N-type transistors ^[11-15]. However, since channel length scaling has stagnated in advanced nodes, timing optimization primarily depends on the channel width, which is limited by the fixed cell height ^[12, 15].

Recently, machine learning (ML)-based optimization methodologies have been proposed to improve device and circuit performance under such constraints. However, most prior studies have primarily focused on device-level parameter optimization using conventional evolutionary algorithms such as genetic algorithms (GA) ^[16] and NSGA-II ^[17], while a few works have attempted to extend the scope toward design-technology co-optimization (DTCO) ^[18] or optimization for specific applications such as SRAM ^[19]. Although these early DTCO approaches are useful for optimizing device parameters, they generally do not explicitly incorporate circuit-level timing balance into the optimization objectives. They also tend to ignore design rule constraints imposed by the fixed cell height, which are essential in standard cell layout design. In addition, they often overlook the size of the datasets required before optimization. This paradoxically leads to situations where the time required to generate training datasets can exceed the computational cost of exploring optimal design candidates.

In this paper, we propose a novel standard cell optimization methodology that addresses the challenges in performance optimization while simultaneously achieving timing balance. The artificial neural network (ANN) model is trained to take structural parameters as input and the key performance metrics as output. Once trained, an optimization algorithm is applied to identify structural configurations that maximize performance. To ensure design feasibility and improve computational efficiency, a constraint function is introduced during the optimization process. This function limits the design space to only those configurations that satisfy the given constraints. A novel sizing method is incorporated to maintain the timing balance between P- and N-type transistors under a limited cell height, achieving a 1:1 rise-fall delay ratio. As shown in Table 1, the proposed approach outperforms existing methods by optimizing both aspects without being constrained by a fixed cell height. Moreover, instead of altering routing or interconnect parameters, it selectively optimizes structural parameters that most significantly impact performance. We employ an ANN-based model to predict standard cell performance metrics (rise delay, fall delay, propagation delay, and total power) as a function of structural parameter variations, and achieve equal or better prediction accuracy and speed with significantly fewer training samples than simulation-based approaches, thereby maximizing data efficiency in the optimization process. Our main contributions are as follows:

Table 1. Comparison of prior related works and the proposed ML-driven optimization methodology.

	^[16]	^[17]	^[18]	^[19]	Proposed method
Optimization method	GA	NSGA-II	-	BO	MOBO
Optimization scope	Device-only	Device-only	Device-circuit	SRAM-only	Device-circuit
Design parameters	$L_g$, $W_{sheet}$, $T_{sheet}$, $R_{sheet}$, $L_{sp}$	$N_{sd}$, $N_{sub}$, $L_g$, $WF$, $W_{sheet}$, $T_{sheet}$, $T_{sus}$	Contact size, contact position	Tr. width, length, $NF$	$W_{sheet}$, P/N spacing, $NF$
Objectives	$V_{th}$, SS, $I_{on}$, $I_{off}$	Delay, power, gain, $f_T$	Delay, power, yield	Power, access time	Delay, power
Timing balance	x	x	x	o	o
Design rule constraint	x	x	x	x	o

- We propose an ANN-based model for standard cell performance prediction that achieves 99% accuracy with 7.56$\times$ fewer training samples than SPICE simulation-based approaches.
- By applying multi-objective Bayesian optimization (MOBO) to standard cell performance and timing optimization, we reduce the propagation delay by 23.2% in high performance designs and the total power by 10.3% in low power designs.
- The proposed method improves timing symmetry (i.e., the balance between rise and fall delays) by more than 15%.

Section II defines the device and standard cell structures and the parameters used in this study. Section III explains how the objective function is generated by training the ANN objective function model, and describes the MOBO process used to optimize total power and propagation delay. Section IV evaluates the performance of test circuits built from the optimized structures. Conclusions are given in Section V.

II. DEVICE AND STANDARD CELL STRUCTURES

1. Device Definition and Simulation Conditions

The NSFET was implemented using Synopsys Sentaurus TCAD ^[20]. Fig. 1 shows the 3-D and cross-sectional views of a typical 3-sheet NSFET structure. The structural parameters of the 3-sheet 3 nm NSFET were determined based on ^[2, 6, 21-24]. The equivalent oxide thickness (EOT) reported in the International Roadmap for Devices and Systems (IRDS) was adopted, and the gate height was modeled independently from the source/drain regions to accurately reflect actual device structures. The channel doping concentrations are $10^{16}$ cm$^{-3}$ for N-type and $10^{15}$ cm$^{-3}$ for P-type, while the source/drain doping concentrations are $5 \times 10^{20}$ cm$^{-3}$ for N-type and $10^{21}$ cm$^{-3}$ for P-type. The structural parameters used in this study are listed in Table 2. As shown in Fig. 2, the physical models applied in TCAD were calibrated against the I-V measurement data of the IBM 3 nm NSFET ^[2], with a calibration error within 1%.

Fig. 1. NSFET device structure: (a) 3D view, (b) Y-axis cross-section, and (c) X-axis cross-section.

Table 2. Device structural parameters of 3 nm NSFET.

Geometrical parameters	Value [nm]
Contacted gate pitch ($CGP$)	44
Oxide thickness ($T_{ox}$)	1.5
Source/Drain (S/D) length ($L_{sd}$)	8.5
Gate length ($L_g$)	16
Spacer length ($L_{sp}$)	4
Sheet width ($W_{sheet}$)	25
Sheet thickness ($T_{sheet}$)	8

Fig. 2. Calibrated I-V curves with measurement data ^[2] in (a) linear scale and (b) log scale.

Using the calibrated TCAD dataset, the BSIM-CMG model parameters were extracted, and the analytical parasitic capacitance models including contact parasitics ^[9] were incorporated into BSIM-CMG to construct the NSFET SPICE model. While the core I-V models are preserved, geometry-dependent parasitic components are explicitly modeled, enabling the proposed framework to achieve high accuracy in circuit-level performance prediction with respect to layout-related parameters such as $W_{sheet}$ and contact configurations.

2. Standard Cell Definition and Optimization Parameters

The standard cells were implemented using Synopsys Custom Compiler ^[27]. All layers used in the 3 nm NSFET layout were referenced from FreePDK3 ^[28]. Based on FreePDK3, the interconnect technology file (ITF) and a subset of design rule parameters were modified to reflect our 3 nm NSFET and to satisfy the 5-track cell height. In this study, the standard cell height is defined as the product of the metal pitch and the number of tracks, and the power and ground rails adopt the buried power rail (BPR) scheme provided by FreePDK3. The design rules are as follows: the BPR is connected to M0A (MOL) through a VBPR via, and to M0B (BEOL) through a V0A via. Additionally, M0B is connected to M1 (BEOL) through a V0B via. The standard cell is composed of sub-metal layers routed in both horizontal and vertical directions. The cells are designed to meet these requirements and to match the 5-track cell height ^[29-31]. The INV layout using the 3 nm NSFET is shown in Fig. 3, and the design parameters are listed in Table 3. Within the cell height constrained by the number of metal tracks, we selected the design parameters that significantly affect parasitic component variations, together with their ranges. The widths and heights of interconnects, vias, and contacts, which can affect standard cell performance, are fixed to the minimum values for each process node and are difficult to modify. Therefore, designers should select design parameters that can be adjusted and that have a significant impact on the parasitic components and key performance metrics of the standard cell. The selected design parameters are the sheet width ($W_{sheet}$), the P/N spacing, and the number of gate fingers ($NF$). First, in the design rules of the developed standard cells, the minimum spacing between the BPR and the sheet is 10 nm, and the minimum spacing between P-type and N-type transistors is 22 nm. Therefore, within these limits, $W_{sheet}$ can be set to 20-50 nm, and the P/N spacing can be set to 22-82 nm.
Next, the driving strength is determined by the output load connected during circuit operation, and the output load is set according to fan-out ^[14]. Fan-out 1-4 (FO 1-4) is a value that reflects the typical load condition in general designs, and represents a reasonable and realistic load that is neither too small nor too large. Moreover, it maintains a similar trend even when technology nodes change, making it a suitable key metric for performance predictions for technology scaling and high performance circuit designs. Therefore, the $NF$, which determines the driving strength depending on the output load of the NSFET, is set to 1-4 ^[32].
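
The stated ranges are consistent with a simple vertical budget, sketched below. The budget formula is an assumption inferred from Table 3 and the quoted minimum spacings, not a rule given in the text: one full BPR width (two half-rails shared with neighbouring cells) and two BPR-to-sheet spacings are charged against the cell height, leaving 173.5 - 31.5 - 2 $\times$ 10 = 122 nm for the two sheets and the P/N spacing. The function name `fits_cell_height` is hypothetical.

```python
# Illustrative check of the vertical budget inside one cell (values in nm).
# ASSUMPTION: the budget formula is inferred from Table 3 and the stated
# minimum spacings; it is not given explicitly in the paper.
CELL_HEIGHT = 173.5      # Table 3
BPR_WIDTH = 31.5         # Table 3 (assumed: two shared half-rails per cell)
MIN_BPR_TO_SHEET = 10.0  # design rule quoted in the text
MIN_PN_SPACING = 22.0    # design rule quoted in the text


def fits_cell_height(w_p, w_n, pn_spacing):
    """Return True if both sheets and their spacing fit the assumed budget."""
    if pn_spacing < MIN_PN_SPACING:
        return False
    available = CELL_HEIGHT - BPR_WIDTH - 2 * MIN_BPR_TO_SHEET
    return w_p + w_n + pn_spacing <= available


print(fits_cell_height(50, 50, 22))  # widest sheets, minimum spacing
print(fits_cell_height(20, 20, 82))  # narrowest sheets, maximum spacing
print(fits_cell_height(50, 50, 30))  # exceeds the assumed budget
```

Under this assumption, both extreme corners of the INV range (50/50/22 nm and 20/20/82 nm) use exactly 122 nm, which matches the upper limits quoted above.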

Fig. 3. INV layout using 3 nm node NSFETs ^[28].

Table 3. Description of key layout design parameters used for standard cell construction.

Design parameters	Value [nm]
Cell height	173.5
BPR width	31.5
M0A width	15
V0A area	13 $\times$ 13
M0B width	12
M0B spacing	12
V0B area	10 $\times$ 14
M1 width	14
M1 spacing	14

To apply the proposed standard cell optimization method, three basic cell types (INV, NAND, and NOR) were selected as representatives ^[33]. For standard cells with more than two connected P- or N-type transistors, an increase in channel width leads to a corresponding increase in the total effective capacitance. As shown in Fig. 4(a), the NAND consists of N-type transistors connected in series. In this configuration, the drain of the first NMOS and the source of the second NMOS form an internal node. Due to the overlapping diffusion and junction capacitances of the two NMOS transistors at this node, the fall delay increases during switching operations ^[34]. Although the on-current increases with the channel width, we observed that when $W_{sheet}$ exceeds 40 nm, the increase in total effective capacitance outweighs the delay reduction. Therefore, the $W_{sheet}$ range of NAND is 20-40 nm. Similarly, as shown in Fig. 4(b), the NOR cell has P-type transistors connected in series, which leads to the same issue as in NAND. However, since the diffusion capacitance of P-type transistors is larger than that of N-type transistors ^[6], the delay reduction in NOR is offset more strongly than in NAND ^[34]. Therefore, the $W_{sheet}$ range of NOR is narrower than that of NAND, at 20-35 nm. The range of design parameters for each standard cell is shown in Table 4.

Fig. 4. Schematic diagrams of (a) NAND and (b) NOR.

Table 4. Variation ranges of standard cells for 3 nm NSFET parameters.

	$W_{sheet}$ [nm]	P/N spacing [nm]	$NF$
Value ranges for INV	20-50	22-82	1-4
Value ranges for NAND	20-40	22-82	1-4
Value ranges for NOR	20-35	22-82	1-4

III. PROPOSED ML-BASED OPTIMIZATION METHOD

1. ANN Model for Objective Function

To generate the datasets, the Sobol sampling method was used ^[35]. This method is known to explore the design space efficiently with a small number of samples, and it helps in obtaining datasets that cover diverse parameter combinations. The performance of the standard cell was evaluated using 135, 256, 512, and 1024 samples extracted through Sobol sampling. After parasitic extraction (PEX) using Synopsys StarRC, post-layout simulations were conducted using Synopsys HSPICE to extract the rise delay ($t_{pLH}$), fall delay ($t_{pHL}$), propagation delay ($t_p$), and total power ($P$) ^[36, 37]. Here, total power is the sum of static power and dynamic power. The ANN model was trained using 135 samples generated by Sobol sampling. After training, the ANN model serves as a multi-objective function, and the relationship between input $X$ and output $Y$ is expressed using the transpose of the weight matrix ($W$) and the biases ($b$) as follows:

$Y_k = W^T X + b, \quad k \in \{t_{pHL}, t_{pLH}, t_p, P\}. \qquad (1)$
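
The dataset generation step described before Eq. (1) can be illustrated with a short sketch. To keep it dependency-free, a Halton sequence is used here as a stand-in for the Sobol sequence of the paper; both are low-discrepancy sequences, but they are not the same construction. The three-parameter input (one $W_{sheet}$, one P/N spacing, one $NF$), the NAND ranges from Table 4, and all function names are illustrative assumptions.

```python
# Low-discrepancy sampling of the NAND design space in Table 4.
# ASSUMPTION: a Halton sequence stands in for the Sobol sequence used in the
# paper so that the sketch needs only the standard library.
NAND_RANGES = {"w_sheet": (20.0, 40.0), "pn_spacing": (22.0, 82.0)}  # nm
NF_CHOICES = (1, 2, 3, 4)


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in a given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def sample_designs(n_samples):
    """Map n_samples quasi-random points in [0, 1)^3 onto the design ranges."""
    w_lo, w_hi = NAND_RANGES["w_sheet"]
    s_lo, s_hi = NAND_RANGES["pn_spacing"]
    designs = []
    for i in range(1, n_samples + 1):
        u_w, u_s, u_nf = (radical_inverse(i, b) for b in (2, 3, 5))
        designs.append({
            "w_sheet": w_lo + u_w * (w_hi - w_lo),
            "pn_spacing": s_lo + u_s * (s_hi - s_lo),
            "nf": NF_CHOICES[int(u_nf * len(NF_CHOICES))],  # discrete bins
        })
    return designs


training_set = sample_designs(135)
print(len(training_set), training_set[0])
```

Each sampled design would then be passed through layout generation, PEX, and SPICE simulation to obtain the four target metrics.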

The ANN objective function model consists of one input layer, two hidden layers with 20 neurons each, and one output layer, as shown in Fig. 5. The number of neurons in each hidden layer was set to 20 because this configuration achieves approximately 99% test accuracy while maintaining a modest model size, as shown in Fig. 6. The input layer is composed of $W_{sheet}$, P/N spacing, and $NF$, which represent the standard cell structure parameters, while the output layer consists of the performance metrics of the standard cell to be optimized, namely $t_{pHL}$, $t_{pLH}$, $t_p$, and $P$. When determining the size of the ANN model, the amount of training data must be considered, because as the amount of training data increases, the model size also increases, which in turn raises the computational cost and the risk of overfitting. During data preprocessing, the inputs were scaled using a Min-Max scaler, and the outputs were logarithmically scaled for learning efficiency. The hyperbolic tangent was used as the activation function, and the model was trained with the Adam optimizer. In addition, the mean squared error loss function was used to evaluate the model accuracy, and early stopping was implemented to prevent overfitting caused by training for too many epochs.
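
The forward pass of such a network can be sketched as follows. This is a minimal illustration, not the trained model: the weights are random placeholders, the logarithmic output scaling is assumed to be base 10, the input ranges are taken from the INV row of Table 4, and the function names are hypothetical.

```python
import math
import random

# Forward pass of a 3-20-20-4 network with tanh hidden layers.
# ASSUMPTIONS: random placeholder weights (not trained values), log10 output
# scaling, and INV input ranges from Table 4.
INPUT_RANGES = [(20.0, 50.0), (22.0, 82.0), (1.0, 4.0)]  # W_sheet, spacing, NF
LAYER_SIZES = [3, 20, 20, 4]  # outputs: t_pHL, t_pLH, t_p, P


def init_layers(sizes, seed=0):
    """Create (weights, biases) pairs for each layer with random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def min_max_scale(x):
    """Scale each input to [0, 1] using its design range."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(x, INPUT_RANGES)]


def forward(layers, x):
    """Propagate one design point; hidden layers use tanh, output is linear."""
    a = min_max_scale(x)
    for depth, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        is_output = depth == len(layers) - 1
        a = z if is_output else [math.tanh(v) for v in z]
    return [10.0 ** v for v in a]  # undo the assumed log10 output scaling


layers = init_layers(LAYER_SIZES)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(n_params)  # number of trainable parameters in this topology
print(forward(layers, [30.0, 41.0, 3.0]))
```

With this topology the model has only a few hundred trainable parameters, which is in line with the small training set of 135 samples.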

Fig. 5. Architecture of the proposed ANN-based objective function model.

Fig. 6. Test error according to the ANN model size.

To evaluate the test accuracy, we gradually increased the training data from 8 to 135 samples using the Sobol sampling method. The results are shown in Fig. 7(a), where the accuracy reached 99% when 135 samples were used. The training and test datasets consisted of 135 Sobol sampling data points and 30 random samples, respectively, and the simulation ran for 30,000 iterations ^[38]. Fig. 7(b) shows the average loss incurred when the trained ANN model makes predictions on a validation dataset that was not used in the training process. This loss is used to evaluate whether the model is overfitting the training data. As the number of iterations increases, the validation loss decreases, indicating that the ANN model used in this study achieves higher accuracy on the validation dataset. To accelerate the training, an NVIDIA Titan XP GPU was used, and the ANN model completed training in about 2 minutes. Fig. 8 compares the ANN predictions with the data obtained through simulation, showing an accuracy of over 98.5%. In this study, standard cells that satisfy a rise and fall delay ratio of 1:1 were designed with the objective of minimizing the propagation delay (high performance) or the total power (low power), depending on the design goal. This objective can be expressed as follows.

Fig. 7. (a) Test accuracy according to the number of training data with Sobol sampling and (b) validation loss according to epoch.

Fig. 8. Accuracy of the ANN-based objective function model for NAND cells: (a) propagation delay and (b) total power, compared against SPICE simulation.

$\max\left(-Y_{t_p}\right) \ \text{and} \ \max\left(-Y_P\right), \quad (t_p: \text{propagation delay}, \ P: \text{total power}). \qquad (2)$
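
Eq. (2) and the 1:1 rise-fall requirement can be written as two small functions. This is a sketch under stated assumptions: the 5% tolerance on the delay ratio is a hypothetical value (the paper does not state the tolerance it uses), and the function names are illustrative.

```python
# Objective vector of Eq. (2) in maximization form, plus a 1:1 rise/fall check.
# ASSUMPTION: the 5% tolerance is hypothetical; the paper does not give one.
RATIO_TOLERANCE = 0.05


def objectives(t_p, power):
    """Negate delay and power so that larger is better, as in Eq. (2)."""
    return (-t_p, -power)


def is_balanced(t_plh, t_phl, tol=RATIO_TOLERANCE):
    """True when the rise/fall delay ratio is within tol of 1:1."""
    return abs(t_plh / t_phl - 1.0) <= tol


# HP NAND delays from Table 6 (ps)
print(is_balanced(12.30, 14.40))  # simulation-based optimum
print(is_balanced(11.58, 11.71))  # MOBO optimum
# HP INV delays from Table 6 (ps)
print(is_balanced(6.45, 6.35))    # MOBO optimum
```

Negating both metrics lets a single maximization routine handle the HP target (delay) and the LP target (power) in the same way.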

2. MOBO Model for Optimization

We optimized the standard cell performance using the MOBO model, which efficiently balances multiple performance metrics with high prediction accuracy using a small number of samples. By leveraging a Gaussian process-based probabilistic model and an acquisition function, MOBO selects informative samples to accelerate convergence. It effectively predicts the Pareto frontier and iteratively refines designs.

The MOBO model uses a Gaussian process regression (GPR) model ^[38]. The GPR model represents the relationship between the standard cell structure parameters and performance, and it captures the uncertainty of the objective function through the variance of each new prediction. Furthermore, it models the uncertainty between the initial training dataset and the currently explored dataset, and after selecting new samples, it identifies the standard cell structure that maximizes the hypervolume using the acquisition function. Finally, the constraint function is applied to filter out candidate structures that do not satisfy the 1:1 rise and fall delay ratio. To determine the next candidate with parallel evaluation, the algorithm uses the parallel noisy expected hypervolume improvement (qNEHVI) acquisition function of the BoTorch package ^[39]. The outputs corresponding to the explored input points are evaluated by the ANN objective function model, and the GPR model is then updated with the new input-output pairs. This process is repeated for the defined 80 optimization iterations. The acquisition function selects the next data point whose hypervolume is significantly improved relative to the reference point. Here, the hypervolume is used as a performance metric for Bayesian optimization, which allows for a wider exploration of the objective function space ^[38].
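
The hypervolume metric referred to above can be illustrated for two maximization objectives $(-t_p, -P)$. The sketch below is not the qNEHVI acquisition function itself; it only computes the exact hypervolume of a set of points and the improvement contributed by one new candidate. The observed points, the candidate, and the reference point are made-up numbers for illustration.

```python
# Pareto filtering and 2-D hypervolume for maximization objectives (-t_p, -P).
# ASSUMPTION: all numbers below are invented for illustration; the paper uses
# BoTorch's qNEHVI rather than this code.

def pareto_front(points):
    """Keep points not dominated by any other point (larger is better)."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front), key=lambda p: p[0], reverse=True)


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front and bounded below by ref."""
    volume, best_second = 0.0, ref[1]
    for first, second in pareto_front(points):
        if first <= ref[0] or second <= best_second:
            continue
        volume += (first - ref[0]) * (second - best_second)
        best_second = second
    return volume


observed = [(-12.0, -700.0), (-10.0, -720.0), (-13.0, -650.0), (-14.0, -710.0)]
reference = (-15.0, -750.0)
candidate = (-11.0, -690.0)

before = hypervolume_2d(observed, reference)
after = hypervolume_2d(observed + [candidate], reference)
print(before, after, after - before)  # improvement contributed by candidate
```

An acquisition function of the expected-hypervolume-improvement family ranks candidates by the expected value of this improvement under the GPR posterior.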

IV. RESULTS AND DISCUSSIONS

1. Standard Cell Results

We compared the performance of standard cells optimized using the Sobol sampling method with that of cells optimized using the MOBO model. Conventional performance optimization methods, which are based on different design rules and layer structures, are excluded from the comparison ^[14, 40]. Fig. 9 shows the hypervolume for the acquisition function, indicating that the optimization process converges sufficiently after 80 iterations. Fig. 10 shows the change in the optimal values explored within the design space as the number of simulation samples increases. A larger number of samples is needed to explore the optimal values. However, as the number of samples increases beyond 1024, the optimal point almost converges. Therefore, we set the optimal number of simulation samples to 1024 using the Sobol sampling method. The constraint was set to ensure a rise and fall delay ratio of 1:1. Data that did not satisfy the constraint were excluded from the candidate group for the MOBO process iterations.

Fig. 9. Hypervolume evolution of the MOBO acquisition function for (a) high performance and (b) low power applications.

Fig. 10. Optimal values according to the number of simulation samples: (a) Propagation delay and (b) total power.

Fig. 11 shows the distribution of optimal points satisfying the constraints for each design objective of NAND obtained using Sobol sampling-based simulation and the MOBO model. While the simulation-based approach required 1024 samples to reach optimal solutions, the MOBO model achieved better results with only 135 samples. As a result, MOBO identified 6, 4, 5, 2, 4, and 2 additional valid design points within the optimal space for INV HP/LP, NAND HP/LP, and NOR HP/LP, respectively. Tables 5 and 6 show the optimal structural parameters, performance, and required sample numbers. First, the structures optimized by the proposed MOBO framework generally exhibit smaller $W_{sheet}$ and appropriately adjusted P/N spacing compared with the simulation-based optimal structural parameters, while the required drive strength tends to be compensated by adjusting $NF$. For example, in the HP INV case, the PMOS and NMOS $W_{sheet}$ values are reduced from 30/30 nm to 20/20 nm, and the P/N spacing is increased from 41 nm to 76 nm, whereas the NMOS $NF$ is increased from 3 to 4. This configuration reduces the gate, diffusion, and junction capacitances at the P/N boundary by increasing the P/N spacing, while maintaining sufficient drive current through the increased $NF$, thereby reducing the propagation delay of the HP INV by approximately 23.2%. In the LP application, the proposed MOBO framework mainly selects structures that minimize power by shrinking the $W_{sheet}$ of transistors connected in series in NAND/NOR cells to suppress diffusion capacitance at internal nodes, and by adjusting the P/N spacing. As a result, it achieves up to 10.3% reduction in power for the LP NAND case compared with the simulation-based optimal structural parameters.

Fig. 11. Comparison of optimized NAND cell performance: (a) high performance and (b) low power applications.

Table 5. Comparison of optimal structural parameters obtained from simulation-based and MOBO-based optimization.

		Optimal structural parameters [nm]
		PMOS $W_{sheet}$		PMOS $NF$		NMOS $W_{sheet}$		NMOS $NF$		P/N spacing
		HP	LP	HP	LP	HP	LP	HP	LP	HP	LP
INV	Sim.	30	40	4	1	30	30	3	1	41	32
INV	MOBO	20	26	4	1	20	34	4	1	76	52
NAND	Sim.	30	30	4	2	20	30	3	2	62	62
NAND	MOBO	26	26	3	2	20	20	4	2	70	58
NOR	Sim.	20	25	4	3	25	25	2	1	72	62
NOR	MOBO	20	20.5	4	2	23.5	20.5	2	1	52	52

Table 6. Performance comparison of optimal structures obtained using simulation-based and MOBO-based optimization.

		$t_{pLH}$ [ps]		$t_{pHL}$ [ps]		$t_p$ [ps]		$P$ [nW]		Required samples
		HP	LP	HP	LP	HP	LP	HP	LP	Required samples
INV	Sim.	8.19	17.50	8.51	17.08	8.35	-	-	696.29	1024
INV	MOBO	6.45	18.35	6.35	18.26	6.41	-	-	625.49	135
NAND	Sim.	12.30	23.60	14.40	25.60	13.40	-	-	718.47	1024
NAND	MOBO	11.58	18.94	11.71	18.93	11.65	-	-	644.17	135
NOR	Sim.	11.10	14.10	13.20	16.20	12.20	-	-	684.56	1024
NOR	MOBO	10.07	17.13	10.08	17.14	10.08	-	-	622.11	135
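
The headline improvements quoted in the text follow directly from Table 6. The short calculation below recomputes them; the helper name `reduction_percent` is illustrative.

```python
# Relative improvements recomputed from Table 6.

def reduction_percent(baseline, optimized):
    """Percentage reduction of optimized relative to baseline."""
    return 100.0 * (baseline - optimized) / baseline


hp_inv_delay = reduction_percent(8.35, 6.41)       # t_p of HP INV, ps
lp_nand_power = reduction_percent(718.47, 644.17)  # P of LP NAND, nW

print(f"HP INV delay reduction:  {hp_inv_delay:.1f}%")
print(f"LP NAND power reduction: {lp_nand_power:.1f}%")
```

These reproduce the 23.2% delay reduction and the 10.3% power reduction reported for the HP INV and LP NAND cells.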

We further performed a Spearman correlation analysis to investigate how the explored directions of the optimal structural parameters obtained from the proposed MOBO framework affect delay and power. The Spearman correlation quantifies the strength of monotonic, possibly nonlinear, relationships between input and output variables ^[41]. As shown in Table 7, the propagation delay of the three cells (INV, NAND, and NOR) exhibits a positive correlation with the PMOS and NMOS $W_{sheet}$, whereas it shows a negative correlation with P/N spacing, PMOS $NF$, and NMOS $NF$. This quantitatively supports that an excessive increase in $W_{sheet}$ leads to an increased cell propagation delay, while widening the P/N spacing or appropriately adjusting $NF$ is more effective in reducing propagation delay. For power, the correlation coefficients between power and NMOS $NF$ are 0.67, 0.73, and 0.85 for INV, NAND, and NOR, respectively, indicating that an increase in NMOS $NF$ is a dominant factor that increases total power consumption.

Table 7. Spearman correlation coefficients between structural design parameters and performance metrics.

	INV		NAND		NOR
	$\rho(t_p)$	$\rho(P)$	$\rho(t_p)$	$\rho(P)$	$\rho(t_p)$	$\rho(P)$
PMOS $W_{sheet}$	0.66	0.38	0.59	0.46	0.34	-0.19
PMOS $NF$	-0.36	0.23	-0.36	0.22	-0.63	0.49
NMOS $W_{sheet}$	0.50	0.52	0.62	0.33	0.10	-0.04
NMOS $NF$	-0.27	0.67	-0.27	0.73	-0.64	0.85
P/N spacing	-0.50	-0.35	-0.36	-0.26	-0.06	-0.03
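
For reference, the Spearman coefficient reported in Table 7 is the Pearson correlation of the rank-transformed variables. A minimal standard-library sketch is given below; the toy data are invented and are unrelated to the values in Table 7.

```python
# Spearman rank correlation using only the standard library.
# ASSUMPTION: the toy data at the bottom are invented for illustration.

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]           # nonlinear but monotonic in x
print(spearman(x, y))             # perfectly monotonic increasing
print(spearman(x, y[::-1]))       # perfectly monotonic decreasing
```

Because only ranks enter the calculation, a nonlinear but monotonic dependence still yields a coefficient of magnitude one.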

Fig. 12 shows the rise and fall delays of the final standard cells designed using the optimized INV, NAND, and NOR. For the HP NAND and NOR cells, the rise and fall delay ratios are improved by 15.8% and 18.7%, respectively. Since complex standard cells (e.g., XOR, AOI, OAI, and DFF) consist of combinations of INV, NAND, and NOR, they benefit from the same optimization, and their rise and fall delay ratios are also improved. Therefore, MOBO can bring the rise and fall delay ratios of various standard cells much closer to 1:1 than simulation-based approaches. Fig. 13 compares the performance of various standard cells, where MOBO yields better propagation delay and total power than simulation-based optimization. The simulation took 40 minutes for 80 iterations on a computer with an Intel Core i9-7900X CPU.

Fig. 12. Comparison of standard cell rise and fall delays.

Fig. 13. Comparison of standard cell performance across optimization methods: (a) propagation delay and (b) total power.

2. Test Circuit Results

To evaluate the optimized standard cells, a 7-stage ring oscillator (RO) and a 4-bit ripple carry adder (RCA) were used as test circuits ^[12]. For the RO, total power was measured for the LP design, and propagation delay and frequency were measured for the HP design. For the RCA, total power was measured for the LP design and propagation delay for the HP design. Here, the frequency is the oscillation frequency measured during operation ^[42]. The SPICE model was adopted from previous research ^[9, 25, 26, 43, 44]. As shown in Table 8, the RO achieved a rise-fall delay ratio close to 1:1, which is about 12% better than the simulation-based design. In addition, for HP, the propagation delay was reduced by 22.9% and the frequency was improved by 29.9%. For LP, the total power was reduced by 19.2%. The performance of the 4-bit RCA designed using the standard cells optimized by the simulation-based approach and by the MOBO model was also evaluated. For HP, the propagation delay was reduced by 18.6%, and for LP, the total power was reduced by 11.9%. These results show that the proposed method improves both timing symmetry and overall performance in complex circuits. Furthermore, performance and power consumption improvements are expected in larger and more complex circuits.

Table 8. Performance comparison of 7-stage RO and 4-bit RCA.

		HP		LP		Diff. [%]
		Sim.	MOBO	Sim.	MOBO	Diff. [%]
7-stage RO	$t_{pLH}$ [ps]	11.06	8.08	22.83	24.10	-
	$t_{pHL}$ [ps]	9.78	7.97	20.35	23.49	-
	$P$ [nW]	-	-	42.76	34.56	19.2
	$t_p$ [ps]	10.42	8.03	-	-	22.9
	Frequency [GHz]	6.85	8.9	-	-	29.9
4-bit RCA	$P$ [nW]	-	-	10.26	9.03	11.9
4-bit RCA	$t_p$ [ps]	18.92	15.41	-	-	18.6
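
The RO frequencies in Table 8 are consistent with the textbook relation $f = 1/(2 N t_p)$ for an $N$-stage ring oscillator. The paper does not state the formula it used, so the relation is an assumption here, and the function name is illustrative.

```python
# Oscillation frequency of an N-stage ring oscillator from the stage delay.
# ASSUMPTION: the textbook relation f = 1 / (2 * N * t_p) is used; the paper
# reports the frequency but does not state the formula.

def ring_oscillator_ghz(n_stages, t_p_ps):
    """Return the oscillation frequency in GHz for a stage delay in ps."""
    period_ps = 2 * n_stages * t_p_ps
    return 1e3 / period_ps  # 1/ps corresponds to 1e3 GHz


f_sim = ring_oscillator_ghz(7, 10.42)   # simulation-based HP cells
f_mobo = ring_oscillator_ghz(7, 8.03)   # MOBO-optimized HP cells
print(f"{f_sim:.2f} GHz, {f_mobo:.2f} GHz")
```

With the HP stage delays of 10.42 ps and 8.03 ps, this relation gives values that agree with the tabulated 6.85 GHz and 8.9 GHz to within rounding.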

V. CONCLUSIONS

In this work, we presented an ML-driven framework for standard cell performance optimization that integrates Sobol sampling, an ANN-based objective function model, and a MOBO engine. Key structural parameters that critically affect propagation delay and total power were selected. Under a constraint function based on the fixed cell height and technology design rules, the ANN objective function was defined to jointly minimize propagation delay and total power.

The performance optimization was performed within the MOBO framework, using an objective function and a constraint function with a sizing method to maintain rise and fall delay balance. Both propagation delay and power consumption were reduced while using 7.56$\times$ fewer samples than simulation-based approaches. Validation on various standard cells and on representative benchmark circuits (7-stage RO and 4-bit RCA) demonstrated up to a 22% reduction in propagation delay.

Beyond performance optimization, the framework is readily extensible to statistical variability analysis. By redefining the ANN objective to model variation-related metrics (e.g., threshold-voltage mismatch, layout dependent effects (LDE), or MOL/BEOL RC variability), the same MOBO framework can efficiently explore statistical tail regions of device and circuit variability distributions. This capability provides a promising alternative to Monte Carlo simulations, enabling accelerated statistical characterization with significantly fewer samples.

Future work will extend this framework to more complex standard cells and larger logic structures, such as AES blocks and high-fanout combinational paths. In addition, integrating the methodology with automated cell layout generation and explicitly modeling MOL/BEOL parasitics will further enhance its applicability to full-chip timing and power optimization in advanced semiconductor nodes.

ACKNOWLEDGMENTS

This work was supported in part by the National Research Foundation (NRF) of Korea grant funded by the Korean Government (MSIT) under Grants RS-2020-NR049544 and RS-2025-16067451, and in part by Samsung Electronics Co., Ltd. (IO250225-12099-01). The EDA tool was supported by the IC Design Education Center (IDEC), South Korea.

REFERENCES

Jang D. , Yakimets D. , Eneman G. , Schuddinck P. , Bardon M. G. , Raghavan P. , Spessot A. , Verkest D. , Mocuta A. , 2017, Device exploration of nanosheet transistors for sub-7-nm technology node, IEEE Transactions on Electron Devices, Vol. 64, No. 6, pp. 2707-2713

Loubet N. , Hook T. , Montanini P. , Yeung C.-W. , Kanakasabapathy S. , Guillorn M. , 2017, Stacked nanosheet gate-all-around transistor to enable scaling beyond FinFET, Proc. of 2017 Symposium on VLSI Technology, pp. T230-T231

Kim S. , Guillorn M. , Lauer I. , Oldiges P. , Hook T. , Na M. , 2015, Performance trade-offs in FinFET and gate-all-around device architectures for 7-nm node and beyond, Proc. of 2015 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference, pp. 1-3

Seon Y. , Chang J. , Yoo C. , Jeon J. , 2021, Device and circuit exploration of multi-nanosheet transistor for sub-3 nm technology node, Electronics, Vol. 10, No. 2

Wang M. , Sun Y. , Li X. , Shi Y. , Hu S. , Shang E. , Chen S. , 2020, Design technology co-optimization for 3 nm gate-all-around nanosheet FETs, Proc. of Symposium on VLSI Technology, pp. 230-231

Yoon J. , Jeong J. , Lee S. , Baek R. , 2018, Systematic DC/AC performance benchmarking of sub-7-nm node FinFETs and nanosheet FETs, IEEE Journal of the Electron Devices Society, Vol. 6, pp. 942-947

Choi M. , Park J.-H. , Choi S. , Kwon K. , Lee Y. , Jang W. , Jeon J. , 2023, Study on the circuit performance of various interconnect metal materials in the latest process nodes, Journal of Semiconductor Technology and Science, Vol. 23, No. 4, pp. 215-227

Loubet N. , Kal S. , Alix C. , Pancharatnam S. , Zhou H. , Durfee C. , 2019, A novel dry selective etch of SiGe for the enablement of high performance logic stacked gate-all-around nanosheet devices, Proc. of 2019 IEEE International Electron Devices Meeting, pp. 11.4.1-11.4.4

Suk J. , Kim Y. , Do J. , Kim G. , Baek S. , Kye J. , Kim S. , 2023, Analytical parasitic resistance and capacitance models for nanosheet field-effect transistors, IEEE Transactions on Electron Devices, Vol. 70, No. 6, pp. 2941-2946

Suk J. , Kim Y. , Do J. , Kim G. , Rim W. , Baek S. , Yoon S. , Kim S. , 2024, A process-aware analytical gate resistance model for nanosheet field effect transistors, IEEE Journal of the Electron Devices Society, Vol. 12, pp. 898-904

Gupta P. , Mandadapu H. , Gourishetty S. , Abbas Z. , 2019, Robust transistor sizing for improved performances in digital circuits using optimization algorithms, Proc. of 20th International Symposium on Quality Electronic Design, pp. 85-91

Chen Y. , Jiao H. , 2019, Standard cell optimization for ultra-low-voltage digital circuits, Proc. of 2019 International Conference on IC Design and Technology, pp. 1-4

Nishizawa S. , Ishihara T. , Onodera H. , 2012, A flexible structure of standard cell and its optimization method for near-threshold voltage operation, Proc. of 2012 IEEE 30th International Conference on Computer Design, pp. 235-240

Song T. , Jung H. , Yang G. , 2022, 3 nm gate-all-around (GAA) design technology co-optimization (DTCO) for succeeding PPA by technology, Proc. of IEEE Custom Integrated Circuits Conference, pp. 1-7

Yakimets D. , Bhuwalka K. K. , Wu H. , Rzepa G. , Karner M. , Liu C. , 2024, Inflection points in GAA NS-FET to C-FET scaling considering impact of DTCO boosters, IEEE Transactions on Electron Devices, Vol. 71, No. 4, pp. 2309-2314

Xu H. , Gan W. , Cao L. , Yin H. , Wu Z. , 2022, Prediction of key metrics of stacked nanosheet nFETs using genetic algorithm-based neural networks, Proc. of 2022 IEEE International Conference on Integrated Circuits, Technologies and Applications, pp. 3-4

Ehteshamuddin M. , Sheelvardhan K. , Kumar A. , Guglani S. , Roy S. , Dasgupta A. , 2024, Machine learning-assisted multiobjective optimization of advanced node gate-all-around transistor for logic and RF applications, IEEE Transactions on Electron Devices, Vol. 71, No. 2, pp. 976-982

Kwon U. , Okagaki T. , Song Y.-S. , 2018, Intelligent DTCO (iDTCO) for next generation logic path-finding, Proc. of 2018 International Conference on Simulation of Semiconductor Processes and Devices, pp. 49-52

Lee J. , Park J. , Kim S. , Jeong H. , 2023, Bayesian learning automated SRAM circuit design for power and performance optimization, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 70, No. 12, pp. 4949-4961

2019, Sentaurus Device User Guide, Version P-2019.03

2021, International Roadmap for Devices and Systems (IRDS) 2021 Edition

Yoon J. , Baek R. , 2020, Device design guideline of 5-nm-node FinFETs and nanosheet FETs for analog/RF applications, IEEE Access, Vol. 8, pp. 189395-189403

Sun Y. , Thompson S. E. , Nishida T. , 2007, Physics of strain effects in semiconductors and metal-oxide-semiconductor field-effect transistors, Journal of Applied Physics, Vol. 101, No. 10

Reboh S. , Coquand R. , Augendre E. , 2016, An analysis of stress evolution in stacked GAA transistors, Proc. of IEEE Silicon Nanoelectronics Workshop, pp. 206-207

Woo S. , Jeong H. , Choi J. , Cho H. , Kong J.-T. , Kim S. , 2022, Machine learning-based compact modeling for sub-3-nm-node emerging transistor, Electronics, Vol. 11

Jeong H. , Woo S. , Choi J. , Cho H. , Kim Y. , Kong J.-T. , Kim S. , 2023, Fast and expandable ANN-based compact model and parameter extraction for emerging transistors, IEEE Journal of the Electron Devices Society, Vol. 11, pp. 153-160

2020, Synopsys Custom Compiler User Guide, Version P-2020.12

Sushant S. , FreePDK3: A Novel PDK for Physical Verification at the 3nm Node, Ph.D. dissertation

Kim T. , Jeong J. , Woo S. , 2023, NS3K: A 3-nm nanosheet FET standard cell library development and its impact, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 31, No. 2, pp. 163-176

Yang G. , Jung H. , Lim J. , 2022, Standard cell design optimization with advanced MOL technology in 3 nm GAA process, Proc. of IEEE Symposium on VLSI Technology and Circuits, pp. 363-364

Shaji S. M. , Zhu L. , Yoon J. , Lim S. K. , 2023, A comparative study on front-side, buried and backside power rail topologies in 3nm technology node, Proc. of IEEE/ACM International Symposium on Low Power Electronics and Design, pp. 1-6

Lee Y. M. , Na M. H. , Chu A. , 2017, Accurate performance evaluation for the horizontal nanosheet standard cell design space beyond 7nm technology, Proc. of IEEE International Electron Devices Meeting, pp. 29.3.1-29.3.4

Chang K. , Kim T. , 2022, Analysis of impacting multi-stack standard cells on chip implementation, Proc. of 19th International SoC Design Conference, pp. 119-120

Merino J. L. , Bota S. A. , Samitier J. , Analysis of series-connected MOSFETs for gate delay optimization, Proc. of International Workshop on Power and Timing Modeling, Optimization and Simulation, pp. 1-10

Sobol I. M. , 1967, On the distribution of points in a cube and the approximate evaluation of integrals, USSR Computational Mathematics and Mathematical Physics, Vol. 7, No. 4, pp. 86-112

2021, StarRC User Guide, Version E2021.06

2020, HSPICE User Guide, Version E2020.12

Jeong H. , Choi J. , Cho H. , Woo S. , Kim Y. , Kong J.-T. , Kim S. , 2024, MOBO-driven advanced sub-3-nm device optimization for enhanced PDP performance, IEEE Transactions on Electron Devices, Vol. 71, No. 5, pp. 2881-2887

Daulton S. , Balandat M. , Bakshy E. , 2021, Parallel Bayesian optimization of multiple noisy objectives with expected hypervolume improvement, Advances in Neural Information Processing Systems, Vol. 34, pp. 2187-2200

Jeong J. , Ko J. , Song T. , 2022, A study on optimizing pin accessibility of standard cells in the post-3 nm node, Proc. of the ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 1-6

Yoon J.-S. , Lee S. , Yun H. , Baek R.-H. , 2021, Digital/analog performance optimization of vertical nanowire FETs using machine learning, IEEE Access, Vol. 9, pp. 29071-29077

Gaddemane G. , 2025, Exploring GAA-nanosheet, forksheet and GAA-forksheet architectures: A TCAD-DTCO study at 90 nm and 120-nm cell height, IEEE Journal of the Electron Devices Society, Vol. 13, pp. 769-782

Choi J. , Jeong H. , Woo S. , Cho H. , Kim Y. , Kong J.-T. , Kim S. , 2024, Enhancement and expansion of the neural network-based compact model using a binning method, IEEE Journal of the Electron Devices Society, Vol. 12, pp. 65-73

Jeong H. , Choi J. , Kim Y. , Kong J.-T. , Kim S. , 2024, Efficient neural network compact modeling for novel device structure using multi-fidelity model and active learning, Electronics, Vol. 13, No. 23, pp. 4840

HyunJoon Jeong

HyunJoon Jeong received his B.S. degree in electrical engineering from Myongji University in 2020. He is currently pursuing a Ph.D. degree at the Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea. His research interests include DTCO, device modeling, cell optimization, and machine learning.

Junha Suk

Junha Suk received his B.S. degree in electronic engineering from Korea National University of Transportation in 2017 and his Ph.D. degree in electrical and computer engineering from Sungkyunkwan University in 2025. In 2025, he joined the Samsung Electronics Foundry Business, Giheung, Korea, where he is involved in the SPICE Group, PDK Development Team. His research interests include gate-all-around transistor modeling, device-circuit interaction, and PDK.

Jeong-Taek Kong

Jeong-Taek Kong received his B.S. degree in electronics engineering from Hanyang University, Seoul, Republic of Korea in 1981, an M.S. degree in electronics engineering from Yonsei University, Seoul, Republic of Korea in 1983, and a Ph.D. degree in electrical engineering from Duke University, Durham, NC in 1994. From 1983 to 2014, he was with Semiconductor Business, Samsung Electronics Co., as VP of CAE Team, Senior VP of Intellectual Property Team, and Vice Chancellor of Samsung Institute of Technology. In 2014, he became a Professor with the Department of Electronic Engineering, Hanyang University, Seoul, Republic of Korea. He is currently a Professor with the Department of Semiconductor Systems Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Republic of Korea. His research interests include various EDA tools and design methodologies.

SoYoung Kim

SoYoung Kim received her B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1997 and her M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1999 and 2004, respectively. From 2004 to 2008, she was with Intel Corporation, Santa Clara, CA, where she worked on parasitic extraction and simulation of on-chip interconnects. From 2008 to 2009, she was with Cadence Design Systems, San Jose, CA, where she worked on developing IC power analysis tools. She is currently a Professor with the Department of Semiconductor Systems Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea. Her research interests include VLSI computer-aided design, signal integrity, power integrity, and electromagnetic interference in electronic systems.