
  1. (Department of Electronics and Electrical Engineering, Dankook University, 152, Jukjeon-ro, Suji-gu, Yongin-si, Republic of Korea)



Keywords: Analog circuit optimization, conditional variational autoencoder, intrinsic reward

I. INTRODUCTION

Unlike digital circuits, analog circuit optimization still relies heavily on manual tuning and domain knowledge, highlighting the need for automation in circuit design. Among traditional approaches, knowledge-based methods [1,2] have shown effectiveness for predefined circuit topologies but suffer from limited scalability and high computational costs. To reduce manual effort and improve efficiency, optimization-based methods using Bayesian optimization (BO), particle swarm optimization, and genetic algorithms have been studied. However, the search complexity of these methods increases exponentially with the number of samples, limiting their applicability to large-scale designs. To overcome these limitations, reinforcement learning (RL)-based methods have been proposed as promising solutions. For instance, the study in [3] introduced a framework in which an RL agent learns to design circuits. The work in [4] incorporated domain knowledge during training of the RL agent to guide exploration, while the study in [5] leveraged a multi-actor RL algorithm to improve convergence. In addition, the works in [6-8] adopted graph neural networks (GNNs) to further improve optimization performance across circuit topologies. Although these methods significantly improved optimization performance, they remain sample inefficient, requiring thousands of SPICE simulations to converge. In this context, sample efficiency refers to the number of simulations required for the RL agent to reach a target specification. To reduce computational overhead while maintaining performance, improving the sample efficiency of RL-based methods is essential.

In this work, we propose a novel framework that incorporates intrinsic rewards into RL-based analog circuit optimization. Unlike conventional approaches that rely solely on extrinsic rewards derived from circuit performance, the proposed method uses intrinsic rewards to encourage the RL agent to explore novel regions of the circuit parameter space. This enhanced exploration improves sample efficiency, enabling the agent to learn with fewer simulations. Furthermore, we develop a reconstruction-based intrinsic reward using a conditional variational autoencoder (CVAE), in which the intrinsic reward is formulated from the reconstruction error. Experimental results on a two-stage operational amplifier (OpAmp), a three-stage OpAmp, and a folded-cascode OpAmp show that the proposed method significantly improves sample efficiency. The key contributions of this work are summarized as follows:

• Our work is the first to apply intrinsic rewards to analog circuit optimization, improving sample efficiency by encouraging broader exploration beyond conventional performance-driven objectives.

• In addition, we propose a reconstruction-based intrinsic reward method, which effectively quantifies state novelty in high-dimensional circuit design spaces.

• The proposed framework can be easily integrated into existing RL algorithms with negligible overhead.

• Extensive experiments on practical circuits show that our approach significantly reduces the number of SPICE simulations.

II. RELATED WORKS

Efficient exploration is a key component in training RL agents. Traditional methods such as $\epsilon$-greedy and Boltzmann exploration promote action diversity, but they tend to be inefficient in real-world tasks such as robot manipulation and analog circuit design. To overcome these limitations, intrinsic reward methods have been introduced [9]. Count-based methods [10,11] assign higher rewards to rarely visited states, while prediction-based methods [12-14] provide intrinsic rewards based on the error between the predicted and actual next states. The latter have been widely adopted in continuous control tasks for their effectiveness in guiding exploration toward uncertain or less predictable regions. In analog circuit optimization, where the design space is continuous and highly complex, prediction-based intrinsic rewards help the agent discover diverse operating points with fewer simulations, enabling sample-efficient learning.

The conditional variational autoencoder (CVAE) [15] was introduced as an effective approach for learning from unlabeled data. By encoding input variables into a conditioned latent space, the CVAE enables efficient policy modeling for RL agents. This property has led to its widespread use in addressing the out-of-distribution (OOD) problem in offline RL. The studies in [16,17] adopted a CVAE to model behavior policies from pre-collected trajectories: the policy was trained within the latent space, while the decoder translated latent variables into actions. Because the latent space was trained to align with the dataset distribution, the OOD problem was significantly reduced. While these studies adopted the CVAE to address OOD issues, our work adopts the CVAE to compute intrinsic rewards for circuit design.

III. PRELIMINARIES

Analog circuit optimization generally involves finding an optimal set of design parameters that minimizes a given objective while satisfying multiple design constraints. To evaluate the degree of circuit optimization, a figure of merit (FOM) is commonly employed. The FOM aggregates performance metrics such as gain, bandwidth, and power consumption into a scalar. Then, the parameter optimization can be formulated with the design parameter vector $\mathbf{x}$ and the design space $\mathbb{D}^d$ as $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}\in\mathbb{D}^d} \mathrm{FOM}(\mathbf{x})$.
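
To make the aggregation concrete, the following is a minimal sketch of a penalty-style FOM in Python. The normalization and sign convention (0 when all specifications are met, more negative as specifications are violated) are illustrative assumptions; the paper does not give its exact FOM formula, and the target values below are taken from Table 1.

```python
# Hypothetical FOM: normalized shortfall of each metric from its target.
# Sign convention (0 = all specs met, negative otherwise) is an assumption.
def fom(gain_db, bw_hz, power_w, spec):
    """Return a scalar figure of merit for one simulated design."""
    gain_term = min(0.0, (gain_db - spec["gain_db"]) / spec["gain_db"])
    bw_term = min(0.0, (bw_hz - spec["bw_hz"]) / spec["bw_hz"])
    power_term = min(0.0, (spec["power_w"] - power_w) / spec["power_w"])
    return gain_term + bw_term + power_term


# Two-stage OpAmp targets from Table 1: >= 30 dB, >= 10 MHz, <= 5 mW.
spec_two_stage = {"gain_db": 30.0, "bw_hz": 10e6, "power_w": 5e-3}
print(fom(28.0, 12e6, 6e-3, spec_two_stage))  # negative: gain and power miss targets
```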

On the other hand, analog circuit optimization can be formulated as a continuous control task under the RL framework, modeled as a Markov decision process (MDP) $M=\{\pmb{S}, \pmb{A}, \pmb{P}, r, \gamma\}$. The state $\mathbf{s}_t \in \pmb{S}$ denotes a vector of design parameters at timestep $t$, and the action $\mathbf{a}_t \in \pmb{A}$ denotes an incremental change of $\mathbf{s}_t$. The transition function $\pmb{P}(\mathbf{s}_{t+1}\mid \mathbf{s}_t, \mathbf{a}_t)$ denotes the probability of transitioning to the next state given the current state and action. The RL agent selects $\mathbf{a}_t$ and receives a reward $r_t \in \mathbb{R}$ computed from the FOM. The objective of the agent is to learn an optimal policy that maximizes the expected discounted return $R_t = \mathbb{E}\big[\sum_{k=0}^{T} \gamma^k r_{t+k}\big]$, where $\gamma \in [0, 1]$ denotes the discount factor. For sample-efficient RL, we incorporate an intrinsic reward $r^{\rm i}_t$, which encourages visiting novel or less frequently explored circuit regions. When the extrinsic reward is denoted as $r^{\rm e}_t$, the total reward used for training is defined as $r_t = r_t^{\rm e} + r_t^{\rm i}$.
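
As a small worked example of these quantities, the snippet below combines the extrinsic and intrinsic rewards exactly as $r_t = r^{\rm e}_t + r^{\rm i}_t$ (no weighting coefficient) and evaluates the discounted return; the helper names are illustrative only.

```python
def total_reward(r_extrinsic, r_intrinsic):
    # r_t = r^e_t + r^i_t, as defined in the text.
    return r_extrinsic + r_intrinsic

def discounted_return(rewards, gamma=0.99):
    """R_t = sum_k gamma^k * r_{t+k} for a finite trajectory."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))  # 1.0 + 0.9*0 + 0.81*2 = 2.62
```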

IV. METHOD

1. Intrinsic Reward for Circuit Optimization

In our evaluation, conventional methods that rely solely on extrinsic rewards showed low sample efficiency and tended to converge to suboptimal local minima, resulting in low-quality circuit designs. To address this, we combine extrinsic rewards with an intrinsic reward that encourages the agent to explore less-visited regions of the circuit design space rather than familiar regions. Intrinsic reward mechanisms can be categorized into count-based and prediction-based methods. Since analog circuits have a large and continuous state space, count-based methods for estimating state visitation counts are impractical. Therefore, we propose employing prediction-based methods that enable more effective exploration in analog circuits. The prediction-based intrinsic reward is formulated as ${r}^{\mathrm{i}}_{t} = \|{\mathbf{s}}_{t+1}-{f}(\mathbf{s}_{t})\|_{2}$, where $f$ represents the prediction network, which is used to predict the next circuit state ${\mathbf{s}}_{t+1}$.
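
A minimal sketch of this prediction-based intrinsic reward is shown below, assuming a PyTorch implementation; the network architecture (two hidden layers of 64 units) is an assumption rather than the paper's exact design.

```python
import torch
import torch.nn as nn

# Illustrative prediction network f: s_t -> predicted s_{t+1}.
class PredictionNetwork(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s_t):
        return self.net(s_t)

def prediction_intrinsic_reward(f, s_t, s_next):
    """r^i_t = || s_{t+1} - f(s_t) ||_2 (no gradient through the reward)."""
    with torch.no_grad():
        return torch.norm(s_next - f(s_t), p=2, dim=-1)
```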

Algorithm 1 summarizes the proposed circuit optimization framework using the prediction-based intrinsic reward. The policy network $\pi_\theta$, the prediction network $f$, and the initial circuit parameters $\mathbf{s}_0$ are first initialized. For each episode, the agent interacts with the environment and collects a trajectory (lines 1-3). Given the current circuit state $\mathbf{s}_{t}$, the agent samples an action $\mathbf{a}_{t} \sim \pi_\theta(\mathbf{a}_{t}\mid \mathbf{s}_{t})$ (line 4). The next circuit state $\mathbf{s}_{t+1} = \mathbf{s}_{t} + \mathbf{a}_{t}$ is obtained by applying the action to the current state (line 5). Here, $\mathbf{a}_{t}$ corresponds to a vector $\delta\mathbf{s}_{t}$, which is added to $\mathbf{s}_{t}$ to generate $\mathbf{s}_{t+1}$. Then, a SPICE simulation is conducted using $\mathbf{s}_{t+1}$, and an extrinsic reward $r^{\mathrm{e}}_{t}$ is computed based on the FOM (lines 6-7). The prediction network $f$ predicts the next circuit state as $\widehat{\mathbf{s}}_{t+1} \leftarrow f(\mathbf{s}_{t})$, and the intrinsic reward $r^{\mathrm{i}}_{t}$ is computed as the $L_{2}$ norm of the prediction error (lines 8-9). The total reward $r_{t}$ is then computed as the sum of $r^{\mathrm{e}}_{t}$ and $r^{\mathrm{i}}_{t}$ (line 10). Finally, $\pi_\theta$ and $f$ are jointly updated using the collected trajectories in the buffer $D$ (line 14).
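
The episode loop of Algorithm 1 can be sketched as follows. Here `policy`, `f_pred`, `buffer`, `simulate_spice`, and `compute_fom` are hypothetical stand-ins for the PPO policy, the prediction network, the trajectory buffer $D$, the Ngspice evaluation, and the FOM-based extrinsic reward; the PPO and prediction-network updates (line 14) are omitted.

```python
import torch

def run_episode(policy, f_pred, s0, horizon, buffer):
    """Sketch of one episode of Algorithm 1 (prediction-based variant)."""
    s_t = s0
    for t in range(horizon):
        a_t = policy.sample(s_t)                      # line 4: a_t ~ pi_theta(.|s_t)
        s_next = s_t + a_t                            # line 5: incremental parameter update
        metrics = simulate_spice(s_next)              # line 6: SPICE simulation of s_{t+1}
        r_ext = compute_fom(metrics)                  # line 7: extrinsic reward from the FOM
        with torch.no_grad():
            r_int = torch.norm(s_next - f_pred(s_t), p=2).item()  # lines 8-9: L2 error
        buffer.add(s_t, a_t, r_ext + r_int, s_next)   # line 10: total reward stored in D
        s_t = s_next
    # line 14: pi_theta and f are then updated jointly from the trajectories in D
    return buffer
```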

By assigning larger intrinsic rewards to novel regions of the circuit design space, we hypothesize that this approach improves sample efficiency and helps the agent discover high-performance designs that might otherwise be overlooked. However, advances in technology and complex device characteristics increase the sensitivity of analog circuit behavior, making accurate modeling and optimization more challenging. As a result, RL agents using a prediction-based intrinsic reward can become trapped in design parameters that are highly sensitive to small variations, where minor changes lead to abrupt performance shifts.

2. Enhancing the RL Agent with Reconstruction-Based Intrinsic Reward

To overcome this limitation of the prediction-based intrinsic reward, we propose a reconstruction-based intrinsic reward framework, as shown in Fig. 1. The circuit reconstruction network is implemented using a CVAE, which enables structured encoding of circuit states conditioned on actions. Compared with the next-state prediction in Algorithm 1, the proposed approach evaluates how well a circuit state can be reconstructed from past experience, with the intrinsic reward defined as $r^{\mathrm{i}}_{t}=\|\mathbf{s}_{t+1}-g(f(\mathbf{s}_{t+1}, \mathbf{a}_{t+1}))\|_2$. Here, $f$ and $g$ denote the encoder and decoder of the proposed circuit reconstruction network, respectively. Thus, $g(f(\mathbf{s}_{t+1}, \mathbf{a}_{t+1}))$ represents the reconstructed circuit state $\hat{\mathbf{s}}_{t+1}$, and a higher reconstruction error indicates a more novel circuit state. The overall circuit optimization procedure follows Algorithm 1, except for how the intrinsic reward is defined. The policy network $\pi_\theta$, the encoder network $f$, the decoder network $g$, and the initial circuit parameters $\mathbf{s}_0$ are first initialized. A latent representation $\mathbf{z}_{t+1}$ is obtained from $f(\mathbf{s}_{t+1}, \mathbf{a}_{t+1})$, as shown in Fig. 1. Then, $g$ reconstructs the next circuit state $\widehat{\mathbf{s}}_{t+1}$ from $\mathbf{z}_{t+1}$ and $\mathbf{a}_{t+1}$. The intrinsic reward $r^{\mathrm{i}}_{t}$ and the total reward $r_{t}$ are then computed in the same manner as in Algorithm 1. Finally, $\pi_\theta$, $f$, and $g$ are updated jointly using the trajectories in the buffer $D$.
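
A minimal sketch of such a circuit reconstruction network and its reconstruction-based intrinsic reward is shown below, assuming a PyTorch implementation; the layer sizes, latent dimension, and the $\beta$-weighted training loss are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative CVAE: encoder f maps (s_{t+1}, a_{t+1}) to a latent z,
# decoder g maps (z, a_{t+1}) back to a reconstruction of s_{t+1}.
class CircuitCVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),          # outputs mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s_next, a_next):
        mu, logvar = self.encoder(torch.cat([s_next, a_next], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        s_hat = self.decoder(torch.cat([z, a_next], dim=-1))
        return s_hat, mu, logvar

def reconstruction_intrinsic_reward(cvae, s_next, a_next):
    """r^i_t = || s_{t+1} - g(f(s_{t+1}, a_{t+1})) ||_2."""
    with torch.no_grad():
        s_hat, _, _ = cvae(s_next, a_next)
        return torch.norm(s_next - s_hat, p=2, dim=-1)

def cvae_loss(cvae, s_next, a_next, beta=1.0):
    """Reconstruction + KL term used to update f and g from the buffer."""
    s_hat, mu, logvar = cvae(s_next, a_next)
    recon = F.mse_loss(s_hat, s_next)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

In this sketch the intrinsic reward is computed without gradients, so it only scores novelty; the encoder and decoder are trained separately with `cvae_loss` on the trajectories stored in the buffer, mirroring the joint update described above.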

The underlying principle, namely that exploration can be guided effectively by the state estimation error, is shared with prediction-based methods. However, instead of relying on prediction errors, the proposed method uses reconstruction errors, allowing the RL agent to explore diverse designs efficiently while avoiding misleading configurations. Experimental results demonstrate that the proposed approach significantly improves sample efficiency and enables the discovery of optimal circuit designs.

Fig. 1. Circuit optimization method using intrinsic reward.


V. EXPERIMENTS

1. Environment and Evaluation

We adopted proximal policy optimization (PPO) [18] as the base RL algorithm and Adam [19] as the optimizer. To assess the benefit of learning, we also evaluated a random policy that selects actions uniformly at random without any learning mechanism. For comparison, we used BO [20] and strong intrinsic reward mechanisms, including the intrinsic curiosity module (ICM) [13], which computes intrinsic rewards from forward-dynamics prediction errors, and random network distillation (RND) [14], which uses the prediction errors of a randomly initialized network as intrinsic rewards. Additionally, we adopted traditional exploration methods, namely the greedy and $\epsilon$-greedy algorithms. We also evaluated NovelD [21], a state-of-the-art count-based method; however, it failed to learn a policy due to the large and continuous state space of analog circuits and was therefore excluded from the results.

All models used the same base RL algorithm and neural network architecture for both the policy and value functions; the only difference among them was how intrinsic rewards were defined. PPO has approximately 0.210 M parameters, while the CVAE in the proposed method has about 0.035 M parameters, accounting for only 14% of the total model size.

To demonstrate the effectiveness of the proposed method, we employed two-stage and three-stage OpAmps with the same structure as those used in [22]. In addition, a folded-cascode OpAmp was included to evaluate the generalization capability across different circuit topologies; its structure follows the design used in [4]. Circuit simulation was performed using Ngspice, and a commercial 250 nm CMOS technology was used for circuit design. Ngspice is linked through the ngspyce Python interface [23], which is used to alter parameters and gather simulation results. Node voltages, resistances, capacitances, and transistor widths were used as design parameters. We measured the DC gain, 3 dB bandwidth, and power consumption during training to compute the FOM. The target specification values for each circuit are summarized in Table 1.
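
As an illustration of this simulation flow, the sketch below evaluates one candidate parameter set through ngspyce. The netlist file name, device and node names, and the specific calls (`ngspyce.source`, `ngspyce.cmd`, `ngspyce.vector`) are assumptions based on the ngspyce examples, not the authors' environment code.

```python
import numpy as np
import ngspyce

def evaluate(state):
    """Run one AC analysis for a candidate design and extract gain and bandwidth."""
    r_val, c_val = state                       # two illustrative design parameters
    ngspyce.source('opamp_two_stage.net')      # hypothetical netlist file
    ngspyce.cmd(f'alter R1 {r_val}')           # update design parameters in place
    ngspyce.cmd(f'alter C1 {c_val}')
    ngspyce.cmd('ac dec 10 1 1G')              # AC sweep from 1 Hz to 1 GHz
    freq = np.abs(ngspyce.vector('frequency'))
    vout = np.abs(ngspyce.vector('vout'))      # assumes an output node named vout
    gain_db = 20 * np.log10(vout[0])           # low-frequency gain
    bw_hz = freq[np.argmax(vout < vout[0] / np.sqrt(2))]  # first -3 dB crossing
    return gain_db, bw_hz
```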

Table 1. Specification targets for different OpAmp topologies.

OpAmp            DC Gain    Bandwidth   Power
Two-stage        ≥ 30 dB    ≥ 10 MHz    ≤ 5 mW
Three-stage      ≥ 60 dB    ≥ 8 MHz     ≤ 10 mW
Folded-cascode   ≥ 45 dB    ≥ 15 MHz    ≤ 10 mW

2. Experimental Results and Analysis

Figs. 2 and 3 show the performance of the proposed method compared with the other methods for the two-stage and three-stage OpAmps, respectively (the greedy and $\epsilon$-greedy algorithms are not visualized because their high variance impairs readability). Compared to PPO without intrinsic rewards, the power consumption, gain-bandwidth product (GBW), and FOM were significantly improved when agents used an intrinsic reward during optimization. Notably, compared to other intrinsic reward methods such as ICM and RND, the proposed method demonstrates more stable convergence and higher performance. Conventional approaches that rely on prediction errors to design intrinsic rewards are well-suited for game environments, where state transitions are relatively smooth and predictable. However, they are less effective for circuit optimization, where high sensitivity and nonlinear state variations make prediction-based rewards unreliable, often leading to unstable exploration. In contrast, the proposed method leverages reconstruction errors to evaluate structural differences in circuit states, enabling more reliable exploration and faster convergence while maintaining high sample efficiency.

On the other hand, Tables 2 and 3 report the average FOM, a combined measure of the three performance metrics that provides a comprehensive evaluation of optimization effectiveness. To ensure a fair evaluation, all methods were compared using an equal number of samples collected within the same runtime. The terms $S(\alpha_{1})$ and $S(\alpha_2)$ denote the number of samples required to reach the target FOM values $\alpha_{1}$ and $\alpha_{2}$, respectively. To compare sample efficiency, we set $\alpha_{1}= -3$ and $\alpha_{2} = -1$ for the two-stage OpAmp, and, given the increased optimization difficulty, $\alpha_{1} = -10$ and $\alpha_{2} = -5$ for the three-stage and folded-cascode OpAmps. The results show that the proposed method consistently outperforms all baseline approaches in both optimization performance and sample efficiency. The lower values of $S(\alpha_{1})$ and $S(\alpha_{2})$ demonstrate that the proposed method reaches the desired performance levels with significantly fewer SPICE simulations than the baselines. Notably, while ICM and RND show minimal improvements over standard PPO and BO, they fail to learn the optimal policy effectively due to the high-dimensional and continuous design space of analog circuits. In contrast, the proposed method allows the agent to explore the design space more efficiently, leading to more stable convergence and better optimization ability.
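
As a concrete reading of this metric, the sketch below computes $S(\alpha)$ from a per-sample FOM trace, assuming higher (less negative) FOM is better, as in Tables 2 and 3; `fom_trace` and the helper name are illustrative.

```python
def samples_to_target(fom_trace, alpha):
    """S(alpha): number of samples consumed before the FOM first reaches alpha."""
    for i, fom in enumerate(fom_trace, start=1):
        if fom >= alpha:
            return i
    return None   # reported as N/A when the target is never reached

print(samples_to_target([-8.0, -4.5, -2.9, -1.2], alpha=-3))  # -> 3
```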

Fig. 2. Performance comparison on a two-stage OpAmp in terms of power consumption, GBW, and FOM.


Fig. 3. Performance comparison on a three-stage OpAmp in terms of power consumption, GBW, and FOM.


Table 2. Comparison of methods for two-stage and three-stage OpAmp optimization. Bold values indicate the best performance.

Stage   Method           # of Samples   Avg. FOM   S(α1)    S(α2)
Two     Random           7,464          -32.1      6,988    7,172
        BO               2,500          -7.8       1,427    2,101
        PPO              6,653          -21.9      5,153    6,002
        PPO (greedy)     9,148          -105       N/A      N/A
        PPO (ϵ-greedy)   7,715          -64.1      N/A      N/A
        PPO+ICM          2,463          -5.9       1,794    2,246
        PPO+RND          3,157          -7.1       2,464    2,947
        PPO+Proposed     2,500          -1.9       1,351    2,080
Three   Random           4,221          -199.5     N/A      N/A
        BO               4,500          -14.3      2,830    4,388
        PPO              19,905         -31.8      10,407   12,974
        PPO (greedy)     20,201         -141.3     N/A      N/A
        PPO (ϵ-greedy)   16,203         -129.4     N/A      N/A
        PPO+ICM          9,436          -12.8      8,671    8,900
        PPO+RND          4,095          -10.5      2,255    3,652
        PPO+Proposed     9,000          -5.2       1,814    4,115

Table 3. Comparison of methods for folded-cascode OpAmp optimization. Bold values indicate the best performance.

Method           # of Samples   Avg. FOM   S(α1)    S(α2)
Random           7,912          -42.7      7,912    9,188
BO               3,800          -10.8      2,342    3,721
PPO              8,716          -27.6      6,547    8,204
PPO (greedy)     10,910         -107.1     N/A      N/A
PPO (ϵ-greedy)   10,961         -85.3      N/A      N/A
PPO+ICM          4,901          -9.7       2,091    2,834
PPO+RND          5,162          -9.9       2,114    3,018
PPO+Proposed     4,680          -4.8       1,376    2,527

VI. CONCLUSION

In this work, we proposed an RL framework for analog circuit optimization using intrinsic reward mechanisms. By leveraging state reconstruction-based intrinsic reward, the proposed method improves sample efficiency and enables more structured exploration. This approach enables the RL agent to identify structurally diverse circuit designs without relying on state prediction models, which are often unstable or inaccurate in complex design spaces. Experimental results demonstrate that our method significantly enhances sample efficiency and optimization performance compared to standard PPO, BO, and other intrinsic reward baselines such as ICM and RND.

ACKNOWLEDGMENTS

This work was supported by the IITP (Institute of Information & Communications Technology Planning & Evaluation)-ICAN (ICT Challenge and Advanced Network of HRD) grant funded by the Korea government (Ministry of Science and ICT) (IITP-2024-RS2024-00437788), by K-CHIPS (Korea Collaborative & High-tech Initiative for Prospective Semiconductor Research) (1415188224, RS-2023-00301703, 23045-15TC) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea), and by the IC Design Education Center. This research was also a result of the "HPC Support" project, supported by the Ministry of Science and ICT and NIPA.

References

[1] N. Horta, "Analogue and mixed-signal systems topologies exploration using symbolic methods," Analog Integrated Circuits and Signal Processing, vol. 31, pp. 161-176, 2002.
[2] N. Jangkrajarng, S. Bhattacharya, R. Hartono, and C.-J. R. Shi, "IPRAIL—intellectual property reuse-based analog IC layout automation," Integration, vol. 36, no. 4, pp. 237-262, 2003.
[3] H. Wang, J. Yang, H.-S. Lee, and S. Han, "Learning to design circuits," arXiv preprint arXiv:1812.02734, 2018.
[4] N. K. Somayaji, H. Hu, and P. Li, "Prioritized reinforcement learning for analog circuit optimization with design knowledge," Proc. of the 58th ACM/IEEE Design Automation Conference (DAC), pp. 1231-1236, 2021.
[5] Y. Choi, S. Park, M. Choi, K. Lee, and S. Kang, "MA-Opt: Reinforcement learning-based analog circuit optimization using multi-actors," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 71, no. 5, pp. 2045-2056, 2024.
[6] H. Wang, K. Wang, J. Yang, L. Shen, N. Sun, H.-S. Lee, and S. Han, "GCN-RL circuit designer: Transferable transistor sizing with graph neural networks and reinforcement learning," Proc. of the 57th ACM/IEEE Design Automation Conference (DAC), pp. 1-6, 2020.
[7] W. Cao, J. Gao, T. Ma, R. Ma, M. Benosman, and X. Zhang, "RoSE-Opt: Robust and efficient analog circuit parameter optimization with knowledge-infused reinforcement learning," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 2, pp. 627-640, 2025.
[8] K. Yamamoto and N. Takai, "GNN-OPT: Enhancing automated circuit design optimization with graph neural networks," IEICE Transactions on Fundamentals, vol. 108, no. 5, pp. 687-689, 2025.
[9] J. Schmidhuber, "Formal theory of creativity, fun, and intrinsic motivation (1990-2010)," IEEE Transactions on Autonomous Mental Development, vol. 2, no. 3, pp. 230-247, 2010.
[10] A. L. Strehl and M. L. Littman, "An analysis of model-based interval estimation for Markov decision processes," Journal of Computer and System Sciences, vol. 74, no. 8, pp. 1309-1331, 2008.
[11] M. G. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton, and R. Munos, "Unifying count-based exploration and intrinsic motivation," Proc. of the 30th International Conference on Neural Information Processing Systems, pp. 1479-1487, 2016.
[12] Y. Burda, H. Edwards, D. Pathak, A. Storkey, T. Darrell, and A. A. Efros, "Large-scale study of curiosity-driven learning," arXiv preprint arXiv:1808.04355, 2018.
[13] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell, "Curiosity-driven exploration by self-supervised prediction," Proc. of the International Conference on Machine Learning, PMLR, pp. 2778-2787, 2017.
[14] Y. Burda, H. Edwards, A. Storkey, and O. Klimov, "Exploration by random network distillation," Proc. of the Seventh International Conference on Learning Representations, pp. 1-17, 2019.
[15] K. Sohn, X. Yan, and H. Lee, "Learning structured output representation using deep conditional generative models," Proc. of the 29th International Conference on Neural Information Processing Systems, vol. 2, pp. 3483-3491, 2015.
[16] W. Zhou, S. Bajracharya, and D. Held, "PLAS: Latent action space for offline reinforcement learning," Proc. of the Conference on Robot Learning, PMLR, pp. 1719-1735, 2021.
[17] S. Rezaeifar, R. Dadashi, N. Vieillard, L. Hussenot, O. Bachem, O. Pietquin, and M. Geist, "Offline reinforcement learning as anti-exploration," Proc. of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 8106-8114, 2022.
[18] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[19] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[20] J. Mockus, "The application of Bayesian methods for seeking the extremum," Towards Global Optimization, vol. 2, p. 117, 1978.
[21] T. Zhang, H. Xu, X. Wang, Y. Wu, K. Keutzer, J. E. Gonzalez, and Y. Tian, "NovelD: A simple yet effective exploration criterion," Advances in Neural Information Processing Systems, vol. 34, pp. 25217-25230, 2021.
[22] Y. Wang, M. Orshansky, and C. Caramanis, "Enabling efficient analog synthesis by coupling sparse regression and polynomial optimization," Proc. of the 51st Annual Design Automation Conference, pp. 1-6, 2014.
[23] I. M. Villarreal, "ngspyce: Python bindings for the Ngspice simulation engine," 2025. Accessed: Feb. 10, 2025.

SuMin Oh

SuMin Oh received her bachelor's (2024) and master's (2025) degrees in electrical and electronics engineering from Dankook University, Republic of Korea. Her current research interests include artificial intelligence and reinforcement learning. She is currently with Com2uS, Seoul, Republic of Korea.

HyunJin Kim

HyunJin Kim received his bachelor's (1997) and master's (1999) degrees in electrical engineering and his Ph.D. degree in electrical and electronics engineering (2010) from Yonsei University, Republic of Korea. He worked as a mixed-signal VLSI circuit designer at Samsung Electro-Mechanics (2002.02-2005.01) and as a senior engineer on flash memory controller projects in the Memory Division of Samsung Electronics (2010.04-2011.08). His current research interests include lightweight neural network implementation methodologies, reinforcement learning, and vision-language-action models.