SuMin Oh
HyunJin Kim
(Department of Electronics and Electrical Engineering, Dankook University, 152, Jukjeon-ro,
Suji-gu, Yongin-si, Republic of Korea)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Index Terms
Analog circuit optimization, conditional variational autoencoder, intrinsic reward
I. INTRODUCTION
Unlike digital circuits, analog circuit optimization still relies heavily on manual
tuning and domain knowledge, highlighting the need for automation in circuit design.
Among traditional approaches, knowledge-based methods [1,2] have shown effectiveness for predefined circuit topologies but suffer from limited
scalability and high computational costs. To reduce manual effort and improve efficiency,
optimization-based methods using Bayesian optimization (BO), particle swarm optimization,
and genetic algorithms have been studied. However, the search complexity of these
methods increases exponentially with the number of samples, limiting their applicability
to large-scale designs. To overcome these limitations, RL-based methods have been proposed
as promising solutions. For instance, a study in [3] introduced a framework in which the RL agent learns to design circuits. A work in
[4] incorporated domain knowledge during training of the RL agent to guide exploration,
while a study in [5] leveraged a multi-actor RL algorithm to improve convergence. In addition, works in
[6-8] adopted graph neural networks (GNNs) to further improve optimization performance
across circuit topologies. Although these methods significantly improved optimization
performance, they remain sample inefficient, as they require thousands of SPICE simulations
to achieve convergence. In this context, sample efficiency refers to the number of
simulations required for the RL agent to reach a target specification. To reduce the
computational overhead while maintaining performance, improving the sample efficiency
of RL-based methods is essential.
In this work, we propose a novel framework that incorporates intrinsic rewards into
the RL-based analog circuit optimization method. Unlike conventional approaches that
rely solely on extrinsic rewards derived from circuit performance, the proposed method
utilizes intrinsic rewards to encourage the RL agent to explore novel regions of the
circuit parameter space. Enhanced exploration contributes to better sample efficiency,
enabling the agent to learn with fewer simulations. Furthermore, we develop a reconstruction-based
intrinsic reward using a conditional variational autoencoder (CVAE), in which the intrinsic
reward is formulated from the reconstruction error. Experimental results on a two-stage
operational amplifier (OpAmp), a three-stage OpAmp, and a folded-cascode OpAmp show that
the proposed method significantly improves sample efficiency. The key contributions of
this work can be summarized as follows:
• Our work is the first approach to use intrinsic rewards in circuit optimization,
improving sample efficiency by encouraging broader exploration beyond conventional
performance-driven objectives.
• In addition, we propose a reconstruction-based intrinsic reward method, which effectively
quantifies state novelty in high-dimensional circuit design spaces.
• The proposed framework can be easily integrated into existing RL algorithms with
negligible overhead.
• Extensive experiments on practical circuits show that our approach significantly
reduces the number of SPICE simulations.
II. RELATED WORKS
Efficient exploration is a key component in the training of RL agents. Traditional
methods such as $\epsilon$-greedy and Boltzmann exploration promote action diversity.
However, they tend to be inefficient in real-world tasks such as robot manipulation
and analog circuit design. To overcome these limitations, intrinsic reward methods
have been introduced [9]. Count-based methods [10,11] assign higher rewards to rarely visited states, while prediction-based methods [12-14] provide intrinsic rewards based on the error between the predicted and actual next
states. The latter has been widely adopted in continuous control tasks for its effectiveness
in guiding exploration toward uncertain or less predictable regions. In analog circuit
optimization, where the design space is continuous and highly complex, prediction-based
intrinsic rewards help the agent discover diverse operating points with fewer simulations,
enabling sample-efficient learning.
The CVAE [15] was introduced as an effective approach for learning from unlabeled data. By encoding
input variables into a conditioned latent space, CVAE enables efficient policy modeling
for RL agents. This property has led to widespread use of CVAE in addressing the out-of-distribution
(OOD) problem in offline RL. Studies in [16,17] adopted a CVAE to model the behavior policies based on pre-collected trajectories.
The policy was trained within the latent space, while the decoder translated latent
variables into actions. Because the latent space was trained to align with the dataset
distribution, the OOD problem was significantly reduced. While the above studies focused on
adopting the CVAE to address the OOD problem, our work adopts the CVAE to compute intrinsic
rewards for circuit design.
III. PRELIMINARIES
Analog circuit optimization generally involves finding an optimal set of design parameters
that minimize a given objective while satisfying multiple design constraints. To evaluate
the degree of circuit optimization, a figure of merit (FOM) is commonly employed.
The FOM aggregates performance metrics such as gain, bandwidth, and power consumption
into a scalar. Then, the parameter optimization can be formulated with the design
parameter vector $\mathbf x$ and the design space ${\mathbb D}^d$ as $\hat{\mathbf x}= \arg\min_{\mathbf x\in{\mathbb D}^d} {\rm FOM}(\mathbf x)$.
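As a concrete illustration of such an aggregation, the minimal sketch below combines simulated metrics into a scalar FOM. The sign convention follows the experimental results in Section V (a FOM of zero means all targets are met and more negative values indicate larger violations), while the normalization and equal weighting are illustrative assumptions rather than the exact formulation used in this work.

```python
def fom(metrics, targets, weights=None):
    """Aggregate circuit metrics into a scalar FOM (0 = all targets met,
    more negative = larger constraint violations). The weighting and
    normalization here are illustrative assumptions."""
    weights = weights or {k: 1.0 for k in targets}
    score = 0.0
    for key, target in targets.items():
        if key == "power_w":   # power is minimized: penalize exceeding the budget
            violation = max(0.0, metrics[key] - target) / target
        else:                  # gain/bandwidth are maximized: penalize shortfall
            violation = max(0.0, target - metrics[key]) / target
        score -= weights[key] * violation
    return score

# Example: misses the 30 dB gain target, meets bandwidth and power.
print(fom({"gain_db": 25.0, "bw_hz": 12e6, "power_w": 4e-3},
          {"gain_db": 30.0, "bw_hz": 10e6, "power_w": 5e-3}))  # ~ -0.17
```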
On the other hand, analog circuit optimization can be formulated as a continuous control
task under the RL framework, modeled as a Markov decision process (MDP) $M=\{\pmb{S}$,
$\pmb{A}$, $\pmb{P}$, $r$, $\gamma\}$. The state $\mathbf s_t \in \pmb{S}$ denotes
a vector of design parameters at timestep $t$, and the action $\mathbf a_t \in \pmb{A}$
denotes an incremental change of $\mathbf s_t$. Besides, the transition function $\pmb P(\mathbf s_{t+1}\mid \mathbf s_t, \mathbf a_t)$ denotes the probability of transitioning
to the next state given the current state and the action. The RL agent selects $\mathbf
a_t$ and receives a reward $r_t \in \mathbb R$ computed from the FOM. The objective
of the agent is to learn an optimal policy that maximizes the expected discounted
return $R_t = \mathbb E \big[ \sum_{k=0}^T \gamma^k r_{t+k}\big]$, where $\gamma \in [0, 1]$ denotes the discount factor. For sample-efficient RL, we incorporate
an intrinsic reward $r^{\rm i}_t$, which encourages visiting novel or less frequently
explored circuit regions. When the extrinsic reward is denoted as $r^{\rm e}_t$, the
total reward used for training is defined as $r_t = r_t^{\rm e} + r_t^{\rm i}$.
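The sketch below maps this MDP onto a minimal environment interface. The parameter bounds, the clipping, and the `run_spice()` helper are hypothetical placeholders for the actual simulator coupling; `fom()` refers to the sketch above, and the direct use of the FOM as the extrinsic reward is an assumption.

```python
import numpy as np

class CircuitEnv:
    """Minimal MDP sketch: the state is the design-parameter vector s_t, the
    action a_t is an incremental change, and the returned reward is
    r_t = r^e_t + r^i_t. run_spice() is a hypothetical stand-in for the
    simulator call; parameter bounds are illustrative assumptions."""

    def __init__(self, s0, lower, upper, targets, intrinsic_fn=None):
        self.s = np.asarray(s0, dtype=float)
        self.lower, self.upper = lower, upper
        self.targets = targets
        self.intrinsic_fn = intrinsic_fn  # e.g., prediction or reconstruction error

    def step(self, a):
        s_next = np.clip(self.s + a, self.lower, self.upper)  # s_{t+1} = s_t + a_t
        metrics = run_spice(s_next)                # SPICE simulation (hypothetical helper)
        r_ext = fom(metrics, self.targets)         # extrinsic reward from the FOM
        r_int = self.intrinsic_fn(self.s, a, s_next) if self.intrinsic_fn else 0.0
        self.s = s_next
        return s_next, r_ext + r_int, metrics      # total reward r^e_t + r^i_t
```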
IV. METHOD
1. Intrinsic Reward for Circuit Optimization
In our evaluation, conventional methods that rely solely on extrinsic rewards showed
low sample efficiency and tended to converge to suboptimal local minima, resulting
in low-quality circuit designs. To address this, we combine extrinsic rewards with
an intrinsic reward that encourages the agent to explore less-visited regions of the
circuit design space rather than familiar regions. Intrinsic reward mechanisms can
be categorized into count-based and prediction-based methods. Since analog circuits
have a large and continuous state space, count-based methods for estimating state
visitation counts are impractical. Therefore, we propose employing prediction-based
methods that enable more effective exploration in analog circuits. The prediction-based
intrinsic reward is formulated as ${r}^{\mathrm{i}}_{t} = \|{\mathbf{s}}_{t+1}-{f}(\mathbf{s}_{t})\|_{2}$,
where $f$ represents the prediction network, which is used to predict the next circuit
state ${\mathbf{s}}_{t+1}$.
Algorithm 1 summarizes the proposed circuit optimization framework using prediction-based intrinsic
reward. The policy network $\pi_\theta$, the prediction network $f$, and the initial
circuit parameters ${\mathbf{s}}_0$ are first initialized. For each episode, the agent
interacts with the environment and collects the trajectory (lines 1-3). Given the
current circuit state ${\mathbf{s}}_{t}$, the agent samples an action ${\mathbf{a}}_{t}
\sim \pi_\theta ({{\mathbf{a}}_{t}\mid \mathbf{s}}_{t})$ (line 4). The next circuit
state ${\mathbf{s}}_{t+1} = {\mathbf{s}}_{t} + {\mathbf{a}}_{t}$ is obtained by applying
the action to the circuit state (line 5). Here, ${\mathbf{a}}_{t}$ corresponds to
a vector ${\delta \mathbf{s}}_{t}$, which is added to ${\mathbf{s}}_{t}$ to generate
${\mathbf{s}}_{t+1}$. Then, a SPICE simulation is conducted using ${\mathbf{s}}_{t+1}$,
and an extrinsic reward ${r}^{\mathrm{e}}_{t}$ is computed based on FOM (lines 6-7).
The prediction network $f$ predicts the next circuit state by ${\widehat{\mathbf{s}}}_{t+1}
\leftarrow f ({\mathbf{s}}_{t})$, and the intrinsic reward ${r}^{\mathrm{i}}_{t}$
is computed using the $L_{2}$ norm of the prediction error (lines 8-9). Therefore,
the total reward ${r}_{t}$ is computed as the summation of ${r}^{\mathrm{e}}_{t}$
and ${r}^{\mathrm{i}}_{t}$ (line 10). Finally, $\pi_\theta$ and $f$ are jointly updated
using the collected trajectory in the buffer ${D}$ (line 14).
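A condensed PyTorch-style sketch of this loop is given below. The network size, learning rate, and the `policy.update()` call standing in for the PPO update are illustrative assumptions rather than the exact implementation of Algorithm 1; `env`, `policy`, `state_dim`, `num_episodes`, and `horizon` are assumed to be defined elsewhere (the environment here returns only the extrinsic reward, so the intrinsic term is added explicitly).

```python
import numpy as np
import torch
import torch.nn as nn

# Prediction network f: estimates the next circuit state from the current one
# (layer sizes and learning rate are assumptions).
f = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))
f_opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for episode in range(num_episodes):
    s, buffer = env.reset(), []
    for t in range(horizon):
        a = policy.sample(s)                          # a_t ~ pi_theta(a_t | s_t)
        s_next, r_ext, _ = env.step(a)                # SPICE sim and extrinsic reward only
        with torch.no_grad():                         # s_hat_{t+1} = f(s_t)
            s_hat = f(torch.as_tensor(s, dtype=torch.float32))
        r_int = torch.linalg.norm(                    # r^i_t = ||s_{t+1} - f(s_t)||_2
            torch.as_tensor(s_next, dtype=torch.float32) - s_hat).item()
        buffer.append((s, a, r_ext + r_int, s_next))  # store total reward r_t
        s = s_next
    policy.update(buffer)                             # PPO update (hypothetical API)
    states = torch.as_tensor(np.stack([b[0] for b in buffer]), dtype=torch.float32)
    next_states = torch.as_tensor(np.stack([b[3] for b in buffer]), dtype=torch.float32)
    pred_loss = ((f(states) - next_states) ** 2).mean()   # fit f to the new transitions
    f_opt.zero_grad(); pred_loss.backward(); f_opt.step()
```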
By assigning larger intrinsic rewards to novel regions of circuit design spaces, we
hypothesize that this approach improves sample efficiency and helps the agent discover
high-performance designs that might otherwise be overlooked. However, advances in
technology and complex device characteristics increase the sensitivity of analog circuit
behavior, making accurate modeling and optimization more challenging. Thus, RL agents
using prediction-based intrinsic rewards can be trapped in design parameters that are
highly sensitive to small variations, and minor changes can lead to abrupt performance
shifts.
2. Enhancing the RL Agent with Reconstruction-Based Intrinsic Reward
To overcome the limitation of prediction-based intrinsic rewards, we propose a reconstruction-based
intrinsic reward framework, as shown in Fig. 1. The circuit reconstruction network is implemented using a CVAE, which enables structured
encoding of circuit states conditioned on actions. Compared with the next circuit
state prediction in Algorithm 1, the proposed approach evaluates how well a circuit state can be reconstructed from
past experiences, with the intrinsic reward defined as ${r}^{\mathrm{i}}_{t}=\|\mathbf{s}_{t+1}-g(f(\mathbf{s}_{t+1}, \mathbf{a}_{t+1}))\|_{2}$. Here, $f$ and $g$ denote the encoder and decoder of the proposed
circuit reconstruction network, respectively. Thus, ${g}(f(\mathbf{s}_{t+1}, \mathbf{a}_{t+1}))$
represents the reconstructed circuit state ${\hat{\mathbf{s}}}_{t+1}$, and a higher
reconstruction error indicates a novel circuit state. The overall procedure of the
circuit optimization follows Algorithm 1, except for how the intrinsic reward is defined. The policy network $\pi_\theta$, the
encoder network ${f }$, the decoder network ${g}$, and the initial circuit parameters
${\mathbf{s}}_0$ are first initialized. A latent representation $\mathbf{z}_{t+1}$
is obtained from $f(\mathbf{s}_{t+1}, \mathbf{a}_{t+1})$ in Fig. 1. Then, $g$ reconstructs the next circuit state ${\widehat{\mathbf{s}}}_{t+1}$ using
$\mathbf{z}_{t+1}$ and $\mathbf{a}_{t+1}$. Thus, the intrinsic reward ${r}^{\mathrm{i}}_{t}$ and the total reward $r_{t}$ are computed in the same manner. Finally, $\pi_\theta$,
$f$, and $g$ are updated jointly using the trajectory in buffer $D$.
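As an illustration of this reconstruction-based reward, a minimal PyTorch-style CVAE sketch is given below. The layer sizes, latent dimension, and loss weighting are assumptions for illustration, not the exact architecture of the proposed circuit reconstruction network.

```python
import torch
import torch.nn as nn

class CircuitCVAE(nn.Module):
    """Sketch of a CVAE over circuit states conditioned on actions.
    Hidden and latent sizes are illustrative assumptions."""
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim + action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, state_dim))

    def forward(self, s, a):
        h = self.enc(torch.cat([s, a], dim=-1))                  # encoder f(s_{t+1}, a_{t+1})
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        s_hat = self.dec(torch.cat([z, a], dim=-1))              # decoder g(z_{t+1}, a_{t+1})
        return s_hat, mu, logvar

def intrinsic_reward(cvae, s_next, a):
    """r^i_t = ||s_{t+1} - g(f(s_{t+1}, a_{t+1}))||_2; a large error marks a novel state."""
    with torch.no_grad():
        s_hat, _, _ = cvae(s_next, a)
    return torch.linalg.norm(s_next - s_hat, dim=-1)

def cvae_loss(cvae, s_next, a, beta=1.0):
    """Standard ELBO objective: reconstruction plus KL term (beta is an assumption)."""
    s_hat, mu, logvar = cvae(s_next, a)
    recon = ((s_next - s_hat) ** 2).sum(dim=-1).mean()
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)).mean()
    return recon + beta * kl
```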
The underlying principle that exploration can be effectively guided by the state estimation
error is similar to prediction-based methods. However, instead of relying on prediction
errors, the proposed method utilizes reconstruction errors, allowing the RL agent
to efficiently explore diverse designs while avoiding misleading configurations. Experimental
results demonstrate that the proposed approach significantly improves sample efficiency
and enables the discovery of optimal circuit designs.
Fig. 1. Circuit optimization method using intrinsic reward.
V. EXPERIMENTS
1. Environment and Evaluation
We adopted proximal policy optimization (PPO) [18] as the base RL algorithm and Adam [19] as the optimizer. To assess the benefit of RL-based optimization, we also evaluated
a random policy that selects actions uniformly at random without any learning mechanism.
For comparison, we used BO [20] and strong intrinsic reward mechanisms, including the intrinsic curiosity module
(ICM) [13], which computes intrinsic rewards based on forward dynamics prediction errors, and
random network distillation (RND) [14], which measures the prediction errors of a randomly initialized network as intrinsic
rewards. Additionally, we adopted traditional exploration methods such as greedy and
$\epsilon$-greedy algorithms. We also evaluated NovelD [21], a state-of-the-art count-based method; however, it failed to learn a useful policy
due to the large and continuous state space of analog circuits and was therefore excluded
from the results. All models used the same base RL algorithm and neural network architecture
for both the policy and value functions. The only difference among them was how intrinsic
rewards were defined. Besides, PPO has approximately 0.210 M parameters, while the
CVAE in the proposed method has about 0.035 M parameters, accounting for only 14%
of the total model size. To demonstrate the effectiveness of the proposed method,
we employed two-stage and three-stage OpAmps with the same structure as those used
in [22]. Besides, a folded-cascode OpAmp was included to evaluate the generalization capability
across different circuit topologies. The structure of the folded-cascode OpAmp follows
the design used in [4]. The circuit simulation was performed using Ngspice, while a commercial 250 nm CMOS
technology was used for circuit design. Ngspice is linked with the ngspyce Python
interface [23], which can be used to alter parameters and gather simulation results. Besides, node
voltages, resistances, capacitances, and transistor widths were used as design
parameters. We measured the DC gain, 3 dB bandwidth, and power consumption during
training to compute the FOM. The target specification values for each circuit are
summarized in Table 1.
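As a rough illustration of this simulation interface, the snippet below sketches how a parameter update and an AC analysis might be issued through ngspyce. The netlist name, device identifiers, output node, and the ngspyce calls used here (source, cmd, ac, vector) are assumptions based on the library's public examples, not the actual setup of this work.

```python
import numpy as np
import ngspyce  # Python bindings for the Ngspice engine [23]

# Load a hypothetical OpAmp netlist and apply one candidate design point.
ngspyce.source('opamp_two_stage.cir')   # netlist name is an assumption
ngspyce.cmd('alter r1 = 12k')           # raw Ngspice command: set a resistor value
ngspyce.cmd('alter @m1[w] = 4u')        # raw Ngspice command: set a transistor width

# Small-signal AC sweep to extract gain and bandwidth.
ngspyce.ac('dec', 10, 1, 1e9)                 # decade sweep, 10 points per decade
freq = np.abs(ngspyce.vector('frequency'))
vout = np.abs(ngspyce.vector('out'))          # output node name is an assumption
gain_db = 20 * np.log10(vout[0])              # low-frequency gain
bw_3db = freq[np.argmax(vout < vout[0] / np.sqrt(2))]  # first point below -3 dB
print(gain_db, bw_3db)
```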
Table 1. Specification targets for different OpAmp topologies.
| OpAmp | DC Gain | Bandwidth | Power |
|---|---|---|---|
| Two-stage | ≥ 30 dB | ≥ 10 MHz | ≤ 5 mW |
| Three-stage | ≥ 60 dB | ≥ 8 MHz | ≤ 10 mW |
| Folded-cascode | ≥ 45 dB | ≥ 15 MHz | ≤ 10 mW |
2. Experimental Results and Analysis
Figs. 2 and 3 show the performance of the proposed method compared with different methods for the
two-stage and three-stage OpAmps, respectively (note that the greedy and $\epsilon$-greedy
algorithms were not visualized because their high variance affects visibility). Compared
to PPO without adopting intrinsic rewards, the power consumption, gain-bandwidth product
(GBW), and FOM were significantly improved when agents used an intrinsic reward during
optimization. Notably, compared to other intrinsic reward methods such as ICM and
RND, the proposed method demonstrates more stable convergence and higher performance.
Conventional approaches, which rely on prediction errors to design intrinsic rewards,
are well-suited for game environments where state transitions are relatively smooth
and predictable. However, they are less effective for circuit optimization, where
high sensitivity and nonlinear state variations make prediction-based rewards unreliable,
often leading to unstable exploration. In contrast, the proposed method leverages
reconstruction errors to evaluate structural differences in circuit states, enabling
more reliable exploration and faster convergence while maintaining high sample efficiency.
On the other hand, Tables 2 and 3 show the average FOM, which represents a combined measure of three performance
metrics, providing a comprehensive evaluation of optimization effectiveness. To ensure
a fair evaluation, all methods were compared using an equal number of samples collected
within the same runtime. Terms ${S}(\alpha_{1})$ and ${S}(\alpha_2)$ denote the number
of samples required to reach the target FOM values $\alpha_{1}$ and $\alpha_{2}$,
respectively. To compare sample efficiency, we set $\alpha_{1}= -3$ and $\alpha_{2}
= -1$ for the two-stage OpAmp. Besides, given the increased optimization difficulty,
we set $\alpha_{1} = -10$ and $\alpha_{2} = -5$ for the three-stage and folded-cascode
OpAmps. The results show that the proposed method consistently outperforms all baseline
approaches in both optimization performance and sample efficiency. The lower values
of ${S}(\alpha_{1})$ and ${S}(\alpha_{2})$ demonstrate that the proposed method reaches
the desired performance levels with significantly fewer SPICE simulations compared
to baselines. Notably, while ICM and RND show minimal improvements over standard PPO
and BO, they fail to learn the optimal policy effectively due to the high-dimensional
and continuous space of the analog circuit. In contrast, the proposed method allows
the agent to explore the design space more efficiently, leading to more stable convergence
and better optimization ability.
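As a small clarifying sketch, the sample-efficiency metric $S(\alpha)$ used in Tables 2 and 3 can be read as the index of the first simulation whose FOM reaches the threshold $\alpha$. The helper below illustrates this reading under that assumption; it is not code from the paper.

```python
def samples_to_target(fom_history, alpha):
    """S(alpha): number of SPICE simulations before the FOM first reaches alpha.
    Returns None (reported as N/A) if the target is never reached."""
    for i, fom_value in enumerate(fom_history, start=1):
        if fom_value >= alpha:
            return i
    return None

# Example: with alpha_1 = -3, this run needs 4 simulations to reach the target.
print(samples_to_target([-20.0, -9.5, -4.2, -2.8, -1.0], alpha=-3))
```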
Fig. 2. Performance comparison on a two-stage OpAmp in terms of power consumption,
GBW, and FOM.
Fig. 3. Performance comparison on a three-stage OpAmp in terms of power consumption,
GBW, and FOM.
Table 2. Comparison of methods for two-stage and three-stage OpAmp optimization. Bold
values indicate the best performance.
| Stage | Method | # of Samples | Avg. FOM | S($\alpha_1$) | S($\alpha_2$) |
|---|---|---|---|---|---|
| Two | Random | 7,464 | -32.1 | 6,988 | 7,172 |
| Two | BO | 2,500 | -7.8 | 1,427 | 2,101 |
| Two | PPO | 6,653 | -21.9 | 5,153 | 6,002 |
| Two | PPO (ϵ-greedy: greedy) | 9,148 | -105 | N/A | N/A |
| Two | PPO (ϵ-greedy) | 7,715 | -64.1 | N/A | N/A |
| Two | PPO+ICM | 2,463 | -5.9 | 1,794 | 2,246 |
| Two | PPO+RND | 3,157 | -7.1 | 2,464 | 2,947 |
| Two | PPO+Proposed | 2,500 | -1.9 | 1,351 | 2,080 |
| Three | Random | 4,221 | -199.5 | N/A | N/A |
| Three | BO | 4,500 | -14.3 | 2,830 | 4,388 |
| Three | PPO | 19,905 | -31.8 | 10,407 | 12,974 |
| Three | PPO (greedy) | 20,201 | -141.3 | N/A | N/A |
| Three | PPO (ϵ-greedy) | 16,203 | -129.4 | N/A | N/A |
| Three | PPO+ICM | 9,436 | -12.8 | 8,671 | 8,900 |
| Three | PPO+RND | 4,095 | -10.5 | 2,255 | 3,652 |
| Three | PPO+Proposed | 9,000 | -5.2 | 1,814 | 4,115 |
Table 3. Comparison of methods for folded-cascode OpAmp optimization. Bold values
indicate the best performance.
| Method | # of Samples | Avg. FOM | S($\alpha_1$) | S($\alpha_2$) |
|---|---|---|---|---|
| Random | 7,912 | -42.7 | 7,912 | 9,188 |
| BO | 3,800 | -10.8 | 2,342 | 3,721 |
| PPO | 8,716 | -27.6 | 6,547 | 8,204 |
| PPO (greedy) | 10,910 | -107.1 | N/A | N/A |
| PPO (ϵ-greedy) | 10,961 | -85.3 | N/A | N/A |
| PPO+ICM | 4,901 | -9.7 | 2,091 | 2,834 |
| PPO+RND | 5,162 | -9.9 | 2,114 | 3,018 |
| PPO+Proposed | 4,680 | -4.8 | 1,376 | 2,527 |
VI. CONCLUSION
In this work, we proposed an RL framework for analog circuit optimization using intrinsic
reward mechanisms. By leveraging state reconstruction-based intrinsic reward, the
proposed method improves sample efficiency and enables more structured exploration.
This approach enables the RL agent to identify structurally diverse circuit designs
without relying on state prediction models, which are often unstable or inaccurate
in complex design spaces. Experimental results demonstrate that our method significantly
enhances sample efficiency and optimization performance compared to standard PPO,
BO, and other intrinsic reward baselines such as ICM and RND.
ACKNOWLEDGMENTS
This work was supported by the IITP (Institute of Information & Communications
Technology Planning & Evaluation)-ICAN (ICT Challenge and Advanced Network of HRD)
grant funded by the Korea government (Ministry of Science and ICT) (IITP-2024-RS2024-00437788),
K-CHIPS (Korea Collaborative & High-tech Initiative for Prospective Semiconductor Research)
(1415188224, RS-2023-00301703, 23045-15TC) funded by the Ministry of Trade, Industry
& Energy (MOTIE, Korea), and the IC Design Education Center. Also, this research was
a result of a study on the "HPC Support" project, supported by the Ministry of Science
and ICT and NIPA.
References
N. Horta, ``Analogue and mixed-signal systems topologies exploration using symbolic
methods,'' Analog Integrated Circuits and Signal Processing, vol. 31, pp. 161-176,
2002.

N. Jangkrajarng, S. Bhattacharya, R. Hartono, and C.-J. R. Shi, ``Iprail—intellectual
property reuse-based analog ic layout automation,'' Integration, vol. 36, no. 4, pp.
237-262, 2003.

H. Wang, J. Yang, H.-S. Lee, and S. Han, ``Learning to design circuits,'' arXiv preprint
arXiv:1812.02734, 2018.

N. K. Somayaji, H. Hu, and P. Li, ``Prioritized reinforcement learning for analog
circuit optimization with design knowledge,'' Proc. of 58th ACM/IEEE Design Automation
Conference (DAC), IEEE, pp. 1231-1236, 2021.

Y. Choi, S. Park, M. Choi, K. Lee, and S. Kang, ``Ma-opt: Reinforcement learning-based
analog circuit optimization using multi-actors,'' IEEE Transactions on Circuits and
Systems I: Regular Papers, vol. 71, no. 5, pp. 2045-2056, 2024.

H. Wang, K. Wang, J. Yang, L. Shen, N. Sun, H.-S. Lee, and S. Han, ``GCN-RL circuit
designer: Transferable transistor sizing with graph neural networks and reinforcement
learning,'' Proc. of 2020 57th ACM/IEEE Design Automation Conference (DAC), IEEE,
pp. 1-6, 2020.

W. Cao, J. Gao, T. Ma, R. Ma, M. Benosman, and X. Zhang, ``Rose-opt: Robust and efficient
analog circuit parameter optimization with knowledge-infused reinforcement learning,''
IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, vol.
44, no. 2, pp. 627-640, 2025.

K. Yamamoto and N. Takai, ``GNN-OPT: Enhancing automated circuit design optimization
with graph neural networks,'' IEICE Transactions on Fundamentals, vol. 108, no. 5,
pp. 687-689, 2025.

J. Schmidhuber, ``Formal theory of creativity, fun, and intrinsic motivation (1990-2010),''
IEEE Transactions on Autonomous Mental Development, vol. 2, no. 3, pp. 230-247, 2010.

A. L. Strehl and M. L. Littman, ``An analysis of model-based interval estimation for
Markov decision processes,'' Journal of Computer and System Sciences, vol. 74, no.
8, pp. 1309-1331, 2008.

M. G. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton, and R. Munos,
``Unifying count-based exploration and intrinsic motivation,'' Proc. of the 30th International
Conference on Neural Information Processing Systems, pp. 1479-1487, 2016.

Y. Burda, H. Edwards, D. Pathak, A. Storkey, T. Darrell, and A. A. Efros, ``Large-scale
study of curiosity-driven learning,'' arXiv preprint arXiv:1808.04355, 2018.

D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell, ``Curiosity-driven exploration
by self-supervised prediction,'' Proc. of International Conference on Machine Learning,
pp. 2778-2787, PMLR, 2017.

Y. Burda, H. Edwards, A. Storkey, and O. Klimov, ``Exploration by random network distillation,''
Proc. of Seventh International Conference on Learning Representations, pp. 1-17, 2019.

K. Sohn, X. Yan, and H. Lee, ``Learning structured output representation using deep
conditional generative models,'' Proc. of the 29th International Conference on Neural
Information Processing Systems, vol. 2, pp. 3483-3491, 2015.

W. Zhou, S. Bajracharya, and D. Held, ``PLAS: Latent action space for offline reinforcement
learning,'' Proc. of Conference on Robot Learning, pp. 1719-1735, PMLR, 2021.

S. Rezaeifar, R. Dadashi, N. Vieillard, L. Hussenot, O. Bachem, O. Pietquin, and M.
Geist, ``Offline reinforcement learning as anti-exploration,'' Proc. of the AAAI Conference
on Artificial Intelligence, vol. 36, pp. 8106-8114, 2022.

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, ``Proximal policy
optimization algorithms,'' arXiv preprint arXiv:1707.06347, 2017.

D. P. Kingma and J. Ba, ``Adam: A method for stochastic optimization,'' arXiv preprint arXiv:1412.6980,
2014.

J. Mockus, ``The application of Bayesian methods for seeking the extremum,'' Towards
Global Optimization, vol. 2, 117, 1998.

T. Zhang, H. Xu, X. Wang, Y. Wu, K. Keutzer, J. E. Gonzalez, and Y. Tian, ``Noveld:
A simple yet effective exploration criterion,'' Advances in Neural Information Processing
Systems, vol. 34, pp. 25217-25230, 2021.

Y. Wang, M. Orshansky, and C. Caramanis, ``Enabling efficient analog synthesis by
coupling sparse regression and polynomial optimization,'' Proc. of the 51st Annual
Design Automation Conference, pp. 1-6, 2014.

Ignacio M. Villarreal, ``NGSPYCE: Python bindings for the Ngspice simulation engine,''
2025. Accessed: 2025-02-10.

SuMin Oh received her bachelor's (2024) and master's (2025) degrees in electrical
and electronics engineering from Dankook University, Republic of Korea. Her current
research interests reside in the realm of artificial intelligence and reinforcement
learning. She is currently with Com2uS, Seoul, Republic of Korea.
HyunJin Kim received his Ph.D. degree in electrical and electronics engineering
(2010) and his master's (1999) and bachelor's (1997) degrees in electrical engineering
from Yonsei University, Republic of Korea. He worked as a mixed-signal VLSI circuit
designer at Samsung Electro-Mechanics (2002.02-2005.01). He was also a senior engineer
on a flash memory controller project at the Memory Division of Samsung Electronics
(2010.04-2011.08). His current research interests reside in the realm of lightweight
neural network implementation methodology, reinforcement learning, and vision-language-action
models.