In contrast to supervised learning, where correct responses are established in advance,
Q-learning is effective in scenarios where a predefined dataset is lacking, making
it particularly useful in environments with scarce or inefficiently generated data,
such as analog layout design in a cutting-edge process node. In analog layout design, the amount of accumulated data is typically limited, apart from a small number of layout revisions. Furthermore, because of concerns about the potential leakage of circuit information and the need to protect intellectual property (IP), it is not feasible to build large datasets of diverse analog layout designs for training neural networks. Moreover, analog circuits are highly sensitive to the process node used, so even circuits with identical functionality may require changes in topology. Consequently, a dedicated dataset would have to be built for each process node, even for the same circuit. Reinforcement learning (RL) is therefore a fitting strategy for device placement optimization.
1. Problem formulation
1) Quantizing design space for states ($s$) and actions ($a$)
To implement a practically computable version of Q-learning, the actions have been quantized into a finite action space. For device placement, five possible actions (north, south, west, east, and stationary) are defined within this space, with each device treated as a state. Because each action must be selectable within a finite space, the environment in which the states move is also quantized into a grid-based format. The process employed is a GAA process node using nanosheet FET devices, for which the available channel widths are constrained to discrete values. Although
it is possible to construct an analog block using devices with disparate channel widths,
this approach requires specific gate length values and the inclusion of spacers. This
method is inefficient from an area perspective and introduces an additional LLE, termed
tapered RX, to neighboring devices [11]. Furthermore, the contact-to-poly pitch (CPP) is fixed for each device gate length,
and a design rule exists that only allows devices with the same gate length to be
merged. Consequently, in order to minimize the size of a circuit block with a specific
function, the channel width and gate length are standardized. As a result, a grid-based
environment, as illustrated in Fig. 4(b), can be implemented for device placement. The horizontal grid is defined as one CPP,
and the vertical grid corresponds to the channel width of the device used in the design.
The blue and orange colors represent different types of devices, abstracted in the
manner shown in Fig. 4(a).
Fig. 4. (a) Device abstraction; each abstracted device is a state to be moved in the proposed Q-learning. (b) Quantized design space. P-type (blue) and N-type (yellow) devices are randomly placed at the initial step.
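As a concrete illustration of this quantization, a minimal sketch of the grid environment and the five-action space is given below. The class and function names (Action, Device, step) and the clipping of moves at the grid boundary are illustrative assumptions rather than part of the proposed tool; only the grid units (one CPP horizontally, one channel width vertically) and the five actions come from the description above.

    # Minimal sketch of the quantized design space and the five-action space.
    # Grid units: one horizontal cell = one CPP, one vertical cell = one channel width.
    from dataclasses import dataclass
    from enum import Enum

    class Action(Enum):
        NORTH = (0, 1)
        SOUTH = (0, -1)
        WEST = (-1, 0)
        EAST = (1, 0)
        STATIONARY = (0, 0)

    @dataclass
    class Device:
        name: str
        dev_type: str      # "P" or "N"
        col: int           # horizontal grid index (CPP units)
        row: int           # vertical grid index (channel-width units)

    def step(device: Device, action: Action, n_cols: int, n_rows: int) -> Device:
        """Move a device by one grid cell; moves beyond the grid boundary are clipped."""
        dc, dr = action.value
        col = min(max(device.col + dc, 0), n_cols - 1)
        row = min(max(device.row + dr, 0), n_rows - 1)
        return Device(device.name, device.dev_type, col, row)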
2) Reward functions with LLE surrogate models
In analog circuits, small variations in the threshold voltage ($V_{th}$) can result
in significant output current deviation from the nominal value. This deviation increases
in proportion to the number of devices, rendering these applications highly susceptible
to threshold voltage variation. By extracting geometric parameters from the placed
devices and utilizing the trained LLE surrogate models, we can directly obtain the
estimated threshold voltage of the devices. This enables placement optimization under
LLE considerations. Therefore, reward functions are integrated with the LLE surrogate
models.
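For illustration, the coupling between the placement and the surrogate models can be sketched as follows. The geometry extractors and the scikit-learn-style predict() interface of the surrogate objects (lod_model, dti_model) are assumptions made for this sketch; they stand in for the trained LLE surrogate models of Section II.

    # Sketch: extract per-device LLE geometry from the current placement and query
    # trained surrogate models for the estimated threshold voltages.
    import numpy as np

    def estimate_vth(devices, lod_model, dti_model, extract_lod, extract_dti):
        """Return per-device Vth estimates from the trained LLE surrogate models.

        extract_lod(d) -> (LOD_a, LOD_b) and extract_dti(d) -> DTI distances are
        placeholder geometry extractors computed from the current placement.
        """
        vth = {}
        for d in devices:
            lod_feat = np.asarray(extract_lod(d)).reshape(1, -1)
            dti_feat = np.asarray(extract_dti(d)).reshape(1, -1)
            vth_lod = float(lod_model.predict(lod_feat)[0])
            vth_dti = float(dti_model.predict(dti_feat)[0])
            vth[d.name] = (vth_lod, vth_dti)
        return vth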
The reward of the ${i}$-th device or ${i}$-th cluster state at time step $t$ in the $x$-th sequence, denoted by $r^{{i}}_{x,t}$, encompasses a number of factors, including the area, wire length, LLE parameters, type clustering, sub-circuit clustering, threshold voltage difference, and the threshold voltage estimated by the LLE surrogate models. The area reward and wire length reward ($r_{area,t}$ and $r_{wire,t}$) are intended to minimize the area and wire length, respectively. They are applied as inverse rewards, in which smaller values of the corresponding metrics result in larger rewards.
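One simple way to realize such inverse rewards is sketched below. The reciprocal form, the half-perimeter wire length (HPWL) estimate, and the scaling constants k_area and k_wire are illustrative assumptions; the exact expressions are defined by the paper's reward equations.

    # Sketch of inverse rewards: a smaller bounding-box area and a smaller total
    # wire length yield larger rewards. Devices expose grid coordinates (col, row).
    def area_reward(devices, k_area=1.0):
        xs = [d.col for d in devices]
        ys = [d.row for d in devices]
        bbox_area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
        return k_area / bbox_area

    def wire_reward(nets, positions, k_wire=1.0):
        """nets: list of device-name lists; positions: name -> (col, row)."""
        total = 0.0
        for net in nets:
            cols = [positions[n][0] for n in net]
            rows = [positions[n][1] for n in net]
            total += (max(cols) - min(cols)) + (max(rows) - min(rows))  # HPWL
        return k_wire / (1.0 + total)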
The LLE parameter rewards $(r_{\Delta LOD,t}, r_{\Delta DTI,t})$ consist of LOD parameter rewards and DTI parameter rewards. These rewards consider the similarity of the LLE parameters $(\Delta LOD, \Delta DTI)$ between devices $i$ and $j$ at time step $t$. The difference in each LLE parameter serves as the criterion in Equation (6) for determining whether a penalty or an optimal reward value should be applied. $\nu_1$ and $\nu_2$ are hyperparameters that are controlled during reward tuning.
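A possible realization of this criterion is sketched below. Because Equation (6) is not reproduced here, the simple thresholding form and the default optimal/penalty values are assumptions; only the roles of $\nu_1$ and $\nu_2$ as tuned thresholds follow the text.

    # Sketch of the LLE parameter rewards: a device pair whose LOD (or DTI)
    # parameters differ by less than a tuned threshold receives the optimal
    # reward, otherwise a penalty.
    def lle_param_reward(delta_lod, delta_dti, nu1, nu2,
                         r_optimal=1.0, r_penalty=-1.0):
        r_dlod = r_optimal if abs(delta_lod) <= nu1 else r_penalty
        r_ddti = r_optimal if abs(delta_dti) <= nu2 else r_penalty
        return r_dlod, r_ddti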
Clustering rewards ($r_{clustering,t}$) are used primarily for two purposes: clustering
by type and clustering by same sub-circuit (ssc). Type refers to the categorization
of devices into N- and P-types to maximize device matching. Within the same sub-circuit, such as a differential pair or a current mirror, and even for multi-finger devices split into multiple single-finger devices as shown in Fig. 7(a), matching between device pairs or groups is critical. Therefore, they must be placed as close together as possible and in geometrically identical environments, such as symmetric and row-based arrangements. To this end, the clustering rewards are used together with the LLE parameter rewards in the proposed algorithm, as explained in the next subsection. In Equation (7), ${\Omega}_t$ refers to an optimal reward that provides additional value when the device groups are placed close enough while respecting layout constraints at time step $t$. ${\overline{d}}_{type,t}$, ${\overline{dx}}_t$, and ${\overline{dy}}_t$ in Equation (7) refer to the distance between devices of the same type, the coordinate difference in the $x$-axis direction, and the coordinate difference in the $y$-axis direction, respectively, at time step $t$. $\theta$, $\mu$, $\sigma$, and $\omega$ are hyperparameters that are controlled during reward tuning.
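The sketch below shows one way the clustering reward could combine these terms. Since Equation (7) is not reproduced here, the linear combination, the closeness test, and the handling of the bonus ${\Omega}_t$ are assumptions; only the meaning of the distance terms and hyperparameters follows the text.

    # Sketch of the clustering reward for one group (same type or same sub-circuit):
    # small spread around the group centroid is rewarded, and a bonus Omega_t is
    # added once the group is placed close enough.
    import statistics

    def clustering_reward(group_positions, theta, mu, sigma, omega,
                          omega_t=1.0, close_enough=1.0):
        """group_positions: list of (col, row) grid cells of the group's devices."""
        cols = [c for c, _ in group_positions]
        rows = [r for _, r in group_positions]
        c_mean, r_mean = statistics.mean(cols), statistics.mean(rows)
        # Mean Manhattan distance of the group members from the group centroid.
        d_type = statistics.mean(abs(c - c_mean) + abs(r - r_mean)
                                 for c, r in group_positions)
        dx = max(cols) - min(cols)          # spread along the x-axis
        dy = max(rows) - min(rows)          # spread along the y-axis
        reward = -omega * (theta * d_type + mu * dx + sigma * dy)
        if d_type <= close_enough:          # close enough: add the optimal bonus
            reward += omega_t
        return reward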
The purpose of the threshold voltage difference reward ($r_{dvth,t}$) is to reduce differences in the estimated threshold voltage within the same sub-circuit topology and among devices of the same type. This reward is calculated through Equation (8). $p$ is the number of sub-circuits in the input circuit, $m$ is the number of devices in the same sub-circuit, and $k$ and $l$ are the total numbers of P-type and N-type devices, respectively. $V_{th,LOD}$ and $V_{th,DTI}$ are calculated using the trained LLE surrogate models explained in Section II. $\alpha$, $\beta$, $\delta$, and $\zeta$ are hyperparameters controlled during reward tuning.
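A sketch of this reward is given below. The pairwise aggregation of $V_{th}$ mismatches and the assignment of $\alpha$, $\beta$, $\delta$, and $\zeta$ to the sub-circuit and type terms are assumptions, as Equation (8) is not reproduced here.

    # Sketch of r_dvth: accumulate pairwise mismatches of the surrogate-estimated
    # Vth within each sub-circuit and within each device type, then weight the
    # LOD- and DTI-related terms with the four tuning hyperparameters.
    from itertools import combinations

    def dvth_reward(subcircuit_vth, p_type_vth, n_type_vth,
                    alpha, beta, delta, zeta):
        """subcircuit_vth: p lists of (vth_lod, vth_dti) tuples, one per sub-circuit;
        p_type_vth / n_type_vth: (vth_lod, vth_dti) tuples of the k P-type and
        l N-type devices."""
        def mismatch(pairs, idx):
            return sum(abs(a[idx] - b[idx]) for a, b in combinations(pairs, 2))

        sub_lod = sum(mismatch(sc, 0) for sc in subcircuit_vth)
        sub_dti = sum(mismatch(sc, 1) for sc in subcircuit_vth)
        typ_lod = mismatch(p_type_vth, 0) + mismatch(n_type_vth, 0)
        typ_dti = mismatch(p_type_vth, 1) + mismatch(n_type_vth, 1)
        # Smaller mismatch -> larger (less negative) reward.
        return -(alpha * sub_lod + beta * sub_dti + delta * typ_lod + zeta * typ_dti)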
The objective of utilizing $r_{vth\_lle,t}$ is to optimize the estimated threshold
voltage ($V_{th}$), which is calculated by the LOD surrogate models. As illustrated
in Fig. 3(b), in contrast to DTI, there is a saturated region with a near-zero gradient at the maximum or minimum threshold voltage. Therefore, the objective is to maximize
or minimize $V_{th, LOD}({LOD}_a,{LOD}_b)$ while simultaneously minimizing the parameters
${LOD}_a$ and ${LOD}_b$. In the case of P-type devices, Equation (9) is employed, whereas for N-type devices, Equation (11) is utilized. As a result, the outputs of Equations (9)-(12) are integrated into Equations (13) and (14) as optimal ($r_{optimal}$) and penalty ($r_{penalty}$) reward threshold values. In
the case of a P-type device with a threshold voltage, $V^{{i}}_{th}$, the reward for
said device ($r^{{i},P-type}_{vth\_lle,t}$) is calculated using Equation (13). Conversely, the reward for device $j$ $\left(r^{{j},N-type}_{vth\_lle,t}\right)$,
an N-type device with threshold voltage $V^{{j}}_{th}$, would be calculated using
Equation (14). Finally, as expressed in Equation (15), the reward value $r_{vth\_lle,t}$ is determined by aggregating the outputs of the aforementioned equations. The two hyperparameters, $w_1$ and $w_2$, are calibrated
during the reward tuning process.
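The sketch below illustrates how such a threshold-based reward could be assembled. The specific threshold comparisons and the $\pm 1$ scores stand in for Equations (13)-(15), whose exact forms are not reproduced here; the convention of pushing P-type and N-type devices toward opposite LOD saturation extremes is likewise an assumption made for the sketch.

    # Sketch of r_vth_lle: compare each device's surrogate-estimated Vth with
    # type-specific optimal/penalty thresholds (standing in for Eqs. (13)-(14)),
    # then combine the per-type sums with weights w1 and w2 (cf. Eq. (15)).
    def vth_lle_reward(p_type_vth, n_type_vth,
                       p_opt_thr, p_pen_thr, n_opt_thr, n_pen_thr,
                       w1, w2):
        def score(vth, opt_thr, pen_thr, maximize):
            if maximize:
                return 1.0 if vth >= opt_thr else (-1.0 if vth <= pen_thr else 0.0)
            return 1.0 if vth <= opt_thr else (-1.0 if vth >= pen_thr else 0.0)

        r_p = sum(score(v, p_opt_thr, p_pen_thr, maximize=True) for v in p_type_vth)
        r_n = sum(score(v, n_opt_thr, n_pen_thr, maximize=False) for v in n_type_vth)
        return w1 * r_p + w2 * r_n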
2. Proposed algorithm
The Q-learning algorithm utilized in this paper is based on Watkins-Dayan Q-learning [12], as shown in Equation (16). $\alpha$ and $\gamma$ represent the learning rate and discount factor, respectively.
Additionally, epsilon-greedy exploration is applied to balance the trade-off between
exploitation and exploration, with probabilities of $1-\epsilon $ and $\epsilon $,
respectively. This approach allows the agent to reinforce the evaluation of known
good actions while also exploring new actions [13]. In this context, $x$ denotes the sequence index, $t$ the step number, ${i}$ the ${i}$-th device, and $n$ the total number of devices. The reward value, state set, and action set of the devices in sequence
$x$ at step $t$ are ${\mathrm{R}}_{x,t}$, ${{\mathbf{S}}}_{x,t},$ and ${{\mathbf{A}}}_{x,t}$
in Equations (17), (18), and (19), respectively. Prior related research [14] applied deep Q-learning in which a single state represents the placement of all devices and the agent selects actions for all devices. In this work, an alternative approach is taken whereby an individual Q-table ($q^{{i}}_{x,t}(s^{{i}}_t,a^{{i}}_t)$) is maintained for the state-action pairs of the ${i}$-th device, rather than applying a shared Q-table to all
devices. It was observed that selecting all actions simultaneously in a single step
of Q-learning led to a failure in convergence to an optimal layout, instead resulting
in a persistent oscillatory pattern. The root cause of this issue lies in the trade-off
relationship between LLE rewards and traditional metrics such as area and wire length,
combined with the dynamic nature of LLEs, which continuously vary and exert influence
based on the real-time relative distances among all devices. Moreover, the Q-table
structure enables the effective representation of state and action pairs for individual
devices, thereby facilitating straightforward management. Consequently, as the number
of devices increases, the number of Q-tables increases correspondingly. However, this
approach may not fully capture the interactions between devices, especially when the
optimal action for one device depends on the state or action of another device. To
address this limitation, we applied a total reward (${\mathrm{R}}_{x,t}$) that considers
all devices in the circuit, as shown in Equation (17). Fig. 5 illustrates the structure of the proposed algorithm. Based on this structure, in order to emulate the knowledge-based design flow used by experts, a sequential algorithm is implemented
to identify the optimal placement and ensure efficient convergence to the optimal
solution. For this purpose, rather than simultaneously applying all reward functions
previously described, each sequence selectively uses them according to Equation (20) to calculate the next reward value at step $t+1$. The agent selects the current actions
for devices that yield the maximum Q values based on the current states array and
the reward at sequence $x$. After taking actions, the next states at step $t+1$ include
updated features for each device such as relative distances, device type, length,
coordinates, and LLE parameters. These features interact with the sequentially selected rewards. Fig. 6 depicts the overall proposed algorithm.
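A compact sketch of this sequential, per-device Q-learning loop is given below. The environment interface (observe, apply, action_space) and the per-sequence reward schedule are illustrative assumptions, while the update itself follows the Watkins-Dayan rule of Equation (16) with one Q-table per device and the shared total reward ${\mathrm{R}}_{x,t}$ of Equation (17).

    # Sketch of the sequential, per-device Q-learning loop (cf. Figs. 5 and 6).
    # Each device i keeps its own Q-table q[i][(state, action)]; all devices share
    # the total reward R_{x,t}. States returned by env.observe() must be hashable.
    import random
    from collections import defaultdict

    def run_sequences(env, devices, reward_fns_per_sequence,
                      n_steps, alpha=0.1, gamma=0.9, epsilon=0.1):
        """devices: hashable device identifiers (e.g., names);
        reward_fns_per_sequence: one list of active reward functions per sequence."""
        q = {d: defaultdict(float) for d in devices}     # one Q-table per device

        for x, active_rewards in enumerate(reward_fns_per_sequence):   # cf. Eq. (20)
            states = {d: env.observe(d) for d in devices}
            for t in range(n_steps):
                # Epsilon-greedy selection per device: explore with probability epsilon.
                actions = {}
                for d in devices:
                    if random.random() < epsilon:
                        actions[d] = random.choice(env.action_space)
                    else:
                        actions[d] = max(env.action_space,
                                         key=lambda a: q[d][(states[d], a)])

                env.apply(actions)                         # move devices on the grid
                next_states = {d: env.observe(d) for d in devices}
                R = sum(fn(env) for fn in active_rewards)  # shared total reward R_{x,t}

                # Watkins-Dayan update (Eq. (16)) applied to each device's Q-table.
                for d in devices:
                    best_next = max(q[d][(next_states[d], a)] for a in env.action_space)
                    td = R + gamma * best_next - q[d][(states[d], actions[d])]
                    q[d][(states[d], actions[d])] += alpha * td
                states = next_states
        return q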
Fig. 5. Structure of the proposed Q-learning's agent and environment.
Fig. 6. Sequential Q-learning algorithm.