Mobile QR Code QR CODE

  1. (Department of Computer Science, Sogang University, Seoul, Korea)



Clock tree synthesis, machine learning, random forest, 3D-IC, thermal gradient

I. INTRODUCTION

As the process technology scales down, there has been rapid increase in the density and the design complexity of the integrated circuit (IC). Recently, the 3D-IC design technologies are drawing much attention due to the physical limit on process scaling. In a synchronous system, the clock tree synthesis (CTS) is performed between placement and routing in physical design. Since the clock signal connects all the clock sinks on multiple layers or dies, there are many challenging tasks in each CTS design step. In multi-layered circuits, the clock tree generated by traditional clock skew optimization methods [1-3] may suffer from heat problem especially in inner layers. One of the goals in CTS is the skew optimization. The clock skew is normally defined as the maximum difference among clock latencies from clock source to all the sink nodes. In particular, thermal variation becomes a crucial factor affecting reliability and performance due to the structural characteristics in 3D-IC [4]. The 3D-clock tree elements such as clock source, sinks, buffers, and TSVs are the major heating sources on the circuit.

Machine learning (ML) algorithms have been increasingly adopted to solve traditional design problems at various steps of circuit design processes. In back-end design, the works in [5] and [6] proposed the method to apply routing using support vector machine (SVM) and Generative adversarial networks (GANs). Recently, there are approaches applying ML algorithms to 3D-CTS. However, it has potential problems in building training data sets and predicting results. In addition, 3D-CTS on advanced nodes may require considering thermal variations for better reliability and accuracy. Our main contribution is building the framework of thermal effect prediction for constructing 3D-clock tree with the following features; 1) thermal prediction method based on random forest; 2) thermal profiling time reduction; 3) the features definition and extraction to thermal variation for better accuracy; 4) thermal effect consideration in 3D-CTS while maintaining the design targets.

The random forest based thermal prediction for 3D-CTS is proposed in this work through utilization of thermal information. Prior to CTS step, the thermal information is trained to construct the thermal weight matrices. Then, clock tree is constructed by reflecting thermal variations. The remainder of the paper is organized as follows. Section II explains existing methods of thermal-aware CTS and ML based 3D-CTS. The proposed method based on ML with thermal prediction for 3D-CTS is described in section III. Experimental results and discussions are described in section IV. Finally, conclusions are given in section V.

II. BACKGROUND & MOTIVATION

The design factors of 3D-CTS such as clock sinks, TSVs, and buffers affect the overall wire length, the power consumption, and the clock skew. Especially, TSV insertion is one of the important steps while constructing a 3D clock network in contrast with conventional 2D-CTS. The low-power and reliable clock network design is proposed in [7]. The method with TSV allocation in [8] is the multi-layer tree generation algorithm based on 2D tree topology. In 3D-IC design, the major issue is the thermal problem affected by several physical structures. Works in [9] and [10] proposed the method of skew balancing between the uniform and the worst thermal profiles. The approach based on the deferred merging embedding algorithm is proposed in [11]. It can handle skew variations caused by thermal gradients. Thermal-aware 3D symmetrical buffered CTS in [1] proposed the overall CTS with thermal consideration for reducing the skew. The delay estimations were calculated to apply Elmore delay model [12]. Also, the thermal-aware delay estimation model [10] is generated in [12].

Recently, there have been many researches applying ML algorithms for reducing analysis time and improving performance at various levels of design stages. The method [13] is proposed to predict each parameter and derive the results of CTS using SVM. There are several researches [6] and [14] that use GAN to utilize images for clock tree construction and routing. The authors in [15] proposed a deep reinforcement learning approach for global routing that uses Q-learning algorithm.

The motivation of this work is to predict the accurate thermal effect of 3D-CTS in a reasonable time. The thermal variation influences the clock skew while TSV and buffer insertion greatly impact on thermal distribution. An example of thermal variation on clock skew is illustrated in Fig. 1. The clock skew without considering thermal effect has 9ps, however, clock skew considering the thermal effect increases up to 26 ps as shown in Fig. 1(a) and (b). The CTS consists of steps such as tree topology generation, element insertion, and clock skew adjustment. The thermal-aware CTS requires an additional step which is so-called thermal profiling. For accurate analysis, this step requires extra implementation time and current state information of circuit. The approaches [1] and [9] perform profiling thermal data after CTS and take iterations if necessary as in Fig. 2(a). Clustering sinks, tree topologies, and buffers/TSV location may vary by the thermal distribution. Therefore, thermal data needs to be updated and applied at each CTS step for accuracy as described in Fig. 2(b). However, it is not only expensive but also difficult to obtain precise measurements due to a lack of circuit information such as power distribution, location of resources, and their connectivity.

The thermal information is constructed in the form of diffusion around the heating sources. In general, the range of temperature is limited due to the physical characteristics of the materials. The thermal effects can be calculated by the material constants and physical characteristic parameters with formulations such as thermal diffusivity, impedance, conductivity, and joules’ heating in [4] and [16]. For predicting thermal data, the regression algorithm is suitable for training data with reasonable accuracy. The random forest [17], as one of the supervised learning algorithms, is specialized for predicting with a few numbers of uncertain parameters. The proposed random forest based thermal prediction for 3D-CTS which includes profiling and training is illustrated in Fig. 2(c). For verifying the performance of our method, this work adopts [7] and [11] that are one of the well-known CTS algorithms as the example. The following section describes detailed steps of our method.

Fig. 1. Thermal variation on clock skew: (a) Clock skew without considering thermal variation; (b) Clock skew considering thermal variation.
../../Resources/ieie/JSTS.2023.23.3.149/fig1.png
Fig. 2. Thermal-aware CTS flows: (a) Thermal profiling iteration; (b) Thermal profiling at each step; (c) ML based thermal profiling.
../../Resources/ieie/JSTS.2023.23.3.149/fig2.png

III. RANDOM FOREST BASED THERMAL PREDICTION FOR 3D-CTS

Firstly, thermal information is trained using random forest. In training step, the primary inputs are the set of sinks and profiled thermal data. The output is thermal weight matrices that represents the degree of thermal diffusion. Secondly, the inferenced thermal data are applied to generate clock tree. The overall flow of our method is described in Fig. 3.

Fig. 3. The overall flow of our thermal-aware CTS with random forest algorithm.
../../Resources/ieie/JSTS.2023.23.3.149/fig3.png

1. Training Thermal Information using Random Forest

1) Feature Extraction and Building Training Data

The features in our work are defined as the measured circuit data related to heating. The training data generation consists of three steps. First, the structural elements of CTS are classified by their element types. Second, the area of heat effect is calculated by geometrical information including locations of clock tree elements and distances among them. Third, the thermal data are applied to the calculated area of each CTS element. The notations in the formulation are summarized in Table 1.

The set of all CTS elements is classified into elements U. An example of building training set is described in Fig. 4. For simplicity, Circuit elements such as sink, TSV, and buffer are represented as S, T, and B at corresponding locations. The set of thermal data Q is defined as geometric and temperature information on the placed circuit. In order to reflect thermal information, the circuit area is divided into small grids as shown in Fig. 4(a). Each grid q$_{n}$ has temperature values which are calculated by thermal simulation in the power distribution network.

The heat effect is normally represented as contour form and the temperature varies with the distance from the heat source. In addition, the physical design steps related to CTS such as placement, power distribution network construction, and wire routing are performed on the grid with rectangular regions. The heat-effective area is defined as the region affected by thermal diffusion. A training data set E$_{i}$ includes thermal data q$_{n}$ and its corresponding region of each type. Also, E$_{i}$ consists of a set of CTS elements and its associative parameters such as thermal resistivity, conductance, and capacitance. Each element e$_{ij}$ contains both coordinates of e$_{ij}$ and its thermal matrix. The heat-effective area with the thermal data is represented as the square matrix e$_{ij}$ that is the degree of thermal diffusion. Each type of element E$_{i}$ has unique diffusivity and suitable size M$_{i}$ which represents magnitude of the heat-effective area. Thus, thermal effect is represented as M$_{i}$${\times}$M$_{i}$ matrix form. Computing M$_{i}$ is based on the nearest neighbor graph (NNG). We utilized Euclidean distance for reflecting thermal diffusion and constructing NNG. Then, all nodes are clustered with their nearest neighbor node, and all edges are sorted by the cost of distance. Let D$_{i}$ = {d$_{1}$, d$_{2}$, …, d$_{x}$} is a set of distance costs at E$_{i}$, then M$_{i}$ is $\left\lceil d_{y}\right\rceil ,$ the median cost of D$_{i}$. If the number of elements in set E$_{i}$ is even, then d$_{y}$ is the average of two median costs. An example of M$_{i}$ calculation is described in Fig. 5. If five sinks are placed on grid, then the result of applying matrix form is described in Fig. 5(a). The NNG of all sinks is constructed based on the cost of distance and sorted by ascending order. The edges (B, C) and (C, E) are the median values in the list of costs, 4.00 and 4.24, of which the average is 4.12. The M$_{i}$ is decided as 5 to round up the average described in Fig. 5(b) and (c).

Table 1. Notation in formulation

U

a finite set of all CTS elements. ( = {E1, E2, …, Ei} )

Ei

a set of same type of each CTS elements. ( = {ei1, ei2, …, eij} )

eij

the information structure of an element ( ∈ Ei )

eijM×M

a M-by-M matrix of thermal data on eij.

Q

a set of thermal data on circuit. ( = {q1, q2, …, qn} )

qn

temperature of a grid within of circuit. ( ∈ Q )

qinit

initial temperature.

tmax

maximum temperature bound of training set

tmin

minimum temperature bound of training set

tinit

normalized initial temperature.

Di

a set of distance costs at element set Ei ( ={d1, d2, …, dx} )

dy

a distance cost in Di ( ∈ Di )

Mi

the size of matrix on ei

Fig. 4. Building training set on circuit with classification of CTS elements: (a) The grid of thermal data after profiling; (b) An example of the training set and shaded overlapped grids.
../../Resources/ieie/JSTS.2023.23.3.149/fig4.png
Fig. 5. Computing matrix size: (a) Unique size of matrix on sinks; (b) NNG of sinks; (c) M$_{i}$ calculation with median cost.
../../Resources/ieie/JSTS.2023.23.3.149/fig5.png
Algorithm 1: Building training set
../../Resources/ieie/JSTS.2023.23.3.149/al1.png

The profiled thermal data can be inconsistent due to peripheral temperature. This may lead to unexpected or inaccurate training results. To maintain consistent training sets, temperature values are normalized to reflect the amounts of variation. Thus, the initial temperatures are unified at entire training sets, since each temperature is measured on different initial temperatures. The marks of q and t are denoted as profiled temperature and normalized value. The matrix e$_{ij}$ is denoted by j$^{th}$ element of the type i. The value in the center grid of e$_{ij}$ is denoted as the initial temperature q$^{ij}$$_{init}$, and it is normalized into t$^{ij}$$_{init}$. The l$^{th}$ temperature and normalized value in eij are denoted as q$^{ij}$$_{l}$ and t$^{ij}$$_{l}$. The normalizing process is performed with computation by the ratio of q$^{ij}$$_{init}$ to t$^{ij}$$_{init}$ with the bound of t$_{min}$ and t$_{max}$. The values in e$_{ij}$ represents the degree of thermal variation in the neighbor area.

An example of the final training set is illustrated in Fig. 4(b). The overlapped grids (OG) that are shaded in the example have more heat influence. The pseudo-code of building training set is summarized in Algorithm 1. The primary inputs are the CTS elements on all layers with the corresponding thermal data and the thermal information matrices are generated as outputs of our algorithm. The time complexity of the outer loop, as matrix construction, is O(M) where M is the number of element types. The time complexity of the inner loop as thermal data normalization takes O(T) where T is the number of grids on matrix. The overall time complexity of the proposed process of building training data becomes O(M·T).

2) Training Thermal Effect with Random Forest

Prior to applying ML, the training sets are classified into matrices of corresponding types. The thermal tendency of training set is computed by aggregating matrices of same types by random forest. The weight matrices representing the degree of thermal diffusion at each type of element are achieved as the result of training.

Due to unpredictable next operating conditions, we consider the uniformed circuit operation conditions focusing on the thermal profiling steps and the characteristic parameters related to thermal. The training parameters of thermal effect are calculated by models of thermal distribution. The thermal effect matrix Q$_{M}$ with matrix size M is computed by applying the matrix form in [4] as:

(1)
../../Resources/ieie/JSTS.2023.23.3.149/eq1.png

where G$_{M}$, T$_{M}$, and Q$_{M}$ are M-by-M matrices that represent the thermal resistivity, temperature nodes, and heat sources respectively. The Q is calculated by both the thermal diffusivity equation and thermal profiling with the power distribution network. The coefficient of thermal diffusion [16], called thermal diffusivity, is adopted on the training data, and it is calculated by the heating parameters. As the learning step, the thermal data are represented by the variation of thermal and thermal diffusivity. The thermal diffusivity is calculated from Fourier’s law. Then, thermal diffusivity in conductivity for analyzing thermal diffusion is given by the following Eq. (2), and the thermal diffusivity equation is transformed into Eq. (2a).

(2)
../../Resources/ieie/JSTS.2023.23.3.149/eq2.png
(2a)
../../Resources/ieie/JSTS.2023.23.3.149/eq2a.png

where ${\alpha}$$_{diff}$, ${\rho}$, and c$_{p}$ represent the thermal diffusivity, materials density, and specific heat capacity. q$_{v}$ is the volumetric heat source(W/m$^{3}$), and ${\Delta}$T is the variation of temperature. The ${\Delta}$T$_{norm}$ is defined as the value of normalized thermal variation. It can be calculated by subtracting two values, the initial value t$^{ij}$$_{init}$ and the normalized value t$^{ij}$$_{l}$. The calculation is modified by substituting ${\Delta}$T by ${\Delta}$T$_{norm}$ from Eq. (2). The ${\delta}$ is defined multiplicative inverse of normalizing ratio. The coefficient ${\alpha}$$_{diff}$ should be adjusted into ${\beta}$$_{diff}$ that is calculated by ${\delta}$ · ${\alpha}$$_{diff}$. Then, each ${\Delta}$T$_{norm}$ on all matrices in E$_{i}$ is calculated to thermal effect vectors by the matrix form applying to Eq. (1). The calculation of thermal effect is described as following Eq. (3). The temperature T$_{k}$$^{i}$ is computed to model the thermal effect of neighbor elements. In the i$^{th}$ grid of clock tree elements, the data T$_{k}$$^{i}$ included thermal variation at grid k is calculated as below:

(3)
../../Resources/ieie/JSTS.2023.23.3.149/eq3.png

where l is the neighbor grid of k, g is denoted as the thermal impedance of elements or thermal impedance between grid k and j. The Q is defined as generated heat, C is matrices of heat absorbing capability for each layer, and s is the coefficient of the domain. The term sC$_{k}$$^{i}$ is added for representing different layers at 3D-IC. The thermal effect on each layer matrix Q can be calculated using Eq. (1). Also, the sets of thermal diffusivity using Eq. (2a) on each grid are input vectors which represents mutual heating effect. The correlation of thermal variation and diffusivity is trained by random forest.

The training process in this work utilizes the bootstrap aggregating technique that divides and combines trees. To construct the forest, we apply the boosting as one of the ensemble models [17] specialized in regression and classification. Let’s assume that N is the number of training set, then each training set becomes the root of each decision tree. For maintaining accuracy, prior to begin learning steps, the optimal N is measured by integer linear programing (ILP). The training process consists of three steps; generation of N base learners through bootstrap, training by decision trees, and computing training results from all decision trees.

Firstly, the N base learners are generated with bootstrap. The forest specification is determined by forest parameters which become the number of decision trees and the size of forest as tree depth. Secondly, the N base learners are trained by random decision tree. The random node optimization is performed by objective function and split function according to the width and depth of the forest. The nodes of each tree have their split functions that consist of thermal effect parameters in Eq. (1). The probability of Q is generated to maintain their randomness by the expected thermal variant of the target due to overestimation. All the trees that proceeded above steps are combined to the form random forest. Finally, learning results are achieved by taking average of output values of all decision trees. The thermal weight matrix, defined as Y$_{i}$, represents the correlation of thermal effect at E$_{i}$ by distance from the center of the matrix. The process is iterated to fill Y$_{i}$ with the learning results. The larger weight has a greater impact from thermal. The weights can be added to other weights as Eq. (3) when the matrices are overlapped at the same grid. An example of the learning thermal effect is shown in Fig. 6. In the Y$_{i}$, grids (e$_{\mathrm{(3,4)}}$, e$_{\mathrm{(4,3)}}$) which are the same distance from target element e$_{i}$ have different weights due to their randomness.

Fig. 6. Learning process to obtain thermal weight matrix with random forest.
../../Resources/ieie/JSTS.2023.23.3.149/fig6.png

2. Application of Thermal Weight Matrix in 3D-CTS

The Y$_{i}$ generated in the previous step represents the thermal sensitivity of each clock tree element. Thermal variations are updated immediately using Y$_{i}$ with no additional measurement or profiling at each tree construction step. In each step, Y$_{i}$ correspond to the circuit with target resources of clock tree. The overall flow of our application process is described in Fig. 7.

1) Clock Tree Topology Generation and Elements Insertion with Thermal Weight Matrix

The geometrical information of each sink is paired with its corresponding Y$_{i}$. An example of the clock tree synthesis with trained thermal data is represented in Fig. 8. Sinks and matrices are placed as shown in Fig. 8(a). As the initial step, clock sink clustering is generally performed to create clock tree topologies in CTS. The clock skew can be reduced with avoiding hot spot region. For this reason, the heating degree of elements are calculated by the average values in Y$_{i}$ on each clock tree elements. Fig. 8(b) shows clock tree topology without updating thermal data and with updating thermal data. C and D are paired without thermal consideration. However, with thermal in clock sink paring, D and E are paired as described in the right in Fig. 8(b). In this case, sinks are clustered for avoiding hotspot regions overlapped Y$_{i}$.

The clock tree elements are inserted after topology generation step. Each element that may cause heating effect on clock tree is placed by Y$_{i}$. In this step, we assume that each element is placed on middle of two nodes. In general, the geometrical coordinates of elements are determined by next step. However, the thermal distribution can be predicted in advance by Y$_{i}$. Fig. 9 describes the determining process of TSV location between T$_{a}$ and T$_{b}$. The location is selected by the number of OGs and hotspot regions between the candidates. The hotspot region and thermal distribution are estimated by the OGs and weight matrices placed on candidate location. The Y$_{i}$ are applied to the candidates, after the hotspot regions are predicted. When TSV is inserted on T$_{a}$, the number of OGs in left subtree of T$_{a}$ is 18, and right subtree has 15 OGs. Thus, the total number of OGs becomes 33 and the number of hotspot regions with the most OGs is 1. On the other hand, when TSV is inserted on T$_{b}$, OGs in the left subtree of T$_{b}$ are 12, and the right subtree has 18 OGs. Thus, the total number of OGs becomes 30 and the number of hotspot regions with the most OGs is 3. Even though inserting TSV on T$_{a}$ has an advantage in skew balancing, inserting TSV on T$_{b}$ has a great performance in power consumption. In this case, the element locations are determined by the design constraints in CTS. Thus, the weight functions for determining the TSV location are generated by the constraints. The weight functions consist of the maximum degree of heating regions and thermal gap between the left and right subtree.

Fig. 7. The proposed thermal weight matrix application flow in 3D-CTS.
../../Resources/ieie/JSTS.2023.23.3.149/fig7.png
Fig. 8. Application of thermal weight matrix at each CTS step.
../../Resources/ieie/JSTS.2023.23.3.149/fig8.png
Fig. 9. Determining TSV location with thermal weight matrices and predicting the thermal effect with weight matrix.
../../Resources/ieie/JSTS.2023.23.3.149/fig9.png

At first, the maximum weight of OGs on an element type a of Y$_{a}$ is represented by Max(Y$_{a}$), and it is calculated by the number of multilayers. The matrices more than half have OG due to the matrix size obtained by the median cost of distance. Thus, overlapped level (OL) can be classified into three types, as described in Eq. (4).

(4)
../../Resources/ieie/JSTS.2023.23.3.149/eq4.png

The i is the overlapped level in function OL. The OL(Y$_{a}$,i) is the function of classifying weight by overlapped level in Y$_{a}$ and it returns the sum of weights. The L$_{max,a}$ is denoted the maximum level of OGs on Y$_{a}$. In Eq. (4), the function Max(Y$_{a}$) calculates the maximum value, it consists of three conditions according to the number of OGs on Y$_{a}$. Due to the matrix size, the number of grids on hot spot region has less than that of lower overlapped level. Also, the number of OGs is decreased by increasing overlapped level. When new elements are inserted, the threshold level is set to prevent hot region to be underestimated. Thus, the maximum level is restricted by the threshold level. Fig. 10 describes an example of overlapped condition in Max(Y). If there are no OGs (L$_{max,a}$ = 0), Max(Y$_{a}$) is set to default value 0 as represented in Fig. 10(a). If L$_{max,a}$ is 2, Max(Y$_{a}$) becomes the summation of weights on OGs as in Fig. 10(b). If L$_{max,a}$ is greater than 2, Max(Y$_{a}$) becomes the summation of weights on OGs as shown in Fig. 10(c) and (d) respectively. When more than two elements are inserted, the difference in L$_{max}$ among elements can lead to a faulty prediction due to the number of OGs. For example, as shown in Fig. 10(b) and (c), the maximum overlapped levels generated by the candidate Y$_{b}$ and Y$_{c}$, are 2 and 3 respectively. The max operations on Y$_{b}$ and Y$_{c}$ perform to calculate thermal weight with the summation for 24 and 8 grids in Fig. 10(b). The sum of weights is influenced by the number of grids, so this calculation can be insufficient to consider the hotspot region. Thus, L$_{max}$ needs to be fixed as the maximum overlapped level among all candidates for insertion. In this example, if L$_{max}$ is set to 3, Y$_{b}$ and Y$_{c}$ have no grids of maximum level and 8 grids of maximum level 3. For the other case, two candidates Y$_{c}$ and Y$_{d}$ are constructed as in the Fig. 10(c) and (d). The matrix Y$_{c}$ includes 8 grids of overlapped level 3. The matrix Y$_{d}$ contains 4 grids of overlapped level 3 and 4 grids of overlapped level 4. In this case, as mentioned above, the L$_{max}$is determined by 4. However, when the threshold level L$_{max}$ are set as 2, the maximum operations of Fig. 10(c) and (d) are performed on 8 grids.

Fig. 10. Overlapped conditions in Max(Y): (a) Max(Y$_{a}$) = 0; (b) Max(Y$_{b}$) = 0; (c) Max(Y$_{c}$) = 3; (d) Max(Y$_{d}$) = 4.
../../Resources/ieie/JSTS.2023.23.3.149/fig10.png
Fig. 11. The extraction of quadrant in weight matrix using angle of left/right subtree edge: (a) An obtuse angle of subtree edge; (b) An acute angle of subtree edges.
../../Resources/ieie/JSTS.2023.23.3.149/fig11.png

At second, the Eq. (5) considers the clock skew caused by unbalanced thermal variation at the edges of the subtree. The function of thermal gap Gap(Y$_{a}$) between left and right subtree represents the degree of skew balance that is the difference in thermal distribution at child nodes as shown in Eq. (5).

(5)
../../Resources/ieie/JSTS.2023.23.3.149/eq5.png

where v is defined as the weight located within the quadrants including subtree and neighbor. When a node is connected to other nodes by edges, the irrelevant area is excluded by the angle of subtree edges for accuracy and efficiency. The matrix Y$_{a}$ can be divided into quadrants, and the quadrants corresponding to left and right subtree are obtained as described in Fig. 11. If one of the subtree edges belongs to a quadrant, it has two adjacent quadrants which can affect its child nodes. For example, when the angle of subtree edges is obtuse as shown in Fig. 11(a), left subtree edge belongs to the 3$^{\mathrm{rd}}$ quadrant, and its adjacent quadrant is the 2$^{\mathrm{nd}}$ and the 4$^{\mathrm{th}}$ quadrant. Also, right subtree edge belongs to the 1$^{\mathrm{st}}$ quadrant, and its adjacent quadrant is the 2$^{\mathrm{nd}}$ and the 4$^{\mathrm{th}}$quadrant. On the contrary, when the angle of subtree edges is acute as shown in Fig. 11(b), both left and right subtree edges belong to the 3$^{\mathrm{rd}}$ quadrant, and their adjacent quadrants are same as the 2$^{\mathrm{nd}}$ and the 4$^{\mathrm{th}}$ quadrant. Thus, the 1$^{\mathrm{st}}$ quadrant is excluded in weights calculation because it includes less information related to subtree edges (irrelevant area). Therefore, the influence degree at each subtree is represented as summation of weights including its own and adjacent quadrants, and Gap(Y$_{a}$) becomes the thermal gap between two subtrees. Finally, the weight function W(Y$_{a}$) through Eqs. (4) and (5) is formulated for elements insertion as follows:

(6)
../../Resources/ieie/JSTS.2023.23.3.149/eq6.png

where w$_{max}$, and w$_{gap}$ are the weight of Max(Y$_{a}$), and Gap(Y$_{a}$). The ${\gamma}$ is the constant value. The W(Y) can be utilized to consider the thermal effect before elements are inserted in advance. Also, Eq. (6) can be reused in element insertion step for selecting candidate in detailed location.

2) Clock Skew Adjustment with Thermal Weight Matrix

The clock arrival time is measured by delay estimation on clock tree topology. The difference of clock arrival time needs to be reduced to improve circuit performance. Also, the clock can be skewed intentionally to resolve violations, it is called useful clock skew. When considering heating effect, the thermal weight matrix is adopted by the various clock skew adjustment techniques such as TSV/buffer insertion, and wire routing strategies. In general, TSV/buffer insertion is proceeded to determine geometrical location of TSV/buffer with considering skew balancing and power consumption by each path delay. Then, wire routing is performed with various constraints such as obstacle avoiding, skew balancing, and thermal distribution. For example, TSV positions are reallocated and buffers are inserted to control skew. The hotspot regions result in increasing resistance and consequently may cause delay degradation as shown in Fig. 8(c). Thus, the path delay is measured with the changed resistance applying thermal weight matrix Y. Thus, the multiplicative inverse of normalizing ratio ${\delta}$ is used for both the absolute temperature scale (K) and the change in temperature. The temperature effect according to the unit length of interconnect is represented by joules’ heating equation as follows:

(7)
../../Resources/ieie/JSTS.2023.23.3.149/eq7.png

where R is the interconnect resistance, I is the root mean square occurred by insertion. The heating effects can lead to increase current. The term I$^{2}$·R represents the power dissipation on the interconnects, and R$_{\theta }$ is the thermal impedance of the interconnect line to the chip. The resistance R of unit wire length can be expressed as linear relation with the wire width and height using the Eq. (2) and transformed Eq. (7) as follows:

(8)
../../Resources/ieie/JSTS.2023.23.3.149/eq8.png

where ${\rho}$$_{0}$, w$_{m}$, and t$_{m}$, are the unit length resistance at the reference temperature, width and height of wire, and ${\beta}$ is the temperature dependence weight of resistance (1/$^{\mathrm{o}}$C). The y$_{rf}$ is the average value in thermal weight matrix that belongs to a path segment for delay calculation. Also, the degree of thermal change from the initial temperature is represented by y$_{rf}$. The updated temperature is calculated by multiplication of T and y$_{rf}$. With R transformed by Eq. (8), the path delay can be estimated by delay formulation. After delay estimation, the clock skew optimization as TSV/buffer insertion is performed with Eq. (6). In the wire routing, the clock skew is sensitive to the wire length and routing path due to thermal variation as shown in Fig. 8(d). In general, the routing algorithms are processed by object function. When wire routing is performed, the grids belonged to wire should require additional weight for considering the thermal effect. Our method calculates the weight for each CTS element using Y$_{i}$ obtained by random forest in advance. In addition, the proposed method can be applied to entire CTS steps by weight function.

IV. EXPERIMENTAL RESULTS

The verification for our method is made up of two parts: performance of learning process and CTS performance guarantee. Our method has been verified on 45 nm process technology parameters based on a predictive technology model (PTM) [19]. For implementation, C/C++, Python, HSPICE, and MATLAB on quad core 3.1 GHz Linux machine was utilized. The nominal supply voltage and the initial temperature is set to 1.1 V and randomness from -25$^{\circ}$C to 125$^{\circ}$C respectively. A chip with the size of 6cm$^{2}$ is divided into a uniform grid to obtain the distributed temperature map by Hotspot 6.0 thermal simulator [22]. The technology parameters of wire have unit resistance r$_{0}$ = 0.1 ${\Omega}$/${\mu}$m and unit capacitance $c_{0}$ = 0.2 fF/${\mu}$m. The TSV parameters have unit resistance r$_{v}$ = 0.035 ${\Omega}$, and c$_{0}$ = 15.48 fF. The parameters related to buffer have input capacitance c$_{b}$ = 9 fF, intrinsic delay t$_{d}$ = 15 ps, and output driving resistance r$_{b}$ = 66 ${\Omega}$. For a fair comparison, the parameter value for NN-3D [7] and the via bound are set to ${\alpha}$ = 0.1 in [1] and minimum value. The indicators of error rate are adopted with both mean square error (MSE) and mean absolute error (MAE) which can measure the amount of error in statistical prediction models as following Eqs. (9) and (10):

(9)
../../Resources/ieie/JSTS.2023.23.3.149/eq9.png
(10)
../../Resources/ieie/JSTS.2023.23.3.149/eq10.png

where n is the number of the test cases, $\hat{Y}_{i}$ and Y$_{i}$ are the vectors of predicted and observed values, y$_{i}$ and x$_{i}$ are the predicted and actual values.

For detailed implementation with random forest, we set the parameters of the proposed method such as thermal variation parameters, tree width and depth. The values of tree width and depth are calculated by ILP at width range of 10 to 500 and depth range of 3 to 100. We selected the width and depth parameters as 100 and 20 obtained from the lowest MAE. The values of tree width and depth are decreased at each 300 and 50, however, each value shared almost no decrease at over 500 and 100.

The effectiveness of the proposed learning method is compared with other learning algorithms in accuracy and runtime. The comparison group consists of regression and boosting algorithms. The regression algorithms are support vector machine (SVM), least absolute shrinkage and selection operator (Lasso), ridge regression, and multivariate adaptive regression splines (MARS). The boosting algorithms are extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), and light gradient boosting machine (LightGBM). In accuracy, each learning algorithm is evaluated by MSE and MAE according to the number of learning instances as represented in Table 2. At the 50 instances, MARS has the highest MSE and Ridge represents the highest MAE. At the 1000 instance, AdaBoost has both the highest MSE and MAE respectively. Furthermore, our method maintains both MSE and MAE below 1 in the entire test case. In ratio, the average of SVM which is one of the regressions, are set 1 as the reference point for comparing our method with other algorithms in accuracy.

Table 3 shows the runtime comparing our method with seven other models. For fair comparison, two conditions are set to this experiment. First, we increase the number of test cases until the MAE is close to 1 in accuracy. Second, we performed 1000 times iteration for all models to revise error rate due to unexpected test values and calculated the average values of measurements. A training set was constructed about 0.31 seconds. The number of training set affects both the runtime of each model and the preparing time for training set. Ridge algorithm is performed until 700 training sets because MAE converges to 2 as the number of test cases increases. Our method is sufficient to use only 50 training sets for precise prediction of thermal effect. In runtime comparison, the other models except for XGBoost are executed within 1 second. Lasso regression has the fastest execution time as 0.039 seconds. However, it has the third highest value in MSE and MAE. The proposed method which applies to boosting method is implemented within 0.381 seconds. In learning comparison, our method improved prediction accuracy and performed in outstanding runtime.

Table 2. The accuracy comparison with other ML algorithms using MSE/MAE

# of instances

MSE / MAE

SVM

Lasso

Ridge

MARS

XGBoost

AdaBoost

Our

50

11.779 / 2.213

8.405 / 2.328

16.286 / 3.137

20.411 / 3.060

8.756 / 2.584

3.177 / 1.196

0.004 / 0.059

100

9.891 / 2.550

6.096 1.618

21.318 3.077

1.204 / 0.820

7.272 / 1.958

3.043 / 1.472

0.023 / 0.084

300

10.060 / 1.572

7.481 / 2.039

10.351 / 2.382

10.110 / 2.140

9.542 / 1.546

9.940 / 1.996

0.013 / 0.070

500

4.739 / 1.029

7.208 / 2.171

16.026 / 2.535

6.269 / 1.679

4.944 / 1.540

11.193 / 2.594

0.011 / 0.071

700

16.192 / 2.086

8.513 / 2.154

8.433 / 2.182

10.211 / 2.287

7.423 / 1.713

7.034 / 2.036

0.016 / 0.080

1000

8.114 / 1.415

9.864 / 2.233

10.925 / 2.295

7.929 / 1.973

7.248 / 1.868

16.511 / 3.200

0.010 / 0.072

Avg.

10.129 / 1.811

7.928 / 2.091

13.890 / 2.601

9.356 / 1.993

7.531 / 1.868

8.483 / 2.083

0.015 / 0.073

Ratio

1.000 / 1.000

0.783 / 1.155

1.371 / 1.437

0.924 / 1.101

0.743 / 1.032

0.837 / 1.150

0.001 / 0.040

Table 3. The runtime comparison with other ML algorithms

Model type

Verifying 1000 times when MAE = 1

# of training set

MSE

MAE

Runtime(s)

SVM

500

3.398

1.021

0.568

Lasso

100

5.138

1.858

0.039

Ridge

700

10.707

2.035

0.092

MARS

100

1.623

1.042

0.241

XGBoost

500

8.076

1.653

1.168

AdaBoost

100

5.299

1.832

0.308

LightGBM

500

9.888

1.884

0.640

Our

50

0.065

0.197

0.381

Table 4. The skew comparison with other CTS algorithms

Bench-mark

# of sinks

Skew (ps)

Skew reduction comparison with our method (%)

Without considering thermal [7][11]

With considering thermal [18]

Considering thermal with ML (our)

[7][11]

[18]

r1

267

93.2470

75.3259

79.1123

15.16%

-5.03%

r2

598

111.1972

81.5505

77.9883

29.86%

4.37%

f11

121

46.3255

34.0915

33.5218

27.64%

1.67%

f21

117

50.8296

34.1723

35.2710

30.61%

-3.22%

f31

273

65.3832

50.7124

47.5306

27.30%

6.27%

f32

190

52.2513

38.6632

38.7558

25.83%

-0.24%

Avg.

69.8723

52.4193

52.0300

25.54%

0.74%

Ratio

1

0.7502

0.74464

Next, we construct clock tree applying existing methods, then the clock skew and construction time of clock tree are measured respectively. The proposed method is evaluated on four ISPD’09 benchmark circuits f11, f21, f31, f32 [20] and IBM benchmark circuits r1, r2 [21]. For estimating 3D-IC, 2D-IC in ISPD/IBM benchmark are transformed into two-layered 3D structure. The clock sinks in benchmark circuits were placed by randomly partitioning into two-layers. The results in Table 4 and 5 are average values with 1000 times. The clock skew of proposed method is compared with that of other CTS algorithms as described in Table 4.

In skew, the clock tree without considering thermal is constructed by combining [7] and [11]. For fair comparison, our method is applied to consider thermal effect with [7] and [11]. Our method reduced about 25.54% of clock skew compared to the method without thermal consideration and the [18] showed skew reduction as 24.98%. The average skew reduction rate of proposed method was represented similar to [18], however, our method showed better performance when the number of sinks are increased. Also, the degree of skew reduction in our method depends on CTS algorithms.

Table 5. The runtime comparison with other CTS algorithms

Bench-mark

# of sinks

Runtime (s)

Runtime reduction compared to our method (%)

w/o considering thermal [7][11]

Thermal profiling iteration w/o ML

Thermal profiling at each stage w/o ML

Considering thermal with ML (our)

r1

267

48.0532

222.7577

481.4068

52.1281

-8.48%

76.60%

89.17%

r2

598

826.4700

6098.4006

3317.2875

863.8535

-4.52%

85.83%

73.96%

f11

121

5.1906

27.2515

278.7895

7.2308

-39.31%

73.47%

97.41%

f21

117

4.4604

24.2528

272.6368

6.4590

-44.81%

73.37%

97.63%

f31

273

65.4941

298.0960

501.0753

70.4884

-7.63%

76.35%

85.93%

f32

190

16.8741

79.3434

374.4031

19.4463

-15.24%

75.49%

94.81%

Avg.

161.0904

1125.0170

870.9332

169.9344

-5.49%

84.89%

80.49%

Ratio

1.00

6.98

5.41

1.05

To represent the runtime of the proposed method, we compared it with the CTS methods considering thermal using. One of the comparison targets is implemented to iterate thermal information profiling until obtaining optimal solution. The other comparison target is implemented to profile thermal data at each CTS stage. We constructed the clock trees with combining [7] and [11] as described in Table 5. The clock tree construction is performed until skew reduces below 3% compared to our method. The proposed method considering thermal with applying ML is increased as about 5% on average than the method without considering thermal effect. All methods in runtime are increased according to the number of clock sinks. Especially, in r2, the iteration method is increased exponentially as the number of clock sinks. However, CTS with thermal profiling at each stage is increased proportionally according to the number of clock sinks, thus, it has a large execution time due to profiling steps. Our method requires an average runtime about 80% less than other methods with thermal effect consideration.

In summary, proposed method improves the accuracy, and reduces runtime. Our random forest-based thermal effect prediction for 3D-CTS shows almost the similar performance compared with existing method in skew when applied to existing CTS algorithm. In addition, our method can be applied to the other CTS algorithms with less additional runtime.

V. CONCLUSIONS

This paper presented a method of random forest-based thermal effect prediction for 3D-CTS. First, we proposed the algorithm of training set generation with the thermal weight matrix. Second, the method of thermal effect prediction with random forest is applied to 3D-CTS. Finally, we reduced the runtime with accurate prediction in the thermal variation. This paper shows significant improvement accuracy for thermal effect prediction, while maintaining the design purposes. Runtime is decreased compared to the existing method with the thermal profiling. Moreover, the proposed method shows almost the similar performance in skew. In future work, the complex impact of variations can cause other critical issues in 3D-IC, and we will handle challenge problems.

ACKNOWLEDGMENTS

This work was supported by Samsung Electronics Co., Ltd. (IO201210-07940-01). Also, the EDA tool was supported by the IC Design Education Center (IDEC), Korea.

References

1 
D. K. Oh, et al, "Thermal-aware 3D Symmetrical Buffered Clock Tree Synthesis," in ACM Trans. on Design Automation of Electronic Systems (TODAES), vol. 24, no. 3, pp. 1-22, May., 2019.DOI
2 
F. W. Chen, and T. Hwang. "Clock-tree synthesis with methodology of reuse in 3D-IC," in ACM Journal on Emerging Tech. in Computing Systems (JETC), vol. 10, no. 3, pp. 1-22, April. 2014.DOI
3 
Yang Fan, et al. "Time-efficient and TSV-aware 3D gated clock tree synthesis based on self-tuning spectral clustering," in IEEE 60th Int. MWSCAS. 2017, pp. 1200-1203.DOI
4 
Todri, Aida, et al. "A study of tapered 3-D TSVs for power and thermal integrity," in IEEE Trans. on VLSI Sys., vol. 21, no. 2, pp. 306-319, Feb. 2012.DOI
5 
Zhou, Quan, et al. "An accurate detailed routing routability prediction model in placement," in 2015 6th ASQED. IEEE, pp. 119-122.DOI
6 
Yu, Cunxi, et al, "Painting on placement: Forecasting routing congestion using conditional generative adversarial nets," in Proc. of the 56th Annual Design Automation Conf. 2019. pp. 1-6.DOI
7 
Zhao Xin, et al, "Low-power and reliable clock network design for through-silicon via (TSV) based 3D ICs," in IEEE Trans. on Components, Packaging and Manufacturing Technology. vol. 1, no. 2, pp. 247-259, Dec. 2010.DOI
8 
Tak-Yung Kim, and Taewhan Kim. "Clock tree synthesis for TSV-based 3D IC designs," in ACM Trans. on Design Automation of Electronic Systems (TODAES), vol. 16, no. 4, pp. 1-21, Oct. 2011.DOI
9 
Minsik Cho, et al, "TACO: Temperature aware clock-tree optimization," in IEEE/ACM Int.Conf. on Computer-Aided Design, 2005, pp. 582-587.DOI
10 
Ni, Min, et al, "Self heating-aware optimal wire sizing under Elmore delay model," in 2007 DATE, Nice, France, April. 16-20, 2007.DOI
11 
Chao Ting-Hai, et al. "Zero skew clock routing with minimum wirelength," in IEEE Transaction on Circuits and Systems II: Analog and Digital Signal Proc., vol. 39, no. 11, pp. 799-814, Nov. 1992.DOI
12 
Elmore, William C. "The transient response of damped linear networks with particular regard to wideband amplifiers," in Journal of applied physics, vol. 19, no. 1, pp. 55-63, April. 1948.DOI
13 
A. B. Kahng, et al, "High-dimensional meta-modeling for prediction of clock tree synthesis outcomes," in 2013 ACM/IEEE Int.Workshop on System Level Interconnect Prediction. IEEE, pp. 1-7.DOI
14 
Lu, Yi-Chen, et al. "GAN-CTS: A generative adversarial framework for clock tree prediction and optimization," in 2019 IEEE/ACM Int. Conf. on Computer-Aided Design (ICCAD), pp. 1-8.DOI
15 
Liao, et al. "A deep reinforcement learning approach for global routing." Journal of Mechanical Design, vol 142, no. 6, pp. 1-12, Jun. 2020.DOI
16 
Zhang, Xing, et al, "Effective thermal conductivity and thermal diffusivity of nanofluids containing spherical and cylindrical nanoparticles," in Experimental Thermal and Fluid Science, vol. 31, no. 6, pp. 593-599, May. 2007.DOI
17 
Ho, T. “Random Decision Forests,” in Proc. of the Third International Conf. on Document Analysis and Recognition, Aug. 1995, pp. 278-282.DOI
18 
Minz, Jacob, et al, "Buffered clock tree synthesis for 3D ICs under thermal variations," in ASP-DAC Conf., March. 2008, pp. 504-509.DOI
19 
(2022). Predictive Technology Model (PTM). [Online]. Available : http://ptm.asu.edu.URL
20 
(2009). International Technology Roadmap for Semiconductors (ISPD) Contest. [Online]. Available : http://ispd.cc/contests/09/ispd09cts.html.URL
21 
(2000). IBM Benchmark. [Online]. Available : http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/BST.URL
22 
(2021). Hotspot 6.0. [Online]. Available : http://lava.cs.virginia.edu/HotSpot/URL

Author

Myeongwoo Jin
../../Resources/ieie/JSTS.2023.23.3.149/au1.png

Myeongwoo Jin received the B.S. degree and Master degree in Computer Science and Engineering from Sogang University, Korea in 2012 and 2014. He is currently pursuing a Ph.D degree in Computer Science and Engineering at Sogang University, Korea. His research interests are variation-aware timing analysis, design for reliability enhancement, interconnect variation, 3D-IC design, clock tree and thermal variation.

Deokkeun Oh
../../Resources/ieie/JSTS.2023.23.3.149/au2.png

Deokkeun Oh received the B.S. degree, Master degree, and Ph.D degree in Computer Science and Engineering from Sogang University, Korea in 2012, 2014, and 2020, respectively. He is with Samsung Electronics, Hwaseong-si, in 2020, where he is currently a CAE Engineer. His research interests are variation-aware timing analysis, interconnect and process variation, Monte-Carlo analysis, design for reliability enhancement, clock tree, and low power.

Juho Kim
../../Resources/ieie/JSTS.2023.23.3.149/au3.png

Juho Kim received B.S degree and Ph.D degree in Computer and Information Science from University of Minnesota in 1987 and 1995, respectively. After getting Ph.D degree, he worked as a senior member of technical staff at Cadence Design System until 1997. Professor Kim joined the department of computer science and engineering in Sogang University, Seoul, Korea in 1997, and he was a department chair from 2005 to 2008. His research interests are variation-aware timing analysis, and low power design.