
  1. (School of Electronic Engineering, Kumoh National Institute of Technology, Korea)



Keywords: TSV, deep learning, object detection, YOLO, inspection system

I. INTRODUCTION

Through-Silicon Via (TSV) is a technology that creates direct connections between different layers or chips in a 3D integrated circuit by filling vertical holes through a wafer with metal. The technology offers advantages over conventional wire bonding methods, including smaller package size, reduced power consumption, and higher integration density.

TSVs have been shown to play a critical role in a variety of applications, including high-performance computing, memory consolidation, and heterogeneous integration. By increasing the data transfer bandwidth between stacked layers, TSVs enable faster and more efficient data processing [1,2].

However, various defects can occur during the etching processes for TSV formation, which can critically degrade the electrical and mechanical reliability of semiconductor devices and reduce overall manufacturing yield. In particular, defects that arise after dry etching can have severe consequences on subsequent processing steps, making it essential to understand their root causes and to establish precise detection techniques.

One of the key defects in TSV etching is etch non-uniformity, which is primarily caused by nonuniform plasma distribution or wafer-level loading effects. This leads to variations in TSV depth and diameter, resulting in discrepancies in electrical characteristics, signal distortion, and capacitance imbalance across the TSV array [3].

Additionally, degradation of the etching mask or contamination from micro-particles during the process can lead to micro-masking, forming localized regions of etch inhibition. These defects result in rough sidewalls, irregular profiles, and uneven etching, all of which can contribute to void formation or delamination during subsequent metal filling processes [4].

Etch residues and structural non-uniformities also increase electrical resistance, cause signal delay, and may lead to leakage currents, ultimately compromising the performance and reliability of TSV-integrated high-density semiconductor devices.

Therefore, defects such as etch non-uniformity, etch residue, and micro-masking possess well-defined causes and mechanisms. Accurate and reproducible detection technologies are indispensable to mitigate the adverse effects of these defects.

Existing approaches such as scanning electron microscopy (SEM) and X-ray inspection [2] have been widely employed to detect TSV defects. While these top-view inspection methods provide high spatial resolution, they have several limitations that restrict their use in high-volume manufacturing. SEM is a destructive technique that requires long inspection times, resulting in low throughput. X-ray imaging lacks the resolution necessary to detect nanoscale polymer residues or subtle morphological anomalies, and its high sensitivity to vibration reduces its suitability for in-line inspection. Consequently, both methods are costly and impractical for real-time defect detection.

To overcome these limitations, we propose a real-time TSV 3D shape defect inspection system based on deep learning object detection models. In this work, we consider YOLOv8 and YOLOv10, two of the latest architectures in the YOLO family. YOLO networks are constructed in a one-stage manner and therefore offer fast inference, which makes them well suited to a real-time TSV 3D shape defect inspection system.

As a result, the proposed DHM-based method, combined with real-time deep learning-based object detection, is non-destructive, robust to mechanical vibration, and fast in image acquisition, which makes it practical for real-time in-line deployment in semiconductor manufacturing processes.

Even though all the networks under consideration belong to the YOLO family, each architecture has distinctive characteristics. Specifically, the YOLOv8 model uses an anchor-free detection structure, while the YOLOv10 model employs a dual-head structure with non-maximum suppression (NMS)-free training. These differences can result in different inference times and detection performance. Through extensive evaluations, we provide a guideline on which network is preferable when inference time or detection performance is the priority.

In addition, to collect real datasets, we utilize Digital Holographic Microscopy (DHM), a holography-based interferometric measurement technique that enables high-resolution 3D data acquisition. DHM is used to measure TSV-patterned wafer samples and to generate 3D Point Cloud Data (PCD) that accurately captures the geometry of the TSV structures. This data collection scheme provides real TSV defect patterns, which ensures the reliability and practical applicability of the trained networks.
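Because the detectors operate on 2D images, each DHM point cloud must first be rasterized into a height-map image. The following Python sketch illustrates one straightforward conversion under our own assumptions (grid the (x, y, z) points over the FOV and normalize depth to 8 bits); it is not the vendor's DHM processing pipeline.

```python
import numpy as np

def pcd_to_height_map(points, width=4880, height=3720):
    """Rasterize an (N, 3) point cloud (x, y, z) into an 8-bit height map.
    A hypothetical sketch, not the vendor's DHM pipeline."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Map metric x/y coordinates onto the pixel grid covering the field of view.
    col = ((x - x.min()) / (x.max() - x.min() + 1e-12) * (width - 1)).astype(int)
    row = ((y - y.min()) / (y.max() - y.min() + 1e-12) * (height - 1)).astype(int)
    img = np.zeros((height, width), dtype=np.float64)
    img[row, col] = z                      # last point that lands in a pixel wins
    # Normalize depth to 0..255 so the map can be stored as a grayscale image.
    img = (img - img.min()) / (img.max() - img.min() + 1e-12) * 255.0
    return img.astype(np.uint8)
```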

II. REAL-TIME TSV 3D SHAPE DEFECT INSPECTION SYSTEM

Fig. 1 illustrates the overall workflow and representative inference results of the proposed TSV 3D shape defect inspection system. At the center of the figure, a schematic diagram presents the algorithmic flow of the system, starting with wafer loading, stage positioning, and 3D shape measurement using DHM. The acquired 3D image is processed by a YOLO-based object detection module that classifies each TSV pattern and identifies defects in real time. The inspection results are then evaluated by the main controller, which determines whether to continue or terminate the inspection process.

On the left side of the schematic, a photograph of the actual system setup is provided, showing the physical configuration composed of the optical inspection unit, precision stage, and control modules. On the right side, a screen-shot of the inspection software presents a visualization interface for monitoring TSV defect results and the corresponding PCD.

Conventional TSV defect inspection has relied on X-ray or SEM imaging, which provides high-resolution analysis but suffers from slow inspection speed and is therefore unsuitable for real-time, large-area analysis. To address this, the proposed system in Fig. 1 integrates DHM with deep learning-based object detection, enabling fast and accurate 3D defect inspection of TSVs and offering a significant advantage over traditional approaches. Over the past decade, various deep learning networks have been developed for object detection, among which the YOLO family is particularly optimized for fast inference. In this work, YOLOv8 and YOLOv10 are examined in detail through extensive evaluations.

The objective of this study is to investigate whether deep learning-based schemes with rapid inference can overcome the slow inspection times of existing methods. Through extensive evaluations, we compare the performance of YOLOv8 and YOLOv10 models trained on our real TSV pattern dataset obtained from 8-inch silicon wafers. From these results, we provide a guideline on which network is appropriate when inference time or detection performance is the priority.

In the following subsection, we explain in detail the architectures of the YOLO family of networks and the differences between YOLOv8 and YOLOv10 that are considered for the TSV 3D shape defect inspection system.

Fig. 1. Overview of the proposed TSV 3D shape defect inspection system.


1. YOLO Networks for TSV 3D Defect Recognition

Following the introduction of YOLOv1 in 2016, the YOLO family of networks has undergone continuous development and remains a prevalent object detection model in contemporary research [5-11]. YOLO networks are structured in a one-stage manner, i.e., they simultaneously localize and classify objects in the input image. In contrast, two-stage object detection networks first propose candidate object regions and then classify them. The one-stage structure therefore gives the YOLO family the advantage of fast inference. This study exploits this rapid inference by examining two recent models in the series, YOLOv8 and YOLOv10, whose distinguishing features are an anchor-free detection structure and an NMS-free training scheme with a dual-head structure, respectively. These differences can result in different inference times and detection performance.

Figs. 2 and 3 present the architectures of YOLOv8 and YOLOv10. The architecture of these networks comprises three distinct components: the Backbone, Neck, and Head. The primary function of the Backbone and Neck components is to extract features from input images, while the Head component is responsible for detecting the location and type of each object. The Backbone is a CNN-based network responsible for extracting key features from input images. The Neck combines feature maps extracted by the Backbone to enhance the detection of small and multi-scale objects while serving as a bridge between the Backbone and the Head. Lastly, the Head predicts the locations and classes of objects based on the feature maps received from the Neck [12].
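For readers who want to inspect these Backbone/Neck/Head configurations directly, the sketch below loads one variant from each family with the open-source ultralytics package (assuming a release that bundles both YOLOv8 and YOLOv10 weights) and prints the layer and parameter summaries.

```python
from ultralytics import YOLO  # assumes the ultralytics package (recent releases include YOLOv10)

# Print layer/parameter/GFLOP summaries, which expose the Backbone, Neck,
# and Head modules as well as the overall size of each variant.
for weights in ("yolov8n.pt", "yolov10n.pt"):
    model = YOLO(weights)
    model.info()
```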

Fig. 2. YOLOv10 architecture.


Fig. 3. YOLOv8 architecture.


Even though they share a similar overall architecture, the two networks differ in their detailed structures. As described in [12], YOLOv8 employs cross stage partial darknet (CSP-Darknet) as its Backbone and a path aggregation network (PANet) as its Neck. In addition, the model adopts an anchor-free detection mechanism for the Head, which eliminates the need for anchor boxes and thereby reduces inference time. YOLOv8 also applies NMS to efficiently eliminate duplicate bounding boxes, enabling the detection of even small objects. It operates with a single-head classification mechanism, in which the anchor-free detection Head outputs class probabilities for each bounding box [13]. The final class is determined using the softmax function, allowing the model to classify objects of various sizes with high accuracy even without anchor boxes. In contrast, YOLOv10 utilizes an improved version of the cross stage partial network (CSPNet) as its Backbone and adopts a similar PANet as its Neck. For the Head, YOLOv10 incorporates a dual-head structure that, unlike YOLOv8, eliminates the need for NMS, simplifies the post-processing pipeline, and improves both accuracy and speed. In Fig. 3, the dual-head structure combines one-to-many and one-to-one assignment strategies during training, enhancing classification precision. The primary goal of this architecture is to optimize the balance between speed and accuracy by removing the NMS step and simplifying the overall post-processing [11].

Due to the differences in the Head, the loss functions also differ. YOLOv8 uses a multi-task loss composed of bounding box loss, class probability loss, and objectness loss [12]. In contrast, YOLOv10 employs a box regression loss together with NMS-free training. The box regression loss improves the accuracy of the predicted bounding boxes, and NMS-free training removes the additional NMS post-processing step. This enables faster inference as well as efficient training, allowing YOLOv10 to achieve a superior balance between processing speed and accuracy compared with YOLOv8 [11].
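To make the NMS distinction concrete, the snippet below shows a minimal greedy NMS pass of the kind that one-to-many heads such as YOLOv8's rely on and that YOLOv10's one-to-one head is designed to avoid. It is an illustrative sketch, not either model's actual post-processing code.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-12)

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop overlapping duplicates, and repeat on the remainder."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_thr]
    return keep
```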

III. DATASET: TSV PCD

1. PCD Data Collection in Real Environment

To collect TSV PCD for training YOLO networks, an 8-inch wafer with various TSV patterns was prepared, as shown in Fig. 4. The figure displays the wafer along with sample measurement results obtained using the DHM. The dataset consists of 1,000 PCD images with a resolution of 4880 × 3720, each covering a field of view (FOV) of 0.585 × 0.446 mm. As depicted in the top right of Fig. 4, each image includes more than 20 TSV holes within this FOV.
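For scale, these figures correspond to a lateral sampling of roughly 0.585 mm / 4880 ≈ 0.12 μm per pixel (and likewise 0.446 mm / 3720 ≈ 0.12 μm), so a nominal 50 μm hole spans on the order of 400 pixels, which provides ample spatial support for bounding-box detection.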

Fig. 4. An 8-inch wafer with defective TSV patterns used for training.


2. Data Collection, Label, and Data Augmentation

For training and validation, different TSV patterns were generated on 8-inch silicon wafers. The generated TSV patterns are categorized as a base pattern and variations of it. The base pattern is a hole with a diameter of 50 μm and a depth of 0.5 μm. Variations of the base pattern are created by varying the diameter and changing the shape of the base pattern.

The variations of the TSV pattern are categorized into five distinct classes, consisting of one Pass class and four Fail classes, as shown in Fig. 5. The Pass pattern includes qualified holes with a diameter deviation of 10% or less and no visible defects. The Fail 1 pattern refers to holes with a diameter more than 10% larger than the standard pattern. The Fail 2 pattern indicates holes with a diameter more than 10% smaller than the standard. The Fail 3 pattern includes shapes that deviate from the standard circular form, such as polygons or ellipses. Finally, the Fail 4 pattern consists of partially imaged holes that are not fully captured within the camera’s FOV.
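Since the five classes are defined by simple geometric rules, the labeling logic can be summarized in a short helper. In the sketch below, the ±10% diameter rule follows the definitions above, while the circularity threshold and the FOV flag are hypothetical stand-ins for the actual labeling procedure.

```python
def classify_tsv(diameter_um, ref_diameter_um=50.0, circularity=1.0,
                 fully_in_fov=True, circularity_thr=0.95):
    """Assign one of the five classes from measured hole geometry.
    Thresholds other than the +/-10% diameter rule are assumptions."""
    if not fully_in_fov:
        return "Fail 4"                      # partially imaged hole
    if circularity < circularity_thr:
        return "Fail 3"                      # non-circular shape (ellipse, polygon)
    deviation = (diameter_um - ref_diameter_um) / ref_diameter_um
    if deviation > 0.10:
        return "Fail 1"                      # more than 10% oversized
    if deviation < -0.10:
        return "Fail 2"                      # more than 10% undersized
    return "Pass"
```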

After collecting 1,000 images via DHM, we increase the data volume by applying augmentation methods such as vertical flipping, horizontal flipping, or a combination of the two, as sketched below. The augmentation results in 4,000 images used for training and validation. From these 4,000 images, the five patterns are extracted; the amount of data per class is as follows: 27,404 for Pass, 14,308 for Fail 1, 9,556 for Fail 2, 32,768 for Fail 3, and 27,636 for Fail 4.
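The flip-based augmentation is straightforward to reproduce. The sketch below flips an image together with its YOLO-format labels (class, cx, cy, w, h with normalized coordinates), under the assumption that the annotations are stored in that convention; applying the horizontal, vertical, and combined flips to each original image yields the 4x expansion from 1,000 to 4,000 images.

```python
import numpy as np

def flip_image_and_labels(img, labels, horizontal=True, vertical=False):
    """Flip an image and its YOLO-format labels (class, cx, cy, w, h, normalized).
    A sketch of the flip-based augmentation described above."""
    out = img.copy()
    lab = labels.copy()
    if horizontal:
        out = out[:, ::-1]                 # mirror columns
        lab[:, 1] = 1.0 - lab[:, 1]        # mirror box center x
    if vertical:
        out = out[::-1, :]                 # mirror rows
        lab[:, 2] = 1.0 - lab[:, 2]        # mirror box center y
    return out, lab
```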

Fig. 5. Classification of TSV PCD.


These defect types are distinguished by their geometric irregularities and are closely associated with the degradation of electrical reliability. Furthermore, they are considered to be key failure mechanisms and have been reported in existing studies on the reliability of TSVs [1].

Specifically, voids or incomplete filling can increase TSV resistance, leading to signal delay or attenuation [14]. Oxide pinholes along the sidewall can create leakage paths or shorts [15]. Excessive diameter variation or non-circular geometry can induce non-uniform current density and local stress, accelerating electromigration and reliability failures [16].

In the context of TSV manufacturing, these structural anomalies are well known to function as major failure mechanisms, directly impacting production yield and long-term device reliability [1].

For this study, we fabricated a test wafer containing a variety of representative TSV defects (e.g., diameter deviation, shape distortion, and void-inducing etch residue). All PCD were collected from this single wafer as a proof of concept of the proposed system. To further enhance generalizability, future studies will collect additional data from multiple wafers fabricated under realistic manufacturing conditions.

IV. EVALUATION RESULTS

For training, 80% of the 4,000 images were used, and the remaining 20% were used for validation. The input data were resized to 640 × 488 to match the input layer of the networks. The networks were trained for 300 epochs using the Adam optimizer with a learning rate of 1e-4.
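Assuming the models are trained with the open-source ultralytics API, the configuration above maps onto a call like the following; the dataset YAML name is hypothetical, and imgsz is given as a single value since the framework typically letterboxes inputs to a square size rather than the exact 640 × 488 used here.

```python
from ultralytics import YOLO  # assumes the ultralytics training API

# Hypothetical dataset config: a YAML listing the 80/20 train/val splits
# and the five class names (Pass, Fail 1 .. Fail 4).
DATA_YAML = "tsv_pcd.yaml"

model = YOLO("yolov10n.pt")          # swap in "yolov8l.pt" etc. for other variants
model.train(
    data=DATA_YAML,
    epochs=300,                      # training length used in this study
    imgsz=640,                       # input size (the exact resize policy is assumed)
    optimizer="Adam",
    lr0=1e-4,                        # initial learning rate
)
metrics = model.val()                # precision / recall / mAP on the validation split
```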

1. Performance Metrics

For performance evaluation, several metrics are used: processing time, precision, recall, F1 score, and an Fβ score that combines processing time and detection performance.

Processing time refers to the average inference time required by each model to analyze a single image or frame, indicating the suitability of the model for real-time applications.

Precision refers to the ratio of correctly predicted defect instances to the total number of predicted defect instances. It reflects how accurately the model identifies true defects among all its positive predictions.

Recall is the ratio of correctly predicted defect instances to the total number of actual defect instances. It indicates the model’s ability to detect as many actual defects as possible.

F1 score is the harmonic mean of precision and recall, providing a balanced measure when precision and recall are equally important. By introducing a weighting factor β, the F1 score generalizes to the Fβ score, which controls the relative importance of recall over precision [17]. The metric is defined as follows.

(1)
Fβ = (1 + β²) × (precision × recall) / (β² × precision + recall)

This metric is particularly useful when prioritizing recall (i.e., detecting all true defects) is more important than precision, such as in safety-critical defect inspection tasks. By adjusting β, more emphasis can be placed on either detection recall or precision, depending on the system requirements.
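Eq. (1) translates directly into a few lines of code; the helper below follows the formula above, with β > 1 shifting the weight toward recall.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score following Eq. (1); beta > 1 weights recall more heavily."""
    denom = (beta ** 2) * precision + recall
    return (1.0 + beta ** 2) * precision * recall / denom if denom > 0 else 0.0

# Example: beta = 2 rewards high recall, as preferred for defect screening.
print(f_beta(0.98, 0.995, beta=2.0))   # ~0.992
```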

2. Performance Analysis

In actual defect inspection tasks, Recall is a critical performance metric, as it reflects the model’s ability to detect all possible defective cases without omission. Therefore, as shown in Table 1, we compared the recall values achieved by each YOLO model. All models demonstrated recall values exceeding 0.99, indicating high sensitivity across variants. Based on these results, we further analyzed each model at its highest Recall point by comparing additional performance metrics, including processing time, precision, F1 score, and Fβ scores, to evaluate the trade-off between detection accuracy and inference efficiency.

Table 1. Highest recall achieved by each YOLO model.

Model V10-n V8-n V10-s V8-s V10-m V10-b V10-l V8-m V10-x V8-l V8-x
Recall 0.99203 0.99569 0.99917 0.99914 0.99922 0.99962 0.99923 0.99996 0.99991 0.99995 0.99989
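A comparison of this kind can be reproduced with a simple loop over the model variants. The sketch below (hypothetical dataset file and sample image, and a crude single-image timing that may differ from the measurement protocol used for Table 1) collects recall and inference time for each weight file using the ultralytics API.

```python
import time
from ultralytics import YOLO

VARIANTS = ["yolov8n.pt", "yolov8s.pt", "yolov8m.pt", "yolov8l.pt", "yolov8x.pt",
            "yolov10n.pt", "yolov10s.pt", "yolov10m.pt", "yolov10b.pt",
            "yolov10l.pt", "yolov10x.pt"]

for name in VARIANTS:
    model = YOLO(name)
    metrics = model.val(data="tsv_pcd.yaml")        # hypothetical dataset config
    start = time.perf_counter()
    model.predict("sample_pcd_image.png", verbose=False)  # hypothetical test image
    elapsed = time.perf_counter() - start
    # metrics.box.mr is the mean recall reported by current ultralytics releases.
    print(name, f"recall={metrics.box.mr:.5f}", f"time={elapsed:.3f}s")
```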

Fig. 6 presents the evaluation of YOLOv8 and YOLOv10 models. Figs. 6(a)-6(c) show the change in processing time, the best precision at the highest recall and the best F1 score at the highest mean average precision (mAP) as the number of network parameters is varied, respectively.

Fig. 6(a) shows that processing time increases as the number of parameters increases, which is natural because more parameters require more computational operations. To achieve a short processing time, the number of network parameters should be small; accordingly, YOLOv10-n achieves the fastest processing time of 0.18601 seconds. In Figs. 6(b) and 6(c), precision and F1 score improve as the number of parameters increases, because more parameters allow richer features to be extracted. As a result, YOLOv8-l achieves the highest precision and F1 score of 0.99997 and 0.99989, respectively.

Fig. 6. (a) Parameters vs processing time, (b) parameters vs precision, and (c) parameters vs F1 score.


In Fig. 7, we evaluate YOLOv8 and YOLOv10 by considering both processing time and detection performance using the metric in Eq. (1). This metric is a harmonic combination of a processing-time score and detection performance, so the closer its value is to 1, the shorter the processing time and the higher the detection performance. Therefore, a network whose score is close to 1 offers a good balance of processing time and detection performance.
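Since the exact normalization of processing time is not spelled out here, the sketch below makes its assumptions explicit: inference time is first mapped to a 0-1 speed score (fastest model = 1) and then combined with the detection score through Eq. (1), treating the detection score as precision and the speed score as recall. This is one plausible realization, not necessarily the exact computation behind Fig. 7.

```python
def speed_score(t, t_min, t_max):
    """Map processing time to a 0-1 score (fastest = 1); an assumed normalization."""
    return 1.0 if t_max == t_min else (t_max - t) / (t_max - t_min)

def combined_f_beta(detection_score, time_s, t_min, t_max, beta=1.0):
    """Combine detection quality and speed via Eq. (1), with the detection
    score in the 'precision' slot and the speed score in the 'recall' slot."""
    s = speed_score(time_s, t_min, t_max)
    denom = (beta ** 2) * detection_score + s
    return (1.0 + beta ** 2) * detection_score * s / denom if denom > 0 else 0.0
```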

In Fig. 7, the comparison of Fβ scores shows that the YOLOv10-n model achieved the highest score of 0.92565.

As a result, through extensive evaluations, we confirm that YOLOv10-n should be used when a TSV 3D shape inspection system requires fast inspection or a combination of good inference time and good detection accuracy, while YOLOv8-l is recommended for the best detection performance.

In summary, through comprehensive experiments, we confirmed that YOLOv10-n is the most suitable model when the inspection system prioritizes fast inference time or a balance between inference speed and detection accuracy. On the other hand, YOLOv8-l demonstrates superior precision and F1 score, making it more appropriate in cases where high detection accuracy is critical. Therefore, the selection of the optimal model should be guided by the specific performance requirements of the target application.

Fig. 7. Parameter vs Fβ score.


3. Inference Results of the Proposed TSV 3D Shape Inspection System

Fig. 8 shows the detection results, where each TSV hole is localized with a black bounding box and classified into one of four categories: Pass, Fail1, Fail2, or Fail3. The class label and corresponding confidence score are displayed as text above each box. The confidence score reflects both the class probability and localization accuracy of the detected region.

As shown in Fig. 8, most of the detected objects have a confidence score above 0.98, demonstrating the model's robust capability in identifying even subtle differences between defective and non-defective TSV patterns. These results validate the effectiveness of the proposed system for accurate and real-time TSV PCD inspection in practical semiconductor manufacturing environments.
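Overlays of the kind shown in Fig. 8 can be generated with a short inference-and-drawing routine such as the sketch below; the weight path and input image name are hypothetical, and OpenCV is used only to render the black boxes and confidence labels.

```python
import cv2                      # assumes OpenCV is installed
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")          # hypothetical trained weights
result = model.predict("tsv_pcd_sample.png", conf=0.5)[0]  # hypothetical test image

img = result.orig_img.copy()
for box in result.boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    label = f"{result.names[int(box.cls)]} {float(box.conf):.2f}"
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 0), 2)    # black bounding box
    cv2.putText(img, label, (x1, max(y1 - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)
cv2.imwrite("tsv_inference_overlay.png", img)
```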

Fig. 8. Inference result of the proposed TSV 3D shape defect inspection system.


V. CONCLUSIONS AND FUTURE WORKS

In this study, we proposed a real-time TSV PCD inspection system utilizing deep learning-based object detection models. By employing YOLOv8 and YOLOv10 architectures, we aimed to overcome the limitations of conventional inspection methods such as high cost, slow processing speed, and destructive nature. To enable practical and reliable training, we constructed a high-quality dataset consisting of real PCD from 8-inch silicon wafers using DHM.

Extensive experiments were conducted to evaluate the detection performance and inference speed of each model using metrics such as precision, F1 score, and Fβ score. The results showed that YOLOv8-l achieved the highest detection accuracy, while YOLOv10-n provided the fastest processing speed and the best trade-off between speed and accuracy based on Fβ.

Based on these findings, our research is expected to contribute to future advancements in 3D semiconductor defect inspection by demonstrating the feasibility of real-time, non-destructive, and high-precision inspection using lightweight deep learning architectures. Furthermore, the proposed framework and evaluation results will be of great help in supporting the development and deployment of AI-based inspection systems in practical semiconductor manufacturing environments.

For future work, we plan to collect larger datasets from diverse wafers fabricated under real manufacturing conditions to improve the reliability of the proposed system. In addition, we will develop a hybrid inspection strategy for detecting defects on TSV inner sidewalls, thereby addressing the limitations of top-view imaging and enhancing the robustness of the inspection framework.

ACKNOWLEDGEMENTS

This work was supported by the RISE project of Kumoh National Institute of Technology.

References

[1] Wang J., Duan F., Lv Z., Chen S., Yang X., Chen H., Liu J., 2023, A short review of through-silicon via (TSV) interconnects: Metrology and analysis, Applied Sciences, Vol. 13, No. 14, pp. 8301.
[2] Shang H., Sun S., 2017, Three-dimensional integrated circuit (3D IC) key technology: Through-silicon via (TSV), Nano Research, Vol. 12, No. 6, pp. 1831-1840.
[3] Richter H., Pfitzner L., Pfeffer M., Bauer A., Siegert J., Bodner T., 2016, Advanced detection method for polymer residues on semiconductor substrates: 3D/TSV/interposer: Through silicon via and packaging, Proc. of the IEEE Advanced Semiconductor Manufacturing Conference (ASMC), pp. 472-474.
[4] Kim S., Lee J., Hwang Y., Park Y., Lee J., 2016, Integrated clean for TSV: Comparison between dry process and wet processes and their electrical qualification, Proc. of the IEEE Electronic Packaging Technology Conference (EPTC), pp. 311-314.
[5] Redmon J., Divvala S., Girshick R., Farhadi A., 2016, You only look once: Unified, real-time object detection, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779-788.
[6] Redmon J., Farhadi A., 2017, YOLO9000: Better, faster, stronger, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7263-7271.
[7] Redmon J., Farhadi A., 2018, YOLOv3: An incremental improvement, arXiv preprint arXiv:1804.02767.
[8] Bochkovskiy A., Wang C.-Y., Liao H.-Y. M., 2020, YOLOv4: Optimal speed and accuracy of object detection, arXiv preprint arXiv:2004.10934.
[9] Wang C.-Y., Bochkovskiy A., Liao H.-Y. M., 2022, YOLOv6: A single-stage object detection framework for industrial applications, arXiv preprint arXiv:2209.02976.
[10] Wang C.-Y., Bochkovskiy A., Liao H.-Y. M., 2022, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, arXiv preprint arXiv:2207.02696.
[11] Wang A., Chen H., Liu L., Chen K., Lin Z., Han J., Ding G., 2024, YOLOv10: Real-time end-to-end object detection, arXiv preprint arXiv:2405.14458.
[12] Smith A., 2023, Real-time flying object detection with YOLOv8, arXiv preprint arXiv:2305.09972.
[13] Hussain M., 2024, YOLOv5, YOLOv8 and YOLOv10: The go-to detectors for real-time vision, arXiv preprint arXiv:2407.02988.
[14] Xia Q., Zhang X., Ma B., Tao K., Zhang H., Yuan W., Ramakrishna S., Ye T., 2024, A state-of-the-art review of through-silicon vias: Filling materials, filling processes, performance, and integration, Advanced Engineering Materials, Vol. 27, No. 1, pp. 2401799.
[15] Chakrabarty K., Deutsch S., Thapliyal H., Ye F., 2012, TSV defects and TSV-induced circuit failures: The third dimension in test and design-for-test, Proceedings of the IEEE, Vol. 100, No. 6, pp. 1720-1735.
[16] Cheng Z., Ding Y., Xiao L., Wang X., Chen Z., 2019, Comparative evaluations on scallop-induced electric-thermo-mechanical reliability of through-silicon-vias, Microelectronics Reliability, Vol. 103, pp. 113512.
[17] van Rijsbergen C. J., 1979, Information Retrieval, Butterworths.
Kyeong Beom Park

Kyeong Beom Park is an Assistant at the R&D Center of Gooil Solution. He received his B.S. degree in optical engineering from Kumoh National Institute of Technology and is currently pursuing an M.E. degree in electronic and electrical engineering. His research interests include deep learning applications in industrial systems and advanced semiconductor technologies.

Jae Yeol Lee

Jae Yeol Lee is an Executive Director and Chief of the R&D Center at Gooil Solution. He received his M.S. degree in metallurgical engineering from Kyungpook National University in 2000. At Gooil Solution, he has been leading the development of semiconductor and display process and inspection equipment. His research interests focus on enhancing inspection accuracy and automation in high-precision manufacturing environments.

Harim Lee

Harim Lee received his B.S. degree in electrical engineering from Kyungpook National University, Daegu, South Korea, in 2013, an M.S. degree in IT convergence engineering from the Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2015, and a Ph.D. degree from the School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea, in August 2020. Since September 2021, he has been an Assistant Professor with the School of Electronic Engineering, Kumoh National Institute of Technology, Gumi, South Korea. His research interests include intelligent systems based on deep learning and the implementation of deep neural networks on FPGAs using Verilog HDL.