Real-time TSV 3D Shape Defect Inspection System Using Deep Learning based Fast Object
Detection
Kyeong Beom Park1
Jae Yeol Lee1
Harim Lee1*
(School of Electronic Engineering, Kumoh National Institute of Technology, Korea)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index terms
TSV, deep learning, object detection, YOLO, inspection system
I. INTRODUCTION
Through-Silicon Via (TSV) is a technology that creates direct connections between
different layers or chips in a 3D integrated circuit by filling vertical holes through
a wafer with metal. The technology offers advantages over conventional wire bonding
methods, including smaller package size, reduced power consumption, and higher integration
density.
TSVs have been shown to play a critical role in a variety of applications, including
high-performance computing, memory consolidation, and heterogeneous integration. By
increasing the data transfer bandwidth between stacked layers, TSVs enable faster
and more efficient data processing [1,2].
However, various defects can occur during the etching processes for TSV formation,
which can critically degrade the electrical and mechanical reliability of semiconductor
devices and reduce overall manufacturing yield. In particular, defects that arise
after dry etching can have severe consequences on subsequent processing steps, making
it essential to understand their root causes and to establish precise detection techniques.
One of the key defects in TSV etching is etch non-uniformity, which is primarily caused
by nonuniform plasma distribution or wafer-level loading effects. This leads to variations
in TSV depth and diameter, resulting in discrepancies in electrical characteristics,
signal distortion, and capacitance imbalance across the TSV array [3].
Additionally, degradation of the etching mask or contamination from micro-particles
during the process can lead to micro-masking, forming localized regions of etch inhibition.
These defects result in rough sidewalls, irregular profiles, and uneven etching, all
of which can contribute to void formation or delamination during subsequent metal
filling processes [4].
Etch residues and structural non-uniformities also increase electrical resistance,
cause signal delay, and may lead to leakage currents, ultimately compromising the
performance and reliability of TSV-integrated high-density semiconductor devices.
In short, defects such as etch non-uniformity, etch residue, and micro-masking have
well-defined causes and mechanisms, and accurate and reproducible detection technologies
are indispensable for mitigating their adverse effects.
Existing approaches such as scanning electron microscopy (SEM) and X-ray inspection
[2] have been widely employed to detect TSV defects. While these top-view inspection
methods provide high spatial resolution, they have several limitations that restrict
their use in high-volume manufacturing. SEM is a destructive technique that requires
long inspection times, resulting in low throughput. X-ray imaging lacks the resolution
necessary to detect nanoscale polymer residues or subtle morphological anomalies,
and its high sensitivity to vibration reduces its suitability for in-line inspection.
Consequently, both methods are costly and impractical for real-time defect detection.
To overcome these limitations, we propose a real-time TSV 3D shape defect inspection
system that utilizes deep learning-based object detection models. In this work, we
consider YOLOv8 and YOLOv10, two recent architectures in the YOLO network family.
YOLO family networks are constructed in a one-stage manner and thus offer fast inference,
which makes them suitable for building a real-time TSV 3D shape defect inspection system.
The proposed method, which combines digital holographic microscopy (DHM) measurement
with real-time deep learning-based object detection, is non-destructive, robust to
mechanical vibrations, and fast in image acquisition, which makes it practical for
real-time in-line implementation in semiconductor manufacturing processes.
Even though all the networks under consideration belong to the YOLO family, the network
architecture of each individual network exhibits unique characteristics. Specifically,
the YOLOv8 model utilizes an anchor-free detection structure, while the YOLOv10 model
employs a dual-head structure and a non-maximum suppression (NMS)-free learning structure.
These differences can result in different inference times and detection performance.
Through extensive evaluations, we provide a guideline on which network to select depending
on whether inference time or detection performance is the priority.
In addition, to collect real datasets, we utilize Digital Holographic Microscopy (DHM),
which is a holography-based interferometric measurement device that enables high-resolution
3D data acquisition. DHM is used to measure TSV-patterned wafer samples and generate
3D Point Cloud Data (PCD) accurately capturing the geometry of TSV structures. Our
data collection scheme allows us to obtain real TSV defect patterns, which ensures
the reliability and practical applicability of the trained network performance.
II. REAL-TIME TSV 3D SHAPE DEFECT INSPECTION SYSTEM
Fig. 1 illustrates the overall workflow and representative inference results of the proposed
TSV 3D shape defect inspection system. At the center of the figure, a schematic diagram
presents the algorithmic flow of the system, starting with wafer loading, stage positioning,
and 3D shape measurement using DHM. The acquired 3D image is processed by a YOLO-based
object detection module that classifies each TSV pattern and identifies defects in
real-time. The inspection results are then evaluated by the main controller, which
determines whether to continue or terminate the inspection process.
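The control flow described above can be sketched as a simple loop. This is only an illustrative outline under stated assumptions: the function and class names (`inspect_wafer`, `Detection`), the callables passed in, and the stop-on-first-defect policy are hypothetical placeholders, not the interfaces of the actual system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Detection:
    label: str         # e.g. "Pass" or "Fail 1"
    confidence: float  # detector confidence in [0, 1]


def inspect_wafer(
    positions: Iterable[Tuple[float, float]],
    move_stage: Callable[[float, float], None],
    measure: Callable[[], object],
    detect: Callable[[object], List[Detection]],
    should_stop: Callable[[List[Detection]], bool],
) -> List[List[Detection]]:
    """Run the position -> measure -> detect -> decide loop over stage positions."""
    results: List[List[Detection]] = []
    for x, y in positions:
        move_stage(x, y)             # stage positioning
        frame = measure()            # 3D shape measurement (DHM in the paper)
        detections = detect(frame)   # object detection (YOLO in the paper)
        results.append(detections)
        if should_stop(detections):  # controller decides to continue or terminate
            break
    return results


# Usage with stand-in callables; every value below is a made-up placeholder.
visited = []
fake_frames = iter([["Pass", "Pass"], ["Pass", "Fail 3"], ["Pass"]])

demo_results = inspect_wafer(
    positions=[(0.0, 0.0), (0.6, 0.0), (1.2, 0.0)],
    move_stage=lambda x, y: visited.append((x, y)),
    measure=lambda: next(fake_frames),
    detect=lambda frame: [Detection(label, 0.99) for label in frame],
    should_stop=lambda dets: any(d.label != "Pass" for d in dets),  # assumed policy
)
print(len(demo_results), visited)
```

Passing the hardware and model steps in as callables keeps the loop itself independent of any particular stage controller or detector, which is convenient when comparing several networks.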
On the left side of the schematic, a photograph of the actual system setup is provided,
showing the physical configuration composed of the optical inspection unit, precision
stage, and control modules. On the right side, a screenshot of the inspection software
presents a visualization interface for monitoring TSV defect results and the corresponding
PCD.
Conventional TSV defect inspection methods have relied on X-ray or SEM imaging, which
provide high-resolution analysis but suffer from slow inspection speeds and are not
suitable for real-time, large-area analysis. In contrast, the proposed system integrates
DHM and deep learning-based object detection to enable fast and accurate 3D defect
inspection of TSVs, as shown in Fig. 1.
Over the past decade, various deep learning networks have been developed for object
detection, among which the YOLO family has been particularly optimized for fast inference.
In particular, YOLOv8 and YOLOv10 are examined in detail in this study. The objective
is to investigate whether deep learning-based schemes with rapid inference can overcome
the slow inspection time of existing methods. The YOLOv8 and YOLOv10 models are trained
on our TSV pattern dataset obtained from real 8-inch silicon wafers, and their performance
is compared through extensive evaluations. By analyzing the evaluation results, we
provide a guideline on which network to select depending on whether inference time
or detection performance is the priority.
In the following subsection, we explain in detail the architectures of the YOLO family
of networks and the differences between YOLOv8 and YOLOv10 that are considered for
the TSV 3D shape defect inspection system.
Fig. 1. Overview of the proposed TSV 3D shape defect inspection system.
1. YOLO Networks for TSV 3D Defect Recognition
Following the introduction of YOLOv1 in 2016, the YOLO family of networks has undergone
continuous development and remains a prevalent object detection model in contemporary
research [5-11]. The YOLO family of networks is structured in a one-stage manner, i.e.,
it predicts the locations and classes of objects in an input image in a single pass.
In contrast, a two-stage object detection network first proposes candidate object
regions and then classifies each proposed region. Owing to the one-stage structure,
YOLO family networks have the advantage of fast inference times. This study explores
this advantage by examining two recent models in the YOLO series: YOLOv8 and YOLOv10.
Their distinguishing features are an anchor-free detection structure for YOLOv8, and
NMS-free training with a dual-head structure for YOLOv10. These different features
can result in different inference times and detection performance.
Figs. 2 and 3 present the architectures of YOLOv10 and YOLOv8, respectively. Each network
comprises three components: the Backbone, Neck, and Head. The Backbone is a CNN-based
network responsible for extracting key features from input images. The Neck combines
the feature maps extracted by the Backbone to enhance the detection of small and
multi-scale objects, serving as a bridge between the Backbone and the Head. Lastly,
the Head predicts the locations and classes of objects based on the feature maps received
from the Neck [12].
Fig. 2. YOLOv10 architecture.
Fig. 3. YOLOv8 architecture.
Even though they have a similar overall architecture, the two networks differ in their
details. As described in [12], YOLOv8 employs cross stage partial darknet (CSP-Darknet)
as its Backbone and path aggregation network (PANet) as its Neck. In addition, the
model adopts an anchor-free detection mechanism for the Head, which eliminates the
need for anchor boxes, thereby reducing inference time. YOLOv8 also utilizes NMS to
efficiently eliminate duplicate bounding boxes, enabling the detection of even small
objects. It operates using a single-head classification mechanism, where the anchor-free
detection Head outputs the class probabilities for each bounding box [13]. The final
class is determined using the softmax function, allowing the model to classify objects
of various sizes with high accuracy, even without anchor boxes. In contrast, YOLOv10
utilizes an improved version of cross stage partial network (CSPNet) as its Backbone
while adopting a similar PANet as its Neck. For the Head, YOLOv10 incorporates a dual-head
structure that eliminates the need for NMS, which, unlike YOLOv8, simplifies the
post-processing pipeline and improves both accuracy and speed. As shown in Fig. 2,
the dual-head structure combines one-to-many and one-to-one assignment strategies during
training, enhancing classification precision. The primary goal of this architecture
is to optimize the balance between speed and accuracy by removing the NMS process and
simplifying the overall post-processing steps [11].
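To make concrete what the NMS post-processing step does, and therefore what an NMS-free detector avoids at inference time, the following is a minimal pure-Python sketch of greedy NMS. It is a generic textbook version, not the implementation used inside YOLOv8; the function names and the IoU threshold of 0.5 are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes on one hole plus one separate hole (made-up numbers).
boxes = [(10, 10, 60, 60), (12, 12, 62, 62), (100, 100, 150, 150)]
scores = [0.90, 0.80, 0.95]
kept = nms(boxes, scores)
print(kept)  # indices of the boxes that survive suppression
```

The pairwise overlap checks grow with the number of candidate boxes, which is why removing this step can shorten post-processing when many TSV holes appear in one image.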
Due to the differences in the Head, the loss functions also differ. YOLOv8 uses a
multi-task loss function composed of bounding box loss, class probability loss, and
objectness loss [12]. In contrast, YOLOv10 employs a box regression loss because it
utilizes NMS-free training. The box regression loss can improve the accuracy of bounding
box prediction. NMS-free training with the box regression loss allows YOLOv10 to skip
the additional NMS process, which enables faster inference as well as efficient training.
As a result, YOLOv10 achieves a superior balance between processing speed and accuracy
in comparison to YOLOv8 [11].
III. DATASET: TSV PCD
1. PCD Data Collection in Real Environment
To collect TSV PCD for training the YOLO networks, an 8-inch wafer with various TSV
patterns was prepared, as shown in Fig. 4. The figure displays the wafer along with
sample measurement results obtained using the DHM. The dataset consists of 1,000 PCD
images with a resolution of 4880 × 3720 pixels, each covering a field of view (FOV)
of 0.585 × 0.446 mm. As depicted in the top right of Fig. 4, each image includes more
than 20 TSV holes within this FOV.
Fig. 4. An 8-inch wafer with defective TSV training.
2. Data Collection, Label, and Data Augmentation
For training and validation, different TSV patterns were generated on 8-inch silicon
wafers. The generated TSV patterns are categorized as a base pattern and variations
of it. The base pattern is a hole with a diameter of 50 μm and a depth of 0.5 μm. Variations
of the base pattern are created by varying the diameter and changing the shape of
the base pattern.
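As a quick sanity check on scale, the image resolution, FOV, and base hole diameter reported in this section imply the lateral sampling and the hole size in pixels computed below. This is plain arithmetic on the stated numbers; the variable names are introduced here only for illustration.

```python
# Values taken from the dataset description in this section.
IMAGE_WIDTH_PX, IMAGE_HEIGHT_PX = 4880, 3720
FOV_WIDTH_UM, FOV_HEIGHT_UM = 585.0, 446.0  # 0.585 mm x 0.446 mm
BASE_DIAMETER_UM = 50.0

# Lateral sampling along each image axis, in micrometres per pixel.
um_per_px_x = FOV_WIDTH_UM / IMAGE_WIDTH_PX
um_per_px_y = FOV_HEIGHT_UM / IMAGE_HEIGHT_PX

# Diameter of the base hole expressed in pixels of the full-resolution image.
base_diameter_px = BASE_DIAMETER_UM / um_per_px_x

print(f"sampling: {um_per_px_x:.4f} x {um_per_px_y:.4f} um/px")
print(f"base hole diameter: {base_diameter_px:.1f} px")
```

The two axes give nearly identical sampling of roughly 0.12 μm per pixel, so a 50 μm hole spans a little over 400 pixels at full resolution.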
The variations of the TSV pattern are categorized into five distinct classes, consisting
of one Pass class and four Fail classes, as shown in Fig. 5. The Pass pattern includes
qualified holes with a diameter deviation of 10% or less and no visible defects. The
Fail 1 pattern refers to holes with a diameter more than 10% larger than the base
pattern. The Fail 2 pattern indicates holes with a diameter more than 10% smaller than
the base pattern. The Fail 3 pattern includes shapes that deviate from the standard
circular form, such as polygons or ellipses. Finally, the Fail 4 pattern consists of
partially imaged holes that are not fully captured within the camera’s FOV.
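The five class definitions can be written down as an explicit rule, which may help when checking labels for consistency. The sketch below is illustrative only: the `HoleMeasurement` fields, the function name, and the order in which the rules are tested (partial capture first, then shape, then diameter) are assumptions, since the text does not state a precedence among the Fail classes.

```python
from dataclasses import dataclass

NOMINAL_DIAMETER_UM = 50.0  # base pattern diameter from the text
TOLERANCE = 0.10            # 10% diameter deviation threshold from the text


@dataclass
class HoleMeasurement:
    diameter_um: float   # measured hole diameter
    is_circular: bool    # False for polygons, ellipses, and similar shapes
    fully_in_fov: bool   # False if the hole is cut off by the image border


def classify_hole(m: HoleMeasurement,
                  nominal_um: float = NOMINAL_DIAMETER_UM,
                  tolerance: float = TOLERANCE) -> str:
    """Map one measured hole to Pass or Fail 1 to Fail 4 (assumed rule order)."""
    if not m.fully_in_fov:
        return "Fail 4"  # partially imaged hole
    if not m.is_circular:
        return "Fail 3"  # non-circular shape
    deviation = (m.diameter_um - nominal_um) / nominal_um
    if deviation > tolerance:
        return "Fail 1"  # more than 10% larger than the base pattern
    if deviation < -tolerance:
        return "Fail 2"  # more than 10% smaller than the base pattern
    return "Pass"


examples = [
    HoleMeasurement(50.0, True, True),
    HoleMeasurement(56.0, True, True),
    HoleMeasurement(44.0, True, True),
    HoleMeasurement(50.0, False, True),
    HoleMeasurement(50.0, True, False),
]
labels = [classify_hole(m) for m in examples]
print(labels)
```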
After collecting 1,000 images via DHM, we increased the data volume by applying data
augmentation: vertical flipping, horizontal flipping, and a combination of the two.
Together with the original images, this yields 4,000 images that are used for training
and validation. The five patterns are extracted from these 4,000 images, and the amount
of data per class is as follows: 27,404 for Pass, 14,308 for Fail 1, 9,556 for Fail
2, 32,768 for Fail 3, and 27,636 for Fail 4.
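The flip-based augmentation must also mirror the bounding-box labels, otherwise the boxes no longer sit on the holes. The sketch below shows one way to do this, assuming labels in the normalized YOLO format (class, x_center, y_center, width, height); the label format, the toy image, and the function names are assumptions, not details given in the text.

```python
from typing import List, Tuple

Image = List[List[int]]                           # rows of pixel values
YoloBox = Tuple[int, float, float, float, float]  # (cls, xc, yc, w, h), normalized


def hflip(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def vflip(image: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def hflip_box(box: YoloBox) -> YoloBox:
    cls, xc, yc, w, h = box
    return (cls, 1.0 - xc, yc, w, h)  # only the x centre moves


def vflip_box(box: YoloBox) -> YoloBox:
    cls, xc, yc, w, h = box
    return (cls, xc, 1.0 - yc, w, h)  # only the y centre moves


def augment(image: Image, boxes: List[YoloBox]):
    """Return the original plus horizontal, vertical, and combined flips."""
    return [
        (image, list(boxes)),
        (hflip(image), [hflip_box(b) for b in boxes]),
        (vflip(image), [vflip_box(b) for b in boxes]),
        (vflip(hflip(image)), [vflip_box(hflip_box(b)) for b in boxes]),
    ]


# Toy 2 x 3 "image" with one labelled box.
image = [[1, 2, 3],
         [4, 5, 6]]
boxes = [(0, 0.25, 0.5, 0.2, 0.4)]
variants = augment(image, boxes)
print(len(variants), 1000 * len(variants))
```

Each source image produces four variants, which is consistent with 1,000 measured images growing to 4,000.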
Fig. 5. Classification of TSV PCD.
These defect types are distinguished by their geometric irregularities and are closely
associated with the degradation of electrical reliability. Furthermore, they are considered
to be key failure mechanisms and have been reported in existing studies on the reliability
of TSVs [1].
Specifically, voids or incomplete filling can increase TSV resistance, leading to
signal delay or attenuation [14]. Oxide pinholes along the sidewall can create leakage
paths or shorts [15]. Excessive diameter variation or non-circular geometry can induce
non-uniform current density and local stress, accelerating electromigration and reliability
failures [16]. In TSV manufacturing, these structural anomalies are well known as major
failure mechanisms that directly impact production yield and long-term device reliability
[1].
For this study, we fabricated a test wafer containing a variety of representative
TSV defects (e.g., diameter deviation, shape distortion, and void-inducing etch residue).
All PCDs were collected from this single wafer as a proof of concept for the proposed
system. To further enhance generalizability, future studies will collect additional
data from multiple wafers fabricated under realistic manufacturing conditions.
IV. EVALUATION RESULTS
For training, 80% of the 4,000 images were used, and the remaining 20% were used as
validation data. The input data were resized to 640 × 488 to match the input layer
of the networks. The networks were trained for 300 epochs using the Adam optimizer
with a learning rate of 1e−4.
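The split and resize figures above can be reproduced with a few lines. The sketch below is a hedged illustration: the seeded random shuffle and the helper name `split_dataset` are assumptions, since the text states only the 80/20 proportions and the image sizes.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[int], train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle with a fixed seed, then cut into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


image_ids = list(range(4000))
train_ids, val_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids))

# Resize from the native resolution to the network input size.
scale_x = 640 / 4880
scale_y = 488 / 3720
native_aspect = 4880 / 3720
resized_aspect = 640 / 488
print(f"scale: {scale_x:.4f} x {scale_y:.4f}")
print(f"aspect ratio: {native_aspect:.4f} -> {resized_aspect:.4f}")
```

The two scale factors are almost equal, so the 640 × 488 input preserves the aspect ratio of the measured images and circular holes remain close to circular after resizing.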
1. Performance Metrics
For performance evaluation, several performance metrics are used: processing time,
precision, recall, F1 score, and an Fβ score that combines processing time and detection
performance.
Processing time refers to the average inference time required by each model to analyze
a single image or frame, indicating the suitability of the model for real-time applications.
Precision refers to the ratio of correctly predicted defect instances to the total
number of predicted defect instances. It reflects how accurately the model identifies
true defects among all its positive predictions.
Recall is the ratio of correctly predicted defect instances to the total number of
actual defect instances. It indicates the model’s ability to detect as many actual
defects as possible.
The F1 score is the harmonic mean of precision and recall, providing a balanced measure
when precision and recall are equally important. Introducing a weighting factor β
into the F1 score yields the Fβ score, which controls the relative importance of recall
over precision [17]. The performance metric is defined as follows.
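For reference, the textbook form of the Fβ score from [17] is given below in terms of precision and recall. The symbols P and R are shorthand introduced here, and it is an assumption that Eq. (1) follows this standard form; setting β = 1 recovers the F1 score.

```latex
F_\beta = \frac{(1+\beta^2)\, P\, R}{\beta^2 P + R},
\qquad
F_1 = \frac{2 P R}{P + R}
\tag{1}
```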
This metric is particularly useful when prioritizing recall (i.e., detecting all true
defects) is more important than precision, such as in safety-critical defect inspection
tasks. By adjusting β, more emphasis can be placed on either detection recall or precision,
depending on the system requirements.
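The effect of β can be seen with a small numerical example. The sketch below computes precision, recall, and the standard Fβ score from detection counts; the counts are made-up illustrative numbers, not results from this study, and the function names are introduced here.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted defects that are real defects."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real defects that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more, beta < 1 precision."""
    b2 = beta * beta
    denom = b2 * p + r
    return (1.0 + b2) * p * r / denom if denom else 0.0


# Hypothetical counts: 90 true positives, 10 false positives, 30 false negatives.
p = precision(tp=90, fp=10)
r = recall(tp=90, fn=30)

f_half = f_beta(p, r, beta=0.5)
f_one = f_beta(p, r, beta=1.0)
f_two = f_beta(p, r, beta=2.0)
print(f"P={p:.3f} R={r:.3f} F0.5={f_half:.4f} F1={f_one:.4f} F2={f_two:.4f}")
```

Because recall is lower than precision in this example, increasing β pulls the score down toward the recall value, which is the behaviour wanted when missed defects are the costlier error.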
2. Performance Analysis
In actual defect inspection tasks, recall is a critical performance metric, as it
reflects the model’s ability to detect all possible defective cases without omission.
Therefore, as shown in Table 1, we compared the highest recall values achieved by each
YOLO model. All models demonstrated recall values exceeding 0.99, indicating high
sensitivity across variants. Based on these results, we further analyzed each model
at its highest recall point by comparing additional performance metrics, including
processing time, precision, F1 score, and Fβ scores, to evaluate the trade-off between
detection accuracy and inference efficiency.
Table 1. Highest recall achieved by each YOLO model.
| Model | V10-n | V8-n | V10-s | V8-s | V10-m | V10-b | V10-l | V8-m | V10-x | V8-l | V8-x |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Recall | 0.99203 | 0.99569 | 0.99917 | 0.99914 | 0.99922 | 0.99962 | 0.99923 | 0.99996 | 0.99991 | 0.99995 | 0.99989 |
Fig. 6 presents the evaluation of the YOLOv8 and YOLOv10 models. Figs. 6(a)-6(c) show,
respectively, the change in processing time, the best precision at the highest recall,
and the best F1 score at the highest mean average precision (mAP) as the number of
network parameters is varied.
Fig. 6(a) shows that the processing time increases as the number of parameters increases.
This is expected because more parameters require more computational operations, so
a short processing time calls for a small network. Accordingly, YOLOv10-n achieves
the fastest processing time of 0.18601 seconds. In Fig. 6(b) and Fig. 6(c), the precision
and F1 score improve as the number of parameters increases, because more parameters
allow richer features to be extracted. As a result, YOLOv8-l achieves the highest
precision and F1 score of 0.99997 and 0.99989, respectively.
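To put the fastest processing time in perspective, it can be converted into an approximate throughput. The arithmetic below uses only figures stated in this paper (0.18601 seconds per image and more than 20 TSV holes per image); it ignores stage movement and image acquisition time, which is an assumption made to keep the estimate simple.

```python
PROCESSING_TIME_S = 0.18601  # fastest reported processing time per image
MIN_HOLES_PER_IMAGE = 20     # each image contains more than 20 TSV holes

# Images analysed per second if detection were the only cost.
images_per_second = 1.0 / PROCESSING_TIME_S

# Lower bound on holes inspected per second under the same assumption.
min_holes_per_second = images_per_second * MIN_HOLES_PER_IMAGE

print(f"{images_per_second:.2f} images/s, "
      f"more than {min_holes_per_second:.0f} holes/s")
```

Under this simplification, the fastest model handles a little over five images per second, or more than one hundred holes per second.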
Fig. 6. (a) Parameters vs processing time, (b) parameters vs precision, and (c) parameters
vs F1 score.
In Fig. 7, we evaluate YOLOv8 and YOLOv10 by considering both processing time and
detection performance using the performance metric in Eq. (1). This metric is a harmonic
mean of processing time and detection performance; the closer its value is to 1, the
shorter the processing time and the higher the detection performance. Therefore, a
network whose score is close to 1 offers both good processing time and good detection
performance. The comparison of Fβ scores in Fig. 7 shows that the YOLOv10-n model
achieves the highest score of 0.92565.
In summary, through extensive evaluations, we confirm that YOLOv10-n is the most suitable
model when the inspection system prioritizes fast inference time or a balance between
inference speed and detection accuracy. On the other hand, YOLOv8-l achieves the highest
precision and F1 score, making it more appropriate when detection accuracy is critical.
Therefore, the choice of model should be guided by the specific performance requirements
of the target application.
Fig. 7. Parameter vs Fβ score.
3. Inference Results of the Proposed TSV 3D Shape Inspection System
Fig. 8 shows the detection results, where each TSV hole is localized with a black bounding
box and classified into one of four categories: Pass, Fail 1, Fail 2, or Fail 3. The
class label and corresponding confidence score are displayed as text above each box.
The confidence score reflects both the class probability and the localization accuracy
of the detected region.
As shown in Fig. 8, most of the detected objects have a confidence score above 0.98,
demonstrating the model’s robust capability in identifying even subtle differences
between defective and non-defective TSV patterns. These results validate the effectiveness
of the proposed system for accurate and real-time TSV PCD inspection in practical
semiconductor manufacturing environments.
Fig. 8. Inference result of the proposed TSV 3D shape defect inspection system.
V. CONCLUSIONS AND FUTURE WORKS
In this study, we proposed a real-time TSV PCD inspection system utilizing deep learning-based
object detection models. By employing YOLOv8 and YOLOv10 architectures, we aimed to
overcome the limitations of conventional inspection methods such as high cost, slow
processing speed, and destructive nature. To enable practical and reliable training,
we constructed a high-quality dataset consisting of real PCD from 8-inch silicon wafers
using DHM.
Extensive experiments were conducted to evaluate the detection performance and inference
speed of each model using metrics such as precision, F1 score, and Fβ score. The results
showed that YOLOv8-l achieved the highest detection accuracy, while YOLOv10-n provided
the fastest processing speed and the best trade-off between speed and accuracy based
on Fβ.
Based on these findings, our research is expected to contribute to future advancements
in 3D semiconductor defect inspection by demonstrating the feasibility of real-time,
non-destructive, and high-precision inspection using lightweight deep learning architectures.
Furthermore, the proposed framework and evaluation results can support the development
and deployment of AI-based inspection systems in practical semiconductor manufacturing
environments.
For future work, we plan to collect larger datasets from diverse wafers fabricated
under real manufacturing conditions to improve the reliability of the proposed system.
In addition, we will develop a hybrid inspection strategy for detecting defects on
TSV inner sidewalls, thereby addressing the limitations of top-view imaging and enhancing
the robustness of the inspection framework.
ACKNOWLEDGEMENTS
This work was supported by the RISE project of Kumoh National Institute of Technology.
References
[1] Wang J., Duan F., Lv Z., Chen S., Yang X., Chen H., Liu J., 2023, A short review of
through-silicon via (TSV) interconnects: Metrology and analysis, Applied Sciences,
Vol. 13, No. 14, pp. 8301

[2] Shang H., Sun S., 2017, Three-dimensional integrated circuit (3D IC) key technology:
Through-silicon via (TSV), Nano Research, Vol. 12, No. 6, pp. 1831-1840

[3] Richter H., Pfitzner L., Pfeffer M., Bauer A., Siegert J., Bodner T., 2016, Advanced
detection method for polymer residues on semiconductor substrates: 3D/TSV/interposer:
Through silicon via and packaging, Proc. of the IEEE Advanced Semiconductor Manufacturing
Conference (ASMC), pp. 472-474

[4] Kim S., Lee J., Hwang Y., Park Y., Lee J., 2016, Integrated clean for TSV: Comparison
between dry process and wet processes and their electrical qualification, Proc. of
the IEEE Electronic Packaging Technology Conference (EPTC), pp. 311-314

[5] Redmon J., Divvala S., Girshick R., Farhadi A., 2016, You only look once: Unified,
real-time object detection, Proc. of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 779-788

[6] Redmon J., Farhadi A., 2017, YOLO9000: Better, faster, stronger, Proc. of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7263-7271

[7] Redmon J., Farhadi A., 2018, YOLOv3: An incremental improvement, arXiv preprint arXiv:1804.02767

[8] Bochkovskiy A., Wang C.-Y., Liao H.-Y. M., 2020, YOLOv4: Optimal speed and accuracy
of object detection, arXiv preprint arXiv:2004.10934

[9] Wang C.-Y., Bochkovskiy A., Liao H.-Y. M., 2022, YOLOv6: A single-stage object detection
framework for industrial applications, arXiv preprint arXiv:2209.02976

[10] Wang C.-Y., Bochkovskiy A., Liao H.-Y. M., 2022, YOLOv7: Trainable bag-of-freebies
sets new state-of-the-art for real-time object detectors, arXiv preprint arXiv:2207.02696

[11] Wang A., Chen H., Liu L., Chen K., Lin Z., Han J., Ding G., 2024, YOLOv10: Real-time
end-to-end object detection, arXiv preprint arXiv:2405.14458

[12] Smith A., 2023, Real-time flying object detection with YOLOv8, arXiv preprint arXiv:2305.09972

[13] Hussain M., 2024, YOLOv5, YOLOv8 and YOLOv10: The go-to detectors for real-time vision,
arXiv preprint arXiv:2407.02988

[14] Xia Q., Zhang X., Ma B., Tao K., Zhang H., Yuan W., Ramakrishna S., Ye T., 2024, A
state-of-the-art review of through-silicon vias: Filling materials, filling processes,
performance, and integration, Advanced Engineering Materials, Vol. 27, No. 1, pp.
2401799

[15] Chakrabarty K., Deutsch S., Thapliyal H., Ye F., 2012, TSV defects and TSV-induced
circuit failures: The third dimension in test and design-for-test, Proceedings of
the IEEE, Vol. 100, No. 6, pp. 1720-1735

[16] Cheng Z., Ding Y., Xiao L., Wang X., Chen Z., 2019, Comparative evaluations on scallop-induced
electric-thermo-mechanical reliability of through-silicon-vias, Microelectronics Reliability,
Vol. 103, pp. 113512

[17] van Rijsbergen C. J., 1979, Information Retrieval

Kyeong Beom Park is an Assistant at the R&D Center of Gooil Solution. He received
his B.S. degree in optical engineering from Kumoh National Institute of Technology
and is currently pursuing an M.E. degree in electronic and electrical engineering.
His research interests include deep learning applications in industrial systems and
advanced semiconductor technologies.
Jae Yeol Lee is an Executive Director and Chief of the R&D Center at Gooil Solution.
He received his M.S. degree in metallurgical engineering from Kyungpook National University
in 2000. At Gooil Solution, he has been leading the development of semiconductor and
display process and inspection equipment. His research interests focus on enhancing
inspection accuracy and automation in high-precision manufacturing environments.
Harim Lee received his B.S. degree in electrical engineering from Kyungpook National
University, Daegu, South Korea, in 2013, an M.S. degree in IT convergence engineering
from the Pohang University of Science and Technology (POSTECH), Pohang, South Korea,
in 2015, and a Ph.D. degree from the School of Electrical and Computer Engineering,
Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea, in
August 2020. Since September 2021, he has been an Assistant Professor with the School
of Electronic Engineering, Kumoh National Institute of Technology, Gumi, South Korea.
His research interests include intelligent systems based on deep learning and the
implementation of deep neural networks on FPGAs using Verilog HDL.