Title Fault-tolerant GEMM Accelerator based on Microarchitectural Fault Analysis for Resource-constrained Devices
Authors Sunyoung Park; Hannah Yang; Hana Kim; Hyunji Kim; Ji-Hoon Kim
DOI https://doi.org/10.5573/JSTS.2025.25.3.318
Page pp.318-324
ISSN 1598-1657
Keywords Neural networks; systolic array; functional safety; fault-tolerant; fault mitigation
Abstract As semiconductor technologies advance to the nanoscale, the likelihood of hardware faults increases, posing significant challenges in safety-critical applications, such as autonomous driving and medical devices, that rely heavily on neural networks. To address this issue, we propose a fault-tolerant general matrix multiplication (GEMM) accelerator designed for resource-constrained edge devices. First, we introduce a high-low bit swapping mechanism (HL-Swap) to improve the fault resilience of registers in critical hardware components. Second, we quantify the impact of fault characteristics on accuracy degradation and propose a microarchitectural location-aware strategy that disables row-column operations (RC-Off). The proposed hardware is implemented in Samsung 28nm FDSOI technology and operates at a 1.0 V supply voltage with a 250 MHz clock frequency. Through tests with 1000 random faults injected into the systolic array, we show that the proposed GEMM accelerator significantly mitigates accuracy degradation, with hardware overheads of 2.4% and 8.9% for RC-Off and HL-Swap, respectively.
In particular, compared with a conventional GEMM accelerator, the proposed design achieved a 63% improvement in performance in a scenario with a faulty processing element (PE) rate (FPR) of 6%.
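The abstract does not describe how RC-Off is realized in hardware. As a rough illustration only, the following Python sketch assumes an output-stationary systolic array, models a fault as a single bit flip in one PE's accumulator, and interprets RC-Off as excluding the faulty PE's entire row and column from the tile mapping so that only healthy PEs compute outputs. The 4x4 array size, the fault model, and the names `flip_bit`, `tiled_gemm`, and `rc_off` are hypothetical and are not taken from the paper; which rows or columns the paper's location-aware strategy actually disables may differ.

```python
import numpy as np

# Hypothetical 4x4 output-stationary systolic array (not the paper's configuration).
ARRAY_ROWS, ARRAY_COLS = 4, 4


def flip_bit(value: int, bit: int, width: int = 32) -> int:
    """Flip one bit of a `width`-bit two's-complement integer (single-bit fault model)."""
    mask = (1 << width) - 1
    u = (value & mask) ^ (1 << bit)
    # Reinterpret the unsigned bit pattern as a signed integer.
    return u - (1 << width) if u >> (width - 1) else u


def tiled_gemm(a, b, rows, cols, fault=None):
    """Compute a @ b tile by tile using only the enabled PE rows/cols.

    Each enabled PE (pr, pc) produces one output element per tile. `fault` is an
    optional (pe_row, pe_col, bit) tuple: that PE's accumulator has `bit` flipped.
    Returns the result and the number of tile passes (a rough latency proxy).
    """
    m, n = a.shape[0], b.shape[1]
    out = np.zeros((m, n), dtype=np.int64)
    passes = 0
    for i0 in range(0, m, len(rows)):
        for j0 in range(0, n, len(cols)):
            passes += 1
            for di, pr in enumerate(rows):
                for dj, pc in enumerate(cols):
                    i, j = i0 + di, j0 + dj
                    if i >= m or j >= n:
                        continue  # PE sits idle in a partial edge tile
                    acc = int(np.dot(a[i], b[:, j]))
                    if fault is not None and (pr, pc) == fault[:2]:
                        acc = flip_bit(acc, fault[2])
                    out[i, j] = acc
    return out, passes


def rc_off(faulty_pe):
    """Assumed RC-Off: disable the faulty PE's whole row and column."""
    r, c = faulty_pe
    rows = [i for i in range(ARRAY_ROWS) if i != r]
    cols = [j for j in range(ARRAY_COLS) if j != c]
    return rows, cols


rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(8, 16), dtype=np.int64)  # INT8-range operands
b = rng.integers(-128, 128, size=(16, 8), dtype=np.int64)
exact = a @ b

all_rows, all_cols = list(range(ARRAY_ROWS)), list(range(ARRAY_COLS))
fault = (1, 2, 20)  # hypothetical fault: bit 20 of PE(1, 2)'s accumulator flips

faulty, p0 = tiled_gemm(a, b, all_rows, all_cols, fault)
mitigated, p1 = tiled_gemm(a, b, *rc_off(fault[:2]), fault=fault)

print("unprotected:", np.count_nonzero(faulty != exact), "wrong outputs,", p0, "passes")
print("RC-Off:     ", np.count_nonzero(mitigated != exact), "wrong outputs,", p1, "passes")
```

With these parameters, the unprotected run corrupts 4 of the 64 outputs in 4 tile passes, while the RC-Off run is exact but needs 9 passes on the remaining 3x3 healthy sub-array. This mirrors, in simplified form, the trade-off the abstract describes: disabling rows and columns avoids the faulty PE at the cost of reduced effective array size.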