Efficient and Robust SEM Image Denoising for Wafer Defect Inspection

Hyunwoong Bae$^1$, Jaeseok Byun$^2$, Yongwoo Lee$^3$, and Taesup Moon$^{1,2,4}$
$^1$ Interdisciplinary Program in Artificial Intelligence, Seoul National University.
$^2$ Department of Electrical and Computer Engineering, Seoul National University. $^3$ Samsung Electronics.
$^4$ Artificial Intelligence Institute of Seoul National University (AIIS) / Automation and Systems Research Institute (ASRI) / Institute of New Media and Communications (INMC), Seoul National University.

Figure 1. Qualitative comparison of denoising results. The first row shows denoising performance on normally structured (F01) input images, and the second row shows results on structurally different (SDF01) images. The middle section shows the denoising results, and the right section shows the final circle detection outcomes, with “Good” detections marked in green and “Bad” detections marked in red. ReNIn denoises well and preserves edges on both image types, whereas the other methods fail to maintain structural integrity on SDF01 images, resulting in poor detection performance.

Abstract

Noise in scanning electron microscopy (SEM) images often obscures details that are critical for accurate wafer defect inspection. Deep learning-based denoising methods have been widely used to address this problem, but they have two major limitations in SEM image denoising: the lack of methods that are both efficient and powerful, and poor generalization to image structures that are unseen during training.

We propose Relaxed Noise2Noise with Input dropout (ReNIn), which includes components that address the above two issues. Firstly, our Relaxed Noise2Noise (RN2N) framework provides a much better trade-off between the denoising performance and training data collection costs. Secondly, our input dropout method improves generalization, enhancing performance on images structurally different from the training data while maintaining strong results on normal images.

Background

Figure 2. Training data collection cost vs. denoising performance (PSNR). We compare models on our SEM wafer dataset, using PSNR as the evaluation metric. Each point on the x-axis represents the number of raw noisy images (F#) needed to obtain a target image.

Figure 3. Generalization issue of Noise2Noise (N2N). The upper-right image shows a test image similar to the training set (regular circles), while the lower-right image shows a structurally different test image (irregular circles). N2N struggles with the latter, particularly in preserving edges.

Noise in SEM images complicates wafer defect inspection, a crucial task in semiconductor manufacturing. The traditional approach averages multiple noisy frames to obtain a clean image, which is resource-intensive and expensive. Deep learning-based techniques such as Noise2Noise (N2N) [Lehtinen et al., 2018] have gained traction but face challenges in generalization and efficiency.
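The frame-averaging baseline described above can be illustrated with a minimal sketch. It assumes that F# denotes an image obtained from # raw frames and, purely for illustration, that the noise is additive, zero-mean, and independent across frames (real SEM noise is more complex). The image size, noise level `sigma`, and the helper name `average_frames` are hypothetical.

```python
import numpy as np

def average_frames(frames: np.ndarray) -> np.ndarray:
    """Average a stack of noisy frames of shape (n_frames, H, W) into one image."""
    return frames.mean(axis=0)

rng = np.random.default_rng(0)
clean = np.full((64, 64), 0.5)            # hypothetical flat "clean" image
sigma = 0.2                               # assumed noise level (illustrative only)
frames = clean + rng.normal(0.0, sigma, size=(32, 64, 64))  # 32 simulated noisy scans

f01 = frames[0]                           # a single raw frame (F01)
f32 = average_frames(frames)              # average of 32 frames (F32)

rmse_f01 = float(np.sqrt(np.mean((f01 - clean) ** 2)))
rmse_f32 = float(np.sqrt(np.mean((f32 - clean) ** 2)))
print(f"RMSE F01: {rmse_f01:.4f}, RMSE F32: {rmse_f32:.4f}")
```

Under these assumptions the error shrinks roughly with the square root of the number of frames, which is why a clean target needs many scans and why acquisition cost grows with F#.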

Key Challenges:

  1. Efficiency: There is a trade-off between the cost of collecting training data (the number of raw noisy images) and denoising performance (measured in PSNR). See Figure 2.
  2. Generalization: Models trained on regular structures often fail when applied to images with irregular patterns. See Figure 3.

Solution: ReNIn

Figure 4. Overall training procedure of ReNIn. Here, $\odot$ denotes element-wise multiplication. Note that the masking strategy is applied only in the training phase, not during inference.

ReNIn incorporates two main innovations:

  1. RN2N (Relaxed Noise2Noise): A middle ground between supervised denoisers and N2N to reduce the cost of collecting training data without sacrificing performance.
  2. Input Dropout: This technique improves generalization to unseen structures by randomly dropping pixels during training, forcing the model to learn better representations.
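The two components above can be sketched as a single routine that builds one training pair. This is an illustrative sketch, not the authors' implementation: the page does not specify how the relaxed target is formed or which dropout rate is used. Here the target is ASSUMED to be the average of `k` noisy frames that are independent of the input frame, and `drop_prob`, `k`, and the function names are hypothetical.

```python
import numpy as np

def input_dropout(x, drop_prob, rng, training=True):
    """Randomly zero pixels of x by element-wise multiplication with a binary mask.

    The mask is applied only when training is True; at inference the input
    is returned unchanged.
    """
    if not training:
        return x
    mask = (rng.random(x.shape) >= drop_prob).astype(x.dtype)  # 1 = keep, 0 = drop
    return x * mask

def make_training_pair(frames, k, drop_prob, rng):
    """Build one (input, target) pair from a stack of noisy frames (n, H, W).

    Input:  the first frame with input dropout applied.
    Target: the average of the next k frames (ASSUMED form of the relaxed
            target; k = 1 would correspond to a plain N2N-style pair).
    """
    if frames.shape[0] < k + 1:
        raise ValueError("need at least k + 1 frames")
    noisy_input = input_dropout(frames[0], drop_prob, rng, training=True)
    target = frames[1:k + 1].mean(axis=0)
    return noisy_input, target

rng = np.random.default_rng(0)
frames = rng.random((5, 32, 32)) + 1.0    # hypothetical noisy stack, strictly positive
x, y = make_training_pair(frames, k=4, drop_prob=0.3, rng=rng)
dropped_fraction = float(np.mean(x == 0))
print(f"dropped fraction: {dropped_fraction:.3f}")
```

The denoiser would then be trained to map `x` to `y`. At inference, calling `input_dropout(..., training=False)` leaves the image untouched, matching the note that masking is used only during training.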


Experimental Results

In our experiments, ReNIn achieves results nearly indistinguishable from those of supervised learning on normal (F01) images at a significantly lower data collection cost ($8 \times$ cheaper), and it clearly outperforms supervised learning on the challenging structurally different (SDF01) images. As shown in both the quantitative (PSNR, SSIM) and qualitative results, ReNIn produces clearer images while preserving edge details, whereas other methods either struggle with noise removal or lose structural integrity. The SDF01 results demonstrate its strong generalization ability, further enhanced by the use of input dropout.

PSNR(dB)/SSIM results

Since the evaluation images consist of one F01 and two SDF01 images, they are referred to as 1st F01, 1st SDF01, and 2nd SDF01 for convenience. The “average” column in PSNR/SSIM shows the averaged PSNR/SSIM values of the three evaluation images.
The best results for each image are marked in bold and the second-best are underlined. The median filter (F32) is excluded from this ranking because it uses high-cost F32 images as input.

| Category | Model | 1st F01 | 1st SDF01 | 2nd SDF01 | Average |
|---|---|---|---|---|---|
| Conventional | Median filter (F32) | 22.64 / 0.3382 | 21.70 / 0.2616 | 21.19 / 0.2409 | 21.84 / 0.2802 |
| | Median filter (F01) | 20.44 / 0.3143 | 14.09 / 0.2206 | 17.56 / 0.2130 | 17.36 / 0.2493 |
| | BM3D [Dabov et al., 2007] | 21.34 / 0.2974 | 15.12 / 0.2008 | 17.70 / 0.1905 | 18.05 / 0.2296 |
| Deep learning-based | Noise2Void [Krull et al., 2019] | 14.56 / 0.0792 | 12.20 / 0.0500 | 13.35 / 0.0540 | 13.37 / 0.0611 |
| | FBI-denoiser [Byun et al., 2021] | 14.71 / 0.0827 | 12.30 / 0.0505 | 13.42 / 0.0537 | 13.48 / 0.0623 |
| | Noise2Noise [Lehtinen et al., 2018] | 21.92 / 0.2990 | 14.40 / 0.1150 | 16.63 / 0.1403 | 17.65 / 0.1848 |
| | Supervised learning | 23.66 / 0.3668 | 16.24 / 0.1727 | 17.77 / 0.1830 | 19.22 / 0.2408 |
| | ReNIn (Ours) | 23.05 / 0.3446 | 20.39 / 0.2508 | 19.12 / 0.2181 | 20.85 / 0.2712 |
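For readers unfamiliar with the metric, PSNR can be computed with the standard definition $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below is a generic implementation, not the authors' evaluation code; the `data_range` default assumes images scaled to [0, 1], which the page does not state. The last lines recompute the Average entry of the ReNIn row from its three per-image PSNR values.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")               # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Toy example: a constant error of 0.1 on a [0, 1] image gives MSE = 0.01.
reference = np.zeros((8, 8))
estimate = np.full((8, 8), 0.1)
toy_psnr = psnr(reference, estimate)

# Average column for the ReNIn row, from the three per-image PSNR values above.
renin_psnr = [23.05, 20.39, 19.12]
renin_average = round(float(np.mean(renin_psnr)), 2)
print(toy_psnr, renin_average)
```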

Failure rate (FR) results

The evaluation images and table conventions are the same as in the PSNR(dB)/SSIM results above.

| Category | Model | 1st F01 | 1st SDF01 | 2nd SDF01 | Average |
|---|---|---|---|---|---|
| Conventional | Median filter (F32) | 0.00% | 0.00% | 0.00% | 0.00% |
| | Median filter (F01) | 12.50% | 36.57% | 52.31% | 33.80% |
| | BM3D [Dabov et al., 2007] | 8.33% | 34.26% | 67.59% | 36.73% |
| Deep learning-based | Noise2Void [Krull et al., 2019] | 0.93% | 96.30% | 87.04% | 61.42% |
| | FBI-denoiser [Byun et al., 2021] | 1.85% | 96.30% | 83.33% | 60.49% |
| | Noise2Noise [Lehtinen et al., 2018] | 0.00% | 100.00% | 100.00% | 66.67% |
| | Supervised learning | 0.00% | 100.00% | 100.00% | 66.67% |
| | ReNIn (Ours) | 0.00% | 0.93% | 2.31% | 1.08% |
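The page does not define how FR is computed. A plausible reading, given the “Good”/“Bad” circle detections in Figure 1, is the percentage of detections judged “Bad” in an image. The sketch below encodes that ASSUMED definition; the function name and the example counts are hypothetical. The final lines recompute the Average entry of the ReNIn row from its three per-image values.

```python
def failure_rate(n_bad: int, n_total: int) -> float:
    """Percentage of failed detections (ASSUMED definition of FR)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_bad / n_total

# Hypothetical counts, for illustration only.
example_fr = failure_rate(n_bad=2, n_total=200)

# Average column for the ReNIn row, from the three per-image FR values above.
renin_fr = [0.00, 0.93, 2.31]
renin_fr_average = sum(renin_fr) / len(renin_fr)
print(f"{example_fr:.2f}% {renin_fr_average:.2f}%")
```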

Qualitative results

See Figure 1 for a qualitative comparison of denoising results with supervised learning.


References

  • Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K. (2007). Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095.
  • Krull, A., Buchholz, T.-O., and Jug, F. (2019). Noise2Void - Learning denoising from single noisy images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2129–2137. Institute of Electrical and Electronics Engineers/The Computer Vision Foundation.
  • Byun, J., Cha, S., and Moon, T. (2021). FBI-Denoiser: Fast blind image denoiser for Poisson-Gaussian noise. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5768–5777. Institute of Electrical and Electronics Engineers/The Computer Vision Foundation.
  • Lehtinen, J., Munkberg, J., Hasselgren, J., Laine, S., Karras, T., Aittala, M., and Aila, T. (2018). Noise2Noise: Learning image restoration without clean data. In International Conference on Machine Learning, pages 4620–4631. International Machine Learning Society.
