
IKDDiT

Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer

Yuan-Fu Yang¹ · Hsiu-Hui Hsiao²
¹National Yang Ming Chiao Tung University · ²National Taiwan University of Science and Technology
Diffusion Transformer · Photolithography Overlay Map Generation · Semiconductor Manufacturing

Introduction

IKDDiT explores how diffusion models and knowledge distillation can improve overlay map generation in semiconductor photolithography. This page provides an accessible overview of the motivation, the core design, and representative results, with figures consolidated from the paper and its supplementary material.

High-level overview of IKDDiT pipeline

Core Idea

Rather than relying on a heavy standalone model, IKDDiT distills knowledge from a teacher network to inject semiconductor-specific priors into a compact diffusion generator. The result is a model that is both efficient and accurate for overlay map synthesis.


In short: a compact diffusion model enhanced by knowledge transfer for manufacturing data.
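The sketch below illustrates, in rough PyTorch-style pseudocode, how a distillation term can sit alongside the standard denoising objective. The module interfaces (student, teacher, schedule), the feature-matching loss, and the 0.5 weight are assumptions made for illustration only; they are not the actual IKDDiT objective.

import torch
import torch.nn.functional as F

def distillation_training_step(student, teacher, x0, cond, schedule):
    # Sample a diffusion timestep and noise the clean overlay map x0.
    t = torch.randint(0, schedule.num_steps, (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    xt = schedule.add_noise(x0, noise, t)

    # The student predicts the noise; the frozen teacher provides reference features.
    eps_pred, feat_student = student(xt, t, cond, return_features=True)
    with torch.no_grad():
        _, feat_teacher = teacher(xt, t, cond, return_features=True)

    denoise_loss = F.mse_loss(eps_pred, noise)               # standard diffusion loss
    distill_loss = F.mse_loss(feat_student, feat_teacher)    # transfer of teacher priors
    return denoise_loss + 0.5 * distill_loss                 # weight chosen arbitrarily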


Architecture

Architecture of IKDDiT. It uses a pre-trained text encoder ε_φt and an image encoder ε_φi, developed through unified contrastive learning, to generate conditional tokens. These tokens are then processed by the teacher and student DiT encoders to perform a self-supervised discriminative process with D_φ in the joint embedding space.

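The caption describes two stages, which the minimal sketch below mirrors under assumed interfaces: the pre-trained encoders ε_φt and ε_φi produce conditional tokens, and D_φ scores teacher-student agreement in the joint embedding space. Names such as text_encoder, teacher_dit, and discriminator are placeholders rather than the released implementation.

import torch
import torch.nn.functional as F

def build_conditional_tokens(text_encoder, image_encoder, text, image):
    # Frozen encoders from unified contrastive learning (ε_φt, ε_φi) map the
    # prompt and the reference image into a shared token space.
    with torch.no_grad():
        text_tokens = text_encoder(text)      # (B, N_text, D)
        image_tokens = image_encoder(image)   # (B, N_image, D)
    return torch.cat([text_tokens, image_tokens], dim=1)

def discriminative_alignment_loss(teacher_dit, student_dit, discriminator, xt, t, cond):
    # Teacher and student DiT encoders embed the same noisy input; D_φ judges
    # whether the student's embedding agrees with the teacher's.
    with torch.no_grad():
        z_teacher = teacher_dit(xt, t, cond)
    z_student = student_dit(xt, t, cond)
    logits = discriminator(z_student, z_teacher)
    target = torch.ones_like(logits)          # push the embeddings to agree
    return F.binary_cross_entropy_with_logits(logits, target)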

Training Efficiency

To evaluate the convergence behavior of our IKDDiT model, we compare FID scores across training stages against state-of-the-art baselines. All models, in the XL configuration, are trained with a batch size of 64 for up to 578.1k iterations. As shown in Figure 5, IKDDiT exhibits consistently faster convergence. At 250k iterations, IKDDiT reaches an FID of 11.6, already surpassing DiT, MDT, and MaskDiT, which only achieve FID scores of 14.1, 12.2, and 11.9, respectively, after 500k iterations. Furthermore, IKDDiT attains an FID of 6.8 at 500k iterations, outperforming all competing methods. These results demonstrate that IKDDiT converges nearly twice as fast, underscoring the effectiveness of incorporating self-supervised discrimination into DiT training.
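For convenience, the FID numbers quoted above are collected here (entries not stated in the text are marked n/a):

Model     FID @ 250k iterations   FID @ 500k iterations
DiT       n/a                     14.1
MDT       n/a                     12.2
MaskDiT   n/a                     11.9
IKDDiT    11.6                    6.8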

Training convergence (FID vs. iterations).

Results

Figure 4: Scalability and Model Configurations.
Figure 5: Model Scaling on Training Loss.

Visualization Results

Qualitative Comparison on Overlay Map Generation.
Representative Results by Our Proposed Model.

Resources

BibTeX

@inproceedings{IKDDiT2025,
  author    = {Yuan-Fu Yang and Hsiu-Hui Hsiao},
  title     = {Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025},
  note      = {To appear}
}

The bibliographic entry will be updated with the official venue and page information once the proceedings are finalized.