Introduction
IKDDiT explores how diffusion models and knowledge distillation can improve overlay map generation in semiconductor photolithography. This page provides an accessible overview of the motivation, the core design, and representative results, with figures consolidated from the paper and its supplementary material.

Core Idea
Rather than relying on a heavy standalone model, IKDDiT uses a distilled teacher network to inject semiconductor-specific priors into a compact diffusion generator. The result is a model that is both efficient and accurate for overlay map synthesis.
In short: a compact diffusion model enhanced by knowledge transfer for manufacturing data.

Architecture
Architecture of IKDDiT. A pre-trained text encoder εφt and image encoder εφi, both developed through unified contrastive learning, generate conditional tokens. The teacher and student DiT encoders then process these tokens, and Dφ is used to perform a self-supervised discriminative process within the joint embedding space.
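The data flow in the caption can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the encoders are replaced by small random linear maps, the conditional tokens are a single concatenated vector, Dφ is read as a logistic discriminator that tries to tell teacher embeddings from student embeddings, and the loss form is a generic adversarial one. All names (`conditional_tokens`, `discriminator`, `losses`, `DIM`) are hypothetical.

```python
import math
import random

random.seed(0)
DIM = 4  # hypothetical embedding width, chosen only for this toy example


def random_matrix(rows, cols):
    """Small random weight matrix standing in for a trained network."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Apply a dense layer stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Stand-ins for the pre-trained text and image encoders (assumed frozen).
text_encoder = random_matrix(DIM, 3)
image_encoder = random_matrix(DIM, 5)
# Stand-ins for the teacher and student DiT encoders.
teacher = random_matrix(DIM, 2 * DIM)
student = random_matrix(DIM, 2 * DIM)
# Stand-in for D_phi: a single logistic unit over the joint embedding.
disc_w = [random.uniform(-0.5, 0.5) for _ in range(DIM)]


def conditional_tokens(text_feat, image_feat):
    """Concatenate text and image embeddings into one conditioning vector."""
    return linear(text_encoder, text_feat) + linear(image_encoder, image_feat)


def discriminator(embedding):
    """Estimated probability that an embedding came from the teacher."""
    return sigmoid(sum(w * e for w, e in zip(disc_w, embedding)))


def losses(text_feat, image_feat):
    tokens = conditional_tokens(text_feat, image_feat)
    p_teacher = discriminator(linear(teacher, tokens))
    p_student = discriminator(linear(student, tokens))
    # Discriminator objective: teacher embeddings labeled 1, student 0.
    d_loss = -math.log(p_teacher) - math.log(1.0 - p_student)
    # Student objective: make its embedding look like the teacher's.
    s_loss = -math.log(p_student)
    return d_loss, s_loss


d_loss, s_loss = losses([0.2, -0.1, 0.4], [0.5, 0.1, -0.3, 0.0, 0.2])
print(f"discriminator loss = {d_loss:.3f}, student loss = {s_loss:.3f}")
```

In a real model the tokens would be sequences and the encoders full transformer stacks; the sketch only shows where the two encoders and the discriminating module sit relative to the conditional tokens.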

Training efficiency
To evaluate the convergence behavior of IKDDiT, we compare FID scores across training stages against state-of-the-art baselines. All models use the XL configuration and are trained with a batch size of 64 for up to 578.1k iterations. As shown in Figure 5, IKDDiT converges consistently faster. At 250k iterations, IKDDiT reaches an FID of 11.6, already better than the scores DiT, MDT, and MaskDiT reach after 500k iterations (14.1, 12.2, and 11.9, respectively). At 500k iterations, IKDDiT attains an FID of 6.8, outperforming all competing methods. Because IKDDiT beats every baseline's 500k-iteration score in half the iterations, it converges roughly twice as fast, underscoring the effectiveness of incorporating self-supervised discrimination into DiT training.
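The comparison can be checked with a few lines of arithmetic. The sketch below uses only the FID values quoted in this section; the variable and function names are illustrative, and the "speedup" is a lower bound measured in training iterations, not wall-clock time.

```python
# FID values quoted in this section, keyed by training iterations.
ikddit_fid = {250_000: 11.6, 500_000: 6.8}
baseline_fid_at_500k = {"DiT": 14.1, "MDT": 12.2, "MaskDiT": 11.9}


def beats_all(fid, others):
    """True if `fid` is lower (better) than every baseline FID."""
    return all(fid < other for other in others.values())


early_iters, late_iters = 250_000, 500_000

# If IKDDiT at 250k already beats every baseline at 500k, it needs at most
# half the iterations to match them: a speedup of at least 2x in iterations.
speedup_lower_bound = (
    late_iters / early_iters
    if beats_all(ikddit_fid[early_iters], baseline_fid_at_500k)
    else None
)

# Gap to the strongest baseline when all models are compared at 500k.
best_baseline = min(baseline_fid_at_500k, key=baseline_fid_at_500k.get)
gap_at_500k = baseline_fid_at_500k[best_baseline] - ikddit_fid[late_iters]

print(f"speedup lower bound: {speedup_lower_bound}x")
print(f"best baseline at 500k: {best_baseline}, FID gap: {gap_at_500k:.1f}")
```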

Results


Visualization Results


Resources
BibTeX
@inproceedings{IKDDiT2025,
  author    = {Yuan-Fu Yang and Hsiu-Hui Hsiao},
  title     = {Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025},
  note      = {To appear}
}