Diffusion Knows Transparency (DKT)

This repository contains the weights for DKT (Diffusion Knows Transparency), a foundation model for video depth and normal estimation that handles transparent objects, in-the-wild scenes, and videos of arbitrary length.

Project Page | GitHub | Paper

Introduction

DKT repurposes the generative video priors of large-scale diffusion models for robust, temporally coherent perception. By learning a video-to-video translator for depth and normals through lightweight LoRA adapters, it achieves zero-shot state-of-the-art results on transparency-heavy benchmarks such as ClearPose and DREDS.
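As a rough illustration of what "lightweight LoRA adapters" means in practice, the sketch below attaches LoRA weights to a diffusion backbone with Hugging Face peft. The base model, target modules, and rank are illustrative assumptions, not the released DKT configuration.

# Minimal sketch of LoRA-adapting a diffusion backbone with Hugging Face peft.
# The base model, target modules, and rank are illustrative stand-ins,
# NOT the released DKT configuration.
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Stand-in backbone; DKT adapts a (much larger) video diffusion model.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

lora_config = LoraConfig(
    r=16,            # adapter rank (hypothetical)
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
unet = get_peft_model(unet, lora_config)
unet.print_trainable_parameters()  # only the small LoRA weights are trainable

Because only the adapter weights are trained, the frozen backbone keeps its generative video prior while the adapters steer it toward depth and normal prediction.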

Usage

To use this model, please clone the GitHub repository and follow the installation instructions.

import os

from dkt.pipelines.pipelines import DKTPipeline
from tools.common_utils import save_video

# Initialize the pipeline
pipe = DKTPipeline()

# Path to your input video
demo_path = 'examples/1.mp4'
prediction = pipe(demo_path)

# Save the output
save_dir = 'logs'
os.makedirs(save_dir, exist_ok=True)
output_path = os.path.join(save_dir, 'demo.mp4')
save_video(prediction['colored_depth_map'], output_path, fps=25)
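
The pipeline call above extends naturally to multiple videos. A minimal batch variant, assuming the same 'colored_depth_map' output key and illustrative paths:

# Batch variant of the snippet above: run the pipeline over every MP4
# in the examples folder and save each result under logs/.
import glob

for demo_path in sorted(glob.glob('examples/*.mp4')):
    prediction = pipe(demo_path)
    name = os.path.splitext(os.path.basename(demo_path))[0]
    save_video(prediction['colored_depth_map'],
               os.path.join(save_dir, f'{name}_depth.mp4'), fps=25)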

Citation

@article{dkt2025,
  title   = {Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation},
  author  = {Shaocong Xu and Songlin Wei and Qizhe Wei and Zheng Geng and Hong Li and Licheng Shen and Qianpu Sun and Shu Han and Bin Ma and Bohan Li and Chongjie Ye and Yuhang Zheng and Nan Wang and Saining Zhang and Hao Zhao},
  journal = {arXiv preprint arXiv:2512.23705},
  year    = {2025}
}