---
license: apache-2.0
language:
- en
base_model:
- stabilityai/stable-diffusion-2
pipeline_tag: depth-estimation
---

<h1 align="center"><strong>DepthMaster: Taming Diffusion Models for Monocular Depth Estimation</strong></h1>
<p align="center">
<a href="https://indu1ge.github.io/ziyangsong">Ziyang Song*</a>,
<a href="https://orcid.org/0009-0001-6677-0572">Zerong Wang*</a>,
<a href="https://orcid.org/0000-0001-7817-0665">Bo Li</a>,
<a href="https://orcid.org/0009-0007-1175-5918">Hao Zhang</a>,
<a href="https://ruijiezhu94.github.io/ruijiezhu/">Ruijie Zhu</a>,
<a href="https://orcid.org/0009-0004-3280-8490">Li Liu</a>,
<a href="https://pengtaojiang.github.io/">Peng-Tao Jiang†</a>,
<a href="http://staff.ustc.edu.cn/~tzzhang/">Tianzhu Zhang†</a>
<br>
*Equal Contribution, †Corresponding Author
<br>
University of Science and Technology of China, vivo Mobile Communication Co., Ltd.
<br>
<b>arXiv 2025</b>
</p>

<div align="center">
<a href='https://arxiv.org/abs/2501.02576'>
<img src='https://img.shields.io/badge/Paper-arXiv-red'>
</a>
<a href='https://indu1ge.github.io/DepthMaster_page/'>
<img src='https://img.shields.io/badge/Project-Page-Green'>
</a>
<a href='https://github.com/indu1ge/DepthMaster'>
<img src='https://img.shields.io/badge/GitHub-Repository-blue?logo=github'>
</a>
<a href='https://www.apache.org/licenses/LICENSE-2.0'>
<img src='https://img.shields.io/badge/License-Apache--2.0-929292'>
</a>
</div>


![teaser](assets/framework.png)

>We present DepthMaster, a tamed single-step diffusion model that customizes generative features in diffusion models to suit the discriminative depth estimation task. We introduce a Feature Alignment module to mitigate overfitting to texture and a Fourier Enhancement module to refine fine-grained details. DepthMaster exhibits state-of-the-art zero-shot performance and superior detail preservation ability, surpassing other diffusion-based methods across various datasets.
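
To give a rough intuition for what "refining fine-grained details" in the frequency domain can mean, here is a minimal pure-Python toy. It is not the paper's Fourier Enhancement module, and the function names, the `cutoff` parameter, and the sample scanline are assumptions made for illustration only. It splits a single scanline of depth values into a smooth low-frequency part and a high-frequency detail part with a naive discrete Fourier transform; practical code would use an FFT library on full 2D maps.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_frequencies(signal, cutoff):
    """Split a real signal into low- and high-frequency parts (hypothetical helper).

    Bins whose frequency index min(k, n - k) is at most `cutoff` form the
    low-frequency part; everything else is the high-frequency part.
    The two parts sum back to the original signal.
    """
    n = len(signal)
    spectrum = dft(signal)
    # The mask is symmetric in k and n - k, so both parts stay real-valued.
    low_spec = [c if min(k, n - k) <= cutoff else 0 for k, c in enumerate(spectrum)]
    high_spec = [c - l for c, l in zip(spectrum, low_spec)]
    low = [v.real for v in idft(low_spec)]
    high = [v.real for v in idft(high_spec)]
    return low, high


# Illustrative scanline: a gentle slope with one sharp depth edge.
scanline = [1.0, 1.1, 1.2, 1.3, 3.0, 3.1, 3.2, 3.3]
low, high = split_frequencies(scanline, cutoff=1)
print("low :", [round(v, 3) for v in low])
print("high:", [round(v, 3) for v in high])
```

The high-frequency part carries the sharp edge, which is the kind of content a detail-refinement stage would focus on, while the low-frequency part carries the overall scene layout.
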

## 🎓 Citation

Please cite our paper:

```bibtex
@article{song2025depthmaster,
  title={DepthMaster: Taming Diffusion Models for Monocular Depth Estimation},
  author={Song, Ziyang and Wang, Zerong and Li, Bo and Zhang, Hao and Zhu, Ruijie and Liu, Li and Jiang, Peng-Tao and Zhang, Tianzhu},
  journal={arXiv preprint arXiv:2501.02576},
  year={2025}
}
```

## Acknowledgements

The code is based on [Marigold](https://github.com/prs-eth/Marigold).

## 🎫 License

This work is licensed under the Apache License, Version 2.0 (see [LICENSE](LICENSE.txt)).

By downloading and using the code and model, you agree to the terms in the [LICENSE](LICENSE.txt).

[![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)