DUSt3R

Tesla · Paper · Computer Vision · 3D Reconstruction

2025-08-01 16:08:39

Type: Paper
Notebook: Paper (https://www.notion.so/Paper-17de7e7bfd4c80e2bba1f0fe3a6c1131?pvs=21)

Motivation

The traditional SfM pipeline (estimating the positions of 3D points from a sparse set of correspondences across multiple images and their image features) divides the problem into sub-tasks such as feature extraction, feature matching, and camera parameterization.

Each sub-task, however, feeds its errors into the next one.

DUSt3R therefore builds the model as a single end-to-end network.

Input & Output

Two images of the same scene, captured from two different views

PointMap (HxWx3)

For each pixel (x, y), the pointmap records the 3D coordinate of the closest object surface along the ray from the camera through that pixel; surfaces seen through a translucent object are therefore hidden (a short sketch at the end of this section illustrates the pointmap).

ConfidenceMap (HxW)

The network's confidence that each corresponding point in the pointmap is correct
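One concrete way to picture the pointmap: if a view's depth map and camera intrinsics were known, every pixel would back-project to the 3D point it observes. The sketch below shows that relationship (the function name `pointmap_from_depth` and the `depth`/`K` inputs are illustrative assumptions; DUSt3R itself regresses the pointmap directly from the images, without being given depth or intrinsics):

```python
import numpy as np

def pointmap_from_depth(depth, K):
    """Back-project a depth map (H, W) into a pointmap (H, W, 3)
    expressed in the camera frame, given intrinsics K (3, 3)."""
    H, W = depth.shape
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))      # pixel coordinates, each (H, W)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # homogeneous pixels, (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                       # per-pixel camera rays
    return rays * depth[..., None]                        # scale by depth -> 3D points
```

Each entry of the result holds the 3D coordinate of the closest surface along that pixel's ray, matching the HxWx3 pointmap described above.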

Network

  1. Each image is fed into a ViT encoder with shared weights, producing two sets of token features F1 and F2.
  2. Two Transformer decoders perform self-attention within each view and exchange information across views via cross-attention.
  3. Regression heads output the pointmaps and confidence maps (sketched below).
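A PyTorch-style sketch of this two-branch flow, as a rough illustration rather than the official implementation; the `encoder`, `decoders`, and `heads` modules are hypothetical placeholders for the shared ViT encoder, the two cross-attending decoders, and the regression heads:

```python
import torch
import torch.nn as nn

class TwoViewSketch(nn.Module):
    """Minimal sketch: a shared ViT encoder, two decoders that exchange
    information via cross-attention, and per-view regression heads."""

    def __init__(self, encoder: nn.Module, decoders: nn.ModuleList, heads: nn.ModuleList):
        super().__init__()
        self.encoder = encoder    # one ViT applied to both images (shared weights)
        self.decoders = decoders  # decoders[0] for view 1, decoders[1] for view 2
        self.heads = heads        # heads[v] -> (pointmap, confidence) for view v

    def forward(self, img1: torch.Tensor, img2: torch.Tensor):
        # 1. Shared-weight encoder produces token features F1 and F2
        F1, F2 = self.encoder(img1), self.encoder(img2)

        # 2. Each decoder attends to its own tokens (self-attention) and to
        #    the other view's tokens (cross-attention)
        G1 = self.decoders[0](F1, F2)
        G2 = self.decoders[1](F2, F1)

        # 3. Heads regress a pointmap (B, H, W, 3) and a confidence map (B, H, W)
        return self.heads[0](G1), self.heads[1](G2)
```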

Loss Function

3D Regression Loss

$$l_{regr}(v,i)=\left\|\frac{1}{z}\bar{X}_i^{v}-\frac{1}{z}X_i^{v}\right\|$$

The factor $\frac{1}{z}$ normalizes the points, where $z$ is the average distance of the 3D points from the origin.

It is the 3D distance error between the ground-truth points and the predicted pointmap points.
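A minimal PyTorch sketch of this normalized 3D error, as an illustration of the formula above rather than the official code; here each point set is divided by its own average distance from the origin, which is one way to read the $\frac{1}{z}$ normalization:

```python
import torch

def regression_error(pred_pts: torch.Tensor, gt_pts: torch.Tensor) -> torch.Tensor:
    """pred_pts, gt_pts: (N, 3) predicted and ground-truth 3D points.
    Returns the per-point Euclidean error after rescaling each point set
    by its average distance from the origin (the 1/z normalization)."""
    z_pred = pred_pts.norm(dim=-1).mean()   # average distance of predicted points
    z_gt = gt_pts.norm(dim=-1).mean()       # average distance of ground-truth points
    return (pred_pts / z_pred - gt_pts / z_gt).norm(dim=-1)
```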

Confidence-aware Loss

$$L_{conf}=\sum_{v=1,2}\sum_{i\in D^v}C_i^{v,1}\,l_{regr}(v,i)-\alpha\log C_i^{v,1}$$

The 3D regression error of each point is multiplied by its predicted confidence, while the $-\alpha\log C_i^{v,1}$ term penalizes low confidence values, so the network learns to assign lower confidence to points whose errors are larger.
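A matching sketch of the confidence weighting, reusing the hypothetical `regression_error` above; the sum over the two views is omitted and `alpha` is only a placeholder value:

```python
import torch

def confidence_loss(pred_pts, gt_pts, conf, alpha=0.2):
    """conf: (N,) positive confidence values for one view.
    Errors are weighted by their confidence, and the -alpha * log(conf)
    term keeps the network from driving every confidence to zero."""
    per_point = regression_error(pred_pts, gt_pts)             # (N,) normalized 3D errors
    return (conf * per_point - alpha * torch.log(conf)).mean()
```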
