EGS-SLAM

Tesla
Paper · Computer Vision

2025-08-19 00:53:48

Content

  1. Background
  2. Model
  3. Blur-Aware Tracking
    1. Camera Trajectory
    2. Photometric Loss
      1. Blurred Image
      2. CRF
    3. Event Loss
      1. Event Frame
      2. Loss Function
    4. Depth Loss
    5. Pose Optimization
  4. Mapping
    1. Keyframe Management
    2. 3DGS Map Updating
  5. Experiment
    1. Replica
    2. DEVD

Background

  • Feature-based (ORB-SLAM):
    • Sensitive to motion blur
    • Poor reconstruction in weakly textured regions
  • Deblur GS/NeRF:
    • Offline and image-only
  • GS-SLAM:
    • Vulnerable to intense motion blur
  • Event-based (IncEventGS, E-NeRF):
    • Offline, relying on SfM for poses
  • EGS-SLAM:
    • Online SLAM system robust to motion blur

Model

The first component is the 3D Gaussian map representation, in which the scene is represented by Gaussian primitives (each carrying a position, covariance matrix, opacity, and color parameters). For each incoming frame, tracking minimizes the photometric, event, and depth rendering losses to optimize the camera poses at the start and end of the exposure. Once pose optimization converges, the tracker decides whether to select the frame as a keyframe and updates the keyframe queue. When a new keyframe is added, new 3D Gaussians are initialized and jointly optimized under the same losses, as sketched below.
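As a rough picture of this loop, here is a minimal pseudocode sketch; every name in it (`track`, `is_keyframe`, `add_gaussians`, `optimize`) is a hypothetical placeholder, not the paper's actual API:

```python
# Hypothetical sketch of one EGS-SLAM iteration; all names are placeholders.
def egs_slam_step(frame, events, depth, gaussian_map, keyframes):
    # Tracking: optimize the exposure's start/end poses against the current map
    T0, T_tau = track(frame, events, depth, gaussian_map)
    # Keyframe decision and window update
    if is_keyframe(T0, T_tau, keyframes):
        keyframes.append((frame, events, depth, T0, T_tau))
        # Mapping: seed new Gaussians and jointly optimize with all losses
        gaussian_map.add_gaussians(frame, depth, T_tau)
        gaussian_map.optimize(keyframes)
    return gaussian_map, keyframes
```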

Blur-Aware Tracking

Camera Trajectory

$$T(\eta)=\begin{bmatrix}\mathrm{Slerp}\!\left(R_0,R_\tau,\tfrac{\eta}{\tau}\right) & \left(1-\tfrac{\eta}{\tau}\right)\mathbf{t}_0+\tfrac{\eta}{\tau}\mathbf{t}_\tau\\ \mathbf{0} & 1\end{bmatrix}$$

This temporal interpolation model of the intra-exposure trajectory is proposed because a precise camera pose is required at each event timestamp.
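A minimal sketch of this interpolation in Python, using SciPy's `Slerp`; `R0`, `t0`, `R_tau`, `t_tau` are the rotation matrices and translations at the start and end of the exposure:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_pose(R0, t0, R_tau, t_tau, eta, tau):
    """Pose T(eta) at time eta within the exposure interval [0, tau]."""
    s = eta / tau
    # Spherical linear interpolation between the two rotations
    key_rots = Rotation.from_matrix(np.stack([R0, R_tau]))
    R_eta = Slerp([0.0, 1.0], key_rots)([s]).as_matrix()[0]
    # Linear interpolation between the two translations
    t_eta = (1.0 - s) * t0 + s * t_tau
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R_eta, t_eta
    return T
```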

Photometric Loss

Blurred Image

Blurred image = time average of the sharp latent images over the exposure time:

$$\tilde{I}(\boldsymbol{u})=\frac{1}{\tau}\int_0^\tau\mathcal{I}(T(\eta),\boldsymbol{u})\,d\eta\approx\frac{1}{K}\sum_{k=0}^{K-1}\mathcal{I}(T(\eta_k),\boldsymbol{u})$$
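A sketch of the discrete approximation, reusing `interpolate_pose` from above; `render` stands in for a hypothetical sharp-image renderer at a given pose:

```python
import numpy as np

def synthesize_blur(render, R0, t0, R_tau, t_tau, tau, K=8):
    # Average K sharp renders sampled along the intra-exposure trajectory
    etas = np.linspace(0.0, tau, K)
    frames = [render(interpolate_pose(R0, t0, R_tau, t_tau, eta, tau))
              for eta in etas]
    return np.mean(frames, axis=0)
```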

CRF

A learnable camera response function (CRF) bridges the rendered HDR image and the observed LDR space, suppressing overexposure and underexposure.

$$\mathrm{CRF}_{\mathrm{leaky}}(\tilde{I}(\boldsymbol{u}))=\begin{cases}\alpha\tilde{I}(\boldsymbol{u}), & \text{if }\tilde{I}(\boldsymbol{u})<0\\ \mathrm{Interp}(\tilde{I}(\boldsymbol{u}),\boldsymbol{Q}), & \text{if }0\le\tilde{I}(\boldsymbol{u})\le1\\ -\frac{\alpha}{\sqrt{\tilde{I}(\boldsymbol{u})}}+\alpha+1, & \text{if }\tilde{I}(\boldsymbol{u})>1\end{cases}$$
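A minimal NumPy sketch of the leaky CRF, assuming $\boldsymbol{Q}$ is a vector of learnable control values on a uniform grid over $[0,1]$ so that `Interp` reduces to piecewise-linear interpolation (the paper's exact parameterization may differ):

```python
import numpy as np

def crf_leaky(I, Q, alpha=0.01):
    I = np.asarray(I, dtype=np.float64)
    out = np.empty_like(I)
    under, over = I < 0, I > 1
    inside = ~(under | over)
    out[under] = alpha * I[under]                       # leak below 0
    grid = np.linspace(0.0, 1.0, len(Q))
    out[inside] = np.interp(I[inside], grid, Q)         # learned tone curve
    out[over] = -alpha / np.sqrt(I[over]) + alpha + 1   # soft saturation above 1
    return out
```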

Event Loss

Event Frame

  • $E_k(\boldsymbol{u})$ is the ground-truth event frame, obtained by aggregating the events within the exposure time $\tau$

  • $\hat{E}_k(\boldsymbol{u})=\log(\hat{B}(T(\eta_k),\boldsymbol{u}))-\log(\hat{B}(T(\eta_{k-1}),\boldsymbol{u}))$, where $\hat{B}$ is the grayscale image rendered from the Gaussian map; the rendered event frame is thus the log-brightness difference between two consecutive virtual frames.
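A sketch of the rendered event frame; `render_gray` is a hypothetical function returning the grayscale render $\hat{B}$ at a pose, and `eps` guards the logarithm:

```python
import numpy as np

def rendered_event_frame(render_gray, T_prev, T_curr, eps=1e-6):
    # Log-brightness difference between two consecutive virtual frames
    return (np.log(render_gray(T_curr) + eps)
            - np.log(render_gray(T_prev) + eps))
```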

Loss Function

  • Here $\theta$ is the predefined contrast threshold for triggering events, which also serves as the scale factor relating event counts to brightness change.

$$L_{HE}=\frac{1}{K}\sum_{k=0}^{K-1}\sum_{E_k(\boldsymbol{u})\neq0}\left\|\theta\cdot E_k(\boldsymbol{u})-\hat{E}_k(\boldsymbol{u})\right\|_1$$

  • The “No-Event” loss penalizes event predictions at pixels where no events were observed in the current frame, suppressing artifacts and accelerating convergence.

$$L_{NE}=\frac{1}{K}\sum_{k=0}^{K-1}\sum_{E_k(\boldsymbol{u})=0}\left\|\hat{E}_k(\boldsymbol{u})\right\|_1$$

  • Total event loss:

$$L_E=L_{HE}+\lambda_{NE}L_{NE}$$
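A sketch of the total event loss under the definitions above; `gt_events[k]` and `pred_events[k]` are assumed to be per-interval event frames as NumPy arrays:

```python
import numpy as np

def event_loss(gt_events, pred_events, theta, lam_ne=1.0):
    K = len(gt_events)
    l_he = l_ne = 0.0
    for E, E_hat in zip(gt_events, pred_events):
        fired = E != 0
        # L_HE: L1 loss where events fired, with threshold theta as the scale
        l_he += np.abs(theta * E[fired] - E_hat[fired]).sum()
        # L_NE: penalize predicted events where none were observed
        l_ne += np.abs(E_hat[~fired]).sum()
    return l_he / K + lam_ne * (l_ne / K)
```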

Depth Loss

  • The depth loss takes the minimum L1 residual between the observed depth and the depth rendered at each of the $K$ interpolated poses:

$$L_D=\min_k\|D_{\mathrm{obs}}-\mathcal{D}(T(\eta_k))\|_1$$
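A sketch of this loss, with `render_depth` a hypothetical depth renderer and `poses` the $K$ interpolated intra-exposure poses; the summed per-pixel L1 residual is one way to realize the norm:

```python
import numpy as np

def depth_loss(render_depth, D_obs, poses):
    # Minimum L1 residual over the K interpolated intra-exposure poses
    return min(np.abs(D_obs - render_depth(T)).sum() for T in poses)
```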

Pose Optimization

$$T_0^*,T_\tau^*=\arg\min_{T_0,T_\tau}\left(\lambda_E L_E+\lambda_{ID}\left(\lambda_I L_I+\lambda_D L_D\right)\right)$$

Mapping

Keyframe Management

  • Compute the IoU between the current frame and the previous keyframe
  • Remove historical keyframes that have low IoU with the current frame
  • If no frame is removed, drop the most redundant keyframe (see the sketch below)
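A minimal sketch of this rule, assuming each keyframe exposes the set of Gaussian ids it observes as `kf.visible`, so IoU can be computed as set overlap (a simplification of whatever covisibility measure the paper uses):

```python
def update_keyframes(keyframes, current, iou_thresh=0.3):
    def iou(a, b):
        union = a.visible | b.visible
        return len(a.visible & b.visible) / max(len(union), 1)
    # Remove historical keyframes with low IoU against the current frame
    kept = [kf for kf in keyframes if iou(kf, current) >= iou_thresh]
    # If nothing was removed, drop the most redundant (highest-overlap) one
    if len(kept) == len(keyframes) and kept:
        kept.remove(max(kept, key=lambda kf: iou(kf, current)))
    kept.append(current)
    return kept
```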

3DGS Map Updating

  • After a keyframe is selected, the 3D Gaussian map is updated by minimizing:

$$L_{map}=\lambda_{iso}L_{iso}+\sum_{w\in\mathbf{W}'}\left(\lambda_E L_E^w+\lambda_{ID}\left(\lambda_I L_I^w+\lambda_D L_D^w\right)\right)$$

Note that $L_{iso}=\frac{1}{|\mathbb{G}|}\sum_{i=1}^{|\mathbb{G}|}\|s_i-\mathbf{1}\cdot\overline{s}_i\|$, an isotropy regularizer that penalizes skinny (highly anisotropic) Gaussians by pulling each scale vector $s_i$ toward its mean $\overline{s}_i$.
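A sketch of this regularizer, assuming `scales` is an $(N,3)$ array of per-Gaussian scale vectors:

```python
import numpy as np

def iso_loss(scales):
    # Penalize deviation of each scale vector from its own mean,
    # discouraging skinny (highly anisotropic) Gaussians
    mean = scales.mean(axis=1, keepdims=True)
    return np.abs(scales - mean).sum(axis=1).mean()
```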

Experiment

Replica

Replica is a synthetic indoor dataset, used to evaluate the performance of the proposed method.

DEVD

DEVD is a real-world dataset created by the authors; it contains six scenes captured at different motion speeds.
