BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects


NVIDIA
CVPR 2023

Abstract

We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is made about the interaction agent. Key to our method is a Neural Object Field that is learned concurrently with a pose graph optimization process in order to robustly accumulate information into a consistent 3D representation capturing both geometry and appearance. A dynamic pool of posed memory frames is automatically maintained to facilitate communication between these threads. Our approach handles challenging sequences with large pose changes, partial and full occlusion, untextured surfaces, and specular highlights. We show results on HO3D, YCBInEOAT, and BEHAVE datasets, demonstrating that our method significantly outperforms existing approaches.




Video




Overview

Framework overview. First, features are matched between consecutive segmented images, to obtain a coarse pose estimate (Sec. 3.1). Some of these posed frames are stored in a memory pool, to be used and refined later (Sec. 3.2). A pose graph is dynamically created from a subset of the memory pool (Sec. 3.3); online optimization refines all the poses in the graph jointly with the current pose. These updated poses are then stored back in the memory pool. Finally, all the posed frames in the memory pool are used to learn a Neural Object Field (in a separate thread) that models both geometry and visual texture (Sec. 3.4) of the object, while adjusting their previously estimated poses.

Interactive Results
Dataset
HO3D AP13 HO3D SM1 BEHAVE Chair BEHAVE Table YCBinEOAT
Ground Truth Ours BundleTrack DROID SDF2SDF NICESLAM KinectFusion

Ground Truth Ours BundleTrack DROID SDF2SDF NICESLAM KinectFusion

automatically synchronized with video above

Our Method in the Wild
Pumpkin doll (captured by iPhone)

Tracking and reconstruction over time

Final reconstruction (interactable)
Milk jug (captured by Intel RealSense)

Tracking and reconstruction over time

Final reconstruction (interactable)
Static Scene Results
Our method not only works for the more challenging dynamic scenes, but also applies to commonly considered static scenes (moving camera). We achieve better or comparable results to those methods specially designed for static scenes.
Ours KinectFusion DROID-SLAM NICE-SLAM

BibTeX

@article{wen2023bundlesdf,
  title={BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects},
  author={Wen, Bowen and Tremblay, Jonathan and Blukis, Valts and Tyree, Stephen and Muller, Thomas and Evans, Alex and Fox, Dieter and Kautz, Jan and Birchfield, Stan},
  journal={CVPR},
  year={2023}
}
Contact
For questions, feel free to contact Bowen Wen (bowenw@nvidia.com)