Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion
Abstract
VisualSync aligns unposed, unsynchronized videos using multi-view dynamics and epipolar constraints, outperforming existing methods with millisecond accuracy.
Today, people can easily record memorable moments, such as concerts, sports events, lectures, family gatherings, and birthday parties, with multiple consumer cameras. However, synchronizing these cross-camera streams remains challenging. Existing methods assume controlled settings, specific targets, manual correction, or costly hardware. We present VisualSync, an optimization framework based on multi-view dynamics that aligns unposed, unsynchronized videos at millisecond accuracy. Our key insight is that any moving 3D point, when co-visible in two cameras, obeys epipolar constraints once properly synchronized. To exploit this, VisualSync leverages off-the-shelf 3D reconstruction, feature matching, and dense tracking to extract tracklets, relative poses, and cross-view correspondences. It then jointly minimizes the epipolar error to estimate each camera's time offset. Experiments on four diverse, challenging datasets show that VisualSync outperforms baseline methods, achieving a median synchronization error below 50 ms.
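The core idea of the abstract can be illustrated with a small sketch: if a moving 3D point is co-visible in two calibrated views, its matched projections satisfy the epipolar constraint only at the correct temporal alignment, so the time offset can be recovered by minimizing epipolar error over candidate shifts. The sketch below is illustrative only, not the paper's implementation: all function names are hypothetical, it grid-searches integer frame offsets (VisualSync estimates continuous, sub-frame offsets), and it uses synthetic tracks with a known essential matrix.

```python
import numpy as np

def epipolar_error(F, x1, x2):
    """Mean symmetric epipolar distance between matched homogeneous points (m, 3)."""
    l2 = (F @ x1.T).T                      # epipolar lines in view 2
    l1 = (F.T @ x2.T).T                    # epipolar lines in view 1
    num = np.abs(np.sum(x2 * l2, axis=1))  # |x2^T F x1|, same for both views
    d = num / np.linalg.norm(l2[:, :2], axis=1) + num / np.linalg.norm(l1[:, :2], axis=1)
    return d.mean()

def estimate_offset(F, track1, track2, max_shift):
    """Grid-search the integer frame offset of track2 that minimizes epipolar error."""
    best_err, best_shift = np.inf, 0
    n = len(track1)
    for s in range(-max_shift, max_shift + 1):
        idx = np.arange(max(0, -s), min(n, n - s))  # overlapping frames after shift
        err = epipolar_error(F, track1[idx], track2[idx + s])
        if err < best_err:
            best_err, best_shift = err, s
    return best_shift

# Synthetic demo: a 3D point on a curved trajectory, seen by two cameras.
t = np.arange(100.0)
X = np.stack([np.sin(0.1 * t), np.cos(0.13 * t), 5.0 + 0.02 * t], axis=1)

def project(P, trans):
    """Normalized pinhole projection after translating the camera by `trans`."""
    Pc = P + trans
    return np.stack([Pc[:, 0] / Pc[:, 2], Pc[:, 1] / Pc[:, 2], np.ones(len(Pc))], axis=1)

trans2 = np.array([1.0, 0.0, 0.0])
E = np.array([[0.0, 0.0, 0.0],            # essential matrix [t]_x for R = I
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

true_shift = 7                             # camera 2's stream lags by 7 frames
track1 = project(X, np.zeros(3))
track2 = project(np.roll(X, true_shift, axis=0), trans2)

print(estimate_offset(E, track1, track2, max_shift=20))
```

Because `np.roll` wraps a few frames at the boundary, the overlap window still contains enough consistent matches for the correct shift to dominate; with real tracklets, robust losses and sub-frame interpolation would replace the plain mean and integer grid.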
Community
VisualSync (NeurIPS 2025) aligns unsynchronized multi-view videos by matching object motion with epipolar cues. The synchronized outputs can benefit dynamic reconstruction, novel view synthesis, and multi-view data engines.
Project Page: https://stevenlsw.github.io/visualsync/
Code: https://github.com/stevenlsw/visualsync
Automated message from the Librarian Bot: the following similar papers were recommended by the Semantic Scholar API.
- Kineo: Calibration-Free Metric Motion Capture From Sparse RGB Cameras (2025)
- MV-TAP: Tracking Any Point in Multi-View Videos (2025)
- C4D: 4D Made from 3D through Dual Correspondences (2025)
- RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems (2025)
- DKPMV: Dense Keypoints Fusion from Multi-View RGB Frames for 6D Pose Estimation of Textureless Objects (2025)
- EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes (2025)
- Human3R: Everyone Everywhere All at Once (2025)