SimpleNews.ai

ZipMap Achieves 20x Faster 3D Reconstruction with Linear-Time Complexity

Thursday, March 5, 2026

Researchers from Google and UC Berkeley have published ZipMap, a new 3D reconstruction model that processes over 700 frames in under 10 seconds on a single GPU—more than 20 times faster than current state-of-the-art methods while maintaining or exceeding their accuracy. The breakthrough addresses a critical bottleneck in feed-forward transformer models, which previously scaled quadratically with input image count.

Linear Complexity Replaces Quadratic Bottleneck

Previous state-of-the-art methods like VGGT (Visual Geometry Grounded Transformer) suffer from O(n²) computational complexity because they process every pair of images in a collection. For large-scale reconstructions with hundreds of frames, this quadratic scaling becomes computationally prohibitive. ZipMap introduces a stateful feed-forward architecture that achieves O(n) linear complexity by compressing entire image sequences into compact hidden scene states.

The model employs test-time training layers to "zip" image collections into this compact representation in a single forward pass. This stateful approach enables bidirectional 3D reconstruction without sacrificing the accuracy gains that made quadratic-time methods successful.
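The cost difference is easiest to see in a toy sketch. The snippet below is illustrative only: the running-average update and the stand-in "encoder" are placeholders, not ZipMap's learned test-time-training layers. What it does show accurately is the scaling: a stateful pass touches each frame once (O(n)), where a pairwise method would compare every frame with every other (O(n²)).

```python
import numpy as np

def zip_frames(frames, state_dim=64):
    """Fold a frame sequence into one fixed-size hidden state.

    Illustrative O(n) sketch: each frame updates the state exactly
    once, so cost grows linearly with the number of frames. The real
    ZipMap update is a learned test-time-training layer, not this
    toy running average.
    """
    state = np.zeros(state_dim)
    for i, frame in enumerate(frames, start=1):
        feat = frame.reshape(-1)[:state_dim]   # stand-in "encoder"
        state += (feat - state) / i            # O(1) work per frame
    return state

frames = [np.random.rand(8, 8) for _ in range(700)]
state = zip_frames(frames)

# 700 frames -> 700 state updates, versus 700*699/2 = 244,650
# pairwise computations for an all-pairs O(n^2) method.
print(state.shape)  # prints (64,)
```

However the learned update is implemented, the key property is the one this sketch preserves: the state stays a fixed size no matter how many frames are folded in.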

Performance Metrics and Real-Time Capabilities

ZipMap demonstrates significant performance improvements across multiple dimensions:

  • Reconstructs 700+ frames in under 10 seconds on a single H100 GPU
  • Achieves over 20x speedup compared to VGGT
  • Maintains accuracy matching or exceeding that of quadratic-time methods
  • Supports real-time scene-state querying from compact hidden representations
  • Extends to sequential streaming reconstruction for continuous video

The compact scene representation allows querying 3D information without full reconstruction, opening applications in robotics, augmented reality, virtual reality, and autonomous systems where real-time performance is essential.
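Why querying is cheap follows directly from the architecture: once a scene is zipped, answering a query touches only the fixed-size state, not the original frames. The sketch below is hypothetical (the decoder here is a random linear map standing in for whatever learned query head ZipMap actually uses), but it captures the point that query latency is independent of how many frames were processed.

```python
import numpy as np

# Hypothetical sketch of scene-state querying: the compact state and
# the decoder are the only things a query reads, so cost does not
# depend on the number of input frames.
rng = np.random.default_rng(0)
scene_state = rng.standard_normal(64)    # compact hidden scene state
decode = rng.standard_normal((3, 64))    # stand-in learned decoder

def query(ray_embedding):
    """Map a query (e.g. an embedded camera ray) to a 3D prediction."""
    return decode @ (scene_state * ray_embedding)

pred = query(rng.standard_normal(64))
print(pred.shape)  # prints (3,)
```

This constant-time query path is what makes the robotics and AR/VR use cases plausible: the expensive zipping happens once, and downstream consumers read from the state.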

Research Team and Availability

The paper, published March 4, 2026, on arXiv, represents a collaboration between Google Research and UC Berkeley. Authors include Haian Jin, Rundi Wu, Tianyuan Zhang, Ruiqi Gao, Jonathan T. Barron, Noah Snavely, and Aleksander Holynski.

Key Takeaways

  • ZipMap achieves linear O(n) computational complexity compared to quadratic O(n²) scaling in previous state-of-the-art 3D reconstruction methods
  • The model reconstructs over 700 frames in under 10 seconds on a single H100 GPU, representing a 20x speedup over VGGT
  • ZipMap matches or exceeds the accuracy of slower quadratic-time methods while maintaining real-time performance
  • The stateful architecture compresses entire image sequences into compact hidden scene states that enable querying without full reconstruction
  • Applications include robotics, AR/VR, and autonomous systems requiring real-time 3D scene understanding