SimpleNews.ai

ZipMap Achieves 20x Faster 3D Reconstruction with Linear-Time Complexity

Thursday, March 5, 2026

Researchers from Google and UC Berkeley have published ZipMap, a new 3D reconstruction model that processes over 700 frames in under 10 seconds on a single GPU—more than 20 times faster than current state-of-the-art methods while maintaining or exceeding their accuracy. The breakthrough addresses a critical bottleneck in feed-forward transformer models, which previously scaled quadratically with input image count.

Linear Complexity Replaces Quadratic Bottleneck

Previous state-of-the-art methods like VGGT (Visual Geometry Grounded Transformer) suffer from O(n²) computational complexity because they process every pair of images in a collection. For large-scale reconstructions with hundreds of frames, this quadratic scaling becomes computationally prohibitive. ZipMap introduces a stateful feed-forward architecture that achieves O(n) linear complexity by compressing entire image sequences into compact hidden scene states.

The model employs test-time training layers to "zip" image collections into this compact representation in a single forward pass. This stateful approach enables bidirectional 3D reconstruction without sacrificing the accuracy gains that made quadratic-time methods successful.
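The cost difference is easiest to see in a toy sketch. The snippet below is illustrative only: the running-average update and the stand-in "encoder" are placeholders, not ZipMap's learned test-time-training layers. What it does show accurately is the scaling: a stateful pass touches each frame once (O(n)), where a pairwise method would compare every frame with every other (O(n²)).

```python
import numpy as np

def zip_frames(frames, state_dim=64):
    """Fold a frame sequence into one fixed-size hidden state.

    Illustrative O(n) sketch: each frame updates the state exactly
    once, so cost grows linearly with the number of frames. The real
    ZipMap update is a learned test-time-training layer, not this
    toy running average.
    """
    state = np.zeros(state_dim)
    for i, frame in enumerate(frames, start=1):
        feat = frame.reshape(-1)[:state_dim]   # stand-in "encoder"
        state += (feat - state) / i            # O(1) work per frame
    return state

frames = [np.random.rand(8, 8) for _ in range(700)]
state = zip_frames(frames)

# 700 frames -> 700 state updates, versus 700*699/2 = 244,650
# pairwise computations for an all-pairs O(n^2) method.
print(state.shape)  # prints (64,)
```

However the learned update is implemented, the key property is the one this sketch preserves: the state stays a fixed size no matter how many frames are folded in.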

Performance Metrics and Real-Time Capabilities

ZipMap demonstrates significant performance improvements across multiple dimensions:

  • Reconstructs 700+ frames in under 10 seconds on a single H100 GPU
  • Achieves over 20x speedup compared to VGGT
  • Maintains accuracy matching or exceeding that of quadratic-time methods
  • Supports real-time scene-state querying from compact hidden representations
  • Extends to sequential streaming reconstruction for continuous video

The compact scene representation allows querying 3D information without full reconstruction, opening applications in robotics, augmented reality, virtual reality, and autonomous systems where real-time performance is essential.
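Why querying is cheap follows directly from the architecture: once a scene is zipped, answering a query touches only the fixed-size state, not the original frames. The sketch below is hypothetical (the decoder here is a random linear map standing in for whatever learned query head ZipMap actually uses), but it captures the point that query latency is independent of how many frames were processed.

```python
import numpy as np

# Hypothetical sketch of scene-state querying: the compact state and
# the decoder are the only things a query reads, so cost does not
# depend on the number of input frames.
rng = np.random.default_rng(0)
scene_state = rng.standard_normal(64)    # compact hidden scene state
decode = rng.standard_normal((3, 64))    # stand-in learned decoder

def query(ray_embedding):
    """Map a query (e.g. an embedded camera ray) to a 3D prediction."""
    return decode @ (scene_state * ray_embedding)

pred = query(rng.standard_normal(64))
print(pred.shape)  # prints (3,)
```

This constant-time query path is what makes the robotics and AR/VR use cases plausible: the expensive zipping happens once, and downstream consumers read from the state.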

Research Team and Availability

The paper, published March 4, 2026, on arXiv, represents a collaboration between Google Research and UC Berkeley. Authors include Haian Jin, Rundi Wu, Tianyuan Zhang, Ruiqi Gao, Jonathan T. Barron, Noah Snavely, and Aleksander Holynski.

Key Takeaways

  • ZipMap achieves linear O(n) computational complexity compared to quadratic O(n²) scaling in previous state-of-the-art 3D reconstruction methods
  • The model reconstructs over 700 frames in under 10 seconds on a single H100 GPU, representing a 20x speedup over VGGT
  • ZipMap matches or exceeds the accuracy of slower quadratic-time methods while maintaining real-time performance
  • The stateful architecture compresses entire image sequences into compact hidden scene states that enable querying without full reconstruction
  • Applications include robotics, AR/VR, and autonomous systems requiring real-time 3D scene understanding