Meta's Reality Labs published research on April 2, 2026, introducing Large-Scale Codec Avatars (LCA), which the authors present as the first application of foundation-model pre-training paradigms to 3D avatar generation. The paper, accepted to CVPR 2026, follows a two-stage recipe: pretraining on 1 million in-the-wild videos, then post-training on high-quality studio data. The 46-author team demonstrated photorealistic avatars with finger-level articulation and emergent capabilities including relightability and garment physics.
Pre/Post-Training Paradigm Solves Quality-Scale Trade-off
High-quality 3D avatar modeling traditionally faces a critical trade-off. Multi-view studio data enables high-fidelity modeling with precise expression control but struggles to generalize due to limited scale and domain gaps. Conversely, large-scale models trained on millions of in-the-wild samples generalize across identities but produce low-quality avatars due to inherent 3D ambiguities.
LCA resolves this by borrowing the pre/post-training paradigm that has proven successful for large language models and vision foundation models. The team pretrains on 1 million in-the-wild videos to learn broad priors over appearance and geometry, then post-trains on high-quality curated studio data to enhance expressivity and fidelity. The authors position this as the first successful application of the approach to 3D avatar modeling at scale.
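To make the two-stage recipe concrete, here is a minimal sketch in PyTorch. Everything in it is an illustrative assumption rather than the paper's method: `AvatarModel`, the toy data generator, the reconstruction loss, and the step counts are hypothetical stand-ins for the actual architecture and datasets.

```python
# Minimal sketch of a pre/post-training recipe. All names here
# (AvatarModel, toy_loader, step counts) are hypothetical stand-ins,
# not the paper's actual architecture or data pipeline.
import torch
import torch.nn as nn


class AvatarModel(nn.Module):
    """Toy encoder-decoder: frames -> identity code -> rendered frames."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 64, dim), nn.ReLU()
        )
        self.decoder = nn.Linear(dim, 3 * 64 * 64)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(frames)).view(-1, 3, 64, 64)


def toy_loader(batch: int = 4):
    """Stand-in for a real video DataLoader; yields (input, target) pairs."""
    while True:
        frames = torch.randn(batch, 3, 64, 64)
        yield frames, frames  # self-reconstruction target


def train_stage(model: nn.Module, loader, steps: int, lr: float) -> None:
    """Both stages share one loop; only the data distribution,
    learning rate, and step budget differ."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _, (frames, target) in zip(range(steps), loader):
        loss = nn.functional.mse_loss(model(frames), target)
        opt.zero_grad()
        loss.backward()
        opt.step()


model = AvatarModel()
# Stage 1: pretrain on large, noisy in-the-wild video (broad priors).
train_stage(model, toy_loader(), steps=200, lr=1e-4)
# Stage 2: post-train on small, high-fidelity studio captures.
train_stage(model, toy_loader(), steps=50, lr=1e-5)
```

In the real system the two stages would draw from different datasets (web-scale video, then multi-view studio captures); both stages use the same toy generator here purely to keep the sketch runnable.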
System Delivers Precise Control Across Demographics and Styles
The LCA system demonstrates several key capabilities:
- Generalizes to world-scale populations in a single feedforward pass, enabling efficient inference
- Works across diverse hairstyles, clothing, and demographics
- Provides precise, fine-grained facial expressions
- Enables finger-level articulation control
- Maintains strong identity preservation
The feedforward generalization allows the model to process new subjects efficiently without requiring per-subject optimization, making it practical for real-world VR applications.
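A rough sketch of what "feedforward" buys, reusing the hypothetical `AvatarModel` and `model` from the sketch above: enrolling a new subject is a single no-gradient forward pass, versus the traditional per-subject fitting loop. Both functions, the latent-code setup, and the clip shape are assumptions for illustration only, not LCA's actual interface.

```python
# Contrast sketch (hypothetical, reusing AvatarModel/model from above):
# feedforward enrollment vs. per-subject optimization.
import torch


@torch.no_grad()
def enroll_feedforward(model: AvatarModel, clip: torch.Tensor) -> torch.Tensor:
    """One forward pass: pool per-frame embeddings into an identity code.
    No per-subject training, so new subjects cost a single inference pass."""
    return model.encoder(clip).mean(dim=0)


def enroll_by_optimization(model: AvatarModel, clip: torch.Tensor,
                           steps: int = 500) -> torch.Tensor:
    """The traditional alternative: fit a latent code to one subject
    by gradient descent against that subject's footage."""
    code = torch.zeros(256, requires_grad=True)
    opt = torch.optim.Adam([code], lr=1e-2)
    for _ in range(steps):
        recon = model.decoder(code.expand(clip.shape[0], -1)).view(-1, 3, 64, 64)
        loss = torch.nn.functional.mse_loss(recon, clip)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return code.detach()


clip = torch.randn(16, 3, 64, 64)  # hypothetical 16-frame enrollment clip
identity_code = enroll_feedforward(model, clip)  # one pass, no fitting loop
```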
Emergent Capabilities Suggest Deep 3D Understanding
The research team observed several capabilities that emerged without direct supervision:
- Generalization to relightability (responding correctly to different lighting conditions)
- Support for loose garments in unconstrained inputs
- Zero-shot robustness to stylized imagery
These emergent properties suggest the model has learned meaningful 3D priors from scale alone, much as large language models develop broad capabilities through pretraining. That relightability and garment physics emerge without explicit training on those features suggests the model captures underlying physical regularities.
Massive Team Effort Demonstrates Research Scale
The paper lists 46 authors including Junxuan Li, Rawal Khirodkar, and Chengan He from Meta's Reality Labs. The lead researcher described it on X as "a massive team effort" that resulted in "learning photorealistic avatars from millions of videos." The post received 282 likes and 49 retweets, signaling interest from the research community.
The project page includes demo videos showing the quality of generated avatars across different subjects and conditions. The research represents a significant advance in bringing photorealistic avatar technology closer to practical deployment in virtual reality applications.
Key Takeaways
- Meta published Large-Scale Codec Avatars research on April 2, 2026, accepted to CVPR 2026, introducing pre/post-training for 3D avatar generation
- The system pretrains on 1 million in-the-wild videos before post-training on high-quality studio data for enhanced fidelity
- LCA provides finger-level articulation control and generalizes across hairstyles, clothing, and demographics
- Emergent capabilities include relightability and garment physics support without direct supervision
- The 46-author team from Meta Reality Labs demonstrates the massive research effort required for foundation-scale avatar models