Meta's Reality Labs published research on April 2, 2026, introducing Large-Scale Codec Avatars (LCA), which the authors present as the first application of foundation-model pre-training paradigms to 3D avatar generation. The paper, accepted to CVPR 2026, follows a two-stage recipe: pretraining on 1 million in-the-wild videos, then post-training on high-quality studio data. The 46-author team demonstrated photorealistic avatars with finger-level articulation and emergent capabilities including relightability and garment physics.
Pre/Post-Training Paradigm Solves Quality-Scale Trade-off
High-quality 3D avatar modeling traditionally faces a critical trade-off. Multi-view studio data enables high-fidelity modeling with precise expression control but struggles to generalize due to limited scale and domain gaps. Conversely, large-scale models trained on millions of in-the-wild samples generalize across identities but produce low-quality avatars due to inherent 3D ambiguities.
LCA resolves this by borrowing the pre/post-training paradigm that has proven successful for large language models and vision foundation models. The team pretrains on 1 million in-the-wild videos to learn broad priors over appearance and geometry, then post-trains on high-quality curated studio data to enhance expressivity and fidelity. The authors position this as the first successful application of the approach to 3D avatar modeling at scale.
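To make the two-stage recipe concrete, here is a minimal sketch in PyTorch. Everything in it is an illustrative assumption rather than the paper's method: `AvatarModel`, the toy data generator, the reconstruction loss, and the step counts are hypothetical stand-ins for the actual architecture and datasets.

```python
# Minimal sketch of a pre/post-training recipe. All names here
# (AvatarModel, toy_loader, step counts) are hypothetical stand-ins,
# not the paper's actual architecture or data pipeline.
import torch
import torch.nn as nn


class AvatarModel(nn.Module):
    """Toy encoder-decoder: frames -> identity code -> rendered frames."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 64, dim), nn.ReLU()
        )
        self.decoder = nn.Linear(dim, 3 * 64 * 64)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(frames)).view(-1, 3, 64, 64)


def toy_loader(batch: int = 4):
    """Stand-in for a real video DataLoader; yields (input, target) pairs."""
    while True:
        frames = torch.randn(batch, 3, 64, 64)
        yield frames, frames  # self-reconstruction target


def train_stage(model: nn.Module, loader, steps: int, lr: float) -> None:
    """Both stages share one loop; only the data distribution,
    learning rate, and step budget differ."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _, (frames, target) in zip(range(steps), loader):
        loss = nn.functional.mse_loss(model(frames), target)
        opt.zero_grad()
        loss.backward()
        opt.step()


model = AvatarModel()
# Stage 1: pretrain on large, noisy in-the-wild video (broad priors).
train_stage(model, toy_loader(), steps=200, lr=1e-4)
# Stage 2: post-train on small, high-fidelity studio captures.
train_stage(model, toy_loader(), steps=50, lr=1e-5)
```

In the real system the two stages would draw from different datasets (web-scale video, then multi-view studio captures); both stages use the same toy generator here purely to keep the sketch runnable.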
System Delivers Precise Control Across Demographics and Styles
The LCA system demonstrates several key capabilities:
- Generalizes to world-scale populations in a single feedforward pass, enabling efficient inference
- Works across diverse hairstyles, clothing, and demographics
- Provides precise, fine-grained facial expressions
- Enables finger-level articulation control
- Maintains strong identity preservation
The feedforward generalization allows the model to process new subjects efficiently without requiring per-subject optimization, making it practical for real-world VR applications.
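A rough sketch of what "feedforward" buys, reusing the hypothetical `AvatarModel` and `model` from the sketch above: enrolling a new subject is a single no-gradient forward pass, versus the traditional per-subject fitting loop. Both functions, the latent-code setup, and the clip shape are assumptions for illustration only, not LCA's actual interface.

```python
# Contrast sketch (hypothetical, reusing AvatarModel/model from above):
# feedforward enrollment vs. per-subject optimization.
import torch


@torch.no_grad()
def enroll_feedforward(model: AvatarModel, clip: torch.Tensor) -> torch.Tensor:
    """One forward pass: pool per-frame embeddings into an identity code.
    No per-subject training, so new subjects cost a single inference pass."""
    return model.encoder(clip).mean(dim=0)


def enroll_by_optimization(model: AvatarModel, clip: torch.Tensor,
                           steps: int = 500) -> torch.Tensor:
    """The traditional alternative: fit a latent code to one subject
    by gradient descent against that subject's footage."""
    code = torch.zeros(256, requires_grad=True)
    opt = torch.optim.Adam([code], lr=1e-2)
    for _ in range(steps):
        recon = model.decoder(code.expand(clip.shape[0], -1)).view(-1, 3, 64, 64)
        loss = torch.nn.functional.mse_loss(recon, clip)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return code.detach()


clip = torch.randn(16, 3, 64, 64)  # hypothetical 16-frame enrollment clip
identity_code = enroll_feedforward(model, clip)  # one pass, no fitting loop
```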
Emergent Capabilities Suggest Deep 3D Understanding
The research team observed several capabilities that emerged without direct supervision:
- Generalization to relightability (responding correctly to different lighting conditions)
- Support for loose garments in unconstrained inputs
- Zero-shot robustness to stylized imagery
These emergent properties suggest the model has learned meaningful 3D priors from scale alone, much as large language models develop broad capabilities through pretraining. That relightability and garment physics emerge without explicit training on those features suggests the model captures underlying physical regularities.
Massive Team Effort Demonstrates Research Scale
The paper lists 46 authors including Junxuan Li, Rawal Khirodkar, and Chengan He from Meta's Reality Labs. The lead researcher described it on X as "a massive team effort" that resulted in "learning photorealistic avatars from millions of videos." The post received 282 likes and 49 retweets, signaling interest from the research community.
The project page includes demo videos showing the quality of generated avatars across different subjects and conditions. The research represents a significant advance in bringing photorealistic avatar technology closer to practical deployment in virtual reality applications.
Key Takeaways
- Meta published Large-Scale Codec Avatars research on April 2, 2026, accepted to CVPR 2026, introducing pre/post-training for 3D avatar generation
- The system pretrains on 1 million in-the-wild videos before post-training on high-quality studio data for enhanced fidelity
- LCA provides finger-level articulation control and generalizes across hairstyles, clothing, and demographics
- Emergent capabilities include relightability and garment physics support without direct supervision
- The 46-author team from Meta Reality Labs demonstrates the massive research effort required for foundation-scale avatar models