Researchers have introduced SETA (Sparse Expert Framework), a new approach to continual learning in large language models (LLMs) that addresses catastrophic forgetting through adaptive sparse subspace decomposition. Published on arXiv on June 5, 2026, by Fatema Siddika, Md Anwar Hossen, Tanwi Mallick, and Ali Jannesari, the framework achieves competitive performance while using only 0.98% to 1.25% of total model capacity as trainable parameters.
Addressing the Plasticity-Stability Dilemma
Continual learning in LLMs faces a fundamental challenge known as the plasticity-stability dilemma, where acquiring new capabilities often leads to catastrophic forgetting of previous knowledge. Existing methods typically treat parameters uniformly, failing to distinguish between specific task knowledge and shared capabilities. This uniform treatment forces tasks to compete for the same parameters, degrading performance on earlier learned tasks.
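The competition for shared parameters can be seen in a deliberately tiny example. The sketch below is not from the paper: it assumes a single scalar weight `w`, two hypothetical tasks whose optimal weights are 2.0 and -1.0, and plain gradient descent on a squared error.

```python
# Toy illustration of catastrophic forgetting with one shared parameter.
# All names and target values are hypothetical, chosen only for illustration.

def train(w, target, steps=100, lr=0.1):
    """Run gradient descent on the loss (w - target)**2."""
    for _ in range(steps):
        grad = 2 * (w - target)
        w -= lr * grad
    return w

def loss(w, target):
    """Squared error of the shared weight against a task's optimum."""
    return (w - target) ** 2

TASK_A_TARGET = 2.0
TASK_B_TARGET = -1.0

w = 0.0
w = train(w, TASK_A_TARGET)             # learn task A first
loss_a_before = loss(w, TASK_A_TARGET)  # close to zero

w = train(w, TASK_B_TARGET)             # then learn task B with the same weight
loss_a_after = loss(w, TASK_A_TARGET)   # task A performance has degraded

print(f"task A loss before task B: {loss_a_before:.6f}")
print(f"task A loss after task B:  {loss_a_after:.6f}")
```

After training on task B, the loss on task A rises from nearly zero to about 9, because both tasks overwrite the same weight.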
Separating Task-Specific and Shared Knowledge
SETA addresses the plasticity-stability conflict by separating knowledge into task-specific expert modules, which isolate patterns unique to each task, and shared experts, which capture features common across tasks. Unlike standard updates, where tasks compete for parameters, SETA assigns unique experts to each task. The system maintains this separation through adaptive elastic anchoring and routing-aware regularization, which protect shared knowledge at both the weight level and the routing level. At inference time, a unified gating network automatically retrieves the correct combination of experts.
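The general shape of such a design can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes softmax gating with top-k selection over task experts, a shared expert that is always applied, and a quadratic penalty that pulls shared parameters toward stored anchor values. The function names (`route`, `forward`, `anchor_penalty`) and the scalar experts are hypothetical, and the paper's actual anchoring and routing regularizers may differ.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, top_k=1):
    """Keep the top_k task experts and renormalize their gate weights."""
    probs = softmax(gate_scores)
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in keep)
    return {i: probs[i] / norm for i in keep}

def forward(x, shared_expert, task_experts, gate_scores, top_k=1):
    """Apply the shared expert, then add the gated task-expert outputs."""
    out = shared_expert(x)
    for i, weight in route(gate_scores, top_k).items():
        out += weight * task_experts[i](x)
    return out

def anchor_penalty(params, anchors, strengths):
    """Quadratic penalty on drift of shared parameters from their anchors
    (an assumed stand-in for the paper's elastic anchoring)."""
    return sum(s * (p - a) ** 2 for p, a, s in zip(params, anchors, strengths))

# Hypothetical scalar experts, for illustration only.
shared = lambda x: 2 * x
experts = [lambda x: x + 1, lambda x: 10 * x, lambda x: -x]

y = forward(3.0, shared, experts, gate_scores=[0.0, 5.0, 1.0], top_k=1)
penalty = anchor_penalty([1.0, 2.0], anchors=[1.0, 1.0], strengths=[0.5, 2.0])
```

In this sketch only the routed task experts change the output for a given input, while the penalty term is the piece that would be added to the training loss to keep shared parameters from drifting as new tasks arrive.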
Extreme Parameter Efficiency with Strong Performance
Extensive experiments across diverse domain-specific benchmarks demonstrate SETA's effectiveness on LLaMA-2 7B and Qwen3-4B models. The framework achieves competitive or superior overall performance relative to state-of-the-art continual learning baselines, with particularly strong retention of early-task knowledge and improved backward transfer. Trainable parameters increase marginally from 6.34M for the first task to 8.10M by the sixth task, representing only 0.98% to 1.25% of total model capacity.
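The reported counts and percentages can be cross-checked with simple arithmetic. The snippet below uses only the four figures quoted above; the variable names are illustrative, and the interpretation of "total model capacity" is left open because this summary does not define it.

```python
# Cross-check of the reported trainable-parameter figures.
first_task_params = 6.34e6   # trainable parameters after the first task
sixth_task_params = 8.10e6   # trainable parameters after the sixth task
first_task_share = 0.0098    # reported as 0.98% of total model capacity
sixth_task_share = 0.0125    # reported as 1.25% of total model capacity

# Denominator implied by each (count, percentage) pair.
implied_base_first = first_task_params / first_task_share
implied_base_sixth = sixth_task_params / sixth_task_share

# Relative growth in trainable parameters from task one to task six.
growth = sixth_task_params / first_task_params - 1

print(f"implied base (task 1): {implied_base_first / 1e6:.0f}M")
print(f"implied base (task 6): {implied_base_sixth / 1e6:.0f}M")
print(f"growth over six tasks: {growth:.1%}")
```

Both pairs imply a base of roughly 647 to 648 million parameters, so the two percentages are consistent with each other. That base is well below the full parameter count of either evaluated model, which suggests the percentages are measured against something other than the whole network; the summary does not say what.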
Practical Implications for Production Systems
Continual learning is critical for deployed LLMs that need to learn new domains such as medicine, law, and code without forgetting prior capabilities. SETA's parameter efficiency, with fewer than 1.5% of parameters trainable, makes it practical for production systems where model size and training costs are significant concerns. The approach enables organizations to continuously adapt their models to new domains while preserving existing functionality.
Key Takeaways
- SETA mitigates catastrophic forgetting in LLM continual learning through adaptive sparse subspace decomposition that separates task-specific knowledge from shared knowledge
- The framework assigns unique expert modules per task rather than forcing tasks to compete for the same parameters
- Achieves competitive or superior performance on LLaMA-2 7B and Qwen3-4B with only 0.98% to 1.25% trainable parameters across six tasks
- Demonstrates particularly strong retention of early-task knowledge and improved backward transfer on domain-specific benchmarks
- Parameter efficiency below 1.5% makes the approach practical for production systems requiring continuous domain adaptation