Researchers have introduced NF-CoT, a framework that lets large language models perform intermediate reasoning in a continuous latent space while keeping the probabilistic properties of traditional chain-of-thought prompting. Described in an arXiv paper by a team led by Guancheng Tu, the approach uses normalizing flows to define tractable probability distributions over compact continuous thoughts, addressing limitations of both explicit textual reasoning and prior latent reasoning methods.
Traditional Chain-of-Thought Reasoning Forces Inefficient Serial Processing
Current language models rely heavily on explicit chain-of-thought (CoT) prompting, where the model verbalizes each reasoning step as discrete tokens before proceeding. While effective, this approach forces all intermediate computation through a communication-oriented token stream, requiring verbalization even when the underlying cognitive update is semantic, uncertain, or only partially formed. This serial, discrete bottleneck is inefficient for reasoning steps that a higher-bandwidth continuous representation could carry more compactly.
NF-CoT Preserves Key Advantages While Enabling Continuous Reasoning
The NF-CoT framework addresses these limitations by modeling continuous thoughts with normalizing flows, a class of generative models built from invertible transformations, which is what makes exact likelihood computation possible. The architecture instantiates a TARFlow-style normalizing flow inside the LLM backbone: continuous-thought positions are generated through an NF head, while text positions are still generated by the standard language model head, all within the same causal stream.
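To make "exact likelihood" concrete, here is a minimal sketch of a causal affine normalizing flow over a short sequence of scalar values. It is not the NF-CoT or TARFlow implementation: the `conditioner` function is a hypothetical stand-in for the transformer backbone, and the names `forward` and `inverse` are chosen only for illustration. Each position is shifted and scaled using earlier positions only, so the Jacobian is triangular and the log-likelihood follows from the change-of-variables formula with no approximation.

```python
import math


def conditioner(prefix):
    """Toy causal conditioner (assumption): maps earlier values to
    (shift, log_scale). A real model would use the LLM backbone here."""
    if not prefix:
        return 0.0, 0.0
    mean = sum(prefix) / len(prefix)
    return 0.5 * mean, 0.1 * math.tanh(mean)


def forward(x):
    """Map data x to noise z and return (z, exact log p(x))."""
    z, log_det = [], 0.0
    for t, x_t in enumerate(x):
        shift, log_scale = conditioner(x[:t])  # depends only on x_{<t}
        z.append((x_t - shift) * math.exp(-log_scale))
        log_det -= log_scale  # d z_t / d x_t = exp(-log_scale)
    # Standard normal base density, summed over positions.
    log_base = sum(-0.5 * (v * v + math.log(2 * math.pi)) for v in z)
    return z, log_base + log_det


def inverse(z):
    """Generate x left to right from noise z (the sampling direction)."""
    x = []
    for z_t in z:
        shift, log_scale = conditioner(x)  # uses values generated so far
        x.append(z_t * math.exp(log_scale) + shift)
    return x


if __name__ == "__main__":
    x = [0.3, -1.2, 0.8]
    z, log_p = forward(x)
    print("log p(x) =", log_p)
    print("round trip:", inverse(z))
```

Because `inverse` only ever reads values it has already produced, generation proceeds strictly left to right, which is the property that keeps this kind of flow compatible with ordinary autoregressive decoding.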
Key technical advantages include:
- Exact likelihood computation for latent thoughts (no approximations required)
- Native left-to-right generation compatible with existing KV-cache decoding
- Support for direct policy-gradient optimization in latent reasoning space
- Seamless integration of continuous and discrete reasoning in a single causal sequence
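The advantages listed above can be illustrated with a small sketch of how a single causal sequence might mix the two heads. Everything here is an assumption made for illustration: the step dictionaries, the scalar thoughts, and the function names are hypothetical and are not taken from the paper. Text steps are scored with a softmax over logits, thought steps with an exact affine-flow density, and the summed log-probability feeds a REINFORCE-style surrogate.

```python
import math


def text_log_prob(logits, token_id):
    """Log-softmax probability of one discrete token (LM head)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[token_id] - log_z


def thought_log_prob(value, shift, log_scale):
    """Exact log-density of a scalar thought under an affine flow
    with a standard normal base (stand-in for the NF head)."""
    z = (value - shift) * math.exp(-log_scale)
    return -0.5 * (z * z + math.log(2 * math.pi)) - log_scale


def sequence_log_prob(steps):
    """Sum per-step log-probabilities over one mixed causal sequence."""
    total = 0.0
    for step in steps:
        if step["kind"] == "text":
            total += text_log_prob(step["logits"], step["token"])
        else:  # "thought"
            total += thought_log_prob(
                step["value"], step["shift"], step["log_scale"]
            )
    return total


def reinforce_surrogate(steps, reward, baseline=0.0):
    """Score-function surrogate loss. In an autodiff framework, its
    gradient with respect to the model parameters is the REINFORCE
    policy-gradient estimate; here it is only evaluated numerically."""
    return -(reward - baseline) * sequence_log_prob(steps)


if __name__ == "__main__":
    steps = [
        {"kind": "thought", "value": 0.4, "shift": 0.1, "log_scale": -0.2},
        {"kind": "text", "logits": [1.0, 0.5, -0.3], "token": 0},
    ]
    print("log p(sequence) =", sequence_log_prob(steps))
    print("surrogate loss  =", reinforce_surrogate(steps, reward=1.0))
```

The point of the sketch is that policy-gradient methods need a log-probability for every sampled action. A latent step with an exact density can be scored the same way as a token, whereas a latent step without a tractable likelihood cannot.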
Code Generation Benchmarks Show Improved Performance and Efficiency
On code-generation tasks, NF-CoT improves on existing approaches. The framework achieves higher pass rates than explicit-CoT baselines and also outperforms prior latent-reasoning methods. It also substantially reduces the cost of intermediate reasoning by spending fewer tokens on it, which lowers compute at inference time.
The continuous thoughts are distilled from explicit CoT during training, allowing the model to learn compact representations of reasoning patterns that would otherwise require lengthy token sequences. This distillation process enables the model to compress multi-step reasoning into efficient continuous states without losing the ability to generate coherent explanations when needed.
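The article does not say how the distillation targets are constructed, so the following is only a hypothetical illustration of what "compressing" an explicit CoT into a few continuous vectors could look like. It mean-pools a list of token vectors into a fixed number of contiguous chunks; the function name `compress_cot` and the pooling scheme are assumptions, not the paper's method.

```python
def compress_cot(token_vectors, num_thoughts):
    """Mean-pool token vectors into num_thoughts contiguous chunks.

    Hypothetical target construction: each output vector summarizes one
    contiguous span of the explicit chain-of-thought.
    """
    n = len(token_vectors)
    if not 1 <= num_thoughts <= n:
        raise ValueError("num_thoughts must be between 1 and len(token_vectors)")
    dim = len(token_vectors[0])
    thoughts = []
    for k in range(num_thoughts):
        # Integer chunk boundaries that cover all n tokens exactly once.
        start = k * n // num_thoughts
        end = (k + 1) * n // num_thoughts
        chunk = token_vectors[start:end]
        thoughts.append(
            [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        )
    return thoughts


if __name__ == "__main__":
    cot_vectors = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
    print(compress_cot(cot_vectors, num_thoughts=2))
```

In a setup like this, the pooled vectors would serve as training targets whose likelihood the flow head is trained to maximize, so four reasoning tokens in the toy input are replaced by two continuous positions.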
Theoretical Contributions Open Path to Hybrid Reasoning Systems
NF-CoT's theoretical contribution lies in demonstrating that latent reasoning and discrete text generation can coexist in a single probabilistic framework without sacrificing the advantages of either approach. The normalizing flow formulation supplies tractable likelihoods, which both training and policy optimization require, while keeping the flexibility of continuous representations for efficient intermediate reasoning.
This work suggests that the dichotomy between explicit textual reasoning and implicit latent computation may be a false choice. By enabling models to fluidly transition between continuous latent reasoning and discrete token generation within the same causal stream, NF-CoT opens possibilities for more sophisticated hybrid reasoning systems that adaptively choose the most efficient representation for each reasoning step.
Key Takeaways
- NF-CoT uses normalizing flows to enable tractable continuous reasoning in LLMs while maintaining compatibility with standard autoregressive generation and KV-cache decoding
- The framework achieves higher pass rates on code-generation benchmarks than explicit chain-of-thought baselines while substantially reducing intermediate-reasoning token costs
- NF-CoT provides exact likelihood computation for latent thoughts and supports direct policy-gradient optimization, unlike approximation-based latent reasoning methods
- The approach unifies continuous latent reasoning and discrete text generation in a single causal sequence, which could allow future hybrid systems to choose the most efficient representation for each reasoning step
- Published on arXiv on June 4, 2026, by Guancheng Tu and collaborators from multiple institutions