LongCoT Benchmark Reveals Frontier Models Score Under 10% on Long-Horizon Reasoning

Thursday, April 16, 2026