ECO: Quantized Training without Full-Precision Master Weights • Paper • 2601.22101 • Published 1 day ago • 3
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts • Paper • 2601.22156 • Published 1 day ago • 4
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation • Paper • 2601.21420 • Published 1 day ago • 22
Scaling Embeddings Outperforms Scaling Experts in Language Models • Paper • 2601.21204 • Published 2 days ago • 76
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents • Paper • 2601.20975 • Published 2 days ago • 6
Qwen/Qwen3-ForcedAligner-0.6B • Automatic Speech Recognition • 0.9B • Updated about 22 hours ago • 249 • 34
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning • Paper • 2601.19280 • Published 4 days ago • 7