Collections
Discover the best community collections!
Collections including paper arxiv:2512.10430
- Universal Deep Research: Bring Your Own Model and Strategy
  Paper • 2509.00244 • Published • 13
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 228
- Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
  Paper • 2510.00515 • Published • 39
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 141

- T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground
  Paper • 2512.10430 • Published • 113
- X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale
  Paper • 2512.04537 • Published • 6
- Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents
  Paper • 2512.08870 • Published • 3
- Evaluating Gemini Robotics Policies in a Veo World Simulator
  Paper • 2512.10675 • Published • 17

- The Art of Scaling Reinforcement Learning Compute for LLMs
  Paper • 2510.13786 • Published • 31
- Attention Is All You Need for KV Cache in Diffusion LLMs
  Paper • 2510.14973 • Published • 40
- BitNet Distillation
  Paper • 2510.13998 • Published • 55
- GigaBrain-0: A World Model-Powered Vision-Language-Action Model
  Paper • 2510.19430 • Published • 49

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 5.68k • 1.23k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 354 • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63