Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 5 days ago • 124
Post 2528 You can now do reinforcement learning training with 7× longer context and no accuracy loss, via our new batching algorithms. Long reasoning chains in RL are costly, but now we enable you to train gpt-oss with GRPO and reach 380K context on a 192GB GPU. Blog: https://unsloth.ai/docs/new/grpo-long-context
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 Text Generation • 32B • Updated about 8 hours ago • 303k • 576
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published 12 days ago • 121