ShiqiangWoo's Collections
20250903
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
•
2509.02547
•
Published
•
228
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper
•
2509.02479
•
Published
•
83
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
Paper
•
2509.01215
•
Published
•
50
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
Paper
•
2509.00676
•
Published
•
84
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Paper
•
2509.02544
•
Published
•
124
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Paper
•
2509.01055
•
Published
•
76
Baichuan-M2: Scaling Medical Capability with Large Verifier System
Paper
•
2509.02208
•
Published
•
42
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Paper
•
2509.02522
•
Published
•
25
Kwai Keye-VL 1.5 Technical Report
Paper
•
2509.01563
•
Published
•
37
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
Paper
•
2509.01363
•
Published
•
58
Jointly Reinforcing Diversity and Quality in Language Model Generations
Paper
•
2509.02534
•
Published
•
24
GenCompositor: Generative Video Compositing with Diffusion Transformer
Paper
•
2509.02460
•
Published
•
25
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
Paper
•
2509.01644
•
Published
•
33
Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation
Paper
•
2509.02040
•
Published
•
14
M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision
Paper
•
2509.01360
•
Published
•
11
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games
Paper
•
2509.01052
•
Published
•
21
Universal Deep Research: Bring Your Own Model and Strategy
Paper
•
2509.00244
•
Published
•
13
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing
Paper
•
2509.01984
•
Published
•
6
Fantastic Pretraining Optimizers and Where to Find Them
Paper
•
2509.02046
•
Published
•
13
MedDINOv3: How to adapt vision foundation models for medical image segmentation?
Paper
•
2509.02379
•
Published
•
1
Improving Large Vision and Language Models by Learning from a Panel of Peers
Paper
•
2509.01610
•
Published
•
2
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
Paper
•
2509.01250
•
Published
•
2
SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction
Paper
•
2509.00581
•
Published
•
10
C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection
Paper
•
2509.00578
•
Published
•
1
Metis: Training Large Language Models with Advanced Low-Bit Quantization
Paper
•
2509.00404
•
Published
•
6
FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models
Paper
•
2508.20586
•
Published
•
3