HariharaIII's Collections
Papers
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 26
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Paper • 2504.04718 • Published • 42
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Paper • 2504.03561 • Published • 18
Concept Lancet: Image Editing with Compositional Representation Transplant
Paper • 2504.02828 • Published • 16
ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning
Paper • 2503.22738 • Published • 17
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
Paper • 2504.03601 • Published • 17
LiveVQA: Live Visual Knowledge Seeking
Paper • 2504.05288 • Published • 15
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)
Paper • 2504.03151 • Published • 15
Generative Evaluation of Complex Reasoning in Large Language Models
Paper • 2504.02810 • Published • 14
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model
Paper • 2504.05594 • Published • 11
MedSAM2: Segment Anything in 3D Medical Images and Videos
Paper • 2504.03600 • Published • 10
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
Paper • 2504.02882 • Published • 7
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Paper • 2504.05520 • Published • 11
3D Scene Understanding Through Local Random Access Sequence Modeling
Paper • 2504.03875 • Published • 5
Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking
Paper • 2504.03947 • Published • 4
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Paper • 2504.03770 • Published • 2
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
Paper • 2504.07079 • Published • 12
Rethinking Reflection in Pre-Training
Paper • 2504.04022 • Published • 80
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62
Understanding R1-Zero-Like Training: A Critical Perspective
Paper • 2503.20783 • Published • 59
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
Paper • 2503.24388 • Published • 29
Agentic Knowledgeable Self-awareness
Paper • 2504.03553 • Published • 27
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Paper • 2503.22165 • Published • 28
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
Paper • 2504.00906 • Published • 27
Effectively Controlling Reasoning Models through Thinking Intervention
Paper • 2503.24370 • Published • 19
Expanding RL with Verifiable Rewards Across Diverse Domains
Paper • 2503.23829 • Published • 23
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
Paper • 2503.24377 • Published • 18
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models
Paper • 2503.22673 • Published • 12
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Paper • 2502.18924 • Published • 16
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Paper • 2504.01871 • Published • 12
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper • 2503.11576 • Published • 127
SmolVLM: Redefining small and efficient multimodal models
Paper • 2504.05299 • Published • 202
UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models
Paper • 2503.08120 • Published • 31
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper • 2503.12605 • Published • 35
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster
Paper • 2503.09662 • Published • 33
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Paper • 2503.10291 • Published • 36
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Paper • 2503.16418 • Published • 36
Modifying Large Language Model Post-Training for Diverse Creative Writing
Paper • 2503.17126 • Published • 36
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Paper • 2503.22675 • Published • 36
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice
Paper • 2503.05978 • Published • 36
API Agents vs. GUI Agents: Divergence and Convergence
Paper • 2503.11069 • Published • 36
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Paper • 2503.01307 • Published • 38
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse
Paper • 2503.16365 • Published • 40
DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking
Paper • 2502.20730 • Published • 38
Process-based Self-Rewarding Language Models
Paper • 2503.03746 • Published • 39
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper • 2503.21614 • Published • 42
EgoLife: Towards Egocentric Life Assistant
Paper • 2503.03803 • Published • 46