-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2503.11576
-
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Paper • 2512.14614 • Published • 67 -
Agent READMEs: An Empirical Study of Context Files for Agentic Coding
Paper • 2511.12884 • Published • 17 -
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper • 2512.11253 • Published • 34 -
DeepCode: Open Agentic Coding
Paper • 2512.07921 • Published • 31
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 139 -
CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper • 2509.16506 • Published • 19 -
Automated Structured Radiology Report Generation with Rich Clinical Context
Paper • 2510.00428 • Published • 7 -
Extract-0: A Specialized Language Model for Document Information Extraction
Paper • 2509.22906 • Published
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Paper • 2512.14614 • Published • 67 -
Agent READMEs: An Empirical Study of Context Files for Agentic Coding
Paper • 2511.12884 • Published • 17 -
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper • 2512.11253 • Published • 34 -
DeepCode: Open Agentic Coding
Paper • 2512.07921 • Published • 31
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 139 -
CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper • 2509.16506 • Published • 19 -
Automated Structured Radiology Report Generation with Rich Clinical Context
Paper • 2510.00428 • Published • 7 -
Extract-0: A Specialized Language Model for Document Information Extraction
Paper • 2509.22906 • Published