Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.11576

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Paper • 2512.14614 • Published 21 days ago • 67
Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Paper • 2511.12884 • Published Nov 17, 2025 • 17
PersonaLive! Expressive Portrait Image Animation for Live Streaming

Paper • 2512.11253 • Published 25 days ago • 34
DeepCode: Open Agentic Coding

Paper • 2512.07921 • Published 29 days ago • 31

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Paper • 2409.03420 • Published Sep 5, 2024 • 26

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 139
CommonForms: A Large, Diverse Dataset for Form Field Detection

Paper • 2509.16506 • Published Sep 20, 2025 • 19
Automated Structured Radiology Report Generation with Rich Clinical Context

Paper • 2510.00428 • Published Oct 1, 2025 • 7
Extract-0: A Specialized Language Model for Document Information Extraction

Paper • 2509.22906 • Published Sep 26, 2025

SmolDocling datasets

Datasets used to train SmolDocling

HuggingFaceM4/DoclingMatix

Viewer • Updated Jul 31, 2025 • 1.27M • 4.78k • 48
docling-project/SynthFormulaNet

Viewer • Updated Jul 31, 2025 • 6.45M • 1.09k • 17
docling-project/SynthChartNet

Viewer • Updated Jul 15, 2025 • 1.98M • 1.15k • 14
docling-project/SynthCodeNet

Viewer • Updated Jul 16, 2025 • 9.33M • 3.78k • 11

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Paper • 2512.14614 • Published 21 days ago • 67
Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Paper • 2511.12884 • Published Nov 17, 2025 • 17
PersonaLive! Expressive Portrait Image Animation for Live Streaming

Paper • 2512.11253 • Published 25 days ago • 34
DeepCode: Open Agentic Coding

Paper • 2512.07921 • Published 29 days ago • 31

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 139
CommonForms: A Large, Diverse Dataset for Form Field Detection

Paper • 2509.16506 • Published Sep 20, 2025 • 19
Automated Structured Radiology Report Generation with Rich Clinical Context

Paper • 2510.00428 • Published Oct 1, 2025 • 7
Extract-0: A Specialized Language Model for Document Information Extraction

Paper • 2509.22906 • Published Sep 26, 2025

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Paper • 2409.03420 • Published Sep 5, 2024 • 26

SmolDocling datasets

Datasets used to train SmolDocling

HuggingFaceM4/DoclingMatix

Viewer • Updated Jul 31, 2025 • 1.27M • 4.78k • 48
docling-project/SynthFormulaNet

Viewer • Updated Jul 31, 2025 • 6.45M • 1.09k • 17
docling-project/SynthChartNet

Viewer • Updated Jul 15, 2025 • 1.98M • 1.15k • 14
docling-project/SynthCodeNet

Viewer • Updated Jul 16, 2025 • 9.33M • 3.78k • 11

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 127

Previous
1
2
3
4
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs