MMLab@NTU

university

https://www.mmlab-ntu.com/

MMLabNTU

AI & ML interests

Computer Vision and Deep Learning

Papers

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

MoonQiu

submitted a paper to Daily Papers 15 days ago

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Paper • 2512.21338 • Published 15 days ago • 21

ldkong

submitted a paper to Daily Papers 21 days ago

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

Paper • 2512.16760 • Published 21 days ago • 12

jcenaa

submitted a paper to Daily Papers 24 days ago

VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer

Paper • 2512.11891 • Published about 1 month ago • 8

ldkong

authored 8 papers 27 days ago

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

Paper • 2405.05258 • Published May 8, 2024

Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations

Paper • 2507.05260 • Published Jul 7, 2025

EventFly: Event Camera Perception from Ground to the Sky

Paper • 2503.19916 • Published Mar 25, 2025

An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

Paper • 2405.14870 • Published May 23, 2024

Veila: Panoramic LiDAR Generation from a Monocular RGB Image

Paper • 2508.03690 • Published Aug 5, 2025

SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining

Paper • 2503.19912 • Published Mar 25, 2025

Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation

Paper • 2407.15282 • Published Jul 21, 2024

SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting

Paper • 2510.26796 • Published Oct 30, 2025

yumingj

authored a paper about 2 months ago

RynnVLA-002: A Unified Vision-Language-Action and World Model

Paper • 2511.17502 • Published Nov 21, 2025 • 25

Ziqi

authored 3 papers about 2 months ago

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

Paper • 2510.13759 • Published Oct 15, 2025 • 10

RealDPO: Real or Not Real, that is the Preference

Paper • 2510.14955 • Published Oct 16, 2025 • 6

Simulating the Visual World with Artificial Intelligence: A Roadmap

Paper • 2511.08585 • Published Nov 11, 2025 • 29

ldkong

authored a paper 2 months ago

3EED: Ground Everything Everywhere in 3D

Paper • 2511.01755 • Published Nov 3, 2025 • 11

Ziqi

authored a paper 2 months ago

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

Paper • 2510.26794 • Published Oct 30, 2025 • 26

ldkong

authored 3 papers 3 months ago

RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

Paper • 2510.02240 • Published Oct 2, 2025 • 17

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23, 2025 • 55

VideoLucy: Deep Memory Backtracking for Long Video Understanding

Paper • 2510.12422 • Published Oct 14, 2025 • 1