lixinhao (Xinhao Li)

upvoted a paper 3 months ago

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

Paper • 2509.24897 • Published Sep 29, 2025 • 46

upvoted a collection 3 months ago

VideoChat-R1

Collection

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning • 4 items • Updated Sep 28, 2025 • 8

upvoted a paper 5 months ago

Pixels, Patterns, but No Poetry: To See The World like Humans

Paper • 2507.16863 • Published Jul 21, 2025 • 68

upvoted a paper 7 months ago

VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

Paper • 2505.23359 • Published May 29, 2025 • 38

upvoted an article 9 months ago

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

Article • Published Jun 13, 2024 • 61

upvoted a paper 9 months ago

Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation

Paper • 2503.19622 • Published Mar 25, 2025 • 31

upvoted a collection 10 months ago

VideoChat-Flash

Collection

Faster and more powerful VideoChat. • 15 items • Updated Sep 28, 2025 • 11

upvoted 2 papers 10 months ago

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

Paper • 2310.01324 • Published Oct 2, 2023 • 2

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22, 2024 • 27

upvoted a paper 11 months ago

GuardReasoner: Towards Reasoning-based LLM Safeguards

Paper • 2501.18492 • Published Jan 30, 2025 • 88

upvoted a paper 12 months ago

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Paper • 2501.00574 • Published Dec 31, 2024 • 6
