Zekun Qi

qizekun

https://qizekun.github.io/

AI & ML interests

Embodied Intelligence, Large Language Model, 3D Computer Vision

Recent Activity

upvoted a paper 9 days ago

Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation

Paper • 2602.16705 • Published 9 days ago • 26

upvoted a paper 16 days ago

VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model

Paper • 2602.10098 • Published 17 days ago • 18

upvoted a paper 23 days ago

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Paper • 2602.03796 • Published 24 days ago • 58

upvoted a collection about 1 month ago

OmniSpatial

Collection

Collections of ICLR 2026 paper: "OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models" • 4 items • Updated Jan 27 • 1

upvoted a paper about 1 month ago

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published Jan 14 • 193

upvoted a collection 4 months ago

GS-Reasoner

Collection

Collections of paper "Reasoning in Space via Grounding in the World" • 6 items • Updated Oct 20, 2025 • 2

upvoted a paper 4 months ago

Reasoning in Space via Grounding in the World

Paper • 2510.13800 • Published Oct 15, 2025 • 15

upvoted 2 papers 6 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 214

ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Paper • 2508.08240 • Published Aug 11, 2025 • 45

upvoted 2 papers 7 months ago

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

Paper • 2508.10711 • Published Aug 14, 2025 • 145

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Paper • 2507.13344 • Published Jul 17, 2025 • 59

upvoted 2 papers 8 months ago

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7, 2025 • 75

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

Paper • 2507.04447 • Published Jul 6, 2025 • 45

upvoted 3 papers 9 months ago

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

Paper • 2506.03135 • Published Jun 3, 2025 • 40

ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding

Paper • 2506.01853 • Published Jun 2, 2025 • 32

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97

upvoted a paper 10 months ago

Step1X-Edit: A Practical Framework for General Image Editing

Paper • 2504.17761 • Published Apr 24, 2025 • 92

upvoted a paper 11 months ago

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published Apr 7, 2025 • 110

upvoted a collection 11 months ago

DreamLLM

Collection

[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation (https://arxiv.org/abs/2309.11499) • 6 items • Updated Mar 22, 2024 • 3

upvoted a paper 11 months ago

Unleashing Vecset Diffusion Model for Fast Shape Generation

Paper • 2503.16302 • Published Mar 20, 2025 • 43
