58 289

Shivam Kumar

shivamkumar

AI & ML interests

None yet

Recent Activity

liked a model about 2 hours ago

Zyphra/Zonos-v0.1-hybrid

liked a model about 2 hours ago

myshell-ai/OpenVoiceV2

liked a model about 2 hours ago

sesame/csm-1b

upvoted a collection about 2 hours ago

VoxCPM

4 items • Updated Dec 7, 2025 • 7

upvoted 3 collections about 14 hours ago

Nemotron-Personas

A collection of multilingual, region-specific synthetic persona datasets that support sovereign AI development across many countries and regions. • 5 items • Updated about 8 hours ago • 17

Z-Image

7 items • Updated 2 days ago • 133

Qwen3-ASR

4 items • Updated about 20 hours ago • 27

upvoted a collection 3 days ago

Text-To-Speech

https://kyutai.org/next/tts • 6 items • Updated 17 days ago • 25

upvoted a collection 4 days ago

GLiNER-decoder

A joint encoder-decoder GLiNER model for scalable open-ontology entity recognition • 3 items • Updated about 19 hours ago • 17

upvoted 2 papers 6 days ago

X-Talk: On the Underestimated Potential of Modular Speech-to-Speech Dialogue System

Paper • 2512.18706 • Published Dec 21, 2025 • 1

Qwen3-TTS Technical Report

Paper • 2601.15621 • Published 8 days ago • 52

upvoted an article 6 days ago

Introducing Waypoint-1: Real-time interactive video diffusion from Overworld

Article • 10 days ago • 33

upvoted a collection 7 days ago

Qwen3-TTS

7 items • Updated 8 days ago • 250

upvoted a collection 23 days ago

sam-audio

11 items • Updated Dec 16, 2025 • 122

upvoted a paper 24 days ago

AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents

Paper • 2512.23343 • Published Dec 29, 2025 • 28

upvoted a collection 24 days ago

Nemotron Speech

Open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S • 9 items • Updated about 8 hours ago • 36

upvoted 2 papers 3 months ago

Yan: Foundational Interactive Video Generation

Paper • 2508.08601 • Published Aug 12, 2025 • 1

MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation

Paper • 2508.19320 • Published Aug 26, 2025 • 29

upvoted 3 collections 3 months ago

VILA: On Pre-training for Visual Language Models

10 items • Updated Sep 13, 2025 • 57

Sana

⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 22 items • Updated 10 days ago • 98

SANA-1.5

SANA-1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer • 6 items • Updated Sep 13, 2025 • 10

upvoted a paper 3 months ago

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 186

upvoted a collection 3 months ago

LongAI

Boost AI's Long ability while keeping it efficient. Models in this collection include LongVILA, LongVILA-R1, and LongLive. • 8 items • Updated Nov 6, 2025 • 2