Audiovisual - a melsiddieg Collection

Audiovisual

Updated 9 days ago

microsoft/VibeVoice-1.5B
Text-to-Speech • 3B • Updated 24 days ago • 183k • 2.21k

ibm-granite/granite-docling-258M
Image-Text-to-Text • Updated Sep 23, 2025 • 196k • 1.12k

deepseek-ai/DeepSeek-OCR
Image-Text-to-Text • Updated Nov 4, 2025 • 2.99M • 3.15k

Qwen/Qwen3-VL-2B-Thinking
Image-Text-to-Text • 2B • Updated Oct 20, 2025 • 27.5k • 105

datalab-to/chandra
Image-Text-to-Text • 9B • Updated Oct 21, 2025 • 274k • 484

Qwen/Qwen3-VL-2B-Instruct
Image-Text-to-Text • 2B • Updated Oct 23, 2025 • 2.09M • 318

PokeeAI/pokee_research_7b
Text Generation • 8B • Updated Oct 23, 2025 • 353 • 100

openbmb/MiniCPM-o-4_5
Any-to-Any • 9B • Updated 1 day ago • 44.9k • 839

Qwen/Qwen3-ForcedAligner-0.6B
Automatic Speech Recognition • Updated 16 days ago • 34.7k • 82