MFM - Multimodal Foundation Models - a LeafInTheTree Collection

updated Jan 28, 2025

Paused

Featured

102

Idefics3

📊

Generate text based on an image and prompt
Runtime error

162

VideoLLaMA2

🎥

Media understanding
Runtime error

54

GroundingDINO ⚔ OWL

🦖

Identify objects in images using text queries
Running

85

Paligemma HF

🤗

Generate text and segment images using PaliGemma
Paused

Featured

314

PaliGemma Demo

🤲

Annotate and describe images with text prompts
Runtime error

Featured

515

Florence2 + SAM2

🔥

Segment and caption objects in images and videos
Sleeping

11

Florence 2 Vision Model V1

💻

Analyze images to caption, detect objects, and extract text
Build error

2

Marketing Vision

👁

Runtime error

2

Idefics3

📊

Paused

10

Theia

⚡

Generate detailed image analyses and depth predictions
Runtime error

16

XGen MM

💻

Generate detailed descriptions from images and questions
Sleeping

LLaMA 3.1 Vision

🦙
Running on Zero

Featured

80

Chameleon 30b

🔥

Chat about uploaded images with AI‑generated answers
Running

Featured

508

InternVL

⚡

Chat with AI using text and images, get highlighted answers
Running on Zero

Featured

827

Florence 2

📉

Generate captions, detect objects, and segment images with AI
Running on Zero

Featured

224

Phi 3.5 Vision

🔥

Ask questions about images and get answers
Runtime error

Featured

885

MiniGPT-4

🚀

Runtime error

40

Mistral Pixtral Demo

👀

Chat with Pixtral 12B using Mistral Inference
Runtime error

Featured

324

Ovis1.6 Gemma2 9B

🐑

Interact with a chatbot that understands text and images

meta-llama/Llama-Guard-3-11B-Vision

Image-Text-to-Text • 11B • Updated Nov 18, 2024 • 841 downloads • 67 likes
Running

Featured

96

Owlv2

👀

State-of-the-art Zero-shot Object Detection
Running on Zero

Featured

391

Llama-Vision-11B

🚀

Chat with images using Llama Vision model
Runtime error

144

SmolVLM

📊

Generate text from images and queries
Runtime error

6

GLM-Edge-V-5B Space

📷

Generate text responses based on images and chat history
Running on Zero

17

Paligemma2 Detection

😻

Paligemma2 Detection with Supervision
Runtime error

40

Florence Llama

💬

Generate text responses from images and text input
Runtime error

6

Paligemma2 10b Ft Docci 448

📉

Runtime error

5

VisPer-LM

🔍

Visualize image depth, segmentation, and generation
Runtime error

Featured

2.02k

Chat With Janus-Pro-7B

🌍

A unified multimodal understanding and generation model.