Gradio demo for FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions. This demo features a BLIP-based image captioning model trained using FuseCap.