---
title: Agentic Document Intelligence
emoji: 📄
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 📄 Agentic Document Intelligence
### PDF RAG with Together.ai

This Hugging Face Space demonstrates a **Retrieval-Augmented Generation (RAG)** system that lets users upload a PDF and ask questions whose answers are **strictly grounded in the document content**.

The Space serves as a **foundational Agentic Document Intelligence component**, designed to be simple, transparent, and extensible.

---

## 🚀 What This Space Does

- Upload a PDF document
- Build a semantic index using embeddings + FAISS
- Ask natural-language questions
- Receive answers grounded only in the uploaded document
- View retrieved source passages for transparency

---

## 🧠 Architecture Overview

1. **PDF Ingestion**
   - Extracts text from the uploaded PDF
   - Cleans and normalizes the content

2. **Chunking**
   - Splits text into overlapping semantic chunks
   - Ensures contextual continuity

3. **Vector Indexing**
   - Generates embeddings using Sentence Transformers
   - Indexes vectors using FAISS (cosine similarity)

4. **Retrieval**
   - Retrieves the top-K most relevant chunks for each query

5. **Generation (RAG)**
   - Injects the retrieved context into the LLM prompt
   - Uses Together.ai (Mixtral) for answer generation
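
Steps 2–4 above can be sketched in a few lines. This is a minimal, self-contained illustration: the Space itself uses Sentence Transformers embeddings and a FAISS index, but here a toy bag-of-words "embedding" stands in so the chunking and cosine top-K logic are visible on their own.

```python
# Sketch of overlapping chunking + top-K cosine retrieval.
# The toy bag-of-words embed() is a stand-in for MiniLM vectors.
import math
from collections import Counter

def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-window chunks for contextual continuity."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Rank chunks by cosine similarity to the query and keep the top K."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:k]

chunks = chunk_text("FAISS builds a vector index. Retrieval finds relevant chunks. "
                    "The LLM answers only from retrieved context.",
                    chunk_size=6, overlap=2)
top = retrieve("How does retrieval find chunks?", chunks, k=2)
```

In the real pipeline, FAISS replaces the linear scan in `retrieve`, which matters once a document produces thousands of chunks.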

---

## ▶️ How to Use This Space (End-to-End)

### **Step 1: Upload a PDF**
- Click **“Upload PDF”**
- Select a text-based PDF file

> ⚠️ Note: Scanned PDFs without an embedded text layer will not work unless OCR is applied first.

---

### **Step 2: Wait for Indexing**
- The system will:
  - extract the text
  - split it into chunks
  - build a FAISS vector index
- A confirmation message is shown once indexing completes.

---

### **Step 3: Ask a Question**
- Type a natural-language question related to the document

Examples:
- *“Summarize the document”*
- *“What is the main contribution?”*
- *“Explain the methodology section”*

---

### **Step 4: Receive the Answer**
You will get:
- ✅ A generated answer based **only on the document context**
- 📌 The retrieved source passages with similarity scores
- 🚫 No hallucinated or external information

If the answer is not present in the document, the system says so rather than guessing.
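
The grounding behavior in Step 4 comes from how the prompt is assembled before generation. A minimal sketch of such a template follows; the exact wording used in `app.py` may differ, so treat this as illustrative:

```python
# Hypothetical grounded-RAG prompt builder; the app's actual template may differ.
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    # Number the retrieved passages so the answer can be traced back to sources.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say that the document "
        "does not contain it.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt("What is the main contribution?",
                      ["The paper introduces X.", "Experiments show Y."])
```

Constraining the model to the injected context, plus an explicit refusal instruction, is what prevents hallucinated or external information.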

---

## 🤖 Models Used

### **Language Model**
- **Provider:** Together.ai
- **Model:** `mistralai/Mixtral-8x7B-Instruct-v0.1`

### **Embedding Model**
- `sentence-transformers/all-MiniLM-L6-v2`
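
Because Together.ai exposes an OpenAI-compatible chat-completions API, a request for the model above can be sketched as a plain JSON payload. The `temperature` and `max_tokens` values here are illustrative, not the app's actual settings:

```python
import json

# Illustrative request body for Together.ai's OpenAI-compatible
# /chat/completions endpoint; the app's real parameters may differ.
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "system",
         "content": "Answer only from the provided document context."},
        {"role": "user",
         "content": "Context: ...\n\nQuestion: What is the main contribution?"},
    ],
    "temperature": 0.2,  # illustrative value
    "max_tokens": 512,   # illustrative value
}
body = json.dumps(payload)
```

Sending `body` with the `TOGETHER_API_KEY` as a bearer token against the base URL below is all the OpenAI client does under the hood.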

---

## 🧰 Tech Stack

- Python
- Gradio (UI)
- FAISS (vector search)
- Sentence Transformers (embeddings)
- Together.ai (LLM)
- Hugging Face Spaces

---

## 🔐 Environment Configuration (For Developers)

### **Secrets**
- `TOGETHER_API_KEY` → Together.ai API key
- `OPENAI_API_KEY` → same value (for compatibility with the OpenAI client)

### **Variables**
- `TOGETHER_MODEL` → `mistralai/Mixtral-8x7B-Instruct-v0.1`
- `TOGETHER_BASE_URL` → `https://api.together.xyz/v1`
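
A sketch of how these variables might be read at startup, with the documented values as fallbacks. The actual client setup in `app.py` (e.g. via the `openai` package) may differ:

```python
import os

# Read the documented secrets/variables, falling back to the defaults above.
TOGETHER_MODEL = os.environ.get(
    "TOGETHER_MODEL", "mistralai/Mixtral-8x7B-Instruct-v0.1")
TOGETHER_BASE_URL = os.environ.get(
    "TOGETHER_BASE_URL", "https://api.together.xyz/v1")
API_KEY = os.environ.get("TOGETHER_API_KEY") or os.environ.get("OPENAI_API_KEY", "")

client_config = {
    "base_url": TOGETHER_BASE_URL,  # OpenAI-compatible endpoint
    "api_key": API_KEY,
    "model": TOGETHER_MODEL,
}
```

Pointing an OpenAI-style client at `TOGETHER_BASE_URL` is why the same key can be stored under both secret names.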

---

## 🧩 Intended Use Cases

- Research paper Q&A
- Technical documentation assistants
- Internal knowledge bases
- RAG pipeline reference implementation
- Agentic AI system foundations

---

## 🔮 Future Enhancements

- Multi-PDF support
- Chat memory
- Streaming responses
- Agent routing & tool usage
- Evaluation and scoring agents

---

## 🙌 Author

Built by **Abhishek Prithvi Teja**
Focused on **Agentic AI, RAG systems, and applied LLM engineering**

---

## 🏷️ Tags

`rag` · `agentic-ai` · `document-qa` · `faiss` · `together-ai` · `huggingface-spaces`