---
title: Agentic Document Intelligence
emoji: 📄
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 📄 Agentic Document Intelligence
### PDF RAG with Together.ai

This Hugging Face Space demonstrates a **Retrieval-Augmented Generation (RAG)** system that lets users upload a PDF and ask questions whose answers are **strictly grounded in the document content**.

The Space serves as a **foundational Agentic Document Intelligence component**, designed to be simple, transparent, and extensible.

---

## 🚀 What This Space Does

- Upload a PDF document
- Build a semantic index using embeddings + FAISS
- Ask natural-language questions
- Receive answers grounded only in the uploaded document
- View retrieved source passages for transparency

---

## 🧠 Architecture Overview

1. **PDF Ingestion**
   - Extracts text from the uploaded PDF
   - Cleans and normalizes the content

2. **Chunking**
   - Splits text into overlapping semantic chunks
   - Ensures contextual continuity

3. **Vector Indexing**
   - Generates embeddings using Sentence Transformers
   - Indexes vectors using FAISS (cosine similarity)

4. **Retrieval**
   - Retrieves the top-K most relevant chunks for each query

5. **Generation (RAG)**
   - Injects the retrieved context into the LLM prompt
   - Uses Together.ai (Mixtral) for answer generation
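
Steps 2–4 above can be sketched in a few lines. This is a minimal, self-contained illustration: the Space itself uses Sentence Transformers embeddings and a FAISS index, but here a toy bag-of-words "embedding" stands in so the chunking and cosine top-K logic are visible on their own.

```python
# Sketch of overlapping chunking + top-K cosine retrieval.
# The toy bag-of-words embed() is a stand-in for MiniLM vectors.
import math
from collections import Counter

def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-window chunks for contextual continuity."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Rank chunks by cosine similarity to the query and keep the top K."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:k]

chunks = chunk_text("FAISS builds a vector index. Retrieval finds relevant chunks. "
                    "The LLM answers only from retrieved context.",
                    chunk_size=6, overlap=2)
top = retrieve("How does retrieval find chunks?", chunks, k=2)
```

In the real pipeline, FAISS replaces the linear scan in `retrieve`, which matters once a document produces thousands of chunks.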

---

## ▶️ How to Use This Space (End-to-End)

### **Step 1: Upload a PDF**
- Click **“Upload PDF”**
- Select a text-based PDF file

> ⚠️ Note: Scanned PDFs without an embedded text layer will not work unless OCR is applied first.

---

### **Step 2: Wait for Indexing**
- The system will:
  - extract the text
  - split it into chunks
  - build a FAISS vector index
- A confirmation message is shown once indexing completes.

---

### **Step 3: Ask a Question**
- Type a natural-language question related to the document

Examples:
- *“Summarize the document”*
- *“What is the main contribution?”*
- *“Explain the methodology section”*

---

### **Step 4: Receive the Answer**
You will get:
- ✅ A generated answer based **only on the document context**
- 📌 The retrieved source passages with similarity scores
- 🚫 No hallucinated or external information

If the answer is not present in the document, the system says so rather than guessing.
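
The grounding behavior in Step 4 comes from how the prompt is assembled before generation. A minimal sketch of such a template follows; the exact wording used in `app.py` may differ, so treat this as illustrative:

```python
# Hypothetical grounded-RAG prompt builder; the app's actual template may differ.
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    # Number the retrieved passages so the answer can be traced back to sources.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say that the document "
        "does not contain it.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt("What is the main contribution?",
                      ["The paper introduces X.", "Experiments show Y."])
```

Constraining the model to the injected context, plus an explicit refusal instruction, is what prevents hallucinated or external information.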

---

## 🤖 Models Used

### **Language Model**
- **Provider:** Together.ai
- **Model:** `mistralai/Mixtral-8x7B-Instruct-v0.1`

### **Embedding Model**
- `sentence-transformers/all-MiniLM-L6-v2`
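
Because Together.ai exposes an OpenAI-compatible chat-completions API, a request for the model above can be sketched as a plain JSON payload. The `temperature` and `max_tokens` values here are illustrative, not the app's actual settings:

```python
import json

# Illustrative request body for Together.ai's OpenAI-compatible
# /chat/completions endpoint; the app's real parameters may differ.
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "system",
         "content": "Answer only from the provided document context."},
        {"role": "user",
         "content": "Context: ...\n\nQuestion: What is the main contribution?"},
    ],
    "temperature": 0.2,  # illustrative value
    "max_tokens": 512,   # illustrative value
}
body = json.dumps(payload)
```

Sending `body` with the `TOGETHER_API_KEY` as a bearer token against the base URL below is all the OpenAI client does under the hood.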

---

## 🧰 Tech Stack

- Python
- Gradio (UI)
- FAISS (vector search)
- Sentence Transformers (embeddings)
- Together.ai (LLM)
- Hugging Face Spaces

---

## 🔐 Environment Configuration (For Developers)

### **Secrets**
- `TOGETHER_API_KEY` → Together.ai API key
- `OPENAI_API_KEY` → same value (for compatibility with the OpenAI client)

### **Variables**
- `TOGETHER_MODEL` → `mistralai/Mixtral-8x7B-Instruct-v0.1`
- `TOGETHER_BASE_URL` → `https://api.together.xyz/v1`
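
A sketch of how these variables might be read at startup, with the documented values as fallbacks. The actual client setup in `app.py` (e.g. via the `openai` package) may differ:

```python
import os

# Read the documented secrets/variables, falling back to the defaults above.
TOGETHER_MODEL = os.environ.get(
    "TOGETHER_MODEL", "mistralai/Mixtral-8x7B-Instruct-v0.1")
TOGETHER_BASE_URL = os.environ.get(
    "TOGETHER_BASE_URL", "https://api.together.xyz/v1")
API_KEY = os.environ.get("TOGETHER_API_KEY") or os.environ.get("OPENAI_API_KEY", "")

client_config = {
    "base_url": TOGETHER_BASE_URL,  # OpenAI-compatible endpoint
    "api_key": API_KEY,
    "model": TOGETHER_MODEL,
}
```

Pointing an OpenAI-style client at `TOGETHER_BASE_URL` is why the same key can be stored under both secret names.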

---

## 🧩 Intended Use Cases

- Research paper Q&A
- Technical documentation assistants
- Internal knowledge bases
- RAG pipeline reference implementation
- Agentic AI system foundations

---

## 🔮 Future Enhancements

- Multi-PDF support
- Chat memory
- Streaming responses
- Agent routing & tool usage
- Evaluation and scoring agents

---

## 🙌 Author

Built by **Abhishek Prithvi Teja**
Focused on **Agentic AI, RAG systems, and applied LLM engineering**

---

## 🏷️ Tags

`rag` · `agentic-ai` · `document-qa` · `faiss` · `together-ai` · `huggingface-spaces`