---
title: Autonomous Data Analyst
emoji: π»
colorFrom: purple
colorTo: gray
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Multi-Source Agentic RAG

A local AI agent with an intelligent Router that directs traffic between:

- Neon PostgreSQL → structured/numerical data (sales, counts, aggregations)
- ChromaDB → policies, documentation (`company_policies` collection)

Both paths converge at a synthesis node that produces natural-language answers.
## Architecture

- Router: an LLM analyzes intent → routes to `sql` or `vector_store`
- SQL path: explore → generate → HITL approval → execute → self-correct → visualize
- Vector path: retrieve from the ChromaDB `company_policies` collection
- Synthesis: `generate_answer` produces a clear natural-language response
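The router's decision is a small structured output that a conditional edge can branch on. A minimal sketch of its shape, using a dataclass as a stand-in for the project's actual Pydantic schema (`RouteDecision` and its field names are illustrative):

```python
from dataclasses import dataclass
from typing import Literal

# Illustrative stand-in for the project's Pydantic routing schema.
@dataclass
class RouteDecision:
    datasource: Literal["sql", "vector_store"]  # where to send the question
    reasoning: str                              # the LLM's short justification

def next_node(decision: RouteDecision) -> str:
    # A LangGraph conditional edge would dispatch on this value.
    return decision.datasource
```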
```
              ┌─────────────┐
              │   Router    │
              └──────┬──────┘
                     │
     ┌───────────────┼───────────────┐
     ▼               │               ▼
┌─────────────┐      │       ┌─────────────┐
│  SQL path   │      │       │ Vector path │
│   (Neon)    │      │       │ (ChromaDB)  │
└──────┬──────┘      │       └──────┬──────┘
       │             │              │
       └─────────────┼──────────────┘
                     ▼
              ┌─────────────┐
              │  generate_  │
              │   answer    │
              └─────────────┘
```
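Stripped of framework details, the architecture above reduces to a conditional dispatch followed by synthesis. A pure-Python sketch with stand-in callables (in the project these are LangGraph nodes, not plain functions):

```python
# Sketch of the graph's control flow; the real project wires these as
# LangGraph nodes with a conditional edge out of the router.
def run_pipeline(question, router, sql_path, vector_path, generate_answer):
    datasource = router(question)             # "sql" or "vector_store"
    if datasource == "sql":
        context = sql_path(question)          # schema → SQL → execute → results
    else:
        context = vector_path(question)       # ChromaDB similarity search
    return generate_answer(question, context) # synthesis node
```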
## Free hosting (Hugging Face Spaces)

You can host the Gradio app for free on Hugging Face Spaces. The app uses Groq (free LLM API) and Hugging Face embeddings in the cloud; no Ollama is required.

See DEPLOY.md for step-by-step instructions (Groq API key, Space secrets, and Neon connection string).
## Setup

### 1. Create a Neon database

- Sign up at neon.tech and create a project.
- Copy the connection string from the dashboard (Connection string → URI).
### 2. Configure connection

```bash
export NEON_DATABASE_URL='postgresql://user:password@ep-xxx.region.aws.neon.tech/neondb?sslmode=require'
```

Or copy `.env.example` to `.env` and set `NEON_DATABASE_URL` there.
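A fail-fast check for the connection string keeps misconfiguration errors close to startup. An illustrative sketch (`main.py`'s actual handling may differ):

```python
import os

# Read the Neon connection string, failing loudly if it is missing.
def get_neon_url() -> str:
    url = os.environ.get("NEON_DATABASE_URL", "").strip()
    if not url:
        raise RuntimeError("NEON_DATABASE_URL is not set (see .env.example)")
    return url
```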
### 3. Install Python dependencies

```bash
cd "AGENTIC _AI"
python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
### 4. Seed the database (once)

```bash
python3 seed_neon.py
```
This creates 7 tables with different kinds of data:
| Table | Kind | Description |
|---|---|---|
| regions | Lookup | Geographic (name, country) |
| product_categories | Lookup | Category names |
| customers | Entity | Name, email, region_id, created_at |
| products | Catalog | Name, category_id, base_price |
| sales | Transactions | customer_id, product_id, quantity, amount, sale_date |
| returns | Events | sale_id, amount, reason, return_date |
| promotions | Time-bound | code, discount_pct, start_date, end_date |
Then load ~100k rows of synthetic data (with edge cases):

```bash
python3 generate_synthetic_sales.py
```
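Given this schema, a question like "What are total sales by category?" maps to a three-table join. An illustrative query of the kind the agent generates (it assumes integer `id` primary keys on `products` and `product_categories`, which is not shown in the table above):

```python
# Illustrative SQL for "What are total sales by category?"
# Assumes `id` primary keys matching the *_id foreign keys listed above.
TOTAL_SALES_BY_CATEGORY = """
SELECT pc.name AS category,
       SUM(s.amount) AS total_sales
FROM sales s
JOIN products p ON p.id = s.product_id
JOIN product_categories pc ON pc.id = p.category_id
GROUP BY pc.name
ORDER BY total_sales DESC;
"""
```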
### 5. Seed the vector store (once)

```bash
ollama pull nomic-embed-text   # embedding model
python3 seed_vector_store.py
```
### 6. Install Ollama and pull models

```bash
ollama pull deepseek-coder-v2   # for routing + SQL + synthesis
ollama pull nomic-embed-text    # for ChromaDB embeddings
# Run `ollama serve` if the Ollama app is not already running
```
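To sanity-check that Ollama is serving and the models are pulled, you can query Ollama's `GET /api/tags` endpoint (a quick diagnostic sketch, not part of the project):

```python
import json
import urllib.request

# Extract model names from an /api/tags response payload.
def model_names(tags: dict) -> list:
    return [m["name"] for m in tags.get("models", [])]

# List pulled models; requires `ollama serve` on the default port.
def check_models(host: str = "http://localhost:11434") -> list:
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return model_names(json.load(resp))
```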
## Usage

### CLI

```bash
# SQL path (structured data)
python3 main.py "What are total sales by category?"
python3 main.py "Show sales by region"

# Vector path (policies/docs)
python3 main.py "What is the expense policy?"
python3 main.py "What is the remote work policy?"
```
### Streamlit UI

```bash
streamlit run app.py
```

- Chat interface with `st.chat_message` and `st.chat_input`
- Agent Reasoning expander shows the datasource, SQL, and errors
- `st.dataframe()` for SQL results, `st.image()` for charts
- HITL: high-impact queries pause with Approve / Reject buttons
- Sidebar: Neon connection string, Clear Chat
### Gradio UI

```bash
python3 app_gradio.py
```

- Opens at http://localhost:7860
- Chat interface with config (Neon URL, Ollama model) in an accordion
- Sample-question buttons; a results table and chart appear below the chat when applicable
- HITL: Approve / Reject buttons appear when a high-impact SQL query is proposed
The agent will:

- Explore the PostgreSQL schema
- Generate a SQL query
- Human-in-the-Loop (HITL): if the query is high-impact (UPDATE, DELETE, INSERT, DROP, etc.), the program pauses and prompts you to approve or reject it before execution
- Execute it (and retry with fixes if it fails)
- Infer the best chart type and save it to `data/outputs/analysis_chart.png`
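The HITL gate hinges on classifying a query as high-impact before execution. A minimal sketch of that check (the keyword list here is illustrative; the project's actual rules may differ):

```python
# Statements whose leading keyword should trigger the HITL approval pause.
HIGH_IMPACT = {"UPDATE", "DELETE", "INSERT", "DROP", "ALTER", "TRUNCATE"}

def needs_approval(sql: str) -> bool:
    # Classify by the first SQL keyword; empty input needs no approval.
    tokens = sql.strip().split(None, 1)
    return bool(tokens) and tokens[0].upper() in HIGH_IMPACT
```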
## Project Structure

```
AGENTIC _AI/
├── main.py              # Entry point (uses NEON_DATABASE_URL)
├── seed_neon.py         # One-time: create tables and seed data on Neon
├── .env.example         # Example env vars (NEON_DATABASE_URL)
├── requirements.txt
├── data/
│   └── outputs/         # Saved charts
└── src/
    ├── agent.py         # LangGraph agent + self-correction loop
    ├── models.py        # Pydantic schemas
    ├── state.py         # Agent state
    ├── tools.py         # Schema explorer, SQL executor
    └── visualizer.py    # Pandas/Matplotlib/Seaborn pipeline
```
## Concepts Covered
- Local LLM orchestration with Ollama
- Stateful agent loops with LangGraph
- Self-correction when SQL fails
- Structured outputs with Pydantic
- Function-style tools for schema and execution
- Data lifecycle from raw SQL to visualization
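The self-correction idea above can be sketched as a bounded retry loop that feeds the database error back into query generation (`generate` and `execute` are stand-ins for the project's actual agent nodes):

```python
# Bounded retry loop: regenerate SQL from the last error until it runs.
def execute_with_correction(generate, execute, question, max_attempts=3):
    error = None
    for _ in range(max_attempts):
        sql = generate(question, error)   # error is None on the first pass
        try:
            return execute(sql)
        except Exception as exc:
            error = str(exc)              # fed back to the generator
    raise RuntimeError(f"SQL still failing after {max_attempts} attempts: {error}")
```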