Megha Panicker
# Data & Policy Guide
Use this guide to understand what data and policies you can ask about. Ask questions in plain English, and the Autonomous Data Analyst uses this information to answer them.

---
## What you can ask about
- **Data questions** — Sales, products, customers, regions, returns, promotions (stored in the database).
- **Policy questions** — Expense, remote work, leave, security, and code review (stored in company documents / PDFs).
---
## Database tables (for data questions)
The database has these tables. Use them to ask things like *"Top products by sales"*, *"Sales by region"*, or *"How many customers?"*.
### **regions**
| Column | Type | Description |
|----------|------|-------------|
| id | Integer (primary key) | Unique region ID |
| name | Text | Region name |
| country | Text | Country code (e.g. US, CA, DE, GB) |

**What’s in the database:** North America, South America, Europe West, Europe East, UK & Ireland, Asia Pacific, Middle East, Africa, ANZ, Central, Nordics, DACH (with country codes US, CA, MX, BR, DE, FR, GB, IN, AU, JP, AE, ZA).

*Use for: geography, “sales by region”, “customers in North America”.*

---
### **product_categories**
| Column | Type | Description |
|--------|------|-------------|
| id | Integer (primary key) | Unique category ID |
| name | Text | Category name |

**What’s in the database:** Electronics, Computers, Clothing, Home & Garden, Sports & Outdoors, Toys & Games, Books & Media, Health & Beauty, Office Supplies, Automotive, Pet Supplies, Groceries.

*Use for: “by category”, “revenue by product category”, “top categories”.*

---
### **customers**
| Column | Type | Description |
|------------|------|-------------|
| id | Integer (primary key) | Unique customer ID |
| name | Text | Customer name |
| email | Text | Email address |
| region_id | Integer | Links to **regions**.id |
| created_at | Date | When the customer was added |

*Use for: “which customers”, “customers in region X”, “customers by region”.*

---
### **products**
| Column | Type | Description |
|-------------|------|-------------|
| id | Integer (primary key) | Unique product ID |
| name | Text | Product name |
| category_id | Integer | Links to **product_categories**.id |
| base_price | Numeric | Base price of the product |

*Use for: “top products”, “products in Electronics”, “revenue by product”.*

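
For readers who want to check an answer directly, a question such as *"Top 5 products by total sales"* amounts to a join of **sales** and **products**. The sketch below is illustrative only: it assumes a SQLite database (this guide does not say which engine is used), and the `top_products` helper and all sample rows are hypothetical.

```python
import sqlite3


def top_products(conn: sqlite3.Connection, limit: int = 5) -> list:
    """Return (product name, total sales amount) rows, highest revenue first."""
    query = """
        SELECT p.name, SUM(s.amount) AS total_sales
        FROM sales AS s
        JOIN products AS p ON p.id = s.product_id
        GROUP BY p.id, p.name
        ORDER BY total_sales DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# Hypothetical in-memory database with the columns described in this guide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER, base_price NUMERIC);
    CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER, quantity INTEGER,
                        amount NUMERIC, sale_date DATE);
    -- Made-up sample rows, for illustration only.
    INSERT INTO products VALUES
        (1, 'Laptop', 2, 999.00), (2, 'Headphones', 1, 79.00), (3, 'Desk Lamp', 4, 35.00);
    INSERT INTO sales VALUES
        (1, 1, 1, 1, 999.00, '2024-01-15'),
        (2, 2, 2, 2, 158.00, '2024-01-20'),
        (3, 3, 1, 1, 949.00, '2024-02-03'),
        (4, 1, 3, 3, 105.00, '2024-02-10');
""")

for name, total in top_products(conn):
    print(f"{name}: {total:.2f}")
```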
---
### **sales**
| Column | Type | Description |
|-------------|------|-------------|
| id | Integer (primary key) | Unique sale ID |
| customer_id | Integer | Links to **customers**.id |
| product_id | Integer | Links to **products**.id |
| quantity | Integer | Number of units sold |
| amount | Numeric | Total sale amount (revenue) |
| sale_date | Date | Date of the sale |

*Use for: “total sales”, “sales by month”, “revenue by region/category/product”.*

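
A question such as *"sales by month"* groups **sales** rows by the month of `sale_date` and sums `amount` (revenue) and `quantity` (units). A minimal sketch, assuming SQLite with dates stored as ISO `YYYY-MM-DD` text; neither is confirmed by this guide, and the `sales_by_month` helper and sample rows are hypothetical.

```python
import sqlite3


def sales_by_month(conn: sqlite3.Connection) -> list:
    """Return (YYYY-MM, revenue, units) rows in calendar order."""
    query = """
        SELECT strftime('%Y-%m', sale_date) AS month,
               SUM(amount)   AS revenue,
               SUM(quantity) AS units
        FROM sales
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(query).fetchall()


# Hypothetical in-memory copy of the sales table; dates stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER, quantity INTEGER,
                        amount NUMERIC, sale_date DATE);
    -- Made-up sample rows, for illustration only.
    INSERT INTO sales VALUES
        (1, 1, 1, 1, 999.00, '2024-01-15'),
        (2, 2, 2, 2, 158.00, '2024-01-20'),
        (3, 3, 1, 1, 949.00, '2024-02-03'),
        (4, 1, 3, 3, 105.00, '2024-02-10');
""")

for month, revenue, units in sales_by_month(conn):
    print(month, revenue, units)
```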
---
### **returns**
| Column | Type | Description |
|------------|------|-------------|
| id | Integer (primary key) | Unique return ID |
| sale_id | Integer | Links to **sales**.id (the sale being returned) |
| amount | Numeric | Refund amount |
| reason | Text | Reason for return |
| return_date | Date | Date of the return |

**What’s in the database:** Return reasons include Defective, Wrong item shipped, Changed mind, Duplicate order, Arrived damaged, Not as described, Better price elsewhere, No longer needed, Size fit issue, Other.

*Use for: “return rate”, “refunds by reason”, “returns by category”.*

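
This guide does not define *"return rate"*, so the sketch below computes two common readings: the share of sales with at least one return, and the refunded amount as a share of revenue. Nothing in the schema above prevents one sale from having several returns, so it counts distinct `sale_id` values. It assumes SQLite; the helper names and sample rows are hypothetical.

```python
import sqlite3


def return_rates(conn: sqlite3.Connection) -> tuple:
    """Return (share of sales with at least one return, refunds / revenue)."""
    # COUNT(DISTINCT sale_id) avoids double-counting a sale returned in parts.
    returned, total = conn.execute(
        "SELECT (SELECT COUNT(DISTINCT sale_id) FROM returns),"
        "       (SELECT COUNT(*) FROM sales)"
    ).fetchone()
    refunded, revenue = conn.execute(
        "SELECT (SELECT COALESCE(SUM(amount), 0) FROM returns),"
        "       (SELECT COALESCE(SUM(amount), 0) FROM sales)"
    ).fetchone()
    by_count = returned / total if total else 0.0
    by_amount = refunded / revenue if revenue else 0.0
    return by_count, by_amount


def refunds_by_reason(conn: sqlite3.Connection) -> list:
    """Return (reason, number of returns, refunded amount), largest refunds first."""
    return conn.execute(
        "SELECT reason, COUNT(*), SUM(amount) FROM returns"
        " GROUP BY reason ORDER BY SUM(amount) DESC"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER, quantity INTEGER,
                        amount NUMERIC, sale_date DATE);
    CREATE TABLE returns (id INTEGER PRIMARY KEY, sale_id INTEGER,
                          amount NUMERIC, reason TEXT, return_date DATE);
    -- Made-up sample rows, for illustration only.
    INSERT INTO sales VALUES
        (1, 1, 1, 1, 999.00, '2024-01-15'),
        (2, 2, 2, 2, 158.00, '2024-01-20'),
        (3, 3, 1, 1, 949.00, '2024-02-03'),
        (4, 1, 3, 3, 105.00, '2024-02-10');
    INSERT INTO returns VALUES
        (1, 2, 79.00, 'Defective', '2024-01-25'),
        (2, 4, 105.00, 'Changed mind', '2024-02-15');
""")

by_count, by_amount = return_rates(conn)
print(f"{by_count:.0%} of sales had a return; refunds were {by_amount:.1%} of revenue")
print(refunds_by_reason(conn))
```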
---
### **promotions**
| Column | Type | Description |
|-------------|------|-------------|
| id | Integer (primary key) | Unique promotion ID |
| code | Text | Promotion code (e.g. SAVE10, FLASH20) |
| discount_pct | Numeric | Discount percentage |
| start_date | Date | When the promotion starts |
| end_date | Date | When the promotion ends |

*Use for: “active promotions”, “discounts”, “promotions running in 2024”.*

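
*"Active promotions"* can be read as promotions whose date window contains a given day, and *"promotions running in 2024"* as windows that overlap that year. The sketch below assumes SQLite, ISO-format date text, and that both `start_date` and `end_date` are inclusive; this guide confirms none of these. The helper names and the `NEWYEAR15` row are hypothetical.

```python
import sqlite3
from datetime import date


def active_promotions(conn: sqlite3.Connection, on: date) -> list:
    """Return (code, discount_pct) for promotions running on the given day."""
    day = on.isoformat()  # 'YYYY-MM-DD' text sorts in date order
    return conn.execute(
        "SELECT code, discount_pct FROM promotions"
        " WHERE start_date <= ? AND end_date >= ? ORDER BY code",
        (day, day),
    ).fetchall()


def promotions_running_in(conn: sqlite3.Connection, year: int) -> list:
    """Return codes of promotions whose window overlaps the calendar year."""
    rows = conn.execute(
        "SELECT code FROM promotions"
        " WHERE start_date <= ? AND end_date >= ? ORDER BY code",
        (f"{year}-12-31", f"{year}-01-01"),
    )
    return [code for (code,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promotions (id INTEGER PRIMARY KEY, code TEXT,
                             discount_pct NUMERIC, start_date DATE, end_date DATE);
    -- Made-up sample rows; only SAVE10 and FLASH20 are codes named in this guide.
    INSERT INTO promotions VALUES
        (1, 'SAVE10', 10, '2024-01-01', '2024-03-31'),
        (2, 'FLASH20', 20, '2024-06-01', '2024-06-03'),
        (3, 'NEWYEAR15', 15, '2023-12-26', '2024-01-02');
""")

print(active_promotions(conn, date.today()))
print(promotions_running_in(conn, 2024))
```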
---
## How the tables connect
- **sales** links **customers** and **products** (who bought what, and how much).
- **products** link to **product_categories** (each product belongs to a category).
- **customers** link to **regions** (each customer is in a region).
- **returns** link to **sales** (each return refers to one sale).
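
The links above translate into join conditions on the ID columns. As a minimal sketch, assuming SQLite and made-up sample rows, *"Sales by region"* walks **sales** → **customers** → **regions**, and *"Revenue by product category"* walks **sales** → **products** → **product_categories**. The query constant names are hypothetical. Inner joins silently drop sales whose customer or product has no matching region or category, so a `LEFT JOIN` is the safer choice if such gaps are possible.

```python
import sqlite3

# Join paths follow the links listed above:
#   sales.customer_id -> customers.id, customers.region_id -> regions.id
#   sales.product_id  -> products.id,  products.category_id -> product_categories.id
REVENUE_BY_REGION = """
    SELECT r.name AS region, SUM(s.amount) AS revenue
    FROM sales AS s
    JOIN customers AS c ON c.id = s.customer_id
    JOIN regions   AS r ON r.id = c.region_id
    GROUP BY r.id, r.name
    ORDER BY revenue DESC
"""

REVENUE_BY_CATEGORY = """
    SELECT pc.name AS category, SUM(s.amount) AS revenue
    FROM sales AS s
    JOIN products           AS p  ON p.id  = s.product_id
    JOIN product_categories AS pc ON pc.id = p.category_id
    GROUP BY pc.id, pc.name
    ORDER BY revenue DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE product_categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                            region_id INTEGER REFERENCES regions(id), created_at DATE);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES product_categories(id),
                           base_price NUMERIC);
    CREATE TABLE sales (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(id),
                        product_id INTEGER REFERENCES products(id),
                        quantity INTEGER, amount NUMERIC, sale_date DATE);
    -- Made-up sample rows, for illustration only.
    INSERT INTO regions VALUES (1, 'North America', 'US'), (2, 'Europe West', 'FR');
    INSERT INTO product_categories VALUES (1, 'Electronics'), (2, 'Computers'),
                                          (3, 'Home & Garden');
    INSERT INTO customers VALUES
        (1, 'Ada', 'ada@example.com', 1, '2023-05-01'),
        (2, 'Ben', 'ben@example.com', 2, '2023-07-12'),
        (3, 'Chloe', 'chloe@example.com', 1, '2023-09-30');
    INSERT INTO products VALUES
        (1, 'Laptop', 2, 999.00), (2, 'Headphones', 1, 79.00), (3, 'Desk Lamp', 3, 35.00);
    INSERT INTO sales VALUES
        (1, 1, 1, 1, 999.00, '2024-01-15'),
        (2, 2, 2, 2, 158.00, '2024-01-20'),
        (3, 3, 1, 1, 949.00, '2024-02-03'),
        (4, 1, 3, 3, 105.00, '2024-02-10');
""")

print(conn.execute(REVENUE_BY_REGION).fetchall())
print(conn.execute(REVENUE_BY_CATEGORY).fetchall())
```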
---
## Company policies (for policy questions)
Policy content is taken from company documents (e.g. PDFs) stored in the system. You can ask about these topics in plain language.

| Topic | What it covers (from company documents) |
|-------|----------------------------------------|
| **Expense policy** | Pre-approval for amounts over $500; submit receipts within 30 days; economy air travel unless the trip exceeds 8 hours; meal allowance $75 domestic / $100 international. |
| **Remote work** | Up to 3 days per week remote with manager approval; core hours 10am–3pm local; VPN required; equipment reimbursement up to $500 for home office. |
| **Leave / PTO** | 15 days PTO, 10 sick days; up to 5 days PTO carryover; bereavement 5 days; parental leave 12 weeks paid. |
| **Data security** | Encryption at rest and in transit; production database access requires 2FA and manager approval; no PII in logs; incident reporting within 24 hours. |
| **Code review** | 2 approvals per PR; run tests locally; no direct commits to main; use feature branches; document breaking changes in CHANGELOG. |
---
## Example questions
**Data (database)**
- Top 5 products by total sales
- Total sales by region
- Revenue by product category
- Best-selling months last year
- Which customers made the most purchases?
- What percentage of sales were returned?
- Returns by reason (e.g. defective vs changed mind)
- How many products and customers do we have?
- Sales in North America (or Europe West, Asia Pacific, etc.)
- Active promotions or discounts

**Policies (company documents / PDFs)**
- What is our expense policy?
- What are the remote work guidelines?
- What is our vacation or PTO policy?
- How many PTO days do we get?
- What is the data security policy?
- What is the code review process?
---
*This guide is used by the Autonomous Data Analyst to answer your questions. You don’t need to write SQL or know table names — just ask in plain English.*