Data & Policy Guide

Use this guide to understand what data and policies you can ask about. Ask questions in plain English; the Autonomous Data Analyst will use this information to answer you.


What you can ask about

  • Data questions — Sales, products, customers, regions, returns, promotions (stored in the database).
  • Policy questions — Expense, remote work, leave, security, and code review (stored in company documents / PDFs).

Database tables (for data questions)

The database contains the seven tables described below. Use them to ask things like "Top products by sales", "Sales by region", or "How many customers?".

regions

| Column | Type | Description |
| --- | --- | --- |
| id | Integer (primary key) | Unique region ID |
| name | Text | Region name |
| country | Text | Country code (e.g. US, CA, DE, GB) |

What’s in the database: North America, South America, Europe West, Europe East, UK & Ireland, Asia Pacific, Middle East, Africa, ANZ, Central, Nordics, DACH (with country codes US, CA, MX, BR, DE, FR, GB, IN, AU, JP, AE, ZA).

Use for: geography, “sales by region”, “customers in North America”.


product_categories

| Column | Type | Description |
| --- | --- | --- |
| id | Integer (primary key) | Unique category ID |
| name | Text | Category name |

What’s in the database: Electronics, Computers, Clothing, Home & Garden, Sports & Outdoors, Toys & Games, Books & Media, Health & Beauty, Office Supplies, Automotive, Pet Supplies, Groceries.

Use for: “by category”, “revenue by product category”, “top categories”.


customers

| Column | Type | Description |
| --- | --- | --- |
| id | Integer (primary key) | Unique customer ID |
| name | Text | Customer name |
| email | Text | Email address |
| region_id | Integer | Links to regions.id |
| created_at | Date | When the customer was added |

Use for: “which customers”, “customers in region X”, “customers by region”.


products

| Column | Type | Description |
| --- | --- | --- |
| id | Integer (primary key) | Unique product ID |
| name | Text | Product name |
| category_id | Integer | Links to product_categories.id |
| base_price | Numeric | Base price of the product |

Use for: “top products”, “products in Electronics”, “revenue by product”.


sales

| Column | Type | Description |
| --- | --- | --- |
| id | Integer (primary key) | Unique sale ID |
| customer_id | Integer | Links to customers.id |
| product_id | Integer | Links to products.id |
| quantity | Integer | Number of units sold |
| amount | Numeric | Total sale amount (revenue) |
| sale_date | Date | Date of the sale |

Use for: “total sales”, “sales by month”, “revenue by region/category/product”.
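
For readers curious how a question like "sales by month" might map onto this table, here is a minimal Python sketch using the standard-library sqlite3 module. The in-memory database, the sample rows, the function name sales_by_month, and the choice to store sale_date as ISO-8601 text are all assumptions made for illustration; this guide does not say which database engine the Autonomous Data Analyst uses or how it builds its queries.

```python
import sqlite3


def sales_by_month(conn):
    """Return (YYYY-MM, total revenue) pairs, oldest month first."""
    return conn.execute(
        """
        SELECT strftime('%Y-%m', sale_date) AS month, SUM(amount) AS revenue
        FROM sales
        GROUP BY month
        ORDER BY month
        """
    ).fetchall()


# Hypothetical in-memory copy of the sales table; dates are ISO-8601 text (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "product_id INTEGER, quantity INTEGER, amount NUMERIC, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 20, "2024-01-15"),  # invented sample rows
        (2, 1, 1, 2, 40, "2024-01-20"),
        (3, 2, 1, 1, 15, "2024-02-03"),
    ],
)

monthly = sales_by_month(conn)
print(monthly)
```

Grouping on the year-month prefix keeps the same month in different years separate, which matters for questions such as "best-selling months last year".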


returns

| Column | Type | Description |
| --- | --- | --- |
| id | Integer (primary key) | Unique return ID |
| sale_id | Integer | Links to sales.id (the sale being returned) |
| amount | Numeric | Refund amount |
| reason | Text | Reason for return |
| return_date | Date | Date of the return |

What’s in the database: Return reasons include Defective, Wrong item shipped, Changed mind, Duplicate order, Arrived damaged, Not as described, Better price elsewhere, No longer needed, Size fit issue, Other.

Use for: “return rate”, “refunds by reason”, “returns by category”.
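
"Return rate" can be read in more than one way. The sketch below assumes it means the share of sales that have at least one return, counted by sale rather than by refund amount; that definition is an assumption, not something this guide specifies. The function names return_rate_pct and refunds_by_reason, the in-memory SQLite database, and the sample rows are likewise hypothetical.

```python
import sqlite3


def return_rate_pct(conn):
    """Percentage of sales that have at least one return (counted by sale)."""
    total, returned = conn.execute(
        """
        SELECT (SELECT COUNT(*) FROM sales),
               (SELECT COUNT(DISTINCT sale_id) FROM returns)
        """
    ).fetchone()
    return 0.0 if total == 0 else 100.0 * returned / total


def refunds_by_reason(conn):
    """Total refund amount per return reason, largest first."""
    return conn.execute(
        "SELECT reason, SUM(amount) AS refunded FROM returns "
        "GROUP BY reason ORDER BY refunded DESC"
    ).fetchall()


# Hypothetical in-memory tables holding only the columns these queries need.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (id INTEGER PRIMARY KEY, amount NUMERIC);
    CREATE TABLE returns (id INTEGER PRIMARY KEY, sale_id INTEGER,
                          amount NUMERIC, reason TEXT, return_date TEXT);
    INSERT INTO sales VALUES (1, 20), (2, 40), (3, 15), (4, 10);
    INSERT INTO returns VALUES (1, 2, 40, 'Defective', '2024-02-01'),
                               (2, 4, 10, 'Changed mind', '2024-02-05');
    """
)

rate = return_rate_pct(conn)
by_reason = refunds_by_reason(conn)
print(rate, by_reason)
```

Counting distinct sale_id values avoids double-counting a sale that has more than one return row. A revenue-based rate would instead divide the summed refund amounts by the summed sale amounts.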


promotions

| Column | Type | Description |
| --- | --- | --- |
| id | Integer (primary key) | Unique promotion ID |
| code | Text | Promotion code (e.g. SAVE10, FLASH20) |
| discount_pct | Numeric | Discount percentage |
| start_date | Date | When the promotion starts |
| end_date | Date | When the promotion ends |

Use for: “active promotions”, “discounts”, “promotions running in 2024”.
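
A promotion is "active" on a given day when that day falls between start_date and end_date. The following Python sketch shows one way to express that check with the standard-library sqlite3 module. It assumes both dates are inclusive and stored as ISO-8601 text, neither of which this guide states. The promotion dates are invented, and active_promotions is a hypothetical helper name.

```python
import sqlite3


def active_promotions(conn, on_date):
    """Return (code, discount_pct) for promotions running on on_date (YYYY-MM-DD)."""
    return conn.execute(
        "SELECT code, discount_pct FROM promotions "
        "WHERE start_date <= ? AND end_date >= ? ORDER BY code",
        (on_date, on_date),
    ).fetchall()


# Hypothetical in-memory promotions table; the dates below are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE promotions (id INTEGER PRIMARY KEY, code TEXT, "
    "discount_pct NUMERIC, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO promotions VALUES (?, ?, ?, ?, ?)",
    [
        (1, "SAVE10", 10, "2024-01-01", "2024-03-31"),
        (2, "FLASH20", 20, "2024-06-01", "2024-06-30"),
    ],
)

june = active_promotions(conn, "2024-06-15")
print(june)
```

ISO-8601 date strings sort in calendar order, so plain string comparison is enough here; a different date format would need real date parsing.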


How the tables connect

  • sales links customers and products (who bought what, and how much).
  • products link to product_categories (each product belongs to a category).
  • customers link to regions (each customer is in a region).
  • returns link to sales (each return refers to one sale).
  • promotions does not link to any other table listed here (it has no customer, product, or sale ID column).
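
The links above are what make questions such as "sales by region" or "revenue by product category" answerable: each one follows the ID columns from sales out to a descriptive table. Below is a minimal Python sketch of those two joins using the standard-library sqlite3 module. The in-memory database, the sample rows, and the function names are assumptions for illustration only, not a description of how the Autonomous Data Analyst generates queries.

```python
import sqlite3

# Hypothetical in-memory copy of the schema described in this guide (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                            region_id INTEGER REFERENCES regions(id), created_at TEXT);
    CREATE TABLE product_categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES product_categories(id),
                           base_price NUMERIC);
    CREATE TABLE sales (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(id),
                        product_id INTEGER REFERENCES products(id),
                        quantity INTEGER, amount NUMERIC, sale_date TEXT);
    INSERT INTO regions VALUES (1, 'North America', 'US'), (2, 'DACH', 'DE');
    INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com', 1, '2024-01-05'),
                                 (2, 'Bruno', 'bruno@example.com', 2, '2024-02-10');
    INSERT INTO product_categories VALUES (1, 'Electronics'), (2, 'Books & Media');
    INSERT INTO products VALUES (1, 'Headphones', 1, 50), (2, 'Novel', 2, 10);
    INSERT INTO sales VALUES (1, 1, 1, 2, 100, '2024-03-01'),
                             (2, 1, 2, 1, 10, '2024-03-02'),
                             (3, 2, 1, 1, 50, '2024-03-03');
    """
)


def sales_by_region(conn):
    """sales -> customers -> regions: total revenue per region, largest first."""
    return conn.execute(
        """
        SELECT r.name, SUM(s.amount) AS revenue
        FROM sales s
        JOIN customers c ON c.id = s.customer_id
        JOIN regions r ON r.id = c.region_id
        GROUP BY r.name
        ORDER BY revenue DESC
        """
    ).fetchall()


def revenue_by_category(conn):
    """sales -> products -> product_categories: total revenue per category."""
    return conn.execute(
        """
        SELECT pc.name, SUM(s.amount) AS revenue
        FROM sales s
        JOIN products p ON p.id = s.product_id
        JOIN product_categories pc ON pc.id = p.category_id
        GROUP BY pc.name
        ORDER BY revenue DESC
        """
    ).fetchall()


print(sales_by_region(conn))
print(revenue_by_category(conn))
```

Both queries start from sales because that is the only table holding revenue (amount); regions and categories are reached indirectly, through customers and products respectively.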

Company policies (for policy questions)

Policy content is taken from company documents (e.g. PDFs) stored in the system. You can ask about these topics in plain language.

| Topic | What it covers (from company documents) |
| --- | --- |
| Expense policy | Pre-approval for amounts over $500; submit receipts within 30 days; economy air travel unless trip exceeds 8 hours; meal allowance $75 domestic / $100 international. |
| Remote work | Up to 3 days per week remote with manager approval; core hours 10am–3pm local; VPN required; equipment reimbursement up to $500 for home office. |
| Leave / PTO | 15 days PTO, 10 sick days; up to 5 days PTO carryover; bereavement 5 days; parental leave 12 weeks paid. |
| Data security | Encryption at rest and in transit; production database access requires 2FA and manager approval; no PII in logs; incident reporting within 24 hours. |
| Code review | 2 approvals per PR; run tests locally; no direct commits to main; use feature branches; document breaking changes in CHANGELOG. |

Example questions

Data (database)

  • Top 5 products by total sales
  • Total sales by region
  • Revenue by product category
  • Best-selling months last year
  • Which customers made the most purchases?
  • What percentage of sales were returned?
  • Returns by reason (e.g. defective vs changed mind)
  • How many products and customers do we have?
  • Sales in North America (or Europe West, Asia Pacific, etc.)
  • Active promotions or discounts

Policies (company documents / PDFs)

  • What is our expense policy?
  • What are the remote work guidelines?
  • What is our vacation or PTO policy?
  • How many PTO days do we get?
  • What is the data security policy?
  • What is the code review process?

This guide is used by the Autonomous Data Analyst to answer your questions. You don’t need to write SQL or know table names — just ask in plain English.