A FastAPI backend application that provides healthcare provider data through a REST API, with PostgreSQL database integration and ETL processing from CSV data.
- FastAPI backend with Python 3.11
- PostgreSQL database with SQLAlchemy ORM
- Pydantic schemas for data validation
- ETL process to populate database from CSV
- Docker and Docker Compose support
- Local development setup with health checks
- AI-powered query endpoint using OpenAI GPT-4o
- RAG (Retrieval-Augmented Generation) with embeddings
- Smart filtering for healthcare-related queries
- `GET /` - Root endpoint
- `GET /providers` - Get all providers with optional filtering by DRG, zip code, and radius
- `POST /ask` - AI-powered healthcare questions endpoint
- `GET /health` - Health check
The /providers endpoint supports advanced filtering:
```bash
# Get all providers
curl "http://localhost:8000/providers"

# Filter by DRG (Diagnosis Related Group)
curl "http://localhost:8000/providers?drg=470"

# Filter by location and radius (requires all three parameters)
curl "http://localhost:8000/providers?drg=470&zip=10001&radius_km=40"
```

Query Parameters:

- `drg` (optional): Diagnosis Related Group code
- `zip` (optional): Zip code to search from
- `radius_km` (optional): Radius in kilometers from the zip code
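Radius filtering of this kind is typically implemented by resolving the zip code to coordinates (the repo ships `USZipsWithLatLon_20231227.csv` for this) and then computing great-circle distances with the haversine formula. A minimal sketch, assuming providers are dicts with `latitude`/`longitude` keys; the names here are illustrative, not necessarily those used in `app/main.py`:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_radius(providers, lat, lon, radius_km):
    """Keep only providers with coordinates inside the given radius."""
    return [
        p for p in providers
        if p.get("latitude") is not None
        and haversine_km(lat, lon, p["latitude"], p["longitude"]) <= radius_km
    ]
```

One degree of longitude at the equator is roughly 111 km, which is a handy sanity check for the formula.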
The /ask endpoint allows natural language queries about healthcare providers using OpenAI GPT-4o with RAG (Retrieval-Augmented Generation).
Endpoint: POST /ask
Request Body:
```json
{
  "question": "Your healthcare-related question here"
}
```

Response:

```json
{
  "answer": "AI-generated response with relevant provider information"
}
```

Example 1: Best-rated providers for procedures
```bash
curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "Who has the best ratings for heart surgery near 10032?"}'
```

Example 2: Cost comparisons
```bash
curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the costs for hospital procedures in New York?"}'
```

Example 3: Provider recommendations
```bash
curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "Find me the most affordable cardiac surgery providers with good ratings"}'
```

Example 4: Location-based queries
```bash
curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "Which hospitals in California have the highest star ratings?"}'
```

Non-healthcare-related questions are automatically filtered out:

```bash
curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "How is the weather today?"}'
```

Response:

```json
{
  "answer": "I can only help with hospital pricing and quality information. Please ask about medical procedures, costs or hospital ratings."
}
```

- Query Filtering: Uses AI to determine if questions are healthcare-related
- RAG with Embeddings: Converts queries and provider data to embeddings for semantic matching
- Context Retrieval: Finds the most relevant providers based on query similarity
- AI Response: Uses GPT-4o to generate helpful responses with provider context
- Fallback Mode: Works even without an OpenAI API key, using keyword matching
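The retrieval step can be sketched as cosine similarity between embedding vectors. The toy two-dimensional vectors below stand in for real embeddings, which the service would obtain from OpenAI's embeddings API:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, provider_vecs, k=3):
    """Return the ids of the k provider embeddings most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), pid) for pid, v in provider_vecs.items()]
    return [pid for _, pid in sorted(scored, reverse=True)[:k]]
```

The top-k provider records are then formatted into the prompt context passed to GPT-4o.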
The providers table contains the following columns:
- `id` (UUID, Primary Key) - Unique identifier for each record
- `provider_id` (String, Indexed) - Provider identifier from source data
- `provider_name` (String) - Name of the healthcare provider
- `provider_city` (String) - City where the provider is located
- `provider_state` (String) - State where the provider is located
- `provider_zip_code` (Integer, Indexed) - ZIP code of the provider location
- `ms_drg_definition` (Integer) - Medicare Severity Diagnosis Related Group code
- `total_discharges` (Integer) - Number of patients discharged
- `average_covered_charges` (Decimal) - Average amount billed to insurance
- `average_total_payments` (Decimal) - Average total payments received
- `average_medicare_payments` (Decimal) - Average Medicare payments received
- `latitude` (Float, Nullable) - Geographic latitude for location-based queries
- `longitude` (Float, Nullable) - Geographic longitude for location-based queries
- `star_rating` (Integer) - Quality rating from 1-10 (higher is better)
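A SQLAlchemy model matching the columns above might look like the following; this is a sketch reconstructed from the column list, not necessarily identical to `app/models.py`:

```python
import uuid

from sqlalchemy import Column, Float, Integer, Numeric, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Provider(Base):
    __tablename__ = "providers"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    provider_id = Column(String, index=True)
    provider_name = Column(String)
    provider_city = Column(String)
    provider_state = Column(String)
    provider_zip_code = Column(Integer, index=True)
    ms_drg_definition = Column(Integer)
    total_discharges = Column(Integer)
    average_covered_charges = Column(Numeric)
    average_total_payments = Column(Numeric)
    average_medicare_payments = Column(Numeric)
    latitude = Column(Float, nullable=True)   # populated by geocoding
    longitude = Column(Float, nullable=True)  # populated by geocoding
    star_rating = Column(Integer)
```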
The ETL process maps the following CSV columns to database columns:
| CSV Column | Database Column |
|---|---|
| Rndrng_Prvdr_CCN | provider_id |
| Rndrng_Prvdr_Org_Name | provider_name |
| Rndrng_Prvdr_City | provider_city |
| Rndrng_Prvdr_State_Abrvtn | provider_state |
| Rndrng_Prvdr_Zip5 | provider_zip_code |
| DRG_Cd | ms_drg_definition |
| Tot_Dschrgs | total_discharges |
| Avg_Submtd_Cvrd_Chrg | average_covered_charges |
| Avg_Tot_Pymt_Amt | average_total_payments |
| Avg_Mdcr_Pymt_Amt | average_medicare_payments |
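The mapping above amounts to a per-row rename; a sketch of the kind of transform the ETL performs (`map_row` is an illustrative helper, not necessarily the name used in `app/etl.py`):

```python
# CSV-to-database column mapping from the table above
CSV_TO_DB = {
    "Rndrng_Prvdr_CCN": "provider_id",
    "Rndrng_Prvdr_Org_Name": "provider_name",
    "Rndrng_Prvdr_City": "provider_city",
    "Rndrng_Prvdr_State_Abrvtn": "provider_state",
    "Rndrng_Prvdr_Zip5": "provider_zip_code",
    "DRG_Cd": "ms_drg_definition",
    "Tot_Dschrgs": "total_discharges",
    "Avg_Submtd_Cvrd_Chrg": "average_covered_charges",
    "Avg_Tot_Pymt_Amt": "average_total_payments",
    "Avg_Mdcr_Pymt_Amt": "average_medicare_payments",
}

def map_row(csv_row: dict) -> dict:
    """Rename a raw CSV row's keys to their database column names."""
    return {db: csv_row[src] for src, db in CSV_TO_DB.items() if src in csv_row}
```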
The easiest way to run the application is using Docker Compose:
```bash
# Build and run with Docker Compose
docker compose up --build
```

This will:
- Start a PostgreSQL database container
- Build and start the FastAPI application container
- Run the ETL process on startup
- Make the API available on http://localhost:8000
- Python 3.11
- PostgreSQL installed and running
- Virtual environment (recommended)
```bash
# Install PostgreSQL
brew install postgresql

# Start PostgreSQL service
brew services start postgresql

# Create postgres user with password
createuser -s postgres
psql postgres -c "ALTER USER postgres PASSWORD 'password';"

# Create the providers database
createdb -U postgres providers
```

```bash
# Install PostgreSQL
sudo apt update
sudo apt install postgresql postgresql-contrib

# Start PostgreSQL service
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Switch to postgres user and create database user
sudo -u postgres psql -c "CREATE USER postgres WITH SUPERUSER PASSWORD 'password';"
sudo -u postgres createdb providers
```

Download and install PostgreSQL from the official website: https://www.postgresql.org/download/windows/
After installation:
```sql
-- Connect to PostgreSQL as superuser and run:
CREATE USER postgres WITH SUPERUSER PASSWORD 'password';
CREATE DATABASE providers OWNER postgres;
```

If you're still having issues with an empty database, follow these steps:
- Connect to PostgreSQL as superuser:
```bash
# macOS
psql postgres

# Ubuntu/Debian
sudo -u postgres psql

# Windows
# Use pgAdmin or psql from the PostgreSQL installation
```

- Create the postgres user with proper permissions:

```sql
-- Create user if it doesn't exist
CREATE USER postgres WITH SUPERUSER PASSWORD 'password';

-- Grant all privileges
ALTER USER postgres WITH SUPERUSER;

-- Create the providers database
CREATE DATABASE providers OWNER postgres;

-- Grant all privileges on the database
GRANT ALL PRIVILEGES ON DATABASE providers TO postgres;

-- Connect to the providers database
\c providers

-- Grant schema privileges
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO postgres;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO postgres;

-- Exit
\q
```

- Test the connection:

```bash
psql -h localhost -U postgres -d providers -c "SELECT 1;"
```

To use the /ask endpoint with full AI capabilities, you need to set up an OpenAI API key:
- Visit OpenAI's website
- Sign up or log in to your account
- Navigate to the API Keys page
- Click "Create new secret key"
- Copy the generated API key (keep it secure!)
Create a .env file in the project root directory:
```bash
# Create .env file
touch .env
```

Add your OpenAI API key to the .env file:

```bash
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Database Configuration (for local development)
DATABASE_URL=postgresql://postgres:password@localhost:5432/providers
```

Important Notes:

- Replace `your_openai_api_key_here` with your actual OpenAI API key
- The `.env` file is already in `.gitignore` to keep your API key secure
- Never commit your actual API key to version control
The application works even without an OpenAI API key:
- With API key: Full AI capabilities with GPT-4o and embeddings
- Without API key: Basic keyword matching and structured responses
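The keyword-matching fallback can be sketched like this; the keyword set and function name are illustrative, not the actual logic in `app/openai_service.py`:

```python
import re

# Illustrative keyword set; the real filter may use a different list or an AI check
HEALTHCARE_KEYWORDS = {
    "hospital", "surgery", "procedure", "cost", "costs",
    "rating", "ratings", "medicare", "drg", "provider", "medical",
}

def is_healthcare_question(question: str) -> bool:
    """Crude relevance check: does the question share any healthcare keyword?"""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return bool(words & HEALTHCARE_KEYWORDS)
```

Questions that fail the check get the canned "I can only help with hospital pricing and quality information" response shown above.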
You'll see this warning if no API key is configured:
```
⚠️ OPENAI_API_KEY not found in environment variables
⚠️ Please create a .env file with OPENAI_API_KEY=your_api_key_here
```
- Clone the repository:
```bash
git clone <repository-url>
cd outfox_interview_test
```

- Create and activate a virtual environment:

```bash
python3.11 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Ensure PostgreSQL is running and accessible with:
  - Host: localhost
  - Port: 5432
  - User: postgres
  - Password: password
  - Database: providers

- Run the application:

```bash
python run_local.py
```

The application will:

- Check if PostgreSQL is running
- Create the `providers` database if it doesn't exist
- Run the ETL process to populate the database
- Start the FastAPI server on http://localhost:8000
Once the application is running, you can access:
- API documentation: http://localhost:8000/docs
- Alternative docs: http://localhost:8000/redoc
- Health check: http://localhost:8000/health
- For AI features, create a `.env` file first:

```bash
# Create .env file with your OpenAI API key
echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
```

- Build and run the application:

```bash
docker compose up --build
```

This will:

- Start a PostgreSQL database container
- Build and start the FastAPI application container
- Run the ETL process on startup
- Make the API available on http://localhost:8000
- Load environment variables from the `.env` file (including the OpenAI API key)

- Stop the application:

```bash
docker compose down
```

- Build the Docker image:

```bash
docker build -t providers-api .
```

- Run with a PostgreSQL database:
```bash
docker run -p 8000:8000 --env DATABASE_URL=postgresql://postgres:password@host.docker.internal:5432/providers providers-api
```

```
outfox_interview_test/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application
│   ├── database.py          # Database configuration
│   ├── models.py            # SQLAlchemy models
│   ├── schemas.py           # Pydantic schemas
│   ├── etl.py               # ETL process
│   └── geocoding.py         # Geocoding utilities
│   ├── openai_service.py    # AI service with RAG functionality
├── MUP_INP_RY24_P03_V10_DY22_PrvSvc.csv   # Source healthcare data
├── USZipsWithLatLon_20231227.csv          # ZIP code geocoding data
├── .env.example             # Environment variables template
├── requirements.txt         # Python dependencies
├── run_local.py             # Local development runner
├── run_etl.py               # Manual ETL runner
├── troubleshoot_db.py       # Database troubleshooting utility
├── Dockerfile
├── docker-compose.yml
└── README.md
```
The application uses environment variables for configuration. Create a .env file with:
- `OPENAI_API_KEY`: Your OpenAI API key for AI features (optional but recommended)
- `DATABASE_URL`: PostgreSQL connection string (default: postgresql://postgres:password@localhost:5432/providers)

Example .env file:

```bash
OPENAI_API_KEY=sk-your-actual-openai-api-key-here
DATABASE_URL=postgresql://postgres:password@localhost:5432/providers
```

If the database is created but empty after running the ETL:
- Check if the ETL process ran successfully:
```bash
# Look for ETL output in the application logs
# You should see messages like:
# "✅ Successfully processed X records into the database"
```

- Verify the database connection:

```bash
psql -h localhost -U postgres -d providers -c "SELECT COUNT(*) FROM providers;"
```

- Check if the tables exist:

```bash
psql -h localhost -U postgres -d providers -c "\dt"
```

- Manually test the ETL process:

```bash
# Install dependencies first
pip install -r requirements.txt

# Run the ETL manually
python -c "from app.etl import run_etl; run_etl()"
```

- Check the CSV file:

```bash
# Verify the CSV file exists and is readable
ls -la MUP_INP_RY24_P03_V10_DY22_PrvSvc.csv
head -5 MUP_INP_RY24_P03_V10_DY22_PrvSvc.csv
```

If you see errors like "Textual SQL expression should be explicitly declared as text()", this has been fixed in the latest version of the ETL script. The error occurs because SQLAlchemy 2.0 requires raw SQL strings to be explicitly wrapped in text().
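For reference, the required change is to wrap the raw SQL string in `text()` (shown here as a standalone snippet, not the exact code from the ETL script):

```python
from sqlalchemy import text

# SQLAlchemy 1.x accepted bare strings: conn.execute("SELECT COUNT(*) FROM providers")
# SQLAlchemy 2.0 raises ArgumentError unless the statement is wrapped explicitly:
stmt = text("SELECT COUNT(*) FROM providers")
# conn.execute(stmt) then works against a live connection
```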
If you encounter PostgreSQL connection issues:
- Check if PostgreSQL is running:
```bash
# macOS
brew services list | grep postgresql

# Ubuntu/Debian
sudo systemctl status postgresql

# Windows
# Check the Services app for the PostgreSQL service
```

- Start PostgreSQL if it is not running:

```bash
# macOS
brew services start postgresql

# Ubuntu/Debian
sudo systemctl start postgresql

# Windows
# Start the PostgreSQL service from Services
```

- Check PostgreSQL credentials and create the user if needed:

```sql
CREATE USER postgres WITH PASSWORD 'password';
ALTER USER postgres WITH SUPERUSER;
```

- Test the PostgreSQL connection:

```bash
psql -h localhost -U postgres -d postgres -c "SELECT 1;"
```

Ensure the CSV file MUP_INP_RY24_P03_V10_DY22_PrvSvc.csv is in the root directory of the project.
If Docker containers fail to start:
- Check if ports are already in use:
```bash
lsof -i :8000
lsof -i :5432
```

- Remove existing containers and volumes:

```bash
docker compose down -v
docker system prune -f
```

If you see SQLAlchemy warnings about session state, they are usually harmless and indicate that the ETL process is working correctly. The warnings occur when the session is rolled back after a successful commit.
- Add new routes in `app/main.py`
- Create corresponding schemas in `app/schemas.py` if needed
- Update this README with new endpoint documentation
The application uses SQLAlchemy's create_all() for table creation. For production, consider using Alembic for database migrations.
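The difference matters because `create_all()` only creates missing tables and never alters existing ones. A small demonstration against an in-memory SQLite engine (illustrative only, not the app's actual startup code):

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Provider(Base):
    __tablename__ = "providers"
    id = Column(Integer, primary_key=True)
    provider_name = Column(String)

# create_all() issues CREATE TABLE for every mapped model that doesn't exist yet;
# it never adds or changes columns on existing tables, which is why Alembic
# migrations are preferred once a schema is in production
engine = create_engine("sqlite://")
Base.metadata.create_all(bind=engine)
```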
This project is for interview purposes.
