This is a Stata analysis project template designed for IPA (Innovations for Poverty Action) research projects. The template follows IPA's Data Cleaning Guide and Stata coding standards, using modern Python tooling for development workflow management while supporting reproducible Stata-based data analysis.
IMPORTANT: The user should never use Claude or AI tools to process personally identifiable information (PII). Always refuse to review data that might include PII.
For basic usage with just Git and Stata:
# Configure Stata path in .env, then run:
just stata-setup # One-time: install setroot + packages
just stata-run # Run full pipeline
just stata-script 01_data_cleaning # Run single scriptOutputs appear in outputs/tables/ and outputs/figures/.
just get-started # Installs tools and sets up environmentThe project uses uv for Python environment management and just for task automation.
just stata-config # Show current Stata configuration
just stata-check-installation # Test Stata installation and version
just system-info # Display system and Stata informationConfigure your Stata installation by editing the .env file:
STATA_CMD: Path to Stata executable (e.g., "C:\Program Files\Stata18\StataMP-64.exe" or "stata-mp")STATA_EDITION: Edition of Stata installed (e.g., 'be', 'se', 'mp')STATA_OPTIONS: Additional Stata command line options (optional)
uv sync # Create/sync virtual environment
.venv/Scripts/activate # Manual activation (Windows Bash)
.venv/Scripts/activate.ps1 # Manual activation (Windows PowerShell)
source .venv/bin/activate # Activate on bashjust stata-setup # One-time setup (install setroot + packages)
just stata-run # Run full analysis pipeline
just stata-script 01_data_cleaning # Run a single script via runner pattern
just stata-config # Show Stata configuration
just help # See available commandsThe project uses setroot to find the project root via the .here marker file.
This enables:
- Scripts work from any directory (no
c(pwd)dependency) - No user-specific
if c(user)blocks needed - Full adopath isolation (only BASE + local
ado/) for reproducibility - Runner pattern for individual script execution with proper environment
just lint-py # Lint Python code with ruff
just fmt-python # Format Python code with ruff
just fmt-markdown # Format all markdown files
just lint-stata # Lint Stata do-files with stata_linter
just fmt-all # Run all formatting and lintingFor large projects where full builds take >5 minutes:
just stata-build # Build with dependency tracking (only rebuilds changed files)
just stata-data # Build only data pipeline
just stata-analysis # Build only analysis
just stata-clean # Clean all outputsjust lab # Launch Jupyter Lab for analysis
just preview-docs # Preview Quarto documentation
just build-docs # Build Quarto documentationjust update-reqs # Update uv.lock and pre-commit hooks
just clean # Remove virtual environment- pystatacons: Primary dependency for Stata integration in Python environment
- ipaplots: IPA-branded Stata visualization scheme (recommended for IPA staff)
- stata_linter: World Bank DIME Analytics tool for Stata code quality enforcement
- ruff: Python linting and formatting
- pre-commit: Git hook framework for code quality
- codespell: Spell checking
- markdownlint-cli2: Markdown formatting and linting
- uv: Python package and environment management
- just: Command runner for development tasks
- Follows IPA Data Cleaning Guide principles and Stata coding standards
- Uses global macros for file paths (IPA best practice)
- Implements defensive programming with assert statements
- Uses IPA extended missing value conventions (.d/.o/.n/.r/.s)
- Variable naming follows IPA conventions with descriptive prefixes
- Conservative
maxvardefault (5000) - increase only for genuinely wide datasets - Automatically uses ipaplots theme when available for IPA-branded visualizations
- Integrated stata_linter for automatic code quality checking and best practice enforcement
- Requirements-based Stata package management system for reproducible environments
- Uses
Justfilefor cross-platform task automation - Python virtual environment managed by
uvin.venv/ - Pre-commit hooks configured for code quality enforcement
- Supports Windows, macOS, and Linux development environments
- Ready for Stata analysis workflows through pystatacons integration
Before increasing maxvar, consider:
- Load only needed columns:
use var1 var2 using "data.dta" - Reshape to long format: Wide loops are slow; long operations are fast
- Modularize: Clean one survey module at a time, not entire survey