
saksham-2000/dane-demographics-explorer


Dane County Demographics Explorer

An interactive Shiny app for exploring demographic distributions across Dane County, Wisconsin, and comparing individual census tracts against the county baseline.

Live demo: dane-demographics-explorer

App screenshot

About

This project visualizes how representative any given census tract is of Dane County as a whole across three demographic dimensions: race/ethnicity, educational attainment, and household income. Users can click any tract on the map to see its demographic distribution side-by-side with the county average.

The app was built as a learning project inspired by the UW-Madison Data Science Institute's Knowledge Map tool, which assesses survey representativeness across geographic units using distributional comparisons. The low-sample-size warning in this app reflects one of Knowledge Map's open research questions: how to flag comparisons where the underlying population is too small to produce reliable representativeness scores.

Features

  • Interactive choropleth map of Dane County census tracts, colored by the selected demographic variable
  • Three demographic categories: race/ethnicity composition, educational attainment, and median household income
  • Click-to-compare: select any tract to see its distribution overlaid against the county baseline
  • Low-N warning: tracts with populations below reliable-comparison thresholds are flagged automatically
  • Handles edge cases: tracts dominated by non-household populations (e.g., university dorms) are surfaced as NA rather than hidden
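
The low-N warning can be sketched as a two-tier threshold rule. This assumes the two fixed cutoffs mentioned under Known limitations (500 and 1,000) are applied to tract total population, with tracts under 500 flagged more strongly than tracts under 1,000. The helper flag_low_n() and its labels are hypothetical, not taken from app.R.

```r
# Hypothetical helper: classify tracts by total population using two fixed
# cutoffs (assumed here to be 500 and 1,000 people).
flag_low_n <- function(pop, severe = 500, caution = 1000) {
  labels <- c("very low N: comparison unreliable",
              "low N: interpret with caution",
              "ok")
  out <- rep(NA_character_, length(pop))  # tracts with unknown population stay NA
  ok <- !is.na(pop)
  # findInterval() returns 0 for pop < severe, 1 for severe <= pop < caution,
  # and 2 for pop >= caution; +1 turns that into an index into `labels`.
  out[ok] <- labels[findInterval(pop[ok], c(severe, caution)) + 1]
  out
}

flag_low_n(c(320, 750, 4200, NA))
```

A population-based cutoff is easy to explain on a map, but it ignores how many people were actually sampled, which is the gap the Known limitations section points out.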

Data source

Demographic data is pulled from the U.S. Census Bureau's American Community Survey (ACS) 5-year estimates, 2022 vintage, at the census tract level for Dane County, Wisconsin. Spatial geometries come from the Census Bureau's TIGER/Line shapefiles, accessed via the tidycensus R package.

The cached data file (data/dane_acs.rds) is committed to this repository so the app runs without requiring a Census API key. To refresh or regenerate the data, see the "Rebuilding the data cache" section below.

Running locally

Requirements

  • R 4.0 or later
  • The following R packages: shiny, dplyr, ggplot2, leaflet, sf, scales

Install them with:

install.packages(c("shiny", "dplyr", "ggplot2", "leaflet", "sf", "scales"))

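Note: sf links against the system libraries GDAL, GEOS, and PROJ. The CRAN binaries for Windows and macOS bundle them, so you only need to install them yourself when sf is built from source. On macOS: brew install gdal (which also pulls in GEOS and PROJ). On Ubuntu: sudo apt-get install libgdal-dev libgeos-dev libproj-dev libudunits2-dev (udunits2 is required by sf's units dependency).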
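
If some of these packages are already installed, a small check avoids reinstalling them (and recompiling sf against GDAL where it is built from source). The helper missing_packages() is a hypothetical convenience, not part of the repository.

```r
# Packages the app needs at run time (tidycensus is only needed to rebuild
# the data cache, so it is not listed here).
needed <- c("shiny", "dplyr", "ggplot2", "leaflet", "sf", "scales")

# Return the subset of `pkgs` whose namespace cannot be loaded.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Usage: install only what is missing.
# to_install <- missing_packages(needed)
# if (length(to_install) > 0) install.packages(to_install)
```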

Launch

From the project root:

shiny::runApp()

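The app will open in your default browser at http://127.0.0.1:XXXX, where XXXX is a port Shiny picks at random unless you set one (for example, shiny::runApp(port = 8080)).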

Rebuilding the data cache

If you want to refresh the Census data (e.g., to use a newer ACS vintage or add variables), you'll need the tidycensus package (install.packages("tidycensus")) and a free Census API key:

  1. Request one at api.census.gov/data/key_signup.html
  2. Register it with tidycensus:
   tidycensus::census_api_key("YOUR_KEY_HERE", install = TRUE)
  3. Restart R (so the key saved to ~/.Renviron is loaded), then run the fetch script:
   source("R/data_fetch.R")

This overwrites data/dane_acs.rds with fresh data.
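
Before sourcing the fetch script, it can help to confirm that R actually sees the key. This sketch assumes the key is stored in the CENSUS_API_KEY environment variable, which is where census_api_key(install = TRUE) writes it; has_census_key() is a hypothetical helper.

```r
# TRUE if the Census API key is visible to this R session.
has_census_key <- function() nzchar(Sys.getenv("CENSUS_API_KEY"))

# Usage before rebuilding the cache:
# if (!has_census_key()) readRenviron("~/.Renviron")  # reload without restarting R
# stopifnot(has_census_key())
# source("R/data_fetch.R")
```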

Project structure

dane-demographics-explorer/
├── app.R                  Main Shiny application (UI + server)
├── R/
│   └── data_fetch.R       One-time Census ACS data pull and cache
├── data/
│   └── dane_acs.rds       Cached demographic data (committed for reproducibility)
├── README.md
└── LICENSE

Design notes

Why percentages for Race and Education, but dollars for Income? Race and Education are compositional (each tract's population breaks down across mutually exclusive categories that sum to 100%), so percentage comparisons make tracts of different sizes directly comparable. Income is reported as a single median value per tract, with no meaningful denominator.
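
The compositional comparison can be sketched in base R on toy data. This assumes a long-format table shaped like tidycensus::get_acs() output (GEOID, variable, estimate) and assumes the county baseline pools counts across all tracts before dividing; the tract IDs, categories, and counts are made up, and this is not the app's actual code.

```r
# Toy long-format data (made-up tracts and counts).
acs <- data.frame(
  GEOID    = rep(c("tract_A", "tract_B"), each = 3),
  variable = rep(c("white", "black", "hispanic"), times = 2),
  estimate = c(800, 100, 100,    # tract_A: 1,000 people
               1000, 500, 500)   # tract_B: 2,000 people
)

# Tract shares: each count over its own tract total, so the shares within a
# tract sum to 1 regardless of how large the tract is.
acs$tract_share <- acs$estimate / ave(acs$estimate, acs$GEOID, FUN = sum)

# County baseline (assumed pooled): sum counts over tracts, then divide by
# the county total, so larger tracts contribute proportionally more.
county <- aggregate(estimate ~ variable, data = acs, FUN = sum)
county$county_share <- county$estimate / sum(county$estimate)

# Side-by-side comparison for one selected tract.
selected <- subset(acs, GEOID == "tract_A")
comparison <- merge(selected[, c("variable", "tract_share")],
                    county[, c("variable", "county_share")],
                    by = "variable")
comparison
```

Because both columns are shares, tract_A (1,000 people) and the county (3,000 people) can be compared directly.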

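County baseline for Income. The county "average" shown for Income is the mean of tract-level medians, not the true county median. A median cannot be reconstructed from subgroup medians; doing so would require the underlying household-level data, which isn't publicly available at this resolution. (The ACS does publish a county-wide median household income directly, in table B19013 at county geography, which could serve as an alternative baseline.) Mean of medians is a common approximation and is adequate for rough comparison, but because it weights every tract equally, populous tracts have less influence on the baseline than their share of households would warrant.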
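
The effect of equal tract weighting shows up clearly on toy numbers. This sketch contrasts the unweighted mean of tract medians with a household-weighted mean; the weighted version is a hypothetical alternative, not what the app currently does, and neither equals the true county median. All values are made up.

```r
# Toy tract data: median household income and number of households.
# NA marks a tract with no published median (e.g., a dorm-dominated tract).
tracts <- data.frame(
  median_income = c(50000, 90000, 70000, NA),
  households    = c(3000, 500, 1500, 40)
)

# Unweighted mean of medians: every tract counts equally.
mean_of_medians <- mean(tracts$median_income, na.rm = TRUE)

# Household-weighted mean of medians: tracts with more households pull the
# baseline toward their median.
ok <- !is.na(tracts$median_income)
weighted_mean_of_medians <- weighted.mean(tracts$median_income[ok],
                                          tracts$households[ok])

c(unweighted = mean_of_medians, weighted = weighted_mean_of_medians)
```

Here the small, high-income tract (500 households) lifts the unweighted baseline to 70,000, while the household-weighted version lands at 60,000.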

Handling NA tracts. Some tracts (notably those containing UW-Madison's dorms) report NA for median household income because the Census collects income at the household level, and areas dominated by group quarters have too few households to publish reliable medians. These tracts are shown in gray on the map.
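
Showing these tracts in gray amounts to a palette function that maps missing values to a fixed color instead of dropping the tract. leaflet's colorNumeric() offers this through its na.color argument (default "#808080"). The dependency-free sketch below shows the same idea in base R. income_fill() and its default palette are hypothetical, not taken from app.R.

```r
# Map incomes to fill colors; NA incomes get `na_color` so the tract stays
# visible on the map instead of being removed.
income_fill <- function(x, palette = c("#f7fbff", "#6baed6", "#08306b"),
                        na_color = "#808080") {
  out <- rep(na_color, length(x))          # start with every tract gray
  ok <- !is.na(x)
  if (!any(ok)) return(out)                # nothing to color
  rng <- range(x[ok])
  span <- rng[2] - rng[1]
  # Rescale to [0, 1]; if all values are equal, use the first palette color.
  scaled <- if (span > 0) (x[ok] - rng[1]) / span else rep(0, sum(ok))
  ramp <- grDevices::colorRamp(palette)    # maps [0, 1] to RGB values 0-255
  out[ok] <- grDevices::rgb(ramp(scaled), maxColorValue = 255)
  out
}

income_fill(c(40000, NA, 90000))
```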

Known limitations and future work

  • Education data uses a simplified set of 5 attainment levels. A more rigorous analysis would start from all 24 detailed attainment categories in Census table B15003 (variables B15003_002 through B15003_025; B15003_001 is the total) and bucket them carefully.
  • Margin of error (MOE) from the ACS is not currently displayed. Incorporating MOE bands on the comparison chart would help users understand which differences are statistically meaningful.
  • The low-N warning uses fixed population thresholds (500 and 1,000). A more principled approach would use confidence intervals based on sample size, following the framework in Qi et al. (2021), which is the methodology underlying Knowledge Map's representativeness scoring.
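
Bucketing the detailed B15003 categories could look like the sketch below. The five bucket boundaries are an illustrative assumption, not the app's current definition, and b15003_bucket() and the toy counts are hypothetical. It relies on the B15003 variable order (002 to 016 below a high school diploma, 017 to 018 diploma or GED, 019 to 021 some college or associate's, 022 bachelor's, 023 to 025 graduate or professional).

```r
# Map B15003 variable IDs to 5 attainment buckets (illustrative boundaries).
# B15003_001 (the total) falls outside every bucket and becomes NA.
b15003_bucket <- function(variable) {
  n <- as.integer(sub("^B15003_0*", "", variable))
  cut(n,
      breaks = c(1, 16, 18, 21, 22, 25),   # intervals are (a, b]
      labels = c("Less than high school",
                 "High school or GED",
                 "Some college or associate's",
                 "Bachelor's",
                 "Graduate or professional"))
}

# Toy counts for one tract (made up); the first row is the total.
edu <- data.frame(
  variable = sprintf("B15003_%03d", 1:25),
  estimate = c(2050, rep(10, 15), 300, 100, 150, 200, 150, 600, 300, 60, 40)
)
edu$level <- b15003_bucket(edu$variable)

detail <- edu[!is.na(edu$level), ]
total  <- edu$estimate[edu$variable == "B15003_001"]
shares <- tapply(detail$estimate, detail$level, sum) / total
round(shares, 3)
```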
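
Using ACS margins of error would start from two standard Census Bureau approximations: the MOE of a derived proportion, and a z-test between two estimates using SE = MOE / 1.645 (ACS MOEs are published at the 90% confidence level). The function names below are hypothetical. tidycensus provides moe_prop() for the first calculation. The z-test treats the two estimates as independent, which is only approximately true when a tract is part of the county it is compared against.

```r
# MOE of a derived proportion p = x / y (Census Bureau approximation).
moe_proportion <- function(x, y, moe_x, moe_y) {
  p <- x / y
  under_root <- moe_x^2 - p^2 * moe_y^2
  # If the term is negative, fall back to the ratio formula (plus sign).
  under_root <- ifelse(under_root < 0, moe_x^2 + p^2 * moe_y^2, under_root)
  sqrt(under_root) / y
}

# Do two estimates differ at the 90% confidence level?
differs_90 <- function(est1, moe1, est2, moe2) {
  se1 <- moe1 / 1.645
  se2 <- moe2 / 1.645
  z <- (est1 - est2) / sqrt(se1^2 + se2^2)
  abs(z) > 1.645
}

# Toy example: 300 of 2,000 residents in a group (made-up numbers).
tract_moe <- moe_proportion(300, 2000, moe_x = 80, moe_y = 150)
differs_90(0.15, tract_moe, 0.10, 0.005)
```

Error bands drawn from moe_proportion() on the comparison chart would let users see when a tract's share overlaps the county's.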

Acknowledgments

  • Data: U.S. Census Bureau, American Community Survey (2022 5-year estimates)
  • Tooling: built with Shiny, Leaflet, tidycensus, and sf
  • Inspired by the Knowledge Map project (Chan Zuckerberg Initiative, 2022–2024)

License

MIT. See LICENSE.
