docs: two-service docker-compose (streamlit + jupyter), README, notebook cardinality cleanup#7
Conversation
- Replace 'moderately discrete' / 'near-categorical' with explicit cardinality framing (n_unique with a glossary table) - Add explicit max_fields cap = 2000 in the load cell + comment explaining the demo tile only has 100 features - Mention Docker kernel options in the Setup markdown - docker-compose.yml: add 'jupyter' service under the 'notebooks' profile that runs JupyterLab on :8888 against the same image. Activate with: docker compose --profile notebooks up jupyter Addresses Federico's feedback on PR #6 regarding (1) running the notebook inside Docker, (2) why max_fields=2000 yields 100 fields, (3) what 'discrete' / 'categorical' meant in the original draft. Co-Authored-By: Federico Catalano <fcatalano85@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
…DME + Makefile updates - docker-compose.yml: rename 'app' service to 'streamlit' (explicit naming), remove the 'notebooks' profile from 'jupyter' so both services start with 'docker compose up'. Each service has its own container_name, command, and port mapping. Both mount the same volumes including notebooks/. - README.md: rewrite the Quick Start section to document both services (streamlit on :8501, jupyter on :8888/lab) and how to start them together or individually. Add a 'Demo data' section explaining what _demo_crop_fd_test_tile_v1.geojson is, who uses it (app, regression tests, notebook), and how to regenerate it via scripts/generate_test_tile.py. Update Project Structure to list notebooks/ and scripts/generate_test_tile.py. - Makefile: update 'shell' / 'shell-temp' targets to point at the streamlit service instead of the now-removed 'app' service. Add 'shell-jupyter' for the new sidecar. - scripts/generate_test_tile.py: commit the synthetic demo tile generator so the demo data is reproducible and discoverable. Referenced from README. Co-Authored-By: Federico Catalano <fcatalano85@gmail.com>
…posure Devin Review flagged that removing 'profiles: [notebooks]' from the jupyter service in the previous commit causes 'docker compose up' to launch JupyterLab with no auth (--ServerApp.token=, --ServerApp.password=) and --allow-root, bound to all host interfaces via '8888:8888'. Any device on the same LAN could then execute arbitrary code as root inside the container, which has rw mounts to ./app, ./data, ./notebooks, ./output, ./scripts. Federico explicitly asked for the jupyter service to start by default with 'docker compose up' (no profile gating), so the fix here is the safer binding rather than re-introducing the profile. - docker-compose.yml: bind to '127.0.0.1:8888:8888' so JupyterLab is only reachable from the host machine, not from other LAN devices. Updated the inline comment with the rationale and instructions for re-enabling auth if remote access is needed. - README.md: surface the localhost-only binding as a 'Security note' in the Quick Start section so users know what's happening and how to un-lock-down if they need remote access. Co-Authored-By: Federico Catalano <fcatalano85@gmail.com>
There was a problem hiding this comment.
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 Demo data files are gitignored, contradicting README claim that repo "ships with" them (.gitignore:47-48)
The README added in this PR states "The repository ships with a synthetic demo tile at data/_demo_crop_fd_test_tile_v1.geojson plus a matching properties CSV at data/props__fd_test_tile_v1.csv", and the tracked notebook (notebooks/h3_resolution_effects.ipynb) directly loads _demo_crop_fd_test_tile_v1.geojson. However, the .gitignore contains both data/* (line 44) and *.geojson / *.csv (lines 47–48), which prevent these files from ever being committed. As confirmed by git check-ignore -v, both files are ignored. Anyone cloning the repo will have an empty data/ directory: the Streamlit tile picker will show nothing, and the notebook will fail with FileNotFoundError.
Negation rules needed in .gitignore
Add negation entries after the *.geojson and *.csv lines (since git uses last-match-wins):
!data/_demo_crop_fd_test_tile_v1.geojson
!data/props__fd_test_tile_v1.csv
Then git add -f the two files so they're tracked.
View 5 additional findings in Devin Review.
Devin Review flagged that the README claim 'repository ships with a synthetic demo tile at data/_demo_crop_fd_test_tile_v1.geojson plus a matching properties CSV at data/props__fd_test_tile_v1.csv' was untrue: both files were excluded by the .gitignore rules 'data/*', '*.geojson', and '*.csv', so anyone cloning the repo got an empty data/ directory. The Streamlit tile picker would show no tiles and the notebook would fail with FileNotFoundError on FieldLoader.load_fields_with_properties. Fix: - .gitignore: add explicit negation entries for the two demo files. Per git's last-match-wins rule, the negations must come AFTER the *.geojson / *.csv globs. - Track the actual demo data files (geojson 20 KB, csv 57 KB \u2014 cheap to commit; both are deterministically reproducible via scripts/generate_test_tile.py if they ever need to be regenerated). After this commit, 'git check-ignore -v' returns the negation rule for both files, and a fresh clone gets the working demo data without needing to run the generator script. Co-Authored-By: Federico Catalano <fcatalano85@gmail.com>
|
Fixed in commit |
Summary
Iteration on PR #6 addressing Federico's feedback. Three follow-ups, in order:
1. Two-service
docker-compose.ymlRestructured into two explicit services that both start with the default
docker compose up:streamlitapp)jupyternotebooks/Both services share the same
Dockerfile(no extra build) and mount the same volumes (app,data,notebooks,output,scripts). Thestreamlitservice keeps its existing healthcheck. Thejupyterservice runs JupyterLab on0.0.0.0:8888with token auth disabled (local-only — explicit warning in the YAML and the README about not exposing port 8888 to the public internet).You can also start them individually:
Makefileupdated accordingly:make shellnow opens a shell on thestreamlitcontainer; newmake shell-jupytertarget for the sidecar.2.
README.mddocuments both services + demo tiledata/_demo_crop_fd_test_tile_v1.geojsonis: a synthetic 10×10 grid generated byscripts/generate_test_tile.py, not a real-world dataset. Documents the three cropping regimes (corn cluster top-right, soybean cluster bottom-left, alternating middle band) and how the alternating band is what triggers the NaN-LISA bug fixed in PR fix(autocorr): NaN-safe LISA, correct HL/LH labels, preserve field boundaries with hexgrid #5.notebooks/h3_resolution_effects.ipynbandscripts/generate_test_tile.py.3.
notebooks/h3_resolution_effects.ipynb— terminology cleanup (from earlier in this PR)Replaced "moderately discrete" / "near-categorical" (ambiguous, informally invented terms) with an explicit
n_unique-based classification table (binary-valued / low-cardinality / mid-cardinality / high-cardinality) and definitions of what each implies for Moran's I. Also added an inline comment + print clarifying thatmax_fields = 2000is a defensive cap, not a target — the demo tile only contains 100 features regardless of the cap.4. Track
scripts/generate_test_tile.pyThe script that generates the demo tile was previously untracked. Committed it so the demo data is reproducible and the README's regeneration instructions actually work.
Review & Testing Checklist for Human
docker compose up -dshould start both services — verify by hitting http://localhost:8501 and http://localhost:8888/lab.docker compose up streamlitshould start only the streamlit service (sanity-check the README's "individual service" instructions).make shellshould open a bash shell inside thespatial-agriculture-toolkit-streamlitcontainer.notebooks/h3_resolution_effects.ipynbfrom JupyterLab and Run All. Expected runtime ≈ 30–60 s. Verify the dataframe still produces the PR fix(autocorr): NaN-safe LISA, correct HL/LH labels, preserve field boundaries with hexgrid #5 A2 baseline (crop_pct_2022 / res 9 → {ns: 608, LL: 165, HH: 149, LH: 8}).README.md— does it match how you understand the demo tile's role?Notes
app/was modified.app→streamlit) and container name (spatial-agriculture-toolkit→spatial-agriculture-toolkit-streamlit) changed.jupyterservice for local development convenience. Re-enable it (e.g. by removing--ServerApp.token=) if you ever expose the port externally.docker compose config --servicesreturns[jupyter, streamlit](both default).Link to Devin session: https://app.devin.ai/sessions/3af4a156532e4dbd90cae861a1e73943
Requested by: @PyMap