Skip to content

docs: two-service docker-compose (streamlit + jupyter), README, notebook cardinality cleanup#7

Open
devin-ai-integration[bot] wants to merge 4 commits intomainfrom
devin/1777727064-notebook-iteration
Open

docs: two-service docker-compose (streamlit + jupyter), README, notebook cardinality cleanup#7
devin-ai-integration[bot] wants to merge 4 commits intomainfrom
devin/1777727064-notebook-iteration

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot commented May 2, 2026

Summary

Iteration on PR #6 addressing Federico's feedback. Three follow-ups, in order:

1. Two-service docker-compose.yml

Restructured into two explicit services that both start with the default docker compose up:

Service URL Purpose
streamlit http://localhost:8501 Interactive UI for spatial interpolation and autocorrelation (renamed from app)
jupyter http://localhost:8888/lab JupyterLab kernel for executing the documentation notebooks under notebooks/

Both services share the same Dockerfile (no extra build) and mount the same volumes (app, data, notebooks, output, scripts). The streamlit service keeps its existing healthcheck. The jupyter service runs JupyterLab on 0.0.0.0:8888 with token auth disabled (local-only — explicit warning in the YAML and the README about not exposing port 8888 to the public internet).

You can also start them individually:

docker compose up streamlit   # only the Streamlit app on :8501
docker compose up jupyter     # only the JupyterLab kernel on :8888

Makefile updated accordingly: make shell now opens a shell on the streamlit container; new make shell-jupyter target for the sidecar.

2. README.md documents both services + demo tile

  • Quick Start now lists both services with their URLs in a table, and explains how to start them together or individually.
  • New Demo data section explaining what data/_demo_crop_fd_test_tile_v1.geojson is: a synthetic 10×10 grid generated by scripts/generate_test_tile.py, not a real-world dataset. Documents the three cropping regimes (corn cluster top-right, soybean cluster bottom-left, alternating middle band) and how the alternating band is what triggers the NaN-LISA bug fixed in PR fix(autocorr): NaN-safe LISA, correct HL/LH labels, preserve field boundaries with hexgrid #5.
  • Project Structure updated to list notebooks/h3_resolution_effects.ipynb and scripts/generate_test_tile.py.

3. notebooks/h3_resolution_effects.ipynb — terminology cleanup (from earlier in this PR)

Replaced "moderately discrete" / "near-categorical" (ambiguous, informally invented terms) with an explicit n_unique-based classification table (binary-valued / low-cardinality / mid-cardinality / high-cardinality) and definitions of what each implies for Moran's I. Also added an inline comment + print clarifying that max_fields = 2000 is a defensive cap, not a target — the demo tile only contains 100 features regardless of the cap.

4. Track scripts/generate_test_tile.py

The script that generates the demo tile was previously untracked. Committed it so the demo data is reproducible and the README's regeneration instructions actually work.

Review & Testing Checklist for Human

  • docker compose up -d should start both services — verify by hitting http://localhost:8501 and http://localhost:8888/lab.
  • docker compose up streamlit should start only the streamlit service (sanity-check the README's "individual service" instructions).
  • make shell should open a bash shell inside the spatial-agriculture-toolkit-streamlit container.
  • Open notebooks/h3_resolution_effects.ipynb from JupyterLab and Run All. Expected runtime ≈ 30–60 s. Verify the dataframe still produces the PR fix(autocorr): NaN-safe LISA, correct HL/LH labels, preserve field boundaries with hexgrid #5 A2 baseline (crop_pct_2022 / res 9 → {ns: 608, LL: 165, HH: 149, LH: 8}).
  • Skim the new "Demo data" section of README.md — does it match how you understand the demo tile's role?

Notes

  • No production code in app/ was modified.
  • The Streamlit container's behaviour is functionally unchanged; only the service name (appstreamlit) and container name (spatial-agriculture-toolkitspatial-agriculture-toolkit-streamlit) changed.
  • Token auth is disabled in the jupyter service for local development convenience. Re-enable it (e.g. by removing --ServerApp.token=) if you ever expose the port externally.
  • Verified docker compose config --services returns [jupyter, streamlit] (both default).

Link to Devin session: https://app.devin.ai/sessions/3af4a156532e4dbd90cae861a1e73943
Requested by: @PyMap


Open in Devin Review

- Replace 'moderately discrete' / 'near-categorical' with explicit
  cardinality framing (n_unique with a glossary table)
- Add explicit max_fields cap = 2000 in the load cell + comment
  explaining the demo tile only has 100 features
- Mention Docker kernel options in the Setup markdown
- docker-compose.yml: add 'jupyter' service under the 'notebooks'
  profile that runs JupyterLab on :8888 against the same image.
  Activate with: docker compose --profile notebooks up jupyter

Addresses Federico's feedback on PR #6 regarding (1) running the
notebook inside Docker, (2) why max_fields=2000 yields 100 fields,
(3) what 'discrete' / 'categorical' meant in the original draft.

Co-Authored-By: Federico Catalano <fcatalano85@gmail.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link
Copy Markdown
Contributor Author

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

…DME + Makefile updates

- docker-compose.yml: rename 'app' service to 'streamlit' (explicit naming),
  remove the 'notebooks' profile from 'jupyter' so both services start with
  'docker compose up'. Each service has its own container_name, command, and
  port mapping. Both mount the same volumes including notebooks/.

- README.md: rewrite the Quick Start section to document both services
  (streamlit on :8501, jupyter on :8888/lab) and how to start them together
  or individually. Add a 'Demo data' section explaining what
  _demo_crop_fd_test_tile_v1.geojson is, who uses it (app, regression tests,
  notebook), and how to regenerate it via scripts/generate_test_tile.py.
  Update Project Structure to list notebooks/ and scripts/generate_test_tile.py.

- Makefile: update 'shell' / 'shell-temp' targets to point at the streamlit
  service instead of the now-removed 'app' service. Add 'shell-jupyter' for
  the new sidecar.

- scripts/generate_test_tile.py: commit the synthetic demo tile generator
  so the demo data is reproducible and discoverable. Referenced from README.

Co-Authored-By: Federico Catalano <fcatalano85@gmail.com>
@devin-ai-integration devin-ai-integration Bot changed the title docs(notebooks): clarify cardinality terminology + add jupyter sidecar docs: two-service docker-compose (streamlit + jupyter), README, notebook cardinality cleanup May 2, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

…posure

Devin Review flagged that removing 'profiles: [notebooks]' from the jupyter
service in the previous commit causes 'docker compose up' to launch
JupyterLab with no auth (--ServerApp.token=, --ServerApp.password=) and
--allow-root, bound to all host interfaces via '8888:8888'. Any device on
the same LAN could then execute arbitrary code as root inside the
container, which has rw mounts to ./app, ./data, ./notebooks, ./output,
./scripts.

Federico explicitly asked for the jupyter service to start by default with
'docker compose up' (no profile gating), so the fix here is the safer
binding rather than re-introducing the profile.

- docker-compose.yml: bind to '127.0.0.1:8888:8888' so JupyterLab is only
  reachable from the host machine, not from other LAN devices. Updated the
  inline comment with the rationale and instructions for re-enabling auth
  if remote access is needed.
- README.md: surface the localhost-only binding as a 'Security note' in
  the Quick Start section so users know what's happening and how to
  un-lock-down if they need remote access.

Co-Authored-By: Federico Catalano <fcatalano85@gmail.com>
Copy link
Copy Markdown
Contributor Author

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

🐛 1 issue in files not directly in the diff

🐛 Demo data files are gitignored, contradicting README claim that repo "ships with" them (.gitignore:47-48)

The README added in this PR states "The repository ships with a synthetic demo tile at data/_demo_crop_fd_test_tile_v1.geojson plus a matching properties CSV at data/props__fd_test_tile_v1.csv", and the tracked notebook (notebooks/h3_resolution_effects.ipynb) directly loads _demo_crop_fd_test_tile_v1.geojson. However, the .gitignore contains both data/* (line 44) and *.geojson / *.csv (lines 47–48), which prevent these files from ever being committed. As confirmed by git check-ignore -v, both files are ignored. Anyone cloning the repo will have an empty data/ directory: the Streamlit tile picker will show nothing, and the notebook will fail with FileNotFoundError.

Negation rules needed in .gitignore

Add negation entries after the *.geojson and *.csv lines (since git uses last-match-wins):

!data/_demo_crop_fd_test_tile_v1.geojson
!data/props__fd_test_tile_v1.csv

Then git add -f the two files so they're tracked.

View 5 additional findings in Devin Review.

Open in Devin Review

Devin Review flagged that the README claim 'repository ships with a synthetic
demo tile at data/_demo_crop_fd_test_tile_v1.geojson plus a matching
properties CSV at data/props__fd_test_tile_v1.csv' was untrue: both files
were excluded by the .gitignore rules 'data/*', '*.geojson', and '*.csv',
so anyone cloning the repo got an empty data/ directory. The Streamlit
tile picker would show no tiles and the notebook would fail with
FileNotFoundError on FieldLoader.load_fields_with_properties.

Fix:
- .gitignore: add explicit negation entries for the two demo files. Per
  git's last-match-wins rule, the negations must come AFTER the *.geojson
  / *.csv globs.
- Track the actual demo data files (geojson 20 KB, csv 57 KB \u2014 cheap to
  commit; both are deterministically reproducible via
  scripts/generate_test_tile.py if they ever need to be regenerated).

After this commit, 'git check-ignore -v' returns the negation rule for both
files, and a fresh clone gets the working demo data without needing to run
the generator script.

Co-Authored-By: Federico Catalano <fcatalano85@gmail.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Fixed in commit a0feb06. Verified git check-ignore -v data/_demo_crop_fd_test_tile_v1.geojson data/props__fd_test_tile_v1.csv now returns the negation entries, and git ls-files data/ lists both files plus .gitkeep. A fresh clone now gets the working demo data without needing to run scripts/generate_test_tile.py. (Both files are also deterministically reproducible via the script if needed.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant