Create unique compound index on (name, associated_studies) in biosample_set collection#1430

Merged
eecavanna merged 16 commits into main from
1417-make-a-compound-unique-key-constraint-in-mongo-for-biosamples
Apr 17, 2026

Conversation

@eecavanna
Collaborator

@eecavanna eecavanna commented Apr 11, 2026

On this branch, I updated the Runtime's bootup routine to ensure the Mongo database has a unique compound index on (name, associated_studies), so that no two biosamples associated with the same study can have the same name.

Details

I also implemented a generic test that demonstrates that such a unique compound index works the way I expect.
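As a rough illustration of the kind of index described above (not the Runtime's actual bootup code), here is a minimal sketch. It assumes `biosample_set` is a `pymongo.collection.Collection`; the function and constant names are hypothetical. Because `associated_studies` is an array, MongoDB would build a multikey index with one entry per (name, study) pair, and the unique constraint applies across separate documents.

```python
from typing import Any, List, Tuple

# Index keys; 1 means ascending order (the same value as pymongo.ASCENDING).
BIOSAMPLE_NAME_INDEX_KEYS: List[Tuple[str, int]] = [
    ("name", 1),
    ("associated_studies", 1),
]


def ensure_biosample_name_index(biosample_set: Any) -> str:
    """Ask MongoDB to create the unique compound index.

    `biosample_set` is assumed to be a pymongo Collection. Calling
    `create_index` again with the same specification is a no-op, so this
    should be safe to run on every startup.

    Returns the index name reported by `create_index`.
    """
    return biosample_set.create_index(BIOSAMPLE_NAME_INDEX_KEYS, unique=True)
```

Note that creating a unique index on a collection that already contains duplicate keys fails, so existing data would need to be free of duplicates first.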

Related issue(s)

Fixes #1417

Related subsystem(s)

  • Runtime API (except the Minter)
  • Minter
  • Dagster
  • Project documentation (in the docs directory)
  • Translators (metadata ingest pipelines)
  • MongoDB migrations
  • Other

Testing

  • I tested these changes (explain below)
  • I did not test these changes

I tested these changes by confirming all tests pass, including the newly-introduced one.

Documentation

  • I have not checked for relevant documentation yet (e.g. in the docs directory)
  • I have updated all relevant documentation so it will remain accurate
  • Other (explain below)

Maintainability

  • Every Python function I defined includes a docstring (test functions are exempt from this)
  • Every Python function parameter I introduced includes a type hint (e.g. study_id: str)
  • All "to do" or "fix me" Python comments I added begin with either # TODO or # FIXME
  • I used black to format all the Python files I created/modified
  • The PR title is in the imperative mood (e.g. "Do X") and not the declarative mood (e.g. "Does X" or "Did X")

@eecavanna eecavanna self-assigned this Apr 11, 2026
@eecavanna eecavanna linked an issue Apr 11, 2026 that may be closed by this pull request
1 task
@eecavanna eecavanna requested a review from Copilot April 11, 2026 07:10
@eecavanna eecavanna marked this pull request as ready for review April 11, 2026 07:10

This comment was marked as resolved.

@aclum
Contributor

aclum commented Apr 16, 2026

I see this covers json:submit and you can't change an ID with changesheets. What about queries:run updates?

@eecavanna
Collaborator Author

Maybe needless to say: once the index exists in Mongo, Mongo itself will prevent operations that violate the index.

The changes I'm adding on this branch are about UX (user experience). Because some writes happen via Dagster, the endpoint (and the user) has no way to know whether the write succeeds or fails unless they follow up with Dagster. Currently, the endpoint says "yes, I've created the job to perform the write," regardless of whether that write will eventually succeed or fail. So, I'm updating the endpoints to check whether the write would fail if performed right now, so that the endpoint can return an actionable response to the user.
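To make the idea of a pre-write check concrete, here is a minimal in-memory sketch. It is a model of the check, not the code on this branch. The function names and the example IDs are hypothetical, and it assumes every biosample has a unique `id` and at least one entry in `associated_studies`.

```python
from typing import Any, Dict, List, Tuple

IndexKey = Tuple[Any, Any]  # (name, study_id)


def index_keys(doc: Dict[str, Any]) -> List[IndexKey]:
    """Return the (name, study) keys one biosample contributes to the index."""
    name = doc.get("name")  # a missing name is modeled as None
    studies = dict.fromkeys(doc.get("associated_studies", []))  # dedupe, keep order
    return [(name, study_id) for study_id in studies]


def find_conflicts(
    existing: List[Dict[str, Any]], incoming: List[Dict[str, Any]]
) -> List[IndexKey]:
    """List the keys that would violate uniqueness if `incoming` were written now.

    An incoming document whose `id` matches an existing document is treated
    as a replacement of that document, so it does not conflict with itself.
    """
    incoming_ids = {doc.get("id") for doc in incoming}
    claimed: Dict[IndexKey, Any] = {}
    for doc in existing:
        if doc.get("id") in incoming_ids:
            continue  # this document is about to be replaced
        for key in index_keys(doc):
            claimed[key] = doc.get("id")

    conflicts: List[IndexKey] = []
    for doc in incoming:
        for key in index_keys(doc):
            if key in claimed and claimed[key] != doc.get("id"):
                conflicts.append(key)
            else:
                claimed[key] = doc.get("id")
    return conflicts


existing = [
    {"id": "nmdc:bsm-1", "name": "Soil A", "associated_studies": ["nmdc:sty-1", "nmdc:sty-2"]},
]
incoming = [
    {"id": "nmdc:bsm-2", "name": "Soil A", "associated_studies": ["nmdc:sty-2"]},
    {"id": "nmdc:bsm-3", "name": "Soil A", "associated_studies": ["nmdc:sty-3"]},
]
print(find_conflicts(existing, incoming))  # [('Soil A', 'nmdc:sty-2')]
```

An endpoint could run a check like this before creating the Dagster job and return an error response when the list is non-empty. The index would still be the final authority, since another write could land between the check and the actual write.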

aclum previously approved these changes Apr 16, 2026
@eecavanna
Collaborator Author

eecavanna commented Apr 17, 2026

For now, only /metadata/json:submit will return an error response when the requested change would violate the uniqueness index. I added TODO comments about updating /metadata/changesheets:validate and /metadata/changesheets:submit (people can introduce duplicate name values through those endpoints) and /queries:run to do the same. I'll file follow-on tickets about those.

In short, once I merge this PR into main:

  1. The MongoDB index will prohibit multiple biosamples associated with the same study from having the same name as one another.
  2. When someone submits something via /metadata/json:submit that would violate that, the API will return an error response. This PR contains a test demonstrating that.
  3. When someone submits something via /metadata/changesheets:validate, /metadata/changesheets:submit, or /queries:run that would violate that, the API will not return an error response, but the downstream write performed by Dagster will fail (due to item 1 above).
  4. I'll create two follow-on tickets: one about /metadata/changesheets:validate and /metadata/changesheets:submit, and one about /queries:run.

@eecavanna eecavanna merged commit 67770b8 into main Apr 17, 2026
2 checks passed
@eecavanna eecavanna deleted the 1417-make-a-compound-unique-key-constraint-in-mongo-for-biosamples branch April 17, 2026 07:34

Development

Successfully merging this pull request may close these issues.

Make a compound unique key constraint in mongo for biosamples

3 participants