Extend obographs-solr.py to load parents and ancestors #36

@dosumis

STATUS: DRAFT

For the CAP project, we would like to load parents and ancestors to SOLR (storing labels and curies for each node). This needs to be configurable by relation and to allow specification of upper bounds.

obographs-solr.py is the current loader so it would be simplest to just extend this. As this is both a runner script and a collection of functions, args should be shifted to use argparse and new functionality should be driven by optional args. This will ensure that current uses of the script (e.g. in VFB) will remain unaffected.
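
A minimal sketch of how the optional args could be wired up with argparse, so that running the script without them behaves as before. The function names (`build_parser`, `read_curies`, `relations_to_follow`), the one-CURIE-per-line file format, and the use of a bare `--add-ancestors` flag to mean "follow the default relation" are all assumptions for illustration; the script's existing arguments are not shown here.

```python
import argparse

# Default relation named in this issue; used when --add-ancestors is given without a file.
DEFAULT_RELATIONS = ["subClassOf"]
# Hypothetical sentinel: the flag was passed, but no relations file was supplied.
USE_DEFAULT = "__default__"


def build_parser():
    """Build a parser holding only the two proposed optional args."""
    parser = argparse.ArgumentParser(
        description="Load OBOgraphs JSON content to SOLR (sketch of new options only)."
    )
    parser.add_argument(
        "--add-ancestors",
        nargs="?",            # the file path is optional
        const=USE_DEFAULT,    # flag present, no path given
        default=None,         # flag absent: feature stays off
        metavar="RELATIONS_FILE",
        help="File of curies specifying relations to follow (default: subClassOf).",
    )
    parser.add_argument(
        "--upper-bounds",
        default=None,
        metavar="BOUNDS_FILE",
        help="File of curies specifying upper bounds.",
    )
    return parser


def read_curies(path):
    """Read one CURIE per line, skipping blank lines (assumed file format)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def relations_to_follow(args):
    """Return the relations to traverse, or an empty list if the feature is off."""
    if args.add_ancestors is None:
        return []
    if args.add_ancestors == USE_DEFAULT:
        return list(DEFAULT_RELATIONS)
    return read_curies(args.add_ancestors)
```

Because both options default to `None`, an existing caller that passes neither would get an empty relation list and could skip the ancestor step entirely.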

This script already uses the OBOgraphs JSON format to load labels and synonyms to SOLR. OAK can load these data structures and has an interface that makes it easy to get lists of descendants or ancestors.

Suggested new args:

--add-ancestors {path to file of curies specifying relations to follow - default = subClassOf}
--upper-bounds {path to file of curies specifying upper bounds}

For each term in the upper bound list, generate a list of descendants (UBD).
For each term loaded, generate a list of ancestors. Load the intersection of this list with UBD.
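
The two steps above could be sketched in plain Python over the `edges` list of an OBOgraphs JSON graph, where each edge is a dict with `sub`, `pred` and `obj` keys and subClassOf is serialised as the predicate `is_a`. This is an illustration rather than the proposed implementation (which would more likely go through OAK). The function names and the `EX:` identifiers are hypothetical, and two choices are assumptions: UBD is taken as the union of descendants over all upper bounds, and each upper bound is itself included in UBD.

```python
from collections import defaultdict


def index_edges(edges, relations):
    """Index edges whose predicate is in `relations` as parent and child maps."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for edge in edges:
        if edge["pred"] in relations:
            parents[edge["sub"]].add(edge["obj"])
            children[edge["obj"]].add(edge["sub"])
    return parents, children


def reachable(start, neighbours):
    """Return every node reachable from `start` via `neighbours` (iterative DFS)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bounded_ancestors(edges, terms, upper_bounds, relations=("is_a",)):
    """Map each term to its ancestors that fall at or under an upper bound."""
    parents, children = index_edges(edges, set(relations))
    # Step 1: UBD, computed once. Assumption: union over bounds, bound included.
    ubd = set()
    for bound in upper_bounds:
        ubd |= reachable(bound, children) | {bound}
    # Step 2: per-term ancestors, intersected with UBD.
    return {term: reachable(term, parents) & ubd for term in terms}


# Hypothetical toy graph in OBOgraphs edge form.
EDGES = [
    {"sub": "EX:motor_neuron", "pred": "is_a", "obj": "EX:neuron"},
    {"sub": "EX:neuron", "pred": "is_a", "obj": "EX:cell"},
    {"sub": "EX:cell", "pred": "is_a", "obj": "EX:entity"},
    {"sub": "EX:motor_neuron", "pred": "part_of", "obj": "EX:nervous_system"},
]

result = bounded_ancestors(EDGES, ["EX:motor_neuron"], ["EX:cell"])
```

In the toy graph, `EX:entity` is an ancestor of `EX:motor_neuron` but sits above the bound `EX:cell`, so it is dropped. UBD is built once and reused for every term, so the per-term cost is a single upward traversal plus a set intersection.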

Potential concerns: Scaling
Possible alternative - just use an ubergraph for queries?
