Feature/extract more info from rdf #71

Open
Larzans wants to merge 4 commits into garethbjohnson:master from Larzans:feature/extract_more_info_from_rdf

Larzans commented Mar 14, 2026

This pull request modifies the updatecatalog command to extract the following fields from a book's RDF file and add them to the Book model:

  • published year
  • Wikipedia URL
  • reading score (string) + value (float)
  • related books

It also extracts the gutenberg_id for a person and adds it to the Person model.

Lars E added 4 commits March 9, 2026 14:42
Extracts dcterms:issued (publication year) and Wikipedia URL from
dcterms:description in each Gutenberg RDF file and stores both fields
directly on the Book model.
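
To illustrate the extraction described in this commit, here is a minimal sketch using only the Python standard library. It is not the code from the PR: the function name `extract_book_fields`, the result keys, the URL regex, and the sample RDF are all assumptions made for illustration, and real Gutenberg RDF files contain many more elements.

```python
import re
import xml.etree.ElementTree as ET

# Namespace prefixes used in Project Gutenberg RDF files.
NS = {
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Hypothetical pattern: any Wikipedia article URL embedded in free text.
WIKIPEDIA_URL = re.compile(r"https?://[a-z\-]+\.wikipedia\.org/wiki/[^\s<>\"]+")

# Hand-written sample for illustration only, not a real catalog entry.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1342">
    <dcterms:issued>1998-06-01</dcterms:issued>
    <dcterms:description>Wikipedia page about this book: https://en.wikipedia.org/wiki/Pride_and_Prejudice</dcterms:description>
  </pgterms:ebook>
</rdf:RDF>"""


def extract_book_fields(rdf_text):
    """Return the year from dcterms:issued and the first Wikipedia URL
    found in any dcterms:description element (None when absent)."""
    result = {"published_year": None, "wikipedia_url": None}
    root = ET.fromstring(rdf_text)
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        return result

    # dcterms:issued is assumed to start with a four-digit year.
    issued = ebook.findtext("dcterms:issued", default="", namespaces=NS).strip()
    year_match = re.match(r"(\d{4})", issued)
    if year_match:
        result["published_year"] = int(year_match.group(1))

    # A book may carry several descriptions; take the first URL that matches.
    for description in ebook.findall("dcterms:description", NS):
        url_match = WIKIPEDIA_URL.search(description.text or "")
        if url_match:
            result["wikipedia_url"] = url_match.group(0)
            break
    return result


fields = extract_book_fields(SAMPLE_RDF)
print(fields["published_year"], fields["wikipedia_url"])
```

Returning None for a missing field keeps the sketch compatible with nullable model columns, which seems the safest default when a catalog entry lacks either element.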
Extracts the numeric agent ID from the rdf:about attribute on pg:agent elements in the catalog RDF and stores it as Person.gutenberg_id (unique, nullable). updatecatalog now resolves persons by this stable ID rather than name+birth+death, preventing duplicate Person rows for authors with inconsistent name spellings across catalog entries.
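
The lookup-by-ID idea in this commit can be sketched as follows. This is a simplified, framework-free illustration rather than the PR's implementation: the `Person` dataclass, the in-memory `store` dict, the helper names, and the `agents/<number>` shape of the rdf:about value are assumptions. In the actual Django code the dict lookup would presumably be a query on Person.gutenberg_id.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, Optional

RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
PG_AGENT = "{http://www.gutenberg.org/2009/pgterms/}agent"
PG_NAME = "{http://www.gutenberg.org/2009/pgterms/}name"

# Hand-written sample: the same agent appears twice with different spellings.
SAMPLE_AGENTS = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:agent rdf:about="2009/agents/68">
    <pgterms:name>Austen, Jane</pgterms:name>
  </pgterms:agent>
  <pgterms:agent rdf:about="2009/agents/68">
    <pgterms:name>Jane Austen</pgterms:name>
  </pgterms:agent>
</rdf:RDF>"""


@dataclass
class Person:
    """Stand-in for the Person model (hypothetical, not the Django class)."""
    gutenberg_id: int
    name: str


def agent_id_from_about(about: Optional[str]) -> Optional[int]:
    """Pull the trailing numeric ID out of a value like '2009/agents/68'."""
    match = re.search(r"agents/(\d+)$", about or "")
    return int(match.group(1)) if match else None


def resolve_person(store: Dict[int, Person], gutenberg_id: int, name: str) -> Person:
    """Find a person by stable ID, creating the record only if it is missing."""
    person = store.get(gutenberg_id)
    if person is None:
        person = Person(gutenberg_id=gutenberg_id, name=name)
        store[gutenberg_id] = person
    else:
        person.name = name  # refresh the spelling, keep a single row
    return person


store: Dict[int, Person] = {}
for agent in ET.fromstring(SAMPLE_AGENTS).iter(PG_AGENT):
    agent_id = agent_id_from_about(agent.get(RDF_ABOUT))
    if agent_id is not None:
        resolve_person(store, agent_id, agent.findtext(PG_NAME, default=""))

print(len(store), store[68].name)
```

Keying on the agent ID means the two spellings above collapse into one record, whereas a name+birth+death lookup would treat them as different people.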