Feature/extract more info from rdf by Larzans · Pull Request #71 · garethbjohnson/gutendex

Larzans · 2026-03-14T21:26:36Z

This pull request modifies the updatecatalog command to extract the following fields from a book's RDF file and add them to the Book model:

published year
Wikipedia URL
reading score (string) + value (float)
related books

It also extracts the gutenberg_id for a person and adds it to the Person model.

Extracts dcterms:issued (publication year) and Wikipedia URL from dcterms:description in each Gutenberg RDF file and stores both fields directly on the Book model.
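
A minimal sketch of how this extraction could look, using only the standard library. The sample RDF, the function name `extract_year_and_wikipedia`, and the URL regex are illustrative assumptions, not code from this pull request; the actual layout of `dcterms:description` in Gutenberg RDF files may differ.

```python
import re
import xml.etree.ElementTree as ET

# Namespace prefixes used for lookups (standard RDF / Dublin Core / PG terms URIs).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Made-up sample record (assumption): real files carry many more elements.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/99999">
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1998-06-01</dcterms:issued>
    <dcterms:description>See also https://en.wikipedia.org/wiki/Example_Book</dcterms:description>
  </pgterms:ebook>
</rdf:RDF>"""

# Hypothetical pattern: any language subdomain of wikipedia.org.
WIKIPEDIA_URL = re.compile(r"https?://[a-z\-]+\.wikipedia\.org/wiki/\S+")


def extract_year_and_wikipedia(rdf_text):
    """Return (year, wikipedia_url); either value is None when not found."""
    root = ET.fromstring(rdf_text)
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        return None, None

    # dcterms:issued holds an ISO date; keep only the leading four-digit year.
    issued = ebook.findtext("dcterms:issued", default="", namespaces=NS).strip()
    year_match = re.match(r"\d{4}", issued)
    year = int(year_match.group(0)) if year_match else None

    # Take the first Wikipedia link found in any dcterms:description element.
    wikipedia_url = None
    for description in ebook.findall("dcterms:description", NS):
        url_match = WIKIPEDIA_URL.search(description.text or "")
        if url_match:
            wikipedia_url = url_match.group(0)
            break

    return year, wikipedia_url


print(extract_year_and_wikipedia(SAMPLE_RDF))
```

Both values fall back to `None` when the element is missing or malformed, which matches storing them in nullable fields on the Book model; whether the fields in this pull request are nullable is an assumption.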

Extracts the numeric agent ID from the rdf:about attribute on pg:agent elements in the catalog RDF and stores it as Person.gutenberg_id (unique, nullable). updatecatalog now resolves persons by this stable ID rather than name+birth+death, preventing duplicate Person rows for authors with inconsistent name spellings across catalog entries.
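
The idea can be sketched as follows, without Django, using a plain dictionary in place of the Person table. The sample RDF, the `pgterms` prefix, the `2009/agents/<id>` shape of `rdf:about`, and the helper names are assumptions for illustration; the pull request's actual code may differ.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PG_NS = "http://www.gutenberg.org/2009/pgterms/"

# Made-up sample (assumption): the same agent appears twice with
# two different spellings of the name.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:pgterms="{PG_NS}">
  <pgterms:agent rdf:about="2009/agents/12345">
    <pgterms:name>Doe, Jane</pgterms:name>
  </pgterms:agent>
  <pgterms:agent rdf:about="2009/agents/12345">
    <pgterms:name>Doe, Jane A.</pgterms:name>
  </pgterms:agent>
</rdf:RDF>"""


def agent_id(agent):
    """Return the trailing numeric ID from rdf:about, or None if absent."""
    about = agent.get(f"{{{RDF_NS}}}about", "")
    match = re.search(r"(\d+)$", about)
    return int(match.group(1)) if match else None


def resolve_persons(rdf_text):
    """Map gutenberg_id -> name, keeping one entry per ID."""
    persons = {}
    root = ET.fromstring(rdf_text)
    for agent in root.iter(f"{{{PG_NS}}}agent"):
        gutenberg_id = agent_id(agent)
        if gutenberg_id is None:
            continue  # no stable ID available; skipped in this sketch
        name = agent.findtext(f"{{{PG_NS}}}name", default="")
        # The first spelling seen wins; later variants do not create a new entry.
        persons.setdefault(gutenberg_id, name)
    return persons


print(resolve_persons(SAMPLE_RDF))
```

Keying on the ID means the two spellings collapse into one entry, whereas matching on name+birth+death would have produced two. In a Django command the dictionary lookup would presumably be a query on `Person.gutenberg_id`, which is not shown here.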

Lars E added 4 commits March 9, 2026 14:42

Add published_year and wikipedia_url to Book

82a2221

books: add reading_score and reading_score_value fields from pgterms:…

32a09b8

…marc908

Adding related books parsed from description

9c00de6

Larzans wants to merge 4 commits into garethbjohnson:master from Larzans:feature/extract_more_info_from_rdf
