lastgenre: Genre spelling normalization (aliases) #6466
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.
Choosing the Right Tool
-----------------------

With multiple ways to filter and map genres, here is a quick guide on when to
use what:

- **Aliases**: Use these first to fix spelling variants and abbreviations (e.g.,
  ``dnb`` → ``drum and bass``).
- **Ignorelist**: Use this for error correction when Last.fm results are not
  accurate, or for precise per-artist or global exclusions (e.g., rejecting
  ``Metal`` for specific electronic artists).
- **Canonicalization**: Use this to automatically map specific sub-genres to
  broader categories (e.g., ``Grindcore`` → ``Metal``).
- **Whitelist**: Use this to finally limit your library to a predefined set of
  genres. When combined with canonicalization, the plugin will try to map a
  sub-genre to its closest whitelisted parent. Anything else is dropped.

Reviewers: This new chapter is a good starting point for reviewing. It might help with thinking through where and when things should happen in the plugin, and why multiple filtering features make sense for it!
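
To make the ordering in that chapter concrete, here is a minimal sketch of how the four tools could chain together. It is not the plugin's code: `resolve_sketch`, the plain-dict alias table, the `parents` mapping, and all sample data are hypothetical, and the real aliases are regex-based rather than exact lookups.

```python
def resolve_sketch(tags, aliases, ignored, parents, whitelist):
    """Illustrative order only: aliases, ignorelist, canonicalization, whitelist."""
    resolved = []
    for tag in tags:
        # 1. Aliases: unify spelling variants first (exact-match dict here,
        #    an assumption made for brevity).
        genre = aliases.get(tag.lower(), tag.lower())

        # 2. Ignorelist: drop tags that are known to be wrong or unwanted.
        if genre in ignored:
            continue

        # 3 + 4. Canonicalization with whitelist: walk up the parent chain
        #        until a whitelisted genre is found, otherwise drop the tag.
        candidate = genre
        while candidate is not None and candidate not in whitelist:
            candidate = parents.get(candidate)

        if candidate is not None and candidate not in resolved:
            resolved.append(candidate)
    return resolved


# Hypothetical sample data, not taken from the bundled files.
SAMPLE_ALIASES = {"dnb": "drum and bass"}
SAMPLE_IGNORED = {"seen live"}
SAMPLE_PARENTS = {"grindcore": "metal", "drum and bass": "electronic"}
SAMPLE_WHITELIST = {"metal", "electronic"}

result = resolve_sketch(
    ["dnb", "seen live", "Grindcore", "Metal", "polka"],
    SAMPLE_ALIASES,
    SAMPLE_IGNORED,
    SAMPLE_PARENTS,
    SAMPLE_WHITELIST,
)
print(result)  # ['electronic', 'metal']
```

Because aliases run first, the later steps only ever see one spelling per genre, so the ignorelist and whitelist entries do not have to enumerate variants.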
Pull request overview
This PR adds genre normalization (aliases) to the lastgenre plugin, so that many spelling and format variants map to one canonical genre before the ignorelist, whitelist, and canonicalization (c14n) steps. It also adds a bundled alias table, user documentation, and tests that lock in the behavior.
Changes:
- Add `normalize_genre()` and a config loader for `lastgenre.aliases` (on by default), and ship `aliases.yaml`.
- Apply aliases in the Last.fm client lookup path and in the `_resolve_genres()` pipeline.
- Add docs and many tests for alias behavior and the default bundled aliases.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/plugins/test_lastgenre.py | Add unit + integration tests for alias normalization and config parsing. |
| docs/plugins/lastgenre.rst | Document aliases feature + guidance on when to use aliases/ignorelist/c14n/whitelist. |
| beetsplug/lastgenre/utils.py | Add normalize_genre() and alias type defs; keep ignorelist helper. |
| beetsplug/lastgenre/client.py | Normalize tags via aliases before ignorelist filtering in _last_lookup(). |
| beetsplug/lastgenre/aliases.yaml | New bundled default alias patterns and templates. |
| `beetsplug/lastgenre/__init__.py` | Load aliases from config/bundled file; normalize early in _resolve_genres(). |
Interestingly, I can see some overlap with the following configuration:

    - album_fields: genre genres style
      item_fields: genre genres style
      replace:
        " ?90 techno,?": ""
        "-techno": " techno"
        "hardt": "hard t"
        "Techno .*": "techno"
        "A": "a"
        "B": "b"
        "C": "c"
        "D": "d"
        "E": "e"
        "F": "f"
        "G": "g"
        "H": "h"
        "I": "i"
        "J": "j"
        "L": "l"
        "M": "m"
        "N": "n"
        "O": "o"
        "P": "p"
        "R": "r"
        "S": "s"
        "T": "t"
        "U": "u"
        "W": "w"
        "V": "v"
        "Drum n [Bb]ass": "drum & bass"
        "iDm": "idm"
        "([0-9])0's": "\\0s"
        experiemental: experimental
        deeptechno: "deep techno"
        "([a-z])techno": "\\1 techno"
        psy-trance: psytrance
        drumandbass: drum and bass
        drum[' ]n[' ]bass: drum and bass
        ukbass: uk bass

One tool per purpose is usually good and what we should aim for, but the specialty of lastgenre aliases is that they streamline genre spelling right after the tags are fetched from Last.fm, and only then are the tags thrown into the lastgenre resolve machine. Of course, existing genres can also be normalized before they are put in the mix. So yeah, overlap is not nice, but in this case I can't imagine that importreplace is supposed to do all that :-)

A while ago I built a quick and dirty Last.fm CLI client to better understand and analyze the content of the service. These are the top 1000 tags used. We already find quite a few duplicates in there, like Hip-Hop and Hip Hop, which require normalizing early. This is a main use case for this plugin.

Dropping these into an AI reveals quite a few inconsistencies: https://www.perplexity.ai/search/these-are-the-top-1000-used-ta-Vg3KSMEyS26IzawtxSR49g I am aware that in the past some users tried to solve naming inconsistencies with the canonicalization tree, but that is not its purpose. It is also a tedious process to find all spelling inconsistencies word by word, and adding entries to the canonicalization tree for each one doesn't make things easier. Normalizing different spellings is a main use case that this new lastgenre feature can now solve quite easily.

Hm, I still think this should be done somewhat centrally, since at least 4 plugins can provide genres.
The word "hallucinate" is reserved for AIs only these days ☝️😂😂 But yeah, we could keep in mind to make that part more central. Good idea. For now, what do you think about moving on with this feature for lastgenre/Last.fm only? In my opinion it's absolutely essential to finally get the best out of the service.
- Add comment on alias and log in client
- Refactor aliases load using confuse MappingValues
- Get rid of drop/norm helper again
which also slightly improves readability after removing the redundant drop_ignore_genres helper.
Description
Requires #6449 to be merged first!
This PR introduces a regex-based normalization (alias) system to unify variant genre tags and improves the plugin's documentation.
The normalization feature uses an ordered list of regular expression aliases to map variant spellings or synonyms to a single canonical name. The mapping keys act as `re.Match.expand()` templates, supporting `\g<N>` back-references to regex capture groups.

The plugin applies `normalize_genre` before `is_ignored` in both the `LastFmClient` (for clean fetching/caching) and the core `_resolve_genres` loop (for uniform processing of existing file tags).
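
As a rough illustration of the template mechanism described above, the sketch below uses an ordered alias list and expands the first matching pattern with `re.Match.expand()`. It is not the code from this PR: `normalize_genre_sketch`, `compile_aliases`, the sample patterns, full-string matching, and case-insensitivity are all assumptions made for the example, and the bundled `aliases.yaml` may look different.

```python
import re

# Hypothetical alias table: (template, [patterns]). Order matters because
# the first matching pattern wins.
SAMPLE_ALIASES = [
    ("drum and bass", [r"d(?:rum)?\s*(?:n|&|and)\s*b(?:ass)?"]),
    ("hip hop", [r"hip[- ]?hop"]),
    # \g<1> inserts the first capture group. The \g<N> form keeps the
    # following literal "0" from being read as part of the group number.
    (r"\g<1>0s", [r"(\d)0'?s"]),
]


def compile_aliases(aliases):
    """Flatten the table into an ordered list of (template, compiled regex)."""
    return [
        (template, re.compile(pattern, re.IGNORECASE))
        for template, patterns in aliases
        for pattern in patterns
    ]


def normalize_genre_sketch(genre, compiled):
    """Return the expanded template of the first alias that matches the
    whole tag, or the tag unchanged if nothing matches (assumed behavior)."""
    for template, pattern in compiled:
        match = pattern.fullmatch(genre.strip())
        if match:
            return match.expand(template)
    return genre


compiled = compile_aliases(SAMPLE_ALIASES)
for tag in ["dnb", "Drum & Bass", "Hip-Hop", "90's", "ambient"]:
    print(tag, "->", normalize_genre_sketch(tag, compiled))
```

Keeping the canonical name as the template means one entry can cover a whole family of spellings, while back-references carry over the variable part (the decade digit in the last alias).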