For example
15 § Rakennusjärjestyksen hyväksyminen
is tokenized to
15 rakennusjärjestys hyväksyä
which is all good.
However, it is impossible to search the index using the search term
§
presumably because it is removed during lowercasing and punctuation filtering.
Is it possible to configure how punctuation is filtered?
I guess we could add the character § here to include it in the tokenization, but that would cause problems in cases where § is typed without a following space. For example,
15 §Rakennusjärjestyksen
could then not be understood by the Voikko stemming rules.
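To make the idea concrete, here is a rough Python sketch of the behaviour I have in mind. It is not the plugin's code: the function name `tokenize_keeping_section_sign` is hypothetical, and it does no Voikko stemming. It only shows how § could be emitted as a token of its own, even when it is glued to the next word, so that the word itself still reaches the stemmer intact.

```python
import re

# Hypothetical illustration only, not part of the plugin.
# A token is either the section sign on its own or a run of word characters.
# In Python 3, \w is Unicode-aware for str patterns, so "ä" and "ö" match.
TOKEN_RE = re.compile(r"§|\w+")


def tokenize_keeping_section_sign(text):
    """Lowercase the text and split it into tokens, keeping "§" separate."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


if __name__ == "__main__":
    # Correctly spaced input and the mistyped variant.
    print(tokenize_keeping_section_sign("15 § Rakennusjärjestyksen hyväksyminen"))
    print(tokenize_keeping_section_sign("15 §Rakennusjärjestyksen"))
```

With this split, the mistyped input yields the same leading tokens as the correctly spaced one, so the word after § could still be passed to the stemmer unchanged. Whether the plugin's tokenizer can be adjusted in a comparable way is exactly what I am unsure about.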
Have you encountered similar requests from your clients?
Any advice on how this could be achieved in such a way that the change could be incorporated in this plugin?