Vocabulary needs to also be shortened in #removeNgramsWithCountsLessThan:

This method deletes ngrams and reduces history counts. I think the vocabulary needs to be cleaned up too, for instance when a word's history count drops to zero.

The main idea of this method is to get rid of tokens, and sequences of tokens, that we find irrelevant, in order to speed up reading from a file or lookups within the model. Always keeping every vocabulary entry defeats that purpose.
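To make the proposal concrete, here is a rough sketch in Python rather than in the project's own language. Everything in it is an assumption made for illustration: the function name `prune_ngrams`, the dictionary-based storage, and the rule "a word stays in the vocabulary only if some surviving ngram still mentions it". It is not the library's actual implementation.

```python
from collections import Counter


def prune_ngrams(ngram_counts, history_counts, vocabulary, min_count):
    """Drop ngrams seen fewer than min_count times, then shrink the vocabulary.

    Hypothetical data layout: ngram_counts maps a tuple of words to its count,
    history_counts maps the tuple of preceding words to its total count.
    """
    for ngram, count in list(ngram_counts.items()):
        if count < min_count:
            del ngram_counts[ngram]
            history = ngram[:-1]
            history_counts[history] -= count
            if history_counts[history] <= 0:
                del history_counts[history]

    # Words still referenced by at least one surviving ngram.
    still_used = {word for ngram in ngram_counts for word in ngram}
    # The step this issue asks for: forget words no ngram mentions any more.
    vocabulary.intersection_update(still_used)


ngram_counts = Counter({("the", "cat"): 3, ("the", "dog"): 1, ("a", "cat"): 2})
history_counts = Counter({("the",): 4, ("a",): 2})
vocabulary = {"the", "cat", "dog", "a"}

prune_ngrams(ngram_counts, history_counts, vocabulary, min_count=2)
```

In this toy example, `("the", "dog")` falls below the threshold, so "dog" no longer appears in any ngram and is removed from the vocabulary along with it.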

Vocabulary needs to also be shortened in #removeNgramsWithCountsLessThan: #21

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Vocabulary needs to also be shortened in #removeNgramsWithCountsLessThan: #21

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions