Need to 'regularize' the categories (e.g. job title, industry, etc.) in order to visualize them. Right now it's extremely simple: difflib.get_close_matches(), which ranks candidates by SequenceMatcher's similarity ratio (a "gestalt pattern matching" measure) rather than a true edit distance.
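For reference, the current difflib approach amounts to something like the sketch below. It assumes a hand-picked list of canonical labels. `regularize` and `canonical` are hypothetical names, and the 0.6 cutoff is simply difflib's default. Raw labels with no close enough match are passed through unchanged.

```python
import difflib


def regularize(raw_labels, canonical, cutoff=0.6):
    """Map each raw label to its closest canonical label (hypothetical helper).

    Labels with no match scoring >= cutoff are kept as-is.
    """
    # Lowercase lookup so matching ignores case but we can return the nice spelling
    lookup = {c.lower(): c for c in canonical}
    keys = list(lookup)
    result = []
    for label in raw_labels:
        key = label.strip().lower()
        # n=1 returns at most the single best candidate whose ratio is >= cutoff
        matches = difflib.get_close_matches(key, keys, n=1, cutoff=cutoff)
        result.append(lookup[matches[0]] if matches else label)
    return result


canonical = ["Software Engineer", "Data Scientist", "Product Manager"]
raw = ["software engineer", "Sofware Enginer", "data scientst", "Chef"]
print(regularize(raw, canonical))
```

The weakness is that this only catches spelling variants. "Developer" and "Software Engineer" share almost no characters, so they will never be merged.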
A better way would probably be word2vec or some kind of NER. We probably can't do much dynamically (e.g. retraining word2vec on new inputs) because all the inputs are going to come in at once, but we could retrain a model right before plotting.
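Because all inputs arrive at once, one option is to regularize the whole batch in a single pass just before plotting. The sketch below makes several assumptions:

- The grouping is greedy.
- `cluster_labels` and `trigram_similarity` are hypothetical helpers.
- The 0.5 threshold is arbitrary.
- The character-trigram cosine similarity is only a stand-in for word2vec. It is not word2vec.

A similarity function backed by word2vec vectors trained on the batch could be passed in as `similarity` without changing the grouping logic.

```python
import math
from collections import Counter


def trigram_similarity(a, b):
    """Cosine similarity of character-trigram counts (stand-in for embeddings)."""
    def grams(text):
        padded = f" {text.strip().lower()} "
        return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

    ga, gb = grams(a), grams(b)
    # Missing keys in a Counter read as 0, so only shared trigrams contribute
    dot = sum(count * gb[g] for g, count in ga.items())
    norm = math.sqrt(sum(v * v for v in ga.values())) * math.sqrt(sum(v * v for v in gb.values()))
    return dot / norm if norm else 0.0


def cluster_labels(labels, threshold=0.5, similarity=trigram_similarity):
    """Group the whole batch of labels at once; returns {label: representative}."""
    reps = []
    mapping = {}
    # Most frequent spellings are processed first so they become representatives
    for label, _ in Counter(labels).most_common():
        best = max(reps, key=lambda r: similarity(label, r), default=None)
        if best is not None and similarity(label, best) >= threshold:
            mapping[label] = best
        else:
            # Nothing similar enough yet: this label starts a new group
            reps.append(label)
            mapping[label] = label
    return mapping


labels = ["Software Engineer", "software engineer", "Software Engineer",
          "Sr. Software Engineer", "Data Scientist", "data scientist", "Nurse"]
mapping = cluster_labels(labels)
# Counts per regularized category, ready to plot
print(Counter(mapping[label] for label in labels))
```

Since the grouping is greedy, the result depends on processing order. Sorting by frequency first means the most common spelling of each category becomes its display label.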