An existential problem for any major website today is how to handle toxic and divisive content. Quora is a platform that empowers people to learn from each other. On Quora, people can ask questions and connect with others who contribute unique insights and quality answers. A key challenge is to weed out insincere questions: those founded upon false premises, or those intended to make a statement rather than to look for helpful answers. In this Kaggle competition, participants develop models that identify and flag insincere questions.
- scikit-learn
- Keras
- TensorFlow
- NumPy
- Quora_data_exploration.ipynb: performs basic data exploration
- Quora_GRU_no_pretrain_embeddings.ipynb: builds a Gated Recurrent Unit (GRU) neural network model without pretrained word embeddings for text classification
- Quora_GRU_with_pretrained_embeddings.ipynb: builds a Gated Recurrent Unit (GRU) neural network model with pretrained word embeddings (i.e., GloVe)
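Using pretrained embeddings usually means turning a vector file into a matrix whose rows line up with the tokenizer's word indices. The following is a minimal sketch of that step, not the notebook's actual code: it assumes the GloVe plain-text layout (a word followed by space-separated floats), and the helper names `load_glove_vectors` and `build_embedding_matrix`, the toy vectors, and the `word_index` mapping are all hypothetical.

```python
import numpy as np


def load_glove_vectors(lines):
    """Parse GloVe-style text lines ("word v1 v2 ... vd") into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with index i; row 0 is padding.

    Words missing from the pretrained vocabulary keep an all-zero row.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and vec.shape[0] == dim:
            matrix[idx] = vec
    return matrix


# Tiny in-memory stand-in for a GloVe file (values are made up).
toy_glove = [
    "why 0.1 0.2 0.3",
    "is 0.4 0.5 0.6",
]
# Hypothetical tokenizer output: word -> integer index, starting at 1.
word_index = {"why": 1, "is": 2, "quora": 3}

embedding_matrix = build_embedding_matrix(
    word_index, load_glove_vectors(toy_glove), dim=3
)
```

A matrix built this way can be used to initialize a Keras `Embedding` layer, for example through a constant initializer. Whether the layer is then frozen or fine-tuned is a separate design choice.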
- The evaluation metric for this competition is the F1 score
- The private score for GRU without pretrained word embeddings is 0.65286 (ranked 1242/4037, top 30.8%)
- With pretrained word embeddings, the private score improves to 0.67470 (ranked 1182/4037, top 29%)
- The results may be further improved by spell-checking the question text
- Another option is to use multiple pretrained word embeddings (including GoogleNews and wiki-news-300d-1M) and to combine the classifications of multiple recurrent neural network models
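
The F1 metric and the idea of combining several models, both mentioned above, can be illustrated together. The following is a minimal sketch, not the notebooks' code: it averages predicted probabilities from two hypothetical models and searches for the decision threshold that maximizes F1 on a validation set. The function names, the toy data, and the candidate thresholds are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_probabilities(model_probs):
    """Average per-question probabilities across several models."""
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def best_threshold(y_true, probs, candidates):
    """Return the (threshold, f1) pair with the highest F1; ties keep the first."""
    best = (None, -1.0)
    for threshold in candidates:
        preds = [1 if p >= threshold else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best[1]:
            best = (threshold, score)
    return best


# Toy validation labels and model outputs (made up for illustration only).
y_val = [0, 0, 1, 1, 0, 1]
probs_model_a = [0.10, 0.40, 0.55, 0.80, 0.20, 0.70]
probs_model_b = [0.20, 0.26, 0.45, 0.90, 0.10, 0.60]

avg_probs = average_probabilities([probs_model_a, probs_model_b])
candidates = [i / 100 for i in range(10, 91, 5)]
threshold, score = best_threshold(y_val, avg_probs, candidates)
```

Because F1 depends on hard 0/1 predictions, the threshold is typically tuned on held-out data rather than fixed at 0.5. Simple averaging is only one way to combine models; weighted averages or stacking are alternatives.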
