Practice with NLP and the relevant libraries, this time in Python.
- Regular expressions & word tokenization: basic NLP concepts, such as word tokenization and regular expressions, that help parse text. Also covers how to handle non-English text and the harder tokenization cases we might find.
- Topic identification: Identify topics in texts based on term frequencies. We experiment with and compare two simple methods, bag-of-words and tf-idf, using NLTK and a new library, Gensim.
- Named-entity recognition: Identify the who, what, and where of our texts using pre-trained models on English and non-English text. Also covers how to use polyglot and spaCy, to add to the NLP toolbox.
- Fake news classifier: Using these basics along with supervised ML, we build a "fake news" detector.
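
As a rough illustration of the first two topics, here is a minimal sketch of regex tokenization, bag-of-words counts, and one common tf-idf variant (term frequency times `log(N / df)`). It deliberately uses only the Python standard library rather than NLTK or Gensim, so it shows the idea and not those libraries' APIs. The toy corpus and the names `tokenize` and `tf_idf` are made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, invented purely for illustration.
docs = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
    "Dogs and cats aren't always enemies.",
]


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    # \w+ matches a run of word characters (Unicode-aware for str in Python 3);
    # the optional (?:'\w+) group keeps contractions such as "aren't" together.
    return re.findall(r"\w+(?:'\w+)?", text.lower())


tokenized = [tokenize(doc) for doc in docs]

# Bag-of-words: raw term counts for each document.
bags = [Counter(tokens) for tokens in tokenized]


def tf_idf(bags):
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    n_docs = len(bags)
    # Iterating a Counter yields each distinct term once, so this counts
    # the number of documents that contain each term.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in bag.items()}
        )
    return weights


weights = tf_idf(bags)

# Print the three highest-weighted terms per document.
for doc, w in zip(docs, weights):
    top = sorted(w, key=w.get, reverse=True)[:3]
    print(doc, "->", top)
```

In the first document, "the" has the highest raw count but a lower tf-idf weight than "sat", because "the" also appears in another document. That is the difference between the two methods: bag-of-words rewards frequent words, while tf-idf discounts words that are common across the corpus.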