NLP-Base

Practice with NLP and the relevant libraries, this time in Python.

Regular expressions & word tokenization: Basic NLP concepts, such as word tokenization and regular expressions, that help parse text. Also covers how to handle non-English text and the harder tokenization cases we might encounter.
Topic identification: Identify topics in texts based on term frequencies. We experiment with and compare two simple methods, bag-of-words and tf-idf, using NLTK and a new library, Gensim.
Named-entity recognition: Identify the who, what, and where of our texts using pre-trained models on English and non-English text. Also covers how to use polyglot and spaCy, adding both to the NLP toolbox.
Fake News Classifier: Combining these basics with supervised machine learning, we build a "fake news" detector.
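
The first two topics above can be illustrated with a minimal sketch. It uses only the Python standard library rather than NLTK or Gensim, so it is an illustration of the ideas and not the notebooks' code. The toy corpus `DOCS` and the helper names `tokenize`, `bag_of_words`, and `tf_idf` are hypothetical, and the tf-idf weighting shown is the plain `tf * log(N / df)` variant; library implementations apply their own weighting and normalization, so their values will differ.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; not taken from the repository's Data folder.
DOCS = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
    "The pets are dogs and cats.",
]


def tokenize(text):
    """Lowercase the text and return each run of word characters as a token."""
    return re.findall(r"\w+", text.lower())


def bag_of_words(tokens):
    """Count how often each token occurs in one document."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    This is an assumed, simplified formula chosen for illustration.
    """
    n_docs = len(docs_tokens)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


tokenized = [tokenize(doc) for doc in DOCS]
bows = [bag_of_words(tokens) for tokens in tokenized]
weights = tf_idf(tokenized)

# Bag-of-words ranks "the" highest; tf-idf pushes it to zero because it
# appears in every document and so carries no topical information.
print(bows[0].most_common(2))
print(sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True))
```

The contrast between the two printed rankings is the point of the comparison: raw counts favour frequent function words, while tf-idf favours terms that are distinctive to a document.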

.vscode
Data
Sentiment-Analysis-Adv
.DS_Store
.gitignore
Code.py
FakeNewsClassification.ipynb
Named-entity recognition.ipynb
Numeric Features.ipynb
README.md
Regular expressions & word tokenization.ipynb
Sentiment intro.ipynb
Topic Identification.ipynb
Tweets Sentiment.ipynb
chatbotcloud.py
spaCy-polyglot.ipynb
