This is the repo for the group learning about Natural Language Processing.

cindyluba/nlpgroup

 
 

nlpgroup

Introduction

This is the repo for the group learning about Natural Language Processing.

Description

This project aims to capture the major trends of each decade as reflected in New York Times article titles. We pull article titles from the NYT Archive API and run text analysis on them. The goal is to find the words that are important for a given decade relative to all other decades. The important concepts from each decade are then visualized as word clouds. The project is built in Python with the nltk and wordcloud libraries.
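One common way to score words that are "important for a given decade relative to all other decades" is TF-IDF, which the Presentation outline lists as a method. The sketch below is a minimal pure-Python version that assumes each decade is treated as a single document; the function names and the toy titles are hypothetical and are not taken from the project code.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", title.lower())


def tfidf_by_decade(titles_by_decade):
    """Return {decade: {word: tf-idf score}}, treating each decade as one document."""
    # Term counts per decade (all titles of a decade are pooled together).
    counts = {
        decade: Counter(word for title in titles for word in tokenize(title))
        for decade, titles in titles_by_decade.items()
    }
    n_decades = len(counts)
    # Document frequency: in how many decades does each word appear?
    doc_freq = Counter(word for c in counts.values() for word in c)

    scores = {}
    for decade, c in counts.items():
        total = sum(c.values())
        scores[decade] = {
            word: (n / total) * math.log(n_decades / doc_freq[word])
            for word, n in c.items()
        }
    return scores


# Hypothetical toy input, not real NYT data.
sample = {
    1940: ["War ends in Europe", "War bonds drive"],
    1960: ["Moon race begins", "Europe trade talks"],
}
scores = tfidf_by_decade(sample)
top_1940 = max(scores[1940], key=scores[1940].get)
```

A word that appears in every decade gets a score of zero, which is the behavior we want when looking for decade-specific vocabulary.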

Analysis

  • Recessions vs. expansion periods: which words are significant for each across all 150 years?
  • Recessions through the years: what makes each 50-year period of recessions unique?
  • Expansions through the years: what makes each 50-year period of expansions unique?
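The comparisons above need titles grouped by 50-year period and by economic phase. The following is a rough sketch of one way to do that grouping, assuming each record is a `(year, title)` pair and that recession years are supplied by the caller as a set. The helper names, the `first_year` parameter, and the sample values are all hypothetical placeholders.

```python
from collections import defaultdict


def period_start(year, first_year, span=50):
    """Return the first year of the span-year period that contains `year`."""
    return first_year + ((year - first_year) // span) * span


def group_titles(records, recession_years, first_year, span=50):
    """Group (year, title) pairs by (period start, 'recession' or 'expansion')."""
    groups = defaultdict(list)
    for year, title in records:
        phase = "recession" if year in recession_years else "expansion"
        groups[(period_start(year, first_year, span), phase)].append(title)
    return dict(groups)


# Placeholder values for illustration only; these are not real recession dates or titles.
records = [(1901, "title a"), (1902, "title b"), (1955, "title c")]
groups = group_titles(records, recession_years={1902}, first_year=1900)
```

Each resulting group can then be treated as one document when scoring words.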

Presentation

  • Introduction: research questions
  • Explain the data (NYT API)
  • Explain the methods (Python API, TF-IDF, wordcloud)
  • Show results 1
  • Show results 2
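For the word cloud step in the methods, a minimal sketch might look like the following. It assumes word scores are already available as a `{word: score}` dictionary; the helper names, image size, and sample scores are hypothetical. `WordCloud`, `generate_from_frequencies`, and `to_file` come from the wordcloud package.

```python
def top_weights(scores, n=50):
    """Keep the n highest-scoring words, dropping any with a non-positive score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {word: score for word, score in ranked[:n] if score > 0}


def render_cloud(weights, path):
    """Draw a word cloud from {word: weight} and save it as an image file."""
    # Imported here so the helper above works without the wordcloud package.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(weights)
    cloud.to_file(path)


# Hypothetical scores for illustration, not project results.
weights = top_weights({"war": 0.2, "bonds": 0.1, "europe": 0.0}, n=2)
# render_cloud(weights, "decade_1940.png")  # requires the wordcloud package
```

Filtering to the top words before rendering keeps the cloud readable and avoids passing zero weights to the renderer.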

Resources

Wheel Repository

Use this site to install libraries that ship compiled binaries as prebuilt wheel files. http://www.lfd.uci.edu/~gohlke/pythonlibs/
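After installing, a quick check like the one below can confirm which libraries are importable. This is only a convenience sketch; the helper name is hypothetical, and the package list is taken from the libraries this README mentions.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# nltk and wordcloud are named in the Description; gensim appears under Resources.
print(missing_packages(["nltk", "wordcloud", "gensim"]))
```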

nyt api

We use the Archive API from the NYT developer APIs. https://developer.nytimes.com/archive_api.json
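A minimal sketch of pulling titles for one month could look like this. The endpoint pattern and the `response` / `docs` / `headline` / `main` field names are assumptions based on the public Archive API documentation and should be checked against the current docs; the function names and the sample payload are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint pattern; verify against the NYT developer documentation.
ARCHIVE_URL = "https://api.nytimes.com/svc/archive/v1/{year}/{month}.json?api-key={key}"


def archive_url(year, month, api_key):
    """Build the request URL for one month of the archive."""
    return ARCHIVE_URL.format(year=year, month=month, key=api_key)


def fetch_month(year, month, api_key):
    """Download one month of archive metadata (needs network access and a valid key)."""
    with urllib.request.urlopen(archive_url(year, month, api_key)) as resp:
        return json.load(resp)


def extract_titles(payload):
    """Pull headline strings out of a parsed response, skipping entries without one."""
    docs = payload.get("response", {}).get("docs", [])
    titles = []
    for doc in docs:
        main = (doc.get("headline") or {}).get("main")
        if main:
            titles.append(main)
    return titles


# Hypothetical, hand-written payload shaped like an Archive API response.
sample = {"response": {"docs": [{"headline": {"main": "Example title"}}, {"headline": {}}]}}
titles = extract_titles(sample)
```

Keeping the download and the parsing in separate functions makes it possible to cache raw responses on disk and re-run the text analysis without calling the API again.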

nltk

We can start learning NLP by going through this tutorial. http://www.nltk.org/book/
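As a first exercise alongside the tutorial, here is a small sketch that tokenizes titles and counts words with nltk. It uses `RegexpTokenizer` and `FreqDist`, which do not need any extra NLTK data downloads. The function name and the sample titles are hypothetical.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# A regex tokenizer needs no extra NLTK data downloads.
tokenizer = RegexpTokenizer(r"[a-z]+")


def title_frequencies(titles):
    """Count lowercase word tokens across a list of titles."""
    tokens = [tok for title in titles for tok in tokenizer.tokenize(title.lower())]
    return FreqDist(tokens)


# Hypothetical titles for illustration, not real NYT data.
freq = title_frequencies(["War ends in Europe", "War bonds drive"])
print(freq.most_common(1))
```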

gensim

gensim is a Python library for topic modeling and word embeddings. This link covers its word2vec model. https://radimrehurek.com/gensim/models/word2vec.html
