Text Data Templates

This repo contains R scripts for cleaning and preparing text data for further analysis. I will also provide simple templates for some popular text analysis methods, such as Word2Vec and topic modeling (LDA and structural topic modeling, STM).

In general, my text-cleaning process is as follows:

  1. remove emojis
  2. remove URLs
  3. remove text in language(s) that will not be used in the final analysis
  4. remove spam
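
The four steps above can be sketched in base R as follows. This is a minimal illustration, not the scripts in this repo: `clean_text` and `min_ascii_share` are hypothetical names, the emoji pattern covers only the most common code point ranges, the language filter assumes the target language is written in ASCII letters (such as English), and "spam" is assumed to mean any message that appears more than once. A real project would likely swap in a proper language detector and a spam rule suited to the data.

```r
# Hypothetical helper: applies the four cleaning steps to a character vector.
clean_text <- function(x, min_ascii_share = 0.8) {
  # 1. Remove emojis (approximation: common emoji and pictograph ranges,
  #    plus the variation selector and zero-width joiner used in sequences).
  emoji_pattern <- "[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]"
  x <- gsub(emoji_pattern, "", x, perl = TRUE)

  # 2. Remove URLs that start with http://, https://, or www.
  x <- gsub("(https?://|www\\.)\\S+", "", x, perl = TRUE)

  # Collapse the whitespace left behind by the removals.
  x <- trimws(gsub("\\s+", " ", x, perl = TRUE))

  # 3. Language filter (ASSUMPTION: the target language uses ASCII letters).
  #    Keep documents whose share of ASCII letters among non-space
  #    characters is at least min_ascii_share; empty documents are dropped.
  n_ascii <- nchar(gsub("[^A-Za-z]", "", x, perl = TRUE))
  n_all <- nchar(gsub("\\s", "", x, perl = TRUE))
  share <- ifelse(n_all > 0, n_ascii / n_all, 0)
  x <- x[share >= min_ascii_share]

  # 4. Spam filter (ASSUMPTION: a message repeated verbatim, ignoring case,
  #    is spam). All copies of a repeated message are dropped.
  key <- tolower(x)
  repeated <- duplicated(key) | duplicated(key, fromLast = TRUE)
  x[!repeated]
}

# Toy input, made up for illustration.
docs <- c(
  "I love this product \U0001F600 see https://example.com/review",
  "\u8fd9\u4e2a\u4ea7\u54c1\u5f88\u597d",
  "Buy now www.spam.example",
  "buy now www.spam.example",
  "Works great, would recommend."
)
cleaned <- clean_text(docs)
print(cleaned)
```

The order matters: emojis and URLs are stripped first so that they do not distort the letter share in step 3 or hide duplicates in step 4.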

Description of text data

  1. top words
  2. bigrams
  3. trigrams
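
Top words, bigrams, and trigrams are all n-gram counts with n = 1, 2, and 3. The base-R sketch below shows one way to compute them. `tokenize` and `count_ngrams` are hypothetical helpers written for this example, and the tokenizer is deliberately simple (lowercase, split on anything that is not a letter, digit, or apostrophe), so it assumes English-like text and does no stop-word removal.

```r
# Hypothetical helper: lowercase a document and split it into word tokens.
tokenize <- function(doc) {
  tokens <- strsplit(tolower(doc), "[^a-z0-9']+")[[1]]
  tokens[nzchar(tokens)]
}

# Hypothetical helper: count n-grams across documents, most frequent first.
# N-grams are built within each document, never across document boundaries.
count_ngrams <- function(docs, n = 1) {
  grams <- lapply(docs, function(doc) {
    tokens <- tokenize(doc)
    if (length(tokens) < n) return(character(0))
    starts <- seq_len(length(tokens) - n + 1)
    vapply(
      starts,
      function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
      character(1)
    )
  })
  sort(table(as.character(unlist(grams))), decreasing = TRUE)
}

# Toy corpus, made up for illustration.
docs <- c(
  "the cat sat on the mat",
  "the cat ate the fish",
  "a dog sat on the mat"
)

top_words <- count_ngrams(docs, n = 1)  # "the" appears 5 times
bigrams   <- count_ngrams(docs, n = 2)
trigrams  <- count_ngrams(docs, n = 3)

print(head(top_words, 5))
print(head(bigrams, 5))
print(head(trigrams, 5))
```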

Topic modeling

  1. LDA
  2. STM

Word2Vec