Term project for ENGL 681 Introduction to Natural Language Processing at RIT, 2017.
Public web forums host large-scale online debate, especially on the community platform Reddit. Persuasive argumentation can be found on the sub-community /r/ChangeMyView, which encourages users to share their views and engage in discourse in a moderated public forum. To determine whether comments that succeeded in changing a user's view share any similarities, we organized and labeled sets of argument examples and, through language modeling, found features useful for classifying novel arguments. Our model achieved 6% higher accuracy than the baseline, indicating that effective arguments have identifiable stylistic and topical features.
To download the code:
$ git clone https://github.com/glebpro/nlptermproject2017
To download the corpus, use the download link; the data format is explained here.
To generate your own corpus (posts are gathered backwards in time, starting from now):
- Populate corpus_utils/reddit.auth.json with your Reddit credentials.
- Run:
$ python corpus_utils/download.py num_posts_to_collect
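The layout of corpus_utils/reddit.auth.json is not documented in this README. A minimal sketch, assuming the file holds the keyword arguments that praw.Reddit accepts for a "script" app (client_id, client_secret, user_agent, username, password); these key names and the helper load_reddit_auth are hypothetical, not taken from the project:

```python
import json

# Assumed key names: they mirror praw.Reddit keyword arguments for a
# "script" application, but the project's actual file may differ.
REQUIRED_KEYS = ("client_id", "client_secret", "user_agent", "username", "password")


def load_reddit_auth(path):
    """Read Reddit credentials from a JSON file and fail early if any are missing."""
    with open(path, encoding="utf-8") as f:
        auth = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not auth.get(key)]
    if missing:
        raise ValueError("missing credentials: " + ", ".join(missing))
    return auth
```

With praw installed, the returned dictionary could then be unpacked as praw.Reddit(**auth). Checking the keys up front gives a clearer error than a failed login partway through a long download.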
To generate comment pairs, use:
$ python corpus_utils/comment_pairs.py CMV_##.jsonlist
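A .jsonlist file is assumed here to hold one JSON object per line. The sketch below reads such a file and forms illustrative pairs of a view-changing comment with a comment that did not change the view, both from the same post. The record layout (a post with a "comments" list whose items carry a "body" string and a boolean "delta" flag) and the helpers read_jsonlist and make_pairs are hypothetical; they are not the schema or pairing rule used by comment_pairs.py:

```python
import json
from itertools import product


def read_jsonlist(path):
    """Yield one parsed JSON object per non-empty line of a .jsonlist file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def make_pairs(post):
    """Pair every delta-awarded comment with every non-delta comment in a post.

    Assumes a hypothetical layout: post["comments"] is a list of dicts with a
    "body" string and an optional boolean "delta" flag.
    """
    comments = post.get("comments", [])
    persuasive = [c["body"] for c in comments if c.get("delta")]
    other = [c["body"] for c in comments if not c.get("delta")]
    # Cross product: (persuasive, non-persuasive) tuples from the same post.
    return list(product(persuasive, other))
```

Pairing within a single post holds the topic roughly constant, so differences between the two sides of a pair are more likely to reflect how the argument is made. A full cross product can grow quickly in busy threads; a real pipeline might instead match each persuasive comment with one similar non-persuasive comment.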
For the classifier:
- To extract features:
$ python model_files/get_features.py comment_pairs.jsonlist
- To train a new model:
$ python model_files/model.py
- To explore results (prints the results of the included model):
$ python model_files/explore.py
Feature extraction and model training may take hours.
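The features computed by get_features.py are not listed in this README. As an illustration of the kind of stylistic features the abstract refers to, the sketch below turns a comment into a dictionary of simple counts and ratios. The feature set, the small hedge lexicon, and the helper stylistic_features are assumptions, not the project's actual features:

```python
import re

# Small illustrative hedge lexicon; an assumption, not taken from the project.
HEDGES = {"maybe", "perhaps", "might", "possibly", "probably", "seems"}
TOKEN_RE = re.compile(r"[a-z']+")


def stylistic_features(text):
    """Map a comment to a dict of simple, length-normalized stylistic features."""
    tokens = TOKEN_RE.findall(text.lower())
    n = len(tokens)
    return {
        "num_tokens": n,
        # Type-token ratio: distinct words / total words (0 for empty text).
        "type_token_ratio": len(set(tokens)) / n if n else 0.0,
        # Share of tokens that are hedging words.
        "hedge_rate": sum(t in HEDGES for t in tokens) / n if n else 0.0,
        "num_questions": text.count("?"),
        "num_links": len(re.findall(r"https?://", text)),
        # Reddit markdown quotes start with ">" at the beginning of a line.
        "num_quotes": sum(1 for line in text.splitlines()
                          if line.lstrip().startswith(">")),
    }
```

Dictionaries like these can be vectorized with scikit-learn's DictVectorizer and passed to a classifier such as LogisticRegression; comparing its accuracy against DummyClassifier(strategy="most_frequent") is one possible baseline.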
Additional utility scripts for parsing posts and comments are also included.
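The included parsing utilities are not described further. A minimal sketch of one kind of comment cleanup, assuming comment bodies arrive as Reddit markdown that may contain HTML entities such as &gt;; the helper clean_comment and its placeholder token are hypothetical:

```python
import html
import re

URL_RE = re.compile(r"https?://\S+")


def clean_comment(body, drop_quotes=True):
    """Normalize a Reddit comment body before feature extraction.

    Unescapes HTML entities, optionally drops quoted lines (markdown quotes
    start with ">"), replaces URLs with a placeholder, and collapses whitespace.
    """
    body = html.unescape(body)
    lines = body.splitlines()
    if drop_quotes:
        lines = [line for line in lines if not line.lstrip().startswith(">")]
    text = " ".join(lines)
    text = URL_RE.sub("<URL>", text)
    return " ".join(text.split())
```

Dropping quoted lines matters on /r/ChangeMyView because replies often quote the original poster, and that quoted text would otherwise be attributed to the replying commenter.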
Requirements: python >= 3.4, praw >= 5.2, spacy >= 2.0.3, sklearn (scikit-learn) >= 0.18.1, numpy >= 1.13.3, nltk >= 3.2.5
MIT licensed. See the bundled LICENSE file for more details.