
NordineQuadar/CS501-Project


Project Scope:

The main goal of this project is to perform an in-depth analysis of product reviews using data analytics methods and machine learning algorithms, with Amazon product reviews as a case study. The planned outcomes of this project are:

  1. Perform data cleaning.
  2. Apply data exploration methods to gain initial insights into the data.
  3. Train a classifier model to perform sentiment analysis.
  4. Use a pre-trained deep learning model (BERT) to perform sentiment analysis.
  5. Compare the two approaches.
  6. Extract useful insights from the sentiment analysis to help understand the quality, issues, and customer satisfaction of a specific product.
  7. If time permits, use a question-answering (QA) method to help users quickly explore and search the information in the reviews of a specific product.
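Step 3 can be illustrated with a minimal, dependency-free sketch. It uses a multinomial Naive Bayes classifier as a stand-in; the project may well use a different model. The class name `NaiveBayesSentiment`, the simple tokenizer, and the toy training sentences are all hypothetical and are not taken from the Amazon dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Hypothetical example: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of training reviews per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # total token count per class, used in the smoothing denominator
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log prior plus the smoothed log-likelihood of each known token
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy, hand-written training data (hypothetical, not from the Amazon dataset).
train_texts = [
    "works great and quiet, very happy",
    "excellent fit, great value",
    "stopped working after a week, terrible",
    "broken on arrival, waste of money",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("great value, works well"))  # positive
```

On the real Appliances data, the training labels would come from the star ratings rather than being written by hand, and a library implementation (for example from scikit-learn) would normally replace this hand-rolled version. The BERT model in step 4 would then be evaluated on the same held-out reviews so that the two approaches in step 5 can be compared fairly.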

Dataset:

The dataset is a large crawl of Amazon product reviews containing 82.83 million unique reviews from around 20 million users. To keep processing time manageable, this project uses only the Appliances subset. An example review is shown below.

Note: The dataset file is too large to upload to GitHub. Please use the link below to download the same Appliances dataset used in this project.

Example:
{
"reviewerID": "A2SUAM1J3GNN3B",
"asin": "0000013714",
"reviewerName": "J. McDonald",
"helpful": [2, 3],
"reviewText": "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!",
"overall": 5.0,
"summary": "Heavenly Highway Hymns",
"unixReviewTime": 1252800000,
"reviewTime": "09 13, 2009"
}
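A record like the one above can be turned into a cleaned, labeled example for step 1. The sketch below assumes each line of the downloaded file holds one JSON review in this format. The function names `label_from_rating`, `parse_review`, and `load_reviews` are hypothetical; the labeling scheme (4 to 5 stars positive, 1 to 2 stars negative, 3 stars dropped) and the use of (`reviewerID`, `asin`) as a de-duplication key are assumptions, not choices stated by this project.

```python
import json
from datetime import datetime, timezone


def label_from_rating(overall):
    """Assumed labeling scheme: 4-5 stars positive, 1-2 negative, 3 neutral (returns None)."""
    if overall >= 4:
        return "positive"
    if overall <= 2:
        return "negative"
    return None


def parse_review(line):
    """Parse one JSON review line into a cleaned dict, or return None to drop it."""
    record = json.loads(line)
    text = (record.get("reviewText") or "").strip()
    if not text:
        return None  # reviews without text cannot be used for sentiment analysis
    helpful_yes, helpful_total = record.get("helpful", [0, 0])
    return {
        "reviewer": record.get("reviewerID"),
        "asin": record.get("asin"),
        "text": text,
        "rating": float(record["overall"]),
        "label": label_from_rating(record["overall"]),
        # share of voters who found the review helpful; None when nobody voted
        "helpful_ratio": helpful_yes / helpful_total if helpful_total else None,
        # unixReviewTime is seconds since the epoch; convert it to a UTC calendar date
        "date": datetime.fromtimestamp(record["unixReviewTime"], tz=timezone.utc).date(),
    }


def load_reviews(lines):
    """Parse, clean, and de-duplicate an iterable of JSON lines (hypothetical helper)."""
    seen = set()
    reviews = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        review = parse_review(line)
        if review is None:
            continue
        key = (review["reviewer"], review["asin"])  # assumed uniqueness key
        if key in seen:
            continue
        seen.add(key)
        reviews.append(review)
    return reviews


# The example record from this README, serialized as a single JSON line.
sample = json.dumps({
    "reviewerID": "A2SUAM1J3GNN3B",
    "asin": "0000013714",
    "reviewerName": "J. McDonald",
    "helpful": [2, 3],
    "reviewText": "I bought this for my husband who plays the piano. He is having a "
                  "wonderful time playing these old hymns. The music is at times hard "
                  "to read because we think the book was published for singing from "
                  "more than playing from. Great purchase though!",
    "overall": 5.0,
    "summary": "Heavenly Highway Hymns",
    "unixReviewTime": 1252800000,
    "reviewTime": "09 13, 2009",
})

reviews = load_reviews([sample, sample, ""])  # the duplicate and the blank line are dropped
print(len(reviews), reviews[0]["label"], reviews[0]["date"])  # 1 positive 2009-09-13
```

Deriving the date from `unixReviewTime` avoids parsing the `reviewTime` string; for this record both fields agree on 13 September 2009.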

Link: https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/

About

Product Reviews Analysis
