The main goal of this project is to analyze product reviews using data analytics methods and machine learning algorithms, with Amazon product reviews as a case study. The planned outcomes of this project are:
- Perform data cleaning.
- Apply data exploration methods to get initial insights into the data.
- Train a classifier model to perform sentiment analysis.
- Use a pre-trained deep learning model (BERT) to perform sentiment analysis.
- Compare both methods.
- Extract useful insights from the sentiment analysis to help understand the quality, issues, and customer satisfaction of a specific product.
- If time permits, add a question-answering (QA) method to help users quickly explore and search the reviews of a specific product.
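
As an illustration of the data cleaning step, here is a minimal sketch using only the Python standard library. The function names `clean_text` and `clean_reviews` are hypothetical, and the specific rules (strip HTML tags, unescape entities, collapse whitespace, lowercase, drop empty and duplicate reviews) are assumptions about what cleaning might involve, not the project's actual pipeline.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")   # crude HTML tag matcher (assumption: tags are simple)
_WS_RE = re.compile(r"\s+")        # any run of whitespace


def clean_text(text):
    """Strip HTML tags, unescape entities, collapse whitespace, lowercase."""
    text = _TAG_RE.sub(" ", text)
    text = html.unescape(text)
    text = _WS_RE.sub(" ", text).strip()
    return text.lower()


def clean_reviews(reviews):
    """Return cleaned copies of the reviews, skipping empty and duplicate ones.

    A duplicate is assumed to be the same reviewer, product, and cleaned text.
    """
    seen = set()
    cleaned = []
    for review in reviews:
        text = clean_text(review.get("reviewText") or "")
        if not text:
            continue  # nothing to analyze
        key = (review.get("reviewerID"), review.get("asin"), text)
        if key in seen:
            continue  # already kept an identical review
        seen.add(key)
        cleaned.append({**review, "reviewText": text})
    return cleaned
```

The input dictionaries are left unmodified, so the raw reviews remain available for comparison with the cleaned ones.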
The dataset is a large crawl of product reviews from Amazon. It contains 82.83 million unique reviews from around 20 million users. In this project, only the Appliances subset is used to keep processing time short. An example of a review is shown below.
Note: The dataset file is too large to upload to GitHub; please use the link below to download the same Appliances dataset.
Example:
{
"reviewerID": "A2SUAM1J3GNN3B",
"asin": "0000013714",
"reviewerName": "J. McDonald",
"helpful": [2, 3],
"reviewText": "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!",
"overall": 5.0,
"summary": "Heavenly Highway Hymns",
"unixReviewTime": 1252800000,
"reviewTime": "09 13, 2009"
}
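
To show how a record like the one above might be turned into features for sentiment analysis, here is a minimal sketch using only the Python standard library. The helper `summarize_review` is hypothetical. It assumes that `helpful` holds `[helpful votes, total votes]`, that `unixReviewTime` is in seconds (UTC), and that ratings of 4 and above count as positive, 2 and below as negative, and 3 as neutral; these thresholds are an illustrative choice, not something fixed by the dataset.

```python
import json
from datetime import datetime, timezone

# The example review from above, embedded as a JSON string.
RAW_REVIEW = r'''
{
  "reviewerID": "A2SUAM1J3GNN3B",
  "asin": "0000013714",
  "reviewerName": "J. McDonald",
  "helpful": [2, 3],
  "reviewText": "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!",
  "overall": 5.0,
  "summary": "Heavenly Highway Hymns",
  "unixReviewTime": 1252800000,
  "reviewTime": "09 13, 2009"
}
'''


def summarize_review(record):
    """Derive a sentiment label, helpfulness ratio, and date from one review."""
    # Assumption: "helpful" is [helpful votes, total votes].
    helpful_votes, total_votes = record["helpful"]
    helpful_ratio = helpful_votes / total_votes if total_votes else None

    # Assumption: star rating thresholds used as a proxy sentiment label.
    rating = record["overall"]
    if rating >= 4:
        label = "positive"
    elif rating <= 2:
        label = "negative"
    else:
        label = "neutral"

    # Assumption: the timestamp is in seconds since the Unix epoch, UTC.
    review_date = datetime.fromtimestamp(
        record["unixReviewTime"], tz=timezone.utc
    ).date()

    return {
        "asin": record["asin"],
        "label": label,
        "helpful_ratio": helpful_ratio,
        "review_date": review_date,
    }


if __name__ == "__main__":
    print(summarize_review(json.loads(RAW_REVIEW)))
```

Labels derived from star ratings this way could serve as training targets for the classifier and as a reference when comparing it with BERT, though the neutral band and the thresholds would need to be checked against the actual rating distribution.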