Using PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into the PostgreSQL database (managed through pgAdmin). Then using PySpark, Pandas, and SQL to determine whether Vine members show any bias toward favorable reviews in the dataset.

Deving789/Amazon_Vine_Analysis

Amazon_Vine_Analysis

Overview of the analysis:

  • In this project I selected the Amazon reviews dataset for video games and used PySpark to perform the ETL process: extracting the data, transforming it, and connecting to the AWS RDS database instance created for the project to load it. Using these reviews, my goal is to determine whether Vine members show a bias toward favorable reviews.
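The ETL steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the project's code: the project used PySpark and loaded into AWS RDS, whereas here `csv` parses a tiny hypothetical tab-separated sample and an in-memory SQLite database stands in for the RDS PostgreSQL instance. The column subset and the table name `vine_table` are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows in a tab-separated layout (assumed subset of the
# review dataset's columns; not real data from the project).
SAMPLE_TSV = (
    "review_id\tstar_rating\thelpful_votes\ttotal_votes\tvine\tverified_purchase\n"
    "R1\t5\t3\t4\tY\tN\n"
    "R2\t4\t0\t1\tN\tY\n"
    "R3\t5\t10\t12\tN\tY\n"
)


def extract(tsv_text):
    """Extract: parse tab-separated text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def transform(rows):
    """Transform: keep the Vine-related columns and cast numeric fields to int."""
    return [
        (
            r["review_id"],
            int(r["star_rating"]),
            int(r["helpful_votes"]),
            int(r["total_votes"]),
            r["vine"],
            r["verified_purchase"],
        )
        for r in rows
    ]


def load(conn, records):
    """Load: write records into a hypothetical vine_table (SQLite stands in for RDS)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vine_table ("
        "review_id TEXT PRIMARY KEY, star_rating INTEGER, helpful_votes INTEGER, "
        "total_votes INTEGER, vine TEXT, verified_purchase TEXT)"
    )
    conn.executemany("INSERT INTO vine_table VALUES (?, ?, ?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SAMPLE_TSV)))
print(conn.execute("SELECT COUNT(*) FROM vine_table").fetchone()[0])  # 3
```

In PySpark, the corresponding extract and load steps would typically use `spark.read.csv(path, sep="\t", header=True)` and `DataFrame.write.jdbc(url, table, mode, properties)` pointed at the RDS endpoint.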

Results:

How many Vine reviews and non-Vine reviews were there?

  • There were 4,291 Vine reviews and 40,471 non-Vine reviews in the dataset.


How many Vine reviews were 5 stars? How many non-Vine reviews were 5 stars?

  • In the dataset there were 15,711 5-star reviews in total.
  • 15,663 of these 5-star reviews were non-Vine reviews.
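Counts like these come from grouping the reviews by their Vine flag. Below is a hedged sketch of one SQL query that returns both the total and the 5-star count per group, run against a hypothetical four-row SQLite table; the table name `vine_table`, its columns, and the `'Y'`/`'N'` encoding of Vine/non-Vine are assumptions.

```python
import sqlite3

# Hypothetical table: 'Y' marks a Vine review, 'N' a non-Vine review (assumed encoding).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vine_table (review_id TEXT, star_rating INTEGER, vine TEXT)")
conn.executemany(
    "INSERT INTO vine_table VALUES (?, ?, ?)",
    [("R1", 5, "Y"), ("R2", 4, "N"), ("R3", 5, "N"), ("R4", 3, "N")],
)

# For each Vine flag: total reviews, and how many of those are 5-star.
QUERY = """
SELECT vine,
       COUNT(*) AS total_reviews,
       SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) AS five_star_reviews
FROM vine_table
GROUP BY vine
ORDER BY vine
"""
rows = conn.execute(QUERY).fetchall()
for vine, total, five_star in rows:
    print(vine, total, five_star)
# N 3 1
# Y 1 1
```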

What percentage of Vine reviews were 5 stars? What percentage of non-Vine reviews were 5 stars?

  • 38.2% of Vine reviews were 5-star reviews
  • 38.9% of non-Vine reviews were 5-star reviews
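The questions above ask what share of each group was rated 5 stars, so each group's 5-star count is divided by that group's own total rather than by the number of 5-star reviews overall. A minimal pure-Python sketch on hypothetical `(star_rating, vine)` pairs; the helper name `five_star_share` and the sample data are illustrative only.

```python
from collections import Counter


def five_star_share(reviews):
    """Return {vine_flag: percent of that flag's reviews rated 5 stars}.

    reviews: iterable of (star_rating, vine_flag) pairs (hypothetical format).
    """
    totals = Counter(vine for _, vine in reviews)
    fives = Counter(vine for stars, vine in reviews if stars == 5)
    # Divide by each group's own total; missing 5-star counts default to 0.
    return {flag: 100 * fives[flag] / totals[flag] for flag in totals}


sample = [(5, "Y"), (4, "Y"), (5, "N"), (3, "N"), (2, "N"), (1, "N")]
shares = five_star_share(sample)
print(shares)  # {'Y': 50.0, 'N': 25.0}
```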

[Screenshot, 2020-11-08 6:59:01 PM]

Summary:

  • Based on this analysis, there does not appear to be a positivity bias: the 5-star percentages for Vine (38.2%) and non-Vine (38.9%) reviews are very similar, differing by less than one percentage point. The Vine program therefore shows no evidence of favorable-review bias in this dataset.
