
Predict whether Steam games have many or few ratings (binary classification)


jsngn/steam-machine-learning


steam-machine-learning

Description

A machine learning project using data about 27,000 Steam games.

The objective is to predict whether a game has many or few ratings based on its other attributes. I got the model to predict with roughly 80% accuracy.

I started this project because I wanted to try a binary classification problem in which all of the data is binarized.
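A minimal sketch of what "all binarized data" can look like, assuming pandas and the column names positive_ratings, negative_ratings, price and achievements from the Kaggle steam.csv. The median split for "many" vs. "few" ratings and the two feature flags are hypothetical choices for illustration, not necessarily the thresholds used in the notebooks.

```python
import pandas as pd


def binarize(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Return 0/1 features and a 0/1 'many ratings' target (hypothetical scheme)."""
    # Total ratings = positive + negative (assumed column names).
    total_ratings = df["positive_ratings"] + df["negative_ratings"]
    # Target: 1 if the game has more ratings than the median game, else 0.
    y = (total_ratings > total_ratings.median()).astype(int).rename("many_ratings")
    # Features: simple presence/absence flags (hypothetical examples).
    X = pd.DataFrame({
        "is_paid": (df["price"] > 0).astype(int),
        "has_achievements": (df["achievements"] > 0).astype(int),
    })
    return X, y


# Tiny made-up sample, only to show the shape of the output.
toy = pd.DataFrame({
    "positive_ratings": [10, 200, 5, 3000],
    "negative_ratings": [2, 50, 1, 400],
    "price": [0.0, 9.99, 0.0, 19.99],
    "achievements": [3, 12, 0, 40],
})
X, y = binarize(toy)
print(X)
print(y)
```

Splitting at the median keeps the two classes roughly balanced, which makes plain accuracy a reasonable metric for a many/few target.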

Jupyter Notebooks

IMPORTANT: Please try this fix if the notebooks don't load on GitHub.

steam_clean_data.ipynb: prepares data to be fed into the models

steam_models.ipynb: evaluates a handful of models, then tunes a random forest classifier (RFC)
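The "evaluate several models, then tune the random forest" workflow can be sketched with scikit-learn as follows. This is an illustration under assumptions: the data is synthetic 0/1 features standing in for the binarized Steam data, and the candidate models and hyperparameter grid are hypothetical, not the ones in steam_models.ipynb.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for binarized features: 500 games x 10 columns of 0/1.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 10))
# Label = majority vote of the first three features, with ~10% of labels flipped.
y = (X[:, :3].sum(axis=1) >= 2).astype(int)
flip = rng.random(500) < 0.1
y[flip] = 1 - y[flip]

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "bernoulli_nb": BernoulliNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}

# Step 1: compare models by mean 5-fold cross-validated accuracy.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Step 2: tune the random forest over a small hyperparameter grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X, y)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
print("best RFC params:", search.best_params_, f"accuracy={search.best_score_:.3f}")
```

Cross-validation is used for both steps so that the model comparison and the tuning are scored the same way.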

Data Sources

steam.csv: https://www.kaggle.com/nikdavis/steam-store-games (all credit to the original author; see link for author and license info)

top-dev-pub.csv: https://steam250.com/developer and https://steam250.com/publisher (retrieved on 08/19/2019)

steam-cleaned.csv: data I cleaned myself from steam.csv
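One plausible way to combine top-dev-pub.csv with steam.csv is to flag each game whose developer (or publisher) appears in the Steam 250 list. The sketch below is hypothetical: it assumes a developer column whose entries may list several names separated by ";", and it hard-codes a tiny made-up top list instead of reading top-dev-pub.csv.

```python
import pandas as pd


def flag_top(names: pd.Series, top: set[str]) -> pd.Series:
    """Return 1 if any ';'-separated name in a cell is in `top`, else 0."""
    # Missing values become "" so they are simply flagged 0.
    return names.fillna("").apply(
        lambda cell: int(any(name.strip() in top for name in cell.split(";")))
    )


# Made-up example data (not taken from top-dev-pub.csv or steam.csv).
top_developers = {"Valve"}
games = pd.DataFrame({
    "developer": ["Valve", "Indie Studio", "Valve;Hidden Path Entertainment", None],
})
games["top_developer"] = flag_top(games["developer"], top_developers)
print(games)
```

The result is already a 0/1 column, so it fits the all-binarized setup without further encoding.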

Visualizations

visualizations: contains larger versions of graphs that are hard to read in steam_clean_data.ipynb

Note: achievements_vs_total_ratings.png and owners_mid_vs_total_ratings.png show only part of the full graph to demonstrate a point; low_price_vs_total_ratings.png and high_price_vs_total_ratings.png show two parts of the same graph for the same reason. For these graphs, the range of the data on the x-axis is so large that plotting all of it would obscure the labels.
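Showing only part of an x-axis range, as in the low/high price graphs, can be done in matplotlib by plotting all the data and then restricting each panel with set_xlim. This is a sketch on random data; the variable names and the 0 to 20 and 50 to 100 limits are hypothetical, not the values used for the saved PNGs.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

# Random stand-in data: price (x) vs. total ratings (y).
rng = np.random.default_rng(0)
price = rng.uniform(0, 100, 200)
total_ratings = rng.integers(0, 5000, 200)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax in axes:
    ax.scatter(price, total_ratings, s=8)
    ax.set_ylabel("total_ratings")

# Each panel shows only one slice of the full x-range.
axes[0].set_xlim(0, 20)
axes[0].set_title("low price")
axes[1].set_xlim(50, 100)
axes[1].set_title("high price")
for ax in axes:
    ax.set_xlabel("price")
fig.tight_layout()
```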
