Credit: Kristina Tripkovic
According to the World Health Organization:
"Close to 800,000 people die due to suicide every year, which is one person every 40 seconds. Suicide is a global phenomenon and occurs throughout the lifespan. Effective and evidence-based interventions can be implemented at population, sub-population and individual levels to prevent suicide and suicide attempts. There are indications that for each adult who died by suicide there may have been more than 20 others attempting suicide."
Libraries you will need
- pandas
- numpy
Download the data from "Data Sources"
Transformation
Jupyter Notebook
- Import Original CSVs
- Filter columns ("Country", "Freedom Rank", "Freedom Score", "Suicide Rate per 100k", "Happiness Rank", "Happiness Score", "Fifa Score", "Fifa Total Points")
- Move rows that contain null values into a separate DataFrame
- Fix countries that appear under different spellings and combine their rows
- Concatenate the combined rows with the rows that were already complete, so the final data set contains only full rows and excludes countries with incomplete information
- Export to CSV
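The notebook steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the project's notebook: it assumes the original CSVs have already been merged into one DataFrame keyed on `"Country"`, and the function `clean`, the alias map `COUNTRY_ALIASES` and the toy rows are hypothetical placeholders.

```python
import pandas as pd

# Columns named in the README; assumed to exist after the source CSVs are merged on "Country".
KEEP_COLUMNS = [
    "Country", "Freedom Rank", "Freedom Score", "Suicide Rate per 100k",
    "Happiness Rank", "Happiness Score", "Fifa Score", "Fifa Total Points",
]

# Hypothetical alias map: alternate spelling -> canonical country name.
COUNTRY_ALIASES = {"Country B (alt)": "Country B"}


def clean(merged: pd.DataFrame, aliases: dict) -> pd.DataFrame:
    # Filter columns: keep the README's columns that are present, in that order.
    df = merged[[c for c in KEEP_COLUMNS if c in merged.columns]]

    # Move rows with any null value onto a separate DataFrame.
    has_null = df.isna().any(axis=1)
    full_rows = df[~has_null]
    null_rows = df[has_null].copy()

    # Map alternate spellings to one name, then combine rows for the same country;
    # GroupBy.first() takes the first non-null value in each column per group.
    null_rows["Country"] = null_rows["Country"].replace(aliases)
    combined = null_rows.groupby("Country", as_index=False).first()

    # Rows still missing data after combining are dropped, so only full rows remain.
    return pd.concat([full_rows, combined.dropna()], ignore_index=True)


# Toy input with placeholder names and values (not real data).
demo = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country B (alt)", "Country C"],
    "Freedom Score": [1.0, 2.0, None, 4.0],
    "Happiness Score": [1.0, None, 3.0, None],
})
result = clean(demo, COUNTRY_ALIASES)
csv_text = result.to_csv(index=False)  # pass a file path to write a CSV file instead
```

Here the two partial "Country B" rows merge into one complete row, while "Country C" stays incomplete and is excluded. `GroupBy.first()` is used because it skips nulls, which is what lets two half-filled rows for the same country collapse into one.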
Postgres
- Import Original CSVs
- Use a FULL OUTER JOIN to identify inconsistent country names across the data sets
- Correct the identified country names
- Store country - output/CleanFifa.csv, output/CleanSuicide.csv, output/CleanHappiness.csv, output/CleanFreedom.csv
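The Postgres step relies on a full outer join: a row whose country name is missing on one side of the join points to a spelling mismatch. As a hedged Python analogue (not the project's SQL), pandas `merge(how="outer", indicator=True)` performs the same kind of join and labels each row `left_only`, `right_only` or `both`. The table names in the comment, the function `unmatched_countries` and the toy frames are hypothetical.

```python
import pandas as pd


def unmatched_countries(left: pd.DataFrame, right: pd.DataFrame, key: str = "Country") -> pd.DataFrame:
    """Return country names that appear in only one of the two frames.

    Roughly the pandas counterpart of (table names hypothetical):
        SELECT f.country, h.country
        FROM fifa f FULL OUTER JOIN happiness h ON f.country = h.country
        WHERE f.country IS NULL OR h.country IS NULL;
    """
    joined = pd.merge(
        left[[key]].drop_duplicates(),
        right[[key]].drop_duplicates(),
        on=key,
        how="outer",
        indicator=True,  # adds "_merge": "left_only", "right_only" or "both"
    )
    return joined[joined["_merge"] != "both"].sort_values(key).reset_index(drop=True)


# Toy frames with placeholder names (not real data).
fifa = pd.DataFrame({"Country": ["Country A", "Country B"]})
happiness = pd.DataFrame({"Country": ["Country A", "Country B (alt)"]})
mismatches = unmatched_countries(fifa, happiness)

# Correct the identified name, then re-run the check; nothing should remain unmatched.
fixed_happiness = happiness.replace({"Country": {"Country B (alt)": "Country B"}})
remaining = unmatched_countries(fifa, fixed_happiness)
```

`mismatches` lists "Country B" as `left_only` and "Country B (alt)" as `right_only`, which is the pair a reviewer would reconcile before the final load.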
Load
- Schema - sql/schema.sql
- Data - sql/queries.sql
  - FIFA data
  - Freedom data
  - Happiness data
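The project's actual schema and queries live in `sql/schema.sql` and `sql/queries.sql`. The sketch below only illustrates the general load pattern: create tables from a schema, append cleaned DataFrames with `DataFrame.to_sql`, then query across them. It uses an in-memory SQLite database as a stand-in for Postgres so it runs without a server; the table names, column names and toy values are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical schema; the real one is in sql/schema.sql.
SCHEMA = """
CREATE TABLE freedom (country TEXT PRIMARY KEY, freedom_score REAL);
CREATE TABLE happiness (country TEXT PRIMARY KEY, happiness_score REAL);
"""

con = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
con.executescript(SCHEMA)

# Toy cleaned data with placeholder names and values (not real data).
freedom = pd.DataFrame({"country": ["Country A", "Country B"], "freedom_score": [1.0, 2.0]})
happiness = pd.DataFrame({"country": ["Country A", "Country B"], "happiness_score": [1.0, 3.0]})

# if_exists="append" keeps the tables (and primary keys) created by the schema.
freedom.to_sql("freedom", con, if_exists="append", index=False)
happiness.to_sql("happiness", con, if_exists="append", index=False)

# Join the tables on the cleaned country names.
rows = con.execute(
    """
    SELECT f.country, f.freedom_score, h.happiness_score
    FROM freedom AS f
    JOIN happiness AS h ON h.country = f.country
    ORDER BY f.country
    """
).fetchall()
con.close()
```

Against Postgres, `to_sql` would instead receive a SQLAlchemy engine, for example `sqlalchemy.create_engine("postgresql://user:password@localhost:5432/dbname")` (credentials and database name hypothetical), and the frames would come from `pd.read_csv` on the cleaned files in `output/`. The plain inner join only works because the country names were reconciled during transformation.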
Data Sources
- Human Freedom Index (.csv)
- FIFA World Rankings (.csv)
- World Happiness Report (.csv)
- World Health Organization (.csv)