Small data engineering task related to open data given by the city of Helsinki related to real estates

What the code does

Reads the .csv-file containing the data and drops some unnecessary columns and NaN values
Reads the dataframe to a SQL table using SQLite
Calculates how the fraction of residential buildings in all buildings regarding to floor area has developed over the last three years 2020, 2021 and 2022
Does some very basic data visualization by scatterplotting build date, floor area and floor count attributes

How to run the code

You can clone this repository and do the following commands in the directory:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 ./app.py

TODO:

The code has some comments, where it could be improved and refactored. This program might get some updates over time as it was made in a very short time. Still here is a list, where it could be improved:

CHECKING THE SCHEMA: When the data is transferred from the .csv-file to a Pandas dataframe and again to a SQL table, the schema gets altered. For example, some of the data contains a number presented as a string, which then get transformed in a funny way along the way. For example postal codes are handled as FLOATS in the SQLite, like postal code "00100" is presented as 100.0 in the SQL table. These should be fixed.
REFACTORING: There is a lot of room for improvement regarding refactoring code. For example, the visualizing and database handling could be separated to separate modules. Also the queries are a bit of copy-paste, as the year and building codes only changes. Doing a single function for the computing, which then gets the year and building codes as a parameter would be a solid fix in the first place.
DATA VISUALIZATION: The data visualization is very basic, as mentioned before.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Small data engineering task related to open data given by the city of Helsinki related to real estates

What the code does

How to run the code

TODO:

Files

README.md

Latest commit

History

README.md

File metadata and controls

Small data engineering task related to open data given by the city of Helsinki related to real estates

What the code does

How to run the code

TODO: