
Big data project to analyse 1.1 billion taxi trips in NYC and create a forecasting model


magrathj/NYC-Taxi-Analysis


Big Data - NYC Taxi Data Analysis & Time Series Forecasting

This repo provides scripts to download, process, and analyze data for billions of taxi and for-hire vehicle (Uber, Lyft, etc.) trips originating in New York City since 2009. The data is stored in a PostgreSQL database, which uses PostGIS for spatial calculations.
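
PostGIS does the spatial work in SQL. To illustrate one common spatial calculation, assigning a trip's pickup point to a zone polygon (in PostGIS this would be a predicate such as ST_Contains or ST_Within), here is a minimal pure-Python sketch using the even-odd ray-casting rule. It is not taken from this repo: the zone coordinates are hypothetical, and longitude/latitude are treated as flat x/y coordinates, which is only a reasonable approximation for small areas.

```python
# Minimal sketch: test whether a pickup point lies inside a zone polygon using
# the even-odd ray-casting rule. PostGIS would do this in SQL (e.g. ST_Contains).
# The zone below is hypothetical, not a real NYC taxi zone, and lon/lat are
# treated as planar x/y.
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Return True if pt is inside polygon (points exactly on an edge are ambiguous)."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of the ray extending to the right of pt.
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular "zone" roughly around lower Manhattan.
zone = [(-74.02, 40.70), (-73.97, 40.70), (-73.97, 40.75), (-74.02, 40.75)]

print(point_in_polygon((-73.99, 40.72), zone))  # True
print(point_in_polygon((-73.90, 40.72), zone))  # False
```

At the scale of billions of trips this kind of test is exactly what a spatial index in PostGIS speeds up, which is one reason to keep it in the database rather than in Python.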

Statistics through June 30, 2019:

  • 2.45 billion total trips
  • 1.65 billion taxi
  • 800 million for-hire vehicle
  • 279 GB of raw data
  • Database takes up 378 GB on disk with minimal indexes
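
A quick back-of-the-envelope check on these statistics: the taxi and for-hire counts add up to the total, and dividing the sizes by the trip count gives the average storage per trip. The sketch assumes GB means 10^9 bytes; if the figures are actually GiB, the per-trip sizes are about 7% larger.

```python
# Back-of-the-envelope sizes derived from the statistics through June 30, 2019.
# Assumption: GB = 10**9 bytes.
TOTAL_TRIPS = 2.45e9
TAXI_TRIPS = 1.65e9
FHV_TRIPS = 0.80e9
RAW_GB = 279
DB_GB = 378

# Taxi + for-hire vehicle trips should equal the total.
print(TAXI_TRIPS + FHV_TRIPS == TOTAL_TRIPS)  # True

raw_bytes_per_trip = RAW_GB * 1e9 / TOTAL_TRIPS  # about 113.9 bytes per trip
db_bytes_per_trip = DB_GB * 1e9 / TOTAL_TRIPS    # about 154.3 bytes per trip
overhead = DB_GB / RAW_GB - 1                    # database is about 35% larger than raw CSV

print(f"raw: {raw_bytes_per_trip:.1f} B/trip, "
      f"db: {db_bytes_per_trip:.1f} B/trip, "
      f"db overhead: {overhead:.0%}")
# raw: 113.9 B/trip, db: 154.3 B/trip, db overhead: 35%
```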

Instructions for an Azure instance of Postgres and PostGIS

  1. Create an instance of Postgres in Azure

  2. Download the raw data: python Setup/download_data.py

  3. Modify the paths in the config.py script, then load the CSV data into the database by running: python Setup/etl_db.py --host="<server-name>" --port=5432 --user="<admin-username>" --dbname="<database-name>" --password="<admin-password>" --sslmode="require"

  4. Analysis: additional Postgres and R scripts for analysis are in the analysis/ folder.
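
The etl_db.py flags in step 3 correspond to standard libpq connection parameters (host, port, user, dbname, password, sslmode). The sketch below shows that mapping by assembling a libpq keyword/value connection string. It is a hypothetical helper, not the repo's own argument handling, and the example server name and credentials are placeholders.

```python
# Hypothetical sketch: map the command-line flags shown in step 3 onto a libpq
# keyword/value connection string. NOT taken from Setup/etl_db.py.
import argparse


def quote_libpq(value: str) -> str:
    """Quote a value for a libpq keyword/value string when it is empty or has spaces/quotes."""
    if value and not any(c in value for c in " '\\"):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_dsn(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Build a libpq connection string")
    parser.add_argument("--host", required=True)
    parser.add_argument("--port", type=int, default=5432)
    parser.add_argument("--user", required=True)
    parser.add_argument("--dbname", required=True)
    parser.add_argument("--password", required=True)
    parser.add_argument("--sslmode", default="require")  # Azure step uses "require"
    args = parser.parse_args(argv)
    params = {
        "host": args.host,
        "port": str(args.port),
        "user": args.user,
        "dbname": args.dbname,
        "password": args.password,
        "sslmode": args.sslmode,
    }
    return " ".join(f"{key}={quote_libpq(val)}" for key, val in params.items())


# Placeholder values for illustration only.
dsn = build_dsn([
    "--host=example-server.postgres.database.azure.com",
    "--user=admin",
    "--dbname=nyc",
    "--password=p@ss word",
])
print(dsn)
# host=example-server.postgres.database.azure.com port=5432 user=admin dbname=nyc password='p@ss word' sslmode=require
```

A string like this can be handed to any libpq-based client, for example psycopg2.connect(dsn). The local Docker instructions use sslmode=allow instead of require.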

Instructions for a local instance of Postgres and PostGIS

  1. Install Docker

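  2. To run the server (the official postgres image requires a superuser password on first start): docker run -d --name ht_pg_server -e POSTGRES_PASSWORD=<admin-password> -v ht_dbdata:/var/lib/postgresql/data -p 54320:5432 postgres:11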

  3. Check the logs to see if it is running: docker logs -f ht_pg_server

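  4. A default database named postgres already exists in a fresh container. To use a separate database instead, create it with: docker exec -it ht_pg_server psql -U postgres -c "create database <database-name>" and pass that name as --dbname in step 6.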

  5. Download the raw data: python Setup/download_data.py

  6. Modify the paths in the config.py script, then load the CSV data into the database by running the command below. The container's port 5432 is published on host port 54320, so connect to that port: python Setup/etl_db.py --host="localhost" --port=54320 --user="<admin-username>" --dbname="postgres" --password="<admin-password>" --sslmode="allow"

  7. Analysis: additional Postgres and R scripts for analysis are in the analysis/ folder, or you can write your own!
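
Instead of watching the logs in step 3, readiness can also be checked by polling pg_isready, which ships with the official postgres image and exits with status 0 once the server accepts connections. The helper below is a hypothetical sketch that assumes the container name ht_pg_server and the default postgres superuser from the steps above, and it needs Docker on the PATH.

```python
# Hypothetical sketch: poll pg_isready inside the ht_pg_server container until
# Postgres accepts connections. pg_isready exits 0 when the server is accepting
# connections and non-zero otherwise, so the loop retries until success or timeout.
import subprocess
import time
from typing import Sequence

PG_ISREADY = ["docker", "exec", "ht_pg_server", "pg_isready", "-U", "postgres"]


def wait_until_ready(cmd: Sequence[str] = PG_ISREADY,
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Run cmd repeatedly; return True on its first zero exit status, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
            if result.returncode == 0:
                return True
        except FileNotFoundError:
            pass  # e.g. docker is not installed or not on PATH; keep retrying
        # Stop if another wait would overshoot the deadline.
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

For example, wait_until_ready() returns True as soon as the container reports that it is accepting connections, and False if that does not happen within 60 seconds; passing a different cmd makes the helper easy to reuse for other containers.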

