Python package to fetch, process and store electricity data from different providers.

badrbmb/power-stash

power-stash

Python package to fetch, process and store electricity data from different providers.

Overview

power-stash is a Python package designed to fetch electricity data (consumption, generation, exchange, prices, ...) from various sources, process it with Dask and pandas, and store the resulting time series in a TimescaleDB database.

The project follows (an attempt at) a hexagonal architecture pattern: the core functionality lives under power_stash/services, interfaces are defined under power_stash/models, and I/O and external actors are defined under power_stash/inputs and power_stash/outputs respectively.
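To make the layering concrete, here is a minimal sketch of how ports (interfaces) and adapters could fit together in such a layout. Every name in it (`Fetcher`, `Processor`, `Repository`, `run_pipeline` and the in-memory adapters) is hypothetical and not taken from the power-stash code base, and plain lists of dicts stand in for DataFrames.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Fetcher(ABC):
    """Port for an input adapter (hypothetical interface)."""

    @abstractmethod
    def fetch(self, zone: str) -> list[dict]: ...


class Processor(ABC):
    """Port that turns raw records into storable rows (hypothetical interface)."""

    @abstractmethod
    def process(self, raw: list[dict]) -> list[dict]: ...


class Repository(ABC):
    """Port for an output adapter (hypothetical interface)."""

    @abstractmethod
    def add_all(self, rows: list[dict]) -> None: ...


def run_pipeline(fetcher: Fetcher, processor: Processor, repository: Repository, zone: str) -> int:
    """Core logic: depends only on the ports, never on a concrete provider or database."""
    rows = processor.process(fetcher.fetch(zone))
    repository.add_all(rows)
    return len(rows)


class StaticFetcher(Fetcher):
    """Toy input adapter returning canned records instead of calling an API."""

    def fetch(self, zone: str) -> list[dict]:
        return [{"zone": zone, "load_mw": "41000"}, {"zone": zone, "load_mw": None}]


class DropMissingProcessor(Processor):
    """Toy processor: drops records without a value and casts the rest to float."""

    def process(self, raw: list[dict]) -> list[dict]:
        return [{**r, "load_mw": float(r["load_mw"])} for r in raw if r["load_mw"] is not None]


class InMemoryRepository(Repository):
    """Toy output adapter keeping rows in a list instead of a database."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def add_all(self, rows: list[dict]) -> None:
        self.rows.extend(rows)
```

Because `run_pipeline` only sees the abstract ports, swapping the toy adapters for a real API client or a database repository would not require changes to the core logic, which is the main appeal of the pattern.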

The inputs currently implemented under power_stash/inputs come from ENTSO-E, covering hourly consumption, production, day-ahead prices and exchanges across all available zones.


Repository Structure

└── power-stash/
    ├── .env.sample
    ├── docker-compose.yml
    ├── power_stash
    │   ├── config.py
    │   ├── constants.py
    │   ├── inputs
    │   │   └── entsoe
    │   ├── main.py
    │   ├── models
    │   │   ├── fetcher.py
    │   │   ├── processor.py
    │   │   ├── request.py
    │   │   └── storage
    │   ├── outputs
    │   │   ├── database
    │   │   └── localfs
    │   ├── services
    │   │   └── service.py
    │   └── utils.py
    └── pyproject.toml

Modules

power_stash

| File | Summary |
| --- | --- |
| main.py | Enables data extraction, processing, and storage from specified electricity data sources, using a configurable Dask cluster for parallel processing and a CLI for user interaction. |

power_stash.models

| File | Summary |
| --- | --- |
| request.py | Defines an abstract request model and builder for time-bounded data fetching tasks, including validation and status tracking. |
| processor.py | Defines an abstract base class for transforming raw data into a format suitable for database storage. |
| fetcher.py | Defines an interface for fetching electricity data from external sources and returning it as a DataFrame, which keeps data retrieval modular. |
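
As an illustration of what a time-bounded request with validation, status tracking and a builder might look like, here is a small sketch. The names `RequestStatus`, `TimeBoundedRequest` and `build_requests` are assumptions made for this example and do not reflect the actual classes in request.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class RequestStatus(Enum):
    """Lifecycle of a request (hypothetical states)."""

    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class TimeBoundedRequest:
    """A fetch task covering the half-open interval [start, end)."""

    start: datetime
    end: datetime
    status: RequestStatus = RequestStatus.PENDING

    def __post_init__(self) -> None:
        # Validation: an empty or inverted window is rejected up front.
        if self.start >= self.end:
            raise ValueError("start must be strictly before end")


def build_requests(start: datetime, end: datetime, step: timedelta) -> list[TimeBoundedRequest]:
    """Split [start, end) into consecutive windows of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    requests = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # the last window may be shorter
        requests.append(TimeBoundedRequest(cursor, window_end))
        cursor = window_end
    return requests
```

Splitting a long range into small windows like this keeps each task independent, so individual windows can be retried or marked as failed without redoing the whole range.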

power_stash.services

| File | Summary |
| --- | --- |
| service.py | The PowerConsumerService orchestrates the data flow (fetching, processing, storage, and error handling) for power data within a parallelized, Dask-powered pipeline. |
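
The orchestration described above can be sketched roughly as follows. This is not the PowerConsumerService implementation: it swaps Dask for the standard library's `ThreadPoolExecutor` to stay dependency-free, and the function names `handle_request` and `run_all` are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def handle_request(
    request: str,
    fetch: Callable[[str], Any],
    process: Callable[[Any], Any],
    store: Callable[[Any], None],
) -> tuple[str, bool, str]:
    """Run fetch -> process -> store for one request and capture any failure."""
    try:
        store(process(fetch(request)))
    except Exception as exc:  # a failing request must not abort the others
        return request, False, str(exc)
    return request, True, ""


def run_all(
    requests: list[str],
    fetch: Callable[[str], Any],
    process: Callable[[Any], Any],
    store: Callable[[Any], None],
    max_workers: int = 4,
) -> list[tuple[str, bool, str]]:
    """Handle all requests in parallel; results keep the order of `requests`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: handle_request(r, fetch, process, store), requests))
```

Returning a status tuple per request, rather than raising, lets the caller record which requests failed and retry only those.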

power_stash.models.storage

| File | Summary |
| --- | --- |
| database.py | Defines the database models that represent requests, and the repository interface that abstracts persistence operations for power data. |
| blob.py | Defines a storage interface for managing data blobs, used for archiving. |

power_stash.outputs.database

| File | Summary |
| --- | --- |
| config.py | Defines the settings for a Postgres instance, handling secure credential storage and connection string construction. |
| tables.py | Defines the database hypertables, mapping the energy data models to storage. |
| repository.py | The SqlRepository class manages database operations: initializing the database with TimescaleDB hypertables, checking existence, adding, updating and bulk-inserting records, and executing queries. |
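
For readers unfamiliar with this kind of settings object, here is a minimal sketch of credential handling and connection string construction. The class `DatabaseSettings` and the environment variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) are assumptions for illustration only; check .env.sample and config.py for the names the project actually uses.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field
from typing import Mapping
from urllib.parse import quote


@dataclass(frozen=True)
class DatabaseSettings:
    """Connection settings for a Postgres/TimescaleDB instance (hypothetical)."""

    user: str
    password: str = field(repr=False)  # keep the secret out of logs and reprs
    host: str = "localhost"
    port: int = 5432
    name: str = "postgres"

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> "DatabaseSettings":
        """Read settings from environment variables (variable names are assumed)."""
        env = os.environ if env is None else env
        return cls(
            user=env["DB_USER"],
            password=env["DB_PASSWORD"],
            host=env.get("DB_HOST", "localhost"),
            port=int(env.get("DB_PORT", "5432")),
            name=env.get("DB_NAME", "postgres"),
        )

    @property
    def url(self) -> str:
        """Build the connection string, percent-encoding the credentials."""
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.name}"
```

Percent-encoding matters because characters such as `@` or `/` in a password would otherwise be read as URL delimiters.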

power_stash.outputs.localfs

| File | Summary |
| --- | --- |
| blob_client.py | LocalClient manages local file storage: checking that files exist, validating their size, listing files, storing datasets as Parquet, and deleting files or directories. |

power_stash.inputs.entsoe

| File | Summary |
| --- | --- |
| config.py | Manages the ENTSO-E configuration, securely handling the API token used for data retrieval. |
| models.py | Defines the energy data structures, parses raw records, and computes unique IDs for database integration. |
| request.py | Defines the EntsoeRequest classes for querying power data by area and type, and builds the default batch requests for the data ingestion pipeline. |
| processor.py | EntsoeProcessor transforms ENTSO-E data to fit the database models for the various request types. |
| fetcher.py | Handles data retrieval from the ENTSO-E API, with retries for resilience, and provides electricity market data such as load, generation, prices, and capacity. |
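
The retry behaviour mentioned for fetcher.py is not detailed here, so the following is only one plausible shape for it: a decorator that retries a call with exponential backoff. The decorator name `retry`, its parameters and the backoff policy are all assumptions rather than the project's actual implementation.

```python
from __future__ import annotations

import time
from functools import wraps
from typing import Callable


def retry(
    attempts: int = 3,
    base_delay: float = 1.0,
    exceptions: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry the decorated call up to `attempts` times with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    # Wait base_delay, then 2x, 4x, ... before the next try.
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without actually waiting; in normal use the default `time.sleep` applies.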

Getting Started

Ideally, work in a fresh virtual environment with Poetry installed. Install the dependencies with:

poetry install

Running power-stash

If you're using VS Code, you might find some useful launch configurations in .vscode/launch.json. Otherwise, have a look at the help for power_stash/main.py by running:

python power_stash/main.py --help

Make sure you have a running TimescaleDB instance to store the downloaded time series. A docker-compose file is provided to help with that; refer to docker-compose.yml for more details, or run the following command:

docker-compose up

Tests

To execute tests, run:

pytest

Acknowledgments

TODO
