Data augmentation

Hou Shengren edited this page Aug 5, 2024 · 1 revision

Data Augmentation Model

This page describes the data augmentation model used in the RL-ADN framework. The model improves the robustness and generalizability of the trained policy by generating synthetic time-series data that expands the diversity of the historical data.

Key Components

ActivePowerDataManager Class

The ActivePowerDataManager class is a subclass of GeneralPowerDataManager that specifically handles active power data. It retrieves and preprocesses active power data from the dataset.

  • Attributes:

    • df: DataFrame containing the loaded data.
    • time_interval: Time interval between data points.
  • Methods:

    • __init__(self, datapath): Initializes the ActivePowerDataManager object with the path to the data file.
    • get_active_power_data(self): Retrieves and preprocesses active power data from the dataset.
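
As a rough illustration of what loading and preprocessing can involve, the sketch below infers the sampling interval from the timestamps and extracts the active power columns with pandas. It does not reproduce the ActivePowerDataManager implementation: the column names, the sample values, and the use of time-based interpolation for missing readings are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for a loaded dataset: 15-minute readings for one node.
index = pd.date_range("2021-01-01 00:00", periods=8, freq="15min")
df = pd.DataFrame(
    {
        "active_power_node_1": [10.0, 11.5, None, 12.0, 13.1, 12.7, 11.9, 10.4],
        "reactive_power_node_1": [1.0] * 8,
    },
    index=index,
)

# Infer the sampling interval (in minutes) from the most common timestamp gap.
gaps = df.index.to_series().diff().dropna()
time_interval = int(gaps.mode().iloc[0].total_seconds() // 60)

# Keep only the active power columns (assumed naming convention) and
# fill missing readings by interpolating along the time index.
active_cols = [c for c in df.columns if c.startswith("active_power")]
active_power = df[active_cols].interpolate(method="time")

print(time_interval)
print(active_power.head())
```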

TimeSeriesDataAugmentor Class

The TimeSeriesDataAugmentor class generates synthetic time-series data from the historical data supplied by the data manager. The augmentation model is selected by name: Gaussian Mixture Models (GMM), Gaussian Mixture Copulas (GMC), or the option labeled TC.

  • Attributes:

    • data_manager: Instance of ActivePowerDataManager for handling data.
    • augmentation_model_name: Name of the chosen augmentation model (GMC, GMM, TC).
    • augmentation_model: The fitted augmentation model.
    • n_models: Number of GMMs to fit, determined by the time interval of the data.
  • Methods:

    • __init__(self, data_manager, augmentation_model_name): Initializes the TimeSeriesDataAugmentor with a data manager instance and the selected augmentation model.
    • _create_augmentation_model(self): Private method to create the augmentation model based on the chosen method.
    • _gmm_cdf(self, gmm, x): Evaluates the CDF of a fitted GMM at x, mapping observations to pseudo-observations in the interval [0, 1].
    • _inverse_gmm_cdf(self, gmm, percentile): Inverts the CDF of a fitted GMM, returning the value that corresponds to a given percentile.
    • _bic_value(self, data, n_components_range): Computes the Bayesian Information Criterion (BIC) for each candidate number of components so that the best model can be selected.
    • check_data_format(self): Verifies that the data matches the expected format for augmentation.
    • augment_data(self, num_nodes, num_days, start_date): Performs data augmentation using the specified model and parameters.
    • save_augmented_data(self, augmented_df, file_name): Saves the augmented data to a CSV file.
    • sort_columns(self, columns, pattern): Sorts columns based on a given pattern.
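
To make the role of _bic_value concrete, here is a minimal sketch of BIC-based model selection for a one-dimensional Gaussian mixture. It uses a small hand-written EM fit in NumPy in place of whatever fitting routine the framework uses. The function names fit_gmm_1d and bic, the synthetic data, and the candidate range are hypothetical. The candidate with the lowest BIC is preferred.

```python
import numpy as np


def fit_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood."""
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)

    def log_joint():
        # log(weight_j * N(x_i | mean_j, var_j)) for each sample i, component j.
        diff = x[:, None] - means
        return (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
                - 0.5 * diff ** 2 / variances)

    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        lj = log_joint()
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6

    return np.logaddexp.reduce(log_joint(), axis=1).sum()


def bic(x, k):
    """BIC = -2 * log-likelihood + (number of free parameters) * ln(n)."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return -2.0 * fit_gmm_1d(x, k) + n_params * np.log(x.size)


# Synthetic bimodal "load" data, purely for illustration.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(1.0, 0.3, 300), rng.normal(5.0, 0.5, 200)])

bic_scores = {k: bic(x, k) for k in range(1, 5)}
best_k = min(bic_scores, key=bic_scores.get)
print(bic_scores, best_k)
```

Because the data has two well-separated modes, a single Gaussian fits far worse than a two-component mixture, and the BIC penalty discourages adding components that barely improve the likelihood.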

Workflow

Initialization

  1. Data Manager Initialization: The ActivePowerDataManager is initialized with the path to the data file. It loads the data and determines the time interval between data points.
  2. Augmentor Initialization: The TimeSeriesDataAugmentor is initialized with the data manager instance and the chosen augmentation model. The augmentation model is created based on the selected method (GMC, GMM, TC).

Data Augmentation

  1. Active Power Data Retrieval: The get_active_power_data method of the ActivePowerDataManager retrieves and preprocesses active power data from the dataset.
  2. Model Fitting: The _create_augmentation_model method fits the chosen augmentation model to the data. For GMC, it first fits GMMs to the data, uses their CDFs to transform the data into pseudo-observations, and then fits a copula to the transformed data.
  3. Data Generation: The augment_data method generates synthetic data from the fitted augmentation model for the requested number of nodes and days. It samples pseudo-observations in [0, 1] and maps them back to power values through the inverse GMM CDF.
  4. Data Formatting: The generated data is formatted into a DataFrame with timestamps and node indices.
  5. Data Saving: The save_augmented_data method saves the augmented data to a CSV file.
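
The fitting and generation steps above can be sketched for two time steps using only the Python standard library. This is an illustrative sketch, not the framework's code: it assumes the marginal GMMs are already fitted (the parameters below are made up), it assumes a Gaussian copula because the page does not say which copula family is used, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gmm_cdf(gmm, x):
    """CDF of a 1-D Gaussian mixture given as a list of (weight, mean, std)."""
    return sum(w * NormalDist(mu, sd).cdf(x) for w, mu, sd in gmm)


def inverse_gmm_cdf(gmm, u, tol=1e-9):
    """Invert the mixture CDF by bisection (it has no closed-form inverse)."""
    lo = min(mu - 10 * sd for _, mu, sd in gmm)
    hi = max(mu + 10 * sd for _, mu, sd in gmm)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gmm_cdf(gmm, mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def to_normal_score(gmm, x):
    """Observation -> pseudo-observation in (0, 1) -> standard normal score."""
    u = min(max(gmm_cdf(gmm, x), 1e-9), 1 - 1e-9)
    return STD_NORMAL.inv_cdf(u)


def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def sample_pair(gmms, rho, rng):
    """Draw one correlated pair from a Gaussian copula with GMM marginals."""
    z1 = rng.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    return [inverse_gmm_cdf(g, STD_NORMAL.cdf(z)) for g, z in zip(gmms, (z1, z2))]


# Made-up marginal GMMs for two consecutive time steps.
gmms = [
    [(0.6, 10.0, 1.0), (0.4, 15.0, 2.0)],
    [(0.5, 11.0, 1.5), (0.5, 17.0, 2.0)],
]

# Stand-in for historical data, generated here with a known correlation of 0.8.
rng = random.Random(0)
history = [sample_pair(gmms, 0.8, rng) for _ in range(500)]

# Copula fitting: transform each column to normal scores, estimate correlation.
scores = [[to_normal_score(g, row[i]) for row in history] for i, g in enumerate(gmms)]
rho_hat = correlation(scores[0], scores[1])

# Generation: sample from the fitted copula and map back through the marginals.
synthetic = [sample_pair(gmms, rho_hat, rng) for _ in range(200)]
print(round(rho_hat, 3), synthetic[0])
```

Separating the marginals (GMMs) from the dependence structure (copula) lets each time step keep its own multimodal distribution while the correlation between time steps is preserved in the generated profiles.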

Summary

The data augmentation model in the RL-ADN framework uses Gaussian mixture models and copulas to generate synthetic time-series data that captures the stochastic nature of load in the power system. With the class structure and workflow described above, users can apply and customize the data augmentation process to improve the robustness and generalizability of their DRL agents.

For detailed examples and further customization options, refer to the full documentation and example notebooks provided with the framework.