Parameter usage


Usage of neorl.make()

NeoRL follows the OpenAI Gym API, allowing users to create an environment via neorl.make().

The parameters of neorl.make() are listed below:

| param | type | description |
| --- | --- | --- |
| task | str | The task name you want to create. A full list of tasks is available here. |
| reward_func | func | A customized reward function; provide it if you want to compute the reward yourself instead of using the built-in reward of the dataset. |
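
A minimal call looks like the snippet below, which creates an env with the built-in reward of its dataset ("citylearn" is one of the available tasks).

import neorl

env = neorl.make("citylearn")  # no reward_func, so the built-in data reward is used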

The following code segment shows how to use neorl.make() with a customized reward function.

Example

import neorl

def customized_reward_func(data):
    # data is a transition (or a batch of transitions) with keys "obs", "action" and "next_obs"
    obs = data["obs"]
    action = data["action"]
    obs_next = data["next_obs"]

    # treat a single 1-D transition as a batch of size one
    single_reward = False
    if len(obs.shape) == 1:
        single_reward = True
        obs = obs.reshape(1, -1)
    if len(action.shape) == 1:
        action = action.reshape(1, -1)
    if len(obs_next.shape) == 1:
        obs_next = obs_next.reshape(1, -1)

    # weights of the fatigue and consumption terms in the cost
    CRF = 3.0
    CRC = 1.0

    # fatigue and consumption are the last two dimensions of the next observation
    fatigue = obs_next[:, -2]
    consumption = obs_next[:, -1]

    cost = CRF * fatigue + CRC * consumption

    # the reward is the negative weighted cost
    reward = -cost

    if single_reward:
        # a single transition returns a scalar
        reward = reward[0].item()
    else:
        # a batch returns a column vector of rewards
        reward = reward.reshape(-1, 1)

    return reward

env = neorl.make("ib", reward_func=customized_reward_func)  # create the industrial benchmark env

Usage of get_dataset()

The parameters of get_dataset() are listed below:

| param | type | description |
| --- | --- | --- |
| task_name_version | str | The name and version (if applicable) of the task; defaults to the task used when making the env. |
| data_type | str | The type of policy used to collect the data. One of ["high", "medium", "low"]; defaults to "high". |
| train_num | int | The number of training trajectories. Note that it should be less than 10,000; defaults to 100. |
| need_val | bool | Whether to also download validation data; defaults to True. |
| val_ratio | float | The ratio of validation data to training data; defaults to 0.1. |
| path | str | The directory to load data from or download data to; defaults to ./data/. |
| use_data_reward | bool | Whether to use the built-in data reward. If False, a customized reward function should be provided when making the env. |
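
For example, calling get_dataset() with all defaults loads 100 training trajectories collected by the "high" policy plus 10 validation trajectories (val_ratio 0.1), downloading them into ./data/ if necessary:

import neorl

env = neorl.make("citylearn")
train_data, val_data = env.get_dataset()  # data_type="high", train_num=100, need_val=True, val_ratio=0.1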

Note that task_name_version defaults to the task used when making the env. For instance, env = neorl.make("citylearn") binds citylearn to both the env and its dataset, so env.get_dataset() will obtain citylearn data by default. For flexibility, task_name_version can be set to another task, since some users only intend to obtain data through an existing env instead of creating a new one (see the second example below).

When get_dataset() is called, it first looks in the local path for an appropriate dataset ("appropriate" means the data type matches the target and the number of trajectories is not less than the target's). MD5 checksums are used to ensure the dataset is complete and correct. If no suitable local dataset is found, the smallest appropriate dataset is downloaded from the remote server into path, according to the local data_map.json.
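
This lookup can be pictured roughly as below. It is only an illustrative sketch of the behavior, not NeoRL's actual implementation; find_local_dataset, md5_matches, and download_smallest_match are hypothetical helper names.

def resolve_dataset(path, data_type, train_num):
    # 1. look for a local dataset whose policy type matches and whose number of
    #    trajectories is at least train_num (an "appropriate" dataset)
    local = find_local_dataset(path, data_type, min_traj=train_num)  # hypothetical helper
    # 2. verify completeness and correctness with an MD5 checksum
    if local is not None and md5_matches(local):  # hypothetical helper
        return local
    # 3. otherwise download the smallest appropriate dataset listed in data_map.json
    return download_smallest_match(path, data_type, train_num)  # hypothetical helper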

Example

import neorl

env = neorl.make("finance")
train_data, val_data = env.get_dataset(data_type="medium", train_num=100, need_val=True, val_ratio=0.2, use_data_reward=True)

It will load 100 trajectories for train_data and 20 trajectories for val_data (val_ratio=0.2), both collected by the "medium" policy and using the built-in data reward.

import neorl

env = neorl.make("citylearn")
train_data, _ = env.get_dataset(task_name_version="HalfCheetah-v3", data_type="low", train_num=50, need_val=False, use_data_reward=True)

It will load 50 trajectories of HalfCheetah-v3 data for train_data without val_data, using the "low" policy and the built-in data reward.
