Trainers

On-Policy Trainer

On-Policy Trainer Class

Trainer class for all on-policy agents: A2C, PPO1 and VPG

genrl.trainers.OnPolicyTrainer.agent

Agent algorithm object

Type:object
genrl.trainers.OnPolicyTrainer.env

Environment

Type:object
genrl.trainers.OnPolicyTrainer.log_mode

List of different kinds of logging. Supported: [“csv”, “stdout”, “tensorboard”]

Type:list of str
genrl.trainers.OnPolicyTrainer.log_key

Key plotted on the x-axis. Supported: [“timestep”, “episode”]

Type:str
genrl.trainers.OnPolicyTrainer.log_interval

Number of timesteps between successive logs of training parameters to the console

Type:int
genrl.trainers.OnPolicyTrainer.logdir

Directory where log files should be saved.

Type:str
genrl.trainers.OnPolicyTrainer.epochs

Total number of epochs to train for

Type:int
genrl.trainers.OnPolicyTrainer.max_timesteps

Maximum number of timesteps to train for

Type:int
genrl.trainers.OnPolicyTrainer.off_policy

True if the agent is an off-policy agent, False if it is on-policy

Type:bool
genrl.trainers.OnPolicyTrainer.save_interval

Timesteps between successive saves of the agent’s important hyperparameters

Type:int
genrl.trainers.OnPolicyTrainer.save_model

Directory where the checkpoints of agent parameters should be saved

Type:str
genrl.trainers.OnPolicyTrainer.run_num

Run number assigned to the saved parameters

Type:int
genrl.trainers.OnPolicyTrainer.load_model

File from which to load a saved parameter checkpoint

Type:str
genrl.trainers.OnPolicyTrainer.render

True if the environment should be rendered during training, else False

Type:bool
genrl.trainers.OnPolicyTrainer.evaluate_episodes

Number of episodes to evaluate for

Type:int
genrl.trainers.OnPolicyTrainer.seed

Random seed, set for reproducibility

Type:int
genrl.trainers.OnPolicyTrainer.n_envs

Number of environments

Type:int
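The attributes above act as the trainer's configuration. As a minimal sketch (not genrl's actual class), the snippet below groups a subset of them into a hypothetical `TrainerConfigSketch` dataclass and validates `log_mode` and `log_key` against the supported values listed above. The class name and the positivity checks are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Supported values, as listed in the attribute descriptions above.
SUPPORTED_LOG_MODES = {"csv", "stdout", "tensorboard"}
SUPPORTED_LOG_KEYS = {"timestep", "episode"}


@dataclass
class TrainerConfigSketch:
    """Hypothetical container for a subset of the trainer attributes."""

    log_mode: List[str]
    log_key: str
    log_interval: int
    epochs: int
    max_timesteps: int
    seed: int
    n_envs: int

    def __post_init__(self) -> None:
        unknown = set(self.log_mode) - SUPPORTED_LOG_MODES
        if unknown:
            raise ValueError(f"Unsupported log_mode entries: {sorted(unknown)}")
        if self.log_key not in SUPPORTED_LOG_KEYS:
            raise ValueError(f"Unsupported log_key: {self.log_key!r}")
        # Counts and intervals only make sense as positive integers (assumption).
        for name in ("log_interval", "epochs", "max_timesteps", "n_envs"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")


# Example: log to stdout and CSV, plotting against episodes.
config = TrainerConfigSketch(
    log_mode=["stdout", "csv"], log_key="episode", log_interval=10,
    epochs=100, max_timesteps=100_000, seed=0, n_envs=4,
)
```

Validating in `__post_init__` makes a typo such as `log_key="timesteps"` fail at construction time rather than partway through training.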

Off-Policy Trainer

Off-Policy Trainer Class

Trainer class for all off-policy agents: DQN (all variants), DDPG, TD3 and SAC

genrl.trainers.OffPolicyTrainer.agent

Agent algorithm object

Type:object
genrl.trainers.OffPolicyTrainer.env

Environment

Type:object
genrl.trainers.OffPolicyTrainer.buffer

Replay buffer object

Type:object
genrl.trainers.OffPolicyTrainer.max_ep_len

Maximum episode length during training

Type:int
genrl.trainers.OffPolicyTrainer.max_timesteps

Maximum number of timesteps to train for

Type:int
genrl.trainers.OffPolicyTrainer.warmup_steps

Number of warmup steps, during which random actions are taken to add randomness to training

Type:int
genrl.trainers.OffPolicyTrainer.start_update

Timesteps after which the agent networks should start updating

Type:int
genrl.trainers.OffPolicyTrainer.update_interval

Timesteps between target network updates

Type:int
genrl.trainers.OffPolicyTrainer.log_mode

List of different kinds of logging. Supported: [“csv”, “stdout”, “tensorboard”]

Type:list of str
genrl.trainers.OffPolicyTrainer.log_key

Key plotted on the x-axis. Supported: [“timestep”, “episode”]

Type:str
genrl.trainers.OffPolicyTrainer.log_interval

Number of timesteps between successive logs of training parameters to the console

Type:int
genrl.trainers.OffPolicyTrainer.logdir

Directory where log files should be saved.

Type:str
genrl.trainers.OffPolicyTrainer.epochs

Total number of epochs to train for

Type:int
genrl.trainers.OffPolicyTrainer.off_policy

True if the agent is an off-policy agent, False if it is on-policy

Type:bool
genrl.trainers.OffPolicyTrainer.save_interval

Timesteps between successive saves of the agent’s important hyperparameters

Type:int
genrl.trainers.OffPolicyTrainer.save_model

Directory where the checkpoints of agent parameters should be saved

Type:str
genrl.trainers.OffPolicyTrainer.run_num

Run number assigned to the saved parameters

Type:int
genrl.trainers.OffPolicyTrainer.load_model

File from which to load a saved parameter checkpoint

Type:str
genrl.trainers.OffPolicyTrainer.render

True if the environment should be rendered during training, else False

Type:bool
genrl.trainers.OffPolicyTrainer.evaluate_episodes

Number of episodes to evaluate for

Type:int
genrl.trainers.OffPolicyTrainer.seed

Random seed, set for reproducibility

Type:int
genrl.trainers.OffPolicyTrainer.n_envs

Number of environments

Type:int
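The `warmup_steps`, `start_update` and `update_interval` attributes together decide, at each timestep, whether the agent acts randomly and whether a network update is triggered. The hypothetical helper below is a minimal sketch of that gating under stated assumptions: timesteps are 0-indexed, random actions are taken strictly before `warmup_steps`, and updates fire every `update_interval` steps starting at `start_update`. It is not genrl's implementation, and it only reports when an update would happen rather than performing one.

```python
from typing import Tuple


def off_policy_schedule(
    timestep: int, warmup_steps: int, start_update: int, update_interval: int
) -> Tuple[bool, bool]:
    """Return (take_random_action, trigger_update) for a 0-indexed timestep.

    Hypothetical helper illustrating how the trainer attributes could gate
    an off-policy training loop.
    """
    if update_interval <= 0:
        raise ValueError("update_interval must be positive")
    # During warmup, random actions fill the replay buffer with diverse data.
    take_random_action = timestep < warmup_steps
    # After start_update, trigger an update every update_interval timesteps.
    trigger_update = (
        timestep >= start_update
        and (timestep - start_update) % update_interval == 0
    )
    return take_random_action, trigger_update


# Which of the first 10 timesteps act randomly, and which trigger updates?
random_steps = [t for t in range(10) if off_policy_schedule(t, 3, 4, 2)[0]]
update_steps = [t for t in range(10) if off_policy_schedule(t, 3, 4, 2)[1]]
# random_steps == [0, 1, 2], update_steps == [4, 6, 8]
```

Keeping `start_update` separate from `warmup_steps` lets the buffer accumulate some transitions before the first gradient step, even if the agent has already switched to its own policy.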

Classical Trainer

Global trainer class for classical RL algorithms

param agent:Algorithm object to train
type agent:object
param env:Standard Gym environment to train on
type env:Gym environment
param mode:Mode of value function update [‘learn’, ‘plan’, ‘dyna’]
type mode:str
param model:Model to use for planning [‘tabular’]
type model:str
param n_episodes:Number of training episodes
type n_episodes:int
param plan_n_steps:Number of planning steps per environment interaction
type plan_n_steps:int
param start_steps:Number of initial exploration timesteps
type start_steps:int
param seed:Seed for the random number generator
type seed:int
param render:Whether to render the Gym environment
type render:bool
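To illustrate what the `mode`, `model`, `plan_n_steps` and `start_steps` parameters control, here is a minimal, self-contained Dyna-Q sketch on a hypothetical six-state chain (not a Gym environment and not genrl's code). The interpretation of the modes is an assumption: `'learn'` applies only direct Q-learning updates from real transitions, `'plan'` applies only updates replayed from a tabular model, and `'dyna'` does both.

```python
import random
from typing import Dict, List, Tuple

N_STATES = 6  # hypothetical chain: states 0..5, goal is state 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def env_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Deterministic toy chain: reward 1.0 and episode end on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train_classical(mode: str = "dyna", n_episodes: int = 50,
                    plan_n_steps: int = 10, start_steps: int = 20,
                    seed: int = 0, alpha: float = 0.5, gamma: float = 0.95,
                    epsilon: float = 0.1) -> List[List[float]]:
    """Tabular Q-learning with optional Dyna-style planning; returns the Q-table."""
    if mode not in ("learn", "plan", "dyna"):
        raise ValueError(f"Unknown mode: {mode!r}")
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    # Tabular model: (state, action) -> (reward, next_state, done)
    model: Dict[Tuple[int, int], Tuple[float, int, bool]] = {}

    def q_update(s: int, a: int, r: float, s2: int, done: bool) -> None:
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    timestep = 0
    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            # Random actions for the first start_steps timesteps, then epsilon-greedy.
            if timestep < start_steps or rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = env_step(state, action)
            timestep += 1
            if mode in ("learn", "dyna"):
                # Direct RL update from the real transition.
                q_update(state, action, reward, next_state, done)
            # Model learning: store the observed (deterministic) outcome.
            model[(state, action)] = (reward, next_state, done)
            if mode in ("plan", "dyna"):
                # Planning: replay plan_n_steps simulated transitions from the model.
                seen = list(model)
                for _ in range(plan_n_steps):
                    s, a = rng.choice(seen)
                    r, s2, d = model[(s, a)]
                    q_update(s, a, r, s2, d)
            state = next_state
    return q


q_table = train_classical(mode="dyna", n_episodes=50, plan_n_steps=10)
```

Each real step in `'dyna'` mode is followed by `plan_n_steps` cheap simulated updates, which is why planning typically propagates the goal reward back along the chain in far fewer real episodes than `'learn'` alone.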

Deep Contextual Bandit Trainer

Bandit Trainer Class

param agent:Agent to train.
type agent:genrl.deep.bandit.dcb_agents.DCBAgent
param bandit:Bandit to train agent on.
type bandit:genrl.deep.bandit.data_bandits.DataBasedBandit
param logdir:Path to directory to store logs in.
type logdir:str
param log_mode:List of modes for logging.
type log_mode:List[str]

Multi-Armed Bandit Trainer

Bandit Trainer Class

param agent:Agent to train.
type agent:genrl.deep.bandit.dcb_agents.DCBAgent
param bandit:Bandit to train agent on.
type bandit:genrl.deep.bandit.data_bandits.DataBasedBandit
param logdir:Path to directory to store logs in.
type logdir:str
param log_mode:List of modes for logging.
type log_mode:List[str]
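A bandit trainer repeatedly asks the agent for an arm, obtains a reward from the bandit, and feeds that reward back to the agent. The sketch below shows that loop with hypothetical `BernoulliBandit` and `EpsilonGreedyAgent` classes and a hypothetical `train_bandit` function; none of these are genrl's agent or bandit classes, and logging (`logdir`, `log_mode`) is omitted.

```python
import random
from typing import List


class BernoulliBandit:
    """Hypothetical bandit: arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs: List[float], seed: int = 0) -> None:
        self.probs = probs
        self.rng = random.Random(seed)

    def step(self, arm: int) -> float:
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


class EpsilonGreedyAgent:
    """Hypothetical agent keeping a running mean reward estimate per arm."""

    def __init__(self, n_arms: int, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select_action(self) -> int:
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        # Incremental mean: value <- value + (reward - value) / count
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def train_bandit(agent: EpsilonGreedyAgent, bandit: BernoulliBandit,
                 timesteps: int) -> List[float]:
    """Minimal trainer loop: select an arm, observe the reward, update the agent."""
    rewards = []
    for _ in range(timesteps):
        arm = agent.select_action()
        reward = bandit.step(arm)
        agent.update(arm, reward)
        rewards.append(reward)
    return rewards


bandit = BernoulliBandit([0.1, 0.2, 0.9], seed=1)
agent = EpsilonGreedyAgent(n_arms=3, epsilon=0.1, seed=2)
rewards = train_bandit(agent, bandit, timesteps=3000)
```

Because the trainer only relies on `select_action` and `update`, any agent exposing those two methods could be swapped in without changing the loop.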

Base Trainer

Base Trainer Class

To be inherited by trainers for specific use cases

genrl.trainers.Trainer.agent

Agent algorithm object

Type:object
genrl.trainers.Trainer.env

Environment

Type:object
genrl.trainers.Trainer.log_mode

List of different kinds of logging. Supported: [“csv”, “stdout”, “tensorboard”]

Type:list of str
genrl.trainers.Trainer.log_key

Key plotted on the x-axis. Supported: [“timestep”, “episode”]

Type:str
genrl.trainers.Trainer.log_interval

Number of timesteps between successive logs of training parameters to the console

Type:int
genrl.trainers.Trainer.logdir

Directory where log files should be saved.

Type:str
genrl.trainers.Trainer.epochs

Total number of epochs to train for

Type:int
genrl.trainers.Trainer.max_timesteps

Maximum number of timesteps to train for

Type:int
genrl.trainers.Trainer.off_policy

True if the agent is an off-policy agent, False if it is on-policy

Type:bool
genrl.trainers.Trainer.save_interval

Timesteps between successive saves of the agent’s important hyperparameters

Type:int
genrl.trainers.Trainer.save_model

Directory where the checkpoints of agent parameters should be saved

Type:str
genrl.trainers.Trainer.run_num

Run number assigned to the saved parameters

Type:int
genrl.trainers.Trainer.load_weights

File to load saved weights from

Type:str
genrl.trainers.Trainer.load_hyperparams

File to load hyperparameters from

Type:str
genrl.trainers.Trainer.render

True if the environment should be rendered during training, else False

Type:bool
genrl.trainers.Trainer.evaluate_episodes

Number of episodes to evaluate for

Type:int
genrl.trainers.Trainer.seed

Random seed, set for reproducibility

Type:int
genrl.trainers.Trainer.n_envs

Number of environments

Type:int
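Since the base trainer is meant to be inherited, shared attributes and helpers live in the base class while each subclass supplies its own training loop. The following is a minimal sketch of that pattern using Python's `abc` module; `BaseTrainerSketch`, `should_log` and `CountingTrainer` are hypothetical names, and the base class holds only a handful of the attributes listed above.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class BaseTrainerSketch(ABC):
    """Hypothetical base class holding attributes shared by all trainers."""

    def __init__(self, agent: Any, env: Any, *, log_interval: int,
                 max_timesteps: int, seed: int) -> None:
        self.agent = agent
        self.env = env
        self.log_interval = log_interval
        self.max_timesteps = max_timesteps
        self.seed = seed

    def should_log(self, timestep: int) -> bool:
        # Shared helper: log every `log_interval` timesteps (1-indexed).
        return timestep % self.log_interval == 0

    @abstractmethod
    def train(self) -> None:
        """Each concrete trainer implements its own training loop."""


class CountingTrainer(BaseTrainerSketch):
    """Toy subclass: steps through timesteps and records where it would log."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.logged: List[int] = []

    def train(self) -> None:
        for timestep in range(1, self.max_timesteps + 1):
            if self.should_log(timestep):
                self.logged.append(timestep)


trainer = CountingTrainer(agent=None, env=None, log_interval=25,
                          max_timesteps=100, seed=0)
trainer.train()
```

Marking `train` as abstract means the base class itself cannot be instantiated, so every concrete trainer is forced to define its own loop.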