Trainers
On-Policy Trainer
On-Policy Trainer Class
Trainer class for all the on-policy agents: A2C, PPO1 and VPG.
- genrl.trainers.OnPolicyTrainer.agent (object): Agent algorithm object.
- genrl.trainers.OnPolicyTrainer.env (object): Environment.
- genrl.trainers.OnPolicyTrainer.log_mode (list of str): List of different kinds of logging. Supported: [“csv”, “stdout”, “tensorboard”]
- genrl.trainers.OnPolicyTrainer.log_key (str): Key plotted on the x-axis. Supported: [“timestep”, “episode”]
- genrl.trainers.OnPolicyTrainer.log_interval (int): Timesteps between successive logging of parameters to the console.
- genrl.trainers.OnPolicyTrainer.logdir (str): Directory where log files should be saved.
- genrl.trainers.OnPolicyTrainer.epochs (int): Total number of epochs to train for.
- genrl.trainers.OnPolicyTrainer.max_timesteps (int): Maximum number of timesteps to train for.
- genrl.trainers.OnPolicyTrainer.off_policy (bool): True if the agent is an off-policy agent, False if it is on-policy.
- genrl.trainers.OnPolicyTrainer.save_interval (int): Timesteps between successive saves of the agent’s important hyperparameters.
- genrl.trainers.OnPolicyTrainer.save_model (str): Directory where checkpoints of the agent’s parameters should be saved.
- genrl.trainers.OnPolicyTrainer.run_num (int): Run number assigned to the saved parameters.
- genrl.trainers.OnPolicyTrainer.load_model (str): File to load a saved parameter checkpoint from.
- genrl.trainers.OnPolicyTrainer.render (bool): True if the environment should be rendered during training, else False.
- genrl.trainers.OnPolicyTrainer.evaluate_episodes (int): Number of episodes to evaluate for.
- genrl.trainers.OnPolicyTrainer.seed (int): Random seed for reproducibility.
- genrl.trainers.OnPolicyTrainer.n_envs: Number of environments.
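On-policy agents learn only from data gathered by their current policy, so each epoch collects a fresh rollout, performs one update, and then discards that rollout. The sketch below illustrates that cycle with a hypothetical `ToyEnv`, a hypothetical `RandomAgent`, and a hypothetical `train_on_policy` function; none of these are GenRL APIs. For brevity, `log_every` counts epochs, whereas the `log_interval` attribute above is described in timesteps.

```python
import random
from typing import List, Tuple

Transition = Tuple[float, int, float]  # (state, action, reward)


class ToyEnv:
    """Hypothetical 1-D env: reward 1 for action 1 when the state is positive, or action 0 otherwise."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.state = 0.0

    def reset(self) -> float:
        self.state = self.rng.uniform(-1.0, 1.0)
        return self.state

    def step(self, action: int) -> Tuple[float, float, bool]:
        reward = 1.0 if (action == 1) == (self.state > 0) else 0.0
        self.state = self.rng.uniform(-1.0, 1.0)
        return self.state, reward, False  # never terminates, for simplicity


class RandomAgent:
    """Hypothetical agent: acts uniformly at random and counts its updates."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.updates = 0
        self.samples_seen = 0

    def select_action(self, state: float) -> int:
        return self.rng.randint(0, 1)

    def update(self, rollout: List[Transition]) -> None:
        # A real on-policy agent (e.g. VPG) would compute a policy gradient here
        self.updates += 1
        self.samples_seen += len(rollout)


def train_on_policy(agent, env, epochs: int, rollout_size: int, log_every: int):
    logs = []
    state = env.reset()
    for epoch in range(1, epochs + 1):
        rollout: List[Transition] = []  # fresh each epoch: on-policy data is not reused
        for _ in range(rollout_size):
            action = agent.select_action(state)
            next_state, reward, done = env.step(action)
            rollout.append((state, action, reward))
            state = env.reset() if done else next_state
        agent.update(rollout)  # one update per rollout, then the rollout is dropped
        if epoch % log_every == 0:
            mean_reward = sum(r for _, _, r in rollout) / len(rollout)
            logs.append((epoch, mean_reward))
    return logs


agent, env = RandomAgent(), ToyEnv()
logs = train_on_policy(agent, env, epochs=10, rollout_size=32, log_every=5)
```

Because the rollout list is recreated every epoch, the agent never sees stale transitions. This is the main structural difference from the replay-buffer loop that the off-policy trainer uses.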
Off-Policy Trainer
Off-Policy Trainer Class
Trainer class for all the off-policy agents: DQN (all variants), DDPG, TD3 and SAC.
- genrl.trainers.OffPolicyTrainer.agent (object): Agent algorithm object.
- genrl.trainers.OffPolicyTrainer.env (object): Environment.
- genrl.trainers.OffPolicyTrainer.buffer (object): Replay buffer object.
- genrl.trainers.OffPolicyTrainer.max_ep_len (int): Maximum episode length for training.
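An off-policy trainer stores transitions in a replay buffer and later samples minibatches from it, so old experience can be reused. The hypothetical `ReplayBuffer` below is a minimal FIFO sketch of that idea using `collections.deque`. It is not GenRL's buffer class, and its method names (`push`, `sample`) are assumptions made for illustration.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[float, int, float, float, bool]


class ReplayBuffer:
    """Minimal FIFO replay buffer; a hypothetical stand-in, not GenRL's buffer class."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.memory: Deque[Transition] = deque(maxlen=capacity)  # oldest entries are evicted first
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement from the stored transitions
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.push((float(i), 0, 0.0, float(i + 1), False))
batch = buffer.sample(2)
```

With a capacity of 3, pushing five transitions keeps only the last three. This bounded memory is what lets long off-policy runs reuse recent experience without unbounded growth.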
- genrl.trainers.OffPolicyTrainer.max_timesteps (int): Maximum number of timesteps to train for.
- genrl.trainers.OffPolicyTrainer.warmup_steps (int): Number of initial warmup steps, during which random actions are taken to add randomness to training.
- genrl.trainers.OffPolicyTrainer.start_update (int): Timestep after which the agent’s networks start updating.
- genrl.trainers.OffPolicyTrainer.update_interval (int): Timesteps between target network updates.
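The three scheduling attributes `warmup_steps`, `start_update` and `update_interval` interact as follows. Actions are random until warmup ends, and updates begin at `start_update` and then recur periodically. The hypothetical `off_policy_schedule` function below is a sketch under the assumption that an update fires whenever `t >= start_update` and `t % update_interval == 0`. GenRL's exact condition may differ.

```python
from typing import List, Tuple


def off_policy_schedule(
    max_timesteps: int, warmup_steps: int, start_update: int, update_interval: int
) -> List[Tuple[int, str, bool]]:
    """Return (timestep, action_source, update_now) for each timestep."""
    schedule = []
    for t in range(max_timesteps):
        # During warmup, actions are sampled at random instead of from the policy
        action_source = "random" if t < warmup_steps else "policy"
        # Updates begin at start_update and then fire every update_interval steps
        update_now = t >= start_update and t % update_interval == 0
        schedule.append((t, action_source, update_now))
    return schedule


schedule = off_policy_schedule(max_timesteps=10, warmup_steps=3, start_update=4, update_interval=2)
```

With these values, timesteps 0 to 2 use random actions and updates happen at timesteps 4, 6 and 8. Setting `start_update` after `warmup_steps` ensures the buffer already holds some diverse transitions before the first update.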
- genrl.trainers.OffPolicyTrainer.log_mode (list of str): List of different kinds of logging. Supported: [“csv”, “stdout”, “tensorboard”]
- genrl.trainers.OffPolicyTrainer.log_key (str): Key plotted on the x-axis. Supported: [“timestep”, “episode”]
- genrl.trainers.OffPolicyTrainer.log_interval (int): Timesteps between successive logging of parameters to the console.
- genrl.trainers.OffPolicyTrainer.logdir (str): Directory where log files should be saved.
- genrl.trainers.OffPolicyTrainer.epochs (int): Total number of epochs to train for.
- genrl.trainers.OffPolicyTrainer.off_policy (bool): True if the agent is an off-policy agent, False if it is on-policy.
- genrl.trainers.OffPolicyTrainer.save_interval (int): Timesteps between successive saves of the agent’s important hyperparameters.
- genrl.trainers.OffPolicyTrainer.save_model (str): Directory where checkpoints of the agent’s parameters should be saved.
- genrl.trainers.OffPolicyTrainer.run_num (int): Run number assigned to the saved parameters.
- genrl.trainers.OffPolicyTrainer.load_model (str): File to load a saved parameter checkpoint from.
- genrl.trainers.OffPolicyTrainer.render (bool): True if the environment should be rendered during training, else False.
- genrl.trainers.OffPolicyTrainer.evaluate_episodes (int): Number of episodes to evaluate for.
- genrl.trainers.OffPolicyTrainer.seed (int): Random seed for reproducibility.
- genrl.trainers.OffPolicyTrainer.n_envs: Number of environments.
Classical Trainer
Global trainer class for classical RL algorithms.
Parameter | Type | Description
---|---|---
agent | object | Algorithm object to train
env | Gym environment | Standard gym environment to train on
mode | str | Mode of value function update: [‘learn’, ‘plan’, ‘dyna’]
model | str | Model to use for planning: [‘tabular’]
n_episodes | int | Number of training episodes
plan_n_steps | int | Number of planning steps per environment interaction
start_steps | int | Number of initial exploration timesteps
seed | int | Seed for the random number generator
render | bool | Whether to render the gym environment
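The three `mode` values map onto the classic Dyna-Q family. In `'learn'` mode the value function is updated directly from real transitions. In `'plan'` mode a `'tabular'` model of observed transitions is updated, and the value function is then updated from `plan_n_steps` simulated transitions per real step. In `'dyna'` mode both happen. The sketch below is a self-contained tabular Q-learning version of that reading, using a hypothetical `ChainEnv` and a hypothetical `train_classical` function. It is not GenRL's `ClassicalTrainer` code, and the hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple


class ChainEnv:
    """Hypothetical deterministic chain: start in state 0, reward 1 on reaching the last state."""

    def __init__(self, n_states: int = 5) -> None:
        self.n_states = n_states
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 moves right, action 0 moves left (clamped at state 0)
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


def q_update(q, s, a, r, s2, done, alpha, gamma):
    # One-step Q-learning backup
    target = r if done else r + gamma * max(q[(s2, 0)], q[(s2, 1)])
    q[(s, a)] += alpha * (target - q[(s, a)])


def train_classical(env, mode="dyna", n_episodes=50, plan_n_steps=5, start_steps=0,
                    seed=0, alpha=0.5, gamma=0.9, epsilon=0.1, max_ep_steps=100):
    if mode not in ("learn", "plan", "dyna"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    q: DefaultDict[Tuple[int, int], float] = defaultdict(float)
    model: Dict[Tuple[int, int], Tuple[float, int, bool]] = {}  # tabular model: (s, a) -> (r, s2, done)
    steps = 0
    for _ in range(n_episodes):
        s = env.reset()
        for _ in range(max_ep_steps):
            if steps < start_steps or rng.random() < epsilon:
                a = rng.randint(0, 1)  # exploratory action
            else:
                best = max(q[(s, 0)], q[(s, 1)])
                a = rng.choice([x for x in (0, 1) if q[(s, x)] == best])  # greedy, random tie-break
            s2, r, done = env.step(a)
            steps += 1
            if mode in ("learn", "dyna"):
                q_update(q, s, a, r, s2, done, alpha, gamma)  # learn from real experience
            if mode in ("plan", "dyna"):
                model[(s, a)] = (r, s2, done)
                for _ in range(plan_n_steps):  # learn from simulated experience
                    ps, pa = rng.choice(list(model))
                    pr, ps2, pdone = model[(ps, pa)]
                    q_update(q, ps, pa, pr, ps2, pdone, alpha, gamma)
            s = s2
            if done:
                break
    return q


q = train_classical(ChainEnv(n_states=5), mode="dyna", n_episodes=50, plan_n_steps=5)
```

After training, the greedy action in every non-terminal state is "move right". Planning with the tabular model propagates the terminal reward back along the chain in far fewer real steps than `'learn'` mode alone.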
Deep Contextual Bandit Trainer
Bandit Trainer Class
Parameter | Type | Description
---|---|---
agent | genrl.deep.bandit.dcb_agents.DCBAgent | Agent to train.
bandit | genrl.deep.bandit.data_bandits.DataBasedBandit | Bandit to train agent on.
logdir | str | Path to directory to store logs in.
log_mode | List[str] | List of modes for logging.
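A contextual bandit trainer repeats a short cycle: observe a context, let the agent pick an arm, receive that arm's reward, update the agent, and track regret against the best arm. The sketch below shows that loop with a tabular epsilon-greedy agent standing in for a deep agent. `ToyContextualBandit`, `EpsGreedyContextualAgent` and `train_bandit` are hypothetical names, not GenRL's `DCBAgent` or `DataBasedBandit` interfaces.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class ToyContextualBandit:
    """Hypothetical bandit: for context c, only arm c % n_arms pays reward 1."""

    def __init__(self, n_contexts: int = 3, n_arms: int = 3, seed: int = 0) -> None:
        self.n_contexts, self.n_arms = n_contexts, n_arms
        self.rng = random.Random(seed)
        self.context = 0

    def reset(self) -> int:
        self.context = self.rng.randrange(self.n_contexts)  # draw the next context
        return self.context

    def step(self, action: int) -> Tuple[float, float]:
        reward = 1.0 if action == self.context % self.n_arms else 0.0
        return reward, 1.0  # (reward received, best achievable reward)


class EpsGreedyContextualAgent:
    """Hypothetical tabular agent keeping a running mean reward per (context, arm)."""

    def __init__(self, n_arms: int, epsilon: float = 0.1, seed: int = 0) -> None:
        self.n_arms, self.epsilon = n_arms, epsilon
        self.rng = random.Random(seed)
        self.counts: DefaultDict[Tuple[int, int], int] = defaultdict(int)
        self.values: DefaultDict[Tuple[int, int], float] = defaultdict(float)

    def select_action(self, context: int) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)  # explore
        return max(range(self.n_arms), key=lambda a: self.values[(context, a)])  # exploit

    def update(self, context: int, action: int, reward: float) -> None:
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]  # incremental mean


def train_bandit(agent, bandit, timesteps: int) -> List[float]:
    """Run the agent and return the cumulative regret after every step."""
    regret, history = 0.0, []
    for _ in range(timesteps):
        context = bandit.reset()
        action = agent.select_action(context)
        reward, best = bandit.step(action)
        agent.update(context, action, reward)
        regret += best - reward
        history.append(regret)
    return history


agent = EpsGreedyContextualAgent(n_arms=3)
regret = train_bandit(agent, ToyContextualBandit(), timesteps=2000)
```

Cumulative regret is non-decreasing by construction. A learning agent makes it grow much more slowly than a uniformly random policy, whose expected regret here would be about two thirds of the timesteps.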
Multi-Armed Bandit Trainer
Bandit Trainer Class
Parameter | Type | Description
---|---|---
agent | genrl.deep.bandit.dcb_agents.DCBAgent | Agent to train.
bandit | genrl.deep.bandit.data_bandits.DataBasedBandit | Bandit to train agent on.
logdir | str | Path to directory to store logs in.
log_mode | List[str] | List of modes for logging.
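In the multi-armed setting there is no context: the trainer simply asks the agent for an arm, pulls it, and feeds the reward back. The sketch below uses UCB1 on Bernoulli arms to illustrate that loop. `BernoulliBandit`, `UCB1Agent` and `train_mab` are hypothetical names chosen for this example, and UCB1 is one standard policy picked here for illustration, not necessarily what a given GenRL agent implements.

```python
import math
import random
from typing import List


class BernoulliBandit:
    """Hypothetical k-armed bandit: arm i pays 1 with probability probs[i]."""

    def __init__(self, probs: List[float], seed: int = 0) -> None:
        self.probs = probs
        self.rng = random.Random(seed)

    def step(self, arm: int) -> float:
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


class UCB1Agent:
    """UCB1: pick the arm with the highest mean plus confidence bonus."""

    def __init__(self, n_arms: int) -> None:
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select_action(self) -> int:
        self.t += 1
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm  # play every arm once before using confidence bounds
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]  # incremental mean


def train_mab(agent, bandit, timesteps: int) -> float:
    """Run the select/pull/update loop and return the total reward collected."""
    total = 0.0
    for _ in range(timesteps):
        arm = agent.select_action()
        reward = bandit.step(arm)
        agent.update(arm, reward)
        total += reward
    return total


agent = UCB1Agent(n_arms=3)
total = train_mab(agent, BernoulliBandit([0.2, 0.5, 0.8]), timesteps=2000)
```

The confidence bonus shrinks as an arm is pulled more often, so pulls concentrate on the best arm (index 2 here) while the others are still sampled occasionally.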
Base Trainer
Base Trainer Class
To be inherited for specific use cases.
- genrl.trainers.Trainer.agent (object): Agent algorithm object.
- genrl.trainers.Trainer.env (object): Environment.
- genrl.trainers.Trainer.log_mode (list of str): List of different kinds of logging. Supported: [“csv”, “stdout”, “tensorboard”]
- genrl.trainers.Trainer.log_key (str): Key plotted on the x-axis. Supported: [“timestep”, “episode”]
- genrl.trainers.Trainer.log_interval (int): Timesteps between successive logging of parameters to the console.
- genrl.trainers.Trainer.logdir (str): Directory where log files should be saved.
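The logging attributes work together in a specific way. `log_mode` selects the outputs, `log_key` chooses what goes on the x-axis, and `log_interval` controls how often a row is recorded. The sketch below validates these settings against the supported values listed above and writes CSV rows with the standard `csv` module. `validate_logging`, `collect_logs` and `rows_to_csv` are hypothetical helpers. The fixed `episode_len` used to derive episode indices is also an assumption made only for this illustration.

```python
import csv
import io
from typing import Dict, List

SUPPORTED_LOG_MODES = {"csv", "stdout", "tensorboard"}
SUPPORTED_LOG_KEYS = {"timestep", "episode"}


def validate_logging(log_mode: List[str], log_key: str) -> None:
    """Reject log settings outside the supported values."""
    unknown = set(log_mode) - SUPPORTED_LOG_MODES
    if unknown:
        raise ValueError(f"unsupported log_mode entries: {sorted(unknown)}")
    if log_key not in SUPPORTED_LOG_KEYS:
        raise ValueError(f"log_key must be one of {sorted(SUPPORTED_LOG_KEYS)}")


def collect_logs(rewards: List[float], log_key: str, log_interval: int,
                 episode_len: int) -> List[Dict[str, float]]:
    """Keep one row every log_interval timesteps, indexed by log_key."""
    rows = []
    for timestep, reward in enumerate(rewards, start=1):
        if timestep % log_interval == 0:
            episode = (timestep - 1) // episode_len  # assumes fixed-length episodes
            x_value = timestep if log_key == "timestep" else episode
            rows.append({log_key: x_value, "reward": reward})
    return rows


def rows_to_csv(rows: List[Dict[str, float]], log_key: str) -> str:
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[log_key, "reward"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


validate_logging(["csv", "stdout"], "timestep")
rows = collect_logs([1.0] * 10, log_key="timestep", log_interval=5, episode_len=4)
text = rows_to_csv(rows, "timestep")
```

Validating `log_mode` and `log_key` up front means a typo such as `"tensorbord"` fails immediately instead of silently producing no logs after a long run.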
- genrl.trainers.Trainer.epochs (int): Total number of epochs to train for.
- genrl.trainers.Trainer.max_timesteps (int): Maximum number of timesteps to train for.
- genrl.trainers.Trainer.off_policy (bool): True if the agent is an off-policy agent, False if it is on-policy.
- genrl.trainers.Trainer.save_interval (int): Timesteps between successive saves of the agent’s important hyperparameters.
- genrl.trainers.Trainer.save_model (str): Directory where checkpoints of the agent’s parameters should be saved.
- genrl.trainers.Trainer.run_num (int): Run number assigned to the saved parameters.
- genrl.trainers.Trainer.load_weights (str): File to load model weights from.
- genrl.trainers.Trainer.load_hyperparams (str): File to load hyperparameters from.
- genrl.trainers.Trainer.render (bool): True if the environment should be rendered during training, else False.
- genrl.trainers.Trainer.evaluate_episodes (int): Number of episodes to evaluate for.
- genrl.trainers.Trainer.seed (int): Random seed for reproducibility.
- genrl.trainers.Trainer.n_envs: Number of environments.
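Since the base trainer is meant to be inherited, a natural structure is an abstract base class that holds shared settings and helpers, with each subclass supplying its own training loop. The sketch below uses Python's `abc` module to show that pattern together with a periodic-save helper driven by `save_interval` and `run_num`. `SketchTrainer`, `LoopTrainer`, `maybe_save` and the `run{n}-ts{t}` checkpoint naming scheme are all hypothetical and do not describe GenRL's internals.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class SketchTrainer(ABC):
    """Hypothetical base trainer mirroring a few of the attributes documented above."""

    def __init__(self, agent: Any, env: Any, max_timesteps: int = 1000,
                 save_interval: int = 0, run_num: int = 0, seed: Optional[int] = None) -> None:
        self.agent, self.env = agent, env
        self.max_timesteps = max_timesteps
        self.save_interval = save_interval
        self.run_num = run_num
        self.checkpoints: List[str] = []
        if seed is not None:
            random.seed(seed)  # seed Python's global RNG for reproducibility

    def maybe_save(self, timestep: int) -> None:
        # Record a checkpoint every save_interval timesteps (0 disables saving)
        if self.save_interval and timestep % self.save_interval == 0:
            self.checkpoints.append(f"run{self.run_num}-ts{timestep}")  # hypothetical naming scheme

    @abstractmethod
    def train(self) -> None:
        """Subclasses (on-policy, off-policy, ...) implement their own loop."""


class LoopTrainer(SketchTrainer):
    def train(self) -> None:
        for timestep in range(1, self.max_timesteps + 1):
            # A real trainer would step the environment and update the agent here
            self.maybe_save(timestep)


trainer = LoopTrainer(agent=None, env=None, max_timesteps=10, save_interval=4, run_num=0, seed=0)
trainer.train()
```

Marking `train` as abstract makes it impossible to instantiate the base class directly. Concrete trainers then differ only in their loops and reuse shared behaviour such as checkpointing.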