Trainers¶
On-Policy Trainer¶
On Policy Trainer Class
Trainer class for all the On Policy Agents: A2C, PPO1 and VPG
-
genrl.trainers.OnPolicyTrainer.agent¶ Agent algorithm object
Type: object
-
genrl.trainers.OnPolicyTrainer.env¶ Environment
Type: object
-
genrl.trainers.OnPolicyTrainer.log_mode¶ List of different kinds of logging. Supported: [“csv”, “stdout”, “tensorboard”]
Type: list of str
-
genrl.trainers.OnPolicyTrainer.log_key¶ Key plotted on the x-axis. Supported: [“timestep”, “episode”]
Type: str
-
genrl.trainers.OnPolicyTrainer.log_interval¶ Timesteps between successive logs of parameters to the console
Type: int
-
genrl.trainers.OnPolicyTrainer.logdir¶ Directory where log files should be saved.
Type: str
-
genrl.trainers.OnPolicyTrainer.epochs¶ Total number of epochs to train for
Type: int
-
genrl.trainers.OnPolicyTrainer.max_timesteps¶ Maximum limit of timesteps to train for
Type: int
-
genrl.trainers.OnPolicyTrainer.off_policy¶ True if the agent is an off policy agent, False if it is on policy
Type: bool
-
genrl.trainers.OnPolicyTrainer.save_interval¶ Timesteps between successive saves of the agent’s important hyperparameters
Type: int
-
genrl.trainers.OnPolicyTrainer.save_model¶ Directory where the checkpoints of agent parameters should be saved
Type: str
-
genrl.trainers.OnPolicyTrainer.run_num¶ A run number allotted to the save of parameters
Type: int
-
genrl.trainers.OnPolicyTrainer.load_model¶ File to load saved parameter checkpoint from
Type: str
-
genrl.trainers.OnPolicyTrainer.render¶ True if environment is to be rendered during training, else False
Type: bool
-
genrl.trainers.OnPolicyTrainer.evaluate_episodes¶ Number of episodes to evaluate for
Type: int
-
genrl.trainers.OnPolicyTrainer.seed¶ Set seed for reproducibility
Type: int
-
genrl.trainers.OnPolicyTrainer.n_envs¶ Number of environments
Type: int
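The interval attributes above (log_interval, save_interval, max_timesteps) are all counted in timesteps. As a rough illustration of how such intervals might interact, the sketch below lists the timesteps at which a log or a save would fire. It is not genrl code: the helper `interval_events` is hypothetical, and it assumes timesteps are counted from 1 and that an event fires on exact multiples of its interval, which the library may handle differently.

```python
def interval_events(max_timesteps, log_interval, save_interval):
    """Return (timestep, should_log, should_save) for every timestep that triggers an event.

    Assumption: timesteps start at 1 and an event fires on exact multiples
    of its interval. This is an illustration, not the library's logic.
    """
    events = []
    for timestep in range(1, max_timesteps + 1):
        should_log = timestep % log_interval == 0
        should_save = timestep % save_interval == 0
        if should_log or should_save:
            events.append((timestep, should_log, should_save))
    return events


events = interval_events(max_timesteps=12, log_interval=4, save_interval=6)
for timestep, should_log, should_save in events:
    print(timestep, "log" if should_log else "-", "save" if should_save else "-")
```

With these values a log is due at timesteps 4, 8 and 12, and a save at 6 and 12, so both coincide only on the final timestep.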
Off-Policy Trainer¶
Off Policy Trainer Class
Trainer class for all the Off Policy Agents: DQN (all variants), DDPG, TD3 and SAC
-
genrl.trainers.OffPolicyTrainer.agent¶ Agent algorithm object
Type: object
-
genrl.trainers.OffPolicyTrainer.env¶ Environment
Type: object
-
genrl.trainers.OffPolicyTrainer.buffer¶ Replay Buffer object
Type: object
-
genrl.trainers.OffPolicyTrainer.max_ep_len¶ Maximum Episode length for training
Type: int
-
genrl.trainers.OffPolicyTrainer.warmup_steps¶ Number of warmup steps, during which random actions are taken to add randomness to training
Type: int
-
genrl.trainers.OffPolicyTrainer.start_update¶ Timesteps after which the agent networks should start updating
Type: int
-
genrl.trainers.OffPolicyTrainer.update_interval¶ Timesteps between target network updates
Type: int
-
genrl.trainers.OffPolicyTrainer.log_mode¶ List of different kinds of logging. Supported: [“csv”, “stdout”, “tensorboard”]
Type: list of str
-
genrl.trainers.OffPolicyTrainer.log_key¶ Key plotted on the x-axis. Supported: [“timestep”, “episode”]
Type: str
-
genrl.trainers.OffPolicyTrainer.log_interval¶ Timesteps between successive logs of parameters to the console
Type: int
-
genrl.trainers.OffPolicyTrainer.logdir¶ Directory where log files should be saved.
Type: str
-
genrl.trainers.OffPolicyTrainer.epochs¶ Total number of epochs to train for
Type: int
-
genrl.trainers.OffPolicyTrainer.max_timesteps¶ Maximum limit of timesteps to train for
Type: int
-
genrl.trainers.OffPolicyTrainer.off_policy¶ True if the agent is an off policy agent, False if it is on policy
Type: bool
-
genrl.trainers.OffPolicyTrainer.save_interval¶ Timesteps between successive saves of the agent’s important hyperparameters
Type: int
-
genrl.trainers.OffPolicyTrainer.save_model¶ Directory where the checkpoints of agent parameters should be saved
Type: str
-
genrl.trainers.OffPolicyTrainer.run_num¶ A run number allotted to the save of parameters
Type: int
-
genrl.trainers.OffPolicyTrainer.load_model¶ File to load saved parameter checkpoint from
Type: str
-
genrl.trainers.OffPolicyTrainer.render¶ True if environment is to be rendered during training, else False
Type: bool
-
genrl.trainers.OffPolicyTrainer.evaluate_episodes¶ Number of episodes to evaluate for
Type: int
-
genrl.trainers.OffPolicyTrainer.seed¶ Set seed for reproducibility
Type: int
-
genrl.trainers.OffPolicyTrainer.n_envs¶ Number of environments
Type: int
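The off-policy attributes warmup_steps, start_update and update_interval together define a schedule over timesteps. The sketch below shows one plausible reading of that schedule. The function `schedule` is hypothetical and not part of genrl; it assumes 0-indexed timesteps, random actions strictly before warmup_steps, and an update being due on multiples of update_interval once start_update has been reached. The actual trainer may combine these conditions differently.

```python
def schedule(timestep, warmup_steps, start_update, update_interval):
    """Return (random_action, update_due) for a 0-indexed timestep.

    Assumptions (not taken from the library): random actions are used while
    timestep < warmup_steps, and an update is due once timestep >= start_update
    on every multiple of update_interval.
    """
    random_action = timestep < warmup_steps
    update_due = timestep >= start_update and timestep % update_interval == 0
    return random_action, update_due


plan = [schedule(t, warmup_steps=3, start_update=4, update_interval=2) for t in range(8)]
for t, (random_action, update_due) in enumerate(plan):
    source = "random" if random_action else "agent"
    print(t, source, "update" if update_due else "-")
```

Under these assumptions the first three timesteps use random actions, and updates are due at timesteps 4 and 6.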
Classical Trainer¶
Global trainer class for classical RL algorithms
| param agent: | Algorithm object to train |
|---|---|
| param env: | standard gym environment to train on |
| param mode: | mode of value function update [‘learn’, ‘plan’, ‘dyna’] |
| param model: | model to use for planning [‘tabular’] |
| param n_episodes: | number of training episodes |
| param plan_n_steps: | number of planning steps per environment interaction |
| param start_steps: | number of initial exploration timesteps |
| param seed: | seed for random number generator |
| param render: | render gym environment |
| type agent: | object |
| type env: | Gym environment |
| type mode: | str |
| type model: | str |
| type n_episodes: | int |
| type plan_n_steps: | int |
| type start_steps: | int |
| type seed: | int |
| type render: | bool |
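To make the mode, plan_n_steps and start_steps parameters more concrete, here is a minimal Dyna-style sketch of tabular Q-learning on a toy 5-state chain. It is not the genrl implementation. The function `train_tabular`, the chain environment, and the values of `alpha`, `gamma` and `epsilon` are all hypothetical. The sketch assumes that 'learn' updates from real transitions only, 'plan' updates only from transitions replayed out of a tabular model, and 'dyna' does both.

```python
import random


def train_tabular(mode="dyna", n_episodes=50, plan_n_steps=5, start_steps=10, seed=0):
    """Tabular Q-learning on a 5-state chain (hypothetical stand-in for a classical trainer)."""
    assert mode in ("learn", "plan", "dyna")
    rng = random.Random(seed)
    n_states, actions = 5, (0, 1)            # action 0 moves left, action 1 moves right
    alpha, gamma, epsilon = 0.5, 0.9, 0.1    # illustrative hyperparameters
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}                               # tabular model: (s, a) -> (reward, next_state, done)
    timestep = 0

    def env_step(state, action):
        next_state = max(0, state - 1) if action == 0 else state + 1
        done = next_state == n_states - 1    # reaching the last state ends the episode
        return (1.0 if done else 0.0), next_state, done

    def q_update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            # Explore during the first start_steps timesteps, then epsilon-greedily.
            if timestep < start_steps or rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                best = max(q[(state, a)] for a in actions)
                action = rng.choice([a for a in actions if q[(state, a)] == best])
            reward, next_state, done = env_step(state, action)
            timestep += 1
            if mode in ("learn", "dyna"):    # direct update from the real transition
                q_update(state, action, reward, next_state, done)
            if mode in ("plan", "dyna"):     # record the transition, then replay from the model
                model[(state, action)] = (reward, next_state, done)
                for _ in range(plan_n_steps):
                    (s, a), (r, s2, d) = rng.choice(list(model.items()))
                    q_update(s, a, r, s2, d)
            state = next_state
    return q


q_values = train_tabular(mode="dyna")
print(q_values[(3, 1)], q_values[(3, 0)])
```

In this toy setup each real environment step is followed by plan_n_steps simulated updates, which is why a larger plan_n_steps trades extra computation for fewer environment interactions.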
Deep Contextual Bandit Trainer¶
Bandit Trainer Class
| param agent: | Agent to train. |
|---|---|
| type agent: | genrl.deep.bandit.dcb_agents.DCBAgent |
| param bandit: | Bandit to train agent on. |
| type bandit: | genrl.deep.bandit.data_bandits.DataBasedBandit |
| param logdir: | Path to directory to store logs in. |
| type logdir: | str |
| param log_mode: | List of modes for logging. |
| type log_mode: | List[str] |
Multi Armed Bandit Trainer¶
Bandit Trainer Class
| param agent: | Agent to train. |
|---|---|
| type agent: | genrl.deep.bandit.dcb_agents.DCBAgent |
| param bandit: | Bandit to train agent on. |
| type bandit: | genrl.deep.bandit.data_bandits.DataBasedBandit |
| param logdir: | Path to directory to store logs in. |
| type logdir: | str |
| param log_mode: | List of modes for logging. |
| type log_mode: | List[str] |
Base Trainer¶
Base Trainer Class
To be inherited for specific use-cases
-
genrl.trainers.Trainer.agent¶ Agent algorithm object
Type: object
-
genrl.trainers.Trainer.env¶ Environment
Type: object
-
genrl.trainers.Trainer.log_mode¶ List of different kinds of logging. Supported: [“csv”, “stdout”, “tensorboard”]
Type: list of str
-
genrl.trainers.Trainer.log_key¶ Key plotted on the x-axis. Supported: [“timestep”, “episode”]
Type: str
-
genrl.trainers.Trainer.log_interval¶ Timesteps between successive logs of parameters to the console
Type: int
-
genrl.trainers.Trainer.logdir¶ Directory where log files should be saved.
Type: str
-
genrl.trainers.Trainer.epochs¶ Total number of epochs to train for
Type: int
-
genrl.trainers.Trainer.max_timesteps¶ Maximum limit of timesteps to train for
Type: int
-
genrl.trainers.Trainer.off_policy¶ True if the agent is an off policy agent, False if it is on policy
Type: bool
-
genrl.trainers.Trainer.save_interval¶ Timesteps between successive saves of the agent’s important hyperparameters
Type: int
-
genrl.trainers.Trainer.save_model¶ Directory where the checkpoints of agent parameters should be saved
Type: str
-
genrl.trainers.Trainer.run_num¶ A run number allotted to the save of parameters
Type: int
-
genrl.trainers.Trainer.load_model¶ File to load saved parameter checkpoint from
Type: str
-
genrl.trainers.Trainer.render¶ True if environment is to be rendered during training, else False
Type: bool
-
genrl.trainers.Trainer.evaluate_episodes¶ Number of episodes to evaluate for
Type: int
-
genrl.trainers.Trainer.seed¶ Set seed for reproducibility
Type: int
-
genrl.trainers.Trainer.n_envs¶ Number of environments
Type: int
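Since the base trainer is meant to be inherited, the pattern can be sketched as an abstract class that holds the shared attributes and leaves the training loop to subclasses. The classes `SketchTrainer` and `CountingTrainer` below are hypothetical and are not genrl.trainers.Trainer. The default values and the validation of log_mode and log_key against the supported values listed above are assumptions made for illustration.

```python
from abc import ABC, abstractmethod

SUPPORTED_LOG_MODES = {"csv", "stdout", "tensorboard"}
SUPPORTED_LOG_KEYS = {"timestep", "episode"}


class SketchTrainer(ABC):
    """Hypothetical base class holding a few of the shared trainer attributes."""

    def __init__(self, agent, env, log_mode=("stdout",), log_key="timestep",
                 epochs=10, off_policy=False):
        # Assumed validation: reject values outside the documented supported sets.
        unknown = set(log_mode) - SUPPORTED_LOG_MODES
        if unknown:
            raise ValueError(f"unsupported log_mode entries: {sorted(unknown)}")
        if log_key not in SUPPORTED_LOG_KEYS:
            raise ValueError(f"unsupported log_key: {log_key!r}")
        self.agent, self.env = agent, env
        self.log_mode, self.log_key = list(log_mode), log_key
        self.epochs, self.off_policy = epochs, off_policy

    @abstractmethod
    def train(self):
        """Subclasses supply the training loop for their use case."""


class CountingTrainer(SketchTrainer):
    """Toy subclass whose 'training' just records one entry per epoch."""

    def train(self):
        return [f"epoch {epoch}" for epoch in range(self.epochs)]


trainer = CountingTrainer(agent=object(), env=object(), log_mode=["csv", "stdout"], epochs=3)
print(trainer.train())
```

Keeping the shared configuration in the base class means each specialised trainer only has to implement its own loop.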