Core
ActorCritic
class genrl.core.actor_critic.CNNActorCritic(framestack: int, action_dim: gym.spaces.space.Space, policy_layers: Tuple = (256,), value_layers: Tuple = (256,), val_type: str = 'V', discrete: bool = True, *args, **kwargs)
Bases: genrl.core.base.BaseActorCritic

CNN Actor Critic

Parameters:
- framestack (int) – Number of previous frames to stack together
- action_dim (int) – Action dimensions of the environment
- policy_layers (tuple or list) – Sizes of the policy network's hidden layers
- value_layers (tuple or list) – Sizes of the value network's hidden layers
- val_type (str) – Type of value function: "V" for V(s), "Qs" for Q(s), "Qsa" for Q(s, a)
- discrete (bool) – True if action space is discrete, else False
get_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor
Get action from the Actor based on input

Parameters:
- state (torch.Tensor) – The state being passed as input to the Actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action
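A minimal usage sketch (not from the library's docs): the integer action_dim follows the docstring's type, and the 84x84 frame resolution is an assumption about the CNN's expected input.

import torch
from genrl.core.actor_critic import CNNActorCritic

# Actor-critic over 4 stacked frames with 6 discrete actions
ac = CNNActorCritic(framestack=4, action_dim=6, val_type="V", discrete=True)

# One stacked observation: (batch, framestack, height, width); 84x84 is assumed
state = torch.zeros(1, 4, 84, 84)
action = ac.get_action(state, deterministic=False)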
class genrl.core.actor_critic.MlpActorCritic(state_dim: gym.spaces.space.Space, action_dim: gym.spaces.space.Space, shared_layers: None, policy_layers: Tuple = (32, 32), value_layers: Tuple = (32, 32), val_type: str = 'V', discrete: bool = True, **kwargs)
Bases: genrl.core.base.BaseActorCritic

MLP Actor Critic
Attributes:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action space dimensions of the environment
- policy_layers (list or tuple) – Hidden layers in the policy MLP
- value_layers (list or tuple) – Hidden layers in the value MLP
- val_type (str) – Value type of the critic network
- discrete (bool) – True if the action space is discrete, else False
- sac (bool) – True if a SAC-like network is needed, else False
- activation (str) – Activation function to be used; either "tanh" or "relu"
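A construction sketch based on the signature above. The annotation declares state_dim and action_dim as gym Spaces while the attribute docs say int; the plain ints below are an assumption to check against your version.

import torch
from genrl.core.actor_critic import MlpActorCritic

ac = MlpActorCritic(
    state_dim=8,
    action_dim=2,
    shared_layers=None,
    policy_layers=(32, 32),
    value_layers=(32, 32),
    val_type="V",
    discrete=True,
)
state = torch.zeros(1, 8)
action = ac.get_action(state)  # sampled from the policy distribution
value = ac.get_value(state)    # V(s) from the critic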
class genrl.core.actor_critic.MlpSharedActorCritic
Bases: genrl.core.base.BaseActorCritic

MLP Shared Actor Critic

Attributes:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action space dimensions of the environment
- shared_layers (list or tuple) – Hidden layers in the shared MLP
- policy_layers (list or tuple) – Hidden layers in the policy MLP
- value_layers (list or tuple) – Hidden layers in the value MLP
- val_type (str) – Value type of the critic network
- discrete (bool) – True if the action space is discrete, else False
- sac (bool) – True if a SAC-like network is needed, else False
- activation (str) – Activation function to be used; either "tanh" or "relu"

get_action(state, deterministic=False)
Get actions from the actor

Parameters:
- state (torch.Tensor) – The state(s) being passed to the actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action (list) – List of actions as estimated by the actor; distribution – the distribution the action was sampled from (None if deterministic)

get_features(state)
Extract features from the state, which are then an input to get_action and get_value

Parameters: state (torch.Tensor) – The state(s) being passed
Returns: features (torch.Tensor) – The feature(s) extracted from the state

get_value(state)
Get values from the critic

Parameters: state (torch.Tensor) – The state(s) being passed to the critic
Returns: values (list) – List of values as estimated by the critic
class genrl.core.actor_critic.MlpSharedSingleActorTwoCritic
Bases: genrl.core.actor_critic.MlpSingleActorTwoCritic

MLP Actor Critic

Attributes:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action space dimensions of the environment
- shared_layers (list or tuple) – Hidden layers in the shared MLP
- policy_layers (list or tuple) – Hidden layers in the policy MLP
- value_layers (list or tuple) – Hidden layers in the value MLP
- val_type (str) – Value type of the critic network
- discrete (bool) – True if the action space is discrete, else False
- num_critics (int) – Number of critics in the architecture
- sac (bool) – True if a SAC-like network is needed, else False
- activation (str) – Activation function to be used; either "tanh" or "relu"

get_action(state, deterministic=False)
Get actions from the actor

Parameters:
- state (torch.Tensor) – The state(s) being passed to the actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action (list) – List of actions as estimated by the actor; distribution – the distribution the action was sampled from (None if deterministic)

get_features(state)
Extract features from the state, which are then an input to get_action and get_value

Parameters: state (torch.Tensor) – The state(s) being passed
Returns: features (torch.Tensor) – The feature(s) extracted from the state

get_value(state, mode='first')
Get values from both critics

Parameters:
- state (torch.Tensor) – The state(s) being passed to the critics
- mode (str) – Which values to return: "both" returns both critics' values, "min" returns the element-wise minimum of the two, "first" returns the first critic's value only

Returns: values (list) – List of values as estimated by each individual critic
class genrl.core.actor_critic.MlpSingleActorTwoCritic(state_dim: gym.spaces.space.Space, action_dim: gym.spaces.space.Space, policy_layers: Tuple = (32, 32), value_layers: Tuple = (32, 32), val_type: str = 'V', discrete: bool = True, num_critics: int = 2, **kwargs)
Bases: genrl.core.base.BaseActorCritic

MLP Actor Critic
Attributes:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action space dimensions of the environment
- policy_layers (list or tuple) – Hidden layers in the policy MLP
- value_layers (list or tuple) – Hidden layers in the value MLP
- val_type (str) – Value type of the critic network
- discrete (bool) – True if the action space is discrete, else False
- num_critics (int) – Number of critics in the architecture
- sac (bool) – True if a SAC-like network is needed, else False
- activation (str) – Activation function to be used; either "tanh" or "relu"
forward(x)
Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
get_action(state: torch.Tensor, deterministic: bool = False)
Get actions from the actor

Parameters:
- state (torch.Tensor) – The state(s) being passed to the actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action (list) – List of actions as estimated by the actor; distribution – the distribution the action was sampled from (None if deterministic)
get_value(state: torch.Tensor, mode='first') → torch.Tensor
Get values from the critics

Parameters:
- state (torch.Tensor) – The state(s) being passed to the critics
- mode (str) – Which values to return: "both" returns both critics' values, "min" returns the element-wise minimum of the two, "first" returns the first critic's value only

Returns: values (list) – List of values as estimated by each individual critic
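To make the three mode options concrete, a sketch using the default V-type twin critics (dimensions are illustrative):

import torch
from genrl.core.actor_critic import MlpSingleActorTwoCritic

ac = MlpSingleActorTwoCritic(state_dim=8, action_dim=2, num_critics=2)
state = torch.zeros(1, 8)

both = ac.get_value(state, mode="both")    # values from both critics
minimum = ac.get_value(state, mode="min")  # element-wise minimum, as in TD3/SAC-style targets
first = ac.get_value(state, mode="first")  # first critic only (the default)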
Base
class genrl.core.base.BaseActorCritic
Bases: torch.nn.modules.module.Module

Basic implementation of a general Actor Critic
get_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor
Get action from the Actor based on input

Parameters:
- state (torch.Tensor) – The state being passed as input to the Actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action
class genrl.core.base.BasePolicy(state_dim: int, action_dim: int, hidden: Tuple, discrete: bool, **kwargs)
Bases: torch.nn.modules.module.Module

Basic implementation of a general Policy
Parameters: - state_dim (int) – State dimensions of the environment
- action_dim (int) – Action dimensions of the environment
- hidden (tuple or list) – Sizes of hidden layers
- discrete (bool) – True if action space is discrete, else False
forward(state: torch.Tensor) → Tuple[torch.Tensor, Optional[torch.Tensor]]
Defines the computation performed at every call.

Parameters: state (torch.Tensor) – The state being passed as input to the policy
get_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor
Get action from policy based on input

Parameters:
- state (torch.Tensor) – The state being passed as input to the policy
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action
class genrl.core.base.BaseValue(state_dim: int, action_dim: int)
Bases: torch.nn.modules.module.Module

Basic implementation of a general Value function
Buffers
class genrl.core.buffers.PrioritizedBuffer(capacity: int, alpha: float = 0.6, beta: float = 0.4)
Bases: object

Implements the Prioritized Experience Replay mechanism

Parameters:
- capacity (int) – Size of the replay buffer
- alpha (float) – Level of prioritization
- beta (float) – Bias exponent used to correct Importance Sampling (IS) weights
pos
push(inp: Tuple) → None
Adds new experience to the buffer

Parameters: inp (tuple) – Tuple containing state, action, reward, next_state and done
Returns: None
sample(batch_size: int, beta: float = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
Returns randomly sampled memories from replay memory, along with their respective indices and weights

Parameters:
- batch_size (int) – Number of samples per batch
- beta (float) – Bias exponent used to correct Importance Sampling (IS) weights

Returns: Tuple containing states, actions, next_states, rewards, dones, indices and weights
update_priorities(batch_indices: Tuple, batch_priorities: Tuple) → None
Updates the list of priorities with the new order of priorities

Parameters:
- batch_indices (list or tuple) – List of indices of the batch
- batch_priorities (list or tuple) – List of priorities of the batch at the specific indices
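A sketch of the full prioritized replay cycle (push, sample, update priorities). The random priorities stand in for TD-error magnitudes and are purely illustrative.

import numpy as np
from genrl.core.buffers import PrioritizedBuffer

buffer = PrioritizedBuffer(capacity=10000, alpha=0.6, beta=0.4)
for _ in range(100):
    # (state, action, reward, next_state, done)
    buffer.push((np.zeros(4), 0, 1.0, np.ones(4), False))

batch = buffer.sample(batch_size=32)
indices, weights = batch[-2], batch[-1]  # last two fields per the docstring above

# Feed TD-error magnitudes back as the new priorities (random here)
new_priorities = np.abs(np.random.randn(32)) + 1e-5
buffer.update_priorities(indices, new_priorities)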
class genrl.core.buffers.PrioritizedReplayBufferSamples(states, actions, rewards, next_states, dones, indices, weights)
Bases: tuple

- states – Alias for field number 0
- actions – Alias for field number 1
- rewards – Alias for field number 2
- next_states – Alias for field number 3
- dones – Alias for field number 4
- indices – Alias for field number 5
- weights – Alias for field number 6
class genrl.core.buffers.ReplayBuffer(capacity: int)
Bases: object

Implements the basic Experience Replay mechanism

Parameters: capacity (int) – Size of the replay buffer

push(inp: Tuple) → None
Adds new experience to the buffer

Parameters: inp (tuple) – Tuple containing state, action, reward, next_state and done
Returns: None
sample(batch_size: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
Returns randomly sampled experiences from replay memory

Parameters: batch_size (int) – Number of samples per batch
Returns: Tuple composed of state, action, reward, next_state and done
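A minimal push/sample sketch, following the transition tuple layout documented above:

import numpy as np
from genrl.core.buffers import ReplayBuffer

buffer = ReplayBuffer(capacity=10000)
for _ in range(100):
    # (state, action, reward, next_state, done)
    buffer.push((np.zeros(4), 0, 1.0, np.ones(4), False))

states, actions, rewards, next_states, dones = buffer.sample(batch_size=32)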
Noise
class genrl.core.noise.ActionNoise(mean: float, std: float)
Bases: abc.ABC

Base class for Action Noise

Parameters:
- mean (float) – Mean of noise distribution
- std (float) – Standard deviation of noise distribution
mean
Returns mean of noise distribution

std
Returns standard deviation of noise distribution
class genrl.core.noise.NoisyLinear(in_features: int, out_features: int, std_init: float = 0.4)
Bases: torch.nn.modules.module.Module

Noisy Linear Layer: a noisy version of nn.Linear
Attributes:
- in_features (int) – Input dimensions
- out_features (int) – Output dimensions
- std_init (float) – Weight initialisation constant
forward(state: torch.Tensor) → torch.Tensor
Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
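Because NoisyLinear mirrors the nn.Linear interface, it can replace the fully connected layers of a Q-network head, as in Noisy DQN; a sketch with arbitrary layer sizes:

import torch
import torch.nn as nn
from genrl.core.noise import NoisyLinear

# Noisy layers carry learnable noise parameters, providing exploration
# without an external epsilon-greedy schedule
head = nn.Sequential(
    NoisyLinear(in_features=128, out_features=512),
    nn.ReLU(),
    NoisyLinear(in_features=512, out_features=6),
)
q_values = head(torch.zeros(1, 128))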
class genrl.core.noise.NormalActionNoise(mean: float, std: float)
Bases: genrl.core.noise.ActionNoise

Normal implementation of Action Noise

Parameters:
- mean (float) – Mean of noise distribution
- std (float) – Standard deviation of noise distribution
class genrl.core.noise.OrnsteinUhlenbeckActionNoise(mean: float, std: float, theta: float = 0.15, dt: float = 0.01, initial_noise: torch.Tensor = None)
Bases: genrl.core.noise.ActionNoise

Ornstein-Uhlenbeck implementation of Action Noise

Parameters:
- mean (float) – Mean of noise distribution
- std (float) – Standard deviation of noise distribution
- theta (float) – Parameter used to solve the Ornstein-Uhlenbeck process
- dt (float) – Small parameter used to solve the Ornstein-Uhlenbeck process
- initial_noise (torch.Tensor) – Initial noise distribution
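A construction sketch. Only the constructor and the mean/std properties are documented here; drawing a sample by calling the noise object is an assumption to verify against your version.

from genrl.core.noise import OrnsteinUhlenbeckActionNoise

# Temporally correlated noise, typically added to continuous actions (e.g. DDPG)
noise = OrnsteinUhlenbeckActionNoise(mean=0.0, std=0.2, theta=0.15, dt=0.01)
print(noise.mean, noise.std)  # documented properties
# noisy_action = policy_action + noise()  # assumed call syntax; verify before use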
Policies
class genrl.core.policies.CNNPolicy(framestack: int, action_dim: int, hidden: Tuple = (32, 32), discrete: bool = True, *args, **kwargs)
Bases: genrl.core.base.BasePolicy

CNN Policy

Parameters:
- framestack (int) – Number of previous frames to stack together
- action_dim (int) – Action dimensions of the environment
- hidden (tuple or list) – Sizes of hidden layers
- discrete (bool) – True if action space is discrete, else False
- channels (list or tuple) – Channel sizes for the CNN layers
class genrl.core.policies.MlpPolicy(state_dim: int, action_dim: int, hidden: Tuple = (32, 32), discrete: bool = True, *args, **kwargs)
Bases: genrl.core.base.BasePolicy

MLP Policy

Parameters:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action dimensions of the environment
- hidden (tuple or list) – Sizes of hidden layers
- discrete (bool) – True if action space is discrete, else False
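A usage sketch matching the signature above (dimensions are illustrative):

import torch
from genrl.core.policies import MlpPolicy

policy = MlpPolicy(state_dim=4, action_dim=2, hidden=(32, 32), discrete=True)
state = torch.zeros(1, 4)
action = policy.get_action(state, deterministic=False)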
RolloutStorage
class genrl.core.rollout_storage.BaseBuffer(buffer_size: int, env: Union[gym.core.Env, genrl.environments.vec_env.vector_envs.VecEnv], device: Union[torch.device, str] = 'cpu')
Bases: object

Base class that represents a buffer (rollout or replay)

Parameters:
- buffer_size (int) – Max number of elements in the buffer
- env (Environment) – The environment being trained on
- device (Union[torch.device, str]) – PyTorch device to which the values will be converted
- n_envs (int) – Number of parallel environments
sample(batch_size: int)

Parameters: batch_size (int) – Number of elements to sample
Returns: Union[RolloutBufferSamples, ReplayBufferSamples]
static swap_and_flatten(arr: numpy.ndarray) → numpy.ndarray
Swap and then flatten axes 0 (buffer_size) and 1 (n_envs) to convert the shape from [n_steps, n_envs, …] (where … is the shape of the features) to [n_steps * n_envs, …] (which maintains the order)

Parameters: arr (np.ndarray)
Returns: np.ndarray
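Since the method is static, the documented shape transformation can be checked directly:

import numpy as np
from genrl.core.rollout_storage import BaseBuffer

arr = np.zeros((5, 4, 3))               # [n_steps=5, n_envs=4, features=3]
flat = BaseBuffer.swap_and_flatten(arr)
assert flat.shape == (20, 3)            # [n_steps * n_envs, features]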
to_torch(array: numpy.ndarray, copy: bool = True) → torch.Tensor
Convert a numpy array to a PyTorch tensor. Note: it copies the data by default.

Parameters:
- array (np.ndarray)
- copy (bool) – Whether to copy the data (may be useful to avoid changing values by reference)

Returns: torch.Tensor
class genrl.core.rollout_storage.ReplayBufferSamples(observations, actions, next_observations, dones, rewards)
Bases: tuple

- observations – Alias for field number 0
- actions – Alias for field number 1
- next_observations – Alias for field number 2
- dones – Alias for field number 3
- rewards – Alias for field number 4
class genrl.core.rollout_storage.RolloutBuffer(buffer_size: int, env: Union[gym.core.Env, genrl.environments.vec_env.vector_envs.VecEnv], device: Union[torch.device, str] = 'cpu', gae_lambda: float = 1, gamma: float = 0.99)
Bases: genrl.core.rollout_storage.BaseBuffer

Rollout buffer used in on-policy algorithms like A2C/PPO.

Parameters:
- buffer_size (int) – Max number of elements in the buffer
- env (Environment) – The environment being trained on
- device (torch.device) – PyTorch device
- gae_lambda (float) – Factor for the trade-off of bias vs variance in the Generalized Advantage Estimator; equivalent to the classic advantage when set to 1
- gamma (float) – Discount factor
- n_envs (int) – Number of parallel environments
add(obs: torch.Tensor, action: torch.Tensor, reward: torch.Tensor, done: torch.Tensor, value: torch.Tensor, log_prob: torch.Tensor) → None

Parameters:
- obs (torch.Tensor) – Observation
- action (torch.Tensor) – Action
- reward (torch.Tensor) – Reward
- done (torch.Tensor) – End of episode signal
- value (torch.Tensor) – Estimated value of the current state, following the current policy
- log_prob (torch.Tensor) – Log probability of the action, following the current policy
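A sketch of filling the buffer during collection. The VectorEnv import path is an assumption from GenRL's usage examples, and the zero tensors are placeholders: in practice obs/action/reward/done come from stepping the environment, while value and log_prob come from the policy.

import torch
from genrl.core.rollout_storage import RolloutBuffer
from genrl.environments import VectorEnv  # assumed import path

env = VectorEnv("CartPole-v0")
buffer = RolloutBuffer(buffer_size=5, env=env, gae_lambda=0.95, gamma=0.99)

for _ in range(5):
    buffer.add(
        obs=torch.zeros(1, 4),
        action=torch.zeros(1, 1),
        reward=torch.zeros(1),
        done=torch.zeros(1),
        value=torch.zeros(1),     # critic's estimate for obs
        log_prob=torch.zeros(1),  # log-probability of the chosen action
    )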
class genrl.core.rollout_storage.RolloutBufferSamples(observations, actions, old_values, old_log_prob, advantages, returns)
Bases: tuple

- observations – Alias for field number 0
- actions – Alias for field number 1
- old_values – Alias for field number 2
- old_log_prob – Alias for field number 3
- advantages – Alias for field number 4
- returns – Alias for field number 5
class genrl.core.rollout_storage.RolloutReturn(episode_reward, episode_timesteps, n_episodes, continue_training)
Bases: tuple

- episode_reward – Alias for field number 0
- episode_timesteps – Alias for field number 1
- n_episodes – Alias for field number 2
- continue_training – Alias for field number 3
Values
class genrl.core.values.CnnCategoricalValue(*args, **kwargs)
Bases: genrl.core.values.CnnNoisyValue

Class for Categorical DQN's CNN Q-Value function

Attributes:
- framestack (int) – Number of frames being passed into the Q-value function
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Fully connected layer dimensions
- noisy_layers (tuple) – Noisy layer dimensions
- num_atoms (int) – Number of atoms used to discretise the Categorical DQN value distribution
class genrl.core.values.CnnDuelingValue(*args, **kwargs)
Bases: genrl.core.values.CnnValue

Class for Dueling DQN's CNN Q-Value function

Attributes:
- framestack (int) – Number of frames being passed into the Q-value function
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Hidden layer dimensions
class genrl.core.values.CnnNoisyValue(*args, **kwargs)
Bases: genrl.core.values.CnnValue, genrl.core.values.MlpNoisyValue

Class for Noisy DQN's CNN Q-Value function

Attributes:
- state_dim (int) – Number of previous frames to stack together
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Fully connected layer dimensions
- noisy_layers (tuple) – Noisy layer dimensions
- num_atoms (int) – Number of atoms used to discretise the Categorical DQN value distribution
class genrl.core.values.CnnValue(*args, **kwargs)
Bases: genrl.core.values.MlpValue

CNN Value Function class

Parameters:
- framestack (int) – Number of previous frames to stack together
- action_dim (int) – Action dimension of the environment
- val_type (str) – Type of value function: "V" for V(s), "Qs" for Q(s), "Qsa" for Q(s, a)
- fc_layers (tuple or list) – Sizes of hidden layers
class genrl.core.values.MlpCategoricalValue(*args, **kwargs)
Bases: genrl.core.values.MlpNoisyValue

Class for Categorical DQN's MLP Q-Value function

Attributes:
- state_dim (int) – Observation space dimensions
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Fully connected layer dimensions
- noisy_layers (tuple) – Noisy layer dimensions
- num_atoms (int) – Number of atoms used to discretise the Categorical DQN value distribution
class genrl.core.values.MlpDuelingValue(*args, **kwargs)
Bases: genrl.core.values.MlpValue

Class for Dueling DQN's MLP Q-Value function

Attributes:
- state_dim (int) – Observation space dimensions
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Hidden layer dimensions
class genrl.core.values.MlpNoisyValue(*args, noisy_layers: Tuple = (128, 512), **kwargs)
Bases: genrl.core.values.MlpValue
class genrl.core.values.MlpValue(state_dim: int, action_dim: int = None, val_type: str = 'V', fc_layers: Tuple = (32, 32), **kwargs)
Bases: genrl.core.base.BaseValue

MLP Value Function class

Parameters:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action dimensions of the environment
- val_type (str) – Type of value function: "V" for V(s), "Qs" for Q(s), "Qsa" for Q(s, a)
- fc_layers (tuple or list) – Sizes of hidden layers
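A construction sketch of the three val_type variants; calling the module directly on a state in the V case is an assumption about the forward interface.

import torch
from genrl.core.values import MlpValue

v_fn = MlpValue(state_dim=4, val_type="V", fc_layers=(32, 32))                     # V(s)
qs_fn = MlpValue(state_dim=4, action_dim=2, val_type="Qs", fc_layers=(32, 32))     # Q(s) for each action
qsa_fn = MlpValue(state_dim=4, action_dim=2, val_type="Qsa", fc_layers=(32, 32))   # Q(s, a)

value = v_fn(torch.zeros(1, 4))  # assumed: forward maps state -> value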