Core

ActorCritic

class genrl.core.actor_critic.CNNActorCritic(framestack: int, action_dim: gym.spaces.space.Space, policy_layers: Tuple = (256, ), value_layers: Tuple = (256, ), val_type: str = 'V', discrete: bool = True, *args, **kwargs)[source]

Bases: genrl.core.base.BaseActorCritic

CNN Actor Critic

Parameters:
  • framestack (int) – Number of previous frames to stack together
  • action_dim (int) – Action dimensions of the environment
  • policy_layers (tuple or list) – Sizes of hidden layers of the policy network
  • value_layers (tuple or list) – Sizes of hidden layers of the value network
  • val_type (str) – Specifies type of value function: “V” for V(s), “Qs” for Q(s), “Qsa” for Q(s,a)
  • discrete (bool) – True if action space is discrete, else False
get_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor[source]

Get action from the Actor based on input

Parameters:
  • state (torch.Tensor) – The state being passed as input to the Actor
  • deterministic (bool) – True if the action is to be chosen deterministically, else False
Returns:action
get_params()[source]
get_value(inp: torch.Tensor) → torch.Tensor[source]

Get value from the Critic based on input

Parameters:inp (Tensor) – Input to the Critic
Returns:value
class genrl.core.actor_critic.MlpActorCritic(state_dim: gym.spaces.space.Space, action_dim: gym.spaces.space.Space, shared_layers: None, policy_layers: Tuple = (32, 32), value_layers: Tuple = (32, 32), val_type: str = 'V', discrete: bool = True, **kwargs)[source]

Bases: genrl.core.base.BaseActorCritic

MLP Actor Critic

state_dim

State dimensions of the environment

Type:int
action_dim

Action space dimensions of the environment

Type:int
policy_layers

Hidden layers in the policy MLP

Type:list or tuple
value_layers

Hidden layers in the value MLP

Type:list or tuple
val_type

Value type of the critic network

Type:str
discrete

True if the action space is discrete, else False

Type:bool
sac

True if a SAC-like network is needed, else False

Type:bool
activation

Activation function to be used. Can be either “tanh” or “relu”

Type:str
get_params()[source]
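
Example – a minimal usage sketch. The attribute table above lists state_dim and action_dim as ints, so plain integer dimensions are assumed here (the constructor annotation mentions gym spaces; pass whatever form your installed version expects):

    import torch
    from genrl.core.actor_critic import MlpActorCritic

    # Assumption: integer state/action dimensions, as in the attribute table above
    ac = MlpActorCritic(
        state_dim=4,             # e.g. a 4-dimensional observation
        action_dim=2,            # e.g. two discrete actions
        shared_layers=None,
        policy_layers=(32, 32),
        value_layers=(32, 32),
        val_type="V",
        discrete=True,
    )

    state = torch.randn(1, 4)
    action = ac.get_action(state, deterministic=False)  # chosen action (see get_action above)
    value = ac.get_value(state)                          # V(s) estimate from the critic
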
class genrl.core.actor_critic.MlpSharedActorCritic(state_dim: gym.spaces.space.Space, action_dim: gym.spaces.space.Space, shared_layers: Tuple = (32, 32), policy_layers: Tuple = (32, 32), value_layers: Tuple = (32, 32), val_type: str = 'V', discrete: bool = True, **kwargs)[source]

Bases: genrl.core.base.BaseActorCritic

MLP Shared Actor Critic

state_dim

State dimensions of the environment

Type:int
action_dim

Action space dimensions of the environment

Type:int
shared_layers

Hidden layers in the shared MLP

Type:list or tuple
policy_layers

Hidden layers in the policy MLP

Type:list or tuple
value_layers

Hidden layers in the value MLP

Type:list or tuple
val_type

Value type of the critic network

Type:str
discrete

True if the action space is discrete, else False

Type:bool
sac

True if a SAC-like network is needed, else False

Type:bool
activation

Activation function to be used. Can be either “tanh” or “relu”

Type:str
get_action(state: torch.Tensor, deterministic: bool = False)[source]

Get actions from the actor

Parameters:
  • state (torch.Tensor) – The state(s) being passed to the actor
  • deterministic (bool) – True if the action is to be chosen deterministically, else False
Returns:The action estimated by the actor, along with the distribution it was sampled from (None if deterministic)
Return type:action (list)
get_features(state: torch.Tensor)[source]

Extract features from the state, which are then used as input to get_action and get_value

Parameters:state (torch.Tensor) – The state(s) being passed
Returns:The feature(s) extracted from the state
Return type:features (torch.Tensor)
get_params()[source]
get_value(state: torch.Tensor)[source]

Get values from the critic

Parameters:state (torch.Tensor) – The state(s) being passed to the critic
Returns:List of values as estimated by the critic
Return type:values (list)
class genrl.core.actor_critic.MlpSharedSingleActorTwoCritic(state_dim: gym.spaces.space.Space, action_dim: gym.spaces.space.Space, shared_layers: Tuple = (32, 32), policy_layers: Tuple = (32, 32), value_layers: Tuple = (32, 32), val_type: str = 'Qsa', discrete: bool = True, num_critics: int = 2, **kwargs)[source]

Bases: genrl.core.actor_critic.MlpSingleActorTwoCritic

MLP Shared Actor Critic with a single actor and two critics

state_dim

State dimensions of the environment

Type:int
action_dim

Action space dimensions of the environment

Type:int
shared_layers

Hidden layers in the shared MLP

Type:list or tuple
policy_layers

Hidden layers in the policy MLP

Type:list or tuple
value_layers

Hidden layers in the value MLP

Type:list or tuple
val_type

Value type of the critic network

Type:str
discrete

True if the action space is discrete, else False

Type:bool
num_critics

Number of critics in the architecture

Type:int
sac

True if a SAC-like network is needed, else False

Type:bool
activation

Activation function to be used. Can be either “tanh” or “relu”

Type:str
get_action(state: torch.Tensor, deterministic: bool = False)[source]

Get actions from the actor

Parameters:
  • state (torch.Tensor) – The state(s) being passed to the actor
  • deterministic (bool) – True if the action is to be chosen deterministically, else False
Returns:The action estimated by the actor, along with the distribution it was sampled from (None if deterministic)
Return type:action (list)
get_features(state: torch.Tensor)[source]

Extract features from the state, which are then used as input to get_action and get_value

Parameters:state (torch.Tensor) – The state(s) being passed
Returns:The feature(s) extracted from the state
Return type:features (torch.Tensor)
get_params()[source]
get_value(state: torch.Tensor, mode='first')[source]

Get values from both critics

Parameters:
  • state (torch.Tensor) – The state(s) being passed to the critics
  • mode (str) – Which values should be returned: “both” returns the values from both critics, “min” returns the minimum of the two values, “first” returns the value from the first critic only
Returns:List of values as estimated by each individual critic
Return type:values (list)
class genrl.core.actor_critic.MlpSingleActorTwoCritic(state_dim: gym.spaces.space.Space, action_dim: gym.spaces.space.Space, policy_layers: Tuple = (32, 32), value_layers: Tuple = (32, 32), val_type: str = 'V', discrete: bool = True, num_critics: int = 2, **kwargs)[source]

Bases: genrl.core.base.BaseActorCritic

MLP Actor Critic with a single actor and two critics

state_dim

State dimensions of the environment

Type:int
action_dim

Action space dimensions of the environment

Type:int
policy_layers

Hidden layers in the policy MLP

Type:list or tuple
value_layers

Hidden layers in the value MLP

Type:list or tuple
val_type

Value type of the critic network

Type:str
discrete

True if the action space is discrete, else False

Type:bool
num_critics

Number of critics in the architecture

Type:int
sac

True if a SAC-like network is needed, else False

Type:bool
activation

Activation function to be used. Can be either “tanh” or “relu”

Type:str
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_action(state: torch.Tensor, deterministic: bool = False)[source]

Get actions from the actor

Parameters:
  • state (torch.Tensor) – The state(s) being passed to the actor
  • deterministic (bool) – True if the action is to be chosen deterministically, else False
Returns:The action estimated by the actor, along with the distribution it was sampled from (None if deterministic)
Return type:action (list)
get_params()[source]
get_value(state: torch.Tensor, mode='first') → torch.Tensor[source]

Get values from the critics

Parameters:
  • state (torch.Tensor) – The state(s) being passed to the critics
  • mode (str) – Which values should be returned: “both” returns the values from both critics, “min” returns the minimum of the two values, “first” returns the value from the first critic only
Returns:List of values as estimated by each individual critic
Return type:values (list)
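
Example – a sketch of the mode argument on the twin-critic get_value, under the same integer-dimension assumption as in the MlpActorCritic example above; the default “V” value type is kept so the critics take the raw state as input:

    import torch
    from genrl.core.actor_critic import MlpSingleActorTwoCritic

    # Assumption: integer state/action dimensions, as in the attribute table above
    ac = MlpSingleActorTwoCritic(
        state_dim=3,
        action_dim=1,
        policy_layers=(32, 32),
        value_layers=(32, 32),
        val_type="V",
        discrete=False,
        num_critics=2,
    )

    state = torch.randn(1, 3)
    both = ac.get_value(state, mode="both")    # values from both critics
    minimum = ac.get_value(state, mode="min")  # minimum of the two values (TD3/SAC-style targets)
    first = ac.get_value(state, mode="first")  # value from the first critic only
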
genrl.core.actor_critic.get_actor_critic_from_name(name_: str)[source]

Returns the Actor Critic class given its name

Parameters:name_ (str) – Name of the actor critic needed
Returns:Actor Critic class to be used
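
Example – a sketch of the name-based lookup, assuming “mlp” is one of the registered names:

    from genrl.core.actor_critic import get_actor_critic_from_name

    # Assumption: "mlp" is a registered actor-critic name in this version
    MlpAC = get_actor_critic_from_name("mlp")
    print(MlpAC.__name__)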

Base

class genrl.core.base.BaseActorCritic[source]

Bases: torch.nn.modules.module.Module

Basic implementation of a general Actor Critic

get_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor[source]

Get action from the Actor based on input

Parameters:
  • state (torch.Tensor) – The state being passed as input to the Actor
  • deterministic (bool) – True if the action is to be chosen deterministically, else False
Returns:action
get_value(state: torch.Tensor) → torch.Tensor[source]

Get value from the Critic based on input

Parameters:state (Tensor) – Input to the Critic
Returns:value
class genrl.core.base.BasePolicy(state_dim: int, action_dim: int, hidden: Tuple, discrete: bool, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Basic implementation of a general Policy

Parameters:
  • state_dim (int) – State dimensions of the environment
  • action_dim (int) – Action dimensions of the environment
  • hidden (tuple or list) – Sizes of hidden layers
  • discrete (bool) – True if action space is discrete, else False
forward(state: torch.Tensor) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]

Defines the computation performed at every call.

Parameters:state (Tensor) – The state being passed as input to the policy
get_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor[source]

Get action from policy based on input

Parameters:
  • state (torch.Tensor) – The state being passed as input to the policy
  • deterministic (bool) – True if the action is to be chosen deterministically, else False
Returns:action
class genrl.core.base.BaseValue(state_dim: int, action_dim: int)[source]

Bases: torch.nn.modules.module.Module

Basic implementation of a general Value function

forward(state: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Parameters:state (Tensor) – Input to value function
get_value(state: torch.Tensor) → torch.Tensor[source]

Get value from value function based on input

Parameters:state (Tensor) – Input to value function
Returns:Value

Buffers

class genrl.core.buffers.PrioritizedBuffer(capacity: int, alpha: float = 0.6, beta: float = 0.4)[source]

Bases: object

Implements the Prioritized Experience Replay Mechanism

Parameters:
  • capacity (int) – Size of the replay buffer
  • alpha (float) – Level of prioritization
  • beta (float) – Bias exponent used to correct Importance Sampling (IS) weights
pos
push(inp: Tuple) → None[source]

Adds new experience to buffer

Parameters:inp (tuple) – Tuple containing state, action, reward, next_state and done
Returns:None
sample(batch_size: int, beta: float = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]
Returns randomly sampled experiences from the replay buffer along with their respective indices and weights

Parameters:
  • batch_size (int) – Number of samples per batch
  • beta (float) – Bias exponent used to correct Importance Sampling (IS) weights
Returns:Tuple containing states, actions, next_states, rewards, dones, indices and weights

update_priorities(batch_indices: Tuple, batch_priorities: Tuple) → None[source]

Updates list of priorities with new order of priorities

Parameters:
  • batch_indices (list or tuple) – List of indices of the batch
  • batch_priorities (list or tuple) – List of priorities of the batch at the specified indices
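
Example – a minimal prioritized-replay loop using only the methods documented above; the pushed transitions and the new priorities are placeholder values:

    import numpy as np
    from genrl.core.buffers import PrioritizedBuffer

    buffer = PrioritizedBuffer(capacity=100, alpha=0.6, beta=0.4)

    state = np.zeros(4, dtype=np.float32)
    next_state = np.ones(4, dtype=np.float32)
    for _ in range(10):
        # (state, action, reward, next_state, done)
        buffer.push((state, 0, 1.0, next_state, False))

    batch = buffer.sample(batch_size=4)
    indices, weights = batch[-2], batch[-1]  # indices and IS weights are the last two elements

    # After computing new TD errors for the sampled transitions, feed their
    # magnitudes back as priorities (placeholder priorities shown here)
    buffer.update_priorities([int(i) for i in indices], [1.0 for _ in indices])
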
class genrl.core.buffers.PrioritizedReplayBufferSamples(states, actions, rewards, next_states, dones, indices, weights)[source]

Bases: tuple

actions

Alias for field number 1

dones

Alias for field number 4

indices

Alias for field number 5

next_states

Alias for field number 3

rewards

Alias for field number 2

states

Alias for field number 0

weights

Alias for field number 6

class genrl.core.buffers.ReplayBuffer(capacity: int)[source]

Bases: object

Implements the basic Experience Replay Mechanism

Parameters:capacity (int) – Size of the replay buffer
push(inp: Tuple) → None[source]

Adds new experience to buffer

Parameters:inp (tuple) – Tuple containing state, action, reward, next_state and done
Returns:None
sample(batch_size: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]

Returns randomly sampled experiences from replay memory

Parameters:batch_size (int) – Number of samples per batch
Returns:Tuple containing state, action, reward, next_state and done
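
Example – the vanilla buffer follows the same push/sample pattern; a sketch with placeholder transitions:

    import numpy as np
    from genrl.core.buffers import ReplayBuffer

    buffer = ReplayBuffer(capacity=100)

    state = np.zeros(4, dtype=np.float32)
    next_state = np.ones(4, dtype=np.float32)
    for _ in range(10):
        # (state, action, reward, next_state, done)
        buffer.push((state, 0, 1.0, next_state, False))

    states, actions, rewards, next_states, dones = buffer.sample(batch_size=4)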

class genrl.core.buffers.ReplayBufferSamples(states, actions, rewards, next_states, dones)[source]

Bases: tuple

actions

Alias for field number 1

dones

Alias for field number 4

next_states

Alias for field number 3

rewards

Alias for field number 2

states

Alias for field number 0

Noise

class genrl.core.noise.ActionNoise(mean: float, std: float)[source]

Bases: abc.ABC

Base class for Action Noise

Parameters:
  • mean (float) – Mean of noise distribution
  • std (float) – Standard deviation of noise distribution
mean

Returns mean of noise distribution

std

Returns standard deviation of noise distribution

class genrl.core.noise.NoisyLinear(in_features: int, out_features: int, std_init: float = 0.4)[source]

Bases: torch.nn.modules.module.Module

Noisy Linear Layer Class

Class to represent a Noisy Linear class (noisy version of nn.Linear)

in_features

Input dimensions

Type:int
out_features

Output dimensions

Type:int
std_init

Weight initialisation constant

Type:float
forward(state: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_noise() → None[source]

Reset noise components of layer

reset_parameters() → None[source]

Reset parameters of layer
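
Example – NoisyLinear is used like nn.Linear, with reset_noise() drawing fresh parameter noise between forward passes:

    import torch
    from genrl.core.noise import NoisyLinear

    layer = NoisyLinear(in_features=8, out_features=4, std_init=0.4)

    x = torch.randn(2, 8)
    out_a = layer(x)
    layer.reset_noise()  # resample the noise components of the layer
    out_b = layer(x)     # same input, generally a different output after resampling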

class genrl.core.noise.NormalActionNoise(mean: float, std: float)[source]

Bases: genrl.core.noise.ActionNoise

Normal implementation of Action Noise

Parameters:
  • mean (float) – Mean of noise distribution
  • std (float) – Standard deviation of noise distribution
reset() → None[source]
class genrl.core.noise.OrnsteinUhlenbeckActionNoise(mean: float, std: float, theta: float = 0.15, dt: float = 0.01, initial_noise: torch.Tensor = None)[source]

Bases: genrl.core.noise.ActionNoise

Ornstein Uhlenbeck implementation of Action Noise

Parameters:
  • mean (float) – Mean of noise distribution
  • std (float) – Standard deviation of noise distribution
  • theta (float) – Parameter used to solve the Ornstein Uhlenbeck process
  • dt (float) – Small parameter used to solve the Ornstein Uhlenbeck process
  • initial_noise (torch.Tensor) – Initial noise distribution
reset() → None[source]

Reset the initial noise value for the noise distribution sampling
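
Example – a sketch of exploration noise for a continuous action. Only the constructors, the mean/std properties and reset() are documented above; drawing a sample by calling the noise object is an assumption based on the usual action-noise pattern:

    from genrl.core.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise

    gaussian = NormalActionNoise(mean=0.0, std=0.1)
    ou = OrnsteinUhlenbeckActionNoise(mean=0.0, std=0.2, theta=0.15, dt=0.01)
    ou.reset()  # re-initialise the OU process (documented above)

    action = 0.5
    noisy_action = action + ou()  # assumption: noise objects are callable and return a sample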

Policies

class genrl.core.policies.CNNPolicy(framestack: int, action_dim: int, hidden: Tuple = (32, 32), discrete: bool = True, *args, **kwargs)[source]

Bases: genrl.core.base.BasePolicy

CNN Policy

Parameters:
  • framestack (int) – Number of previous frames to stack together
  • action_dim (int) – Action dimensions of the environment
  • fc_layers (tuple or list) – Sizes of hidden layers
  • discrete (bool) – True if action space is discrete, else False
  • channels (list or tuple) – Channel sizes for cnn layers
forward(state: numpy.ndarray) → numpy.ndarray[source]

Defines the computation performed at every call.

Parameters:state (Tensor) – The state being passed as input to the policy
class genrl.core.policies.MlpPolicy(state_dim: int, action_dim: int, hidden: Tuple = (32, 32), discrete: bool = True, *args, **kwargs)[source]

Bases: genrl.core.base.BasePolicy

MLP Policy

Parameters:
  • state_dim (int) – State dimensions of the environment
  • action_dim (int) – Action dimensions of the environment
  • hidden (tuple or list) – Sizes of hidden layers
  • discrete (bool) – True if action space is discrete, else False
genrl.core.policies.get_policy_from_name(name_: str)[source]

Returns policy given the name of the policy

Parameters:name_ (str) – Name of the policy needed
Returns:Policy class to be used
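
Example – constructing an MLP policy directly and resolving it by name (assuming “mlp” is a registered name):

    import torch
    from genrl.core.policies import MlpPolicy, get_policy_from_name

    policy = MlpPolicy(state_dim=4, action_dim=2, hidden=(32, 32), discrete=True)

    state = torch.randn(1, 4)
    action = policy.get_action(state, deterministic=False)  # per get_action above: the chosen action

    PolicyClass = get_policy_from_name("mlp")  # assumption: "mlp" is a registered policy name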

RolloutStorage

class genrl.core.rollout_storage.BaseBuffer(buffer_size: int, env: Union[gym.core.Env, genrl.environments.vec_env.vector_envs.VecEnv], device: Union[torch.device, str] = 'cpu')[source]

Bases: object

Base class that represents a buffer (rollout or replay)

Parameters:
  • buffer_size (int) – Max number of elements in the buffer
  • env (Environment) – The environment being trained on
  • device (Union[torch.device, str]) – PyTorch device to which the values will be converted
  • n_envs (int) – Number of parallel environments
add(*args, **kwargs) → None[source]

Add elements to the buffer.

extend(*args, **kwargs) → None[source]

Add a new batch of transitions to the buffer

reset() → None[source]

Reset the buffer.

sample(batch_size: int)[source]
Parameters:batch_size (int) – Number of elements to sample
Returns:(Union[RolloutBufferSamples, ReplayBufferSamples])
size() → int[source]
Returns:(int) The current size of the buffer
static swap_and_flatten(arr: numpy.ndarray) → numpy.ndarray[source]

Swap and then flatten axes 0 (buffer_size) and 1 (n_envs), converting the shape from [n_steps, n_envs, …] (where … is the shape of the features) to [n_steps * n_envs, …] while maintaining the order

Parameters:arr (np.ndarray) – Array to reshape
Returns:(np.ndarray)

to_torch(array: numpy.ndarray, copy: bool = True) → torch.Tensor[source]

Convert a numpy array to a PyTorch tensor. Note: it copies the data by default.

Parameters:
  • array (np.ndarray) – Array to convert
  • copy (bool) – Whether or not to copy the data (not copying may be useful to avoid changing things by reference)
Returns:(torch.Tensor)
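
Example – the static swap_and_flatten helper can be used on its own; a quick sketch of the reshaping it performs:

    import numpy as np
    from genrl.core.rollout_storage import BaseBuffer

    # 5 timesteps from 2 parallel envs, each with a 3-dimensional feature
    arr = np.arange(5 * 2 * 3).reshape(5, 2, 3)

    flat = BaseBuffer.swap_and_flatten(arr)
    assert flat.shape == (5 * 2, 3)  # [n_steps, n_envs, ...] -> [n_steps * n_envs, ...]
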
class genrl.core.rollout_storage.ReplayBufferSamples(observations, actions, next_observations, dones, rewards)[source]

Bases: tuple

actions

Alias for field number 1

dones

Alias for field number 3

next_observations

Alias for field number 2

observations

Alias for field number 0

rewards

Alias for field number 4

class genrl.core.rollout_storage.RolloutBuffer(buffer_size: int, env: Union[gym.core.Env, genrl.environments.vec_env.vector_envs.VecEnv], device: Union[torch.device, str] = 'cpu', gae_lambda: float = 1, gamma: float = 0.99)[source]

Bases: genrl.core.rollout_storage.BaseBuffer

Rollout buffer used in on-policy algorithms like A2C/PPO.

Parameters:
  • buffer_size (int) – Max number of elements in the buffer
  • env (Environment) – The environment being trained on
  • device (torch.device) – PyTorch device to which the values will be converted
  • gae_lambda (float) – Factor for trade-off of bias vs variance in the Generalized Advantage Estimator (equivalent to the classic advantage when set to 1)
  • gamma (float) – Discount factor
  • n_envs (int) – Number of parallel environments
add(obs: torch.Tensor, action: torch.Tensor, reward: torch.Tensor, done: torch.Tensor, value: torch.Tensor, log_prob: torch.Tensor) → None[source]
Parameters:
  • obs (torch.Tensor) – Observation
  • action (torch.Tensor) – Action
  • reward (torch.Tensor) – Reward
  • done (torch.Tensor) – End of episode signal
  • value (torch.Tensor) – Estimated value of the current state following the current policy
  • log_prob (torch.Tensor) – Log probability of the action following the current policy
get(batch_size: Optional[int] = None) → Generator[genrl.core.rollout_storage.RolloutBufferSamples, None, None][source]
reset() → None[source]

Reset the buffer.

class genrl.core.rollout_storage.RolloutBufferSamples(observations, actions, old_values, old_log_prob, advantages, returns)[source]

Bases: tuple

actions

Alias for field number 1

advantages

Alias for field number 4

observations

Alias for field number 0

old_log_prob

Alias for field number 3

old_values

Alias for field number 2

returns

Alias for field number 5

class genrl.core.rollout_storage.RolloutReturn(episode_reward, episode_timesteps, n_episodes, continue_training)[source]

Bases: tuple

continue_training

Alias for field number 3

episode_reward

Alias for field number 0

episode_timesteps

Alias for field number 1

n_episodes

Alias for field number 2

Values

class genrl.core.values.CnnCategoricalValue(*args, **kwargs)[source]

Bases: genrl.core.values.CnnNoisyValue

Class for Categorical DQN’s CNN Q-Value function

framestack

No. of frames being passed into the Q-value function

Type:int
action_dim

Action space dimensions

Type:int
fc_layers

Fully connected layer dimensions

Type:tuple
noisy_layers

Noisy layer dimensions

Type:tuple
num_atoms

Number of atoms used to discretise the Categorical DQN value distribution

Type:int
forward(state: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Parameters:state (Tensor) – Input to value function
class genrl.core.values.CnnDuelingValue(*args, **kwargs)[source]

Bases: genrl.core.values.CnnValue

Class for Dueling DQN’s CNN Q-Value function

framestack

No. of frames being passed into the Q-value function

Type:int
action_dim

Action space dimensions

Type:int
fc_layers

Hidden layer dimensions

Type:tuple
forward(inp: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Parameters:inp (Tensor) – Input to value function
class genrl.core.values.CnnNoisyValue(*args, **kwargs)[source]

Bases: genrl.core.values.CnnValue, genrl.core.values.MlpNoisyValue

Class for Noisy DQN’s CNN Q-Value function

state_dim

Number of previous frames to stack together

Type:int
action_dim

Action space dimensions

Type:int
fc_layers

Fully connected layer dimensions

Type:tuple
noisy_layers

Noisy layer dimensions

Type:tuple
num_atoms

Number of atoms used to discretise the Categorical DQN value distribution

Type:int
forward(state: numpy.ndarray) → numpy.ndarray[source]

Defines the computation performed at every call.

Parameters:state (Tensor) – Input to value function
class genrl.core.values.CnnValue(*args, **kwargs)[source]

Bases: genrl.core.values.MlpValue

CNN Value Function class

Parameters:
  • framestack (int) – Number of previous frames to stack together
  • action_dim (int) – Action dimension of the environment
  • val_type (str) – Specifies type of value function: “V” for V(s), “Qs” for Q(s), “Qsa” for Q(s,a)
  • fc_layers (tuple or list) – Sizes of hidden layers
forward(state: numpy.ndarray) → numpy.ndarray[source]

Defines the computation performed at every call.

Parameters:state (Tensor) – Input to value function
class genrl.core.values.MlpCategoricalValue(*args, **kwargs)[source]

Bases: genrl.core.values.MlpNoisyValue

Class for Categorical DQN’s MLP Q-Value function

state_dim

Observation space dimensions

Type:int
action_dim

Action space dimensions

Type:int
fc_layers

Fully connected layer dimensions

Type:tuple
noisy_layers

Noisy layer dimensions

Type:tuple
num_atoms

Number of atoms used to discretise the Categorical DQN value distribution

Type:int
forward(state: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Parameters:state (Tensor) – Input to value function
class genrl.core.values.MlpDuelingValue(*args, **kwargs)[source]

Bases: genrl.core.values.MlpValue

Class for Dueling DQN’s MLP Q-Value function

state_dim

Observation space dimensions

Type:int
action_dim

Action space dimensions

Type:int
hidden

Hidden layer dimensions

Type:tuple
forward(state: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Parameters:state (Tensor) – Input to value function
class genrl.core.values.MlpNoisyValue(*args, noisy_layers: Tuple = (128, 512), **kwargs)[source]

Bases: genrl.core.values.MlpValue

reset_noise() → None[source]

Resets noise for any Noisy layers in Value function

class genrl.core.values.MlpValue(state_dim: int, action_dim: int = None, val_type: str = 'V', fc_layers: Tuple = (32, 32), **kwargs)[source]

Bases: genrl.core.base.BaseValue

MLP Value Function class

Parameters:
  • state_dim (int) – State dimensions of the environment
  • action_dim (int) – Action dimensions of the environment
  • val_type (str) – Specifies type of value function: “V” for V(s), “Qs” for Q(s), “Qsa” for Q(s,a)
  • fc_layers (tuple or list) – Sizes of hidden layers
genrl.core.values.get_value_from_name(name_: str) → Union[Type[genrl.core.values.MlpValue], Type[genrl.core.values.CnnValue]][source]

Gets the value function class given its name

Parameters:name_ (str) – Name of the value function needed
Returns:Value function class to be used
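
Example – a sketch of the two most common MLP value-function configurations, plus the name-based lookup (assuming “mlp” is a registered name):

    import torch
    from genrl.core.values import MlpValue, get_value_from_name

    # State-value function V(s) for a 4-dimensional state
    v_fn = MlpValue(state_dim=4, val_type="V", fc_layers=(32, 32))
    state_value = v_fn(torch.randn(1, 4))  # one value estimate per state

    # Q(s) head: one Q-value per action of a 2-action environment
    q_fn = MlpValue(state_dim=4, action_dim=2, val_type="Qs", fc_layers=(32, 32))
    q_values = q_fn(torch.randn(1, 4))     # one Q-value per action

    ValueClass = get_value_from_name("mlp")  # assumption: "mlp" is a registered name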