Core
ActorCritic
class genrl.core.actor_critic.CNNActorCritic(framestack: int, action_dim: gym.spaces.space.Space, policy_layers: Tuple = (256,), value_layers: Tuple = (256,), val_type: str = 'V', discrete: bool = True, *args, **kwargs)
Bases: genrl.core.base.BaseActorCritic

CNN Actor Critic

Parameters:
- framestack (int) – Number of previous frames to stack together
- action_dim (int) – Action dimensions of the environment
- policy_layers (tuple or list) – Sizes of the policy network's hidden layers
- value_layers (tuple or list) – Sizes of the value network's hidden layers
- val_type (str) – Type of value function: "V" for V(s), "Qs" for Q(s), "Qsa" for Q(s, a)
- discrete (bool) – True if action space is discrete, else False
get_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor
Get action from the Actor based on input

Parameters:
- state (torch.Tensor) – The state being passed as input to the Actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action
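A minimal usage sketch (not from the library's docs): the integer action_dim follows the docstring's type, and the 84x84 frame resolution is an assumption about the CNN's expected input.

import torch
from genrl.core.actor_critic import CNNActorCritic

# Actor-critic over 4 stacked frames with 6 discrete actions
ac = CNNActorCritic(framestack=4, action_dim=6, val_type="V", discrete=True)

# One stacked observation: (batch, framestack, height, width); 84x84 is assumed
state = torch.zeros(1, 4, 84, 84)
action = ac.get_action(state, deterministic=False)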
class genrl.core.actor_critic.MlpActorCritic(state_dim: gym.spaces.space.Space, action_dim: gym.spaces.space.Space, shared_layers: None, policy_layers: Tuple = (32, 32), value_layers: Tuple = (32, 32), val_type: str = 'V', discrete: bool = True, **kwargs)
Bases: genrl.core.base.BaseActorCritic

MLP Actor Critic
Attributes:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action space dimensions of the environment
- policy_layers (list or tuple) – Hidden layers in the policy MLP
- value_layers (list or tuple) – Hidden layers in the value MLP
- val_type (str) – Value type of the critic network
- discrete (bool) – True if the action space is discrete, else False
- sac (bool) – True if a SAC-like network is needed, else False
- activation (str) – Activation function to be used; either "tanh" or "relu"
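A construction sketch based on the signature above. The annotation declares state_dim and action_dim as gym Spaces while the attribute docs say int; the plain ints below are an assumption to check against your version.

import torch
from genrl.core.actor_critic import MlpActorCritic

ac = MlpActorCritic(
    state_dim=8,
    action_dim=2,
    shared_layers=None,
    policy_layers=(32, 32),
    value_layers=(32, 32),
    val_type="V",
    discrete=True,
)
state = torch.zeros(1, 8)
action = ac.get_action(state)  # sampled from the policy distribution
value = ac.get_value(state)    # V(s) from the critic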
class genrl.core.actor_critic.MlpSharedActorCritic
Bases: genrl.core.base.BaseActorCritic

MLP Shared Actor Critic

Attributes:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action space dimensions of the environment
- shared_layers (list or tuple) – Hidden layers in the shared MLP
- policy_layers (list or tuple) – Hidden layers in the policy MLP
- value_layers (list or tuple) – Hidden layers in the value MLP
- val_type (str) – Value type of the critic network
- discrete (bool) – True if the action space is discrete, else False
- sac (bool) – True if a SAC-like network is needed, else False
- activation (str) – Activation function to be used; either "tanh" or "relu"

get_action(state, deterministic=False)
Get actions from the actor

Parameters:
- state (torch.Tensor) – The state(s) being passed to the actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action (list) – List of actions as estimated by the actor; distribution – the distribution the action was sampled from (None if deterministic)

get_features(state)
Extract features from the state, which are then an input to get_action and get_value

Parameters: state (torch.Tensor) – The state(s) being passed
Returns: features (torch.Tensor) – The feature(s) extracted from the state

get_value(state)
Get values from the critic

Parameters: state (torch.Tensor) – The state(s) being passed to the critic
Returns: values (list) – List of values as estimated by the critic
class genrl.core.actor_critic.MlpSharedSingleActorTwoCritic
Bases: genrl.core.actor_critic.MlpSingleActorTwoCritic

MLP Actor Critic

Attributes:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action space dimensions of the environment
- shared_layers (list or tuple) – Hidden layers in the shared MLP
- policy_layers (list or tuple) – Hidden layers in the policy MLP
- value_layers (list or tuple) – Hidden layers in the value MLP
- val_type (str) – Value type of the critic network
- discrete (bool) – True if the action space is discrete, else False
- num_critics (int) – Number of critics in the architecture
- sac (bool) – True if a SAC-like network is needed, else False
- activation (str) – Activation function to be used; either "tanh" or "relu"

get_action(state, deterministic=False)
Get actions from the actor

Parameters:
- state (torch.Tensor) – The state(s) being passed to the actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action (list) – List of actions as estimated by the actor; distribution – the distribution the action was sampled from (None if deterministic)

get_features(state)
Extract features from the state, which are then an input to get_action and get_value

Parameters: state (torch.Tensor) – The state(s) being passed
Returns: features (torch.Tensor) – The feature(s) extracted from the state

get_value(state, mode='first')
Get values from both critics

Parameters:
- state (torch.Tensor) – The state(s) being passed to the critics
- mode (str) – Which values to return: "both" returns both critics' values, "min" returns the element-wise minimum of the two, "first" returns the first critic's value only

Returns: values (list) – List of values as estimated by each individual critic
class genrl.core.actor_critic.MlpSingleActorTwoCritic(state_dim: gym.spaces.space.Space, action_dim: gym.spaces.space.Space, policy_layers: Tuple = (32, 32), value_layers: Tuple = (32, 32), val_type: str = 'V', discrete: bool = True, num_critics: int = 2, **kwargs)
Bases: genrl.core.base.BaseActorCritic

MLP Actor Critic
Attributes:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action space dimensions of the environment
- policy_layers (list or tuple) – Hidden layers in the policy MLP
- value_layers (list or tuple) – Hidden layers in the value MLP
- val_type (str) – Value type of the critic network
- discrete (bool) – True if the action space is discrete, else False
- num_critics (int) – Number of critics in the architecture
- sac (bool) – True if a SAC-like network is needed, else False
- activation (str) – Activation function to be used; either "tanh" or "relu"
forward(x)
Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
get_action(state: torch.Tensor, deterministic: bool = False)
Get actions from the actor

Parameters:
- state (torch.Tensor) – The state(s) being passed to the actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action (list) – List of actions as estimated by the actor; distribution – the distribution the action was sampled from (None if deterministic)
get_value(state: torch.Tensor, mode='first') → torch.Tensor
Get values from the critics

Parameters:
- state (torch.Tensor) – The state(s) being passed to the critics
- mode (str) – Which values to return: "both" returns both critics' values, "min" returns the element-wise minimum of the two, "first" returns the first critic's value only

Returns: values (list) – List of values as estimated by each individual critic
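To make the three mode options concrete, a sketch using the default V-type twin critics (dimensions are illustrative):

import torch
from genrl.core.actor_critic import MlpSingleActorTwoCritic

ac = MlpSingleActorTwoCritic(state_dim=8, action_dim=2, num_critics=2)
state = torch.zeros(1, 8)

both = ac.get_value(state, mode="both")    # values from both critics
minimum = ac.get_value(state, mode="min")  # element-wise minimum, as in TD3/SAC-style targets
first = ac.get_value(state, mode="first")  # first critic only (the default)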
Base
class genrl.core.base.BaseActorCritic
Bases: torch.nn.modules.module.Module

Basic implementation of a general Actor Critic
get_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor
Get action from the Actor based on input

Parameters:
- state (torch.Tensor) – The state being passed as input to the Actor
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action
class genrl.core.base.BasePolicy(state_dim: int, action_dim: int, hidden: Tuple, discrete: bool, **kwargs)
Bases: torch.nn.modules.module.Module

Basic implementation of a general Policy
Parameters: - state_dim (int) – State dimensions of the environment
- action_dim (int) – Action dimensions of the environment
- hidden (tuple or list) – Sizes of hidden layers
- discrete (bool) – True if action space is discrete, else False
forward(state: torch.Tensor) → Tuple[torch.Tensor, Optional[torch.Tensor]]
Defines the computation performed at every call.

Parameters: state (torch.Tensor) – The state being passed as input to the policy
get_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor
Get action from policy based on input

Parameters:
- state (torch.Tensor) – The state being passed as input to the policy
- deterministic (bool) – True if the action is to be selected deterministically, else False

Returns: action
class genrl.core.base.BaseValue(state_dim: int, action_dim: int)
Bases: torch.nn.modules.module.Module

Basic implementation of a general Value function
Buffers
class genrl.core.buffers.PrioritizedBuffer(capacity: int, alpha: float = 0.6, beta: float = 0.4)
Bases: object

Implements the Prioritized Experience Replay mechanism

Parameters:
- capacity (int) – Size of the replay buffer
- alpha (float) – Level of prioritization
- beta (float) – Bias exponent used to correct Importance Sampling (IS) weights
pos
push(inp: Tuple) → None
Adds new experience to the buffer

Parameters: inp (tuple) – Tuple containing state, action, reward, next_state and done
Returns: None
sample(batch_size: int, beta: float = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
Returns randomly sampled memories from replay memory, along with their respective indices and weights

Parameters:
- batch_size (int) – Number of samples per batch
- beta (float) – Bias exponent used to correct Importance Sampling (IS) weights

Returns: Tuple containing states, actions, next_states, rewards, dones, indices and weights
update_priorities(batch_indices: Tuple, batch_priorities: Tuple) → None
Updates the list of priorities with the new order of priorities

Parameters:
- batch_indices (list or tuple) – List of indices of the batch
- batch_priorities (list or tuple) – List of priorities of the batch at the specific indices
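A sketch of the full prioritized replay cycle (push, sample, update priorities). The random priorities stand in for TD-error magnitudes and are purely illustrative.

import numpy as np
from genrl.core.buffers import PrioritizedBuffer

buffer = PrioritizedBuffer(capacity=10000, alpha=0.6, beta=0.4)
for _ in range(100):
    # (state, action, reward, next_state, done)
    buffer.push((np.zeros(4), 0, 1.0, np.ones(4), False))

batch = buffer.sample(batch_size=32)
indices, weights = batch[-2], batch[-1]  # last two fields per the docstring above

# Feed TD-error magnitudes back as the new priorities (random here)
new_priorities = np.abs(np.random.randn(32)) + 1e-5
buffer.update_priorities(indices, new_priorities)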
class genrl.core.buffers.PrioritizedReplayBufferSamples(states, actions, rewards, next_states, dones, indices, weights)
Bases: tuple

- states – Alias for field number 0
- actions – Alias for field number 1
- rewards – Alias for field number 2
- next_states – Alias for field number 3
- dones – Alias for field number 4
- indices – Alias for field number 5
- weights – Alias for field number 6
class genrl.core.buffers.ReplayBuffer(capacity: int)
Bases: object

Implements the basic Experience Replay mechanism

Parameters: capacity (int) – Size of the replay buffer

push(inp: Tuple) → None
Adds new experience to the buffer

Parameters: inp (tuple) – Tuple containing state, action, reward, next_state and done
Returns: None
sample(batch_size: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
Returns randomly sampled experiences from replay memory

Parameters: batch_size (int) – Number of samples per batch
Returns: Tuple composed of state, action, reward, next_state and done
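A minimal push/sample sketch, following the transition tuple layout documented above:

import numpy as np
from genrl.core.buffers import ReplayBuffer

buffer = ReplayBuffer(capacity=10000)
for _ in range(100):
    # (state, action, reward, next_state, done)
    buffer.push((np.zeros(4), 0, 1.0, np.ones(4), False))

states, actions, rewards, next_states, dones = buffer.sample(batch_size=32)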
Noise
class genrl.core.noise.ActionNoise(mean: float, std: float)
Bases: abc.ABC

Base class for Action Noise

Parameters:
- mean (float) – Mean of noise distribution
- std (float) – Standard deviation of noise distribution
mean
Returns mean of noise distribution

std
Returns standard deviation of noise distribution
class genrl.core.noise.NoisyLinear(in_features: int, out_features: int, std_init: float = 0.4)
Bases: torch.nn.modules.module.Module

Noisy Linear Layer: a noisy version of nn.Linear
Attributes:
- in_features (int) – Input dimensions
- out_features (int) – Output dimensions
- std_init (float) – Weight initialisation constant
forward(state: torch.Tensor) → torch.Tensor
Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
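Because NoisyLinear mirrors the nn.Linear interface, it can replace the fully connected layers of a Q-network head, as in Noisy DQN; a sketch with arbitrary layer sizes:

import torch
import torch.nn as nn
from genrl.core.noise import NoisyLinear

# Noisy layers carry learnable noise parameters, providing exploration
# without an external epsilon-greedy schedule
head = nn.Sequential(
    NoisyLinear(in_features=128, out_features=512),
    nn.ReLU(),
    NoisyLinear(in_features=512, out_features=6),
)
q_values = head(torch.zeros(1, 128))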
class genrl.core.noise.NormalActionNoise(mean: float, std: float)
Bases: genrl.core.noise.ActionNoise

Normal implementation of Action Noise

Parameters:
- mean (float) – Mean of noise distribution
- std (float) – Standard deviation of noise distribution
class genrl.core.noise.OrnsteinUhlenbeckActionNoise(mean: float, std: float, theta: float = 0.15, dt: float = 0.01, initial_noise: torch.Tensor = None)
Bases: genrl.core.noise.ActionNoise

Ornstein-Uhlenbeck implementation of Action Noise

Parameters:
- mean (float) – Mean of noise distribution
- std (float) – Standard deviation of noise distribution
- theta (float) – Parameter used to solve the Ornstein-Uhlenbeck process
- dt (float) – Small parameter used to solve the Ornstein-Uhlenbeck process
- initial_noise (torch.Tensor) – Initial noise distribution
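A construction sketch. Only the constructor and the mean/std properties are documented here; drawing a sample by calling the noise object is an assumption to verify against your version.

from genrl.core.noise import OrnsteinUhlenbeckActionNoise

# Temporally correlated noise, typically added to continuous actions (e.g. DDPG)
noise = OrnsteinUhlenbeckActionNoise(mean=0.0, std=0.2, theta=0.15, dt=0.01)
print(noise.mean, noise.std)  # documented properties
# noisy_action = policy_action + noise()  # assumed call syntax; verify before use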
Policies
class genrl.core.policies.CNNPolicy(framestack: int, action_dim: int, hidden: Tuple = (32, 32), discrete: bool = True, *args, **kwargs)
Bases: genrl.core.base.BasePolicy

CNN Policy

Parameters:
- framestack (int) – Number of previous frames to stack together
- action_dim (int) – Action dimensions of the environment
- hidden (tuple or list) – Sizes of hidden layers
- discrete (bool) – True if action space is discrete, else False
- channels (list or tuple) – Channel sizes for the CNN layers
class genrl.core.policies.MlpPolicy(state_dim: int, action_dim: int, hidden: Tuple = (32, 32), discrete: bool = True, *args, **kwargs)
Bases: genrl.core.base.BasePolicy

MLP Policy

Parameters:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action dimensions of the environment
- hidden (tuple or list) – Sizes of hidden layers
- discrete (bool) – True if action space is discrete, else False
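A usage sketch matching the signature above (dimensions are illustrative):

import torch
from genrl.core.policies import MlpPolicy

policy = MlpPolicy(state_dim=4, action_dim=2, hidden=(32, 32), discrete=True)
state = torch.zeros(1, 4)
action = policy.get_action(state, deterministic=False)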
RolloutStorage
class genrl.core.rollout_storage.BaseBuffer(buffer_size: int, env: Union[gym.core.Env, genrl.environments.vec_env.vector_envs.VecEnv], device: Union[torch.device, str] = 'cpu')
Bases: object

Base class that represents a buffer (rollout or replay)

Parameters:
- buffer_size (int) – Max number of elements in the buffer
- env (Environment) – The environment being trained on
- device (Union[torch.device, str]) – PyTorch device to which the values will be converted
- n_envs (int) – Number of parallel environments
sample(batch_size: int)

Parameters: batch_size (int) – Number of elements to sample
Returns: Union[RolloutBufferSamples, ReplayBufferSamples]
static swap_and_flatten(arr: numpy.ndarray) → numpy.ndarray
Swap and then flatten axes 0 (buffer_size) and 1 (n_envs) to convert the shape from [n_steps, n_envs, …] (where … is the shape of the features) to [n_steps * n_envs, …] (which maintains the order)

Parameters: arr (np.ndarray)
Returns: np.ndarray
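Since the method is static, the documented shape transformation can be checked directly:

import numpy as np
from genrl.core.rollout_storage import BaseBuffer

arr = np.zeros((5, 4, 3))               # [n_steps=5, n_envs=4, features=3]
flat = BaseBuffer.swap_and_flatten(arr)
assert flat.shape == (20, 3)            # [n_steps * n_envs, features]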
to_torch(array: numpy.ndarray, copy: bool = True) → torch.Tensor
Convert a numpy array to a PyTorch tensor. Note: it copies the data by default.

Parameters:
- array (np.ndarray)
- copy (bool) – Whether to copy the data (may be useful to avoid changing values by reference)

Returns: torch.Tensor
class genrl.core.rollout_storage.ReplayBufferSamples(observations, actions, next_observations, dones, rewards)
Bases: tuple

- observations – Alias for field number 0
- actions – Alias for field number 1
- next_observations – Alias for field number 2
- dones – Alias for field number 3
- rewards – Alias for field number 4
class genrl.core.rollout_storage.RolloutBuffer(buffer_size: int, env: Union[gym.core.Env, genrl.environments.vec_env.vector_envs.VecEnv], device: Union[torch.device, str] = 'cpu', gae_lambda: float = 1, gamma: float = 0.99)
Bases: genrl.core.rollout_storage.BaseBuffer

Rollout buffer used in on-policy algorithms like A2C/PPO.

Parameters:
- buffer_size (int) – Max number of elements in the buffer
- env (Environment) – The environment being trained on
- device (torch.device) – PyTorch device
- gae_lambda (float) – Factor for the trade-off of bias vs variance in the Generalized Advantage Estimator; equivalent to the classic advantage when set to 1
- gamma (float) – Discount factor
- n_envs (int) – Number of parallel environments
add(obs: torch.Tensor, action: torch.Tensor, reward: torch.Tensor, done: torch.Tensor, value: torch.Tensor, log_prob: torch.Tensor) → None

Parameters:
- obs (torch.Tensor) – Observation
- action (torch.Tensor) – Action
- reward (torch.Tensor) – Reward
- done (torch.Tensor) – End of episode signal
- value (torch.Tensor) – Estimated value of the current state, following the current policy
- log_prob (torch.Tensor) – Log probability of the action, following the current policy
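A sketch of filling the buffer during collection. The VectorEnv import path is an assumption from GenRL's usage examples, and the zero tensors are placeholders: in practice obs/action/reward/done come from stepping the environment, while value and log_prob come from the policy.

import torch
from genrl.core.rollout_storage import RolloutBuffer
from genrl.environments import VectorEnv  # assumed import path

env = VectorEnv("CartPole-v0")
buffer = RolloutBuffer(buffer_size=5, env=env, gae_lambda=0.95, gamma=0.99)

for _ in range(5):
    buffer.add(
        obs=torch.zeros(1, 4),
        action=torch.zeros(1, 1),
        reward=torch.zeros(1),
        done=torch.zeros(1),
        value=torch.zeros(1),     # critic's estimate for obs
        log_prob=torch.zeros(1),  # log-probability of the chosen action
    )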
class genrl.core.rollout_storage.RolloutBufferSamples(observations, actions, old_values, old_log_prob, advantages, returns)
Bases: tuple

- observations – Alias for field number 0
- actions – Alias for field number 1
- old_values – Alias for field number 2
- old_log_prob – Alias for field number 3
- advantages – Alias for field number 4
- returns – Alias for field number 5
class genrl.core.rollout_storage.RolloutReturn(episode_reward, episode_timesteps, n_episodes, continue_training)
Bases: tuple

- episode_reward – Alias for field number 0
- episode_timesteps – Alias for field number 1
- n_episodes – Alias for field number 2
- continue_training – Alias for field number 3
Values
class genrl.core.values.CnnCategoricalValue(*args, **kwargs)
Bases: genrl.core.values.CnnNoisyValue

Class for Categorical DQN's CNN Q-Value function

Attributes:
- framestack (int) – Number of frames being passed into the Q-value function
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Fully connected layer dimensions
- noisy_layers (tuple) – Noisy layer dimensions
- num_atoms (int) – Number of atoms used to discretise the Categorical DQN value distribution
class genrl.core.values.CnnDuelingValue(*args, **kwargs)
Bases: genrl.core.values.CnnValue

Class for Dueling DQN's CNN Q-Value function

Attributes:
- framestack (int) – Number of frames being passed into the Q-value function
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Hidden layer dimensions
class genrl.core.values.CnnNoisyValue(*args, **kwargs)
Bases: genrl.core.values.CnnValue, genrl.core.values.MlpNoisyValue

Class for Noisy DQN's CNN Q-Value function

Attributes:
- state_dim (int) – Number of previous frames to stack together
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Fully connected layer dimensions
- noisy_layers (tuple) – Noisy layer dimensions
- num_atoms (int) – Number of atoms used to discretise the Categorical DQN value distribution
class genrl.core.values.CnnValue(*args, **kwargs)
Bases: genrl.core.values.MlpValue

CNN Value Function class

Parameters:
- framestack (int) – Number of previous frames to stack together
- action_dim (int) – Action dimension of the environment
- val_type (str) – Type of value function: "V" for V(s), "Qs" for Q(s), "Qsa" for Q(s, a)
- fc_layers (tuple or list) – Sizes of hidden layers
class genrl.core.values.MlpCategoricalValue(*args, **kwargs)
Bases: genrl.core.values.MlpNoisyValue

Class for Categorical DQN's MLP Q-Value function

Attributes:
- state_dim (int) – Observation space dimensions
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Fully connected layer dimensions
- noisy_layers (tuple) – Noisy layer dimensions
- num_atoms (int) – Number of atoms used to discretise the Categorical DQN value distribution
class genrl.core.values.MlpDuelingValue(*args, **kwargs)
Bases: genrl.core.values.MlpValue

Class for Dueling DQN's MLP Q-Value function

Attributes:
- state_dim (int) – Observation space dimensions
- action_dim (int) – Action space dimensions
- fc_layers (tuple) – Hidden layer dimensions
class genrl.core.values.MlpNoisyValue(*args, noisy_layers: Tuple = (128, 512), **kwargs)
Bases: genrl.core.values.MlpValue
class genrl.core.values.MlpValue(state_dim: int, action_dim: int = None, val_type: str = 'V', fc_layers: Tuple = (32, 32), **kwargs)
Bases: genrl.core.base.BaseValue

MLP Value Function class

Parameters:
- state_dim (int) – State dimensions of the environment
- action_dim (int) – Action dimensions of the environment
- val_type (str) – Type of value function: "V" for V(s), "Qs" for Q(s), "Qsa" for Q(s, a)
- fc_layers (tuple or list) – Sizes of hidden layers
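A construction sketch of the three val_type variants; calling the module directly on a state in the V case is an assumption about the forward interface.

import torch
from genrl.core.values import MlpValue

v_fn = MlpValue(state_dim=4, val_type="V", fc_layers=(32, 32))                     # V(s)
qs_fn = MlpValue(state_dim=4, action_dim=2, val_type="Qs", fc_layers=(32, 32))     # Q(s) for each action
qsa_fn = MlpValue(state_dim=4, action_dim=2, val_type="Qsa", fc_layers=(32, 32))   # Q(s, a)

value = v_fn(torch.zeros(1, 4))  # assumed: forward maps state -> value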