A2C

genrl.agents.deep.a2c.a2c module

class genrl.agents.deep.a2c.a2c.A2C(*args, noise: Any = None, noise_std: float = 0.1, value_coeff: float = 0.5, entropy_coeff: float = 0.01, **kwargs)[source]

Bases: genrl.agents.deep.base.onpolicy.OnPolicyAgent

Advantage Actor Critic algorithm (A2C)

The synchronous version of A3C. Paper: https://arxiv.org/abs/1602.01783

network

The type of network used for the actor-critic model. Supported types: [“cnn”, “mlp”]

Type: str
env

The environment that the agent is supposed to act on

Type: Environment
create_model

Whether the model for the algorithm should be created when it is initialised

Type: bool
batch_size

Mini batch size for loading experiences

Type: int
gamma

The discount factor for rewards

Type: float
layers

Hidden layer sizes of the actor-critic network

Type: tuple of int
lr_policy

Learning rate for the policy/actor

Type: float
lr_value

Learning rate for the critic

Type: float
rollout_size

Capacity of the Rollout Buffer

Type: int
buffer_type

Choose the type of Buffer: [“rollout”]

Type: str
noise

Action Noise function added to aid in exploration

Type: ActionNoise
noise_std

Standard deviation of the action noise distribution

Type: float
value_coeff

Weight assigned to the value (critic) loss in the total loss

Type: float
entropy_coeff

Weight assigned to the entropy bonus in the total loss

Type: float
seed

Seed for randomness

Type: int
render

Should the env be rendered during training?

Type: bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type: str
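
For context, a minimal end-to-end usage sketch follows. The import paths, constructor keywords and trainer options shown are assumptions based on GenRL's documented interfaces and the attributes listed above, and may differ between versions:

    from genrl.agents import A2C
    from genrl.environments import VectorEnv
    from genrl.trainers import OnPolicyTrainer

    # Vectorised CartPole environment and an MLP actor-critic A2C agent
    env = VectorEnv("CartPole-v0")
    agent = A2C("mlp", env, gamma=0.99, lr_policy=0.001, lr_value=0.001)

    # The on-policy trainer collects rollouts, calls the agent's updates and logs progress
    trainer = OnPolicyTrainer(agent, env, log_mode=["stdout"], epochs=100)
    trainer.train()
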
empty_logs()[source]

Empties logs

evaluate_actions(states: torch.Tensor, actions: torch.Tensor)[source]

Evaluates actions taken by actor

The actions taken by the actor and their respective states are analysed to obtain log probabilities from the policy and value estimates from the critic

Parameters:
  • states (torch.Tensor) – States encountered in rollout
  • actions (torch.Tensor) – Actions taken in response to respective states
Returns:
  • values (torch.Tensor) – Values of the states encountered during the rollout
  • log_probs (torch.Tensor) – Log probabilities of the actions given their respective states
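
As a rough illustration of what such an evaluation step does, the sketch below assumes a categorical (discrete-action) policy head and a separate value head; the names policy_net and value_net are illustrative and not GenRL's actual internals:

    from torch.distributions import Categorical

    def evaluate_actions_sketch(policy_net, value_net, states, actions):
        # Forward pass: action logits from the policy, state values from the critic
        logits = policy_net(states)             # shape: (batch, n_actions)
        values = value_net(states).squeeze(-1)  # shape: (batch,)

        # Build the action distribution and score the actions that were actually taken
        dist = Categorical(logits=logits)
        log_probs = dist.log_prob(actions)      # shape: (batch,)
        return values, log_probs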

get_hyperparams() → Dict[str, Any][source]

Get relevant hyperparameters to save

Returns: Hyperparameters to be saved
Return type: hyperparams (dict)
get_logging_params() → Dict[str, Any][source]

Gets relevant parameters for logging

Returns: Logging parameters for monitoring training
Return type: logs (dict)
get_traj_loss(values: torch.Tensor, dones: torch.Tensor) → None[source]

Get loss from trajectory traversed by agent during rollouts

Computes the returns and advantages needed for calculating loss

Parameters:
  • values (torch.Tensor) – Values of states encountered during the rollout
  • dones (torch.Tensor) – Game over statuses of each environment
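
To make the returns-and-advantages step concrete, here is a minimal sketch of one common way to compute them from a rollout (plain discounted returns with the critic's value as a baseline; GenRL's rollout buffer may implement this differently, for example with GAE):

    import torch

    def returns_and_advantages_sketch(rewards, values, dones, last_value, gamma=0.99):
        # rewards, values, dones: tensors of shape (rollout_size, n_envs)
        # last_value: critic's estimate for the state reached after the final step
        returns = torch.zeros_like(rewards)
        running_return = last_value
        for t in reversed(range(rewards.shape[0])):
            # Reset the running return at episode boundaries signalled by `dones`
            running_return = rewards[t] + gamma * running_return * (1.0 - dones[t].float())
            returns[t] = running_return

        # Advantage = empirical return minus the critic's baseline estimate
        advantages = returns - values
        return returns, advantages
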
load_weights(weights) → None[source]

Load weights for the agent from pretrained model

Parameters: weights (dict) – Dictionary of different neural net weights
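
Continuing from the instantiation sketch above, restoring a saved checkpoint might look like the following; the file path and the use of torch.load are illustrative, not GenRL's prescribed checkpoint format:

    import torch

    # Load a previously saved dictionary of network weights (path is illustrative)
    weights = torch.load("checkpoints/a2c_cartpole.pt", map_location="cpu")

    # Restore the agent's actor-critic network from the saved weights
    agent.load_weights(weights)
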
select_action(state: numpy.ndarray, deterministic: bool = False) → numpy.ndarray[source]

Select action given state

Action Selection for On Policy Agents with Actor Critic

Parameters:
  • state (np.ndarray) – Current state of the environment
  • deterministic (bool) – Should the policy be deterministic or stochastic
Returns:
  • action (np.ndarray) – Action taken by the agent
  • value (torch.Tensor) – Value of the given state
  • log_prob (torch.Tensor) – Log probability of the selected action
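
A short interaction sketch, continuing from the instantiation example above; unpacking three return values follows the Returns list here, though the exact return signature may differ between GenRL versions:

    # Collect a few transitions by querying the policy directly
    state = env.reset()
    for _ in range(10):
        # Stochastic action for exploration; value and log_prob come from the critic and policy
        action, value, log_prob = agent.select_action(state, deterministic=False)
        state, reward, done, info = env.step(action)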

update_params() → None[source]

Updates the A2C network

Function to update the A2C actor-critic architecture
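
The attributes value_coeff and entropy_coeff enter this update through a combined objective of the standard A2C form. The sketch below is written independently of GenRL's internal update code and only illustrates how the pieces fit together:

    import torch.nn.functional as F

    def a2c_loss_sketch(log_probs, values, returns, entropy,
                        value_coeff=0.5, entropy_coeff=0.01):
        # Advantages: how much better the observed returns were than the critic's estimate
        advantages = (returns - values).detach()

        # Policy gradient term: increase log-probabilities of advantageous actions
        policy_loss = -(log_probs * advantages).mean()

        # Critic regression term: fit predicted values to the empirical returns
        value_loss = F.mse_loss(values, returns)

        # Entropy bonus encourages exploration; subtracting it rewards higher entropy
        return policy_loss + value_coeff * value_loss - entropy_coeff * entropy.mean()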