A2C

genrl.agents.deep.a2c.a2c module

class genrl.agents.deep.a2c.a2c.A2C(*args, noise: Any = None, noise_std: float = 0.1, value_coeff: float = 0.5, entropy_coeff: float = 0.01, **kwargs)[source]

Bases: genrl.agents.deep.base.onpolicy.OnPolicyAgent

Advantage Actor Critic algorithm (A2C)

The synchronous version of A3C. Paper: https://arxiv.org/abs/1602.01783

network

The type of network used for the actor-critic model. Supported types: [“cnn”, “mlp”]

Type: str
env

The environment that the agent is supposed to act on

Type: Environment
create_model

Whether the model for the algorithm should be created when it is initialised

Type: bool
batch_size

Mini batch size for loading experiences

Type: int
gamma

The discount factor for rewards

Type: float
layers

Hidden layer sizes of the actor-critic network

Type: tuple of int
lr_policy

Learning rate for the policy/actor

Type: float
lr_value

Learning rate for the critic

Type: float
rollout_size

Capacity of the Rollout Buffer

Type: int
buffer_type

Choose the type of Buffer: [“rollout”]

Type: str
noise

Action Noise function added to aid in exploration

Type: ActionNoise
noise_std

Standard deviation of the action noise distribution

Type: float
value_coeff

Weight assigned to the value (critic) loss in the total loss

Type: float
entropy_coeff

Weight assigned to the entropy bonus in the total loss

Type: float
seed

Seed for randomness

Type: int
render

Should the env be rendered during training?

Type: bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type: str
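
For context, a minimal end-to-end usage sketch follows. The import paths, constructor keywords and trainer options shown are assumptions based on GenRL's documented interfaces and the attributes listed above, and may differ between versions:

    from genrl.agents import A2C
    from genrl.environments import VectorEnv
    from genrl.trainers import OnPolicyTrainer

    # Vectorised CartPole environment and an MLP actor-critic A2C agent
    env = VectorEnv("CartPole-v0")
    agent = A2C("mlp", env, gamma=0.99, lr_policy=0.001, lr_value=0.001)

    # The on-policy trainer collects rollouts, calls the agent's updates and logs progress
    trainer = OnPolicyTrainer(agent, env, log_mode=["stdout"], epochs=100)
    trainer.train()
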
empty_logs()[source]

Empties logs

evaluate_actions(states: torch.Tensor, actions: torch.Tensor)[source]

Evaluates actions taken by actor

The actions taken by the actor and their respective states are analysed to obtain log probabilities from the policy and value estimates from the critic

Parameters:
  • states (torch.Tensor) – States encountered in rollout
  • actions (torch.Tensor) – Actions taken in response to respective states
Returns:
  • values (torch.Tensor) – Values of the states encountered during the rollout
  • log_probs (torch.Tensor) – Log probabilities of the actions given their respective states
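
As a rough illustration of what such an evaluation step does, the sketch below assumes a categorical (discrete-action) policy head and a separate value head; the names policy_net and value_net are illustrative and not GenRL's actual internals:

    from torch.distributions import Categorical

    def evaluate_actions_sketch(policy_net, value_net, states, actions):
        # Forward pass: action logits from the policy, state values from the critic
        logits = policy_net(states)             # shape: (batch, n_actions)
        values = value_net(states).squeeze(-1)  # shape: (batch,)

        # Build the action distribution and score the actions that were actually taken
        dist = Categorical(logits=logits)
        log_probs = dist.log_prob(actions)      # shape: (batch,)
        return values, log_probs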

get_hyperparams() → Dict[str, Any][source]

Get relevant hyperparameters to save

Returns: Hyperparameters to be saved
Return type: hyperparams (dict)
get_logging_params() → Dict[str, Any][source]

Gets relevant parameters for logging

Returns: Logging parameters for monitoring training
Return type: logs (dict)
get_traj_loss(values: torch.Tensor, dones: torch.Tensor) → None[source]

Get loss from trajectory traversed by agent during rollouts

Computes the returns and advantages needed for calculating loss

Parameters:
  • values (torch.Tensor) – Values of states encountered during the rollout
  • dones (torch.Tensor) – Game over statuses of each environment
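
To make the returns-and-advantages step concrete, here is a minimal sketch of one common way to compute them from a rollout (plain discounted returns with the critic's value as a baseline; GenRL's rollout buffer may implement this differently, for example with GAE):

    import torch

    def returns_and_advantages_sketch(rewards, values, dones, last_value, gamma=0.99):
        # rewards, values, dones: tensors of shape (rollout_size, n_envs)
        # last_value: critic's estimate for the state reached after the final step
        returns = torch.zeros_like(rewards)
        running_return = last_value
        for t in reversed(range(rewards.shape[0])):
            # Reset the running return at episode boundaries signalled by `dones`
            running_return = rewards[t] + gamma * running_return * (1.0 - dones[t].float())
            returns[t] = running_return

        # Advantage = empirical return minus the critic's baseline estimate
        advantages = returns - values
        return returns, advantages
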
load_weights(weights) → None[source]

Load weights for the agent from pretrained model

Parameters: weights (dict) – Dictionary of different neural net weights
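
Continuing from the instantiation sketch above, restoring a saved checkpoint might look like the following; the file path and the use of torch.load are illustrative, not GenRL's prescribed checkpoint format:

    import torch

    # Load a previously saved dictionary of network weights (path is illustrative)
    weights = torch.load("checkpoints/a2c_cartpole.pt", map_location="cpu")

    # Restore the agent's actor-critic network from the saved weights
    agent.load_weights(weights)
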
select_action(state: numpy.ndarray, deterministic: bool = False) → numpy.ndarray[source]

Select action given state

Action Selection for On Policy Agents with Actor Critic

Parameters:
  • state (np.ndarray) – Current state of the environment
  • deterministic (bool) – Should the policy be deterministic or stochastic
Returns:
  • action (np.ndarray) – Action taken by the agent
  • value (torch.Tensor) – Value of the given state
  • log_prob (torch.Tensor) – Log probability of the selected action
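
A short interaction sketch, continuing from the instantiation example above; unpacking three return values follows the Returns list here, though the exact return signature may differ between GenRL versions:

    # Collect a few transitions by querying the policy directly
    state = env.reset()
    for _ in range(10):
        # Stochastic action for exploration; value and log_prob come from the critic and policy
        action, value, log_prob = agent.select_action(state, deterministic=False)
        state, reward, done, info = env.step(action)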

update_params() → None[source]

Updates the A2C network

Function to update the A2C actor-critic architecture
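
The attributes value_coeff and entropy_coeff enter this update through a combined objective of the standard A2C form. The sketch below is written independently of GenRL's internal update code and only illustrates how the pieces fit together:

    import torch.nn.functional as F

    def a2c_loss_sketch(log_probs, values, returns, entropy,
                        value_coeff=0.5, entropy_coeff=0.01):
        # Advantages: how much better the observed returns were than the critic's estimate
        advantages = (returns - values).detach()

        # Policy gradient term: increase log-probabilities of advantageous actions
        policy_loss = -(log_probs * advantages).mean()

        # Critic regression term: fit predicted values to the empirical returns
        value_loss = F.mse_loss(values, returns)

        # Entropy bonus encourages exploration; subtracting it rewards higher entropy
        return policy_loss + value_coeff * value_loss - entropy_coeff * entropy.mean()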