A2C
genrl.agents.deep.a2c.a2c module
-
class genrl.agents.deep.a2c.a2c.A2C(*args, noise: Any = None, noise_std: float = 0.1, value_coeff: float = 0.5, entropy_coeff: float = 0.01, **kwargs)
Bases: genrl.agents.deep.base.onpolicy.OnPolicyAgent
Advantage Actor Critic algorithm (A2C)
The synchronous version of A3C.
Paper: https://arxiv.org/abs/1602.01783
- network (str) – The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
- env (Environment) – The environment that the agent is supposed to act on
- create_model (bool) – Whether the model of the algo should be created when initialised
- batch_size (int) – Mini batch size for loading experiences
- gamma (float) – The discount factor for rewards
- layers (tuple of int) – Layers in the Neural Network of the Q-value function
- lr_policy (float) – Learning rate for the policy/actor
- lr_value (float) – Learning rate for the critic
- rollout_size (int) – Capacity of the Replay Buffer
- buffer_type (str) – Choose the type of Buffer: [“rollout”]
- noise (ActionNoise) – Action Noise function added to aid in exploration
- noise_std (float) – Standard deviation of the action noise distribution
- value_coeff (float) – Ratio of magnitude of value updates to policy updates
- entropy_coeff (float) – Ratio of magnitude of entropy updates to policy updates
- seed (int) – Seed for randomness
- render (bool) – Should the env be rendered during training?
- device (str) – Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
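As an illustration of how value_coeff and entropy_coeff weight the loss terms, here is a minimal sketch in plain Python. The helper name a2c_loss and the exact way the terms are combined are assumptions that follow the common A2C formulation; GenRL's own update code may differ in detail.

```python
from typing import Sequence


def a2c_loss(
    log_probs: Sequence[float],
    advantages: Sequence[float],
    values: Sequence[float],
    returns: Sequence[float],
    entropies: Sequence[float],
    value_coeff: float = 0.5,
    entropy_coeff: float = 0.01,
) -> float:
    """Combine policy, value and entropy terms into one scalar (hypothetical helper)."""
    n = len(log_probs)
    # Policy-gradient term: raise the log-probability of actions with positive advantage.
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic term: mean squared error between predicted values and observed returns.
    value_loss = sum((r - v) ** 2 for v, r in zip(values, returns)) / n
    # Entropy bonus: subtracting it from the loss rewards more exploratory policies.
    entropy = sum(entropies) / n
    return policy_loss + value_coeff * value_loss - entropy_coeff * entropy
```

With the defaults above, the critic error counts half as much as the policy term, and the entropy bonus acts as a small regulariser.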
-
evaluate_actions(states: torch.Tensor, actions: torch.Tensor)
Evaluates actions taken by actor
Actions taken by the actor and their respective states are analysed to get log probabilities and values from the critic.
Parameters:
- states (torch.Tensor) – States encountered in rollout
- actions (torch.Tensor) – Actions taken in response to respective states
Returns:
- values (torch.Tensor) – Values of states encountered during the rollout
- log_probs (torch.Tensor) – Log of action probabilities given a state
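The actor side of this evaluation can be sketched in plain Python for a discrete action space. The helper names log_softmax and evaluate_discrete_actions are hypothetical, and the use of raw logits as input is an assumption. The critic's values would come from a separate forward pass that is not shown here.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def evaluate_discrete_actions(
    logits_batch: Sequence[Sequence[float]], actions: Sequence[int]
) -> Tuple[List[float], List[float]]:
    """Return (log_probs, entropies) for the actions taken (hypothetical helper)."""
    log_probs, entropies = [], []
    for logits, action in zip(logits_batch, actions):
        log_p = log_softmax(logits)
        # Log-probability the policy assigned to the action that was actually taken.
        log_probs.append(log_p[action])
        # Entropy of the full action distribution for this state.
        entropies.append(-sum(math.exp(lp) * lp for lp in log_p))
    return log_probs, entropies
```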
-
get_hyperparams() → Dict[str, Any]
Get relevant hyperparameters to save
Returns:
- hyperparams (dict) – Hyperparameters to be saved
-
get_logging_params() → Dict[str, Any]
Gets relevant parameters for logging
Returns:
- logs (dict) – Logging parameters for monitoring training
-
get_traj_loss(values: torch.Tensor, dones: torch.Tensor) → None
Get loss from trajectory traversed by agent during rollouts
Computes the returns and advantages needed for calculating loss
Parameters:
- values (torch.Tensor) – Values of states encountered during the rollout
- dones (list of bool) – Game over statuses of each environment
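A rough sketch of how returns and advantages can be derived from a rollout, using simple discounted returns rather than GAE. The helper name, the last_value bootstrap argument, and the convention that dones[t] marks an episode ending after step t are all assumptions for illustration; GenRL's rollout buffer may compute these differently.

```python
from typing import List, Sequence, Tuple


def compute_returns_and_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Discounted returns and advantages for one environment (hypothetical helper)."""
    returns = [0.0] * len(rewards)
    running = last_value  # bootstrap from the critic's estimate after the final step
    for t in reversed(range(len(rewards))):
        if dones[t]:
            # Episode ended after step t, so nothing is carried over from later steps.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    # Advantage: how much better the observed return was than the critic predicted.
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages
```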
-
load_weights(weights) → None
Load weights for the agent from pretrained model
Parameters:
- weights (dict) – Dictionary of different neural net weights
-
select_action(state: numpy.ndarray, deterministic: bool = False) → numpy.ndarray
Select action given state
Action Selection for On Policy Agents with Actor Critic
Parameters:
- state (np.ndarray) – Current state of the environment
- deterministic (bool) – Should the policy be deterministic or stochastic
Returns:
- action (np.ndarray) – Action taken by the agent
- value (torch.Tensor) – Value of given state
- log_prob (torch.Tensor) – Log probability of selected action
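The difference between deterministic and stochastic selection can be sketched for a discrete policy as follows. The helper select_discrete_action and its rng argument are hypothetical and only show the idea; they are not GenRL's implementation.

```python
import random
from typing import Optional, Sequence


def select_discrete_action(
    probs: Sequence[float],
    deterministic: bool = False,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action index from a vector of action probabilities (hypothetical helper)."""
    if deterministic:
        # Greedy choice: index of the most probable action.
        return max(range(len(probs)), key=lambda i: probs[i])
    if rng is None:
        rng = random.Random()
    # Stochastic choice: sample an index with probability proportional to probs.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Deterministic selection is typically used for evaluation, while sampling keeps the agent exploring during training.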
-