DDPG

genrl.agents.deep.ddpg.ddpg module

class genrl.agents.deep.ddpg.ddpg.DDPG(*args, noise: genrl.core.noise.ActionNoise = None, noise_std: float = 0.2, **kwargs)[source]

Bases: genrl.agents.deep.base.offpolicy.OffPolicyAgentAC

Deep Deterministic Policy Gradient Algorithm

Paper: https://arxiv.org/abs/1509.02971

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type: str
env

The environment that the agent is supposed to act on

Type: Environment
create_model

Whether the model of the algorithm should be created when it is initialised

Type: bool
batch_size

Mini-batch size for sampling experiences from the replay buffer

Type: int
gamma

The discount factor for rewards

Type: float
layers

Sizes of the layers in the neural network of the Q-value function

Type: tuple of int
lr_policy

Learning rate for the policy/actor

Type: float
lr_value

Learning rate for the critic

Type: float
replay_size

Capacity of the Replay Buffer

Type: int
buffer_type

The type of replay buffer to use: [“push”, “prioritized”]

Type: str
polyak

Target model update parameter (1 for hard update)

Type: float
noise

Action noise added to aid exploration (see the usage sketch after this attribute list)

Type: ActionNoise
noise_std

Standard deviation of the action noise distribution

Type: float
seed

Seed for randomness

Type: int
render

Whether the environment should be rendered during training

Type: bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type: str
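
A minimal usage sketch follows. The import paths genrl.agents.DDPG, genrl.core.OrnsteinUhlenbeckActionNoise, genrl.environments.VectorEnv, and genrl.trainers.OffPolicyTrainer, as well as the trainer keyword arguments, are assumptions and may differ between genrl versions; check the installed version for the exact API.

    from genrl.agents import DDPG
    from genrl.core import OrnsteinUhlenbeckActionNoise
    from genrl.environments import VectorEnv
    from genrl.trainers import OffPolicyTrainer

    # Continuous-action environment; DDPG requires a continuous action space
    env = VectorEnv("Pendulum-v0")

    # MLP actor-critic with Ornstein-Uhlenbeck exploration noise.
    # The noise class and keyword names below are assumptions; verify them
    # against your installed genrl version.
    agent = DDPG(
        "mlp",
        env,
        noise=OrnsteinUhlenbeckActionNoise,
        noise_std=0.2,
        batch_size=64,
        gamma=0.99,
        polyak=0.995,
        replay_size=int(1e5),
    )

    trainer = OffPolicyTrainer(agent, env, max_timesteps=20000)
    trainer.train()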
empty_logs()[source]

Empties logs

get_hyperparams() → Dict[str, Any][source]

Get relevant hyperparameters to save

Returns: Hyperparameters to be saved
Return type: hyperparams (dict)
get_logging_params() → Dict[str, Any][source]

Gets relevant parameters for logging

Returns: Logging parameters for monitoring training
Return type: logs (dict)
update_params(update_interval: int) → None[source]

Update parameters of the model

Parameters: update_interval (int) – Interval between successive updates of the target model
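
The target model update referenced by update_interval is a soft (Polyak) update controlled by the polyak attribute documented above. The following PyTorch sketch illustrates that update rule using the convention from the attribute description (polyak = 1 gives a hard copy); it is not genrl's internal implementation.

    import torch
    import torch.nn as nn

    def polyak_update(online: nn.Module, target: nn.Module, polyak: float) -> None:
        # Move each target parameter toward the corresponding online parameter.
        # With polyak = 1 the online weights are copied outright (hard update);
        # smaller values give a slower, smoothed update of the target network.
        # Illustrative sketch only, not genrl's internal implementation.
        with torch.no_grad():
            for p, p_target in zip(online.parameters(), target.parameters()):
                p_target.mul_(1.0 - polyak)
                p_target.add_(polyak * p)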