DQN

genrl.agents.deep.dqn.base module

class genrl.agents.deep.dqn.base.DQN(*args, max_epsilon: float = 1.0, min_epsilon: float = 0.01, epsilon_decay: int = 500, **kwargs)[source]

Bases: genrl.agents.deep.base.offpolicy.OffPolicyAgent

Base DQN Class

Paper: https://arxiv.org/abs/1312.5602
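
A minimal usage sketch is shown below. The environment name, the positional argument order (network type, then environment), and the keyword arguments are assumptions for illustration only; in practice experience collection and updates are typically driven by an off-policy trainer.

    import gym
    import torch

    from genrl.agents.deep.dqn.base import DQN

    # Hypothetical setup: "CartPole-v0" and the argument order are illustrative
    env = gym.make("CartPole-v0")
    agent = DQN("mlp", env, batch_size=64, gamma=0.99, epsilon_decay=500)

    # Refresh epsilon, then act on the current state
    agent.update_params_before_select_action(timestep=0)
    state = torch.as_tensor(env.reset(), dtype=torch.float32)
    action = agent.select_action(state)                       # epsilon-greedy
    greedy_action = agent.select_action(state, deterministic=True)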

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
create_model

Whether the model for the algorithm should be created when initialised

Type:bool
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
value_layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
calculate_epsilon_by_frame() → float[source]

Helper function to calculate epsilon after every timestep

Exponentially decays the exploration rate from max_epsilon to min_epsilon. The greater the value of epsilon_decay, the slower epsilon decreases.
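
A sketch of the schedule (the exact expression used internally may differ slightly):

    import math

    def calculate_epsilon_by_frame(timestep, max_epsilon=1.0, min_epsilon=0.01, epsilon_decay=500):
        # Starts at max_epsilon and decays exponentially towards min_epsilon;
        # a larger epsilon_decay stretches the schedule over more timesteps
        return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-1.0 * timestep / epsilon_decay)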

empty_logs() → None[source]

Empties logs

get_greedy_action(state: torch.Tensor) → torch.Tensor[source]

Greedy action selection

Parameters:state (torch.Tensor) – Current state of the environment
Returns:Action taken by the agent
Return type:action (torch.Tensor)
get_hyperparams() → Dict[str, Any][source]

Get relevant hyperparameters to save

Returns:Hyperparameters to be saved, including the neural network weights (torch.Tensor)
Return type:hyperparams (dict)
get_logging_params() → Dict[str, Any][source]

Gets relevant parameters for logging

Returns:Logging parameters for monitoring training
Return type:logs (dict)
get_q_values(states: torch.Tensor, actions: torch.Tensor) → torch.Tensor[source]

Get Q values corresponding to specific states and actions

Parameters:
  • states (torch.Tensor) – States for which Q-values need to be found
  • actions (torch.Tensor) – Actions taken at respective states
Returns:

Q values for the given states and actions

Return type:

q_values (torch.Tensor)

get_target_q_values(next_states: torch.Tensor, rewards: List[float], dones: List[bool]) → torch.Tensor[source]

Get target Q values for the DQN

Parameters:
  • next_states (torch.Tensor) – Next states for which target Q-values need to be found
  • rewards (list) – Rewards at each timestep for each environment
  • dones (list) – Game over status for each environment
Returns:

Target Q values for the DQN

Return type:

target_q_values (torch.Tensor)
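
For the vanilla DQN this is the standard one-step TD target. A sketch, assuming rewards and dones are already tensors and the target network is an attribute named target_model:

    import torch

    def td_target(target_model, next_states, rewards, dones, gamma=0.99):
        with torch.no_grad():
            # Greedy value of the next state under the target network
            next_q = target_model(next_states).max(dim=-1).values
        # Terminal transitions bootstrap with zero
        return rewards + gamma * (1 - dones.float()) * next_q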

load_weights(weights) → None[source]

Load weights for the agent from a pretrained model

Parameters:weights (torch.Tensor) – Neural network weights
select_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor[source]

Select action given state

Epsilon-greedy action selection

Parameters:
  • state (torch.Tensor) – Current state of the environment
  • deterministic (bool) – Whether the action should be chosen greedily (deterministic) or epsilon-greedily (stochastic)
Returns:

Action taken by the agent

Return type:

action (torch.Tensor)
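
Conceptually the selection looks like the sketch below; the epsilon attribute and the env.action_space.sample() call are assumptions (the latter is the standard Gym API, which GenRL's environment wrapper may expose differently):

    import random

    def epsilon_greedy_action(agent, state, deterministic=False):
        # Greedy action unless we are still exploring
        if deterministic or random.random() > agent.epsilon:
            return agent.get_greedy_action(state)
        # Random exploratory action
        return agent.env.action_space.sample()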

update_params(update_interval: int) → None[source]

Update parameters of the model

Parameters:update_interval (int) – Interval between successive updates of the target model
update_params_before_select_action(timestep: int) → None[source]

Update necessary parameters before selecting an action

This updates the epsilon (exploration rate) of the agent every timestep

Parameters:timestep (int) – Timestep of training
update_target_model() → None[source]

Function to update the target Q model

Updates the target model with the training model’s weights when called
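
In PyTorch terms this is a hard update, i.e. copying the online network's parameters into the target network. A sketch, assuming the two networks are attributes named model and target_model:

    def update_target_model(agent):
        # Hard update: overwrite the target network with the online weights
        agent.target_model.load_state_dict(agent.model.state_dict())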

genrl.agents.deep.dqn.categorical module

class genrl.agents.deep.dqn.categorical.CategoricalDQN(*args, noisy_layers: Tuple = (32, 128), num_atoms: int = 51, v_min: int = -10, v_max: int = 10, **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Categorical DQN Algorithm

Paper: https://arxiv.org/pdf/1707.06887.pdf

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
create_model

Whether the model for the algorithm should be created when initialised

Type:bool
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
noisy_layers

Noisy layers in the Neural Network of the Q-value function

Type:tuple of int
num_atoms

Number of atoms used in the discrete distribution

Type:int
v_min

Lower bound of value distribution

Type:int
v_max

Upper bound of value distribution

Type:int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
get_greedy_action(state: torch.Tensor) → torch.Tensor[source]

Greedy action selection

Parameters:state (torch.Tensor) – Current state of the environment
Returns:Action taken by the agent
Return type:action (torch.Tensor)
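
For the categorical agent the greedy action maximises the expected value of each action's atom distribution. A sketch, assuming the network outputs per-action probabilities over the atoms on the last dimension:

    import torch

    def categorical_greedy_action(dist, v_min=-10.0, v_max=10.0, num_atoms=51):
        # dist: probabilities over atoms, shape (batch, num_actions, num_atoms)
        support = torch.linspace(v_min, v_max, num_atoms)
        q_values = (dist * support).sum(dim=-1)   # expected return per action
        return q_values.argmax(dim=-1)
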
get_q_loss(batch: collections.namedtuple)[source]

Categorical DQN loss function to calculate the loss of the Q-function

Parameters:batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns:Calculated loss of the Q-function
Return type:loss (torch.Tensor)
get_q_values(states: torch.Tensor, actions: torch.Tensor)[source]

Get Q values corresponding to specific states and actions

Parameters:
  • states (torch.Tensor) – States for which Q-values need to be found
  • actions (torch.Tensor) – Actions taken at respective states
Returns:

Q values for the given states and actions

Return type:

q_values (torch.Tensor)

get_target_q_values(next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor)[source]

Projected Distribution of Q-values

Helper function for Categorical/Distributional DQN

Parameters:
  • next_states (torch.Tensor) – Next states being encountered by the agent
  • rewards (torch.Tensor) – Rewards received by the agent
  • dones (torch.Tensor) – Game over status of each environment
Returns:

Projected Q-value Distribution or Target Q Values

Return type:

target_q_values (object)
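
The projection applies the Bellman operator to every atom of the support and then redistributes each atom's probability mass onto its two nearest support points. A condensed sketch (names, shapes and edge-case handling are illustrative, not GenRL's exact implementation):

    import torch

    def project_distribution(dist_next, rewards, dones, gamma=0.99,
                             v_min=-10.0, v_max=10.0, num_atoms=51):
        # dist_next: target-network probabilities for the greedy next actions,
        # shape (batch, num_atoms); rewards and dones have shape (batch,)
        batch_size = dist_next.size(0)
        delta_z = (v_max - v_min) / (num_atoms - 1)
        support = torch.linspace(v_min, v_max, num_atoms)

        # Bellman update of every atom, clamped to the fixed support
        tz = rewards.unsqueeze(1) + gamma * (1 - dones.float().unsqueeze(1)) * support
        tz = tz.clamp(v_min, v_max)

        # Fractional index of each updated atom on the support
        b = (tz - v_min) / delta_z
        lower, upper = b.floor().long(), b.ceil().long()
        # Preserve mass when an updated atom lands exactly on a support point
        lower = torch.where((lower == upper) & (upper > 0), lower - 1, lower)
        upper = torch.where((lower == upper) & (lower < num_atoms - 1), upper + 1, upper)

        # Split each atom's probability between its two neighbouring support points
        projected = torch.zeros(batch_size, num_atoms)
        projected.scatter_add_(1, lower, dist_next * (upper.float() - b))
        projected.scatter_add_(1, upper, dist_next * (b - lower.float()))
        return projected

The projected distribution then serves as the target in a cross-entropy loss against the online network's predicted distribution for the actions that were taken.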

genrl.agents.deep.dqn.double module

class genrl.agents.deep.dqn.double.DoubleDQN(*args, **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Double DQN Class

Paper: https://arxiv.org/abs/1509.06461

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
get_target_q_values(next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor) → torch.Tensor[source]

Get target Q values for the Double DQN

Parameters:
  • next_states (torch.Tensor) – Next states for which target Q-values need to be found
  • rewards (torch.Tensor) – Rewards at each timestep for each environment
  • dones (torch.Tensor) – Game over status for each environment
Returns:

Target Q values for the DQN

Return type:

target_q_values (torch.Tensor)
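
Double DQN decouples action selection from evaluation: the online network picks the greedy next action and the target network values it. A sketch, assuming the networks are attributes named model and target_model:

    import torch

    def double_q_target(agent, next_states, rewards, dones, gamma=0.99):
        with torch.no_grad():
            # Online network chooses the action ...
            next_actions = agent.model(next_states).argmax(dim=-1, keepdim=True)
            # ... target network evaluates it
            next_q = agent.target_model(next_states).gather(-1, next_actions).squeeze(-1)
        return rewards + gamma * (1 - dones.float()) * next_q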

genrl.agents.deep.dqn.dueling module

class genrl.agents.deep.dqn.dueling.DuelingDQN(*args, **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Dueling DQN class

Paper: https://arxiv.org/abs/1511.06581

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
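
Dueling DQN changes only the network architecture: the Q-value is assembled from a state-value stream and an advantage stream, with the mean advantage subtracted for identifiability. A sketch of the aggregation (layer names and sizes are illustrative):

    import torch
    import torch.nn as nn

    class DuelingHead(nn.Module):
        def __init__(self, feature_dim: int, num_actions: int):
            super().__init__()
            self.value = nn.Linear(feature_dim, 1)                 # V(s)
            self.advantage = nn.Linear(feature_dim, num_actions)   # A(s, a)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            value = self.value(features)
            advantage = self.advantage(features)
            # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
            return value + advantage - advantage.mean(dim=-1, keepdim=True)

Subtracting the mean advantage keeps the decomposition identifiable, since adding a constant to the advantages and removing it from the value would otherwise leave Q unchanged.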

genrl.agents.deep.dqn.noisy module

class genrl.agents.deep.dqn.noisy.NoisyDQN(*args, noisy_layers: Tuple = (128, 128), **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Noisy DQN Algorithm

Paper: https://arxiv.org/abs/1706.10295

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
noisy_layers

Noisy layers in the Neural Network of the Q-value function

Type:tuple of int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
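
Noisy DQN replaces (or supplements) epsilon-greedy exploration with learned parametric noise in the network's linear layers. A condensed sketch of an independent-Gaussian noisy linear layer (the paper also describes a factorised variant; parameter names here are illustrative):

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoisyLinear(nn.Module):
        """y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b)"""

        def __init__(self, in_features: int, out_features: int, sigma_init: float = 0.017):
            super().__init__()
            bound = 1.0 / math.sqrt(in_features)
            self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).uniform_(-bound, bound))
            self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init))
            self.bias_mu = nn.Parameter(torch.empty(out_features).uniform_(-bound, bound))
            self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Fresh noise on each forward pass; the learned sigmas let the
            # network adapt how much exploration noise it injects
            weight = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_sigma)
            bias = self.bias_mu + self.bias_sigma * torch.randn_like(self.bias_sigma)
            return F.linear(x, weight, bias)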

genrl.agents.deep.dqn.prioritized module

class genrl.agents.deep.dqn.prioritized.PrioritizedReplayDQN(*args, alpha: float = 0.6, beta: float = 0.4, **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Prioritized Replay DQN Class

Paper: https://arxiv.org/abs/1511.05952

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
alpha

Prioritization constant

Type:float
beta

Importance Sampling bias

Type:float
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
get_q_loss(batch: collections.namedtuple) → torch.Tensor[source]

Function to calculate the loss of the Q-function for the Prioritized Replay DQN

Parameters:batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns:Calculated loss of the Q-function
Return type:loss (torch.Tensor)
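
alpha controls how strongly TD-error magnitudes skew the sampling distribution, while beta controls how much the importance-sampling weights correct for that skew. A minimal sketch of the two quantities (an illustration, not GenRL's buffer implementation):

    import torch

    def sampling_probabilities(priorities: torch.Tensor, alpha: float = 0.6) -> torch.Tensor:
        # P(i) = p_i^alpha / sum_j p_j^alpha ; alpha = 0 recovers uniform sampling
        scaled = priorities ** alpha
        return scaled / scaled.sum()

    def importance_weights(probs: torch.Tensor, beta: float = 0.4) -> torch.Tensor:
        # w_i = (N * P(i))^(-beta), normalised by the largest weight for stability
        weights = (len(probs) * probs) ** (-beta)
        return weights / weights.max()

After each update the absolute TD errors are typically written back to the buffer as the new priorities.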

genrl.agents.deep.dqn.utils module

genrl.agents.deep.dqn.utils.categorical_greedy_action(agent: genrl.agents.deep.dqn.base.DQN, state: torch.Tensor) → torch.Tensor[source]

Greedy action selection for Categorical DQN

Parameters:
  • agent (DQN) – The agent
  • state (torch.Tensor) – Current state of the environment
Returns:

Action taken by the agent

Return type:

action (torch.Tensor)

genrl.agents.deep.dqn.utils.categorical_q_loss(agent: genrl.agents.deep.dqn.base.DQN, batch: collections.namedtuple)[source]

Categorical DQN loss function to calculate the loss of the Q-function

Parameters:
  • agent (DQN) – The agent
  • batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns:

Calculated loss of the Q-function

Return type:

loss (torch.Tensor)

genrl.agents.deep.dqn.utils.categorical_q_target(agent: genrl.agents.deep.dqn.base.DQN, next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor)[source]

Projected Distribution of Q-values

Helper function for Categorical/Distributional DQN

Parameters:
  • agent (DQN) – The agent
  • next_states (torch.Tensor) – Next states being encountered by the agent
  • rewards (torch.Tensor) – Rewards received by the agent
  • dones (torch.Tensor) – Game over status of each environment
Returns:

Projected Q-value Distribution or Target Q Values

Return type:

target_q_values (object)

genrl.agents.deep.dqn.utils.categorical_q_values(agent: genrl.agents.deep.dqn.base.DQN, states: torch.Tensor, actions: torch.Tensor)[source]

Get Q values corresponding to specific states and actions for a Categorical DQN

Parameters:
  • agent (DQN) – The agent
  • states (torch.Tensor) – States being replayed
  • actions (torch.Tensor) – Actions being replayed
Returns:

Q values for the given states and actions

Return type:

q_values (torch.Tensor)

genrl.agents.deep.dqn.utils.ddqn_q_target(agent: genrl.agents.deep.dqn.base.DQN, next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor) → torch.Tensor[source]

Double Q-learning target

Can be used to replace the get_target_q_values method of the base DQN class in any DQN algorithm

Parameters:
  • agent (DQN) – The agent
  • next_states (torch.Tensor) – Next states being encountered by the agent
  • rewards (torch.Tensor) – Rewards received by the agent
  • dones (torch.Tensor) – Game over status of each environment
Returns:

Target Q values using Double Q-learning

Return type:

target_q_values (torch.Tensor)

genrl.agents.deep.dqn.utils.prioritized_q_loss(agent: genrl.agents.deep.dqn.base.DQN, batch: collections.namedtuple)[source]

Function to calculate the loss of the Q-function for the Prioritized Replay DQN

Parameters:
  • agent (DQN) – The agent
  • batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns:Calculated loss of the Q-function
Return type:loss (torch.Tensor)