DQN

genrl.agents.deep.dqn.base module

class genrl.agents.deep.dqn.base.DQN(*args, max_epsilon: float = 1.0, min_epsilon: float = 0.01, epsilon_decay: int = 500, **kwargs)[source]

Bases: genrl.agents.deep.base.offpolicy.OffPolicyAgent

Base DQN Class

Paper: https://arxiv.org/abs/1312.5602
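
A minimal usage sketch is shown below. The environment name, the positional argument order (network type, then environment), and the keyword arguments are assumptions for illustration only; in practice experience collection and updates are typically driven by an off-policy trainer.

    import gym
    import torch

    from genrl.agents.deep.dqn.base import DQN

    # Hypothetical setup: "CartPole-v0" and the argument order are illustrative
    env = gym.make("CartPole-v0")
    agent = DQN("mlp", env, batch_size=64, gamma=0.99, epsilon_decay=500)

    # Refresh epsilon, then act on the current state
    agent.update_params_before_select_action(timestep=0)
    state = torch.as_tensor(env.reset(), dtype=torch.float32)
    action = agent.select_action(state)                       # epsilon-greedy
    greedy_action = agent.select_action(state, deterministic=True)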

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
create_model

Whether the model for the algorithm should be created when initialised

Type:bool
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
value_layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
calculate_epsilon_by_frame() → float[source]

Helper function to calculate epsilon after every timestep

Exponentially decays the exploration rate from max_epsilon to min_epsilon. The greater the value of epsilon_decay, the slower epsilon decreases.
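
A sketch of the schedule (the exact expression used internally may differ slightly):

    import math

    def calculate_epsilon_by_frame(timestep, max_epsilon=1.0, min_epsilon=0.01, epsilon_decay=500):
        # Starts at max_epsilon and decays exponentially towards min_epsilon;
        # a larger epsilon_decay stretches the schedule over more timesteps
        return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-1.0 * timestep / epsilon_decay)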

empty_logs() → None[source]

Empties logs

get_greedy_action(state: torch.Tensor) → torch.Tensor[source]

Greedy action selection

Parameters:state (torch.Tensor) – Current state of the environment
Returns:Action taken by the agent
Return type:action (torch.Tensor)
get_hyperparams() → Dict[str, Any][source]

Get relevant hyperparameters to save

Returns:Hyperparameters to be saved, including the neural network weights (torch.Tensor)
Return type:hyperparams (dict)
get_logging_params() → Dict[str, Any][source]

Gets relevant parameters for logging

Returns:Logging parameters for monitoring training
Return type:logs (dict)
get_q_values(states: torch.Tensor, actions: torch.Tensor) → torch.Tensor[source]

Get Q values corresponding to specific states and actions

Parameters:
  • states (torch.Tensor) – States for which Q-values need to be found
  • actions (torch.Tensor) – Actions taken at respective states
Returns:

Q values for the given states and actions

Return type:

q_values (torch.Tensor)

get_target_q_values(next_states: torch.Tensor, rewards: List[float], dones: List[bool]) → torch.Tensor[source]

Get target Q values for the DQN

Parameters:
  • next_states (torch.Tensor) – Next states for which target Q-values need to be found
  • rewards (list) – Rewards at each timestep for each environment
  • dones (list) – Game over status for each environment
Returns:

Target Q values for the DQN

Return type:

target_q_values (torch.Tensor)
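
For the vanilla DQN this is the standard one-step TD target. A sketch, assuming rewards and dones are already tensors and the target network is an attribute named target_model:

    import torch

    def td_target(target_model, next_states, rewards, dones, gamma=0.99):
        with torch.no_grad():
            # Greedy value of the next state under the target network
            next_q = target_model(next_states).max(dim=-1).values
        # Terminal transitions bootstrap with zero
        return rewards + gamma * (1 - dones.float()) * next_q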

load_weights(weights) → None[source]

Load weights for the agent from a pretrained model

Parameters:weights (torch.Tensor) – Neural network weights
select_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor[source]

Select action given state

Epsilon-greedy action selection

Parameters:
  • state (torch.Tensor) – Current state of the environment
  • deterministic (bool) – Whether the action should be chosen greedily (deterministic) or epsilon-greedily (stochastic)
Returns:

Action taken by the agent

Return type:

action (torch.Tensor)
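
Conceptually the selection looks like the sketch below; the epsilon attribute and the env.action_space.sample() call are assumptions (the latter is the standard Gym API, which GenRL's environment wrapper may expose differently):

    import random

    def epsilon_greedy_action(agent, state, deterministic=False):
        # Greedy action unless we are still exploring
        if deterministic or random.random() > agent.epsilon:
            return agent.get_greedy_action(state)
        # Random exploratory action
        return agent.env.action_space.sample()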

update_params(update_interval: int) → None[source]

Update parameters of the model

Parameters:update_interval (int) – Interval between successive updates of the target model
update_params_before_select_action(timestep: int) → None[source]

Update necessary parameters before selecting an action

This updates the epsilon (exploration rate) of the agent every timestep

Parameters:timestep (int) – Timestep of training
update_target_model() → None[source]

Function to update the target Q model

Updates the target model with the training model’s weights when called
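
In PyTorch terms this is a hard update, i.e. copying the online network's parameters into the target network. A sketch, assuming the two networks are attributes named model and target_model:

    def update_target_model(agent):
        # Hard update: overwrite the target network with the online weights
        agent.target_model.load_state_dict(agent.model.state_dict())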

genrl.agents.deep.dqn.categorical module

class genrl.agents.deep.dqn.categorical.CategoricalDQN(*args, noisy_layers: Tuple = (32, 128), num_atoms: int = 51, v_min: int = -10, v_max: int = 10, **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Categorical DQN Algorithm

Paper: https://arxiv.org/pdf/1707.06887.pdf

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
create_model

Whether the model for the algorithm should be created when initialised

Type:bool
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
noisy_layers

Noisy layers in the Neural Network of the Q-value function

Type:tuple of int
num_atoms

Number of atoms used in the discrete distribution

Type:int
v_min

Lower bound of value distribution

Type:int
v_max

Upper bound of value distribution

Type:int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
get_greedy_action(state: torch.Tensor) → torch.Tensor[source]

Greedy action selection

Parameters:state (torch.Tensor) – Current state of the environment
Returns:Action taken by the agent
Return type:action (torch.Tensor)
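
For the categorical agent the greedy action maximises the expected value of each action's atom distribution. A sketch, assuming the network outputs per-action probabilities over the atoms on the last dimension:

    import torch

    def categorical_greedy_action(dist, v_min=-10.0, v_max=10.0, num_atoms=51):
        # dist: probabilities over atoms, shape (batch, num_actions, num_atoms)
        support = torch.linspace(v_min, v_max, num_atoms)
        q_values = (dist * support).sum(dim=-1)   # expected return per action
        return q_values.argmax(dim=-1)
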
get_q_loss(batch: collections.namedtuple)[source]

Categorical DQN loss function to calculate the loss of the Q-function

Parameters:batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns:Calculated loss of the Q-function
Return type:loss (torch.Tensor)
get_q_values(states: torch.Tensor, actions: torch.Tensor)[source]

Get Q values corresponding to specific states and actions

Parameters:
  • states (torch.Tensor) – States for which Q-values need to be found
  • actions (torch.Tensor) – Actions taken at respective states
Returns:

Q values for the given states and actions

Return type:

q_values (torch.Tensor)

get_target_q_values(next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor)[source]

Projected Distribution of Q-values

Helper function for Categorical/Distributional DQN

Parameters:
  • next_states (torch.Tensor) – Next states being encountered by the agent
  • rewards (torch.Tensor) – Rewards received by the agent
  • dones (torch.Tensor) – Game over status of each environment
Returns:

Projected Q-value Distribution or Target Q Values

Return type:

target_q_values (object)
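
The projection applies the Bellman operator to every atom of the support and then redistributes each atom's probability mass onto its two nearest support points. A condensed sketch (names, shapes and edge-case handling are illustrative, not GenRL's exact implementation):

    import torch

    def project_distribution(dist_next, rewards, dones, gamma=0.99,
                             v_min=-10.0, v_max=10.0, num_atoms=51):
        # dist_next: target-network probabilities for the greedy next actions,
        # shape (batch, num_atoms); rewards and dones have shape (batch,)
        batch_size = dist_next.size(0)
        delta_z = (v_max - v_min) / (num_atoms - 1)
        support = torch.linspace(v_min, v_max, num_atoms)

        # Bellman update of every atom, clamped to the fixed support
        tz = rewards.unsqueeze(1) + gamma * (1 - dones.float().unsqueeze(1)) * support
        tz = tz.clamp(v_min, v_max)

        # Fractional index of each updated atom on the support
        b = (tz - v_min) / delta_z
        lower, upper = b.floor().long(), b.ceil().long()
        # Preserve mass when an updated atom lands exactly on a support point
        lower = torch.where((lower == upper) & (upper > 0), lower - 1, lower)
        upper = torch.where((lower == upper) & (lower < num_atoms - 1), upper + 1, upper)

        # Split each atom's probability between its two neighbouring support points
        projected = torch.zeros(batch_size, num_atoms)
        projected.scatter_add_(1, lower, dist_next * (upper.float() - b))
        projected.scatter_add_(1, upper, dist_next * (b - lower.float()))
        return projected

The projected distribution then serves as the target in a cross-entropy loss against the online network's predicted distribution for the actions that were taken.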

genrl.agents.deep.dqn.double module

class genrl.agents.deep.dqn.double.DoubleDQN(*args, **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Double DQN Class

Paper: https://arxiv.org/abs/1509.06461

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
get_target_q_values(next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor) → torch.Tensor[source]

Get target Q values for the Double DQN

Parameters:
  • next_states (torch.Tensor) – Next states for which target Q-values need to be found
  • rewards (torch.Tensor) – Rewards at each timestep for each environment
  • dones (torch.Tensor) – Game over status for each environment
Returns:

Target Q values for the DQN

Return type:

target_q_values (torch.Tensor)
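
Double DQN decouples action selection from evaluation: the online network picks the greedy next action and the target network values it. A sketch, assuming the networks are attributes named model and target_model:

    import torch

    def double_q_target(agent, next_states, rewards, dones, gamma=0.99):
        with torch.no_grad():
            # Online network chooses the action ...
            next_actions = agent.model(next_states).argmax(dim=-1, keepdim=True)
            # ... target network evaluates it
            next_q = agent.target_model(next_states).gather(-1, next_actions).squeeze(-1)
        return rewards + gamma * (1 - dones.float()) * next_q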

genrl.agents.deep.dqn.dueling module

class genrl.agents.deep.dqn.dueling.DuelingDQN(*args, **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Dueling DQN class

Paper: https://arxiv.org/abs/1511.06581

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
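
Dueling DQN changes only the network architecture: the Q-value is assembled from a state-value stream and an advantage stream, with the mean advantage subtracted for identifiability. A sketch of the aggregation (layer names and sizes are illustrative):

    import torch
    import torch.nn as nn

    class DuelingHead(nn.Module):
        def __init__(self, feature_dim: int, num_actions: int):
            super().__init__()
            self.value = nn.Linear(feature_dim, 1)                 # V(s)
            self.advantage = nn.Linear(feature_dim, num_actions)   # A(s, a)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            value = self.value(features)
            advantage = self.advantage(features)
            # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
            return value + advantage - advantage.mean(dim=-1, keepdim=True)

Subtracting the mean advantage keeps the decomposition identifiable, since adding a constant to the advantages and removing it from the value would otherwise leave Q unchanged.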

genrl.agents.deep.dqn.noisy module

class genrl.agents.deep.dqn.noisy.NoisyDQN(*args, noisy_layers: Tuple = (128, 128), **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Noisy DQN Algorithm

Paper: https://arxiv.org/abs/1706.10295

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
noisy_layers

Noisy layers in the Neural Network of the Q-value function

Type:tuple of int
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
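
Noisy DQN replaces (or supplements) epsilon-greedy exploration with learned parametric noise in the network's linear layers. A condensed sketch of an independent-Gaussian noisy linear layer (the paper also describes a factorised variant; parameter names here are illustrative):

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoisyLinear(nn.Module):
        """y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b)"""

        def __init__(self, in_features: int, out_features: int, sigma_init: float = 0.017):
            super().__init__()
            bound = 1.0 / math.sqrt(in_features)
            self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).uniform_(-bound, bound))
            self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init))
            self.bias_mu = nn.Parameter(torch.empty(out_features).uniform_(-bound, bound))
            self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Fresh noise on each forward pass; the learned sigmas let the
            # network adapt how much exploration noise it injects
            weight = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_sigma)
            bias = self.bias_mu + self.bias_sigma * torch.randn_like(self.bias_sigma)
            return F.linear(x, weight, bias)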

genrl.agents.deep.dqn.prioritized module

class genrl.agents.deep.dqn.prioritized.PrioritizedReplayDQN(*args, alpha: float = 0.6, beta: float = 0.4, **kwargs)[source]

Bases: genrl.agents.deep.dqn.base.DQN

Prioritized Replay DQN Class

Paper: https://arxiv.org/abs/1511.05952

network

The network type of the Q-value function. Supported types: [“cnn”, “mlp”]

Type:str
env

The environment that the agent is supposed to act on

Type:Environment
batch_size

Mini batch size for loading experiences

Type:int
gamma

The discount factor for rewards

Type:float
layers

Layers in the Neural Network of the Q-value function

Type:tuple of int
lr_value

Learning rate for the Q-value function

Type:float
replay_size

Capacity of the Replay Buffer

Type:int
buffer_type

Choose the type of Buffer: [“push”, “prioritized”]

Type:str
max_epsilon

Maximum epsilon for exploration

Type:float
min_epsilon

Minimum epsilon for exploration

Type:float
epsilon_decay

Rate of decay of epsilon (in order to decrease exploration with time)

Type:int
alpha

Prioritization constant

Type:float
beta

Importance Sampling bias

Type:float
seed

Seed for randomness

Type:int
render

Should the env be rendered during training?

Type:bool
device

Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]

Type:str
get_q_loss(batch: collections.namedtuple) → torch.Tensor[source]

Function to calculate the loss of the Q-function for the Prioritized Replay DQN

Parameters:batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns:Calculated loss of the Q-function
Return type:loss (torch.Tensor)
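
alpha controls how strongly TD-error magnitudes skew the sampling distribution, while beta controls how much the importance-sampling weights correct for that skew. A minimal sketch of the two quantities (an illustration, not GenRL's buffer implementation):

    import torch

    def sampling_probabilities(priorities: torch.Tensor, alpha: float = 0.6) -> torch.Tensor:
        # P(i) = p_i^alpha / sum_j p_j^alpha ; alpha = 0 recovers uniform sampling
        scaled = priorities ** alpha
        return scaled / scaled.sum()

    def importance_weights(probs: torch.Tensor, beta: float = 0.4) -> torch.Tensor:
        # w_i = (N * P(i))^(-beta), normalised by the largest weight for stability
        weights = (len(probs) * probs) ** (-beta)
        return weights / weights.max()

After each update the absolute TD errors are typically written back to the buffer as the new priorities.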

genrl.agents.deep.dqn.utils module

genrl.agents.deep.dqn.utils.categorical_greedy_action(agent: genrl.agents.deep.dqn.base.DQN, state: torch.Tensor) → torch.Tensor[source]

Greedy action selection for Categorical DQN

Parameters:
  • agent (DQN) – The agent
  • state (torch.Tensor) – Current state of the environment
Returns:

Action taken by the agent

Return type:

action (torch.Tensor)

genrl.agents.deep.dqn.utils.categorical_q_loss(agent: genrl.agents.deep.dqn.base.DQN, batch: collections.namedtuple)[source]

Categorical DQN loss function to calculate the loss of the Q-function

Parameters:
  • agent (DQN) – The agent
  • batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns:

Calculated loss of the Q-function

Return type:

loss (torch.Tensor)

genrl.agents.deep.dqn.utils.categorical_q_target(agent: genrl.agents.deep.dqn.base.DQN, next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor)[source]

Projected Distribution of Q-values

Helper function for Categorical/Distributional DQN

Parameters:
  • agent (DQN) – The agent
  • next_states (torch.Tensor) – Next states being encountered by the agent
  • rewards (torch.Tensor) – Rewards received by the agent
  • dones (torch.Tensor) – Game over status of each environment
Returns:

Projected Q-value Distribution or Target Q Values

Return type:

target_q_values (object)

genrl.agents.deep.dqn.utils.categorical_q_values(agent: genrl.agents.deep.dqn.base.DQN, states: torch.Tensor, actions: torch.Tensor)[source]

Get Q values corresponding to specific states and actions for a Categorical DQN

Parameters:
  • agent (DQN) – The agent
  • states (torch.Tensor) – States being replayed
  • actions (torch.Tensor) – Actions being replayed
Returns:

Q values for the given states and actions

Return type:

q_values (torch.Tensor)

genrl.agents.deep.dqn.utils.ddqn_q_target(agent: genrl.agents.deep.dqn.base.DQN, next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor) → torch.Tensor[source]

Double Q-learning target

Can be used to replace the get_target_q_values method of the base DQN class in any DQN algorithm

Parameters:
  • agent (DQN) – The agent
  • next_states (torch.Tensor) – Next states being encountered by the agent
  • rewards (torch.Tensor) – Rewards received by the agent
  • dones (torch.Tensor) – Game over status of each environment
Returns:

Target Q values using Double Q-learning

Return type:

target_q_values (torch.Tensor)

genrl.agents.deep.dqn.utils.prioritized_q_loss(agent: genrl.agents.deep.dqn.base.DQN, batch: collections.namedtuple)[source]

Function to calculate the loss of the Q-function for the Prioritized Replay DQN

Parameters:
  • agent (DQN) – The agent
  • batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns:Calculated loss of the Q-function
Return type:loss (torch.Tensor)