DQN

genrl.agents.deep.dqn.base module

class genrl.agents.deep.dqn.base.DQN(*args, max_epsilon: float = 1.0, min_epsilon: float = 0.01, epsilon_decay: int = 500, **kwargs)
Bases: genrl.agents.deep.base.offpolicy.OffPolicyAgent

Base DQN Class
Paper: https://arxiv.org/abs/1312.5602
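A minimal usage sketch follows. It assumes a Gym-style discrete-action environment and that the network type and environment are passed positionally; any constructor argument not shown in the signature above is an assumption, not part of the documented API.

    import gym
    import torch

    from genrl.agents.deep.dqn.base import DQN

    # Assumption: a plain Gym environment is accepted here; genrl also ships
    # its own environment wrappers, which may be required in practice.
    env = gym.make("CartPole-v0")

    # "mlp" selects the fully connected Q-network; "cnn" is for image observations.
    agent = DQN("mlp", env, max_epsilon=1.0, min_epsilon=0.01, epsilon_decay=500)

    # select_action is documented to take a torch.Tensor state and return a
    # torch.Tensor action (epsilon-greedy unless deterministic=True).
    state = torch.as_tensor(env.reset(), dtype=torch.float32)
    action = agent.select_action(state, deterministic=False)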
Attributes:
- network (str) – The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
- env (Environment) – The environment that the agent is supposed to act on
- create_model (bool) – Whether the model of the algorithm should be created when initialised
- batch_size (int) – Mini-batch size for loading experiences
- gamma (float) – The discount factor for rewards
- value_layers (tuple of int) – Layers in the neural network of the Q-value function
- lr_value (float) – Learning rate for the Q-value function
- replay_size (int) – Capacity of the replay buffer
- buffer_type (str) – Type of replay buffer: [“push”, “prioritized”]
- max_epsilon (float) – Maximum epsilon for exploration
- min_epsilon (float) – Minimum epsilon for exploration
- epsilon_decay (int) – Rate of decay of epsilon (in order to decrease exploration with time)
- seed (int) – Seed for randomness
- render (bool) – Whether the environment should be rendered during training
- device (str) – Hardware used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
- calculate_epsilon_by_frame() → float
  Helper function to calculate epsilon after every timestep.
  Exponentially decays the exploration rate from max_epsilon to min_epsilon; the greater the value of epsilon_decay, the slower the decrease in epsilon.
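As a rough illustration of this schedule, a common exponential form is shown below. The exact expression used by the implementation is an assumption; only the decay from max_epsilon to min_epsilon governed by epsilon_decay is documented.

    import math

    def epsilon_by_frame(timestep: int,
                         max_epsilon: float = 1.0,
                         min_epsilon: float = 0.01,
                         epsilon_decay: int = 500) -> float:
        """Exponentially decay epsilon from max_epsilon towards min_epsilon."""
        return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-timestep / epsilon_decay)

    # epsilon_by_frame(0) == 1.0, epsilon_by_frame(500) ~= 0.37, large timesteps -> ~0.01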
- get_greedy_action(state: torch.Tensor) → torch.Tensor
  Greedy action selection.
  Parameters: state (torch.Tensor) – Current state of the environment
  Returns: Action taken by the agent
  Return type: action (torch.Tensor)
- get_hyperparams() → Dict[str, Any]
  Get relevant hyperparameters to save.
  Returns: Hyperparameters to be saved, including the neural network weights (torch.Tensor)
  Return type: hyperparams (dict)
- get_logging_params() → Dict[str, Any]
  Gets relevant parameters for logging.
  Returns: Logging parameters for monitoring training
  Return type: logs (dict)
- get_q_values(states: torch.Tensor, actions: torch.Tensor) → torch.Tensor
  Get Q values corresponding to specific states and actions.
  Parameters:
    - states (torch.Tensor) – States for which Q-values need to be found
    - actions (torch.Tensor) – Actions taken at the respective states
  Returns: Q values for the given states and actions
  Return type: q_values (torch.Tensor)
- get_target_q_values(next_states: torch.Tensor, rewards: List[float], dones: List[bool]) → torch.Tensor
  Get target Q values for the DQN.
  Parameters:
    - next_states (torch.Tensor) – Next states for which target Q-values need to be found
    - rewards (list) – Rewards at each timestep for each environment
    - dones (list) – Game over status for each environment
  Returns: Target Q values for the DQN
  Return type: target_q_values (torch.Tensor)
- load_weights(weights) → None
  Load weights for the agent from a pretrained model.
  Parameters: weights (torch.Tensor) – Neural network weights
- select_action(state: torch.Tensor, deterministic: bool = False) → torch.Tensor
  Select an action given a state, using epsilon-greedy action selection.
  Parameters:
    - state (torch.Tensor) – Current state of the environment
    - deterministic (bool) – Whether the policy should be deterministic or stochastic
  Returns: Action taken by the agent
  Return type: action (torch.Tensor)
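The epsilon-greedy rule behind select_action can be sketched as follows. This is a simplified illustration, not the library's exact code; epsilon is assumed to come from calculate_epsilon_by_frame, and num_actions is assumed to be known from the environment.

    import random
    import torch

    def epsilon_greedy(agent, state: torch.Tensor, epsilon: float,
                       num_actions: int, deterministic: bool = False) -> torch.Tensor:
        """With probability epsilon pick a random action, otherwise the greedy one."""
        if not deterministic and random.random() < epsilon:
            return torch.tensor(random.randrange(num_actions))
        return agent.get_greedy_action(state)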
- update_params(update_interval: int) → None
  Update parameters of the model.
  Parameters: update_interval (int) – Interval between successive updates of the target model
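The target-model update referred to above is conventionally a hard copy of the online network's weights every update_interval timesteps. The sketch below illustrates that pattern only; the attribute names model and target_model, and the use of a hard (rather than soft) update, are assumptions about the implementation.

    def maybe_sync_target(model, target_model, timestep: int, update_interval: int) -> None:
        """Hard-update the target network every `update_interval` timesteps."""
        if timestep % update_interval == 0:
            target_model.load_state_dict(model.state_dict())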
genrl.agents.deep.dqn.categorical module

class genrl.agents.deep.dqn.categorical.CategoricalDQN(*args, noisy_layers: Tuple = (32, 128), num_atoms: int = 51, v_min: int = -10, v_max: int = 10, **kwargs)
Bases: genrl.agents.deep.dqn.base.DQN

Categorical DQN Algorithm
Paper: https://arxiv.org/pdf/1707.06887.pdf
Attributes:
- network (str) – The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
- env (Environment) – The environment that the agent is supposed to act on
- create_model (bool) – Whether the model of the algorithm should be created when initialised
- batch_size (int) – Mini-batch size for loading experiences
- gamma (float) – The discount factor for rewards
- layers (tuple of int) – Layers in the neural network of the Q-value function
- lr_value (float) – Learning rate for the Q-value function
- replay_size (int) – Capacity of the replay buffer
- buffer_type (str) – Type of replay buffer: [“push”, “prioritized”]
- max_epsilon (float) – Maximum epsilon for exploration
- min_epsilon (float) – Minimum epsilon for exploration
- epsilon_decay (int) – Rate of decay of epsilon (in order to decrease exploration with time)
- noisy_layers (tuple of int) – Noisy layers in the neural network of the Q-value function
- num_atoms (int) – Number of atoms used in the discrete distribution
- v_min (int) – Lower bound of the value distribution
- v_max (int) – Upper bound of the value distribution
- seed (int) – Seed for randomness
- render (bool) – Whether the environment should be rendered during training
- device (str) – Hardware used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
- get_greedy_action(state: torch.Tensor) → torch.Tensor
  Greedy action selection.
  Parameters: state (torch.Tensor) – Current state of the environment
  Returns: Action taken by the agent
  Return type: action (torch.Tensor)
- get_q_loss(batch: collections.namedtuple)
  Categorical DQN loss function to calculate the loss of the Q-function.
  Parameters: batch (collections.namedtuple of torch.Tensor) – Batch of experiences
  Returns: Calculated loss of the Q-function
  Return type: loss (torch.Tensor)
- get_q_values(states: torch.Tensor, actions: torch.Tensor)
  Get Q values corresponding to specific states and actions.
  Parameters:
    - states (torch.Tensor) – States for which Q-values need to be found
    - actions (torch.Tensor) – Actions taken at the respective states
  Returns: Q values for the given states and actions
  Return type: q_values (torch.Tensor)
- get_target_q_values(next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor)
  Projected distribution of Q-values. Helper function for Categorical/Distributional DQN.
  Parameters:
    - next_states (torch.Tensor) – Next states being encountered by the agent
    - rewards (torch.Tensor) – Rewards received by the agent
    - dones (torch.Tensor) – Game over status of each environment
  Returns: Projected Q-value distribution or target Q values
  Return type: target_q_values (object)
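For context, the projection step of Categorical DQN (C51) maps the Bellman-updated support r + gamma * z back onto the fixed atoms between v_min and v_max, splitting each atom's probability mass between its two nearest neighbours. The sketch below illustrates that projection under stated assumptions: next_dist is assumed to be the target network's probabilities at the greedy next action, dones is assumed to be a 0/1 float tensor, and this is not the library's exact implementation.

    import torch

    def project_distribution(next_dist: torch.Tensor,   # (batch, num_atoms) probabilities
                             rewards: torch.Tensor,      # (batch,)
                             dones: torch.Tensor,        # (batch,) floats in {0, 1}
                             gamma: float = 0.99,
                             v_min: float = -10.0,
                             v_max: float = 10.0,
                             num_atoms: int = 51) -> torch.Tensor:
        """Project the Bellman-updated support r + gamma * z onto the fixed atoms."""
        delta_z = (v_max - v_min) / (num_atoms - 1)
        support = torch.linspace(v_min, v_max, num_atoms)

        # Bellman update of each atom, clamped to the supported range.
        tz = rewards.unsqueeze(1) + gamma * (1 - dones.unsqueeze(1)) * support
        tz = tz.clamp(v_min, v_max)

        # Fractional index of each updated atom on the fixed support.
        b = (tz - v_min) / delta_z
        lower, upper = b.floor().long(), b.ceil().long()
        # Keep lower != upper when b lands exactly on an atom, so no mass is lost.
        lower[(upper > 0) & (lower == upper)] -= 1
        upper[(lower < num_atoms - 1) & (lower == upper)] += 1

        # Split each atom's probability between its two neighbouring atoms.
        projected = torch.zeros_like(next_dist)
        projected.scatter_add_(1, lower, next_dist * (upper.float() - b))
        projected.scatter_add_(1, upper, next_dist * (b - lower.float()))
        return projected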
genrl.agents.deep.dqn.double module

class genrl.agents.deep.dqn.double.DoubleDQN(*args, **kwargs)
Bases: genrl.agents.deep.dqn.base.DQN

Double DQN Class
Paper: https://arxiv.org/abs/1509.06461
Attributes:
- network (str) – The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
- env (Environment) – The environment that the agent is supposed to act on
- batch_size (int) – Mini-batch size for loading experiences
- gamma (float) – The discount factor for rewards
- layers (tuple of int) – Layers in the neural network of the Q-value function
- lr_value (float) – Learning rate for the Q-value function
- replay_size (int) – Capacity of the replay buffer
- buffer_type (str) – Type of replay buffer: [“push”, “prioritized”]
- max_epsilon (float) – Maximum epsilon for exploration
- min_epsilon (float) – Minimum epsilon for exploration
- epsilon_decay (int) – Rate of decay of epsilon (in order to decrease exploration with time)
- seed (int) – Seed for randomness
- render (bool) – Whether the environment should be rendered during training
- device (str) – Hardware used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
- get_target_q_values(next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor) → torch.Tensor
  Get target Q values for the DQN.
  Parameters:
    - next_states (torch.Tensor) – Next states for which target Q-values need to be found
    - rewards (torch.Tensor) – Rewards at each timestep for each environment
    - dones (torch.Tensor) – Game over status for each environment
  Returns: Target Q values for the DQN
  Return type: target_q_values (torch.Tensor)
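The distinguishing step of Double DQN is that the online network selects the next action while the target network evaluates it, which reduces the overestimation bias of vanilla DQN. A minimal sketch of that target is shown below; model and target_model are assumed attribute names, dones is assumed to be a 0/1 float tensor, and this is not the library's exact code.

    import torch

    def double_q_target(model, target_model,
                        next_states: torch.Tensor,
                        rewards: torch.Tensor,
                        dones: torch.Tensor,
                        gamma: float = 0.99) -> torch.Tensor:
        """Double Q-learning target: online net picks the action, target net scores it."""
        with torch.no_grad():
            next_actions = model(next_states).argmax(dim=-1, keepdim=True)           # selection
            next_q = target_model(next_states).gather(-1, next_actions).squeeze(-1)  # evaluation
        return rewards + gamma * (1 - dones) * next_q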
genrl.agents.deep.dqn.dueling module

class genrl.agents.deep.dqn.dueling.DuelingDQN(*args, **kwargs)
Bases: genrl.agents.deep.dqn.base.DQN

Dueling DQN class
Paper: https://arxiv.org/abs/1511.06581
Attributes:
- network (str) – The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
- env (Environment) – The environment that the agent is supposed to act on
- batch_size (int) – Mini-batch size for loading experiences
- gamma (float) – The discount factor for rewards
- layers (tuple of int) – Layers in the neural network of the Q-value function
- lr_value (float) – Learning rate for the Q-value function
- replay_size (int) – Capacity of the replay buffer
- buffer_type (str) – Type of replay buffer: [“push”, “prioritized”]
- max_epsilon (float) – Maximum epsilon for exploration
- min_epsilon (float) – Minimum epsilon for exploration
- epsilon_decay (int) – Rate of decay of epsilon (in order to decrease exploration with time)
- seed (int) – Seed for randomness
- render (bool) – Whether the environment should be rendered during training
- device (str) – Hardware used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
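Dueling DQN changes only the Q-network architecture: a state-value head V(s) and an advantage head A(s, a) are estimated separately and recombined as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). A minimal sketch of that aggregation follows; the layer names and sizes are assumptions, not the library's exact network.

    import torch
    import torch.nn as nn

    class DuelingHead(nn.Module):
        """Combine value and advantage streams into Q-values."""

        def __init__(self, feature_dim: int, num_actions: int):
            super().__init__()
            self.value = nn.Linear(feature_dim, 1)                 # V(s)
            self.advantage = nn.Linear(feature_dim, num_actions)   # A(s, a)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            value = self.value(features)
            advantage = self.advantage(features)
            # Subtracting the mean advantage keeps V and A identifiable.
            return value + advantage - advantage.mean(dim=-1, keepdim=True)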
genrl.agents.deep.dqn.noisy module

class genrl.agents.deep.dqn.noisy.NoisyDQN(*args, noisy_layers: Tuple = (128, 128), **kwargs)
Bases: genrl.agents.deep.dqn.base.DQN

Noisy DQN Algorithm
Paper: https://arxiv.org/abs/1706.10295
Attributes:
- network (str) – The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
- env (Environment) – The environment that the agent is supposed to act on
- batch_size (int) – Mini-batch size for loading experiences
- gamma (float) – The discount factor for rewards
- layers (tuple of int) – Layers in the neural network of the Q-value function
- lr_value (float) – Learning rate for the Q-value function
- replay_size (int) – Capacity of the replay buffer
- buffer_type (str) – Type of replay buffer: [“push”, “prioritized”]
- max_epsilon (float) – Maximum epsilon for exploration
- min_epsilon (float) – Minimum epsilon for exploration
- epsilon_decay (int) – Rate of decay of epsilon (in order to decrease exploration with time)
- noisy_layers (tuple of int) – Noisy layers in the neural network of the Q-value function
- seed (int) – Seed for randomness
- render (bool) – Whether the environment should be rendered during training
- device (str) – Hardware used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
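Noisy DQN replaces epsilon-greedy exploration with learned parametric noise on the linear layers: each weight is mu + sigma * eps, where eps is freshly sampled noise and mu, sigma are trained. The sketch below shows the simplest (independent-noise) form of such a layer as an illustration of the idea; it is not the library's exact layer, and the paper's factorised-noise variant is omitted.

    import torch
    import torch.nn as nn

    class NoisyLinear(nn.Module):
        """Linear layer whose weights and biases carry learnable Gaussian noise."""

        def __init__(self, in_features: int, out_features: int, sigma_init: float = 0.017):
            super().__init__()
            self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
            self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init))
            self.bias_mu = nn.Parameter(torch.empty(out_features).uniform_(-0.1, 0.1))
            self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Fresh noise on each forward pass drives exploration without epsilon.
            weight = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_sigma)
            bias = self.bias_mu + self.bias_sigma * torch.randn_like(self.bias_sigma)
            return nn.functional.linear(x, weight, bias)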
genrl.agents.deep.dqn.prioritized module

class genrl.agents.deep.dqn.prioritized.PrioritizedReplayDQN(*args, alpha: float = 0.6, beta: float = 0.4, **kwargs)
Bases: genrl.agents.deep.dqn.base.DQN

Prioritized Replay DQN Class
Paper: https://arxiv.org/abs/1511.05952
Attributes:
- network (str) – The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
- env (Environment) – The environment that the agent is supposed to act on
- batch_size (int) – Mini-batch size for loading experiences
- gamma (float) – The discount factor for rewards
- layers (tuple of int) – Layers in the neural network of the Q-value function
- lr_value (float) – Learning rate for the Q-value function
- replay_size (int) – Capacity of the replay buffer
- buffer_type (str) – Type of replay buffer: [“push”, “prioritized”]
- max_epsilon (float) – Maximum epsilon for exploration
- min_epsilon (float) – Minimum epsilon for exploration
- epsilon_decay (int) – Rate of decay of epsilon (in order to decrease exploration with time)
- alpha (float) – Prioritization constant
- beta (float) – Importance-sampling bias correction exponent
- seed (int) – Seed for randomness
- render (bool) – Whether the environment should be rendered during training
- device (str) – Hardware used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
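In prioritized replay, a transition with priority p_i is sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha, and the induced bias is corrected with importance-sampling weights w_i = (N * P(i))^(-beta), usually normalised by their maximum. The snippet below is a small numeric illustration of those two formulas only, not the buffer's actual implementation.

    import numpy as np

    def sampling_probs_and_weights(priorities: np.ndarray, alpha: float = 0.6, beta: float = 0.4):
        """Return sampling probabilities P(i) and normalised importance-sampling weights."""
        scaled = priorities ** alpha
        probs = scaled / scaled.sum()                    # P(i) = p_i^alpha / sum_k p_k^alpha
        weights = (len(priorities) * probs) ** (-beta)   # w_i = (N * P(i))^(-beta)
        return probs, weights / weights.max()            # normalise so the largest weight is 1

    probs, weights = sampling_probs_and_weights(np.array([1.0, 0.5, 2.0, 0.1]))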
genrl.agents.deep.dqn.utils module

- genrl.agents.deep.dqn.utils.categorical_greedy_action(agent: genrl.agents.deep.dqn.base.DQN, state: torch.Tensor) → torch.Tensor
  Greedy action selection for Categorical DQN.
  Parameters:
    - agent (DQN) – The agent
    - state (torch.Tensor) – Current state of the environment
  Returns: Action taken by the agent
  Return type: action (torch.Tensor)
- genrl.agents.deep.dqn.utils.categorical_q_loss(agent: genrl.agents.deep.dqn.base.DQN, batch: collections.namedtuple)
  Categorical DQN loss function to calculate the loss of the Q-function.
  Parameters:
    - agent (DQN) – The agent
    - batch (collections.namedtuple of torch.Tensor) – Batch of experiences
  Returns: Calculated loss of the Q-function
  Return type: loss (torch.Tensor)
- genrl.agents.deep.dqn.utils.categorical_q_target(agent: genrl.agents.deep.dqn.base.DQN, next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor)
  Projected distribution of Q-values. Helper function for Categorical/Distributional DQN.
  Parameters:
    - agent (DQN) – The agent
    - next_states (torch.Tensor) – Next states being encountered by the agent
    - rewards (torch.Tensor) – Rewards received by the agent
    - dones (torch.Tensor) – Game over status of each environment
  Returns: Projected Q-value distribution or target Q values
  Return type: target_q_values (object)
- genrl.agents.deep.dqn.utils.categorical_q_values(agent: genrl.agents.deep.dqn.base.DQN, states: torch.Tensor, actions: torch.Tensor)
  Get Q values given states for a Categorical DQN.
  Parameters:
    - agent (DQN) – The agent
    - states (torch.Tensor) – States being replayed
    - actions (torch.Tensor) – Actions being replayed
  Returns: Q values for the given states and actions
  Return type: q_values (torch.Tensor)
- genrl.agents.deep.dqn.utils.ddqn_q_target(agent: genrl.agents.deep.dqn.base.DQN, next_states: torch.Tensor, rewards: torch.Tensor, dones: torch.Tensor) → torch.Tensor
  Double Q-learning target.
  Can be used to replace the get_target_q_values method of the base DQN class in any DQN algorithm.
  Parameters:
    - agent (DQN) – The agent
    - next_states (torch.Tensor) – Next states being encountered by the agent
    - rewards (torch.Tensor) – Rewards received by the agent
    - dones (torch.Tensor) – Game over status of each environment
  Returns: Target Q values using Double Q-learning
  Return type: target_q_values (torch.Tensor)