genrl.agents.deep.base package¶
Submodules¶
genrl.agents.deep.base.base module¶
-
class genrl.agents.deep.base.base.BaseAgent(network: Any, env: Any, create_model: bool = True, batch_size: int = 64, gamma: float = 0.99, policy_layers: Tuple = (64, 64), value_layers: Tuple = (64, 64), lr_policy: float = 0.0001, lr_value: float = 0.001, **kwargs)[source]¶
Bases: abc.ABC
Base Agent Class
-
network¶ The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
Type: str
-
env¶ The environment that the agent is supposed to act on
Type: Environment
-
create_model¶ Whether the model of the algo should be created when initialised
Type: bool
-
batch_size¶ Mini batch size for loading experiences
Type: int
-
gamma¶ The discount factor for rewards
Type: float
-
layers¶ Layers in the Neural Network of the Q-value function
Type: tuple of int
-
lr_policy¶ Learning rate for the policy/actor
Type: float
-
lr_value¶ Learning rate for the Q-value function
Type: float
-
seed¶ Seed for randomness
Type: int
-
render¶ Should the env be rendered during training?
Type: bool
-
device¶ Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
Type: str
-
get_hyperparams() → Dict[str, Any][source]¶ Get relevant hyperparameters to save
Returns: Hyperparameters to be saved
Return type: hyperparams (dict)
-
get_logging_params() → Dict[str, Any][source]¶ Gets relevant parameters for logging
Returns: Logging parameters for monitoring training
Return type: logs (dict)
-
load_weights(weights) → None[source]¶ Load weights for the agent from pretrained model
Parameters: weights (dict) – Dictionary of different neural net weights
-
select_action(state: numpy.ndarray, deterministic: bool = False) → numpy.ndarray[source]¶ Select action given state
Action selection method
Parameters:
- state (np.ndarray) – Current state of the environment
- deterministic (bool) – Whether the policy should be deterministic or stochastic
Returns: Action taken by the agent
Return type: action (np.ndarray)
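The interface above (an abstract base with `select_action` and `get_hyperparams`) can be illustrated with a small standard-library sketch. `SketchAgent` and `PreferenceAgent` are hypothetical names invented for this example and are not part of GenRL; the toy "state" is assumed to be a list of per-action preference scores so that no neural network is needed.

```python
import math
import random
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchAgent(ABC):
    """Hypothetical stand-in that mirrors the documented interface (not GenRL code)."""

    def __init__(self, batch_size: int = 64, gamma: float = 0.99,
                 lr_policy: float = 1e-4, lr_value: float = 1e-3, seed: int = 0):
        self.batch_size = batch_size
        self.gamma = gamma
        self.lr_policy = lr_policy
        self.lr_value = lr_value
        self.rng = random.Random(seed)  # seeded source of randomness

    @abstractmethod
    def select_action(self, state: List[float], deterministic: bool = False) -> int:
        """Return an action index for the given state."""

    def get_hyperparams(self) -> Dict[str, Any]:
        # Collect the settings one would typically want to save with a model.
        return {
            "batch_size": self.batch_size,
            "gamma": self.gamma,
            "lr_policy": self.lr_policy,
            "lr_value": self.lr_value,
        }


class PreferenceAgent(SketchAgent):
    """Assumption: the state is already a list of per-action scores."""

    def select_action(self, state: List[float], deterministic: bool = False) -> int:
        actions = list(range(len(state)))
        if deterministic:
            # Greedy choice: always the highest-scoring action.
            return max(actions, key=lambda a: state[a])
        # Stochastic choice: sample in proportion to softmax(scores).
        top = max(state)
        weights = [math.exp(s - top) for s in state]
        return self.rng.choices(actions, weights=weights, k=1)[0]
```

Because `select_action` is abstract, instantiating `SketchAgent` directly raises `TypeError`; only concrete subclasses such as `PreferenceAgent` can be created.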
-
genrl.agents.deep.base.offpolicy module¶
-
class genrl.agents.deep.base.offpolicy.OffPolicyAgent(*args, replay_size: int = 5000, buffer_type: str = 'push', **kwargs)[source]¶
Bases: genrl.agents.deep.base.base.BaseAgent
Off Policy Agent Base Class
-
network¶ The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
Type: str
-
env¶ The environment that the agent is supposed to act on
Type: Environment
-
create_model¶ Whether the model of the algo should be created when initialised
Type: bool
-
batch_size¶ Mini batch size for loading experiences
Type: int
-
gamma¶ The discount factor for rewards
Type: float
-
layers¶ Layers in the Neural Network of the Q-value function
Type: tuple of int
-
lr_policy¶ Learning rate for the policy/actor
Type: float
-
lr_value¶ Learning rate for the Q-value function
Type: float
-
replay_size¶ Capacity of the Replay Buffer
Type: int
-
buffer_type¶ Choose the type of Buffer: [“push”, “prioritized”]
Type: str
-
seed¶ Seed for randomness
Type: int
-
render¶ Should the env be rendered during training?
Type: bool
-
device¶ Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
Type: str
-
get_q_loss(batch: collections.namedtuple) → torch.Tensor[source]¶ Normal Function to calculate the loss of the Q-function or critic
Parameters: batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns: Calculated loss of the Q-function
Return type: loss (torch.Tensor)
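The reference does not spell out the loss itself. As a rough illustration, a common choice for a Q-function is the mean squared error between predicted Q-values and one-step bootstrapped targets. The sketch below assumes that form and uses plain Python lists instead of `torch.Tensor`; `q_learning_targets` and `mse_loss` are hypothetical helper names, not GenRL functions.

```python
from typing import List, Sequence


def q_learning_targets(rewards: Sequence[float],
                       next_q_values: Sequence[Sequence[float]],
                       dones: Sequence[bool],
                       gamma: float = 0.99) -> List[float]:
    """One-step targets: r + gamma * max_a' Q(s', a'), with no bootstrap at episode end."""
    targets = []
    for reward, next_q, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else max(next_q)
        targets.append(reward + gamma * bootstrap)
    return targets


def mse_loss(predicted: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted Q-values and their targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(predicted)


# Two transitions; the second one ends its episode.
targets = q_learning_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [1.0, 3.0]],
    dones=[False, True],
    gamma=0.5,
)
loss = mse_loss([2.0, 1.0], targets)
```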
-
sample_from_buffer(beta: float = None)[source]¶ Samples experiences from the buffer and converts them into usable formats
Parameters: beta (float) – Importance-Sampling beta for prioritized replay
Returns: Replay experiences sampled from the buffer
Return type: batch (list)
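To make `replay_size` and the "push" buffer type more concrete, here is a minimal sketch of a fixed-capacity buffer with uniform sampling. It is an assumption about how such a buffer typically behaves, not GenRL's implementation; `PushBufferSketch` and `Transition` are hypothetical names, and the prioritized variant (where `beta` would apply) is not covered.

```python
import random
from collections import deque, namedtuple

# Hypothetical record layout for one step of experience.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class PushBufferSketch:
    """Fixed-capacity FIFO buffer with uniform random sampling."""

    def __init__(self, capacity: int = 5000, seed=None):
        # A deque with maxlen silently drops the oldest item once full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done) -> None:
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size: int) -> Transition:
        # Sample without replacement, then regroup field-wise:
        # the result holds a tuple of states, a tuple of actions, and so on.
        batch = self.rng.sample(list(self.memory), batch_size)
        return Transition(*zip(*batch))

    def __len__(self) -> int:
        return len(self.memory)


buffer = PushBufferSketch(capacity=3, seed=0)
for step in range(5):
    buffer.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(batch_size=2)
```

With a capacity of 3 and five pushes, only the three most recent transitions remain available for sampling.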
-
-
class genrl.agents.deep.base.offpolicy.OffPolicyAgentAC(*args, polyak=0.995, **kwargs)[source]¶
Bases: genrl.agents.deep.base.offpolicy.OffPolicyAgent
Off Policy Agent Base Class
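The `polyak` argument in the signature is not described on this page. In off-policy actor-critic methods a coefficient with this name usually controls a soft (Polyak-averaged) update of target-network parameters. The sketch below assumes the convention `target = polyak * target + (1 - polyak) * online`, which may differ from what GenRL does; `polyak_update` is a hypothetical helper operating on plain lists of floats.

```python
from typing import List, Sequence


def polyak_update(target_params: Sequence[float],
                  online_params: Sequence[float],
                  polyak: float = 0.995) -> List[float]:
    """Move each target parameter a small step towards its online counterpart."""
    return [
        polyak * target + (1.0 - polyak) * online
        for target, online in zip(target_params, online_params)
    ]


# With polyak close to 1 the target changes slowly, which stabilises bootstrapped targets.
slow = polyak_update([0.0, 1.0], [1.0, 1.0], polyak=0.995)
fast = polyak_update([0.0, 1.0], [1.0, 1.0], polyak=0.5)
```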
-
network¶ The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
Type: str
-
env¶ The environment that the agent is supposed to act on
Type: Environment
-
create_model¶ Whether the model of the algo should be created when initialised
Type: bool
-
batch_size¶ Mini batch size for loading experiences
Type: int
-
gamma¶ The discount factor for rewards
Type: float
-
layers¶ Layers in the Neural Network of the Q-value function
Type: tuple of int
-
lr_policy¶ Learning rate for the policy/actor
Type: float
-
lr_value¶ Learning rate for the Q-value function
Type: float
-
replay_size¶ Capacity of the Replay Buffer
Type: int
-
buffer_type¶ Choose the type of Buffer: [“push”, “prioritized”]
Type: str
-
seed¶ Seed for randomness
Type: int
-
render¶ Should the env be rendered during training?
Type: bool
-
device¶ Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
Type: str
-
get_p_loss(states: torch.Tensor) → torch.Tensor[source]¶ Function to get the Policy loss
Parameters: states (torch.Tensor) – States for which Q-values need to be found
Returns: Calculated policy loss
Return type: loss (torch.Tensor)
-
get_q_loss(batch: collections.namedtuple) → torch.Tensor[source]¶ Actor Critic Function to calculate the loss of the Q-function or critic
Parameters: batch (collections.namedtuple of torch.Tensor) – Batch of experiences
Returns: Calculated loss of the Q-function
Return type: loss (torch.Tensor)
-
get_q_values(states: torch.Tensor, actions: torch.Tensor) → torch.Tensor[source]¶ Get Q values corresponding to specific states and actions
Parameters:
- states (torch.Tensor) – States for which Q-values need to be found
- actions (torch.Tensor) – Actions taken at respective states
Returns: Q values for the given states and actions
Return type: q_values (torch.Tensor)
-
get_target_q_values(next_states: torch.Tensor, rewards: List[float], dones: List[bool]) → torch.Tensor[source]¶ Get target Q values for the TD3
Parameters:
- next_states (torch.Tensor) – Next states for which target Q-values need to be found
- rewards (list) – Rewards at each timestep for each environment
- dones (list) – Game over status for each environment
Returns: Target Q values for the TD3
Return type: target_q_values (torch.Tensor)
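Since the description mentions TD3, the following sketch shows the clipped double-Q target that TD3 is known for: the smaller of two target-critic estimates is bootstrapped, and terminal transitions contribute only their reward. Whether this base class performs the minimum itself or leaves it to subclasses is not stated here, so treat this as an assumption. `td_targets` is a hypothetical helper, and the two critic estimates are passed in as plain lists.

```python
from typing import List, Sequence


def td_targets(next_q1: Sequence[float],
               next_q2: Sequence[float],
               rewards: Sequence[float],
               dones: Sequence[bool],
               gamma: float = 0.99) -> List[float]:
    """Targets r + gamma * (1 - done) * min(Q1', Q2') for each transition."""
    targets = []
    for q1, q2, reward, done in zip(next_q1, next_q2, rewards, dones):
        next_q = min(q1, q2)             # pessimistic estimate of the next value
        not_done = 0.0 if done else 1.0  # mask out bootstrapping at episode end
        targets.append(reward + gamma * not_done * next_q)
    return targets


example_targets = td_targets(
    next_q1=[2.0, 5.0],
    next_q2=[4.0, 3.0],
    rewards=[1.0, 1.0],
    dones=[False, True],
    gamma=0.5,
)
```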
-
load_weights(weights) → None[source]¶ Load weights for the agent from pretrained model
Parameters: weights (dict) – Dictionary of different neural net weights
-
select_action(state: numpy.ndarray, deterministic: bool = True) → numpy.ndarray[source]¶ Select action given state
Deterministic Action Selection with Noise
Parameters:
- state (np.ndarray) – Current state of the environment
- deterministic (bool) – Whether the policy should be deterministic or stochastic
Returns: Action taken by the agent
Return type: action (np.ndarray)
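"Deterministic action selection with noise" usually means taking the policy's deterministic output, adding exploration noise, and clipping to the action bounds. The sketch below assumes zero-mean Gaussian noise and symmetric bounds. How the `deterministic` flag interacts with the noise is not specified above, so the sketch exposes a noise scale instead. `noisy_action` and all of its parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def noisy_action(mean_action: Sequence[float],
                 noise_std: float = 0.1,
                 low: float = -1.0,
                 high: float = 1.0,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Add Gaussian noise to a deterministic action and clip it to [low, high]."""
    rng = rng or random.Random()
    action = []
    for component in mean_action:
        perturbed = component + rng.gauss(0.0, noise_std)
        action.append(min(high, max(low, perturbed)))  # keep within action bounds
    return action


# noise_std=0.0 recovers the purely deterministic action.
exact = noisy_action([0.5, -0.2], noise_std=0.0)
explored = noisy_action([0.5, -0.2], noise_std=0.3, rng=random.Random(0))
```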
-
genrl.agents.deep.base.onpolicy module¶
-
class genrl.agents.deep.base.onpolicy.OnPolicyAgent(*args, rollout_size: int = 1024, buffer_type: str = 'rollout', **kwargs)[source]¶
Bases: genrl.agents.deep.base.base.BaseAgent
Base On Policy Agent Class
-
network¶ The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
Type: str
-
env¶ The environment that the agent is supposed to act on
Type: Environment
-
create_model¶ Whether the model of the algo should be created when initialised
Type: bool
-
batch_size¶ Mini batch size for loading experiences
Type: int
-
gamma¶ The discount factor for rewards
Type: float
-
layers¶ Layers in the Neural Network of the Q-value function
Type: tuple of int
-
lr_policy¶ Learning rate for the policy/actor
Type: float
-
lr_value¶ Learning rate for the Q-value function
Type: float
-
rollout_size¶ Capacity of the Rollout Buffer
Type: int
-
buffer_type¶ Choose the type of Buffer: [“rollout”]
Type: str
-
seed¶ Seed for randomness
Type: int
-
render¶ Should the env be rendered during training?
Type: bool
-
device¶ Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
Type: str
-
collect_rewards(dones: List[bool], timestep: int)[source]¶ Helper function to collect rewards
Runs through all the envs and collects rewards accumulated during rollouts
Parameters:
- dones (list of bool) – Game over statuses of each environment
- timestep (int) – Timestep during rollout
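One plausible reading of this helper is that each environment keeps a running episode reward, which is recorded and reset when that environment finishes or when the rollout reaches its last timestep. The sketch below implements that reading with plain lists. It is an assumption about the behaviour, not GenRL's code, and `collect_rewards_sketch` with its extra arguments is hypothetical.

```python
from typing import List, Sequence


def collect_rewards_sketch(dones: Sequence[bool],
                           timestep: int,
                           rollout_size: int,
                           running_rewards: List[float],
                           finished_rewards: List[float]) -> None:
    """Record and reset the running reward of every env that is done or cut off."""
    last_step = timestep == rollout_size - 1
    for i, done in enumerate(dones):
        if done or last_step:
            finished_rewards.append(running_rewards[i])
            running_rewards[i] = 0.0  # start accumulating a fresh episode


running = [3.0, 5.0]  # reward accumulated so far in each of two envs
finished = []         # rewards of completed (or truncated) episodes

# Env 0 finishes early in the rollout.
collect_rewards_sketch([True, False], timestep=0, rollout_size=4,
                       running_rewards=running, finished_rewards=finished)
after_first = (list(running), list(finished))

# The final timestep flushes whatever is left in every env.
collect_rewards_sketch([False, False], timestep=3, rollout_size=4,
                       running_rewards=running, finished_rewards=finished)
```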
-
collect_rollouts(state: torch.Tensor)[source]¶ Function to collect rollouts
Collects rollouts by playing the env like a human agent and inputs information into the rollout buffer.
Parameters: state (torch.Tensor) – The starting state of the environment
Returns:
- values (torch.Tensor) – Values of states encountered during the rollout
- dones (list of bool) – Game over statuses of each environment
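The rollout loop described above can be sketched with a single toy environment: act, store the transition and value estimate in a buffer, and reset when an episode ends. Everything here is hypothetical and simplified (`CountingEnv`, `collect_rollouts_sketch`, a list used as the buffer, callables for the policy and value function). The documented method works with `torch.Tensor` states and, judging by the list-valued `dones`, with several environments at once.

```python
from typing import Callable, List, Tuple


class CountingEnv:
    """Toy environment: the state is the step count; episodes last `horizon` steps."""

    def __init__(self, horizon: int = 3):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return 0.0

    def step(self, action: int) -> Tuple[float, float, bool]:
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        return float(self.t), reward, self.t >= self.horizon


def collect_rollouts_sketch(env: CountingEnv,
                            state: float,
                            policy: Callable[[float], int],
                            value_fn: Callable[[float], float],
                            rollout_size: int):
    """Play `rollout_size` steps, recording transitions and value estimates."""
    buffer: List[tuple] = []
    values: List[float] = []
    done = False
    for _ in range(rollout_size):
        action = policy(state)
        value = value_fn(state)
        next_state, reward, done = env.step(action)
        buffer.append((state, action, reward, done, value))
        values.append(value)
        # Start a new episode immediately so the rollout keeps its fixed length.
        state = env.reset() if done else next_state
    return values, [done], buffer


env = CountingEnv(horizon=3)
values, dones, buffer = collect_rollouts_sketch(
    env, env.reset(), policy=lambda s: 1, value_fn=lambda s: 0.5 * s, rollout_size=4
)
```

Here the third step ends the first episode, so the fourth buffer entry starts again from the reset state.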
-