Contextual Bandit
Base
class genrl.agents.bandits.contextual.base.DCBAgent(bandit: genrl.core.bandit.Bandit, device: str = 'cpu', **kwargs)
Bases: genrl.core.bandit.BanditAgent
Base class for deep contextual bandit solving agents.
Parameters:
- bandit (DataBasedBandit) – The bandit to solve.
- device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
bandit
The bandit to solve.
Type: DataBasedBandit
device
Device to use for tensor operations.
Type: torch.device
select_action(context: torch.Tensor) → int
Select an action based on given context.
Parameters: context (torch.Tensor) – The context vector to select action for.
Note: This method needs to be implemented in the specific agent.
Returns: The action to take.
Return type: int
update_parameters(action: Optional[int] = None, batch_size: Optional[int] = None, train_epochs: Optional[int] = None) → None
Update parameters of the agent.
Parameters:
- action (Optional[int], optional) – Action to update the parameters for. Defaults to None.
- batch_size (Optional[int], optional) – Size of batch to update parameters with. Defaults to None.
- train_epochs (Optional[int], optional) – Epochs to train neural network for. Defaults to None.
Note: This method needs to be implemented in the specific agent.
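The agents on this page follow a select / store / update cycle. The sketch below illustrates that cycle with a hypothetical RandomAgent stand-in written in plain Python; it is not part of genrl, and the run helper, reward_fn, and update_interval names are assumptions made for illustration only.

```python
import random
from typing import Callable, List, Optional, Tuple


class RandomAgent:
    """Hypothetical stand-in that mirrors the method names documented here."""

    def __init__(self, n_actions: int, seed: int = 0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        # Stored (context, action, reward) transitions.
        self.db: List[Tuple[List[float], int, int]] = []

    def select_action(self, context: List[float]) -> int:
        # A real agent would use the context; this placeholder ignores it.
        return self.rng.randrange(self.n_actions)

    def update_db(self, context: List[float], action: int, reward: int) -> None:
        self.db.append((context, action, reward))

    def update_parameters(
        self,
        action: Optional[int] = None,
        batch_size: Optional[int] = None,
        train_epochs: Optional[int] = None,
    ) -> None:
        # Nothing to learn for a uniformly random policy.
        pass


def run(
    agent: RandomAgent,
    contexts: List[List[float]],
    reward_fn: Callable[[List[float], int], int],
    update_interval: int = 2,
) -> int:
    """Run the select / store / update cycle and return the total reward."""
    total = 0
    for t, context in enumerate(contexts, start=1):
        action = agent.select_action(context)
        reward = reward_fn(context, action)
        agent.update_db(context, action, reward)
        if t % update_interval == 0:
            agent.update_parameters(action)
        total += reward
    return total
```

A concrete agent would replace the bodies of select_action and update_parameters with its own policy and training step.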
Bootstrap Neural
class genrl.agents.bandits.contextual.bootstrap_neural.BootstrapNeuralAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)
Bases: genrl.agents.bandits.contextual.base.DCBAgent
Bootstrapped ensemble agent for deep contextual bandits.
Parameters:
- bandit (DataBasedBandit) – The bandit to solve.
- init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
- hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
- init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
- lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
- lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
- max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
- dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
- eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
- n (int, optional) – Number of models in ensemble. Defaults to 10.
- add_prob (float, optional) – Probability of adding a transition to a database. Defaults to 0.95.
- device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int
Select an action based on given context.
Selects an action by computing a forward pass through a randomly selected network from the ensemble.
Parameters: context (torch.Tensor) – The context vector to select action for.
Returns: The action to take.
Return type: int
update_db(context: torch.Tensor, action: int, reward: int)
Updates transition database with given transition.
The transition is added to each database with a certain probability.
Parameters:
- context (torch.Tensor) – Context received
- action (int) – Action taken
- reward (int) – Reward received
update_params(action: Optional[int] = None, batch_size: int = 512, train_epochs: int = 20)
Update parameters of the agent.
Trains each neural network in the ensemble.
Parameters:
- action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
- batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512.
- train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20.
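The bootstrapping idea described above (each transition is added to each database with probability add_prob, and one ensemble member is picked at random to act) can be sketched as follows. This is a minimal illustration in plain Python, not the genrl implementation; the BootstrapBuffers class and pick_member method are hypothetical names.

```python
import random
from typing import List, Tuple

Transition = Tuple[List[float], int, float]


class BootstrapBuffers:
    """Hypothetical helper: one transition database per ensemble member."""

    def __init__(self, n: int = 10, add_prob: float = 0.95, seed: int = 0):
        self.add_prob = add_prob
        self.rng = random.Random(seed)
        self.dbs: List[List[Transition]] = [[] for _ in range(n)]

    def update_db(self, context: List[float], action: int, reward: float) -> None:
        # Each member sees the transition independently with probability
        # add_prob, so the members train on slightly different data.
        for db in self.dbs:
            if self.rng.random() < self.add_prob:
                db.append((context, action, reward))

    def pick_member(self) -> int:
        # Index of the ensemble member whose network would choose the action.
        return self.rng.randrange(len(self.dbs))
```

Because the members train on different subsets, disagreement between them acts as a rough proxy for uncertainty, and acting with a randomly chosen member provides exploration.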
Fixed
class genrl.agents.bandits.contextual.fixed.FixedAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, p: List[float] = None, device: str = 'cpu')
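The signature above gives no description for p. Assuming it holds a fixed probability for each action (an assumption, not confirmed by this page), a fixed policy could be sketched as below. The fixed_policy_action function and the uniform fallback when p is None are both hypothetical.

```python
import random
from typing import List, Optional


def fixed_policy_action(
    p: Optional[List[float]], n_actions: int, rng: random.Random
) -> int:
    """Sample an action from a fixed distribution, ignoring the context."""
    if p is None:
        # Assumed fallback: uniform probabilities over all actions.
        p = [1.0 / n_actions] * n_actions
    return rng.choices(range(n_actions), weights=p, k=1)[0]
```

Such a policy never learns, which makes it useful mainly as a baseline against the learning agents.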
Linear Posterior
class genrl.agents.bandits.contextual.linpos.LinearPosteriorAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)
Bases: genrl.agents.bandits.contextual.base.DCBAgent
Deep contextual bandit agent using Bayesian regression for posterior inference.
Parameters:
- bandit (DataBasedBandit) – The bandit to solve.
- init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
- lambda_prior (float, optional) – Gaussian prior for linear model. Defaults to 0.25.
- a0 (float, optional) – Inverse gamma prior for noise. Defaults to 6.0.
- b0 (float, optional) – Inverse gamma prior for noise. Defaults to 6.0.
- device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int
Select an action based on given context.
Selects the action with the highest predicted reward, computed using betas sampled from the posterior.
Parameters: context (torch.Tensor) – The context vector to select action for.
Returns: The action to take.
Return type: int
update_db(context: torch.Tensor, action: int, reward: int)
Updates transition database with given transition.
Parameters:
- context (torch.Tensor) – Context received
- action (int) – Action taken
- reward (int) – Reward received
update_params(action: int, batch_size: int = 512, train_epochs: Optional[int] = None)
Update parameters of the agent.
Updates the posterior over beta through Bayesian regression.
Parameters:
- action (int) – Action to update the parameters for.
- batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512.
- train_epochs (Optional[int], optional) – Epochs to train neural network for. Not applicable in this agent. Defaults to None.
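The posterior sampling described above can be illustrated with a scalar-context toy version of Bayesian linear regression under a normal-inverse-gamma prior. This is a sketch under stated assumptions, not the genrl code: the ScalarPosterior class is hypothetical, the context is a single number rather than a vector, and the prior mean of beta is assumed to be zero.

```python
import random
from typing import List


class ScalarPosterior:
    """Posterior for reward = beta * x + noise, one instance per action."""

    def __init__(self, lambda_prior: float = 0.25, a0: float = 6.0, b0: float = 6.0):
        self.lambda_prior, self.a0, self.b0 = lambda_prior, a0, b0
        self.xs: List[float] = []
        self.ys: List[float] = []
        self.update()

    def add(self, x: float, y: float) -> None:
        self.xs.append(x)
        self.ys.append(y)

    def update(self) -> None:
        # Conjugate update with zero prior mean for beta.
        self.precision = self.lambda_prior + sum(x * x for x in self.xs)
        self.mu = sum(x * y for x, y in zip(self.xs, self.ys)) / self.precision
        self.a = self.a0 + len(self.xs) / 2
        self.b = self.b0 + 0.5 * (
            sum(y * y for y in self.ys) - self.mu ** 2 * self.precision
        )

    def sample_beta(self, rng: random.Random) -> float:
        # Noise variance ~ InverseGamma(a, b), then beta given that variance.
        sigma2 = self.b / rng.gammavariate(self.a, 1.0)
        return rng.gauss(self.mu, (sigma2 / self.precision) ** 0.5)


def select_action(posteriors: List[ScalarPosterior], x: float, rng: random.Random) -> int:
    # Thompson sampling: act greedily with respect to one posterior sample.
    sampled = [p.sample_beta(rng) * x for p in posteriors]
    return max(range(len(sampled)), key=sampled.__getitem__)
```

With few observations the sampled betas vary widely, which drives exploration; as data accumulates the samples concentrate around the posterior mean.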
Neural Greedy
class genrl.agents.bandits.contextual.neural_greedy.NeuralGreedyAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)
Bases: genrl.agents.bandits.contextual.base.DCBAgent
Deep contextual bandit agent using epsilon greedy with a neural network.
Parameters:
- bandit (DataBasedBandit) – The bandit to solve.
- init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
- hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
- init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
- lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
- lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
- max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
- dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
- eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
- epsilon (float, optional) – Probability of selecting a random action. Defaults to 0.0.
- device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int
Select an action based on given context.
Selects an action by computing a forward pass through the network, with probability epsilon of selecting a random action instead.
Parameters: context (torch.Tensor) – The context vector to select action for.
Returns: The action to take.
Return type: int
update_db(context: torch.Tensor, action: int, reward: int)
Updates transition database with given transition.
Parameters:
- context (torch.Tensor) – Context received
- action (int) – Action taken
- reward (int) – Reward received
update_params(action: Optional[int] = None, batch_size: int = 512, train_epochs: int = 20)
Update parameters of the agent.
Trains neural network.
Parameters:
- action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
- batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512.
- train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20.
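The epsilon-greedy rule used by this agent can be sketched independently of the network. In the illustration below, predicted_rewards stands in for the network output for one context; the epsilon_greedy function is a hypothetical helper, not a genrl API.

```python
import random
from typing import Sequence


def epsilon_greedy(
    predicted_rewards: Sequence[float], epsilon: float, rng: random.Random
) -> int:
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(predicted_rewards))
    # Greedy choice: index of the highest predicted reward.
    return max(range(len(predicted_rewards)), key=predicted_rewards.__getitem__)
```

With the documented default of epsilon = 0.0 the rule is purely greedy, so exploration would come only from the initial init_pulls selections of each action.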
Neural Linear Posterior
class genrl.agents.bandits.contextual.neural_linpos.NeuralLinearPosteriorAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)
Bases: genrl.agents.bandits.contextual.base.DCBAgent
Deep contextual bandit agent using Bayesian regression for posterior inference.
A neural network is used to transform the context vector into a latent representation on which Bayesian regression is performed.
Parameters:
- bandit (DataBasedBandit) – The bandit to solve.
- init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
- hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
- init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
- lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
- lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
- max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
- dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
- eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
- nn_update_ratio (int, optional) – . Defaults to 2.
- lambda_prior (float, optional) – Gaussian prior for linear model. Defaults to 0.25.
- a0 (float, optional) – Inverse gamma prior for noise. Defaults to 3.0.
- b0 (float, optional) – Inverse gamma prior for noise. Defaults to 3.0.
- device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int
Select an action based on given context.
Selects an action by computing a forward pass through the network to obtain a representation of the context, on which Bayesian linear regression is performed to select an action.
Parameters: context (torch.Tensor) – The context vector to select action for.
Returns: The action to take.
Return type: int
update_db(context: torch.Tensor, action: int, reward: int)
Updates transition database with given transition.
Updates latent context and predicted rewards separately.
Parameters:
- context (torch.Tensor) – Context received
- action (int) – Action taken
- reward (int) – Reward received
update_params(action: int, batch_size: int = 512, train_epochs: int = 20)
Update parameters of the agent.
Trains neural network and updates Bayesian regression parameters.
Parameters:
- action (int) – Action to update the parameters for.
- batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512.
- train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20.
Neural Noise Sampling
class genrl.agents.bandits.contextual.neural_noise_sampling.NeuralNoiseSamplingAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)
Bases: genrl.agents.bandits.contextual.base.DCBAgent
Deep contextual bandit agent with noise sampling for neural network parameters.
Parameters:
- bandit (DataBasedBandit) – The bandit to solve.
- init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
- hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
- init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
- lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
- lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
- max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
- dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
- eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
- noise_std_dev (float, optional) – Standard deviation of sampled noise. Defaults to 0.05.
- eps (float, optional) – Small constant for bounding KL divergence of noise. Defaults to 0.1.
- noise_update_batch_size (int, optional) – Batch size for updating noise parameters. Defaults to 256.
- device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int
Select an action based on given context.
Selects an action by adding noise to the neural network parameters and then computing a forward pass with the context vector as input.
Parameters: context (torch.Tensor) – The context vector to select action for.
Returns: The action to take.
Return type: int
update_db(context: torch.Tensor, action: int, reward: int)
Updates transition database with given transition.
Parameters:
- context (torch.Tensor) – Context received
- action (int) – Action taken
- reward (int) – Reward received
update_params(action: Optional[int] = None, batch_size: int = 512, train_epochs: int = 20)
Update parameters of the agent.
Trains the neural network.
Parameters:
- action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
- batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512.
- train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20.
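Parameter-noise action selection, as described for select_action above, can be sketched with a single linear layer standing in for the network. This is an illustration only: the noisy_forward and select_action functions below are hypothetical, and the adaptation of the noise scale (governed by eps and noise_update_batch_size) is not shown.

```python
import random
from typing import List


def noisy_forward(
    weights: List[List[float]],
    context: List[float],
    noise_std_dev: float,
    rng: random.Random,
) -> List[float]:
    """Score each action with a freshly perturbed copy of the weights."""
    scores = []
    for row in weights:  # one weight row per action
        noisy_row = [w + rng.gauss(0.0, noise_std_dev) for w in row]
        scores.append(sum(w * x for w, x in zip(noisy_row, context)))
    return scores


def select_action(
    weights: List[List[float]],
    context: List[float],
    noise_std_dev: float,
    rng: random.Random,
) -> int:
    scores = noisy_forward(weights, context, noise_std_dev, rng)
    return max(range(len(scores)), key=scores.__getitem__)
```

The stored weights are left untouched; only the copy used for the forward pass is perturbed, so each call explores around the current parameters.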
Variational
class genrl.agents.bandits.contextual.variational.VariationalAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)
Bases: genrl.agents.bandits.contextual.base.DCBAgent
Deep contextual bandit agent using variational inference.
Parameters:
- bandit (DataBasedBandit) – The bandit to solve.
- init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
- hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
- init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
- lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
- lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
- max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
- dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
- eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
- noise_std (float, optional) – Standard deviation of noise in bayesian neural network. Defaults to 0.1.
- device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int
Select an action based on given context.
Selects an action by computing a forward pass through the Bayesian neural network.
Parameters: context (torch.Tensor) – The context vector to select action for.
Returns: The action to take.
Return type: int
update_db(context: torch.Tensor, action: int, reward: int)
Updates transition database with given transition.
Parameters:
- context (torch.Tensor) – Context received
- action (int) – Action taken
- reward (int) – Reward received
update_params(action: int, batch_size: int = 512, train_epochs: int = 20)
Update parameters of the agent.
Trains the Bayesian neural network.
Parameters:
- action (int) – Action to update the parameters for. Not applicable in this agent.
- batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512.
- train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20.
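A forward pass through a Bayesian neural network typically samples the weights before computing outputs. The sketch below shows one common way to do this, the reparameterization w = mu + softplus(rho) * eps with eps drawn from a standard normal. Whether genrl uses exactly this parameterization is an assumption; the softplus, sample_weights, and select_action functions are hypothetical, and a single linear layer stands in for the network.

```python
import math
import random
from typing import List


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); keeps the standard deviation positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_weights(mu: List[float], rho: List[float], rng: random.Random) -> List[float]:
    """Draw one weight vector from the factorized Gaussian posterior."""
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]


def select_action(
    mus: List[List[float]],
    rhos: List[List[float]],
    context: List[float],
    rng: random.Random,
) -> int:
    # One sampled weight row per action, then a greedy choice on the scores.
    scores = []
    for mu, rho in zip(mus, rhos):
        w = sample_weights(mu, rho, rng)
        scores.append(sum(wi * xi for wi, xi in zip(w, context)))
    return max(range(len(scores)), key=scores.__getitem__)
```

Sampling the weights on every call makes action selection a form of Thompson sampling over the approximate posterior.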