Contextual Bandit

Base

class genrl.agents.bandits.contextual.base.DCBAgent(bandit: genrl.core.bandit.Bandit, device: str = 'cpu', **kwargs)[source]

Bases: genrl.core.bandit.BanditAgent

Base class for deep contextual bandit solving agents

Parameters:
  • bandit (genrl.utils.data_bandits.base.DataBasedBandit) – The bandit to solve
  • device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
bandit

The bandit to solve

Type:genrl.utils.data_bandits.base.DataBasedBandit
device

Device to use for tensor operations.

Type:torch.device
select_action(context: torch.Tensor) → int[source]

Select an action based on given context

Parameters:context (torch.Tensor) – The context vector to select action for

Note

This method needs to be implemented in the specific agent.

Returns:The action to take
Return type:int
update_parameters(action: Optional[int] = None, batch_size: Optional[int] = None, train_epochs: Optional[int] = None) → None[source]

Update parameters of the agent.

Parameters:
  • action (Optional[int], optional) – Action to update the parameters for. Defaults to None.
  • batch_size (Optional[int], optional) – Size of batch to update parameters with. Defaults to None.
  • train_epochs (Optional[int], optional) – Epochs to train neural network for. Defaults to None.

Note

This method needs to be implemented in the specific agent.
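The two-method contract above can be illustrated with a minimal sketch. `SketchAgent` is a hypothetical stand-in written for this example, not a GenRL class: it uses plain lists instead of `torch.Tensor` and does not inherit from `DCBAgent`, and its selection rule is a toy one chosen only to make the loop observable.

```python
from typing import List, Optional


class SketchAgent:
    """Hypothetical stand-in mirroring the two methods documented above."""

    def __init__(self, n_actions: int):
        self.n_actions = n_actions
        self.counts: List[int] = [0] * n_actions

    def select_action(self, context: List[float]) -> int:
        # Toy rule: pick the least-tried action (ties go to the lowest index).
        return self.counts.index(min(self.counts))

    def update_parameters(
        self,
        action: Optional[int] = None,
        batch_size: Optional[int] = None,
        train_epochs: Optional[int] = None,
    ) -> None:
        # Toy update: only record that the action was taken.
        if action is not None:
            self.counts[action] += 1


agent = SketchAgent(n_actions=3)
history = []
for _ in range(6):
    action = agent.select_action([0.0, 1.0])
    agent.update_parameters(action=action)
    history.append(action)
```

A concrete agent would replace the toy rule with its own policy and the counter update with an actual training step.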

Bootstrap Neural

class genrl.agents.bandits.contextual.bootstrap_neural.BootstrapNeuralAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)[source]

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Bootstrapped ensemble agent for deep contextual bandits.

Parameters:
  • bandit (DataBasedBandit) – The bandit to solve
  • init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
  • hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
  • init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
  • lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
  • lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
  • max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
  • dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
  • eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
  • n (int, optional) – Number of models in ensemble. Defaults to 10.
  • add_prob (float, optional) – Probability of adding a transition to a database. Defaults to 0.95.
  • device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int[source]

Select an action based on given context.

Selects an action by computing a forward pass through a randomly selected network from the ensemble.

Parameters:context (torch.Tensor) – The context vector to select action for.
Returns:The action to take.
Return type:int
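As a rough illustration of selecting an action through a randomly chosen ensemble member, the sketch below replaces each neural network with a small linear scorer. `make_linear_model` and this `select_action` function are hypothetical helpers for the example and are not part of GenRL.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], List[float]]


def make_linear_model(weights: List[List[float]]) -> Model:
    """Hypothetical stand-in for one network: one weight row per action."""
    def forward(context: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, context)) for row in weights]
    return forward


def select_action(models: List[Model], context: Sequence[float],
                  rng: random.Random) -> int:
    model = rng.choice(models)   # pick one ensemble member at random
    scores = model(context)      # forward pass: predicted reward per action
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
ensemble = [
    make_linear_model([[1.0, 0.0], [0.0, 1.0]]),  # prefers action 0 here
    make_linear_model([[0.0, 1.0], [1.0, 0.0]]),  # prefers action 1 here
]
action = select_action(ensemble, [2.0, 1.0], rng)
```

Because the members disagree on this context, the chosen action depends on which member is drawn; that disagreement is what drives exploration in a bootstrapped ensemble.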
update_db(context: torch.Tensor, action: int, reward: int)[source]

Updates transition database with given transition

The transition is added to each database with a certain probability.

Parameters:
  • context (torch.Tensor) – Context received
  • action (int) – Action taken
  • reward (int) – Reward received
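The probabilistic update described above can be sketched as follows, assuming one database per ensemble member, each represented here as a plain list. The `update_dbs` helper and the tuple layout of a transition are assumptions made for illustration, not the library's storage format.

```python
import random
from typing import List, Tuple

Transition = Tuple[List[float], int, float]  # (context, action, reward)


def update_dbs(dbs: List[List[Transition]], transition: Transition,
               add_prob: float, rng: random.Random) -> None:
    # Each database independently keeps the transition with probability add_prob.
    for db in dbs:
        if rng.random() < add_prob:
            db.append(transition)


rng = random.Random(0)
dbs: List[List[Transition]] = [[] for _ in range(10)]
for step in range(100):
    update_dbs(dbs, ([float(step)], step % 2, 1.0), add_prob=0.95, rng=rng)
sizes = [len(db) for db in dbs]
```

Since each member sees a slightly different subset of the data, the networks trained on these databases end up with different parameters.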
update_params(action: Optional[int] = None, batch_size: int = 512, train_epochs: int = 20)[source]

Update parameters of the agent.

Trains each neural network in the ensemble.

Parameters:
  • action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
  • batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512
  • train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20

Fixed

class genrl.agents.bandits.contextual.fixed.FixedAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, p: List[float] = None, device: str = 'cpu')[source]

Bases: genrl.agents.bandits.contextual.base.DCBAgent

select_action(context: torch.Tensor) → int[source]

Select an action based on fixed probabilities.

Parameters:context (torch.Tensor) – The context vector to select action for. This agent ignores the context vector.
Returns:The action to take.
Return type:int
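Sampling an action from fixed probabilities can be sketched with the standard library alone. The function below is a hypothetical illustration that takes the probability list `p` directly; it is not the agent's actual implementation.

```python
import random
from typing import List


def select_action(p: List[float], rng: random.Random) -> int:
    # Draw one action index with probability proportional to p; context is unused.
    return rng.choices(range(len(p)), weights=p, k=1)[0]


rng = random.Random(0)
p = [0.2, 0.5, 0.3]
counts = [0] * len(p)
for _ in range(10000):
    counts[select_action(p, rng)] += 1
```

Over many draws, the empirical action frequencies in `counts` approach the entries of `p`.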
update_db(*args, **kwargs)[source]
update_params(*args, **kwargs)[source]

Linear Posterior

class genrl.agents.bandits.contextual.linpos.LinearPosteriorAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)[source]

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent using Bayesian regression for posterior inference.

Parameters:
  • bandit (DataBasedBandit) – The bandit to solve
  • init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
  • lambda_prior (float, optional) – Gaussian prior for linear model. Defaults to 0.25.
  • a0 (float, optional) – Inverse gamma prior for noise. Defaults to 6.0.
  • b0 (float, optional) – Inverse gamma prior for noise. Defaults to 6.0.
  • device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int[source]

Select an action based on given context.

Selects the action with the highest predicted reward, computed using betas sampled from the posterior.

Parameters:context (torch.Tensor) – The context vector to select action for.
Returns:The action to take.
Return type:int
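Posterior sampling of this kind can be sketched for a scalar context, assuming a standard normal-inverse-gamma conjugate model per action (prior mean zero, prior precision `lambda_prior`, noise prior `a0`, `b0`). The function names and the scalar simplification are assumptions for illustration; the agent itself works with context vectors and tensors.

```python
import math
import random
from typing import List, Sequence, Tuple


def posterior(xs: Sequence[float], ys: Sequence[float], lambda_prior: float = 0.25,
              a0: float = 6.0, b0: float = 6.0) -> Tuple[float, float, float, float]:
    """Normal-inverse-gamma posterior for y = beta * x + noise (scalar x)."""
    precision = lambda_prior + sum(x * x for x in xs)
    mu = sum(x * y for x, y in zip(xs, ys)) / precision
    a = a0 + len(xs) / 2
    b = b0 + 0.5 * (sum(y * y for y in ys) - mu * mu * precision)
    return mu, precision, a, b


def sample_beta(mu: float, precision: float, a: float, b: float,
                rng: random.Random) -> float:
    sigma2 = 1.0 / rng.gammavariate(a, 1.0 / b)  # inverse-gamma draw for noise variance
    return rng.gauss(mu, math.sqrt(sigma2 / precision))


def select_action(context: float, data: List[Tuple[List[float], List[float]]],
                  rng: random.Random) -> int:
    # data[k] holds the (contexts, rewards) observed so far for action k.
    preds = [sample_beta(*posterior(xs, ys), rng) * context for xs, ys in data]
    return max(range(len(preds)), key=preds.__getitem__)


rng = random.Random(0)
data = [([1.0] * 50, [1.0] * 50), ([1.0] * 50, [-1.0] * 50)]
action = select_action(1.0, data, rng)
```

With little data the sampled betas vary widely, which encourages exploration; as observations accumulate, the posterior tightens and the choice becomes nearly greedy.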
update_db(context: torch.Tensor, action: int, reward: int)[source]

Updates transition database with given transition

Parameters:
  • context (torch.Tensor) – Context received
  • action (int) – Action taken
  • reward (int) – Reward received
update_params(action: int, batch_size: int = 512, train_epochs: Optional[int] = None)[source]

Update parameters of the agent.

Updates the posterior over beta through Bayesian regression.

Parameters:
  • action (int) – Action to update the parameters for.
  • batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512
  • train_epochs (Optional[int], optional) – Epochs to train neural network for. Not applicable in this agent. Defaults to None

Neural Greedy

class genrl.agents.bandits.contextual.neural_greedy.NeuralGreedyAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)[source]

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent using epsilon greedy with a neural network.

Parameters:
  • bandit (DataBasedBandit) – The bandit to solve
  • init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
  • hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
  • init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
  • lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
  • lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
  • max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
  • dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
  • eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
  • epsilon (float, optional) – Probability of selecting a random action. Defaults to 0.0.
  • device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int[source]

Select an action based on given context.

Selects an action by computing a forward pass through the network, with probability epsilon of selecting a random action instead.

Parameters:context (torch.Tensor) – The context vector to select action for.
Returns:The action to take.
Return type:int
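Epsilon-greedy selection can be sketched as below, assuming the network's forward pass has already produced one predicted reward per action. The `scores` list stands in for that output and the function is a hypothetical illustration rather than the agent's code.

```python
import random
from typing import List


def select_action(scores: List[float], epsilon: float, rng: random.Random) -> int:
    if rng.random() < epsilon:
        # Explore: uniformly random action.
        return rng.randrange(len(scores))
    # Exploit: action with the highest predicted reward.
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
greedy = select_action([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
explored = select_action([0.1, 0.9, 0.3], epsilon=1.0, rng=rng)
```

With the documented default of `epsilon = 0.0`, the rule reduces to purely greedy selection.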
update_db(context: torch.Tensor, action: int, reward: int)[source]

Updates transition database with given transition

Parameters:
  • context (torch.Tensor) – Context received
  • action (int) – Action taken
  • reward (int) – Reward received
update_params(action: Optional[int] = None, batch_size: int = 512, train_epochs: int = 20)[source]

Update parameters of the agent.

Trains the neural network.

Parameters:
  • action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
  • batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512
  • train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20

Neural Linear Posterior

class genrl.agents.bandits.contextual.neural_linpos.NeuralLinearPosteriorAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)[source]

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent using Bayesian regression for posterior inference.

A neural network is used to transform the context vector into a latent representation on which Bayesian regression is performed.

Parameters:
  • bandit (DataBasedBandit) – The bandit to solve
  • init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
  • hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
  • init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
  • lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
  • lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
  • max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
  • dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
  • eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
  • nn_update_ratio (int, optional) – Defaults to 2.
  • lambda_prior (float, optional) – Gaussian prior for linear model. Defaults to 0.25.
  • a0 (float, optional) – Inverse gamma prior for noise. Defaults to 3.0.
  • b0 (float, optional) – Inverse gamma prior for noise. Defaults to 3.0.
  • device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int[source]

Select an action based on given context.

Selects an action by computing a forward pass through the network to obtain a latent representation of the context, on which Bayesian linear regression is performed to select an action.

Parameters:context (torch.Tensor) – The context vector to select action for.
Returns:The action to take.
Return type:int
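The two-stage selection can be sketched as follows, with a single tanh layer standing in for the trained network and independent Gaussian draws standing in for the regression posterior. `encode`, `beta_means`, and `beta_std` are hypothetical names introduced for this example only.

```python
import math
import random
from typing import List, Sequence


def encode(context: Sequence[float], weights: List[List[float]]) -> List[float]:
    # Stand-in for the network: maps the context to a latent representation.
    return [math.tanh(sum(w * x for w, x in zip(row, context))) for row in weights]


def select_action(context: Sequence[float], weights: List[List[float]],
                  beta_means: List[List[float]], beta_std: float,
                  rng: random.Random) -> int:
    z = encode(context, weights)
    scores = []
    for mean in beta_means:
        # One sampled regression vector per action, applied to the latent features.
        beta = [rng.gauss(m, beta_std) for m in mean]
        scores.append(sum(b * zi for b, zi in zip(beta, z)))
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
weights = [[1.0, 0.0], [0.0, 1.0]]
beta_means = [[1.0, 0.0], [0.0, 1.0]]
action = select_action([1.0, -1.0], weights, beta_means, beta_std=0.0, rng=rng)
```

The regression operates only on the latent vector `z`, so the network can be retrained on its own schedule while the linear posterior is updated from the re-encoded contexts.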
update_db(context: torch.Tensor, action: int, reward: int)[source]

Updates transition database with given transition

Updates latent context and predicted rewards separately.

Parameters:
  • context (torch.Tensor) – Context received
  • action (int) – Action taken
  • reward (int) – Reward received
update_params(action: int, batch_size: int = 512, train_epochs: int = 20)[source]

Update parameters of the agent.

Trains the neural network and updates the Bayesian regression parameters.

Parameters:
  • action (int) – Action to update the parameters for.
  • batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512
  • train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20

Neural Noise Sampling

class genrl.agents.bandits.contextual.neural_noise_sampling.NeuralNoiseSamplingAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)[source]

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent with noise sampling for neural network parameters.

Parameters:
  • bandit (DataBasedBandit) – The bandit to solve
  • init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
  • hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
  • init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
  • lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
  • lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
  • max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
  • dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
  • eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
  • noise_std_dev (float, optional) – Standard deviation of sampled noise. Defaults to 0.05.
  • eps (float, optional) – Small constant for bounding KL divergence of noise. Defaults to 0.1.
  • noise_update_batch_size (int, optional) – Batch size for updating noise parameters. Defaults to 256.
  • device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int[source]

Select an action based on given context.

Selects an action by adding noise to the neural network parameters and then computing a forward pass with the context vector as input.

Parameters:context (torch.Tensor) – The context vector to select action for.
Returns:The action to take
Return type:int
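Parameter-noise selection can be sketched with a linear scorer in place of the neural network. The sketch assumes zero-mean Gaussian noise with standard deviation `noise_std` added independently to every weight; the function and the weight layout are hypothetical, not the agent's implementation.

```python
import random
from typing import List, Sequence


def select_action(weights: List[List[float]], context: Sequence[float],
                  noise_std: float, rng: random.Random) -> int:
    # Perturb a copy of the parameters; the stored weights stay unchanged.
    noisy = [[w + rng.gauss(0.0, noise_std) for w in row] for row in weights]
    # Forward pass with the perturbed parameters: one score per action.
    scores = [sum(w * x for w, x in zip(row, context)) for row in noisy]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
weights = [[1.0, 0.0], [0.0, 1.0]]
noiseless = select_action(weights, [0.2, 0.9], noise_std=0.0, rng=rng)
noisy_choice = select_action(weights, [0.2, 0.9], noise_std=0.05, rng=rng)
```

When two actions have similar predicted rewards, the perturbation can flip the ranking, which is how the noise produces exploration.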
update_db(context: torch.Tensor, action: int, reward: int)[source]

Updates transition database with given transition

Parameters:
  • context (torch.Tensor) – Context received
  • action (int) – Action taken
  • reward (int) – Reward received
update_params(action: Optional[int] = None, batch_size: int = 512, train_epochs: int = 20)[source]

Update parameters of the agent.

Trains the neural network.

Parameters:
  • action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
  • batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512
  • train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20

Variational

class genrl.agents.bandits.contextual.variational.VariationalAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)[source]

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent using variational inference.

Parameters:
  • bandit (DataBasedBandit) – The bandit to solve
  • init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
  • hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
  • init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
  • lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
  • lr_reset (bool, optional) – Whether to reset learning rate every training interval. Defaults to True.
  • max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
  • dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
  • eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
  • noise_std (float, optional) – Standard deviation of noise in Bayesian neural network. Defaults to 0.1.
  • device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
select_action(context: torch.Tensor) → int[source]

Select an action based on given context.

Selects an action by computing a forward pass through the Bayesian neural network.

Parameters:context (torch.Tensor) – The context vector to select action for.
Returns:The action to take.
Return type:int
update_db(context: torch.Tensor, action: int, reward: int)[source]

Updates transition database with given transition

Parameters:
  • context (torch.Tensor) – Context received
  • action (int) – Action taken
  • reward (int) – Reward received
update_params(action: int, batch_size: int = 512, train_epochs: int = 20)[source]

Update parameters of the agent.

Trains the Bayesian neural network.

Parameters:
  • action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
  • batch_size (int, optional) – Size of batch to update parameters with. Defaults to 512
  • train_epochs (int, optional) – Epochs to train neural network for. Defaults to 20