Contextual Bandit

Base

class genrl.agents.bandits.contextual.base.DCBAgent(bandit: genrl.core.bandit.Bandit, device: str = 'cpu', **kwargs)

Bases: genrl.core.bandit.BanditAgent

Base class for agents that solve deep contextual bandits.

Parameters:

bandit (genrl.utils.data_bandits.base.DataBasedBandit) – The bandit to solve.
device (str) – Device to use for tensor operations: “cpu” for CPU or “cuda” for CUDA. Defaults to “cpu”.

bandit

The bandit to solve.

Type:	genrl.utils.data_bandits.base.DataBasedBandit

device

Device to use for tensor operations.

Type:	torch.device

select_action(context: torch.Tensor) → int

Select an action based on the given context.

Parameters:	context (torch.Tensor) – The context vector to select an action for.

Note

This method needs to be implemented in the specific agent.

Returns:	The action to take
Return type:	int

update_parameters(action: Optional[int] = None, batch_size: Optional[int] = None, train_epochs: Optional[int] = None) → None

Update the parameters of the agent.

Parameters:

action (Optional[int], optional) – Action to update the parameters for. Defaults to None.
batch_size (Optional[int], optional) – Size of the batch to update parameters with. Defaults to None.
train_epochs (Optional[int], optional) – Number of epochs to train the neural network for. Defaults to None.

Note

This method needs to be implemented in the specific agent.
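The base class only fixes an interface: each concrete agent supplies its own select_action and its own update rule. A minimal sketch of that contract using Python's abc module is shown below. ContextualAgentSketch and ArgmaxContextAgent are hypothetical names, not genrl classes. Plain lists stand in for torch.Tensor so the example runs without genrl or PyTorch installed.

```python
from abc import ABC, abstractmethod
from typing import Optional, Sequence


class ContextualAgentSketch(ABC):
    """Hypothetical stand-in for the documented DCBAgent interface (not genrl code)."""

    def __init__(self, n_actions: int, device: str = "cpu"):
        self.n_actions = n_actions
        self.device = device  # mirrors the documented attribute; unused here

    @abstractmethod
    def select_action(self, context: Sequence[float]) -> int:
        """Each concrete agent must implement its own selection rule."""

    def update_parameters(self, action: Optional[int] = None,
                          batch_size: Optional[int] = None,
                          train_epochs: Optional[int] = None) -> None:
        """Each concrete agent must override this with its own update rule."""
        raise NotImplementedError


class ArgmaxContextAgent(ContextualAgentSketch):
    """Toy agent: treats context[a] as the predicted reward of action a."""

    def select_action(self, context: Sequence[float]) -> int:
        return max(range(self.n_actions), key=lambda a: context[a])

    def update_parameters(self, action=None, batch_size=None, train_epochs=None):
        return None  # nothing to learn in this toy agent


agent = ArgmaxContextAgent(n_actions=3)
print(agent.select_action([0.1, 0.9, 0.5]))  # 1
```

Because select_action is abstract, Python refuses to instantiate the base class directly. An agent that forgets to override the update method fails loudly at call time.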

Bootstrap Neural

class genrl.agents.bandits.contextual.bootstrap_neural.BootstrapNeuralAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Bootstrapped ensemble agent for deep contextual bandits.

Parameters:

bandit (DataBasedBandit) – The bandit to solve
init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
lr_reset (bool, optional) – Whether to reset the learning rate every training interval. Defaults to True.
max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which means dropout is not used.
eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
n (int, optional) – Number of models in the ensemble. Defaults to 10.
add_prob (float, optional) – Probability of adding a transition to each model's database. Defaults to 0.95.
device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.
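Every agent on this page accepts init_pulls. One simple way to pull each action init_pulls times before trusting a learned model is a round-robin warm-up, sketched below. The helper choose_action is hypothetical. Whether genrl orders its initial pulls exactly this way is an assumption. The lambda returning 99 is only a placeholder for the agent's learned policy.

```python
from typing import Callable


def choose_action(t: int, n_actions: int, init_pulls: int,
                  policy: Callable[[], int]) -> int:
    """Hypothetical helper: round-robin warm-up, then defer to the learned policy."""
    if t < init_pulls * n_actions:
        return t % n_actions  # each action is pulled init_pulls times in turn
    return policy()


warmup = [choose_action(t, n_actions=3, init_pulls=2, policy=lambda: 99)
          for t in range(8)]
print(warmup)  # [0, 1, 2, 0, 1, 2, 99, 99]
```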

select_action(context: torch.Tensor) → int

Select an action based on the given context.

Selects an action by computing a forward pass through a network chosen at random from the ensemble.

Parameters:	context (torch.Tensor) – The context vector to select an action for.
Returns:	The action to take.
Return type:	int
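Sampling one ensemble member per decision gives each step a different but plausible view of the reward function. That disagreement is what drives exploration in a bootstrapped ensemble. The sketch assumes each trained network can be treated as a function from a context to per-action predicted rewards. make_linear_model is a hypothetical linear stand-in for those networks, and lists replace tensors.

```python
import random
from typing import Callable, List, Sequence

# Any callable mapping a context to one predicted reward per action.
Model = Callable[[Sequence[float]], List[float]]


def make_linear_model(weights: List[List[float]]) -> Model:
    """Hypothetical linear stand-in for a trained network (one weight row per action)."""
    def model(context: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, context)) for row in weights]
    return model


def bootstrap_select_action(models: List[Model], context: Sequence[float],
                            rng: random.Random) -> int:
    """Pick one ensemble member uniformly at random, then act greedily on it."""
    model = rng.choice(models)
    predicted = model(context)
    return max(range(len(predicted)), key=predicted.__getitem__)


ensemble = [
    make_linear_model([[1.0, 0.0], [0.0, 1.0]]),  # favors action 0 when x0 > x1
    make_linear_model([[0.0, 1.0], [1.0, 0.0]]),  # favors action 1 when x0 > x1
]
rng = random.Random(0)
picks = [bootstrap_select_action(ensemble, [2.0, 1.0], rng) for _ in range(10)]
```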

update_db(context: torch.Tensor, action: int, reward: int)

Updates the transition database with the given transition.

The transition is added to each model's database with probability add_prob.

Parameters:

context (torch.Tensor) – Context received.
action (int) – Action taken.
reward (int) – Reward received.
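The per-model databases can be kept as plain lists. A separate Bernoulli(add_prob) draw decides whether each one receives a new transition. This is a sketch under that assumption, and the helper name bootstrap_update_db is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Transition = Tuple[Sequence[float], int, float]


def bootstrap_update_db(dbs: List[List[Transition]], context: Sequence[float],
                        action: int, reward: float, add_prob: float,
                        rng: random.Random) -> None:
    """Append the transition to each model's database with probability add_prob."""
    for db in dbs:
        if rng.random() < add_prob:  # independent coin flip per database
            db.append((context, action, reward))


n = 10  # number of models in the ensemble
dbs: List[List[Transition]] = [[] for _ in range(n)]
rng = random.Random(42)
for step in range(1000):
    bootstrap_update_db(dbs, [float(step)], step % 3, 1.0, add_prob=0.95, rng=rng)
sizes = [len(db) for db in dbs]
```

With add_prob=0.95 each database ends up holding roughly 95% of the transitions. Because the kept subsets differ, each model is trained on a slightly different resample of the history.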

update_params(action: Optional[int] = None, batch_size: int = 512, train_epochs: int = 20)

Update the parameters of the agent.

Trains each neural network in the ensemble.

Parameters:

action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
batch_size (int, optional) – Size of the batch to update parameters with. Defaults to 512.
train_epochs (int, optional) – Number of epochs to train each neural network for. Defaults to 20.

Fixed

class genrl.agents.bandits.contextual.fixed.FixedAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, p: List[float] = None, device: str = 'cpu')

Bases: genrl.agents.bandits.contextual.base.DCBAgent

select_action(context: torch.Tensor) → int

Select an action based on fixed probabilities.

Parameters:	context (torch.Tensor) – The context vector to select an action for. This agent ignores the context vector.
Returns:	The action to take.
Return type:	int
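Selecting from fixed probabilities amounts to weighted sampling, for example with random.choices. In the sketch, fixed_select_action is a hypothetical helper. Falling back to a uniform distribution when p is None is an assumption, not documented behavior.

```python
import random
from typing import List, Optional, Sequence


def fixed_select_action(context: Sequence[float], n_actions: int,
                        p: Optional[List[float]], rng: random.Random) -> int:
    """Sample an action from fixed probabilities p; the context is ignored."""
    weights = p if p is not None else [1.0] * n_actions  # uniform fallback is an assumption
    return rng.choices(range(n_actions), weights=weights, k=1)[0]


rng = random.Random(1)
print(fixed_select_action([0.3, 0.7], n_actions=3, p=[0.0, 0.0, 1.0], rng=rng))  # 2
```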

update_db(*args, **kwargs)

update_params(*args, **kwargs)

Linear Posterior

class genrl.agents.bandits.contextual.linpos.LinearPosteriorAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent that uses Bayesian linear regression for posterior inference.

Parameters:

bandit (DataBasedBandit) – The bandit to solve
init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
lambda_prior (float, optional) – Gaussian prior for the linear model. Defaults to 0.25.
a0 (float, optional) – Inverse gamma prior for noise. Defaults to 6.0.
b0 (float, optional) – Inverse gamma prior for noise. Defaults to 6.0.
device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.

select_action(context: torch.Tensor) → int

Select an action based on the given context.

Selects the action with the highest predicted reward, where rewards are predicted using betas sampled from the posterior.

Parameters:	context (torch.Tensor) – The context vector to select an action for.
Returns:	The action to take.
Return type:	int

update_db(context: torch.Tensor, action: int, reward: int)

Updates the transition database with the given transition.

Parameters:

context (torch.Tensor) – Context received.
action (int) – Action taken.
reward (int) – Reward received.

update_params(action: int, batch_size: int = 512, train_epochs: Optional[int] = None)

Update the parameters of the agent.

Updates the posterior over beta through Bayesian regression.

Parameters:

action (int) – Action to update the parameters for.
batch_size (int, optional) – Size of the batch to update parameters with. Defaults to 512.
train_epochs (Optional[int], optional) – Number of epochs to train the neural network for. Not applicable in this agent. Defaults to None.
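The posterior update and Thompson-sampling selection described for this agent can be sketched in pure Python. The sketch assumes the standard conjugate normal-inverse-gamma model for each action: beta | sigma^2 ~ N(0, sigma^2 / lambda_prior * I) and sigma^2 ~ InvGamma(a0, b0). The update has four parts:

- posterior precision: Lambda = lambda_prior * I + X^T X
- posterior mean: mu = Lambda^-1 X^T y
- noise shape: a = a0 + n/2
- noise scale: b = b0 + 0.5 * (y^T y - mu^T Lambda mu)

Drawing sigma^2 and then beta ~ N(mu, sigma^2 * Lambda^-1) yields one sampled reward model per action. The helper names are hypothetical, and a small hand-written Cholesky factorization replaces the tensor routines a real implementation would use.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def solve_lower(L: Matrix, b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L x = b."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_lower_t(L: Matrix, b: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x


def sample_beta(X: Matrix, y: Sequence[float], d: int, lambda_prior: float,
                a0: float, b0: float, rng: random.Random) -> List[float]:
    """One draw of beta from the normal-inverse-gamma posterior of y = X beta + noise."""
    # Posterior precision: lambda_prior * I + X^T X
    prec = [[(lambda_prior if i == j else 0.0) + sum(r[i] * r[j] for r in X)
             for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(d)]
    L = cholesky(prec)
    mu = solve_lower_t(L, solve_lower(L, xty))  # mu = prec^-1 X^T y
    # Inverse-gamma posterior over the noise variance sigma^2
    a = a0 + len(y) / 2.0
    quad = sum(mu[i] * prec[i][j] * mu[j] for i in range(d) for j in range(d))
    b = b0 + 0.5 * (sum(yi * yi for yi in y) - quad)
    sigma2 = 1.0 / rng.gammavariate(a, 1.0 / b)  # invert a Gamma(shape=a, rate=b) draw
    # mu + sigma * L^-T z has covariance sigma^2 * prec^-1
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    v = solve_lower_t(L, z)
    return [m + math.sqrt(sigma2) * vi for m, vi in zip(mu, v)]


def thompson_select_action(context: Sequence[float],
                           data: List[Tuple[Matrix, List[float]]], lambda_prior: float,
                           a0: float, b0: float, rng: random.Random) -> int:
    """data[a] holds the (contexts, rewards) observed for action a."""
    d = len(context)
    scores = []
    for X_a, y_a in data:
        beta = sample_beta(X_a, y_a, d, lambda_prior, a0, b0, rng)
        scores.append(sum(bi * xi for bi, xi in zip(beta, context)))
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
contexts = [[rng.uniform(-1.0, 1.0), 1.0] for _ in range(200)]
data = [
    (contexts, [2.0 * u + 1.0 for u, _ in contexts]),  # action 0: reward = 2u + 1
    (contexts, [-u for u, _ in contexts]),             # action 1: reward = -u
]
action = thompson_select_action([1.0, 1.0], data, lambda_prior=0.25, a0=6.0, b0=6.0, rng=rng)
```

With little data the sampled betas vary widely, so actions get explored. As observations accumulate the posterior tightens and selection becomes effectively greedy.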

Neural Greedy

class genrl.agents.bandits.contextual.neural_greedy.NeuralGreedyAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent that uses epsilon-greedy exploration with a neural network.

Parameters:

bandit (DataBasedBandit) – The bandit to solve
init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
lr_reset (bool, optional) – Whether to reset the learning rate every training interval. Defaults to True.
max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which means dropout is not used.
eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
epsilon (float, optional) – Probability of selecting a random action. Defaults to 0.0.
device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.

select_action(context: torch.Tensor) → int

Select an action based on the given context.

Selects an action by computing a forward pass through the network; with probability epsilon, a random action is selected instead.

Parameters:	context (torch.Tensor) – The context vector to select an action for.
Returns:	The action to take.
Return type:	int
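Epsilon-greedy selection over the network's predicted rewards can be sketched as follows. The predicted rewards are passed in as a plain list standing in for the network's output, and epsilon_greedy is a hypothetical helper name. With the documented default epsilon of 0.0, the rule reduces to pure greedy selection.

```python
import random
from typing import Sequence


def epsilon_greedy(predicted_rewards: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action, else the argmax."""
    n_actions = len(predicted_rewards)
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: predicted_rewards[a])
```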

update_db(context: torch.Tensor, action: int, reward: int)

Updates the transition database with the given transition.

Parameters:

context (torch.Tensor) – Context received.
action (int) – Action taken.
reward (int) – Reward received.

update_params(action: Optional[int] = None, batch_size: int = 512, train_epochs: int = 20)

Update the parameters of the agent.

Trains the neural network.

Parameters:

action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
batch_size (int, optional) – Size of the batch to update parameters with. Defaults to 512.
train_epochs (int, optional) – Number of epochs to train the neural network for. Defaults to 20.
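Below is a sketch of the training step under several simplifying assumptions:

- A linear model per action stands in for the network.
- The loss is the squared error on the taken action's output only.
- Gradients are clipped to max_grad_norm by their global norm.
- The learning rate is held constant; the init_lr / lr_decay / lr_reset schedule and dropout are not reproduced.
- Each epoch is one shuffled pass over the database in mini-batches of batch_size.

None of these choices is confirmed as genrl's exact procedure, and train_linear_model is a hypothetical name.

```python
import math
import random
from typing import List, Sequence, Tuple

Transition = Tuple[Sequence[float], int, float]


def train_linear_model(weights: List[List[float]], db: List[Transition],
                       batch_size: int, train_epochs: int, lr: float,
                       max_grad_norm: float, rng: random.Random) -> None:
    """Mini-batch SGD on squared error of the taken action's prediction only.

    weights[a] is a linear stand-in for the network's output for action a.
    """
    for _ in range(train_epochs):
        order = list(db)
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grads = [[0.0] * len(row) for row in weights]
            for context, action, reward in batch:
                pred = sum(w * x for w, x in zip(weights[action], context))
                err = pred - reward  # only the chosen action's output gets a loss
                for i, x in enumerate(context):
                    grads[action][i] += 2.0 * err * x / len(batch)
            # Clip the global gradient norm to max_grad_norm.
            norm = math.sqrt(sum(g * g for row in grads for g in row))
            scale = min(1.0, max_grad_norm / norm) if norm > 0.0 else 1.0
            for row, grad_row in zip(weights, grads):
                for i in range(len(row)):
                    row[i] -= lr * scale * grad_row[i]


rng = random.Random(0)
db: List[Transition] = []
for _ in range(200):
    u = rng.uniform(-1.0, 1.0)
    action = rng.randrange(2)
    reward = 2.0 * u + 1.0 if action == 0 else -u
    db.append(([u, 1.0], action, reward))

weights = [[0.0, 0.0], [0.0, 0.0]]
train_linear_model(weights, db, batch_size=32, train_epochs=100, lr=0.1,
                   max_grad_norm=0.5, rng=rng)
```

Masking the loss to the taken action reflects the bandit setting: only the reward of the chosen action is ever observed.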

Neural Linear Posterior

class genrl.agents.bandits.contextual.neural_linpos.NeuralLinearPosteriorAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent that uses Bayesian linear regression for posterior inference.

A neural network transforms the context vector into a latent representation, on which Bayesian linear regression is performed.

Parameters:

bandit (DataBasedBandit) – The bandit to solve
init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
lr_reset (bool, optional) – Whether to reset the learning rate every training interval. Defaults to True.
max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which means dropout is not used.
eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
nn_update_ratio (int, optional) – Defaults to 2.
lambda_prior (float, optional) – Gaussian prior for the linear model. Defaults to 0.25.
a0 (float, optional) – Inverse gamma prior for noise. Defaults to 3.0.
b0 (float, optional) – Inverse gamma prior for noise. Defaults to 3.0.
device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.

select_action(context: torch.Tensor) → int

Select an action based on the given context.

Selects an action by computing a forward pass through the network to obtain a latent representation of the context, then performing Bayesian linear regression on that representation to choose the action.

Parameters:	context (torch.Tensor) – The context vector to select an action for.
Returns:	The action to take.
Return type:	int
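This agent has a two-stage structure: a network maps the context to a latent vector, and a linear model then scores that vector. That structure can be sketched as below. The hidden layers are hypothetical fixed weights with ReLU activations. sampled_betas is assumed to hold one posterior draw of the linear weights per action. Those draws would come from the same kind of Bayesian linear regression as in the Linear Posterior agent, fitted on latent vectors rather than raw contexts.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights, bias)


def relu_layer(weights: Matrix, bias: Sequence[float], x: Sequence[float]) -> List[float]:
    """Fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def latent_representation(layers: List[Layer], context: Sequence[float]) -> List[float]:
    """Pass the context through the hidden layers; the last activation is the latent z."""
    z = list(context)
    for weights, bias in layers:
        z = relu_layer(weights, bias, z)
    return z


def neural_linear_select_action(layers: List[Layer], sampled_betas: Matrix,
                                context: Sequence[float]) -> int:
    """sampled_betas[a] is one posterior draw of the linear weights for action a."""
    z = latent_representation(layers, context)
    scores = [sum(b * zi for b, zi in zip(beta, z)) for beta in sampled_betas]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical network: one hidden layer with identity weights, so z = relu(context).
layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
betas = [[1.0, 0.0], [0.0, 1.0]]  # action 0 scores z[0], action 1 scores z[1]
print(neural_linear_select_action(layers, betas, [3.0, -1.0]))  # 0
print(neural_linear_select_action(layers, betas, [-1.0, 2.0]))  # 1
```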

update_db(context: torch.Tensor, action: int, reward: int)

Updates the transition database with the given transition.

Updates the latent context and the predicted rewards separately.

Parameters:

context (torch.Tensor) – Context received.
action (int) – Action taken.
reward (int) – Reward received.

update_params(action: int, batch_size: int = 512, train_epochs: int = 20)

Update the parameters of the agent.

Trains the neural network and updates the Bayesian regression parameters.

Parameters:

action (int) – Action to update the parameters for.
batch_size (int, optional) – Size of the batch to update parameters with. Defaults to 512.
train_epochs (int, optional) – Number of epochs to train the neural network for. Defaults to 20.

Neural Noise Sampling

class genrl.agents.bandits.contextual.neural_noise_sampling.NeuralNoiseSamplingAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent with noise sampling for neural network parameters.

Parameters:

bandit (DataBasedBandit) – The bandit to solve
init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
lr_reset (bool, optional) – Whether to reset the learning rate every training interval. Defaults to True.
max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which means dropout is not used.
eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
noise_std_dev (float, optional) – Standard deviation of the sampled noise. Defaults to 0.05.
eps (float, optional) – Small constant for bounding the KL divergence of the noise. Defaults to 0.1.
noise_update_batch_size (int, optional) – Batch size for updating noise parameters. Defaults to 256.
device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.

select_action(context: torch.Tensor) → int

Select an action based on the given context.

Selects an action by adding noise to the neural network parameters and then computing a forward pass with the context vector as input.

Parameters:	context (torch.Tensor) – The context vector to select an action for.
Returns:	The action to take.
Return type:	int
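The core idea can be sketched below with a linear stand-in for the network: perturb the parameters with Gaussian noise of standard deviation noise_std_dev, then act greedily on the perturbed model. This sketch does not reproduce the adaptive tuning implied by eps and noise_update_batch_size; the noise scale is simply held fixed. The helper names are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def perturb(weights: Matrix, noise_std_dev: float, rng: random.Random) -> Matrix:
    """Copy of the weights with i.i.d. Gaussian noise added to every entry."""
    return [[w + rng.gauss(0.0, noise_std_dev) for w in row] for row in weights]


def noisy_select_action(weights: Matrix, context: Sequence[float],
                        noise_std_dev: float, rng: random.Random) -> int:
    """Greedy action under one noisy copy of the (stand-in linear) network."""
    noisy = perturb(weights, noise_std_dev, rng)
    scores = [sum(w * x for w, x in zip(row, context)) for row in noisy]
    return max(range(len(scores)), key=scores.__getitem__)
```

The original weights are left untouched; each call draws a fresh perturbation. Small noise therefore only changes the decision when actions are nearly tied.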

update_db(context: torch.Tensor, action: int, reward: int)

Updates the transition database with the given transition.

Parameters:

context (torch.Tensor) – Context received.
action (int) – Action taken.
reward (int) – Reward received.

update_params(action: Optional[int] = None, batch_size: int = 512, train_epochs: int = 20)

Update the parameters of the agent.

Trains the neural network.

Parameters:

action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
batch_size (int, optional) – Size of the batch to update parameters with. Defaults to 512.
train_epochs (int, optional) – Number of epochs to train the neural network for. Defaults to 20.

Variational

class genrl.agents.bandits.contextual.variational.VariationalAgent(bandit: genrl.utils.data_bandits.base.DataBasedBandit, **kwargs)

Bases: genrl.agents.bandits.contextual.base.DCBAgent

Deep contextual bandit agent using variational inference.

Parameters:

bandit (DataBasedBandit) – The bandit to solve
init_pulls (int, optional) – Number of times to select each action initially. Defaults to 3.
hidden_dims (List[int], optional) – Dimensions of hidden layers of network. Defaults to [50, 50].
init_lr (float, optional) – Initial learning rate. Defaults to 0.1.
lr_decay (float, optional) – Decay rate for learning rate. Defaults to 0.5.
lr_reset (bool, optional) – Whether to reset the learning rate every training interval. Defaults to True.
max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping. Defaults to 0.5.
dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which means dropout is not used.
eval_with_dropout (bool, optional) – Whether or not to use dropout at inference. Defaults to False.
noise_std (float, optional) – Standard deviation of the noise in the Bayesian neural network. Defaults to 0.1.
device (str) – Device to use for tensor operations. “cpu” for cpu or “cuda” for cuda. Defaults to “cpu”.

select_action(context: torch.Tensor) → int

Select an action based on the given context.

Selects an action by computing a forward pass through the Bayesian neural network.

Parameters:	context (torch.Tensor) – The context vector to select an action for.
Returns:	The action to take.
Return type:	int
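One common way to realize a forward pass through a Bayesian neural network has two steps. First, draw weights from a factorized Gaussian variational posterior with the reparameterization w = mu + softplus(rho) * eps, where eps ~ N(0, 1). Then act greedily on the sampled network. The sketch below assumes that parameterization, which this page does not confirm. It uses a single linear layer as the network and omits the training of mu and rho; the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def softplus(x: float) -> float:
    """Smooth positive transform used to keep standard deviations positive."""
    return math.log1p(math.exp(x))


def sample_weights(mu: Matrix, rho: Matrix, rng: random.Random) -> Matrix:
    """Reparameterized draw w = mu + softplus(rho) * eps, eps ~ N(0, 1)."""
    return [[m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mrow, rrow)]
            for mrow, rrow in zip(mu, rho)]


def variational_select_action(mu: Matrix, rho: Matrix, context: Sequence[float],
                              rng: random.Random) -> int:
    """Greedy action under one network sampled from the variational posterior."""
    w = sample_weights(mu, rho, rng)
    scores = [sum(wi * x for wi, x in zip(row, context)) for row in w]
    return max(range(len(scores)), key=scores.__getitem__)
```

A very negative rho makes the sampled network nearly deterministic, while a large rho spreads the samples and increases exploration.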

update_db(context: torch.Tensor, action: int, reward: int)

Updates the transition database with the given transition.

Parameters:

context (torch.Tensor) – Context received.
action (int) – Action taken.
reward (int) – Reward received.

update_params(action: int, batch_size: int = 512, train_epochs: int = 20)

Update the parameters of the agent.

Trains the Bayesian neural network.

Parameters:

action (Optional[int], optional) – Action to update the parameters for. Not applicable in this agent. Defaults to None.
batch_size (int, optional) – Size of the batch to update parameters with. Defaults to 512.
train_epochs (int, optional) – Number of epochs to train the neural network for. Defaults to 20.