Bandit Common

genrl.bandit.core

genrl.bandit.trainer

genrl.bandit.agents.cb_agents.common.base_model

class genrl.agents.bandits.contextual.common.base_model.Model(layer, **kwargs)

Bases: torch.nn.modules.module.Module, abc.ABC

Abstract base class for the neural networks used in Deep Contextual Bandit Models.

Parameters:
  • context_dim (int) – Length of context vector.
  • hidden_dims (List[int], optional) – Dimensions of hidden layers of network.
  • n_actions (int) – Number of actions that can be selected. Taken as length of output vector for network to predict.
  • init_lr (float, optional) – Initial learning rate.
  • max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping.
  • lr_decay (float, optional) – Decay rate for learning rate.
  • lr_reset (bool, optional) – Whether to reset the learning rate every train interval. Defaults to False.
  • dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which implies dropout is not to be used.
  • noise_std (float) – Standard deviation of noise used in the network. Defaults to 0.1.
use_dropout

Indicates whether dropout should be used in the forward pass.

Type: bool
forward(context: torch.Tensor, **kwargs) → Dict[str, torch.Tensor]

Computes forward pass through the network.

Parameters: context (torch.Tensor) – The context vector to perform the forward pass on.
Returns: Dictionary of outputs.
Return type: Dict[str, torch.Tensor]
train_model(db: genrl.agents.bandits.contextual.common.transition.TransitionDB, epochs: int, batch_size: int)

Trains the network on the given database for the given number of epochs and batch size.

Parameters:
  • db (TransitionDB) – The database of transitions to train on.
  • epochs (int) – Number of gradient steps to take.
  • batch_size (int) – The size of each batch to perform gradient descent on.
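The training procedure described above can be illustrated with a dependency-free sketch. It is not genrl's implementation: it assumes a linear reward predictor per action, a squared-error loss on the taken action only, an inverse-time learning-rate decay driven by init_lr and lr_decay, and global-norm gradient clipping driven by max_grad_norm. The names train_linear_model and clip_by_global_norm are hypothetical, and plain lists stand in for tensors.

```python
# Sketch only: hypothetical names and an assumed loss/schedule, not genrl's train_model.
import math
import random
from typing import List, Optional, Tuple

# Hypothetical transition format: (context, action, reward).
Transition = Tuple[List[float], int, float]


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> None:
    """Scale all gradients in place so their joint L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for row in grads for g in row))
    if total > max_norm:
        scale = max_norm / total
        for row in grads:
            for j in range(len(row)):
                row[j] *= scale


def train_linear_model(
    weights: List[List[float]],
    db: List[Transition],
    epochs: int,
    batch_size: int,
    init_lr: float = 0.1,
    lr_decay: float = 0.0,
    max_grad_norm: Optional[float] = None,
) -> List[List[float]]:
    """Take one gradient step per epoch on a random batch; weights[a] predicts action a's reward."""
    for step in range(epochs):
        # Assumed inverse-time decay: lr = init_lr / (1 + lr_decay * step).
        lr = init_lr / (1.0 + lr_decay * step)
        batch = random.sample(db, min(batch_size, len(db)))
        grads = [[0.0] * len(row) for row in weights]
        for context, action, reward in batch:
            # Only the prediction for the action actually taken enters the MSE loss.
            pred = sum(w * x for w, x in zip(weights[action], context))
            err = pred - reward
            for j, x in enumerate(context):
                grads[action][j] += 2.0 * err * x / len(batch)
        if max_grad_norm is not None:
            clip_by_global_norm(grads, max_grad_norm)
        for a, row in enumerate(weights):
            for j in range(len(row)):
                row[j] -= lr * grads[a][j]
    return weights
```

Masking the loss to the taken action reflects the bandit setting: only the reward of the selected action is ever observed, so the other outputs receive no gradient from that transition.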

genrl.bandit.agents.cb_agents.common.bayesian

class genrl.agents.bandits.contextual.common.bayesian.BayesianLinear(in_features: int, out_features: int, bias: bool = True)

Bases: torch.nn.modules.module.Module

Linear Layer for Bayesian Neural Networks.

Parameters:
  • in_features (int) – size of each input sample
  • out_features (int) – size of each output sample
  • bias (bool, optional) – Whether to use an additive bias. Defaults to True.
forward(x: torch.Tensor, kl: bool = True, frozen: bool = False) → Tuple[torch.Tensor, Optional[torch.Tensor]]

Apply a linear transformation to the input.

The weight and bias are sampled from a normal distribution on each forward pass. The KL divergence of the sampled weight and bias can also be computed if specified.

Parameters:
  • x (torch.Tensor) – Input to be transformed
  • kl (bool, optional) – Whether to compute the KL divergence. Defaults to True.
  • frozen (bool, optional) – Whether to freeze current parameters. Defaults to False.
Returns: The transformed input and, optionally, the computed KL divergence value.
Return type: Tuple[torch.Tensor, Optional[torch.Tensor]]
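The sampling and KL computation described above can be sketched without PyTorch. This is a minimal illustration under stated assumptions, not genrl's BayesianLinear: each weight's standard deviation is assumed to be parameterized as softplus(rho), the prior is assumed to be a standard normal N(0, 1), and the frozen flag is not modeled. ToyBayesianLinear, softplus and kl_to_standard_normal are hypothetical names.

```python
# Sketch only: ToyBayesianLinear is a hypothetical stand-in, not genrl's class.
import math
import random
from typing import List, Optional, Tuple


def softplus(x: float) -> float:
    # Maps an unconstrained parameter rho to a positive standard deviation.
    return math.log1p(math.exp(x))


def kl_to_standard_normal(mu: float, sigma: float) -> float:
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ); the N(0, 1) prior is an assumption.
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0) - math.log(sigma)


class ToyBayesianLinear:
    def __init__(self, in_features: int, out_features: int, bias: bool = True) -> None:
        self.in_features = in_features
        self.out_features = out_features
        self.use_bias = bias
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Small random posterior means; rho = -3 gives sigma = softplus(-3) ~= 0.049.
        self.w_mu = [[random.uniform(-0.1, 0.1) for _ in range(self.in_features)]
                     for _ in range(self.out_features)]
        self.w_rho = [[-3.0] * self.in_features for _ in range(self.out_features)]
        self.b_mu = [0.0] * self.out_features
        self.b_rho = [-3.0] * self.out_features

    def forward(self, x: List[float], kl: bool = True) -> Tuple[List[float], Optional[float]]:
        out: List[float] = []
        kl_val = 0.0
        for i in range(self.out_features):
            acc = 0.0
            for j in range(self.in_features):
                sigma = softplus(self.w_rho[i][j])
                # Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, 1).
                w = self.w_mu[i][j] + sigma * random.gauss(0.0, 1.0)
                acc += w * x[j]
                kl_val += kl_to_standard_normal(self.w_mu[i][j], sigma)
            if self.use_bias:
                sigma = softplus(self.b_rho[i])
                acc += self.b_mu[i] + sigma * random.gauss(0.0, 1.0)
                kl_val += kl_to_standard_normal(self.b_mu[i], sigma)
            out.append(acc)
        return out, (kl_val if kl else None)
```

Because fresh weights are drawn on every call, two forward passes on the same input generally return different outputs; this randomness is what a Bayesian bandit model uses to express uncertainty.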

reset_parameters() → None

Resets weight and bias parameters of the layer.

class genrl.agents.bandits.contextual.common.bayesian.BayesianNNBanditModel(**kwargs)

Bases: genrl.agents.bandits.contextual.common.base_model.Model

Bayesian Neural Network used in Deep Contextual Bandit Models.

Parameters:
  • context_dim (int) – Length of context vector.
  • hidden_dims (List[int], optional) – Dimensions of hidden layers of network.
  • n_actions (int) – Number of actions that can be selected. Taken as length of output vector for network to predict.
  • init_lr (float, optional) – Initial learning rate.
  • max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping.
  • lr_decay (float, optional) – Decay rate for learning rate.
  • lr_reset (bool, optional) – Whether to reset the learning rate every train interval. Defaults to False.
  • dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which implies dropout is not to be used.
  • noise_std (float) – Standard deviation of noise used in the network. Defaults to 0.1.
use_dropout

Indicates whether dropout should be used in the forward pass.

Type: bool
forward(context: torch.Tensor, kl: bool = True) → Dict[str, torch.Tensor]

Computes forward pass through the network.

Parameters:
  • context (torch.Tensor) – The context vector to perform the forward pass on.
  • kl (bool, optional) – Whether to compute the KL divergence. Defaults to True.
Returns: Dictionary of outputs.
Return type: Dict[str, torch.Tensor]
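A network built from Bayesian layers can report a single KL term for the whole model. The dependency-free sketch below assumes that this term is the sum of the per-layer KL values, that hidden layers use ReLU, and that each weight's prior is N(0, 1); biases are omitted. The function names and the output keys "pred_rewards" and "kl_val" are hypothetical and are not taken from genrl.

```python
# Sketch only: hypothetical names and output keys, not genrl's API.
import math
import random
from typing import Dict, List, Tuple

# A layer is a list of output rows; each row holds one (mu, sigma) pair per input.
# Biases are omitted to keep the sketch short.
Layer = List[List[Tuple[float, float]]]


def sample_linear(x: List[float], layer: Layer) -> Tuple[List[float], float]:
    """Sample every weight once and return (outputs, KL to an assumed N(0, 1) prior)."""
    out: List[float] = []
    kl = 0.0
    for row in layer:
        acc = 0.0
        for (mu, sigma), xi in zip(row, x):
            acc += (mu + sigma * random.gauss(0.0, 1.0)) * xi
            kl += 0.5 * (sigma ** 2 + mu ** 2 - 1.0) - math.log(sigma)
        out.append(acc)
    return out, kl


def bayesian_nn_forward(context: List[float], layers: List[Layer], kl: bool = True) -> Dict[str, object]:
    h, total_kl = context, 0.0
    for i, layer in enumerate(layers):
        h, layer_kl = sample_linear(h, layer)
        total_kl += layer_kl
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers only
    # The network's KL term is the sum of the per-layer KL terms.
    return {"pred_rewards": h, "kl_val": total_kl if kl else None}
```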

genrl.bandit.agents.cb_agents.common.neural

class genrl.agents.bandits.contextual.common.neural.NeuralBanditModel(**kwargs)

Bases: genrl.agents.bandits.contextual.common.base_model.Model

Neural Network used in Deep Contextual Bandit Models.

Parameters:
  • context_dim (int) – Length of context vector.
  • hidden_dims (List[int], optional) – Dimensions of hidden layers of network.
  • n_actions (int) – Number of actions that can be selected. Taken as length of output vector for network to predict.
  • init_lr (float, optional) – Initial learning rate.
  • max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping.
  • lr_decay (float, optional) – Decay rate for learning rate.
  • lr_reset (bool, optional) – Whether to reset the learning rate every train interval. Defaults to False.
  • dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which implies dropout is not to be used.
use_dropout

Indicates whether dropout should be used in the forward pass.

Type: bool
forward(context: torch.Tensor) → Dict[str, torch.Tensor]

Computes forward pass through the network.

Parameters: context (torch.Tensor) – The context vector to perform the forward pass on.
Returns: Dictionary of outputs.
Return type: Dict[str, torch.Tensor]
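The relationship between dropout_p and use_dropout can be sketched with a plain multilayer perceptron. This dependency-free example assumes ReLU hidden layers, inverted dropout applied after each hidden layer, and no biases; neural_forward, inverted_dropout and the "pred_rewards" key are hypothetical names, not genrl's.

```python
# Sketch only: hypothetical names and output key, not genrl's API.
import random
from typing import Dict, List, Optional

Matrix = List[List[float]]


def inverted_dropout(h: List[float], p: float) -> List[float]:
    # Zero each unit with probability p; rescale survivors by 1 / (1 - p).
    return [0.0 if random.random() < p else v / (1.0 - p) for v in h]


def neural_forward(context: List[float], weights: List[Matrix],
                   dropout_p: Optional[float] = None) -> Dict[str, List[float]]:
    use_dropout = dropout_p is not None  # mirrors the use_dropout attribute
    h = context
    for i, w in enumerate(weights):
        h = [sum(wij * xj for wij, xj in zip(row, h)) for row in w]
        if i < len(weights) - 1:  # hidden layers only
            h = [max(0.0, v) for v in h]
            if use_dropout:
                h = inverted_dropout(h, dropout_p)
    return {"pred_rewards": h}
```

With dropout_p left at None the forward pass is deterministic; setting it makes repeated passes on the same context differ, which is the behaviour the use_dropout flag toggles.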

genrl.bandit.agents.cb_agents.common.transition

class genrl.agents.bandits.contextual.common.transition.TransitionDB(device: Union[str, torch.device] = 'cpu')

Bases: object

Database for storing (context, action, reward) transitions.

Parameters: device (Union[str, torch.device], optional) – Device to use for tensor operations: “cpu” for CPU or “cuda” for CUDA. Defaults to “cpu”.
db

Dictionary containing lists of transitions.

Type: dict
db_size

Number of transitions stored in the database.

Type: int
device

Device to use for tensor operations.

Type: torch.device
add(context: torch.Tensor, action: int, reward: int)

Add a (context, action, reward) transition to the database.

Parameters:
  • context (torch.Tensor) – Context received.
  • action (int) – Action taken.
  • reward (int) – Reward received.
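How add could keep the db and db_size attributes in sync is shown in the dependency-free sketch below. It assumes db holds one parallel list per field; the class name ToyTransitionDB and the keys "contexts", "actions" and "rewards" are hypothetical, and plain Python lists stand in for tensors.

```python
# Sketch only: ToyTransitionDB and its keys are hypothetical, not genrl's class.
from typing import Dict, List


class ToyTransitionDB:
    def __init__(self) -> None:
        # One parallel list per field; index i across the lists is transition i.
        self.db: Dict[str, list] = {"contexts": [], "actions": [], "rewards": []}
        self.db_size = 0

    def add(self, context: List[float], action: int, reward: float) -> None:
        self.db["contexts"].append(context)
        self.db["actions"].append(action)
        self.db["rewards"].append(reward)
        self.db_size += 1
```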
get_data(batch_size: Optional[int] = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Get a batch of transitions from the database.

Parameters: batch_size (Union[int, None], optional) – Size of the batch required. Defaults to None, which implies that all transitions in the database are included in the batch.
Returns: Tuple of stacked contexts, actions and rewards tensors.
Return type: Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
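The batching behaviour of get_data can be sketched over parallel lists. This example assumes sampling without replacement and returns every transition when batch_size is None or at least the database size; the free function get_data here is a hypothetical stand-in for the method, with lists in place of stacked tensors.

```python
# Sketch only: hypothetical free function; sampling without replacement is an assumption.
import random
from typing import List, Optional, Tuple


def get_data(contexts: List[List[float]], actions: List[int], rewards: List[float],
             batch_size: Optional[int] = None) -> Tuple[list, list, list]:
    n = len(contexts)
    if batch_size is None or batch_size >= n:
        idx = list(range(n))  # the whole database
    else:
        idx = random.sample(range(n), batch_size)  # distinct random indices
    # Using the same indices for every field keeps each (context, action, reward) aligned.
    return ([contexts[i] for i in idx],
            [actions[i] for i in idx],
            [rewards[i] for i in idx])
```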
get_data_for_action(action: int, batch_size: Optional[int] = None) → Tuple[torch.Tensor, torch.Tensor]

Get a batch of transitions from the database for a given action.

Parameters:
  • action (int) – The action to sample transitions for.
  • batch_size (Union[int, None], optional) – Size of the batch required. Defaults to None, which implies that all transitions for that action are included in the batch.
Returns: Tuple of stacked contexts and rewards tensors.
Return type: Tuple[torch.Tensor, torch.Tensor]
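Per-action retrieval can be sketched the same way: filter the indices whose action matches, then optionally subsample them. This is a minimal illustration assuming sampling without replacement; get_data_for_action here is a hypothetical free function over parallel lists, not the method itself.

```python
# Sketch only: hypothetical free function; sampling without replacement is an assumption.
import random
from typing import List, Optional, Tuple


def get_data_for_action(contexts: List[List[float]], actions: List[int], rewards: List[float],
                        action: int, batch_size: Optional[int] = None) -> Tuple[list, list]:
    # Keep only transitions in which the requested action was taken.
    idx = [i for i, a in enumerate(actions) if a == action]
    if batch_size is not None and batch_size < len(idx):
        idx = random.sample(idx, batch_size)
    # Actions are omitted from the result because they all equal `action`.
    return [contexts[i] for i in idx], [rewards[i] for i in idx]
```

Returning only contexts and rewards matches the documented return type: once the batch is restricted to one action, the action column carries no information.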