Bandit Common¶

genrl.bandit.core¶

genrl.bandit.trainer¶

genrl.bandit.agents.cb_agents.common.base_model¶

class genrl.agents.bandits.contextual.common.base_model.Model(layer, **kwargs)[source]¶

Bases: torch.nn.modules.module.Module, abc.ABC

Abstract base class for the neural network models used in Deep Contextual Bandit Models.

Parameters:

context_dim (int) – Length of context vector.
hidden_dims (List[int], optional) – Dimensions of hidden layers of network.
n_actions (int) – Number of actions that can be selected. Taken as length of output vector for network to predict.
init_lr (float, optional) – Initial learning rate.
max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping.
lr_decay (float, optional) – Decay rate for learning rate.
lr_reset (bool, optional) – Whether to reset the learning rate every training interval. Defaults to False.
dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
noise_std (float) – Standard deviation of noise used in the network. Defaults to 0.1.

use_dropout¶

Indicates whether or not dropout should be used in the forward pass.

Type:	bool

forward(context: torch.Tensor, **kwargs) → Dict[str, torch.Tensor][source]¶

Computes forward pass through the network.

Parameters:	context (torch.Tensor) – The context vector to perform forward pass on.
Returns:	Dictionary of outputs
Return type:	Dict[str, torch.Tensor]

train_model(db: genrl.agents.bandits.contextual.common.transition.TransitionDB, epochs: int, batch_size: int)[source]¶

Trains the network on a given database for given epochs and batch_size.

Parameters:

db (TransitionDB) – The database of transitions to train on.
epochs (int) – Number of gradient steps to take.
batch_size (int) – The size of each batch to perform gradient descent on.
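The following is a minimal, dependency-free sketch of how epochs, batch_size, max_grad_norm and lr_decay could interact in a training loop of this kind. It is not the genrl implementation: `train_linear`, `clip_by_norm`, the linear least-squares model, the `(context, reward)` data layout and the inverse-time decay schedule are all assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (context, reward); hypothetical layout


def clip_by_norm(grads: List[float], max_norm: float) -> List[float]:
    """Rescale grads so that their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return grads
    return [g * max_norm / norm for g in grads]


def train_linear(
    data: List[Sample],
    epochs: int,
    batch_size: int,
    init_lr: float = 0.1,
    lr_decay: float = 0.5,
    max_grad_norm: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Fit a linear reward model; one gradient step per epoch."""
    rng = random.Random(seed)
    weights = [0.0] * len(data[0][0])
    for step in range(epochs):
        # Assumed schedule: inverse-time decay of the learning rate.
        lr = init_lr / (1.0 + lr_decay * step)
        batch = rng.sample(data, min(batch_size, len(data)))
        grads = [0.0] * len(weights)
        for x, y in batch:
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            for j, xi in enumerate(x):
                # Gradient of the mean squared error over the batch.
                grads[j] += 2.0 * err * xi / len(batch)
        grads = clip_by_norm(grads, max_grad_norm)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Rewards follow reward = 2 * context, so the weight should move toward 2.
data = [([1.0], 2.0), ([2.0], 4.0)]
weights = train_linear(data, epochs=20, batch_size=2)
```

Clipping bounds the size of each update regardless of how large the raw gradient is, which is why the weight approaches its target gradually here instead of jumping on the first step.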

genrl.bandit.agents.cb_agents.common.bayesian¶

class genrl.agents.bandits.contextual.common.bayesian.BayesianLinear(in_features: int, out_features: int, bias: bool = True)[source]¶

Bases: torch.nn.modules.module.Module

Linear Layer for Bayesian Neural Networks.

Parameters:

in_features (int) – Size of each input sample.
out_features (int) – Size of each output sample.
bias (bool, optional) – Whether to use an additive bias. Defaults to True.

forward(x: torch.Tensor, kl: bool = True, frozen: bool = False) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]¶

Apply a linear transformation to the input.

The weight and bias are sampled from a normal distribution on each forward pass. The KL divergence of the sampled weight and bias can also be computed if specified.

Parameters:

x (torch.Tensor) – Input to be transformed
kl (bool, optional) – Whether to compute the KL divergence. Defaults to True.
frozen (bool, optional) – Whether to freeze current parameters. Defaults to False.

Returns:

The transformed input and, optionally, the computed KL divergence value.

Return type:

Tuple[torch.Tensor, Optional[torch.Tensor]]

reset_parameters() → None[source]¶

Resets weight and bias parameters of the layer.
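As a rough illustration of the forward pass described above, the sketch below samples each weight from a normal distribution and optionally returns a KL term. It uses plain Python lists instead of torch tensors and omits the bias. Every name in it is hypothetical, the closed-form KL against a standard normal prior is a swapped-in choice that the library may not use, and treating `frozen=True` as "use the mean weights without sampling" is an assumption.

```python
import math
import random
from typing import List, Optional, Tuple


def bayesian_linear_forward(
    x: List[float],
    w_mu: List[List[float]],
    w_sigma: List[List[float]],
    kl: bool = True,
    frozen: bool = False,
) -> Tuple[List[float], Optional[float]]:
    """Sketch of a bias-free Bayesian linear map (hypothetical helper)."""
    out = []
    kl_total = 0.0
    for mu_row, sigma_row in zip(w_mu, w_sigma):
        acc = 0.0
        for xi, mu, sigma in zip(x, mu_row, sigma_row):
            # Assumption: frozen=True skips sampling and uses the mean weight.
            w = mu if frozen else mu + sigma * random.gauss(0.0, 1.0)
            acc += w * xi
            # Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for this weight.
            kl_total += -math.log(sigma) + (sigma ** 2 + mu ** 2) / 2.0 - 0.5
        out.append(acc)
    return out, (kl_total if kl else None)


w_mu = [[1.0, 2.0], [0.0, -1.0]]
w_sigma = [[1.0, 1.0], [1.0, 1.0]]

# With frozen=True the output is deterministic: it uses the mean weights.
y, kl_value = bayesian_linear_forward([1.0, 1.0], w_mu, w_sigma, frozen=True)

# With kl=False the second element of the returned tuple is None.
y_sampled, no_kl = bayesian_linear_forward([1.0, 1.0], w_mu, w_sigma, kl=False)
```

Returning `None` for the KL term when it is not requested mirrors the documented return type `Tuple[torch.Tensor, Optional[torch.Tensor]]`.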

class genrl.agents.bandits.contextual.common.bayesian.BayesianNNBanditModel(**kwargs)[source]¶

Bases: genrl.agents.bandits.contextual.common.base_model.Model

Bayesian Neural Network used in Deep Contextual Bandit Models.

Parameters:

context_dim (int) – Length of context vector.
hidden_dims (List[int], optional) – Dimensions of hidden layers of network.
n_actions (int) – Number of actions that can be selected. Taken as length of output vector for network to predict.
init_lr (float, optional) – Initial learning rate.
max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping.
lr_decay (float, optional) – Decay rate for learning rate.
lr_reset (bool, optional) – Whether to reset the learning rate every training interval. Defaults to False.
dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.
noise_std (float) – Standard deviation of noise used in the network. Defaults to 0.1.

use_dropout¶

Indicates whether or not dropout should be used in the forward pass.

Type:	bool

forward(context: torch.Tensor, kl: bool = True) → Dict[str, torch.Tensor][source]¶

Computes forward pass through the network.

Parameters:	context (torch.Tensor) – The context vector to perform forward pass on.
Returns:	Dictionary of outputs
Return type:	Dict[str, torch.Tensor]

genrl.bandit.agents.cb_agents.common.neural¶

class genrl.agents.bandits.contextual.common.neural.NeuralBanditModel(**kwargs)[source]¶

Bases: genrl.agents.bandits.contextual.common.base_model.Model

Neural Network used in Deep Contextual Bandit Models.

Parameters:

context_dim (int) – Length of context vector.
hidden_dims (List[int], optional) – Dimensions of hidden layers of network.
n_actions (int) – Number of actions that can be selected. Taken as length of output vector for network to predict.
init_lr (float, optional) – Initial learning rate.
max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping.
lr_decay (float, optional) – Decay rate for learning rate.
lr_reset (bool, optional) – Whether to reset the learning rate every training interval. Defaults to False.
dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None which implies dropout is not to be used.

use_dropout¶

Indicates whether or not dropout should be used in the forward pass.

Type:	bool

forward(context: torch.Tensor) → Dict[str, torch.Tensor][source]¶

Computes forward pass through the network.

Parameters:	context (torch.Tensor) – The context vector to perform forward pass on.
Returns:	Dictionary of outputs
Return type:	Dict[str, torch.Tensor]

genrl.bandit.agents.cb_agents.common.transition¶

class genrl.agents.bandits.contextual.common.transition.TransitionDB(device: Union[str, torch.device] = 'cpu')[source]¶

Bases: object

Database for storing (context, action, reward) transitions.

Parameters:	device (Union[str, torch.device]) – Device to use for tensor operations: “cpu” for the CPU or “cuda” for a CUDA device. Defaults to “cpu”.

db¶

Dictionary containing list of transitions.

Type:	dict

db_size¶

Number of transitions stored in database.

Type:	int

device¶

Device to use for tensor operations.

Type:	torch.device

add(context: torch.Tensor, action: int, reward: int)[source]¶

Add a (context, action, reward) transition to the database.

Parameters:

context (torch.Tensor) – Context received.
action (int) – Action taken.
reward (int) – Reward received.

get_data(batch_size: Optional[int] = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶

Get a batch of transitions from the database.

Parameters:	batch_size (Union[int, None], optional) – Size of batch required. Defaults to None which implies all transitions in the database are to be included in batch.
Returns:	Tuple of stacked contexts, actions, rewards tensors.
Return type:	Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

get_data_for_action(action: int, batch_size: Optional[int] = None) → Tuple[torch.Tensor, torch.Tensor][source]¶

Get a batch of transitions from the database for a given action.

Parameters:

action (int) – The action to sample transitions for.
batch_size (Union[int, None], optional) – Size of batch required. Defaults to None which implies all transitions in the database are to be included in batch.

Returns:

Tuple of stacked contexts and rewards tensors.

Return type:

Tuple[torch.Tensor, torch.Tensor]
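The behaviour documented for add, get_data and get_data_for_action can be sketched with plain Python lists. This is a simplified stand-in and not the library class: `SimpleTransitionDB`, the dictionary keys, the `_sample` helper and the use of lists in place of stacked torch tensors are all assumptions made so the example runs without dependencies.

```python
import random
from typing import Dict, List, Optional, Tuple


class SimpleTransitionDB:
    """Hypothetical stand-in for TransitionDB backed by Python lists."""

    def __init__(self) -> None:
        # One list per field; the key names are assumed for illustration.
        self.db: Dict[str, list] = {"contexts": [], "actions": [], "rewards": []}
        self.db_size = 0

    def add(self, context: List[float], action: int, reward: int) -> None:
        self.db["contexts"].append(context)
        self.db["actions"].append(action)
        self.db["rewards"].append(reward)
        self.db_size += 1

    def _sample(self, indices: List[int], batch_size: Optional[int]) -> List[int]:
        # batch_size=None means "use every candidate transition".
        if batch_size is None or batch_size >= len(indices):
            return indices
        return random.sample(indices, batch_size)

    def get_data(
        self, batch_size: Optional[int] = None
    ) -> Tuple[List[List[float]], List[int], List[int]]:
        idx = self._sample(list(range(self.db_size)), batch_size)
        return (
            [self.db["contexts"][i] for i in idx],
            [self.db["actions"][i] for i in idx],
            [self.db["rewards"][i] for i in idx],
        )

    def get_data_for_action(
        self, action: int, batch_size: Optional[int] = None
    ) -> Tuple[List[List[float]], List[int]]:
        # Keep only the transitions in which the given action was taken.
        idx = [i for i, a in enumerate(self.db["actions"]) if a == action]
        idx = self._sample(idx, batch_size)
        return (
            [self.db["contexts"][i] for i in idx],
            [self.db["rewards"][i] for i in idx],
        )


db = SimpleTransitionDB()
db.add([1.0, 0.0], action=0, reward=1)
db.add([0.0, 1.0], action=1, reward=0)
db.add([0.5, 0.5], action=0, reward=1)

contexts, actions, rewards = db.get_data()   # every stored transition
ctx_a0, rew_a0 = db.get_data_for_action(0)   # only transitions with action 0
small_batch = db.get_data(batch_size=2)      # random subset of size 2
```

Note that get_data_for_action returns two sequences, not three: the action is fixed by the caller, so only contexts and rewards are needed.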