Bandit Common
genrl.bandit.core
genrl.bandit.trainer
genrl.bandit.agents.cb_agents.common.base_model
class genrl.agents.bandits.contextual.common.base_model.Model(layer, **kwargs)
Bases: torch.nn.modules.module.Module, abc.ABC
Abstract base class for the neural network models (Bayesian and non-Bayesian) used in Deep Contextual Bandit Models.
Parameters: - context_dim (int) – Length of context vector.
- hidden_dims (List[int], optional) – Dimensions of hidden layers of network.
- n_actions (int) – Number of actions that can be selected. Taken as length of output vector for network to predict.
- init_lr (float, optional) – Initial learning rate.
- max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping.
- lr_decay (float, optional) – Decay rate for learning rate.
- lr_reset (bool, optional) – Whether to reset the learning rate every train interval. Defaults to False.
- dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which implies dropout is not used.
- noise_std (float) – Standard deviation of noise used in the network. Defaults to 0.1.
use_dropout
Indicates whether or not dropout should be used in the forward pass.
Type: bool
forward(context: torch.Tensor, **kwargs) → Dict[str, torch.Tensor]
Computes a forward pass through the network.
Parameters: context (torch.Tensor) – The context vector to perform the forward pass on.
Returns: Dictionary of outputs.
Return type: Dict[str, torch.Tensor]
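As a rough illustration of how `context_dim`, `hidden_dims` and `n_actions` could shape such a network and its forward pass, here is a dependency-free sketch. It assumes fully connected layers with sizes `context_dim` → each entry of `hidden_dims` → `n_actions` and ReLU between hidden layers. Plain Python lists stand in for `torch.Tensor`, and the output keys `"x"` and `"pred_rewards"` are hypothetical; this page only states that a dictionary of outputs is returned.

```python
import random
from typing import Dict, List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_layers(context_dim: int, hidden_dims: List[int], n_actions: int, seed: int = 0) -> List[Layer]:
    """Create layers whose sizes run context_dim -> *hidden_dims -> n_actions."""
    rng = random.Random(seed)
    sizes = [context_dim, *hidden_dims, n_actions]
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(fan_in)] for _ in range(fan_out)], [0.0] * fan_out)
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]


def linear(layer: Layer, x: List[float]) -> List[float]:
    """Affine map: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(layers: List[Layer], context: List[float]) -> Dict[str, List[float]]:
    """Hidden layers use ReLU; the last layer outputs one predicted reward per action."""
    x = context
    for layer in layers[:-1]:
        x = [max(0.0, v) for v in linear(layer, x)]
    # Hypothetical keys: last hidden representation and per-action predictions.
    return {"x": x, "pred_rewards": linear(layers[-1], x)}
```

Here `pred_rewards` has one entry per action, matching the description of `n_actions` as the length of the output vector.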
train_model(db: genrl.agents.bandits.contextual.common.transition.TransitionDB, epochs: int, batch_size: int)
Trains the network on a given database for the given number of epochs and batch size.
Parameters: - db (TransitionDB) – The database of transitions to train on.
Parameters: - db (TransitionDB) – The database of transitions to train on.
- epochs (int) – Number of gradient steps to take.
- batch_size (int) – The size of each batch to perform gradient descent on.
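One way to read this interface, sketched in plain Python: each of the `epochs` iterations draws one batch of `batch_size` transitions and takes a single gradient step, which matches the description of `epochs` as the number of gradient steps. Several details are assumptions not stated on this page: a per-action linear model stands in for the network, the loss is the mean squared error on the predicted reward of the action actually taken, and the database is a plain list of `(context, action, reward)` tuples rather than a `TransitionDB`.

```python
import random
from typing import List, Tuple

Transition = Tuple[List[float], int, float]  # (context, action, reward)


def train_model(weights: List[List[float]], db: List[Transition], epochs: int,
                batch_size: int, lr: float = 0.1, seed: int = 0) -> List[float]:
    """Take `epochs` gradient steps, each on one random batch; return the loss of each step.

    weights[a] is a linear reward model for action a (a stand-in for the network).
    Only the prediction for the action actually taken contributes to the loss.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(epochs):
        batch = rng.sample(db, min(batch_size, len(db)))
        grads = [[0.0] * len(w) for w in weights]
        loss = 0.0
        for context, action, reward in batch:
            pred = sum(w * c for w, c in zip(weights[action], context))
            err = pred - reward
            loss += err * err / len(batch)
            for j, c in enumerate(context):
                grads[action][j] += 2 * err * c / len(batch)
        # One plain gradient-descent step per "epoch".
        for a, w in enumerate(weights):
            for j in range(len(w)):
                w[j] -= lr * grads[a][j]
        losses.append(loss)
    return losses
```

Masking the loss to the taken action reflects the bandit setting: only the reward of the chosen action is ever observed.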
genrl.bandit.agents.cb_agents.common.bayesian
class genrl.agents.bandits.contextual.common.bayesian.BayesianLinear(in_features: int, out_features: int, bias: bool = True)
Bases: torch.nn.modules.module.Module
Linear Layer for Bayesian Neural Networks.
Parameters: - in_features (int) – size of each input sample
- out_features (int) – size of each output sample
- bias (bool, optional) – Whether to use an additive bias. Defaults to True.
forward(x: torch.Tensor, kl: bool = True, frozen: bool = False) → Tuple[torch.Tensor, Optional[torch.Tensor]]
Applies a linear transformation to the input.
The weight and bias are sampled from a normal distribution on each forward pass. The KL divergence of the sampled weight and bias can also be computed if specified.
Parameters: - x (torch.Tensor) – Input to be transformed.
- kl (bool, optional) – Whether to compute the KL divergence. Defaults to True.
- frozen (bool, optional) – Whether to freeze the current parameters. Defaults to False.
Returns: The transformed input and, optionally, the computed KL divergence value.
Return type: Tuple[torch.Tensor, Optional[torch.Tensor]]
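To make the sampling and KL step concrete, here is a dependency-free sketch of a Bayesian linear layer. It assumes a factorized Gaussian per weight and bias, parameterized by a mean and a softplus-transformed scale and sampled with the reparameterization trick, and a standard normal prior for the KL term. It also assumes `frozen=True` means "use the posterior means without sampling". None of these choices are stated on this page, and `BayesianLinearSketch` is a hypothetical name, not the genrl class.

```python
import math
import random
from typing import List, Optional, Tuple


class BayesianLinearSketch:
    """Linear layer with a factorized Gaussian over every weight and bias (hypothetical)."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True, seed: int = 0):
        self.rng = random.Random(seed)
        # Means and pre-softplus scales; sigma = log(1 + exp(rho)) is always positive.
        self.w_mu = [[self.rng.uniform(-0.1, 0.1) for _ in range(in_features)] for _ in range(out_features)]
        self.w_rho = [[-3.0] * in_features for _ in range(out_features)]
        self.b_mu = [0.0] * out_features if bias else None
        self.b_rho = [-3.0] * out_features if bias else None

    @staticmethod
    def _sigma(rho: float) -> float:
        return math.log1p(math.exp(rho))

    def _draw(self, mu: float, rho: float) -> float:
        # Reparameterization: mu + sigma * eps with eps ~ N(0, 1).
        return mu + self._sigma(rho) * self.rng.gauss(0.0, 1.0)

    def _kl(self) -> float:
        # Sum of KL(N(mu, sigma^2) || N(0, 1)) over all weights and biases.
        pairs = [(m, r) for mr, rr in zip(self.w_mu, self.w_rho) for m, r in zip(mr, rr)]
        if self.b_mu is not None:
            pairs += list(zip(self.b_mu, self.b_rho))
        total = 0.0
        for m, r in pairs:
            s = self._sigma(r)
            total += -math.log(s) + (s * s + m * m) / 2.0 - 0.5
        return total

    def forward(self, x: List[float], kl: bool = True,
                frozen: bool = False) -> Tuple[List[float], Optional[float]]:
        if frozen:
            weight, bias = self.w_mu, self.b_mu  # assumption: frozen = posterior means
        else:
            weight = [[self._draw(m, r) for m, r in zip(mr, rr)] for mr, rr in zip(self.w_mu, self.w_rho)]
            bias = None if self.b_mu is None else [self._draw(m, r) for m, r in zip(self.b_mu, self.b_rho)]
        out = [sum(w * v for w, v in zip(row, x)) for row in weight]
        if bias is not None:
            out = [o + b for o, b in zip(out, bias)]
        return out, (self._kl() if kl else None)
```

Because fresh weights are drawn on every unfrozen call, two passes over the same input generally give different outputs, which is what lets a Bayesian network express uncertainty about its predictions.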
class genrl.agents.bandits.contextual.common.bayesian.BayesianNNBanditModel(**kwargs)
Bases: genrl.agents.bandits.contextual.common.base_model.Model
Bayesian Neural Network used in Deep Contextual Bandit Models.
Parameters: - context_dim (int) – Length of context vector.
- hidden_dims (List[int], optional) – Dimensions of hidden layers of network.
- n_actions (int) – Number of actions that can be selected. Taken as length of output vector for network to predict.
- init_lr (float, optional) – Initial learning rate.
- max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping.
- lr_decay (float, optional) – Decay rate for learning rate.
- lr_reset (bool, optional) – Whether to reset the learning rate every train interval. Defaults to False.
- dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which implies dropout is not used.
- noise_std (float) – Standard deviation of noise used in the network. Defaults to 0.1.
use_dropout
Indicates whether or not dropout should be used in the forward pass.
Type: bool
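This page does not say how `noise_std` enters training. One common pattern for Bayesian networks of this kind, shown here purely as an assumption, treats it as the standard deviation of a Gaussian reward likelihood and adds the KL term returned by the Bayesian layers, divided by the dataset size, to form a negative ELBO. The function names `gaussian_nll` and `elbo_loss` are hypothetical.

```python
import math
from typing import List


def gaussian_nll(pred: List[float], target: List[float], noise_std: float) -> float:
    """Mean negative log-likelihood of targets under N(pred, noise_std^2)."""
    var = noise_std ** 2
    total = 0.0
    for p, t in zip(pred, target):
        total += 0.5 * math.log(2 * math.pi * var) + (t - p) ** 2 / (2 * var)
    return total / len(pred)


def elbo_loss(pred: List[float], target: List[float], kl: float, n_data: int,
              noise_std: float = 0.1) -> float:
    """Negative ELBO per data point: likelihood term plus KL spread over the dataset."""
    return gaussian_nll(pred, target, noise_std) + kl / n_data
```

A smaller `noise_std` makes the likelihood term punish prediction errors more heavily relative to the KL regularizer.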
genrl.bandit.agents.cb_agents.common.neural
class genrl.agents.bandits.contextual.common.neural.NeuralBanditModel(**kwargs)
Bases: genrl.agents.bandits.contextual.common.base_model.Model
Neural Network used in Deep Contextual Bandit Models.
Parameters: - context_dim (int) – Length of context vector.
- hidden_dims (List[int], optional) – Dimensions of hidden layers of network.
- n_actions (int) – Number of actions that can be selected. Taken as length of output vector for network to predict.
- init_lr (float, optional) – Initial learning rate.
- max_grad_norm (float, optional) – Maximum norm of gradients for gradient clipping.
- lr_decay (float, optional) – Decay rate for learning rate.
- lr_reset (bool, optional) – Whether to reset the learning rate every train interval. Defaults to False.
- dropout_p (Optional[float], optional) – Probability for dropout. Defaults to None, which implies dropout is not used.
use_dropout
Indicates whether or not dropout should be used in the forward pass.
Type: bool
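A minimal sketch of how `use_dropout` might follow from `dropout_p` and be applied to a hidden activation. It assumes standard inverted dropout (each unit zeroed with probability p, survivors scaled by 1 / (1 - p)) applied only during training; the function name `maybe_dropout` and the `training` flag are hypothetical.

```python
import random
from typing import List, Optional


def maybe_dropout(x: List[float], dropout_p: Optional[float], training: bool = True,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout on one activation vector; a no-op when dropout is disabled."""
    use_dropout = bool(dropout_p)  # None (the default) or 0.0 disables dropout
    if not use_dropout or not training:
        return list(x)
    rng = rng or random.Random()
    keep = 1.0 - dropout_p
    # Rescaling survivors keeps the expected activation unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in x]
```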
genrl.bandit.agents.cb_agents.common.transition
class genrl.agents.bandits.contextual.common.transition.TransitionDB(device: Union[str, torch.device] = 'cpu')
Bases: object
Database for storing (context, action, reward) transitions.
Parameters: device (Union[str, torch.device], optional) – Device to use for tensor operations: “cpu” for the CPU or “cuda” for a CUDA GPU. Defaults to “cpu”.
db
Dictionary containing lists of transitions.
Type: dict
db_size
Number of transitions stored in the database.
Type: int
device
Device to use for tensor operations.
Type: torch.device
add(context: torch.Tensor, action: int, reward: int)
Adds a (context, action, reward) transition to the database.
Parameters: - context (torch.Tensor) – Context received.
- action (int) – Action taken.
- reward (int) – Reward received.
get_data(batch_size: Optional[int] = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
Gets a batch of transitions from the database.
Parameters: batch_size (Union[int, None], optional) – Size of the batch required. Defaults to None, which implies that all transitions in the database are included in the batch.
Returns: Tuple of stacked contexts, actions and rewards tensors.
Return type: Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
get_data_for_action(action: int, batch_size: Optional[int] = None) → Tuple[torch.Tensor, torch.Tensor]
Gets a batch of transitions from the database for a given action.
Parameters: - action (int) – The action to sample transitions for.
- batch_size (Union[int, None], optional) – Size of the batch required. Defaults to None, which implies that all transitions for the action are included in the batch.
Returns: Tuple of stacked contexts and rewards tensors.
Return type: Tuple[torch.Tensor, torch.Tensor]
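The storage contract of `TransitionDB` can be illustrated with a small list-backed sketch. It uses plain Python lists where the real class returns stacked `torch.Tensor`s on `device`; the class name `TransitionDBSketch`, the dictionary keys, and sampling batches without replacement are assumptions rather than details taken from genrl.

```python
import random
from typing import Dict, List, Optional, Tuple


class TransitionDBSketch:
    """List-backed store of (context, action, reward) transitions (hypothetical stand-in)."""

    def __init__(self) -> None:
        self.db: Dict[str, list] = {"contexts": [], "actions": [], "rewards": []}
        self.db_size = 0

    def add(self, context: List[float], action: int, reward: float) -> None:
        self.db["contexts"].append(context)
        self.db["actions"].append(action)
        self.db["rewards"].append(reward)
        self.db_size += 1

    def get_data(self, batch_size: Optional[int] = None) -> Tuple[list, list, list]:
        # None means "return everything"; otherwise sample indices without replacement.
        if batch_size is None:
            idx = list(range(self.db_size))
        else:
            idx = random.sample(range(self.db_size), min(batch_size, self.db_size))
        return ([self.db["contexts"][i] for i in idx],
                [self.db["actions"][i] for i in idx],
                [self.db["rewards"][i] for i in idx])

    def get_data_for_action(self, action: int,
                            batch_size: Optional[int] = None) -> Tuple[list, list]:
        # Restrict to transitions where this action was taken, then optionally subsample.
        idx = [i for i, a in enumerate(self.db["actions"]) if a == action]
        if batch_size is not None:
            idx = random.sample(idx, min(batch_size, len(idx)))
        return [self.db["contexts"][i] for i in idx], [self.db["rewards"][i] for i in idx]
```

A per-action view like `get_data_for_action` is what a model that keeps separate estimates for each arm would consume.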