Variational Inference
For an introduction to the Contextual Bandit problem, refer to Contextual Bandits Overview.
In this method, we try to find a distribution \(P_{\theta}(r | \mathbf{x}, a)\) by minimising its KL divergence from the true distribution. For the model we take a neural network in which each weight is modelled by an independent Gaussian, also known as a Bayesian Neural Network.
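The core ingredients can be sketched numerically: weights are sampled with the reparameterisation trick, and the KL term between each weight's Gaussian posterior and a standard-normal prior is computed in closed form. This is a minimal illustrative sketch, not genrl's implementation; the function name and the choice of a unit-Gaussian prior are assumptions for the example.

```python
import numpy as np

def gaussian_kl(mu, sigma, prior_sigma=1.0):
    """KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over weights."""
    return np.sum(
        np.log(prior_sigma / sigma)
        + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2)
        - 0.5
    )

rng = np.random.default_rng(0)
mu = np.zeros(4)           # variational means for 4 illustrative weights
sigma = 0.1 * np.ones(4)   # std of 0.1, matching the example below
# Reparameterisation trick: w = mu + sigma * eps, with eps ~ N(0, 1),
# so gradients can flow through mu and sigma
w = mu + sigma * rng.standard_normal(4)
print(gaussian_kl(mu, sigma))
```

During training, this KL term is added to the expected negative log-likelihood of the observed rewards, giving the (negative) evidence lower bound that is minimised.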
An example of using a variational-inference-based agent in genrl,
with a Bayesian network of one hidden layer of 128 neurons and a standard
deviation of 0.1 for all the weights:
from genrl.bandit import VariationalAgent, DCBTrainer

# `bandit` is assumed to be an already-instantiated genrl bandit
agent = VariationalAgent(bandit, hidden_dims=[128], noise_std=0.1, device="cuda")
trainer = DCBTrainer(agent, bandit)
trainer.train()
Refer to the VariationalAgent and DCBTrainer docs for more details.