# Linear Posterior Inference

For an introduction to the Contextual Bandit problem, refer to Contextual Bandits Overview.

In this agent we assume a linear relationship between the context and the reward, of the form

$Y = X^T \beta + \epsilon \ \ \text{where} \ \epsilon \sim \mathcal{N}(0, \sigma^2)$

We can use Bayesian linear regression to infer the parameters $$\beta$$ and $$\sigma$$. Since our agent is continually learning, the parameters of the model are updated according to the ($$\mathbf{x}$$, $$a$$, $$r$$) transitions it observes.
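With a Normal-Inverse-Gamma prior this model is conjugate, so the posterior over $$\beta$$ and $$\sigma^2$$ has a closed form. Below is a minimal NumPy sketch of the update and of the Thompson-sampling draw used at decision time. It is illustrative only: function names and defaults are assumptions, not genrl's internal implementation.

```python
import numpy as np

# Sketch of the conjugate Bayesian linear regression update
# (Normal-Inverse-Gamma prior). Illustrative only; names and
# defaults are assumptions, not genrl internals.
def posterior_update(X, y, lambda_prior=0.5, a0=2.0, b0=2.0):
    """Posterior over (beta, sigma^2) given contexts X (n, d) and rewards y (n,)."""
    n, d = X.shape
    precision = X.T @ X + lambda_prior * np.eye(d)  # posterior precision of beta
    mu = np.linalg.solve(precision, X.T @ y)        # posterior mean of beta
    a = a0 + n / 2.0                                # inverse gamma shape
    b = b0 + 0.5 * (y @ y - mu @ precision @ mu)    # inverse gamma scale
    return mu, precision, a, b

def thompson_sample(mu, precision, a, b, rng):
    """Draw sigma^2, then beta, from the joint posterior (Thompson sampling step)."""
    sigma2 = 1.0 / rng.gamma(a, 1.0 / b)            # sigma^2 ~ InvGamma(a, b)
    beta = rng.multivariate_normal(mu, sigma2 * np.linalg.inv(precision))
    return beta, sigma2
```

At decision time the agent would draw $$\beta$$ this way, score each arm's context with $$x^T \beta$$, and pull the argmax.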

For more complex nonlinear relationships, we can use a neural network to transform the context into a learned embedding space. The above method can then be applied to this latent embedding to model the reward.
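To illustrate why the embedding helps, here is a sketch in which a fixed random feature map stands in for the learned network (everything here is illustrative, not genrl API): a linear model fit on the latent features can capture a reward that is nonlinear in the raw context.

```python
import numpy as np

# Sketch of the embedding idea: a fixed random feature map stands in
# for the learned network (illustrative only, not genrl API).
rng = np.random.default_rng(1)

def embed(x, W, b):
    """Nonlinear map phi(x) playing the role of the network's latent layer."""
    return np.tanh(x @ W + b)

# Reward that is nonlinear in the 1-d context: y = sin(x) + noise.
x = rng.uniform(-3, 3, size=(1000, 1))
y = np.sin(x[:, 0]) + 0.05 * rng.normal(size=1000)

W = rng.normal(size=(1, 64))
b = rng.uniform(-np.pi, np.pi, size=64)
Z = embed(x, W, b)                      # latent embedding of the context

# Ridge fit (the MAP of the Bayesian linear model) on the embedding.
beta = np.linalg.solve(Z.T @ Z + 0.1 * np.eye(64), Z.T @ y)
mse = np.mean((Z @ beta - y) ** 2)      # small, despite the nonlinear reward
```

In genrl the feature map is not fixed but trained jointly, so the latent space adapts to the observed rewards.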

An example of using a neural network based linear posterior agent in genrl:

```python
from genrl.bandit import NeuralLinearPosteriorAgent, DCBTrainer

agent = NeuralLinearPosteriorAgent(bandit, lambda_prior=0.5, a0=2, b0=2, device="cuda")

trainer = DCBTrainer(agent, bandit)
trainer.train()
```

Note that the priors here parameterise the initial distributions over $$\beta$$ and $$\sigma$$. More specifically, `lambda_prior` parameterises a Gaussian distribution over $$\beta$$, while `a0` and `b0` are the parameters of an inverse gamma distribution over $$\sigma^2$$. These are updated over the course of exploring a bandit. More details can be found in Section 3 of this paper.
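To make the priors concrete, here is a hedged sketch of a single draw from the initial distribution, assuming the standard Normal-Inverse-Gamma parameterisation (variable names are hypothetical, not genrl internals):

```python
import numpy as np

# One draw from the initial prior described above, assuming the standard
# Normal-Inverse-Gamma parameterisation (names hypothetical).
rng = np.random.default_rng(42)
d = 4                                    # context / embedding dimension
lambda_prior, a0, b0 = 0.5, 2.0, 2.0     # values from the example above

sigma2 = 1.0 / rng.gamma(a0, 1.0 / b0)   # sigma^2 ~ InvGamma(a0, b0)
# beta | sigma^2 ~ N(0, (sigma^2 / lambda_prior) * I): a larger
# lambda_prior means a tighter initial belief around beta = 0.
beta = rng.multivariate_normal(np.zeros(d), (sigma2 / lambda_prior) * np.eye(d))
```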

All hyperparameters can be tuned for individual use cases to improve training efficiency and achieve faster convergence.

Refer to the LinearPosteriorAgent, NeuralLinearPosteriorAgent and DCBTrainer docs for more details.