SARSA

genrl.agents.classical.sarsa.sarsa module

class genrl.agents.classical.sarsa.sarsa.SARSA(env: gym.core.Env, epsilon: float = 0.9, lmbda: float = 0.9, gamma: float = 0.95, lr: float = 0.01)[source]

Bases: object

SARSA Algorithm.

Paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.2539&rep=rep1&type=pdf

env

Environment with which the agent interacts.

Type: gym.Env
epsilon

Exploration coefficient for epsilon-greedy exploration.

Type: float, optional
lmbda

Trace decay coefficient (lambda) for the eligibility traces.

Type: float, optional
gamma

Discount factor.

Type: float, optional
lr

Learning rate for the Q-table update.

Type: float, optional
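
A minimal usage sketch, assuming a discrete-observation, discrete-action gym environment; FrozenLake-v0, the episode loop, and the step count here are illustrative choices, not prescribed by the module:

    import gym

    from genrl.agents.classical.sarsa.sarsa import SARSA

    # FrozenLake has discrete states and actions, so it suits a tabular Q-table.
    env = gym.make("FrozenLake-v0")
    agent = SARSA(env, epsilon=0.9, lmbda=0.9, gamma=0.95, lr=0.01)

    state = env.reset()
    for _ in range(1000):
        action = agent.get_action(state, explore=True)
        next_state, reward, done, _ = env.step(action)
        # update expects the 4-tuple documented below.
        agent.update((state, action, reward, next_state))
        state = env.reset() if done else next_state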
get_action(state: numpy.ndarray, explore: bool = True) → numpy.ndarray[source]

Epsilon-greedy selection of the action in the explore phase.

Parameters:
  • state (np.ndarray) – Environment state.
  • explore (bool, optional) – True if exploration is required. False if not.
Returns:

The selected action.

Return type:

np.ndarray
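
A sketch of the epsilon-greedy rule this method documents, written as a standalone function; the comparison direction and the Q-table layout (shape (n_states, n_actions)) are assumptions rather than the module's confirmed internals:

    import numpy as np

    def epsilon_greedy(Q, state, epsilon, action_space, explore=True):
        # Explore: with probability epsilon, sample a uniformly random action.
        if explore and np.random.uniform() <= epsilon:
            return action_space.sample()
        # Exploit: take the greedy action from this state's row of the Q-table.
        return np.argmax(Q[state, :])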

update(transition: Tuple) → None[source]

Update the Q-table and eligibility trace (e) values.

Parameters: transition (Tuple) – 4-tuple used to update the Q-table, in the form (state, action, reward, next_state)
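
The reference to e values indicates SARSA(λ)-style eligibility traces, matching the lmbda argument in the constructor. A minimal sketch of such an update, assuming dense Q and e tables of shape (n_states, n_actions), accumulating traces, and an on-policy choice of the next action; the names and trace variant are illustrative, not confirmed internals:

    import numpy as np

    def sarsa_lambda_update(Q, e, transition, next_action, gamma, lmbda, lr):
        state, action, reward, next_state = transition
        # TD error against the on-policy SARSA target r + gamma * Q(s', a').
        delta = reward + gamma * Q[next_state, next_action] - Q[state, action]
        # Accumulate the eligibility trace for the visited (state, action) pair.
        e[state, action] += 1.0
        # Propagate the TD error to every traced pair, then decay the traces.
        Q += lr * delta * e
        e *= gamma * lmbda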