SARSA

genrl.agents.classical.sarsa.sarsa module

class genrl.agents.classical.sarsa.sarsa.SARSA(env: gym.core.Env, epsilon: float = 0.9, lmbda: float = 0.9, gamma: float = 0.95, lr: float = 0.01)[source]

Bases: object

SARSA Algorithm.

Paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.2539&rep=rep1&type=pdf

env

Environment with which the agent interacts.

Type: gym.Env
epsilon

Exploration coefficient for epsilon-greedy exploration.

Type: float, optional
lmbda

Trace decay coefficient (lambda) for the eligibility traces.

Type: float, optional
gamma

Discount factor.

Type: float, optional
lr

Learning rate for the Q-table update.

Type: float, optional
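
A minimal usage sketch, assuming a discrete-observation, discrete-action gym environment; FrozenLake-v0, the episode loop, and the step count here are illustrative choices, not prescribed by the module:

    import gym

    from genrl.agents.classical.sarsa.sarsa import SARSA

    # FrozenLake has discrete states and actions, so it suits a tabular Q-table.
    env = gym.make("FrozenLake-v0")
    agent = SARSA(env, epsilon=0.9, lmbda=0.9, gamma=0.95, lr=0.01)

    state = env.reset()
    for _ in range(1000):
        action = agent.get_action(state, explore=True)
        next_state, reward, done, _ = env.step(action)
        # update expects the 4-tuple documented below.
        agent.update((state, action, reward, next_state))
        state = env.reset() if done else next_state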
get_action(state: numpy.ndarray, explore: bool = True) → numpy.ndarray[source]

Epsilon-greedy selection of the action in the explore phase.

Parameters:
  • state (np.ndarray) – Environment state.
  • explore (bool, optional) – True if exploration is required. False if not.
Returns:

The selected action.

Return type:

np.ndarray
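
A sketch of the epsilon-greedy rule this method documents, written as a standalone function; the comparison direction and the Q-table layout (shape (n_states, n_actions)) are assumptions rather than the module's confirmed internals:

    import numpy as np

    def epsilon_greedy(Q, state, epsilon, action_space, explore=True):
        # Explore: with probability epsilon, sample a uniformly random action.
        if explore and np.random.uniform() <= epsilon:
            return action_space.sample()
        # Exploit: take the greedy action from this state's row of the Q-table.
        return np.argmax(Q[state, :])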

update(transition: Tuple) → None[source]

Update the Q-table and eligibility trace (e) values.

Parameters: transition (Tuple) – 4-tuple used to update the Q-table, in the form (state, action, reward, next_state)
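
The reference to e values indicates SARSA(λ)-style eligibility traces, matching the lmbda argument in the constructor. A minimal sketch of such an update, assuming dense Q and e tables of shape (n_states, n_actions), accumulating traces, and an on-policy choice of the next action; the names and trace variant are illustrative, not confirmed internals:

    import numpy as np

    def sarsa_lambda_update(Q, e, transition, next_action, gamma, lmbda, lr):
        state, action, reward, next_state = transition
        # TD error against the on-policy SARSA target r + gamma * Q(s', a').
        delta = reward + gamma * Q[next_state, next_action] - Q[state, action]
        # Accumulate the eligibility trace for the visited (state, action) pair.
        e[state, action] += 1.0
        # Propagate the TD error to every traced pair, then decay the traces.
        Q += lr * delta * e
        e *= gamma * lmbda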