Q-Learning

genrl.agents.classical.qlearning.qlearning module

class genrl.agents.classical.qlearning.qlearning.QLearning(env: gym.core.Env, epsilon: float = 0.9, gamma: float = 0.95, lr: float = 0.01)

Bases: object

Q-Learning Algorithm.

Paper: https://link.springer.com/article/10.1007/BF00992698

env

Environment with which agent interacts.

Type: gym.Env

epsilon

Exploration coefficient for epsilon-greedy action selection.

Type: float, optional

gamma

Discount factor.

Type: float, optional

lr

Learning rate (step size) for the Q-table update.

Type: float, optional

get_action(state: numpy.ndarray, explore: bool = True) → numpy.ndarray

Epsilon-greedy selection of an action in the explore phase.

Parameters:
  • state (np.ndarray) – Environment state.
  • explore (bool, optional) – True to enable exploration, False to disable it.
Returns: The selected action.

Return type: np.ndarray
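
As an illustration of epsilon-greedy selection, here is a minimal standard-library sketch. It is not the library's implementation: the function name epsilon_greedy, the q_row argument (one row of a Q-table for the current state) and the rng argument are hypothetical, and it assumes the textbook convention of exploring with probability epsilon, which may differ from how this class interprets its epsilon value.

```python
import random
from typing import List, Optional


def epsilon_greedy(
    q_row: List[float],
    epsilon: float,
    explore: bool = True,
    rng: Optional[random.Random] = None,
) -> int:
    """Return an action index chosen from one row of a Q-table.

    Assumption: a random action is taken with probability `epsilon`
    (textbook convention); otherwise the greedy action is returned.
    """
    if rng is None:
        rng = random.Random()
    if explore and rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: index of the largest Q-value (first index wins on ties).
    return max(range(len(q_row)), key=q_row.__getitem__)
```

With explore=False the random branch is never taken, so the sketch always returns the greedy action for the given row.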

get_hyperparams() → Dict[str, Any]

update(transition: Tuple) → None

Update the Q table.

Parameters: transition (Tuple) – Transition 4-tuple used to update the Q-table, in the form (state, action, reward, next_state).
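
The standard tabular Q-learning update on such a 4-tuple can be sketched as follows. This is a hedged illustration rather than the library's code: the function name q_update and the list-of-lists q_table are hypothetical, and states and actions are assumed to be integer indices. Because the tuple carries no terminal flag, the sketch always bootstraps from next_state.

```python
from typing import List, Tuple


def q_update(
    q_table: List[List[float]],
    transition: Tuple[int, int, float, int],
    gamma: float = 0.95,
    lr: float = 0.01,
) -> None:
    """Apply one Q-learning update to `q_table` in place.

    Assumption: q_table[s][a] holds the value estimate for taking
    action `a` in state `s`, with integer state and action indices.
    """
    state, action, reward, next_state = transition
    # TD target: immediate reward plus discounted best next-state value.
    td_target = reward + gamma * max(q_table[next_state])
    # Move the current estimate a fraction `lr` toward the target.
    q_table[state][action] += lr * (td_target - q_table[state][action])
```

Only the entry for the visited (state, action) pair changes; every other entry of the table is left as it was.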