Q-Learning

genrl.agents.classical.qlearning.qlearning module

class genrl.agents.classical.qlearning.qlearning.QLearning(env: gym.core.Env, epsilon: float = 0.9, gamma: float = 0.95, lr: float = 0.01)

Bases: object

Q-Learning Algorithm.

Paper: https://link.springer.com/article/10.1007/BF00992698

env

Environment with which agent interacts.

Type:	gym.Env

epsilon

Exploration coefficient for epsilon-greedy action selection.

Type:	float, optional

gamma

Discount factor.

Type:	float, optional

lr

Learning rate (step size) for the Q-table update.

Type:	float, optional

get_action(state: numpy.ndarray, explore: bool = True) → numpy.ndarray

Epsilon-greedy selection of an action in the explore phase.

Parameters:	state (np.ndarray) – Environment state.
	explore (bool, optional) – True to use epsilon-greedy exploration, False otherwise.
Returns:	action.
Return type:	np.ndarray
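The epsilon-greedy rule described for get_action can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `epsilon_greedy`, the use of a plain list for one row of the Q-table, and the convention that `epsilon` is the probability of taking a random action are all assumptions made for the example. The page does not state which convention the class uses.

```python
import random


def epsilon_greedy(q_row, epsilon, explore=True, rng=random):
    """Pick an action index from one row of a Q-table (hypothetical helper).

    q_row   -- list of Q-values, one per action, for the current state
    epsilon -- probability of taking a random action (assumed convention)
    explore -- if False, always act greedily
    """
    if explore and rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_row))
    # Exploit: choose the action with the highest estimated value
    # (ties resolve to the lowest index).
    return max(range(len(q_row)), key=q_row.__getitem__)


# With exploration switched off, the greedy action is always returned.
greedy_action = epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.9, explore=False)
```

Passing `explore=False` is the usual choice at evaluation time, when the learned values should be used without added randomness.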

get_hyperparams() → Dict[str, Any]

update(transition: Tuple) → None

Update the Q-table.

Parameters:	transition (Tuple) – Transition 4-tuple used to update the Q-table, in the form (state, action, reward, next_state).
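For reference, the standard tabular Q-learning update from the cited paper moves Q(state, action) toward reward + gamma * max over next actions of Q(next_state, ·), scaled by lr. The sketch below applies that rule to a (state, action, reward, next_state) tuple. The function name `q_update` and the nested-list Q-table are assumptions for illustration, and the library's own update may differ in detail. Because the 4-tuple carries no terminal flag, the sketch always bootstraps from next_state.

```python
def q_update(Q, transition, gamma=0.95, lr=0.01):
    """Apply one tabular Q-learning step in place (hypothetical helper).

    Q          -- nested list indexed as Q[state][action]
    transition -- (state, action, reward, next_state)
    """
    state, action, reward, next_state = transition
    # Bootstrapped target: immediate reward plus the discounted value
    # of the best action available in the next state.
    td_target = reward + gamma * max(Q[next_state])
    # Move the current estimate a fraction lr of the way to the target.
    Q[state][action] += lr * (td_target - Q[state][action])
    return Q


# Two states, two actions; gamma and lr chosen so the arithmetic is easy:
# target = 1.0 + 0.5 * 2.0 = 2.0, new value = 0.0 + 0.5 * (2.0 - 0.0) = 1.0
Q = [[0.0, 0.0], [1.0, 2.0]]
q_update(Q, (0, 1, 1.0, 1), gamma=0.5, lr=0.5)
```

Only the entry for the visited (state, action) pair changes; every other entry of the table is left untouched.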