Environments

Submodules

genrl.environments.action_wrappers module

class genrl.environments.action_wrappers.ClipAction(env: Union[gym.core.Env, genrl.environments.vec_env.vector_envs.VecEnv])[source]

Bases: gym.core.ActionWrapper

Action Wrapper to clip actions

Parameters:env (object) – The environment whose actions need to be clipped
action(action: numpy.ndarray) → numpy.ndarray[source]
class genrl.environments.action_wrappers.RescaleAction(env: Union[gym.core.Env, genrl.environments.vec_env.vector_envs.VecEnv], low: int, high: int)[source]

Bases: gym.core.ActionWrapper

Action Wrapper to rescale actions

Parameters:
  • env (object) – The environment whose actions need to be rescaled
  • low (int) – Lower limit of action
  • high (int) – Upper limit of action
action(action: numpy.ndarray) → numpy.ndarray[source]
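
A minimal usage sketch for these wrappers, assuming a standard continuous-action Gym environment (the environment id below is only an example):

    import gym
    from genrl.environments.action_wrappers import ClipAction, RescaleAction

    # Example env id; any continuous-action Gym env should work
    env = gym.make("Pendulum-v0")

    # Clip out-of-range actions to the bounds of the env's action space
    env = ClipAction(env)

    # Rescale actions supplied in [-1, 1] to the env's action range
    env = RescaleAction(env, low=-1, high=1)

    state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample())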

genrl.environments.atari_preprocessing module

class genrl.environments.atari_preprocessing.AtariPreprocessing(env: gym.core.Env, frameskip: Union[Tuple, int] = (2, 5), grayscale: bool = True, screen_size: int = 84)[source]

Bases: gym.core.Wrapper

Implementation for Image preprocessing for Gym Atari environments. Implements: 1) Frameskip 2) Grayscale 3) Downsampling to square image

Parameters:
  • env (Gym Environment) – Atari environment
  • frameskip (tuple or int) – Number of steps between actions. E.g. frameskip=4 means one action is taken for every 4 frames. If a tuple such as (2, 5) is given, the frameskip is non-deterministic and a random value is sampled from that range
  • grayscale (boolean) – Whether or not the output should be converted to grayscale
  • screen_size (int) – Size of the output screen (square output)
reset() → numpy.ndarray[source]

Resets state of environment

Returns:Initial state
Return type:NumPy array
step(action: numpy.ndarray) → numpy.ndarray[source]

Step through Atari environment for given action

Parameters:action (NumPy array) – Action taken by agent
Returns:Current state, reward (accumulated over the skipped frames), done, info
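
A short sketch of applying this wrapper on top of a raw Atari env (the env id, and the assumption that the Atari ROMs are installed, are illustrative):

    import gym
    from genrl.environments.atari_preprocessing import AtariPreprocessing

    # Deterministic frameskip of 4, grayscale output, 84x84 frames
    env = AtariPreprocessing(
        gym.make("BreakoutNoFrameskip-v4"), frameskip=4, grayscale=True, screen_size=84
    )

    state = env.reset()                                  # preprocessed initial frame
    state, reward, done, info = env.step(env.action_space.sample())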

genrl.environments.atari_wrappers module

class genrl.environments.atari_wrappers.FireReset(env: gym.core.Env)[source]

Bases: gym.core.Wrapper

Some Atari environments do not actually start until a specific action (the fire action) is taken, so this wrapper takes that action on reset before training begins

Parameters:env (Gym Environment) – Atari environment
reset() → numpy.ndarray[source]

Resets state of environment and takes the fire action so that the episode actually starts

Returns:Initial state
Return type:NumPy array
class genrl.environments.atari_wrappers.NoopReset(env: gym.core.Env, max_noops: int = 30)[source]

Bases: gym.core.Wrapper

Some Atari environments always reset to the same state, so we perform a random number of empty (noop) actions on reset to introduce some stochasticity.

Parameters:
  • env (Gym Environment) – Atari environment
  • max_noops (int) – Maximum number of Noops to be taken
reset() → numpy.ndarray[source]

Resets state of environment. Performs the noop action a random number of times to introduce stochasticity

Returns:Initial state
Return type:NumPy array
step(action: numpy.ndarray) → numpy.ndarray[source]

Step through underlying Atari environment for given action

Parameters:action (NumPy array) – Action taken by agent
Returns:Current state, reward, done, info
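
These wrappers are typically stacked on top of an Atari env; a minimal sketch (the env id is illustrative):

    import gym
    from genrl.environments.atari_wrappers import NoopReset, FireReset

    env = gym.make("BreakoutNoFrameskip-v4")
    env = NoopReset(env, max_noops=30)   # random number of no-ops on reset
    env = FireReset(env)                 # take the FIRE action so the game actually starts

    state = env.reset()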

genrl.environments.base_wrapper module

class genrl.environments.base_wrapper.BaseWrapper(env: Any, batch_size: int = None)[source]

Bases: abc.ABC

Base class for all wrappers

batch_size

The number of batches trained per update

close() → None[source]

Closes environment and performs any other cleanup

Must be overridden by subclasses

render() → None[source]

Render the environment

reset() → None[source]

Resets state of environment

Must be overridden by subclasses

Returns:Initial state
seed(seed: int = None) → None[source]

Set seed for environment

step(action: numpy.ndarray) → None[source]

Step through the environment

Must be overridden by subclasses
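
Since BaseWrapper is abstract, a subclass has to provide at least close(), reset() and step(). A minimal sketch, assuming the constructor signature shown above; the pass-through wrapper and the env-like object it forwards to are hypothetical:

    from typing import Any, Tuple

    import numpy as np

    from genrl.environments.base_wrapper import BaseWrapper


    class PassthroughWrapper(BaseWrapper):
        """Hypothetical wrapper that simply forwards calls to the wrapped env."""

        def __init__(self, env: Any, batch_size: int = None):
            super().__init__(env, batch_size)
            self.env = env

        def reset(self) -> np.ndarray:
            return self.env.reset()

        def step(self, action: np.ndarray) -> Tuple:
            return self.env.step(action)

        def close(self) -> None:
            self.env.close()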

genrl.environments.frame_stack module

class genrl.environments.frame_stack.FrameStack(env: gym.core.Env, framestack: int = 4, compress: bool = True)[source]

Bases: gym.core.Wrapper

Wrapper to efficiently stack the last few (4 by default) observations of the agent

Parameters:
  • env (Gym Environment) – Environment to be wrapped
  • framestack (int) – Number of frames to be stacked
  • compress (bool) – True if we want to use LZ4 compression to conserve memory usage
reset() → numpy.ndarray[source]

Resets environment

Returns:Initial state of environment
Return type:NumPy Array
step(action: numpy.ndarray) → numpy.ndarray[source]

Steps through environment

Parameters:action (NumPy Array) – Action taken by agent
Returns:Next state, reward, done, info
Return type:NumPy Array, float, boolean, dict
class genrl.environments.frame_stack.LazyFrames(frames: List[T], compress: bool = False)[source]

Bases: object

Efficient data structure to save each frame only once. Can use LZ4 compression to optimize memory usage.

Parameters:
  • frames (collections.deque) – List of frames that need to be converted to a LazyFrames data structure
  • compress (boolean) – True if we want to use LZ4 compression to conserve memory usage
shape

Returns the dimensions of the stacked frames
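
A sketch of FrameStack in use; LazyFrames is what reset()/step() may hand back when compression is on, and np.asarray is one way to materialise it (the env id and the exact output shape are assumptions):

    import gym
    import numpy as np

    from genrl.environments.atari_preprocessing import AtariPreprocessing
    from genrl.environments.frame_stack import FrameStack

    env = FrameStack(
        AtariPreprocessing(gym.make("PongNoFrameskip-v4")), framestack=4, compress=True
    )

    state = env.reset()          # stacked observation (possibly a LazyFrames object)
    frames = np.asarray(state)   # materialise into a regular NumPy array
    print(frames.shape)          # e.g. 4 stacked 84x84 frames under the defaults above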

genrl.environments.gym_wrapper module

class genrl.environments.gym_wrapper.GymWrapper(env: gym.core.Env)[source]

Bases: gym.core.Wrapper

Wrapper class for all Gym Environments

Parameters:
  • env (Gym Environment) – The Gym environment to be wrapped
  • n_envs (None, int) – Number of environments. None if not vectorised
  • parallel (boolean) – If vectorised, whether environments are run serially or in parallel
action_shape

Shape of the environment’s action space

close() → None[source]

Closes environment

obs_shape

Shape of the environment’s observation space

render(mode: str = 'human') → None[source]

Renders all envs in a tiled format similar to OpenAI Baselines.

Parameters:mode (string) – Can either be ‘human’ or ‘rgb_array’. Displays tiled images in ‘human’ and returns tiled images in ‘rgb_array’
reset() → numpy.ndarray[source]

Resets environment

Returns:Initial state
sample() → numpy.ndarray[source]

Shortcut method to directly sample from environment’s action space

Returns:Random action from action space
Return type:NumPy Array
seed(seed: int = None) → None[source]

Set environment seed

Parameters:seed (int) – Value of seed
step(action: numpy.ndarray) → numpy.ndarray[source]

Steps the env through the given action

Parameters:action (NumPy array) – Action taken by agent
Returns:Next observation, reward, game status and debugging info
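
A minimal sketch of GymWrapper around a standard Gym env (the env id is illustrative):

    import gym
    from genrl.environments.gym_wrapper import GymWrapper

    env = GymWrapper(gym.make("CartPole-v1"))
    env.seed(0)

    state = env.reset()
    action = env.sample()                             # shortcut for sampling the action space
    next_state, reward, done, info = env.step(action)

    env.render()
    env.close()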

genrl.environments.suite module

genrl.environments.suite.AtariEnv(env_id: str, wrapper_list: List[T] = [<class 'genrl.environments.atari_preprocessing.AtariPreprocessing'>, <class 'genrl.environments.atari_wrappers.NoopReset'>, <class 'genrl.environments.atari_wrappers.FireReset'>, <class 'genrl.environments.time_limit.AtariTimeLimit'>, <class 'genrl.environments.frame_stack.FrameStack'>]) → gym.core.Env[source]

Function used by the Trainer class to create and apply wrappers to Atari environments

Parameters:
  • env_id (string) – Environment Name
  • wrapper_list (list or tuple) – List of wrappers to use
Returns:Gym Atari Environment
Return type:object
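
A sketch of AtariEnv with the default wrapper stack, and with a hypothetical trimmed-down wrapper list (the env id is illustrative):

    from genrl.environments.suite import AtariEnv
    from genrl.environments.atari_preprocessing import AtariPreprocessing
    from genrl.environments.frame_stack import FrameStack

    # Default wrapper stack listed in the signature above
    env = AtariEnv("BreakoutNoFrameskip-v4")

    # Hypothetical reduced wrapper list
    env = AtariEnv("BreakoutNoFrameskip-v4", wrapper_list=[AtariPreprocessing, FrameStack])

    state = env.reset()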

genrl.environments.suite.GymEnv(env_id: str) → gym.core.Env[source]

Function used by the Trainer class to create and apply wrappers to regular Gym environments

Parameters:env_id (string) – Environment Name
Returns:Gym Environment
Return type:object
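
A short usage sketch for GymEnv (the env id is illustrative):

    from genrl.environments.suite import GymEnv

    env = GymEnv("CartPole-v1")
    state = env.reset()
    state, reward, done, info = env.step(env.sample())
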
genrl.environments.suite.VectorEnv(env_id: str, n_envs: int = 2, parallel: int = False, env_type: str = 'gym') → genrl.environments.vec_env.vector_envs.VecEnv[source]

Chooses the kind of Vector Environment that is required

Parameters:
  • env_id (string) – Gym environment to be vectorised
  • n_envs (int) – Number of environments
  • parallel (boolean) – True if we want environments to run in parallel in subprocesses, False if we want them to run serially one after the other
  • env_type (string) – Type of environment. Currently, we support [“gym”, “atari”]
Returns:Vector Environment
Return type:object
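
A minimal sketch of creating a vectorised environment (the env id is illustrative):

    from genrl.environments.suite import VectorEnv

    # Two serial copies of CartPole; parallel=True would run them in subprocesses
    envs = VectorEnv("CartPole-v1", n_envs=2, parallel=False, env_type="gym")

    states = envs.reset()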

genrl.environments.time_limit module

class genrl.environments.time_limit.AtariTimeLimit(env, max_episode_len=None)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:the initial observation.
Return type:observation (object)
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:action (object) – an action provided by the agent
Returns:
  • observation (object) – agent’s observation of the current environment
  • reward (float) – amount of reward returned after previous action
  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
Return type:tuple
class genrl.environments.time_limit.TimeLimit(env, max_episode_len=None)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:the initial observation.
Return type:observation (object)
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:action (object) – an action provided by the agent
Returns:
  • observation (object) – agent’s observation of the current environment
  • reward (float) – amount of reward returned after previous action
  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
Return type:tuple
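
A sketch of TimeLimit in use, assuming it terminates episodes once max_episode_len steps have elapsed (as the name and signature suggest); the env id is illustrative:

    import gym
    from genrl.environments.time_limit import TimeLimit

    # Cut episodes off after at most 500 steps
    env = TimeLimit(gym.make("CartPole-v1"), max_episode_len=500)

    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())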

Module contents