concepts.benchmark.blocksworld.blocksworld_env.DenseStackBlockWorldEnv#
- class DenseStackBlockWorldEnv[source]#
- Bases: StackBlockWorldEnv

  Methods

  - close() – Override close in your subclass to perform any necessary cleanup.
  - render([mode]) – Renders the environment.
  - reset() – Reset the environment.
  - reset_nr_blocks(nr_blocks) – Reset the number of blocks.
  - seed([seed]) – Sets the seed for this env's random number generator(s).
  - step(action) – Run one timestep of the environment's dynamics.

  Attributes

  - np_random – Initializes the np_random field if not done already.
  - nr_objects – Get the number of objects in the environment.
  - unwrapped – Completely unwrap this env.
  - world – The current blocksworld.
  - Whether the current episode is over.
  - cached_result – The result of the current episode.
  - The height of the highest block tower.

- __init__(nr_blocks, random_order=False, prob_unchanged=0.0, prob_fall=0.0, np_random=None, seed=None)#
- Initialize the blocksworld environment.

  Parameters:
- nr_blocks (int) – number of blocks. 
- random_order (bool) – randomly permute the indexes of the blocks. This option prevents the models from memorizing the configurations. 
- prob_unchanged (float) – the probability of not changing the state. 
- prob_fall (float) – the probability that the moved block falls to the ground. 
- np_random (RandomState | None) 
- seed (int | None) 
 
 
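The prob_unchanged and prob_fall parameters suggest a stochastic transition rule. The sketch below illustrates one plausible reading of that rule, assuming a parent-array state representation (block i rests on parent[i], with -1 for the ground); the function name and representation are illustrative, not the library's actual implementation.

```python
import random

def apply_move(parent, block, target, prob_unchanged=0.0, prob_fall=0.0, rng=None):
    """Illustrative stochastic blocksworld move (hypothetical helper).

    parent[i] is the block that block i rests on, or -1 for the ground.
    """
    rng = rng or random.Random()
    new_parent = list(parent)
    r = rng.random()
    if r < prob_unchanged:
        pass                         # state left unchanged
    elif r < prob_unchanged + prob_fall:
        new_parent[block] = -1       # the moved block falls to the ground
    else:
        new_parent[block] = target   # normal move: place block on target
    return new_parent

# With both probabilities at 0 the move always succeeds:
assert apply_move([-1, 0, 1], 2, 0) == [-1, 0, 0]
```

With prob_unchanged=1.0 every move is a no-op, which is the degenerate case the parameter's documentation describes.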
 - __new__(**kwargs)#
 - close()#
- Override close in your subclass to perform any necessary cleanup.

  Environments will automatically close() themselves when garbage collected or when the program exits. 
 - get_current_state()#
 - render(mode='human')#
- Renders the environment.

  The set of supported modes varies per environment. (And some third-party environments may not support rendering at all.) By convention, if mode is:

  - human: render to the current display or terminal and return nothing. Usually for human consumption.
  - rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
  - ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
 - Note

   Make sure that your class's metadata 'render_modes' key includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method. 
 - Parameters:
- mode (str) – the mode to render with 
 - Example:

       class MyEnv(Env):
           metadata = {'render_modes': ['human', 'rgb_array']}

           def render(self, mode='human'):
               if mode == 'rgb_array':
                   return np.array(...)  # return RGB frame suitable for video
               elif mode == 'human':
                   ...  # pop up a window and render
               else:
                   super(MyEnv, self).render(mode=mode)  # just raise an exception
 
 
 
 - reset()[source]#
- Reset the environment. This function first generates a random blocksworld, and then returns the current state. 
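reset() first generates a random blocksworld. One simple way to sample a random stacking configuration is sketched below, again assuming a hypothetical parent-array representation; this is an illustration of the idea, not the library's actual generator.

```python
import random

def random_blocksworld(nr_blocks, rng=None):
    """Sample a random stacking: each block is placed, in turn, either on the
    ground (-1) or on top of a currently clear block. Illustrative only."""
    rng = rng or random.Random()
    parent = []
    clear = []                       # blocks with nothing on top of them
    for i in range(nr_blocks):
        if not clear or rng.random() < 0.5:
            parent.append(-1)        # start a new tower on the ground
        else:
            base = rng.choice(clear)
            parent.append(base)
            clear.remove(base)       # base is no longer clear
        clear.append(i)              # the newly placed block is clear
    return parent

state = random_blocksworld(5, random.Random(0))
assert len(state) == 5 and state[0] == -1
```

Every block either starts a tower or lands on a clear block, so the result is always a valid forest of towers.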
 - seed(seed=None)#
- Sets the seed for this env's random number generator(s).

  Note

  Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren't accidental correlations between multiple generators.

  Returns:
- Returns the list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer should pass to 'seed'. Often, the main seed equals the provided 'seed', but this won't be true if seed=None, for example. 
 
- Return type:
- list<bigint> 
 
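The seeding contract above (a list of seeds with the main seed first, derived fresh when seed=None) can be sketched as follows; make_seed is a hypothetical helper, not part of the library.

```python
import random

def make_seed(seed=None):
    """Sketch of the seeding convention described above: derive a concrete
    main seed when none is given, and report it first in the seed list."""
    if seed is None:
        # seed=None: pick a fresh main seed so a reproducer can still replay
        seed = random.SystemRandom().randrange(2**31)
    rng = random.Random(seed)
    return rng, [seed]

rng, seeds = make_seed(42)
assert seeds[0] == 42           # main seed comes first
rng2, seeds2 = make_seed()      # seed=None: a concrete main seed is still reported
assert isinstance(seeds2[0], int)
```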
 - step(action)#
- Run one timestep of the environment's dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset this environment's state.

  Accepts an action and returns a tuple (observation, reward, done, info).

  Parameters:
- action – an action provided by the agent 
- Returns:
- observation: agent's observation of the current environment
- reward: amount of reward returned after the previous action
- done: whether the episode has ended, in which case further step() calls will return undefined results
- info: contains auxiliary diagnostic information (helpful for debugging, and sometimes learning) 
- Return type:
- tuple (observation, reward, done, info) 
 
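The step()/reset() contract above can be exercised with the standard interaction loop. Since the real environment may not be installed, the sketch uses a minimal stand-in class that follows the same (observation, reward, done, info) protocol; ToyEnv is hypothetical.

```python
class ToyEnv:
    """Minimal stand-in for the gym-style step()/reset() contract described
    above; not the real DenseStackBlockWorldEnv."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t                      # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}       # (observation, reward, done, info)

env = ToyEnv()
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, reward, done, info = env.step(0)  # any action
    total += reward
assert total == 5.0                        # one unit of reward per step
```

Once done is True, the caller must invoke reset() before stepping again, exactly as the documentation requires.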
 - property action_space#
 - cached_result: Tuple[float, bool] | None#
- The result of the current episode. It is a tuple of (reward, is_over). 
 - metadata = {'render_modes': []}#
 - property np_random: RandomState#
- Initializes the np_random field if not done already. 
 - property nr_objects#
- Get the number of objects in the environment. 
 - property observation_space#
 - reward_range = (-inf, inf)#
 - spec = None#
 - property unwrapped: Env#
- Completely unwrap this env.

  Returns:
- The base non-wrapped gym.Env instance 
- Return type:
- gym.Env 
 
 - world: BlockWorld#
- The current blocksworld.