# Example Learning Environments

The Unity ML-Agents toolkit contains an expanding set of example environments which demonstrate various features of the platform. Environments are located in `UnitySDK/Assets/ML-Agents/Examples` and summarized below. Additionally, our [first ML Challenge](https://connect.unity.com/challenges/ml-agents-1) contains environments created by the community.

This page only overviews the example environments we provide. To learn more about how to design and build your own environments, see our [Making a New Learning Environment](Learning-Environment-Create-New.md) page.

Note: Environment scenes marked as _optional_ do not have accompanying pre-trained model files, and are designed to serve as challenges for researchers.

If you would like to contribute environments, please see our [contribution guidelines](../CONTRIBUTING.md) page.

## Basic

![Basic](images/basic.png)

* Set-up: A linear movement task where the agent must move left or right to rewarding states.
* Goal: Move to the most rewarding state.
* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function:
  * +0.1 for arriving at suboptimal state.
  * +1.0 for arriving at optimal state.
* Brains: One Brain with the following observation/action space.
  * Vector Observation space: One variable corresponding to current state.
  * Vector Action space: (Discrete) Two possible actions (Move left, move right).
  * Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward: 0.94

## [3DBall: 3D Balance Ball](https://youtu.be/dheeCO29-EI)

![3D Balance Ball](images/balance.png)

* Set-up: A balance-ball task, where the agent balances the ball on its head.
* Goal: The agent must balance the ball on its head for as long as possible.
* Agents: The environment contains 12 agents of the same kind, all linked to a single Brain.
* Agent Reward Function (sketched after this section):
  * +0.1 for every step the ball remains on its head.
  * -1.0 if the ball falls off.
* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 8 variables corresponding to rotation of the agent cube, and position and velocity of ball.
  * Vector Observation space (Hard Version): 5 variables corresponding to rotation of the agent cube and position of ball.
  * Vector Action space: (Continuous) Size of 2, with one value corresponding to X-rotation, and the other to Z-rotation.
  * Visual Observations: None.
* Reset Parameters: Three
  * scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
    * Default: 1
    * Recommended Minimum: 0.2
    * Recommended Maximum: 5
  * gravity: Magnitude of gravity
    * Default: 9.81
    * Recommended Minimum: 4
    * Recommended Maximum: 105
  * mass: Specifies mass of the ball
    * Default: 1
    * Recommended Minimum: 0.1
    * Recommended Maximum: 20
* Benchmark Mean Reward: 100
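The per-step reward above is simple enough to state directly in code. The following is a minimal Python sketch of that reward and termination logic, not the C# agent code shipped with the scene; `ball_on_platform` is a hypothetical flag standing in for the scene's own fall check.

```python
# Illustrative sketch of the 3DBall per-step reward described above (not the
# shipped C# agent code). `ball_on_platform` is a hypothetical flag standing in
# for the scene's own check that the ball has not dropped off the agent cube.
def balance_ball_step(ball_on_platform):
    """Return (reward, done) for one decision step."""
    if ball_on_platform:
        return 0.1, False   # ball is still balanced: small positive reward
    return -1.0, True       # ball fell off: penalty and the episode ends
```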
## [GridWorld](https://youtu.be/gu8HE9WKEVI)

![GridWorld](images/gridworld.png)

* Set-up: A version of the classic grid-world task. Scene contains agent, goal, and obstacles.
* Goal: The agent must navigate the grid to the goal while avoiding the obstacles.
* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function:
  * -0.01 for every step.
  * +1.0 if the agent navigates to the goal position of the grid (episode ends).
  * -1.0 if the agent navigates to an obstacle (episode ends).
* Brains: One Brain with the following observation/action space.
  * Vector Observation space: None
  * Vector Action space: (Discrete) Size of 4, corresponding to movement in cardinal directions. Note that for this environment, [action masking](Learning-Environment-Design-Agents.md#masking-discrete-actions) is turned on by default (this option can be toggled using the `Mask Actions` checkbox within the `trueAgent` GameObject). The trained model file provided was generated with action masking turned on.
  * Visual Observations: One corresponding to top-down view of GridWorld.
* Reset Parameters: Three, corresponding to grid size, number of obstacles, and number of goals.
* Benchmark Mean Reward: 0.8

## [Tennis](https://youtu.be/RDaIh7JX6RI)

![Tennis](images/tennis.png)

* Set-up: Two-player game where agents control rackets to bounce ball over a net.
* Goal: The agents must bounce ball between one another while not dropping or sending ball out of bounds.
* Agents: The environment contains two agents linked to a single Brain named TennisBrain. After training you can attach another Brain named MyBrain to one of the agents to play against your trained model.
* Agent Reward Function (independent):
  * +0.1 To the agent when hitting the ball over the net.
  * -0.1 To the agent who lets the ball hit the ground, or hits the ball out of bounds.
* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 8 variables corresponding to position and velocity of ball and racket.
  * Vector Action space: (Continuous) Size of 2, corresponding to movement toward net or away from net, and jumping.
  * Visual Observations: None
* Reset Parameters: Three
  * angle: Angle of the racket from the vertical (Y) axis.
    * Default: 55
    * Recommended Minimum: 35
    * Recommended Maximum: 65
  * gravity: Magnitude of gravity
    * Default: 9.81
    * Recommended Minimum: 6
    * Recommended Maximum: 20
  * scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
    * Default: 1
    * Recommended Minimum: 0.2
    * Recommended Maximum: 5
* Benchmark Mean Reward: 2.5
* Optional Imitation Learning scene: `TennisIL`.

## [Push Block](https://youtu.be/jKdw216ZgoE)

![Push](images/push.png)

* Set-up: A platforming environment where the agent can push a block around.
* Goal: The agent must push the block to the goal.
* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function:
  * -0.0025 for every step.
  * +1.0 if the block touches the goal.
* Brains: One Brain with the following observation/action space.
  * Vector Observation space: (Continuous) 70 variables corresponding to 14 ray-casts each detecting one of three possible objects (wall, goal, or block). A sketch of how such ray-cast observations pack into a flat vector follows this section.
  * Vector Action space: (Discrete) Size of 6, corresponding to turn clockwise and counterclockwise and move along four different face directions.
  * Visual Observations (Optional): One first-person camera. Use `VisualPushBlock` scene. __The visual observation version of this environment does not train with the provided default training parameters.__
* Reset Parameters: Four
  * block_scale: Scale of the block along the x and z dimensions
    * Default: 2
    * Recommended Minimum: 0.5
    * Recommended Maximum: 4
  * dynamic_friction: Coefficient of friction for the ground material acting on moving objects
    * Default: 0
    * Recommended Minimum: 0
    * Recommended Maximum: 1
  * static_friction: Coefficient of friction for the ground material acting on stationary objects
    * Default: 0
    * Recommended Minimum: 0
    * Recommended Maximum: 1
  * block_drag: Effect of air resistance on block
    * Default: 0.5
    * Recommended Minimum: 0
    * Recommended Maximum: 2000
* Benchmark Mean Reward: 4.5
* Optional Imitation Learning scene: `PushBlockIL`.
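Ray-cast observations like the 70-variable Push Block vector above are a flat concatenation of per-ray entries. The Python sketch below shows one plausible packing: 14 rays with 5 entries each (a one-hot over the three detectable objects, a "nothing hit" flag, and a normalized hit distance). The per-ray layout is an assumption for illustration, not the exact ordering produced by the scene's C# code.

```python
# Hypothetical illustration of packing 14 ray-casts into a 70-float observation.
# The per-ray layout (3-way one-hot + nothing-hit flag + hit fraction) is an
# assumption; the real ordering is defined by the scene's ray-perception C# code.
DETECTABLE = ["wall", "goal", "block"]

def encode_ray(hit_object, hit_fraction):
    one_hot = [1.0 if hit_object == name else 0.0 for name in DETECTABLE]
    nothing_hit = 1.0 if hit_object is None else 0.0
    return one_hot + [nothing_hit, hit_fraction]

# 14 rays x 5 entries per ray = 70 observation values.
rays = [("block", 0.4)] + [(None, 1.0)] * 13   # toy ray-cast results
observation = [v for hit, frac in rays for v in encode_ray(hit, frac)]
assert len(observation) == 70
```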
## [Wall Jump](https://youtu.be/NITLug2DIWQ)

![Wall](images/wall.png)

* Set-up: A platforming environment where the agent can jump over a wall.
* Goal: The agent must use the block to scale the wall and reach the goal.
* Agents: The environment contains one agent linked to two different Brains. The Brain the agent is linked to changes depending on the height of the wall.
* Agent Reward Function:
  * -0.0005 for every step.
  * +1.0 if the agent touches the goal.
  * -1.0 if the agent falls off the platform.
* Brains: Two Brains, each with the following observation/action space.
  * Vector Observation space: Size of 74, corresponding to 14 ray casts each detecting 4 possible objects, plus the global position of the agent and whether or not the agent is grounded.
  * Vector Action space: (Discrete) 4 Branches (see the sketch after this section):
    * Forward Motion (3 possible actions: Forward, Backwards, No Action)
    * Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
    * Side Motion (3 possible actions: Left, Right, No Action)
    * Jump (2 possible actions: Jump, No Action)
  * Visual Observations: None
* Reset Parameters: Four
* Benchmark Mean Reward (Big & Small Wall Brain): 0.8
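A branched discrete action space like the one above means the agent picks one value per branch every step, rather than a single choice from one flat list. The sketch below illustrates this with the branch sizes listed above; the branch names and ordering come from that list, but the sampling code itself is only an illustration.

```python
import random

# Branch sizes for the Wall Jump action space described above:
# forward motion, rotation, side motion, jump.
BRANCH_SIZES = [3, 3, 3, 2]

def sample_action():
    """Pick one option per branch; the agent acts on all four at once."""
    return [random.randrange(size) for size in BRANCH_SIZES]

action = sample_action()  # e.g. [1, 0, 2, 1] -> forward=1, rotate=0, side=2, jump=1
# A single flat action space covering the same choices would need 3*3*3*2 = 54 actions.
```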
## [Reacher](https://youtu.be/2N9EoF6pQyE)

![Reacher](images/reacher.png)

* Set-up: Double-jointed arm which can move to target locations.
* Goal: Each agent must move its hand to the goal location, and keep it there.
* Agents: The environment contains 10 agents linked to a single Brain.
* Agent Reward Function (independent):
  * +0.1 Each step the agent's hand is in the goal location.
* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 26 variables corresponding to position, rotation, velocity, and angular velocities of the two arm Rigidbodies.
  * Vector Action space: (Continuous) Size of 4, corresponding to torque applicable to two joints.
  * Visual Observations: None.
* Reset Parameters: Five
  * goal_size: radius of the goal zone
    * Default: 5
    * Recommended Minimum: 1
    * Recommended Maximum: 10
  * goal_speed: speed of the goal zone around the arm (in radians)
    * Default: 1
    * Recommended Minimum: 0.2
    * Recommended Maximum: 4
  * gravity
    * Default: 9.81
    * Recommended Minimum: 4
    * Recommended Maximum: 20
  * deviation: Magnitude of sinusoidal (cosine) deviation of the goal along the vertical dimension
    * Default: 0
    * Recommended Minimum: 0
    * Recommended Maximum: 5
  * deviation_freq: Frequency of the cosine deviation of the goal along the vertical dimension
    * Default: 0
    * Recommended Minimum: 0
    * Recommended Maximum: 3
* Benchmark Mean Reward: 30

## [Crawler](https://youtu.be/ftLliaeooYI)

![Crawler](images/crawler.png)

* Set-up: A creature with 4 arms and 4 forearms.
* Goal: The agents must move their bodies toward the goal direction without falling.
  * `CrawlerStaticTarget` - Goal direction is always forward.
  * `CrawlerDynamicTarget` - Goal direction is randomized.
* Agents: The environment contains 3 agents linked to a single Brain.
* Agent Reward Function (independent; sketched after this section):
  * +0.03 times body velocity in the goal direction.
  * +0.01 times body direction alignment with goal direction.
* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 117 variables corresponding to position, rotation, velocity, and angular velocities of each limb plus the acceleration and angular acceleration of the body.
  * Vector Action space: (Continuous) Size of 20, corresponding to target rotations for joints.
  * Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward for `CrawlerStaticTarget`: 2000
* Benchmark Mean Reward for `CrawlerDynamicTarget`: 400
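The two Crawler reward terms above reward speed toward the goal and facing the goal. Below is a rough Python sketch of how such a shaped reward could be computed from the body's velocity and facing direction; the dot-product formulation is an assumption for illustration, not the exact C# implementation in the scene.

```python
import numpy as np

def crawler_reward(body_velocity, body_forward, goal_direction):
    """Shaped reward roughly matching the two terms listed above.

    All arguments are 3-vectors; body_forward and goal_direction are unit vectors.
    """
    speed_toward_goal = np.dot(body_velocity, goal_direction)  # m/s along the goal axis
    facing_alignment = np.dot(body_forward, goal_direction)    # 1.0 = facing the goal
    return 0.03 * speed_toward_goal + 0.01 * facing_alignment

# Moving at 2 m/s straight toward the goal while facing it: 0.03*2 + 0.01*1 = 0.07.
print(crawler_reward(np.array([0.0, 0.0, 2.0]),
                     np.array([0.0, 0.0, 1.0]),
                     np.array([0.0, 0.0, 1.0])))
```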
## [Food Collector](https://youtu.be/heVMs3t9qSk)

![Collector](images/foodCollector.png)

* Set-up: A multi-agent environment where agents compete to collect food.
* Goal: The agents must learn to collect as many green food spheres as possible while avoiding red spheres.
* Agents: The environment contains 5 agents linked to a single Brain.
* Agent Reward Function (independent):
  * +1 for interaction with green spheres
  * -1 for interaction with red spheres
* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 53 corresponding to velocity of agent (2), whether agent is frozen and/or shot its laser (2), plus ray-based perception of objects around agent's forward direction (49; 7 raycast angles with 7 measurements for each).
  * Vector Action space: (Discrete) 4 Branches:
    * Forward Motion (3 possible actions: Forward, Backwards, No Action)
    * Side Motion (3 possible actions: Left, Right, No Action)
    * Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
    * Laser (2 possible actions: Laser, No Action)
  * Visual Observations (Optional): First-person camera per-agent. Use `VisualFoodCollector` scene. __The visual observation version of this environment does not train with the provided default training parameters.__
* Reset Parameters: Two
  * laser_length: Length of the laser used by the agent
    * Default: 1
    * Recommended Minimum: 0.2
    * Recommended Maximum: 7
  * agent_scale: Specifies the scale of the agent in the 3 dimensions (equal across the three dimensions)
    * Default: 1
    * Recommended Minimum: 0.5
    * Recommended Maximum: 5
* Benchmark Mean Reward: 10
* Optional Imitation Learning scene: `FoodCollectorIL`.

## [Hallway](https://youtu.be/53GyfpPQRUQ)

![Hallway](images/hallway.png)

* Set-up: Environment where the agent needs to find information in a room, remember it, and use it to move to the correct goal.
* Goal: Move to the goal which corresponds to the color of the block in the room.
* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function (independent):
  * +1 For moving to correct goal.
  * -0.1 For moving to incorrect goal.
  * -0.0003 Existential penalty.
* Brains: One Brain with the following observation/action space:
  * Vector Observation space: 30 corresponding to local ray-casts detecting objects, goals, and walls.
  * Vector Action space: (Discrete) 1 Branch, 4 actions corresponding to agent rotation and forward/backward movement.
  * Visual Observations (Optional): First-person view for the agent. Use `VisualHallway` scene. __The visual observation version of this environment does not train with the provided default training parameters.__
* Reset Parameters: None
* Benchmark Mean Reward: 0.7
* To speed up training, you can enable curiosity by adding `use_curiosity: true` in `config/trainer_config.yaml`.
* Optional Imitation Learning scene: `HallwayIL`.

## [Bouncer](https://youtu.be/Tkv-c-b1b2I)

![Bouncer](images/bouncer.png)

* Set-up: Environment where the agent needs on-demand decision making. The agent must decide how to perform its next bounce only when it touches the ground.
* Goal: Catch the floating green cube. The agent has only a limited number of jumps.
* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function (independent):
  * +1 For catching the green cube.
  * -1 For bouncing out of bounds.
  * -0.05 Times the action squared. Energy expenditure penalty.
* Brains: One Brain with the following observation/action space:
  * Vector Observation space: 6 corresponding to local position of agent and green cube.
  * Vector Action space: (Continuous) 3 corresponding to agent force applied for the jump.
  * Visual Observations: None
* Reset Parameters: Two
  * target_scale: The scale of the green cube in the 3 dimensions
    * Default: 150
    * Recommended Minimum: 50
    * Recommended Maximum: 250
* Benchmark Mean Reward: 10

## [Soccer Twos](https://youtu.be/Hg3nmYD3DjQ)

![SoccerTwos](images/soccer.png)

* Set-up: Environment where four agents compete in a 2 vs 2 toy soccer game.
* Goal:
  * Striker: Get the ball into the opponent's goal.
  * Goalie: Prevent the ball from entering its own goal.
* Agents: The environment contains four agents, with two linked to one Brain (strikers) and two linked to another (goalies).
* Agent Reward Function (dependent; sketched after this section):
  * Striker:
    * +1 When ball enters opponent's goal.
    * -0.1 When ball enters own team's goal.
    * -0.001 Existential penalty.
  * Goalie:
    * -1 When ball enters team's goal.
    * +0.1 When ball enters opponent's goal.
    * +0.001 Existential bonus.
* Brains: Two Brains with the following observation/action space:
  * Vector Observation space: 112 corresponding to 14 local ray casts, each detecting 7 possible object types, along with the object's distance. Perception is in 180 degree view from front of agent.
  * Vector Action space: (Discrete) One Branch
    * Striker: 6 actions corresponding to forward, backward, sideways movement, as well as rotation.
    * Goalie: 4 actions corresponding to forward, backward, sideways movement.
  * Visual Observations: None
* Reset Parameters: Two
  * ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
    * Default: 7.5
    * Recommended Minimum: 4
    * Recommended Maximum: 10
  * gravity: Magnitude of the gravity
    * Default: 9.81
    * Recommended Minimum: 6
    * Recommended Maximum: 20
* Benchmark Mean Reward (Striker & Goalie Brain): 0 (the means will be inverse of each other and criss-cross during training) __Note that our trainer is currently unable to consistently train this environment__
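The striker and goalie rewards above mirror each other, which is why the two Brains' mean rewards roughly sum to zero during training. The snippet below is an illustrative Python sketch of that reward assignment for a single goal event; the `scored_by_own_team` flag is made up for the example and is not taken from the scene's code.

```python
# Illustrative sketch of the per-event Soccer Twos rewards listed above.
# `scored_by_own_team` is a hypothetical flag: True if the agent's team scored.
def striker_reward(scored_by_own_team):
    return 1.0 if scored_by_own_team else -0.1

def goalie_reward(scored_by_own_team):
    return 0.1 if scored_by_own_team else -1.0

# On top of these, every step adds -0.001 to each striker (existential penalty)
# and +0.001 to each goalie (existential bonus), so the two Brains' mean
# rewards tend to be mirror images of each other during training.
```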
## Walker

![Walker](images/walker.png)

* Set-up: Physics-based humanoid agents with 26 degrees of freedom. These DOFs correspond to articulation of the following body parts: hips, chest, spine, head, thighs, shins, feet, arms, forearms and hands.
* Goal: The agents must move their bodies toward the goal direction as quickly as possible without falling.
* Agents: The environment contains 11 independent agents linked to a single Brain.
* Agent Reward Function (independent):
  * +0.03 times body velocity in the goal direction.
  * +0.01 times head y position.
  * +0.01 times body direction alignment with goal direction.
  * -0.01 times head velocity difference from body velocity.
* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 215 variables corresponding to position, rotation, velocity, and angular velocities of each limb, along with goal direction.
  * Vector Action space: (Continuous) Size of 39, corresponding to target rotations applicable to the joints.
  * Visual Observations: None
* Reset Parameters: Four
  * gravity: Magnitude of gravity
    * Default: 9.81
    * Recommended Minimum:
    * Recommended Maximum:
  * hip_mass: Mass of the hip component of the walker
    * Default: 15
    * Recommended Minimum: 7
    * Recommended Maximum: 28
  * chest_mass: Mass of the chest component of the walker
    * Default: 8
    * Recommended Minimum: 3
    * Recommended Maximum: 20
  * spine_mass: Mass of the spine component of the walker
    * Default: 10
    * Recommended Minimum: 3
    * Recommended Maximum: 20
* Benchmark Mean Reward: 1000

## Pyramids

![Pyramids](images/pyramids.png)

* Set-up: Environment where the agent needs to press a button to spawn a pyramid, then navigate to the pyramid, knock it over, and move to the gold brick at the top.
* Goal: Move to the golden brick on top of the spawned pyramid.
* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function (independent; sketched after this section):
  * +2 For moving to golden brick (minus 0.001 per step).
* Brains: One Brain with the following observation/action space:
  * Vector Observation space: 148 corresponding to local ray-casts detecting switch, bricks, golden brick, and walls, plus variable indicating switch state.
  * Vector Action space: (Discrete) 4 corresponding to agent rotation and forward/backward movement.
  * Visual Observations (Optional): First-person camera per-agent. Use `VisualPyramids` scene. __The visual observation version of this environment does not train with the provided default training parameters.__
* Reset Parameters: None
* Optional Imitation Learning scene: `PyramidsIL`.
* Benchmark Mean Reward: 1.75
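The Pyramids reward above is sparse: the only positive signal arrives when the agent reaches the golden brick, offset by a small per-step cost. Below is a minimal Python sketch of the resulting episode return, illustrative only; the step counting is not taken from the scene's code.

```python
# Illustrative sketch of the Pyramids episode return described above.
def pyramids_return(steps_taken, reached_gold_brick):
    """Total reward for one episode: -0.001 per step, +2 only on success."""
    ret = -0.001 * steps_taken
    if reached_gold_brick:
        ret += 2.0
    return ret

# For example, a successful 250-step episode earns 2 - 0.25 = 1.75.
print(pyramids_return(250, True))
```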