# Environment Design Best Practices

## General

* It is often helpful to begin with the simplest version of the problem, to ensure the agent can learn it. From there, increase complexity over time.
* When possible, it is often helpful to ensure that you can complete the task yourself by using a Player Brain to control the agent.

## Rewards

* The magnitude of any given reward should typically not be greater than 1.0 in order to ensure a more stable learning process.
* Positive rewards are often more helpful for shaping the desired behavior of an agent than negative rewards.
* For locomotion tasks, a small positive reward (+0.1) for forward progress is typically used.
* If you want the agent to finish a task quickly, it is often helpful to provide a small penalty every step (-0.1). A reward-shaping sketch is included at the end of this page.

## States

* The magnitude of each state variable should be normalized to around 1.0.
* States should include all variables relevant to allowing the agent to make an optimally informed decision.
* Categorical state variables such as type of object (Sword, Shield, Bow) should be encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`). See the observation sketch at the end of this page.

## Actions

* When using continuous control, action values should be clipped to an appropriate range, as in the clipping sketch at the end of this page.
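
The following sketch illustrates the reward guidelines above: a small positive reward for forward progress, a small per-step penalty to encourage finishing quickly, and a cap so no single reward exceeds a magnitude of 1.0. The function name, arguments, and weights are illustrative assumptions, not part of any specific framework's API.

```python
# Hypothetical per-step reward shaping for a locomotion-style task.
def step_reward(forward_delta, reached_goal, time_penalty=-0.1):
    """Compute the reward for one step under the guidelines above."""
    reward = 0.0
    if forward_delta > 0:
        reward += 0.1          # small positive reward for forward progress
    reward += time_penalty     # small per-step penalty to encourage speed
    if reached_goal:
        reward += 1.0          # terminal reward for completing the task
    # Keep the magnitude of any single reward at or below 1.0.
    return max(-1.0, min(1.0, reward))
```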
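
To illustrate the state guidelines, the sketch below builds an observation vector whose continuous values are normalized to roughly [-1, 1] and whose categorical item type is one-hot encoded. The variable names, normalization ranges, and item list are assumptions made for the example.

```python
import numpy as np

ITEM_TYPES = ["Sword", "Shield", "Bow"]

def build_observation(position_x, max_x, velocity, max_speed, item):
    """Assemble a normalized observation vector with a one-hot item encoding."""
    obs = [
        position_x / max_x,    # normalize position by its known maximum
        velocity / max_speed,  # normalize velocity by the agent's top speed
    ]
    one_hot = [1.0 if item == name else 0.0 for name in ITEM_TYPES]
    return np.array(obs + one_hot, dtype=np.float32)

# Example: an agent holding a Bow -> [0.5, -0.5, 0, 0, 1]
print(build_observation(25.0, max_x=50.0, velocity=-3.0, max_speed=6.0, item="Bow"))
```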
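
Finally, a minimal sketch of clipping continuous actions before applying them. The [-1, 1] range is an assumption; use whatever range your control scheme expects.

```python
import numpy as np

def apply_action(raw_action):
    """Clip a continuous action vector to the valid control range before use."""
    action = np.clip(raw_action, -1.0, 1.0)
    # ... apply `action` to the agent's motors or forces here ...
    return action

print(apply_action(np.array([2.3, -0.4, -1.7])))  # -> [ 1.  -0.4 -1. ]
```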