Comparing commits

...
This merge request contains changes that conflict with the target branch.
/docs/Unity-Agents---Python-API.md
/python/unityagents/environment.py
/unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DDecision.cs
/docs/Agents-Editor-Interface.md
/docs/Getting-Started-with-Balance-Ball.md
/docs/Making-a-new-Unity-Environment.md
/docs/Organizing-the-Scene.md
/docs/Unity-Agents-Overview.md
/docs/best-practices.md
/docs/installation.md
/docs/Using-TensorFlow-Sharp-in-Unity-(Experimental).md
/python/ppo.py
/python/ppo/history.py
/python/ppo/trainer.py
/python/ppo/models.py
/python/README.md
/python/requirements.txt
/images/normalization.png

2 commits

Author | SHA1 | Message | Commit date
Arthur Juliani | 57a9ed38 | Require tensorflow 1.4.1 (#315) | 7 years ago
Arthur Juliani | ce2ce437 | Added growth parameter to stop failing with allocation under windows for #277 (#278) | 7 years ago
18 files changed, with 153 insertions and 38 deletions
  1. unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DDecision.cs (19 changed lines)
  2. docs/Agents-Editor-Interface.md (2 changed lines)
  3. docs/Getting-Started-with-Balance-Ball.md (16 changed lines)
  4. docs/Making-a-new-Unity-Environment.md (2 changed lines)
  5. docs/Organizing-the-Scene.md (2 changed lines)
  6. docs/Unity-Agents---Python-API.md (4 changed lines)
  7. docs/Unity-Agents-Overview.md (2 changed lines)
  8. docs/best-practices.md (10 changed lines)
  9. docs/installation.md (2 changed lines)
  10. docs/Using-TensorFlow-Sharp-in-Unity-(Experimental).md (2 changed lines)
  11. python/ppo.py (6 changed lines)
  12. python/ppo/history.py (15 changed lines)
  13. python/ppo/trainer.py (12 changed lines)
  14. python/ppo/models.py (25 changed lines)
  15. python/unityagents/environment.py (4 changed lines)
  16. python/README.md (2 changed lines)
  17. python/requirements.txt (2 changed lines)
  18. images/normalization.png (64 changed lines)

unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DDecision.cs (19 changed lines)


{
if (gameObject.GetComponent<Brain>().brainParameters.actionSpaceType == StateType.continuous)
{
- return new float[4]{ 0f, 0f, 0f, 0.0f };
+ List<float> ret = new List<float>();
+ if (state[2] < 0 || state[5] < 0)
+ {
+ ret.Add(state[5]);
+ }
+ else
+ {
+ ret.Add(state[5]);
+ }
+ if (state[3] < 0 || state[7] < 0)
+ {
+ ret.Add(-state[7]);
+ }
+ else
+ {
+ ret.Add(-state[7]);
+ }
+ return ret.ToArray();
}
else

docs/Agents-Editor-Interface.md (2 changed lines)


* `Action Descriptions` - A list of strings used to name the available actions for the Brain.
* `State Space Type` - Corresponds to whether state vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
* `Action Space Type` - Corresponds to whether action vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
- * `Type of Brain` - Describes how Brain will decide actions.
+ * `Type of Brain` - Describes how the Brain will decide actions.
* `External` - Actions are decided using Python API.
* `Internal` - Actions are decided using internal TensorflowSharp model.
* `Player` - Actions are decided using Player input mappings.

docs/Getting-Started-with-Balance-Ball.md (16 changed lines)


1. Open `python/PPO.ipynb` notebook from Jupyter.
2. Set `env_name` to the name of your environment file earlier.
- 3. (optional) Set `run_path` directory to your choice.
- 4. Run all cells of notebook with the exception of the last one under "Export the trained Tensorflow graph."
+ 3. (optional) In order to get the best results quickly, set `max_steps` to 50000, set `buffer_size` to 5000, and set `batch_size` to 512. For this exercise, this will train the model in approximately ~5-10 minutes.
+ 4. (optional) Set `run_path` directory to your choice.
+ 5. Run all cells of notebook with the exception of the last one under "Export the trained Tensorflow graph."
### Observing Training Progress
In order to observe the training process in more detail, you can use Tensorboard.

From Tensorboard, you will see the summary statistics of six variables:
* Cumulative Reward - The mean cumulative episode reward over all agents. Should increase during a successful training session.
- * Value Loss - The mean loss of the value function update. Correlates to how well the model is able to predict the value of each state. This should decrease during a succesful training session.
- * Policy Loss - The mean loss of the policy function update. Correlates to how much the policy (process for deciding actions) is changing. The magnitude of this should decrease during a succesful training session.
+ * Value Loss - The mean loss of the value function update. Correlates to how well the model is able to predict the value of each state. This should decrease during a successful training session.
+ * Policy Loss - The mean loss of the policy function update. Correlates to how much the policy (process for deciding actions) is changing. The magnitude of this should decrease during a successful training session.
* Episode Length - The mean length of each episode in the environment for all agents.
* Value Estimates - The mean value estimate for all states visited by the agent. Should increase during a successful training session.
* Policy Entropy - How random the decisions of the model are. Should slowly decrease during a successful training process. If it decreases too quickly, the `beta` hyperparameter should be increased.

Because TensorFlowSharp support is still experimental, it is disabled by default. In order to enable it, you must follow these steps. Please note that the `Internal` Brain mode will only be available once completing these steps.
1. Make sure you are using Unity 2017.1 or newer.
- 2. Make sure the TensorFlowSharp plugin is in your `Assets` folder. A Plugins folder which includes TF# can be downloaded [here](https://s3.amazonaws.com/unity-agents/0.2/TFSharpPlugin.unitypackage). Double click and import it once downloaded.
+ 2. Make sure the TensorFlowSharp plugin is in your `Assets` folder. A Plugins folder which includes TF# can be downloaded [here](https://s3.amazonaws.com/unity-agents/0.2/TFSharpPlugin.unitypackage). Double click and import it once downloaded. You can see if this was successfully installed by checking the TensorFlow files in the Project tab under `Assets` -> `ML-Agents` -> `Plugins` -> `Computer`
- 3. In `Scripting Defined Symbols`, add the flag `ENABLE_TENSORFLOW`
- 5. Restart the Unity Editor.
+ 3. In `Scripting Defined Symbols`, add the flag `ENABLE_TENSORFLOW`. After typing in, press Enter.
+ 5. Go to `File` -> `Save Project`
+ 6. Restart the Unity Editor.
### Embedding the trained model into Unity
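As an illustration of the notebook settings referenced in the revised steps above, a hyperparameter cell configured for this exercise might look like the sketch below. The environment name and the assumption that these settings are plain Python variables in `PPO.ipynb` are illustrative, not taken from this diff.

```python
# Illustrative only: values from the updated step 3 above; "3DBall" is an
# example environment name, and the variable names are assumed to match the
# settings referenced in the text.
env_name = "3DBall"    # name of the environment binary built earlier
run_path = "ppo"       # (optional) output directory for summaries and models
max_steps = 50000      # quick-result setting suggested in the revised steps
buffer_size = 5000
batch_size = 512
```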

docs/Making-a-new-Unity-Environment.md (2 changed lines)


1. Click on the game object **`YourNameAcademy`**.
2. In the inspector tab, you can modify the characteristics of the academy:
- * **`Max Steps`** Maximum length of each episode (set to 0 if you want do not want the environment to reset after a certain time).
+ * **`Max Steps`** Maximum length of each episode (set to 0 if you do not want the environment to reset after a certain time).
* **`Wait Time`** Real-time between steps when running environment in test-mode.
* **`Frames To Skip`** Number of frames (or physics updates) to skip between steps. The agents will act at every frame but get new actions only at every step.
* **`Training Configuration`** and **`Inference Configuration`** The first defines the configuration of the Engine at training time and the second at test / inference time. The training mode corresponds only to external training when the reset parameter `train_model` was set to True. The adjustable parameters are as follows:

docs/Organizing-the-Scene.md (2 changed lines)


* Coordinating the Brains which must be set as children of the Academy.
#### Brains
- Each brain corresponds to a specific Decision-making method. This often aligns with a specific neural network model. A Brains is responsible for deciding the action of all the Agents which are linked to it. There can be multiple brains in the same scene and multiple agents can subscribe to the same brain.
+ Each brain corresponds to a specific Decision-making method. This often aligns with a specific neural network model. The brain is responsible for deciding the action of all the Agents which are linked to it. There can be multiple brains in the same scene and multiple agents can subscribe to the same brain.
#### Agents
Each agent within a scene takes actions according to the decisions provided by it's linked Brain. There can be as many Agents of as many types as you like in the scene. The state size and action size of each agent must match the brain's parameters in order for the Brain to decide actions for it.

docs/Unity-Agents---Python-API.md (4 changed lines)


```python
from unityagents import UnityEnvironment
- env = UnityEnvironment(file_name=filename, worker_num=0)
+ env = UnityEnvironment(file_name=filename, worker_id=0)
- * `worker_num` indicates which port to use for communication with the environment. For use in parallel training regimes such as A3C.
+ * `worker_id` indicates which port to use for communication with the environment. For use in parallel training regimes such as A3C.
## Interacting with a Unity Environment
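For context on the renamed argument, a minimal usage sketch follows. It assumes the `brain_names`, `reset`, `step`, and `close` members behave as documented for this version of the `unityagents` package; the environment name is illustrative.

```python
from unityagents import UnityEnvironment

# worker_id selects the port offset used to talk to this environment instance,
# so parallel workers (e.g. A3C) can each use a distinct id.
env = UnityEnvironment(file_name="3DBall", worker_id=0)
brain_name = env.brain_names[0]                 # assumed attribute listing the brains
info = env.reset(train_mode=True)[brain_name]   # BrainInfo for that brain
# ... call env.step(...) in a loop to interact with the environment ...
env.close()
```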

docs/Unity-Agents-Overview.md (2 changed lines)


![diagram](../images/agents_diagram.png)
- A visual depiction of how an Learning Environment might be configured within ML-Agents.
+ A visual depiction of how a Learning Environment might be configured within ML-Agents.
The three main kinds of objects within any Agents Learning Environment are:

docs/best-practices.md (10 changed lines)


# Environment Design Best Practices
## General
- * It is often helpful to being with the simplest version of the problem, to ensure the agent can learn it. From there increase
+ * It is often helpful to start with the simplest version of the problem, to ensure the agent can learn it. From there increase
- * When possible, It is often helpful to ensure that you can complete the task by using a Player Brain to control the agent.
+ * When possible, it is often helpful to ensure that you can complete the task by using a Player Brain to control the agent.
- * If you want the agent the finish a task quickly, it is often helpful to provide a small penalty every step (-0.05) that the agent does not complete the task. In this case completion of the task should also coincide with the end of the episode.
+ * If you want the agent to finish a task quickly, it is often helpful to provide a small penalty every step (-0.05) that the agent does not complete the task. In this case completion of the task should also coincide with the end of the episode.
- * Rotation information on GameObjects should be recorded as `state.Add(transform.rotation.eulerAngles.y/180.0f-1.0f);` rather than `state.Add(transform.rotation.y);`.
+ * Besides encoding non-numeric values, all inputs should be normalized to be in the range 0 to +1 (or -1 to 1). For example rotation information on GameObjects should be recorded as `state.Add(transform.rotation.eulerAngles.y/180.0f-1.0f);` rather than `state.Add(transform.rotation.y);`. See the equation below for one approach of normaliztaion.
+ ![normalization](../images/normalization.png)
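For reference, the rotation expression in the added bullet above is one instance of min-max rescaling; a small sketch of the general pattern (not repository code):

```python
# Min-max rescaling to [-1, 1]; with min_value=0 and max_value=360 this
# reduces to eulerAngles.y / 180 - 1, the rotation example above.
def normalize(value, min_value, max_value):
    return 2.0 * (value - min_value) / (max_value - min_value) - 1.0
```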
## Actions
* When using continuous control, action values should be clipped to an appropriate range.

docs/installation.md (2 changed lines)


* TensorFlow (1.0+) (Training)
### Installing Dependencies
- To install dependencies, go into the `python` sub-directory of the repositroy, and run (depending on your python version) from the command line:
+ To install dependencies, go into the `python` sub-directory of the repository, and run (depending on your python version) from the command line:
`pip install .`

docs/Using-TensorFlow-Sharp-in-Unity-(Experimental).md (2 changed lines)


In your C# script :
At the top, add the line
```csharp
- using Tensorflow;
+ using TensorFlow;
```
If you will be building for android, you must add this block at the start of your code :
```csharp

python/ppo.py (6 changed lines)


return None
else:
return None
- with tf.Session() as sess:
+ config = tf.ConfigProto()
+ config.gpu_options.allow_growth = True
+ with tf.Session(config=config) as sess:
# Instantiate model parameters
if load_model:
print('Loading Model...')
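The lines added here use the standard TensorFlow 1.x option for on-demand GPU memory allocation, which commit ce2ce437 ties to allocation failures under Windows (#277). A standalone sketch of the same pattern:

```python
import tensorflow as tf

# With allow_growth enabled, TensorFlow 1.x allocates GPU memory as needed
# instead of reserving most of it when the session starts.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    pass  # build and run the training graph here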

python/ppo/history.py (15 changed lines)


import numpy as np
- history_keys = ['states', 'observations', 'actions', 'rewards', 'action_probs', 'epsilons',
+ history_keys = ['states', 'actions', 'rewards', 'action_probs', 'epsilons',
'value_estimates', 'advantages', 'discounted_returns']

"""
for key in history_keys:
agent_dict[key] = []
+ for i, _ in enumerate(key for key in agent_dict.keys() if key.startswith('observations')):
+ agent_dict['observations%d' % i] = []
return agent_dict

:return: dictionary of numpy arrays.
"""
for key in history_keys:
agent_dict[key] = np.array(agent_dict[key])
+ for key in (key for key in agent_dict.keys() if key.startswith('observations')):
+ agent_dict[key] = np.array(agent_dict[key])
return agent_dict

history_dict[agent] = empty_local_history(history_dict[agent])
history_dict[agent]['cumulative_reward'] = 0
history_dict[agent]['episode_steps'] = 0
+ for i, _ in enumerate(agent_info.observations):
+ history_dict[agent]['observations%d' % i] = []
return history_dict

"""
for key in history_keys:
global_buffer[key] = np.concatenate([global_buffer[key], local_buffer[key]], axis=0)
+ for key in (key for key in local_buffer.keys() if key.startswith('observations')):
+ global_buffer[key] = np.concatenate([global_buffer[key], local_buffer[key]], axis=0)
return global_buffer

"""
for key in history_keys:
global_buffer[key] = np.copy(local_buffer[key])
+ for key in (key for key in local_buffer.keys() if key.startswith('observations')):
+ global_buffer[key] = np.array(local_buffer[key])
return global_buffer

s = np.arange(global_buffer[history_keys[2]].shape[0])
np.random.shuffle(s)
for key in history_keys:
if len(global_buffer[key]) > 0:
global_buffer[key] = global_buffer[key][s]
+ for key in (key for key in global_buffer.keys() if key.startswith('observations')):
+ if len(global_buffer[key]) > 0:
+ global_buffer[key] = global_buffer[key][s]
return global_buffer
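Taken together, the additions to `history.py` replace the single `'observations'` entry with one key per visual observation (`'observations0'`, `'observations1'`, ...). A small sketch of the resulting buffer layout, for a hypothetical agent with two camera observations:

```python
# Sketch, not repository code: per-observation history keys as produced by
# the additions above, for a hypothetical agent with two camera observations.
history_keys = ['states', 'actions', 'rewards', 'action_probs', 'epsilons',
                'value_estimates', 'advantages', 'discounted_returns']

agent_dict = {key: [] for key in history_keys}
num_cameras = 2  # hypothetical
for i in range(num_cameras):
    agent_dict['observations%d' % i] = []  # adds 'observations0', 'observations1'
```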

python/ppo/trainer.py (12 changed lines)


epsi = np.random.randn(len(info.states), env.brains[brain_name].action_space_size)
feed_dict[self.model.epsilon] = epsi
if self.use_observations:
- feed_dict[self.model.observation_in] = np.vstack(info.observations)
+ for i, _ in enumerate(info.observations):
+ feed_dict[self.model.observation_in[i]] = info.observations[i]
if self.use_states:
feed_dict[self.model.state_in] = info.states
if self.is_training and env.brains[brain_name].state_space_type == "continuous" and self.use_states and normalize:

idx = info.agents.index(agent)
if not info.local_done[idx]:
if self.use_observations:
- history['observations'].append([info.observations[0][idx]])
+ for i, _ in enumerate(info.observations):
+ history['observations%d' % i].append([info.observations[i][idx]])
if self.use_states:
history['states'].append(info.states[idx])
if self.is_continuous:

else:
feed_dict = {self.model.batch_size: len(info.states)}
if self.use_observations:
- feed_dict[self.model.observation_in] = np.vstack(info.observations)
+ for i in range(self.info.observations):
+ feed_dict[self.model.observation_in[i]] = info.observations[i]
if self.use_states:
feed_dict[self.model.state_in] = info.states
value_next = self.sess.run(self.model.value, feed_dict)[l]

if self.use_states:
feed_dict[self.model.state_in] = np.vstack(training_buffer['states'][start:end])
if self.use_observations:
- feed_dict[self.model.observation_in] = np.vstack(training_buffer['observations'][start:end])
+ for i, _ in enumerate(self.model.observation_in):
+ feed_dict[self.model.observation_in[i]] = np.vstack(training_buffer['observations%d' % i][start:end])
v_loss, p_loss, _ = self.sess.run([self.model.value_loss, self.model.policy_loss,
self.model.update_batch], feed_dict=feed_dict)
total_v += v_loss

python/ppo/models.py (25 changed lines)


class PPOModel(object):
def __init__(self):
self.normalize = False
+ self.observation_in = []
def create_global_steps(self):
"""Creates TF ops to track and increment global training step."""

else:
c_channels = 3
- self.observation_in = tf.placeholder(shape=[None, o_size_h, o_size_w, c_channels], dtype=tf.float32,
- name='observation_0')
+ self.observation_in.append(tf.placeholder(shape=[None, o_size_h, o_size_w, c_channels], dtype=tf.float32,
+ name='observation_%d' % len(self.observation_in)))
- self.conv1 = tf.layers.conv2d(self.observation_in, 16, kernel_size=[8, 8], strides=[4, 4],
+ self.conv1 = tf.layers.conv2d(self.observation_in[-1], 16, kernel_size=[8, 8], strides=[4, 4],
use_bias=False, activation=activation)
self.conv2 = tf.layers.conv2d(self.conv1, 32, kernel_size=[4, 4], strides=[2, 2],
use_bias=False, activation=activation)

self.create_reward_encoder()
hidden_state, hidden_visual, hidden_policy, hidden_value = None, None, None, None
+ encoders = []
- height_size, width_size = brain.camera_resolutions[0]['height'], brain.camera_resolutions[0]['width']
- bw = brain.camera_resolutions[0]['blackAndWhite']
- hidden_visual = self.create_visual_encoder(height_size, width_size, bw, h_size, 2, tf.nn.tanh, num_layers)
+ for i in range(brain.number_observations):
+ height_size, width_size = brain.camera_resolutions[i]['height'], brain.camera_resolutions[i]['width']
+ bw = brain.camera_resolutions[i]['blackAndWhite']
+ encoders.append(self.create_visual_encoder(height_size, width_size, bw, h_size, 2, tf.nn.tanh, num_layers))
+ hidden_visual = tf.concat(encoders, axis=2)
if brain.state_space_size > 0:
s_size = brain.state_space_size
if brain.state_space_type == "continuous":

hidden_state, hidden_visual, hidden = None, None, None
if brain.number_observations > 0:
- height_size, width_size = brain.camera_resolutions[0]['height'], brain.camera_resolutions[0]['width']
- bw = brain.camera_resolutions[0]['blackAndWhite']
- hidden_visual = self.create_visual_encoder(height_size, width_size, bw, h_size, 1, tf.nn.elu, num_layers)[0]
+ encoders = []
+ for i in range(brain.number_observations):
+ height_size, width_size = brain.camera_resolutions[i]['height'], brain.camera_resolutions[i]['width']
+ bw = brain.camera_resolutions[i]['blackAndWhite']
+ encoders.append(self.create_visual_encoder(height_size, width_size, bw, h_size, 1, tf.nn.elu, num_layers)[0])
+ hidden_visual = tf.concat(encoders, axis=1)
if brain.state_space_size > 0:
s_size = brain.state_space_size
if brain.state_space_type == "continuous":

python/unityagents/environment.py (4 changed lines)


candidates = glob.glob(os.path.join(cwd, file_name + '.app', 'Contents', 'MacOS', true_filename))
if len(candidates) == 0:
candidates = glob.glob(os.path.join(file_name + '.app', 'Contents', 'MacOS', true_filename))
+ if len(candidates) == 0:
+ candidates = glob.glob(os.path.join(cwd, file_name + '.app', 'Contents', 'MacOS', '*'))
+ if len(candidates) == 0:
+ candidates = glob.glob(os.path.join(file_name + '.app', 'Contents', 'MacOS', '*'))
if len(candidates) > 0:
launch_string = candidates[0]
elif platform == 'win32':

python/README.md (2 changed lines)


* numpy
* Pillow
* Python (2 or 3)
- * Tensorflow (1.0+)
+ * Tensorflow (1.4)
### Installing Dependencies
To install dependencies, run:

python/requirements.txt (2 changed lines)


- tensorflow>=1.0
+ tensorflow==1.4.1
Pillow>=4.2.1
matplotlib
numpy>=1.11.0

images/normalization.png (64 changed lines)

Image diff (before / after view): width 697 px, height 104 px, 14 KiB.