
Develop docs azure (#744)

* First draft of Azure support docs

* Correcting links to other docs

* Adding additional links and cleaning instructions

* Adding references to Azure docs in other appropriate places
Arthur Juliani, 6 years ago
Commit 898874c7
6 files changed: 174 insertions, 0 deletions
1. docs/ML-Agents-Overview.md (+5)
2. docs/Readme.md (+1)
3. docs/localized/zh-CN/docs/ML-Agents-Overview.md (+10)
4. docs/localized/zh-CN/docs/Readme.md (+1)
5. docs/Training-on-Microsoft-Azure-Custom-Instance.md (+76)
6. docs/Training-on-Microsoft-Azure.md (+81)

docs/ML-Agents-Overview.md (+5)


on how to set up EC2 instances in addition to a public pre-configured Amazon
Machine Image (AMI).
* **Cloud Training on Microsoft Azure** - To facilitate using ML-Agents on
Azure machines, we provide a
[guide](Training-on-Microsoft-Azure.md)
on how to set up virtual machine instances in addition to a pre-configured data science image.
## Summary and Next Steps
To briefly summarize: ML-Agents enables games and simulations built in Unity

docs/Readme.md (+1)


* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
## Help

docs/localized/zh-CN/docs/ML-Agents-Overview.md (+10)


让您了解如何设置 EC2 实例以及公共的预配置 Amazon
Machine Image (AMI)。
* **Microsoft Azure 上的云训练** - 为了便于在 Microsoft Azure
机器上使用 ML-Agents,我们提供了一份
[指南](/docs/Training-on-Microsoft-Azure.md)
让您了解如何设置 virtual machine instance 实例以及公共的预配置 Data Science VM。
* **Cloud Training on Microsoft Azure** - To facilitate using ML-Agents on
Azure machines, we provide a
[guide](Training-on-Microsoft-Azure.md)
on how to set-up virtual machine instances in addition to a pre-configured data science image.
## 总结和后续步骤
简要总结一下:ML-Agents 使 Unity 中构建的游戏和模拟环境

docs/localized/zh-CN/docs/Readme.md (+1)


* [Imitation Learning(模仿学习)训练要点](/docs/Training-Imitation-Learning.md)
* [LSTM 训练要点](/docs/Feature-Memory.md)
* [如何使用 Amazon Web Services 进行云端训练](/docs/Training-on-Amazon-Web-Service.md)
* [如何使用 Microsoft Azure 进行云端训练](/docs/Training-on-Microsoft-Azure.md)
* [如何使用 TensorBoard 观察训练过程](/docs/Using-Tensorboard.md)
## 帮助

docs/Training-on-Microsoft-Azure-Custom-Instance.md (+76)


# Setting up a Custom Instance on Microsoft Azure for Training
This page contains instructions for setting up a custom Virtual Machine on Microsoft Azure so you can run ML-Agents training in the cloud.
1. Start by [deploying an Azure VM](https://docs.microsoft.com/azure/virtual-machines/linux/quick-create-portal) with Ubuntu Linux (tests were done with 16.04 LTS). For GPU support, use an N-Series VM.
2. SSH into your VM.
3. Start with the following commands to install the Nvidia driver:
```
wget http://us.download.nvidia.com/tesla/375.66/nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb
sudo dpkg -i nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-drivers
sudo reboot
```
4. After a minute you should be able to reconnect to your VM and install the CUDA toolkit:
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-8-0
```
5. You'll next need to download cuDNN from the Nvidia developer site. This requires a registered account.
6. Navigate to [http://developer.nvidia.com](http://developer.nvidia.com) and create an account and verify it.
7. Download (to your own computer) cuDNN from [this url](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v6/prod/8.0_20170307/Ubuntu16_04_x64/libcudnn6_6.0.20-1+cuda8.0_amd64-deb).
8. Copy the deb package to your VM: `scp libcudnn6_6.0.21-1+cuda8.0_amd64.deb <VMUserName>@<VMIPAddress>:libcudnn6_6.0.21-1+cuda8.0_amd64.deb`
9. SSH back to your VM and execute the following:
```
sudo dpkg -i libcudnn6_6.0.21-1+cuda8.0_amd64.deb
# Persist the CUDA library path so it survives the reboot below.
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH' >> ~/.profile
. ~/.profile
sudo reboot
```
10. After a minute, you should be able to SSH back into your VM. After doing so, run the following:
```
sudo apt install python-pip
sudo apt install python3-pip
```
11. At this point, you need to install TensorFlow. The package you install depends on whether you will use the GPU for training (a quick sanity check for the finished setup is sketched after this list):
```
pip3 install tensorflow-gpu==1.4.0 keras==2.0.6
```
Or, if you will train on the CPU:
```
pip3 install tensorflow==1.4.0 keras==2.0.6
```
12. You'll then need to install additional dependencies:
```
pip3 install pillow
pip3 install numpy
pip3 install docopt
```
13. You can now return to the [main Azure instruction page](Training-on-Microsoft-Azure.md).
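
As an optional sanity check before returning to the main guide, the following commands (a minimal sketch, assuming the driver, CUDA toolkit, and TensorFlow steps above completed without errors) confirm the GPU stack is in place:
```
nvidia-smi                                                     # should list the GPU and driver version
/usr/local/cuda/bin/nvcc --version                             # should report CUDA release 8.0
python3 -c "import tensorflow as tf; print(tf.__version__)"    # should print 1.4.0
```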

docs/Training-on-Microsoft-Azure.md (+81)


# Training on Microsoft Azure
This page contains instructions for setting up training on Microsoft Azure through either [Azure Container Instances](https://azure.microsoft.com/services/container-instances/) or Virtual Machines. Training without headless mode (i.e. with rendering enabled) has not yet been tested and is not verified to be supported.
## Pre-Configured Azure Virtual Machine
A pre-configured virtual machine image is available in the Azure Marketplace and is almost completely ready for training. You can start by deploying the [Data Science Virtual Machine for Linux (Ubuntu)](https://azuremarketplace.microsoft.com/marketplace/apps/microsoft-ads.linux-data-science-vm-ubuntu) into your Azure subscription. Once your VM is deployed, SSH into it and run the following command to complete the dependency installation:
```
pip install docopt
```
Note that, if you choose to deploy the image to an [N-Series GPU optimized VM](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-gpu), training will, by default, run on the GPU. If you choose any other type of VM, training will run on the CPU.
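If you deployed to an N-Series VM, you can quickly confirm the GPU is visible before starting a run. This is a minimal check and assumes the Data Science VM image ships with the NVIDIA driver pre-installed:
```
nvidia-smi   # should list the VM's GPU; if it does not, training will run on the CPU
```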
## Configuring your own Instance
Setting up your own instance requires a number of package installations. Please view the documentation for doing so [here](Training-on-Microsoft-Azure-Custom-Instance.md).
## Installing ML-Agents
1. [Move](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/copy-files-to-linux-vm-using-scp) the `python` sub-folder of this ml-agents repo to the remote Azure instance and set it as your working directory.
2. Install the required packages with `pip3 install .`.
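
For example, from the root of your local ml-agents clone (the user name and IP address below are placeholders for your own VM's values):
```
# Copy the python sub-folder to the VM, then install its packages there.
scp -r python <VMUserName>@<VMIPAddress>:~/python
ssh <VMUserName>@<VMIPAddress>
cd ~/python
pip3 install .
```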
## Testing
To verify that all steps worked correctly:
1. In the Unity Editor, load a project containing an ML-Agents environment (you can use one of the example environments if you have not created your own).
2. Open the Build Settings window (menu: File > Build Settings).
3. Select Linux as the Target Platform, and x86_64 as the target architecture.
4. Check Headless Mode.
5. Click Build to build the Unity environment executable.
6. Upload the resulting files to your Azure instance.
7. Test the instance setup from Python using:
```python
from unityagents import UnityEnvironment
env = UnityEnvironment(file_name="<your_env>")
```
Where `<your_env>` corresponds to the path to your environment executable.
You should receive a message confirming that the environment was loaded successfully.
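If you want to go a step further than just loading the environment, the snippet below also resets and closes it. This is a sketch, run from the directory containing your uploaded build, and `3DBallHeadless` is only an example executable name:
```
python3 <<'EOF'
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBallHeadless")  # example build name
env.reset(train_mode=True)   # should complete without errors
env.close()
EOF
```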
## Running Training on your Virtual Machine
To run your training on the VM:
1. [Move](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/copy-files-to-linux-vm-using-scp) your built Unity application to your Virtual Machine.
2. Set the `python` sub-folder of the ml-agents repo to your working directory.
3. Run the following command:
```
python3 learn.py <your_app> --run-id=<run_id> --train
```
Where `<your_app>` is the path to your app (e.g. `~/unity-volume/3DBallHeadless`) and `<run_id>` is an identifier for your training run.
If you chose an N-Series VM with GPU support, you can verify that the GPU is being used by running `nvidia-smi` from the command line.
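
Because training can take a long time, you may want to launch it as a background process so it keeps running if your SSH session drops. A sketch (the run ID is just an example):
```
# Start training in the background and capture its output in a log file.
nohup python3 learn.py ~/unity-volume/3DBallHeadless --run-id=azure-run-1 --train > training.log 2>&1 &
tail -f training.log   # follow progress; Ctrl+C stops tailing, not the training
```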
## Monitoring your Training Run with TensorBoard
Once you have started training, you can [use TensorBoard to observe the training](Using-Tensorboard.md).
1. Start by [opening the appropriate port for web traffic to connect to your VM](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/nsg-quickstart-portal).
   * Note that you don't need to create a new `Network Security Group`; instead, go to the **Networking** tab under **Settings** for your VM.
   * For example, you could open the port with the following Inbound Rule settings (an Azure CLI alternative is sketched after this list):
     * Source: Any
     * Source Port Ranges: *
     * Destination: Any
     * Destination Port Ranges: 6006
     * Protocol: Any
     * Action: Allow
     * Priority: <Leave as default>
2. Unless you started the training as a background process, connect to your VM from another terminal instance.
3. Set the `python` folder in ml-agents to your current working directory.
4. Run the following command: `tensorboard --logdir=summaries --host 0.0.0.0`
5. You should now be able to open a browser and navigate to `<Your_VM_IP_Address>:6006` to view the TensorBoard report.
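
If you prefer the command line to the portal, the same inbound rule can usually be added with the Azure CLI. This is a sketch and assumes you have the `az` CLI installed and are logged in; the resource group and VM names are placeholders:
```
az vm open-port --resource-group <YourResourceGroup> --name <YourVMName> --port 6006
```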
## Running on Azure Container Instances
[Azure Container Instances](https://azure.microsoft.com/services/container-instances/) allow you to spin up a container, on demand, that will run your training and then be shut down. This ensures you aren't leaving a billable VM running when it isn't needed. You can read more about [ML-Agents support for Docker containers here](Using-Docker.md). Using ACI enables you to offload training of your models without needing to install Python and TensorFlow on your own computer. You can find [instructions, including a pre-deployed image in DockerHub for you to use, available here](https://github.com/druttka/unity-ml-on-azure).