added faq to the aws doc (#1320)

* added faq to the aws doc

* added the link

* added some faq and updated the temp ami id

* resolved the comments, updated one of the faq along with the scriptable object update

* added one other cause raise in issues

* fixed line change
/develop-generalizationTraining-TrainerController
GitHub 6 years ago
Current commit
b4171b67
2 files changed, with 106 insertions and 14 deletions
  1. 25
      docs/FAQ.md
  2. 95
      docs/Training-on-Amazon-Web-Service.md

25
docs/FAQ.md


## Environment Connection Timeout
If you are able to launch the environment from `UnityEnvironment` but then
receive a timeout error, there may be a number of possible causes.
receive a timeout error like this:
```
UnityAgentsException: The Communicator was unable to connect. Please make sure the External process is ready to accept communication with Unity.
```
There may be a number of possible causes:
* _Cause_: There may be no Brains the `Broadcast Hub` of the Academy.
In this case, the environment will not attempt to communicate
with python. _Solution_: Set the Brains(s) you wish to externally control
through the Python API to `External` from the Unity Editor, and rebuild the
environment.
* _Cause_: There may be no LearningBrain with the `Control` option checked in the
`Broadcast Hub` of the Academy. In this case, the environment will not attempt
to communicate with Python. _Solution_: Click `Add New` in your Academy's
`Broadcast Hub`, drag your LearningBrain asset into the `Brains` field,
and check the `Control` toggle. You also need to assign this LearningBrain
asset to all of the Agents you wish to train.
* _Cause_: On OSX, the firewall may be preventing communication with the
environment. _Solution_: Add the built environment binary to the list of
exceptions on the firewall by following

files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the Unity
Environment to figure out what error happened.
* _Cause_: You have assigned `HTTP_PROXY` and `HTTPS_PROXY` values in your
environment variables. _Solution_: Remove these values and try again.
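The proxy cause above can be checked from Python before launching the environment. The following is a minimal sketch, not part of ML-Agents: the helper names `find_proxy_vars` and `without_proxy_vars` are hypothetical, and including the lowercase spellings of the variables is an assumption.

```python
import os

# Assumption: besides HTTP_PROXY and HTTPS_PROXY (named in the FAQ), we also
# look at the lowercase spellings that some tools read.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def find_proxy_vars(env):
    """Return the proxy variables that are set to a non-empty value in env."""
    return {name: env[name] for name in PROXY_VARS if env.get(name)}


def without_proxy_vars(env):
    """Return a copy of env with every proxy variable removed."""
    return {key: value for key, value in env.items() if key not in PROXY_VARS}


if __name__ == "__main__":
    found = find_proxy_vars(os.environ)
    if found:
        print("Proxy variables are set:", ", ".join(sorted(found)))
    else:
        print("No proxy variables are set.")
```

If any variables are reported, unsetting them in the shell that starts training (for example with `unset HTTP_PROXY HTTPS_PROXY`) applies the solution described above.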
## Communication port {} still in use

Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts
for custom episode-terminating events.
## Problems with training on AWS
Please refer to the [Training on Amazon Web Service FAQ](Training-on-Amazon-Web-Service.md#faq).

95
docs/Training-on-Amazon-Web-Service.md


## Preconfigured AMI
We've prepared a preconfigured AMI for you with the ID: `ami-18642967` in the
We've prepared a preconfigured AMI for you with the ID: `ami-016ff5559334f8619` in the
`us-east-1` region. It was created as a modification of [Deep Learning AMI
(Ubuntu)](https://aws.amazon.com/marketplace/pp/B077GCH38C). The AMI has been
tested with a p2.xlarge instance. Furthermore, if you want to train without

would on a local machine. Ensure that `headless` mode is disabled when building
Linux executables that use visual observations.
1. Install and setup Xorg:
#### Install and setup Xorg:
```console
# Install Xorg

$ sudo vim /etc/X11/xorg.conf
```
2. Update and setup Nvidia driver:
#### Update and setup Nvidia driver:
$ wget http://download.nvidia.com/XFree86/Linux-x86_64/390.67/NVIDIA-Linux-x86_64-390.67.run
# Please refer to http://download.nvidia.com/XFree86/Linux-x86_64/latest.txt
$ wget http://download.nvidia.com/XFree86/Linux-x86_64/390.87/NVIDIA-Linux-x86_64-390.87.run
$ sudo /bin/bash ./NVIDIA-Linux-x86_64-390.87.run --accept-license --no-questions --ui=none
# Disable Nouveau as it will clash with the Nvidia driver

$ sudo update-initramfs -u
```
3. Restart the EC2 instance:
#### Restart the EC2 instance:
4. Make sure there are no Xorg processes running:
#### Make sure there are no Xorg processes running:
```console
# Kill any possible running Xorg processes

```
5. Start X Server and make the ubuntu use X Server for display:
#### Start X Server and make Ubuntu use X Server for display:
```console
# Start the X Server, press Enter to come back to the command line

$ export DISPLAY=:0
```
6. Ensure the Xorg is correctly configured:
#### Ensure Xorg is correctly configured:
```console
# For more information on glxgears, see ftp://www.x.org/pub/X11R6.8.1/doc/glxgears.1.html.

```console
mlagents-learn <trainer-config-file> --env=<your_env> --train
```
## FAQ
### The <Executable_Name>_Data folder hasn't been copied over
If you've built your Linux executable but forgot to copy over the corresponding <Executable_Name>_Data folder, you will see an error message like the following:
```console
Set current directory to /home/ubuntu/ml-agents/ml-agents
Found path: /home/ubuntu/ml-agents/ml-agents/3dball_linux.x86_64
no boot config - using default values
(Filename: Line: 403)
There is no data folder
```
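A quick pre-flight check can catch the missing data folder before launching. This is a minimal sketch under the assumption that the data folder sits next to the executable and is named after it with the extension replaced by `_Data` (as in `3dball_linux.x86_64` and `3dball_linux_Data`); the helper names are hypothetical.

```python
from pathlib import Path


def expected_data_dir(executable):
    """Return the path where the <Executable_Name>_Data folder is expected."""
    exe = Path(executable)
    # "3dball_linux.x86_64" -> "3dball_linux_Data" in the same directory.
    return exe.with_name(exe.stem + "_Data")


def has_data_dir(executable):
    """Return True if the expected data folder exists next to the executable."""
    return expected_data_dir(executable).is_dir()
```

For example, `has_data_dir("/home/ubuntu/ml-agents/ml-agents/3dball_linux.x86_64")` returns `False` when the folder was not copied to the instance.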
### Unity Environment not responding
If you didn't set up X Server or haven't launched it properly, didn't build your environment with an external brain, haven't run `chmod +x` on your Unity Environment, or your environment crashes, the connection between Unity and Python will fail. You will then see something like this:
```console
Logging to /home/ubuntu/.config/unity3d/<Some_Path>/Player.log
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/ml-agents/ml-agents/mlagents/envs/environment.py", line 63, in __init__
aca_params = self.send_academy_parameters(rl_init_parameters_in)
File "/home/ubuntu/ml-agents/ml-agents/mlagents/envs/environment.py", line 489, in send_academy_parameters
return self.communicator.initialize(inputs).rl_initialization_output
File "/home/ubuntu/ml-agents/ml-agents/mlagents/envs/rpc_communicator.py", line 60, in initialize
mlagents.envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
The environment does not need user interaction to launch
The Academy and the External Brain(s) are attached to objects in the Scene
The environment and the Python interface have compatible versions.
```
It is also helpful to check your /home/ubuntu/.config/unity3d/<Some_Path>/Player.log to see what happened in your Unity environment.
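Two of the causes above can be checked from Python. The sketch below tests whether the environment binary has its executable bit set and reads the last lines of a log file. The helper names `is_executable` and `tail_lines` are hypothetical, and the log path is left as a parameter because the `<Some_Path>` part differs per build.

```python
import os
from pathlib import Path


def is_executable(path):
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def tail_lines(path, count=20):
    """Return up to the last `count` lines of a text file such as Player.log."""
    text = Path(path).read_text(errors="replace")
    return text.splitlines()[-count:]
```

If `is_executable` returns `False` for your build, running `chmod +x` on it addresses that cause; the output of `tail_lines` on Player.log usually shows why the environment stopped.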
### Could not launch X Server
When you execute:
```console
sudo /usr/bin/X :0 &
```
You might see something like:
```console
X.Org X Server 1.18.4
...
(==) Log file: "/var/log/Xorg.0.log", Time: Thu Oct 11 21:10:38 2018
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(EE)
Fatal server error:
(EE) no screens found(EE)
(EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE)
(EE) Server terminated with error (1). Closing log file.
```
And when you execute:
```console
nvidia-smi
```
You might see something like:
```console
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```
This means the NVIDIA driver needs to be updated. Refer to [this section](Training-on-Amazon-Web-Service.md#update-and-setup-nvidia-driver) for more information.
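The diagnosis above can be expressed as a small check over captured command output. This is an illustrative sketch only: the function name `driver_needs_update` is hypothetical, and it matches only the two messages quoted in this section.

```python
# Messages quoted from the X Server and nvidia-smi output in this section.
X_ERROR = "no screens found"
SMI_ERROR = "couldn't communicate with the NVIDIA driver"


def driver_needs_update(x_server_output, nvidia_smi_output):
    """Return True when both outputs show the symptoms described above."""
    return X_ERROR in x_server_output and SMI_ERROR in nvidia_smi_output
```

Pass in the text printed by `sudo /usr/bin/X :0 &` and by `nvidia-smi`; a `True` result corresponds to the case where the driver should be reinstalled.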