
Fixed various typos (#2652)

* Add console log section to Bug Report form (#2566)

* Fixed typos
Branch: /develop-gpu-test
Chris Elion, 5 years ago
Commit 7a178f12
7 changed files with 14 additions and 11 deletions
  1. .github/ISSUE_TEMPLATE/bug_report.md (3 changes)
  2. docs/Learning-Environment-Design-Agents.md (2 changes)
  3. docs/Migrating.md (2 changes)
  4. docs/Reward-Signals.md (6 changes)
  5. docs/Training-Generalized-Reinforcement-Learning-Agents.md (8 changes)
  6. docs/Training-Imitation-Learning.md (2 changes)
  7. docs/localized/KR/docs/Training-PPO.md (2 changes)

.github/ISSUE_TEMPLATE/bug_report.md (3 changes)


3. Scroll down to '....'
4. See error
+ **Console logs / stack traces**
+ Please wrap in [triple backticks (```)](https://help.github.com/en/articles/creating-and-highlighting-code-blocks) to make it easier to read.
**Screenshots**
If applicable, add screenshots to help explain your problem.

docs/Learning-Environment-Design-Agents.md (2 changes)


![RenderTexture with Raw Image](images/visual-observation-rawimage.png)
The [GridWorld environment](Learning-Environment-Examples.md#gridworld)
- is an example on how to use a RenderTexure for both debugging and observation. Note
+ is an example on how to use a RenderTexture for both debugging and observation. Note
that in this example, a Camera is rendered to a RenderTexture, which is then used for
observations and debugging. To update the RenderTexture, the Camera must be asked to
render every time a decision is requested within the game code. When using Cameras

docs/Migrating.md (2 changes)


## Migrating from ML-Agents toolkit v0.7 to v0.8
### Important Changes
- * We have split the Python packges into two seperate packages `ml-agents` and `ml-agents-envs`.
+ * We have split the Python packages into two separate packages `ml-agents` and `ml-agents-envs`.
* `--worker-id` option of `learn.py` has been removed, use `--base-port` instead if you'd like to run multiple instances of `learn.py`.
#### Steps to Migrate

docs/Reward-Signals.md (6 changes)


to reaching some goal. These are what we refer to as "extrinsic" rewards, as they are defined
external of the learning algorithm.
- Rewards, however, can be defined outside of the enviroment as well, to encourage the agent to
+ Rewards, however, can be defined outside of the environment as well, to encourage the agent to
behave in certain ways, or to aid the learning of the true extrinsic reward. We refer to these
rewards as "intrinsic" reward signals. The total reward that the agent will learn to maximize can
be a mix of extrinsic and intrinsic reward signals.
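
As a rough illustration of the mixing described above, here is a minimal Python sketch; the `strengths` weights and the helper name are hypothetical, for illustration only, and are not the toolkit's actual API.

```python
# Minimal sketch (hypothetical names, not the toolkit's actual API): the reward
# the agent learns to maximize is a weighted sum of its reward signals.
def total_reward(extrinsic, intrinsic_signals, strengths):
    """extrinsic: float reward coming from the environment.
    intrinsic_signals: dict of signal name -> float reward (e.g. curiosity).
    strengths: dict of signal name -> weight, including "extrinsic"."""
    reward = strengths.get("extrinsic", 1.0) * extrinsic
    for name, value in intrinsic_signals.items():
        reward += strengths.get(name, 1.0) * value
    return reward

# Example: an environment reward plus a small curiosity bonus.
print(total_reward(1.0, {"curiosity": 0.3}, {"extrinsic": 1.0, "curiosity": 0.02}))
```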

The `curiosity` Reward Signal enables the Intrinsic Curiosity Module. This is an implementation
of the approach described in "Curiosity-driven Exploration by Self-supervised Prediction"
by Pathak, et al. It trains two networks:
- * an inverse model, which takes the current and next obersvation of the agent, encodes them, and
+ * an inverse model, which takes the current and next observation of the agent, encodes them, and
- * a forward model, which takes the encoded current obseravation and action, and predicts the
+ * a forward model, which takes the encoded current observation and action, and predicts the
next encoded observation.
The loss of the forward model (the difference between the predicted and actual encoded observations) is used as the intrinsic reward, so the more surprised the model is, the larger the reward will be.
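
A minimal sketch of that last point, assuming a generic encoder and forward model: the intrinsic reward is the forward model's prediction error in feature space. The toy random-projection encoder and linear forward model below are placeholders so the sketch runs; they are not the actual ICM implementation.

```python
import numpy as np

# Sketch of the curiosity idea (not the actual ICM code): the intrinsic reward
# is the forward model's prediction error on the encoded next observation.
def curiosity_reward(encode, forward_model, obs, action, next_obs, scale=0.1):
    phi = encode(obs)                      # encoded current observation
    phi_next = encode(next_obs)            # encoded next observation
    phi_pred = forward_model(phi, action)  # forward model's prediction
    # Larger prediction error ("surprise") -> larger intrinsic reward.
    return scale * 0.5 * np.sum((phi_pred - phi_next) ** 2)

# Toy stand-ins: a fixed random projection as the encoder and an untrained
# linear forward model, purely so the example is self-contained.
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(8, 4))
W_fwd = rng.normal(size=(4 + 2, 4))
encode = lambda obs: obs @ W_enc
forward_model = lambda phi, act: np.concatenate([phi, act]) @ W_fwd

obs, next_obs = rng.normal(size=8), rng.normal(size=8)
action = np.array([1.0, 0.0])
print(curiosity_reward(encode, forward_model, obs, action, next_obs))
```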

docs/Training-Generalized-Reinforcement-Learning-Agents.md (8 changes)


One of the challenges of training and testing agents on the same
environment is that the agents tend to overfit. The result is that the
- agents are unable to generalize to any tweaks or variations in the enviornment.
- This is analgous to a model being trained and tested on an identical dataset
+ agents are unable to generalize to any tweaks or variations in the environment.
+ This is analogous to a model being trained and tested on an identical dataset
- should be trained over multiple variations of the enviornment. Using this approach
+ should be trained over multiple variations of the environment. Using this approach
- to future unseen variations of the enviornment
+ to future unseen variations of the environment
_Example of variations of the 3D Ball environment._
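
A minimal sketch of the idea of training over environment variations, assuming a generic `env` with a `reset(config=...)` call; the parameter names, ranges, and environment interface are hypothetical placeholders, not the toolkit's actual reset-parameter mechanism.

```python
import random

# Hypothetical parameter ranges for illustration only.
PARAM_RANGES = {
    "gravity": (4.0, 105.0),
    "ball_scale": (0.2, 5.0),
}

def sample_variation(ranges):
    """Draw one value uniformly from each parameter's range."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

def train(env, policy, episodes=1000):
    for _ in range(episodes):
        # Resetting to a fresh variation each episode discourages the agent
        # from overfitting to a single fixed environment configuration.
        obs = env.reset(config=sample_variation(PARAM_RANGES))
        done = False
        while not done:
            obs, reward, done = env.step(policy(obs))
```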

docs/Training-Imitation-Learning.md (2 changes)


The ML-Agents toolkit provides several ways to learn from demonstrations.
- * To train using GAIL (Generative Adversarial Imitaiton Learning) you can add the
+ * To train using GAIL (Generative Adversarial Imitation Learning) you can add the
[GAIL reward signal](Reward-Signals.md#gail-reward-signal). GAIL can be
used with or without environment rewards, and works well when there are a limited
number of demonstrations.
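
As a rough sketch of how GAIL turns a learned discriminator into a reward signal, the form below (`-log(1 - D(s, a))`) is one common choice in GAIL implementations; the placeholder `discriminator` function is illustrative and not the toolkit's actual code.

```python
import math

# Sketch (not the toolkit's implementation): GAIL trains a discriminator D(s, a)
# to output values near 1 for demonstration-like behavior and near 0 for the
# agent's own behavior, then rewards the agent for resembling the demonstrations.
def gail_reward(discriminator, state, action, eps=1e-7):
    d = discriminator(state, action)      # estimated P(demonstration | s, a)
    d = min(max(d, eps), 1.0 - eps)       # clamp for numerical stability
    return -math.log(1.0 - d)             # larger when the agent "fools" D

# Toy placeholder discriminator so the sketch runs.
print(gail_reward(lambda s, a: 0.8, state=None, action=None))
```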

docs/localized/KR/docs/Training-PPO.md (2 changes)


### Beta
- `beta` determines the degree of entropy regularization (Entropy Regulazation), which makes the policy more random. This allows the agent to properly explore the action space during training. Increasing this value makes the agent take more random actions. Entropy (measurable via TensorBoard) should slowly decrease in magnitude as reward increases. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
+ `beta` determines the degree of entropy regularization (Entropy Regularization), which makes the policy more random. This allows the agent to properly explore the action space during training. Increasing this value makes the agent take more random actions. Entropy (measurable via TensorBoard) should slowly decrease in magnitude as reward increases. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
Typical range: 1e-4 - 1e-2
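
To make the role of `beta` concrete, here is a minimal sketch of an entropy-regularized policy loss of the kind used by PPO-style trainers; the function and argument names are placeholders, not the trainer's actual code.

```python
import numpy as np

# Sketch (placeholder names, not the trainer's actual code): beta scales an
# entropy bonus added to the policy objective, so a larger beta pushes the
# policy toward more random (higher-entropy) action distributions.
def policy_loss(surrogate_objective, action_probs, beta=5e-3):
    probs = np.clip(action_probs, 1e-10, 1.0)
    entropy = -np.sum(probs * np.log(probs))  # entropy of the action distribution
    # Maximizing (objective + beta * entropy) == minimizing its negation.
    return -(surrogate_objective + beta * entropy)

# A more random policy (higher entropy) yields a lower loss for the same objective.
print(policy_loss(1.0, np.array([0.25, 0.25, 0.25, 0.25])))  # high entropy
print(policy_loss(1.0, np.array([0.97, 0.01, 0.01, 0.01])))  # low entropy
```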
