|
|
|
|
|
|
[Learning a policy](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/) |
|
|
|
usually requires many trials and iterative policy updates. More specifically, |
|
|
|
the robot is placed in several fire situations and over time learns an optimal |
|
|
|
policy which allows it to put out fires more effectively. Obviously, we cannot |
|
|
|
expect to train a robot repeatedly in the real world, particularly when fires |
|
|
|
are involved. This is precisely why the use of |
|
|
|
[Unity as a simulator](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/) |
|
|
|