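The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents.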

Basic Guide

This guide will show you how to use a pre-trained model in an example Unity environment (3D Ball) and show you how to train the model yourself.

If you are not familiar with the Unity Engine, we highly recommend the Roll-a-ball tutorial to learn all the basic concepts of Unity.

Setting up the ML-Agents Toolkit within Unity

In order to use the ML-Agents toolkit within Unity, you first need to change a few Unity settings.

  1. Launch Unity
  2. On the Projects dialog, choose the Open option at the top of the window.
  3. Using the file dialog that opens, locate the UnitySDK folder within the ML-Agents toolkit project and click Open.
  4. Go to Edit > Project Settings > Player
  5. For each of the platforms you target (PC, Mac and Linux Standalone, iOS or Android):
    1. Expand the Other Settings section.
    2. Set Scripting Runtime Version to Experimental (.NET 4.6 Equivalent or .NET 4.x Equivalent).
  6. Go to File > Save Project

Running a Pre-trained Model

We include pre-trained models for our agents (.nn files) and we use the Unity Inference Engine to run these models inside Unity. In this section, we will use the pre-trained model for the 3D Ball example.

  1. In the Project window, go to the Assets/ML-Agents/Examples/3DBall/Scenes folder and open the 3DBall scene file.

  2. In the Project window, go to the Assets/ML-Agents/Examples/3DBall/Prefabs folder. Expand 3DBall and click on the Agent prefab. You should see the Agent prefab in the Inspector window.

    Note: The platforms in the 3DBall scene were created using the 3DBall prefab. Rather than updating all 12 platforms individually, you can update the 3DBall prefab.

    (Image: Platform Prefab)

  3. In the Project window, drag the 3DBallLearning Model located in Assets/ML-Agents/Examples/3DBall/TFModels into the Model property under the Ball 3D Agent (Script) component in the Inspector window.

    (Image: 3DBall learning brain)

  4. You should notice that each Agent under each 3DBall in the Hierarchy window now contains 3DBallLearning as its Model. Note: You can modify multiple game objects in a scene by selecting them all at once using the search bar in the Scene Hierarchy.

  5. Select the InferenceDevice to use for this model (CPU or GPU) on the Agent. Note: CPU is faster for the majority of models generated by the ML-Agents toolkit.

  6. Click the Play button and you will see the platforms balance the balls using the pre-trained model.

    (Image: Running a pre-trained model)

Using the Basics Jupyter Notebook

The notebooks/getting-started.ipynb Jupyter notebook contains a simple walk-through of the functionality of the Python API. It can also serve as a simple test that your environment is configured correctly. Within the notebook, be sure to set env_name to the name of the Unity executable if you want to use an executable, or to None if you want to interact with the current scene in the Unity Editor.

More information and documentation are provided on the Python API page.

Training the Model with Reinforcement Learning

Setting up the environment for training

To set up the Agents for training, you will need to edit the Behavior Name under Behavior Parameters in the Agent Inspector window. The Behavior Name is used to group agents by behavior. Note that Agents sharing the same Behavior Name must be agents of the same type using the same Behavior Parameters. You can make sure all your agents have the same Behavior Parameters by using Prefabs. The Behavior Name corresponds to the name of the model that will be generated by the training process, and it is used to select the hyperparameters from the training configuration file.
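
As a rough illustration of how a Behavior Name might select hyperparameters, the sketch below looks up a name in a dictionary that stands in for a parsed training configuration file. The "default" section, its numbers, the fallback rule, and the helper name hyperparameters_for are all assumptions made for this example, not the toolkit's actual code. The values for 3DBallLearning are the ones reported in the training log later in this guide.

```python
# Stand-in for a parsed trainer configuration file.
# The "default" section and its values are illustrative assumptions.
TRAINER_CONFIG = {
    "default": {"batch_size": 1024, "buffer_size": 10240, "normalize": False},
    # Values below match the hyperparameters printed in the training log.
    "3DBallLearning": {"batch_size": 64, "buffer_size": 12000, "normalize": True},
}


def hyperparameters_for(behavior_name, config=TRAINER_CONFIG):
    """Return the settings for a Behavior Name.

    Assumed rule: start from "default" and override with the section
    whose key equals the Behavior Name, if such a section exists.
    """
    merged = dict(config.get("default", {}))
    merged.update(config.get(behavior_name, {}))
    return merged


if __name__ == "__main__":
    print(hyperparameters_for("3DBallLearning"))
```

Under this assumed rule, a Behavior Name without its own section would simply receive the default values, which is why keeping the name in the Inspector and the name in the configuration file identical matters.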

Training the environment

  1. Open a command or terminal window.

  2. Navigate to the folder where you cloned the ML-Agents toolkit repository. Note: If you followed the default installation, then you should be able to run mlagents-learn from any directory.

  3. Run mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train where:

    • <trainer-config-path> is the relative or absolute filepath of the trainer configuration. The defaults used by example environments included in MLAgentsSDK can be found in config/trainer_config.yaml.
    • <run-identifier> is a string used to separate the results of different training runs.
    • --train tells mlagents-learn to run a training session (rather than inference).
  4. If you cloned the ML-Agents repo, then you can simply run

    mlagents-learn config/trainer_config.yaml --run-id=first-run --train
    
  5. When the message "Start training by pressing the Play button in the Unity Editor" is displayed on the screen, you can press the ▶️ button in Unity to start training in the Editor.

    Note: Alternatively, you can use an executable rather than the Editor to perform training. Please refer to this page for instructions on how to build and use an executable.

ml-agents$ mlagents-learn config/trainer_config.yaml --run-id=first-run --train


                        ▄▄▄▓▓▓▓
                   ╓▓▓▓▓▓▓█▓▓▓▓▓
              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
          ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
            '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
                   `▀█▓▓▓▓▓▓▓▓▓▌
                        ¬`▀▀▀█▓


INFO:mlagents.learn:{'--curriculum': 'None',
 '--docker-target-name': 'Empty',
 '--env': 'None',
 '--help': False,
 '--keep-checkpoints': '5',
 '--lesson': '0',
 '--load': False,
 '--no-graphics': False,
 '--num-runs': '1',
 '--run-id': 'first-run',
 '--save-freq': '50000',
 '--seed': '-1',
 '--slow': False,
 '--train': True,
 '--worker-id': '0',
 '<trainer-config-path>': 'config/trainer_config.yaml'}
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.

Note: If you're using Anaconda, don't forget to activate the ml-agents environment first.
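
If you launch training from a script, the command from step 3 can be assembled as an argument list. The following is a minimal sketch: the helper name build_learn_command is hypothetical, and it uses only the positional configuration path and the --run-id and --train options described above.

```python
import shlex


def build_learn_command(trainer_config_path, run_id, train=True):
    """Assemble the mlagents-learn argument list described in the steps above."""
    command = ["mlagents-learn", trainer_config_path, f"--run-id={run_id}"]
    if train:
        # --train requests a training session rather than inference.
        command.append("--train")
    return command


if __name__ == "__main__":
    cmd = build_learn_command("config/trainer_config.yaml", "first-run")
    # Print the command as it would be typed in a terminal.
    print(shlex.join(cmd))
    # To actually start training (requires the toolkit to be installed):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the configuration path contains spaces.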

If mlagents-learn runs correctly and starts training, you should see something like this:

INFO:mlagents.envs:
'Ball3DAcademy' started successfully!
Unity Academy name: Ball3DAcademy
        Number of Brains: 1
        Number of Training Brains : 1
        Reset Parameters :

Unity brain name: 3DBallLearning
        Number of Visual Observations (per agent): 0
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): [2]
        Vector Action descriptions: ,
INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain 3DBallLearning:
        batch_size:          64
        beta:                0.001
        buffer_size:         12000
        epsilon:             0.2
        gamma:               0.995
        hidden_units:        128
        lambd:               0.99
        learning_rate:       0.0003
        max_steps:           5.0e4
        normalize:           True
        num_epoch:           3
        num_layers:          2
        time_horizon:        1000
        sequence_length:     64
        summary_freq:        1000
        use_recurrent:       False
        summary_path:        ./summaries/first-run-0
        memory_size:         256
        use_curiosity:       False
        curiosity_strength:  0.01
        curiosity_enc_size:  128
        model_path:	./models/first-run-0/3DBallLearning
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training.
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training.
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training.
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training.
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training.
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training.
INFO:mlagents.trainers: first-run-0: 3DBallLearning: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
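
To follow progress outside the terminal, the summary lines above can be parsed with a regular expression. This is a small sketch that assumes the line format is exactly as printed in this guide; other versions of the toolkit may log differently, and the helper name parse_progress is hypothetical.

```python
import re

# Matches the "Step / Mean Reward / Std of Reward" part of a summary line.
PROGRESS = re.compile(
    r"Step: (?P<step>\d+)\. "
    r"Mean Reward: (?P<mean>-?\d+\.\d+)\. "
    r"Std of Reward: (?P<std>-?\d+\.\d+)\."
)


def parse_progress(lines):
    """Return (step, mean_reward, std_reward) tuples for matching lines."""
    rows = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            rows.append(
                (int(match["step"]), float(match["mean"]), float(match["std"]))
            )
    return rows


if __name__ == "__main__":
    sample = [
        "INFO:mlagents.trainers: first-run-0: 3DBallLearning: "
        "Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.",
        "INFO:mlagents.trainers: first-run-0: 3DBallLearning: "
        "Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.",
    ]
    for step, mean, std in parse_progress(sample):
        print(step, mean, std)
```

Lines that do not match the pattern, such as the hyperparameter listing, are skipped.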

After training

You can press Ctrl+C to stop the training, and your trained model will be at models/<run-identifier>/<behavior_name>.nn, where <behavior_name> is the Behavior Name of the agents corresponding to the model. (Note: There is a known bug on Windows that causes saving the model to fail when you terminate training early. It is recommended to wait until Step has reached the max_steps parameter you set in trainer_config.yaml.) This file corresponds to your model's latest checkpoint. You can now embed this trained model into your Agents by following the steps below, which are similar to the steps described above.

  1. Move your model file into UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/.
  2. Open the Unity Editor, and select the 3DBall scene as described above.
  3. Select the 3DBall prefab Agent object.
  4. Drag the <behavior_name>.nn file from the Project window of the Editor to the Model placeholder in the Ball3DAgent inspector window.
  5. Press the ▶️ button at the top of the Editor.
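
Step 1 above can be scripted. The sketch below is one possible approach: the helper name install_model is hypothetical, and it copies the file rather than moving it so that the original checkpoint is kept. Because the log above reports a model path under ./models/first-run-0, the run directory name is passed in explicitly rather than derived from the run identifier.

```python
import shutil
from pathlib import Path


def install_model(models_dir, run_dir_name, behavior_name, unity_models_dir):
    """Copy <models_dir>/<run_dir_name>/<behavior_name>.nn into the Unity project.

    Returns the path of the copied file. Raises FileNotFoundError if the
    trained model is not where it is expected to be.
    """
    source = Path(models_dir) / run_dir_name / f"{behavior_name}.nn"
    if not source.is_file():
        raise FileNotFoundError(f"No trained model at {source}")
    destination_dir = Path(unity_models_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata and returns the new file's path.
    return Path(shutil.copy2(source, destination_dir))


# Example call (paths follow this guide; adjust them to your checkout):
# install_model(
#     "models",
#     "first-run-0",
#     "3DBallLearning",
#     "UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels",
# )
```

After the file is copied, the Unity Editor should pick it up in the Project window the next time it refreshes its assets, and steps 2 to 5 proceed as written.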

Next Steps