您最多选择25个主题 主题必须以中文或者字母或数字开头,可以包含连字符 (-),并且长度不得超过35个字符

19 KiB

Human Pose Estimation Tutorial

In this tutorial, we will walk through the process of importing rigged humanoid models and animations of .fbx format into your computer vision data generation project, and using them to produce key-point and pose-estimation ground-truth data. We will use the tools and samples provided in the Perception package.

We strongly recommend you finish Phase 1 of the Perception Tutorial before continuing with this one, especially if you do not have prior experience with Unity Editor.

Through-out the tutorial, lines starting with bullet points followed by "🟢 Action:" denote the individual actions you will need to perform in order to progress through the tutorial. This is while the rest of the text will provide additional context and explanation around the actions. If in a hurry, you can just follow the actions!

Steps included in this tutorial:

Step 1: Import .fbx Models and Animations

This tutorial assumes that you have already created a Unity project, installed the Perception package, and set up a Scene with a Perception Camera inside. If this is not the case, please follow steps 1 to 3 of Phase 1 of the Perception Tutorial.

  • 🟢 Action: Open the project you created in the Perception Tutorial steps mentioned above. Duplicate TutorialScene and name the new Scene HPE_Scene. Open HPE_Scene.

We will use this duplicated Scene in this tutorial so that we do not lose our grocery object detection setup from the Perception Tutorial.

  • 🟢 Action: If your Scene already contains a Scenario object from the Perception Tutorial, remove all previously added Randomizers from this Scenario.
  • 🟢 Action: If your Scene does not already contain a Scenario, create an empty GameObject, name it Simulation Scenario, and add a Fixed Length Scenario component to it.

Your Scenario should now look like this:

  • 🟢 Action: Select Main Camera and in the Inspector view of the Perception Camera component, disable all previously added labelers using the check-mark in front of each. We will be using a new labeler in this tutorial.

We now need to import the sample files required for this tutorial.

  • 🟢 Action: Open Package Manager and select the Perception package, which should already be present in the navigation pane to the left side.
  • 🟢 Action: From the list of Samples for the Perception package, click on the Import into Project button for the sample bundle named Human Pose Estimation.

Once the sample files are imported, they will be placed inside the Assets/Samples/Perception folder in your Unity project, as seen in the image below:

  • 🟢 Action: Select all of the asset inside the Assets/Samples/Perception/<perception-package-version>/Human Pose Estimation/Models and Animations.
  • 🟢 Action: In the Inspector tab, navigate to the Rig section.

Note how Animation Type is set to Humanoid for all selected assets. This is a requirement and makes sure all animations included in the sample .fbx files are ready to be used on a rigged humanoid model.

Step 2: Set Up a Humanoid Character in a Scene

  • 🟢 Action: Drag and drop the file named Player.fbx into your Scene Hierarchy.
  • 🟢 Action: Select the new Player object in the Scene and in the Inspector tab set its transform's position and rotation according to the image below to make the character face the camera.

The Player object already has an Animator component attached. This is because the Animation Type property of all the sample .fbx files is set to Humanoid. We will now need to attach an Animation Controller to the Animator component, in order for our character to animate.

  • 🟢 Action: Create a new Animation Controller asset in your Assets folder and name it TestAnimationController.
  • 🟢 Action: Double click the new controller to open it. Then right click in the empty area and select Create State -> Empty.

This will create a new state and attach it to the Entry state with a new transition edge. This means the controller will always move to this new state as soon as the Animator component is awoken. In this example, this will happen when the button is pressed and the simulation starts.

  • 🟢 Action: Click on the state named New State. Then, in the Inspector tab click the small circle next to Motion to select an animation clip.

In the selector window that pops up, you will see several clips named Take 001. These are animation clips that are bundled inside of the sample .fbx files you imported into the project.

  • 🟢 Action: Select the animation clip originating from the TakeObjects.fbx file, as seen below:

  • 🟢 Action: Assign TestAnimationController to the Controller property of the Player object's Animator component.

If you run the simulation now you will see the character performing an animation for picking up a hypothetical object as seen in the GIF below.

Step 3: Set Up the Perception Camera for Key Point Annotation

Now that we have our character performing animations, let's modify our Perception Camera to report the character's key points in the output dataset, updating frame by frame as they animate.

  • 🟢 Action: Add a KeyPointLabeler to the list of labelers in Perception Camera. Also, make sure Show Labeler Visualizations is turned on so that you can verify the labeler working.

Similar to the labelers we used in the Perception Tutorial, we will need a label configuration for this new labeler.

  • 🟢 Action: In the Project tab, right-click the Assets folder, then click Create -> Perception -> Id Label Config. Name the new asset HPE_IdLabelConfig.
  • 🟢 Action: Add a MyCharacter label to the newly created config.

ℹ️ You can use any label string, as long as you assign the same label to the Player object in the next step.

  • 🟢 Action: Add a Labeling component to the Player object in the Scene.
  • 🟢 Action: In the Inspector UI for this new Labeling component, expand HPE_IdLabelConfig and click Add to Labels on MyCharacter.

  • 🟢 Action: Return to Perception Camera and assign HPE_IdLabelConfig to the KeyPointLabeler's label configuration property.

The labeler should now look like the image below:

Note the CocoKeypointTemplate asset that is already assigned as the Active Template. This template will tell the labeler how to map default Unity rig joints to human joint labels in the popular COCO dataset, so that the output of the labeler can be easily converted to COCO format. Later in this tutorial, we will learn how to add more joints to our character and how to customize joint mapping templates.

You can now check out the output dataset to see what the annotations look like. To do this, click the Show Folder button in the Perception Camera UI, then navigate inside to the dataset folder to find the captures_000.json file. Here is an example annotation for the first frame of our test-case here:

"pose": "unset",
  "keypoints": [
    {
      "index": 0,
      "x": 0.0,
      "y": 0.0,
      "state": 0
    },
    {
      "index": 1,
      "x": 649.05615234375,
      "y": 300.65264892578125,
      "state": 2
    },
    {
      "index": 2,
      "x": 594.4522705078125,
      "y": 335.8978271484375,
      "state": 2
    },
    {
      "index": 3,
      "x": 492.46444702148438,
      "y": 335.72491455078125,
      "state": 2
    },
    {
      "index": 4,
      "x": 404.89456176757813,
      "y": 335.57647705078125,
      "state": 2
    },
    {
      "index": 5,
      "x": 705.89404296875,
      "y": 335.897705078125,
      "state": 2
    },
    {
      "index": 6,
      "x": 807.74688720703125,
      "y": 335.7244873046875,
      "state": 2
    },
    {
      "index": 7,
      "x": 895.1993408203125,
      "y": 335.57574462890625,
      "state": 2
    },
    {
      "index": 8,
      "x": 612.51654052734375,
      "y": 509.065185546875,
      "state": 2
    },
    {
      "index": 9,
      "x": 608.50006103515625,
      "y": 647.0631103515625,
      "state": 2
    },
    {
      "index": 10,
      "x": 611.7791748046875,
      "y": 797.7828369140625,
      "state": 2
    },
    {
      "index": 11,
      "x": 682.175048828125,
      "y": 509.06524658203125,
      "state": 2
    },
    {
      "index": 12,
      "x": 683.1016845703125,
      "y": 649.64434814453125,
      "state": 2
    },
    {
      "index": 13,
      "x": 686.3271484375,
      "y": 804.203857421875,
      "state": 2
    },
    {
      "index": 14,
      "x": 628.012939453125,
      "y": 237.50531005859375,
      "state": 2
    },
    {
      "index": 15,
      "x": 660.023193359375,
      "y": 237.50543212890625,
      "state": 2
    },
    {
      "index": 16,
      "x": 0.0,
      "y": 0.0,
      "state": 0
    },
    {
      "index": 17,
      "x": 0.0,
      "y": 0.0,
      "state": 0
    }
  ]
}

In the above annotation, all of the 18 joints defined in the COCO template we used are listed. For each joint that is present in our character, you can see the X and Y coordinates within the captured frame. However, you may notice three of the joints are listed with (0,0) coordinates. These joints are not present in our character. A fact that is also denoted by the state field. A state of 0 means the joint was not present, 1 denotes a joint that is present but not visible (to be implemented in a later version of the package), and 2 means the joint was present and visible.

You may also note that the pose field has a value of unset. This is because we have not defined poses for our animation clip and Perception Camera yet. We will do this next.

Step 4: Configure Human Pose Estimation

  • 🟢 Action: In the Project tab, right-click the Assets folder, then click Create -> Perception -> Animation Pose Config. Name the new asset MyAnimationPoseConfig.

This type of asset lets us specify custom time ranges of an animation clip as poses. The time ranges are between 0 and 1 as they denote percentages of time elapsed in the animation clip.

  • 🟢 Action: Select the MyAnimationPoseConfig asset. In the Inspector view, choose the same animation clip as before for the Animation Clip property. This would be the clip originating from TakeObjects.fbx.

You can now use the Timestamps list to define poses. Let's define four poses here:

  1. Reaching for the object. (starts at the 0% timestamp)
  2. Taking the object and standing up. (starts at the 28% timestamp)
  3. Putting the object in the pocket. (starts at the 65% timestamp)
  4. Standing. (starts at the 90% timestamp)

ℹ️ To find the time indexes in an animation clip that correspond with different poses, you can directly open the clip inside the Inspector. Click on the TakeObjects.fbx file in the Project tab. Then, in the Inspector view, you will see a small preview of the model along with a timeline above it. Move the timeline's marker to advance through the animation.

Modify MyAnimationPoseConfig according to the image below:

The pose configuration we created needs to be assigned to our KeyPointLabeler. So:

  • 🟢 Action: In the Inspector UI for Perception Camera, set the Size of Animation Pose Configs for the KeyPointLabeler to 1. Then, assign the MyAnimationPoseConfig to the sole slot in the list, as shown below:

If you run the simulation again to generate a new dataset, you will see the new poses we defined written in it. All frames that belong to a certain pose will have the pose label attached.

Step 5: Add Joints to the Character and Customize Key Points Templates

The CocoKeypointTemplate asset that we are using on our KeyPointLabeler maps all of the joints included in the rigged character to their corresponding COCO labels. However, the industry standard character rigs used in Unity do not include some of the joints that are included in the COCO format. As we saw earlier, these joints appear with a state of 0 and coordinates of (0,0) in our current dataset. These joints are:

  • Nose
  • Left Ear
  • Right Ear

We will now add these joints to our character using labels that are defined in the CocoKeypointTemplate asset. Let's first have a look at this asset.

  • 🟢 Action: In the UI for the KeyPointLabeler on Perception Camera, click on CocoKeypointTemplate to reveal the asset in the Project tab, then click the asset to open it.

In the Inspector view of CocoKeypointTemplate, you will see the list of 18 key points of the COCO standard. If you expand each key point, you can see a number of options. The Label property defines a string that can be used for mapping custom joints on the character to this template (we will do this shortly). The Associate To Rig flag denotes whether this key point can be directly mapped to a standard Unity key point in the rigged character. If this flag is enabled, the key point will then be mapped to the Rig Label chosen below it. The Rig Label dropdown displays a list of all standard joints available in rigged characters in Unity. In our case here, the list does not include the nose joint, that is why the nose key point has Associate To Rig disabled. If you look at an example that does exist in the list of standard joints (e.g. neck), the Associate to Rig flag is enabled, and the proper corresponding joint is selected as Rig Label. Note that when Associate To Rig is disabled, the Rig Label property is ignored. The image below depicts the nose and neck examples:

If you review the list you will see the other two joints besides nose that are not associated to the rig are left_ear and right_ear.

  • 🟢 Action: Expand the Player object's hierarchy in the scene to find the Head object.

We will create our three new joints under the Head object.

  • 🟢 Action: Create three new empty GameObjects under Head and place them in the proper positions for the character's nose and ears, as seen in the GIF below (make sure the positions are correct in 3D space):

The final step in this process would be to label these new joints such that they match the labels of their corresponding key points in CocoKeyPointTemplate. For this purpose, we use the Joint Label component.

  • 🟢 Action: Add a Joint Label component to each of the newly created joints. Then, for each joint, set Size to 1, Template to CocoKeypointTemplate, and Label to the proper string (one of nose, left_ear or right_ear). These are also shown in the GIF above.

If you run the simulation now, you can see the new joints being visualized:

You could now look at the latest generated dataset to confirm the new joints are being detected and written.

Step 6: Randomize the Humanoid Character's Animations

The final step of this tutorial is to randomize the animations of the character, so that we can generate large amounts of data with varied animations and timestamps for computer vision training.

  • 🟢 Action: Add the Animation Randomizer to the list of Randomizers in the Simulation Scenario object.
  • 🟢 Action: Set the Scenario's number of Frames Per Iteration to 150 and the number of Total Iterations to 20.
  • 🟢 Action: Add an Animation Randomizer Tag component to the Player object to let the above Randomizer know this object's animations shall be randomized.

The Animation Randomizer Tag accepts a list of animation clips. At runtime, the Animation Randomizer will pick one of the provided clips randomly as well as a random time within the selected clip, and apply them to the character's Animator. Since we set the number of Frames Per Iteration to 100, each clip will play for 100 frames before the next clip replaces it.

  • 🟢 Action: Add four options to the Animation Randomizer Tag list. Then populate these options with the animation clips originating from the files Run.fbx, Walk.fbx, PutGlassesOn.fbx, and Idle.fbx (these are just examples; you can try any number or choice of rig animation clips).

If you run the simulation now, your character will randomly perform one of the above four animations, each for 150 frames. This cycle will recur 20 times, which is the total number of Iterations in you Scenario.

ℹ️ The reason the character stops animating at certain points in the above GIF is that the animation clips are not set to loop. Therefore, if the randomly selected timestamp is sufficiently close to the end of the clip, the character will complete the animation and stop animating for the rest of the Iteration.

This concludes the Human Pose Estimation Tutorial. Thank you for following these instructions with us. In case of any issues or questions, please feel free to open a GitHub issue on the com.unity.perception repository so that the Unity Computer Vision team can get back to you as soon as possible.