您最多选择25个主题 主题必须以中文或者字母或数字开头,可以包含连字符 (-),并且长度不得超过35个字符
 
 
 
 
 

2.3 KiB

Kubeflow pipelines

The Kubeflow pipelines are located inside the kubeflow/ folder, where you can find one pipeline for training the model, and one for the evaluation.

Train and Evaluate Model using Kubeflow

Create New Pipeline

You will need a Python environment with kfp==0.5.1 installed.

Compile Kubeflow pipeline

cd kubeflow
python pose_estimation_train_pipeline.py

This will create a file pose_estimation_train_pipeline.py.tar.gz which can be uploaded to Kubeflow pipeline for executions.

Next, go to the Kubeflow dashboard, and upload and create new pipeline using the above pipeline. You should be able to create a new parameterized experiment to run a Kubeflow pipeline following this tutorial.

Train a model on Kubeflow for Single Object Pose Estimation

Go to this pipeline and follow the tutorial to create a Kubeflow project. Once you have set up your project, go to the dashboard and press the Create Run button at the top of the screen (it has a solid blue fill and white letters).

  1. Set docker_image to be gcr.io/unity-ai-thea-test/datasetinsights:<git-commit-sha> Where <git-commit-sha> is the sha from the latest version of master in the thea repo. It should be the latest commit in the history: link.

  2. To check the progress of your model run docker run -p 6006:6006 -v $HOME/.config:/root/.config:ro tensorflow/tensorflow tensorboard --host=0.0.0.0 --logdir gs://your/log/directory. Open http://localhost:6006 to view the dashboard.

    This command assumes you have run gcloud auth login command and the local credential is stored in $HOME/.config, which is mounted to the home directory inside Docker. It must have read permission to gs://your/log/directory

  3. If the mAP and mAR for validation are leveling off then you can terminate the run early; it's unlikely the model's performance will improve.

  4. The model will save checkpoints after every epoch to the logdir with the format gs://logdir/ep#.estimator, e.g. gs://thea-dev/runs/20200328_221415/FasterRCNN.ep24.estimator