Kubeflow pipelines
The Kubeflow pipelines are located in the kubeflow/ folder, where you can find one pipeline for training the model and one for evaluation.
Train and Evaluate Model using Kubeflow
Create New Pipeline
You will need a Python environment with kfp==0.5.1 installed.
Compile Kubeflow pipeline
cd kubeflow
python pose_estimation_train_pipeline.py
This will create a file pose_estimation_train_pipeline.py.tar.gz, which can be uploaded to Kubeflow Pipelines for execution.
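For context, the compile step works because the pipeline script defines a pipeline with the KFP DSL and invokes the compiler when executed. The following is only a minimal sketch of that structure under the kfp==0.5.1 API; the step names, command, and arguments shown here are illustrative assumptions, not the contents of the actual pose_estimation_train_pipeline.py.
# Minimal sketch of a KFP pipeline script (kfp==0.5.1 API). The training
# command, arguments, and defaults below are placeholders for illustration.
from kfp import compiler, dsl


@dsl.pipeline(
    name="pose_estimation_train",
    description="Train the single object pose estimation model.",
)
def train_pipeline(
    docker_image="gcr.io/unity-ai-thea-test/datasetinsights:<git-commit-sha>",
    logdir="gs://your/log/directory",
):
    # A single containerized training step; the real pipeline may chain
    # several steps (data download, training, evaluation).
    dsl.ContainerOp(
        name="train",
        image=docker_image,
        command=["python", "-m", "datasetinsights"],  # illustrative entrypoint
        arguments=["--logdir", logdir],
    )


if __name__ == "__main__":
    # Running `python pose_estimation_train_pipeline.py` produces the
    # .tar.gz package that the Kubeflow dashboard accepts.
    compiler.Compiler().compile(
        train_pipeline, "pose_estimation_train_pipeline.py.tar.gz"
    )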
Next, go to the Kubeflow dashboard, upload the compiled file, and create a new pipeline from it. You should then be able to create a new parameterized experiment to run the Kubeflow pipeline by following this tutorial.
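If you prefer to script the same upload-and-run flow instead of using the dashboard, the KFP SDK client can do it programmatically. The sketch below assumes a reachable Kubeflow Pipelines endpoint; the host, experiment name, and run name are placeholders, and only the docker_image parameter comes from this document.
# Sketch: upload the compiled package and start a parameterized run with the
# KFP SDK (kfp==0.5.1). Host, experiment name, and run name are placeholders.
import kfp

client = kfp.Client(host="http://<your-kubeflow-host>/pipeline")  # placeholder host

# Upload the package produced by the compile step above.
pipeline = client.upload_pipeline(
    pipeline_package_path="pose_estimation_train_pipeline.py.tar.gz",
    pipeline_name="pose_estimation_train",
)

# Create (or reuse) an experiment and start a run with pipeline parameters.
experiment = client.create_experiment(name="pose-estimation")
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name="pose-estimation-train-run",
    pipeline_id=pipeline.id,
    params={"docker_image": "gcr.io/unity-ai-thea-test/datasetinsights:<git-commit-sha>"},
)
print(run.id)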
Train a model on Kubeflow for Single Object Pose Estimation
Go to this pipeline and follow the tutorial to create a Kubeflow project. Once you have set up your project, go to the dashboard and press the Create Run button at the top of the screen (it has a solid blue fill and white letters).
- Set docker_image to gcr.io/unity-ai-thea-test/datasetinsights:<git-commit-sha>, where <git-commit-sha> is the SHA of the latest commit on master in the thea repo. It should be the latest commit in the history: link.
- To check the progress of your model, run
docker run -p 6006:6006 -v $HOME/.config:/root/.config:ro tensorflow/tensorflow tensorboard --host=0.0.0.0 --logdir gs://your/log/directory
and open http://localhost:6006 to view the TensorBoard dashboard. This command assumes you have run gcloud auth login and that the local credential is stored in $HOME/.config, which is mounted into the home directory inside Docker; it must have read permission to gs://your/log/directory.
- If the mAP and mAR for validation are leveling off, you can terminate the run early; it's unlikely the model's performance will improve.
- The model will save checkpoints after every epoch to the logdir in the format gs://logdir/ep#.estimator, e.g. gs://thea-dev/runs/20200328_221415/FasterRCNN.ep24.estimator.
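If you want to locate the most recent checkpoint in a log directory programmatically, you can list the objects under the run's GCS prefix and parse the epoch number out of the filename. The sketch below uses the google-cloud-storage client, which is an assumption of this example (it is not required by the steps above) and relies on the same gcloud credentials as the TensorBoard command.
# Sketch: find the latest epoch checkpoint under a GCS logdir, assuming the
# <model>.ep<N>.estimator naming convention described above. Requires the
# google-cloud-storage package and authenticated gcloud credentials.
import re

from google.cloud import storage


def latest_checkpoint(bucket_name: str, prefix: str) -> str:
    """Return the gs:// URI of the checkpoint with the highest epoch number."""
    client = storage.Client()
    pattern = re.compile(r"\.ep(\d+)\.estimator$")
    best_epoch, best_blob = -1, None
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        match = pattern.search(blob.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_blob = int(match.group(1)), blob
    if best_blob is None:
        raise FileNotFoundError(f"No checkpoints under gs://{bucket_name}/{prefix}")
    return f"gs://{bucket_name}/{best_blob.name}"


# Example with a placeholder run directory matching the format above:
# latest_checkpoint("thea-dev", "runs/20200328_221415")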