Codebase Structure

In this project, I create a network to predict the pose of a cube.

Architecture

The pose estimation project is organized as follows.

PoseEstimationModel:

  • environment-gpu.yml: If the computer you are running the project on has GPU support, this file sets the project's dependencies and the packages to install. It is meant to be used when you create your conda environment.

  • environment.yml: If the computer you are running the project on does not have GPU support, this file sets the project's dependencies and the packages to install. It is meant to be used when you create your conda environment.

  • setup.py: This file turns the project into an installable package.

  • cli.py: This file contains the CLI commands that launch the different processes (either train or evaluate).

  • config.yaml: This file contains the default configuration for the estimator (pose estimation model) on the single cube dataset.

  • single_cube_dataset.py: This file contains the SingleCubeDataset class, which defines how the dataset is loaded into memory.

  • model.py: This file contains the neural network, along with a custom linear activation function, that performs the pose estimation task: predicting the object's translation (the x, y, z coordinates of the cube's center) and, if the object is asymmetric, its orientation (a quaternion describing how the cube is rotated); if the object is symmetric, only the translation is predicted. A sketch of such a network follows this list.

  • pose_estimation_estimator.py: This file contains the pose estimation estimator and the different methods you can apply to your model, such as train, evaluate, save, and load.

  • train.py: This file contains the model training process.

  • evaluate.py: This file contains the model evaluation process.

  • evaluation_metrics: This module contains metrics used by the pose estimation estimator.

  • logger.py: This module contains the logger class, which is designed to save elements (metrics, losses) so they can be viewed in TensorBoard.

  • storage: This module contains functionality related to writing, downloading, and uploading to and from different sources.

  • tests: This module contains all the tests which you can run using the pytest command.

  • Dockerfile: This file is responsible for the creation of the Docker image.

  • kubeflow: This module contains Kubeflow pipelines (.py.tar.gz files). You can find more information on how to set up a Kubeflow pipeline in the README.
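
To make the model.py description above concrete, here is a minimal sketch of what such a network could look like, assuming a PyTorch backbone. The backbone choice, layer sizes, and head names are illustrative assumptions, and the project's custom linear activation is not reproduced; the real architecture lives in model.py.

```python
import torch.nn as nn
import torchvision.models as models

class PoseEstimationNetwork(nn.Module):
    """Sketch: a CNN backbone with a translation head and, for asymmetric
    objects, an orientation (quaternion) head."""

    def __init__(self, is_symmetric: bool):
        super().__init__()
        self.is_symmetric = is_symmetric
        backbone = models.resnet34(weights=None)  # backbone choice is an assumption
        backbone.fc = nn.Identity()               # expose the 512-d feature vector
        self.backbone = backbone
        self.translation_head = nn.Linear(512, 3)  # (x, y, z) of the cube's center
        if not is_symmetric:
            self.orientation_head = nn.Linear(512, 4)  # quaternion

    def forward(self, x):
        features = self.backbone(x)
        translation = self.translation_head(features)
        if self.is_symmetric:
            return translation
        orientation = self.orientation_head(features)
        # Normalize so the prediction is a valid unit quaternion.
        orientation = orientation / orientation.norm(dim=-1, keepdim=True)
        return translation, orientation
```

In this sketch, the is_symmetric flag would come from the symmetric attribute under dataset in config.yaml, described below.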

Details of the config.yaml file

In the following, I will explain what each argument in config.yaml means. There are 8 sections in the config file:

  • estimator: This will be the core name of the saved model

  • train:

    • dataset_zip_file_name_training: name of the training dataset file.

    • batch_training_size: number of training samples to work through before the model’s internal parameters are updated.

    • accumulation_steps: number of backward passes performed before the parameters are updated. The goal is to keep the same model parameters across several batches and then update them based on all of those batches, instead of performing an update after every single batch (see the training-loop sketch after these config sections).

    • epochs: number of complete passes through the entire training dataset.

    • beta_loss: beta coefficient used when the translation and orientation losses are added together.

    • sample_size_train: size of a dataset training sample. It is used to test operations/commands on a few examples.

  • val:

    • dataset_zip_file_name_validation: name of the validation dataset file.

    • batch_validation_size: number of validation samples to work through before the metrics are calculated.

    • eval_freq: frequency, in epochs, at which the evaluation process is launched.

    • sample_size_val: size of a dataset validation sample. It is used to test operations/commands on a few examples.

  • test:

    • dataset_zip_file_name_test: name of the test dataset file.

    • batch_test_size: number of test samples to work through before the metrics are calculated.

    • sample_size_test: size of a dataset test sample. It is used to test operations/commands on a few examples.

  • dataset:

    • image_scale: size of the images used to train the model (although some preprocessing has to be applied before they are given to the model as input).

    • download_data_gcp: if True, the data is downloaded from GCP; otherwise the data you have locally is used.

    • gcs_bucket: name of GCS Bucket where the datasets are located.

    • pose_estimation_gcs_path: path inside the GCS bucket where the datasets are located.

    • symmetric: Boolean: True if the object is symmetric, False otherwise. Based on this flag, the model predicts either only the translation or both the translation and the orientation.

  • adam_optimizer:

    • lr: learning rate which scales the magnitude of our weight updates in order to minimize the network's loss function.

    • beta_1: the exponential decay rate for the first-moment estimates.

    • beta_2: the exponential decay rate for the second-moment estimates.

  • checkpoint:

    • load_dir_checkpoint: path to the saved model.

    • save_frequency: frequency, in epochs, at which the model is saved. If it is set to 1, the model is saved every epoch; if it is set to 2, the model is saved every two epochs.

  • system:

    • log_dir_system: path where the model and the metrics (a .tar file that can be viewed with TensorBoard) will be saved.

    • data_root: path to the parent directory of the data.
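
To show how these arguments fit together, here is a hedged sketch of a training loop driven by this config, assuming the YAML has been parsed into a nested dict (e.g. with yaml.safe_load) and that the object is asymmetric, so the network returns both outputs. The loss form and the estimator.evaluate / checkpointer.save names are assumptions rather than the project's actual API; the real logic lives in train.py.

```python
import torch
import torch.nn.functional as F

def train_loop(network, train_loader, estimator, checkpointer, config):
    # The adam_optimizer section maps directly onto torch.optim.Adam.
    optimizer = torch.optim.Adam(
        network.parameters(),
        lr=config["adam_optimizer"]["lr"],
        betas=(config["adam_optimizer"]["beta_1"],
               config["adam_optimizer"]["beta_2"]),
    )
    accumulation_steps = config["train"]["accumulation_steps"]
    beta = config["train"]["beta_loss"]

    for epoch in range(1, config["train"]["epochs"] + 1):
        optimizer.zero_grad()
        for step, (images, target_trans, target_orient) in enumerate(train_loader, start=1):
            pred_trans, pred_orient = network(images)
            # beta_loss weights the orientation loss when the two terms are
            # added (which term it scales is an assumption here).
            loss = (F.mse_loss(pred_trans, target_trans)
                    + beta * F.mse_loss(pred_orient, target_orient))
            # Gradient accumulation: backpropagate every batch, but only
            # update the parameters every `accumulation_steps` batches.
            (loss / accumulation_steps).backward()
            if step % accumulation_steps == 0:
                optimizer.step()
                optimizer.zero_grad()
        if epoch % config["val"]["eval_freq"] == 0:
            estimator.evaluate(config)                 # assumed method name
        if epoch % config["checkpoint"]["save_frequency"] == 0:
            checkpointer.save(estimator, epoch=epoch)  # assumed signature
```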

Save and Load methods

In the pose_estimation_estimator.py file, there is one method to save and one method to load a model.

The save method is called in the train.py file at line 95, and the load method is called in pose_estimation_estimator.py at line 82. The model is saved using the save method of the checkpointer object, and loaded using the load method of the same object. The checkpointer object is created in the pose_estimation_estimator.py file at line 50. So, to understand how the model is saved or loaded, we need to look into the checkpoint.py file.

But first, let's have a general overview of the checkpoint.py file. There are three classes:

  • EstimatorCheckpoint: it assigns the estimator checkpoint writer, which is responsible for saving estimators, according to log_dir. The writer can be a GCS or local writer. It also assigns the loader, which is responsible for loading an estimator from a given path. The loader can be a local, GCS, or HTTP loader.
  • LocalEstimatorWriter: it writes (saves) estimator checkpoints locally.
  • GCSEstimatorWriter: it writes (saves) estimator checkpoints to GCS (Google Cloud Storage).

When the EstimatorCheckpoint object is created, the static method _create_writer is called; based on the format of log_dir, which is the directory where you want to save your model (the log_dir_system attribute under system in config.yaml), either a LocalEstimatorWriter or a GCSEstimatorWriter object is created.

Save

You now have two options to save your model: either you save it locally or you save it on Google Cloud (you can use another cloud, but you will have to make the changes yourself). Going back to the method called to save the model, it is the save method of the EstimatorCheckpoint object, which calls the save method of the object created by _create_writer.

  • local: the class LocalEstimatorWriter takes as attributes a dirname, which is the log_dir path; a prefix, which is the name of the estimator (corresponding to the estimator argument in the config.yaml file); and a suffix, which is by default .tar (the file type). It creates the directory that will host the model. Its save method then calls the save method of the estimator, which is in the pose_estimation_estimator.py file.

  • gcp: the class GCSEstimatorWriter takes as attributes a cloud_path, which is the log_dir path to the GCS bucket, and a prefix, which is the name of the estimator (corresponding to the estimator argument in the config.yaml file). In its save method, the model is first saved to a temporary directory on the local machine, following the same local process just described. The save method returns the full GCS cloud path to the saved checkpoint file. Then the upload method of the GCSClient class is called to upload the file to Google Cloud Platform. Both writers, together with the _create_writer dispatch, are sketched below.
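
Putting the two save paths together, here is a hedged sketch of how these classes could fit together. Class, attribute, and method names follow the description above, but the bodies, the file-naming scheme, and the GCSClient stub are assumptions; the real code lives in checkpoint.py.

```python
import os
import tempfile

class GCSClient:
    """Stand-in for the project's GCS client (see the storage module)."""
    def upload(self, local_path, cloud_path):
        ...  # the real implementation uploads the file to the GCS bucket

class LocalEstimatorWriter:
    """Writes (saves) estimator checkpoints locally."""

    def __init__(self, dirname, prefix, suffix="tar"):
        self.dirname = dirname
        self.prefix = prefix
        self.suffix = suffix
        os.makedirs(dirname, exist_ok=True)  # directory that hosts the model

    def save(self, estimator, epoch):
        # The file-naming scheme here is an assumption.
        path = os.path.join(self.dirname, f"{self.prefix}.ep{epoch}.{self.suffix}")
        estimator.save(path)  # delegates to the estimator's own save method
        return path

class GCSEstimatorWriter:
    """Saves checkpoints to a local temporary directory, then uploads them."""

    def __init__(self, cloud_path, prefix, suffix="tar"):
        self.cloud_path = cloud_path
        self._writer = LocalEstimatorWriter(tempfile.mkdtemp(), prefix, suffix)

    def save(self, estimator, epoch):
        local_path = self._writer.save(estimator, epoch)
        full_cloud_path = self.cloud_path.rstrip("/") + "/" + os.path.basename(local_path)
        GCSClient().upload(local_path, full_cloud_path)
        return full_cloud_path  # full GCS path to the saved checkpoint

class EstimatorCheckpoint:
    """Assigns a writer based on the format of log_dir (system.log_dir_system)."""

    def __init__(self, estimator_name, log_dir):
        self._writer = self._create_writer(log_dir, estimator_name)

    @staticmethod
    def _create_writer(log_dir, prefix):
        if log_dir.startswith("gs://"):
            return GCSEstimatorWriter(cloud_path=log_dir, prefix=prefix)
        return LocalEstimatorWriter(dirname=log_dir, prefix=prefix)

    def save(self, estimator, epoch):
        return self._writer.save(estimator, epoch)
```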

Load

You now have three options to load your model: from local storage, from Google Cloud (you can use another cloud, but you will have to make the changes yourself), or over HTTP. Going back to the method called to load the model, it is the load method of the EstimatorCheckpoint object. This method calls _get_loader_from_path and, based on the path, the corresponding load function is invoked (load_local, load_from_gcs, or load_from_http).
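
Here is a hedged sketch of that dispatch, assuming the path's scheme is what distinguishes the three loaders; the loader bodies are placeholders rather than the project's implementation.

```python
from urllib.parse import urlparse

def load_local(path):
    ...  # e.g. read the checkpoint file from the local filesystem

def load_from_gcs(path):
    ...  # e.g. download the blob from the GCS bucket, then load it locally

def load_from_http(path):
    ...  # e.g. fetch the file over HTTP, then load it locally

def _get_loader_from_path(path):
    # Pick a loader from the path's scheme (gs://, http(s)://, or local).
    scheme = urlparse(path).scheme
    if scheme == "gs":
        return load_from_gcs
    if scheme in ("http", "https"):
        return load_from_http
    return load_local
```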