* **sequence**: a time-ordered series of captures generated by a simulation.
* **annotation**: data (e.g. bounding boxes or semantic segmentation) recorded that is used to describe a particular capture at the same timestamp.
A capture might include multiple types of annotations.
* **step**: id for data-producing frames in the simulation.
* **ego**: a frame of reference for a collection of sensors (camera/LIDAR/radar) attached to it.
* **label**: a string token (e.g. car, human.adult, etc.) that represents a semantic type, or class.
One GameObject might have multiple labels used for different annotation purposes.
* **global coordinate system**: coordinate with respect to the global origin in Unity.
* **ego coordinate system**: coordinate with respect to an ego object.
Typically, this refers to an object moving in the Unity scene.
* **sensor coordinate system**: coordinate with respect to a sensor.
This is useful for ML model training for a single sensor, which can be transformed from a global coordinate system and ego coordinate system.
The schema is based on the [nuScenes data format](https://www.nuscenes.org/data-format).
The main difference between this schema and nuScenes is that we use **document based schema design** instead of **relational database schema design**.
This means that instead of requiring multiple id-based "joins" to explore the data, data is nested and sometimes duplicated for ease of consumption.
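As a purely illustrative sketch (all ids and values below are hypothetical), a capture record nests its sensor and ego attributes directly instead of referencing separate tables by id:

```
{
  "id": "497fc83e-0d40-4071-a5a6-3fd6c8f1a53c",
  "sequence_id": "7a1f20c9-05cd-4ac1-a3b8-5e6c6ff46a9d",
  "step": 0,
  "timestamp": 0,
  "sensor": {
    "translation": [0.0, 1.6, 0.2],
    "rotation": [1.0, 0.0, 0.0, 0.0]
  },
  "ego": {
    "translation": [10.0, 0.0, 5.0],
    "rotation": [1.0, 0.0, 0.0, 0.0]
  }
}
```

In a relational design such as nuScenes, the same information would be split across multiple tables and reassembled through id-based joins.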
## Components
### captures

A capture record ties together the data produced by a single sensor at a single step: the sensor and ego state at the time of capture, and any annotations produced for it.

```
capture {
  id: <str> -- UUID of the capture.
  sequence_id: <str> -- UUID of the sequence.
  step: <int> -- The index of the capture in the sequence. This field is used to order captures within a sequence.
  timestamp: <int> -- Timestamp in milliseconds since the sequence started.
  sensor: <obj> -- Attributes of the sensor. See below.
  ego: <obj> -- Ego pose of this sample. See below.
  annotations: [<obj>,...] [optional] -- List of the annotations in this capture. See below.
}
```
#### sequence, step and timestamp
In some use cases, two consecutive captures might not be related in time during the simulation.
For example, we might generate randomly placed objects in a scene for X steps of the simulation.
In this case, sequence, step and timestamp are irrelevant for the captured data.
In cases where we need to maintain a time-ordered relationship between captures (e.g. a sequence of camera captures in a 10 second video) and [metrics](#metrics), we need to add a sequence, step and timestamp to maintain the time-ordered relationship of captures.
A sequence represents the collection of any time-ordered captures and annotations.
Timestamps refer to the simulation wall clock in milliseconds since the sequence started.
Steps are integer values which increase when a capture or metric event is triggered.
We cannot use timestamps to synchronize between two different events because timestamps are floats and therefore make poor indices.
Instead, we use a "step" counter which makes it easy to associate metrics and captures that occur at the same time.
Below is an illustration of how captures, metrics, timestamps and steps are synchronized.
Since each sensor might trigger captures at different frequencies, a single timestamp might be associated with anywhere from 0 to N captures, where N is the total number of sensors included in this scene.
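For example (hypothetical ids and values), two sensors capturing at the same moment produce two capture records that share the same sequence_id, step and timestamp; the nested sensor and ego objects are left empty here and illustrated in the capture.ego and capture.sensor sections below:

```
[
  {
    "id": "6e18fd26-94f8-4c6e-9f3a-1c2a52e0e0a1",
    "sequence_id": "1e3b44a1-7fd4-4a4b-b3a5-9a8464cbbd21",
    "step": 25,
    "timestamp": 1666,
    "sensor": {},
    "ego": {},
    "annotations": []
  },
  {
    "id": "c0f2b1ab-5c3e-4fd0-8f2d-2f0e9d1b2c33",
    "sequence_id": "1e3b44a1-7fd4-4a4b-b3a5-9a8464cbbd21",
    "step": 25,
    "timestamp": 1666,
    "sensor": {},
    "ego": {},
    "annotations": []
  }
]
```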
Physical camera sensors require some time to finish exposure.
A physical LIDAR sensor requires some time to finish one 360-degree scan.
How do we define the timestamp of the sample in simulation?
Following the nuScenes sensor [synchronization](https://www.nuscenes.org/data-collection) strategy, we define a reference line from the ego origin in the ego's "forward" traveling direction.
The timestamp of the LIDAR scan is the time when the full rotation of the current LIDAR frame is achieved.
A full rotation is defined as the 360-degree sweep between two consecutive times passing the reference line.
#### capture.ego

An ego record stores the ego status data when a sample is created.
It includes translation, rotation, velocity and acceleration (optional) of the ego.
The pose is with respect to the **global coordinate system** of a Unity scene.
```
ego {
  translation: <float,float,float> -- Position in meters: (x, y, z) with respect to the global coordinate system.
  rotation: <float,float,float,float> -- Orientation as quaternion: w, x, y, z.
  velocity: <float,float,float> -- Velocity in meters per second as v_x, v_y, v_z.
  acceleration: <float,float,float> [optional] -- Acceleration in meters per second^2 as a_x, a_y, a_z.
}
```
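A minimal sketch of an ego record with hypothetical values, expressed in the global coordinate system:

```
{
  "translation": [12.5, 0.0, -3.2],
  "rotation": [1.0, 0.0, 0.0, 0.0],
  "velocity": [0.0, 0.0, 5.0],
  "acceleration": [0.0, 0.0, 0.2]
}
```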
#### capture.sensor

A sensor record contains attributes of the sensor at the time of the capture.
Different sensor modalities may contain additional keys (e.g. field of view FOV for camera, beam density for LIDAR).
```
sensor {
  translation: <float,float,float> -- Position in meters: (x, y, z) with respect to the ego coordinate system. This is typically fixed during the simulation, but we can allow small variation for domain randomization.
  rotation: <float,float,float,float> -- Orientation as quaternion: (w, x, y, z) with respect to the ego coordinate system. This is typically fixed during the simulation, but we can allow small variation for domain randomization.
  camera_intrinsic: <3x3 float matrix> [optional] -- Intrinsic camera calibration. Empty for sensors that are not cameras.

  # add arbitrary optional key-value pairs for sensor attributes
}
```
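A sketch of a camera sensor record; the translation, rotation and camera_intrinsic values are hypothetical and not a calibration of any real camera:

```
{
  "translation": [0.0, 1.6, 0.2],
  "rotation": [1.0, 0.0, 0.0, 0.0],
  "camera_intrinsic": [
    [1266.4, 0.0, 816.2],
    [0.0, 1266.4, 491.5],
    [0.0, 0.0, 1.0]
  ]
}
```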
#### capture.annotation
An annotation record contains the ground truth for a sensor either inline or in a separate file.
A single capture may contain many annotations.
```
annotation {
  id: <str> -- UUID of the annotation.
  annotation_definition: <int> -- Foreign key which points to annotation_definition.id. See below.
  filename: <str> [optional] -- Path to a single file that stores annotations. (e.g. semantic_000.png, bounding_box_2d_000.json, etc.) The file contains data that can be processed according to the given annotation_definition.id.
  values: [<obj>,...] [optional] -- List of objects that store annotation data (e.g. polygon, 2d bounding box, 3d bounding box, etc.). The data should be processed according to the given annotation_definition.id.
}
```
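For example (hypothetical id and filename), a semantic segmentation annotation stored in an external PNG would reference the file rather than carry inline values:

```
{
  "id": "a2a706b9-9e36-4b9c-9a49-9f9d3f0c3b7a",
  "annotation_definition": 1,
  "filename": "semantic_000.png"
}
```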
#### capture.annotation.values

##### semantic segmentation - grayscale image

A grayscale PNG file that stores integer values (the label's pixel_value in the [annotation spec](#annotation_definitionsjson) reference table) of the labeled object at each pixel.

Sample image from the [cityscapes](https://www.cityscapes-dataset.com/) dataset.
<!-- Not yet implemented annotations

##### instance segmentation - polygon

A json object that stores collections of polygons. Each polygon record maps a tuple of (instance, label) to a list of
K pixel coordinates that forms a polygon. This object can be directly stored in annotation.values.

```
semantic_segmentation_polygon {
  instance_id: <str> -- UUID of the instance.
  label_id: <int> -- Integer identifier of the label
  label_name: <str> -- String identifier of the label
  polygon: [<int,int>,...] -- List of points in pixel coordinates of the outer edge. Connecting these points in order should create a polygon that identifies the object.
}
```
-->

##### 2D bounding box

Each bounding box record maps a tuple of (instance, label)
to a set of 4 variables (x, y, width, height) that draws a bounding box.
We follow the OpenCV 2D coordinate [system](https://github.com/vvvv/VL.OpenCV/wiki/Coordinate-system-conversions-between-OpenCV,-DirectX-and-vvvv#opencv) where the origin (0,0), (x=0, y=0) is at the top left of the image.

```
bounding_box_2d {
  instance_id: <str> -- UUID of the instance.
  label_id: <int> -- Integer identifier of the label
  label_name: <str> -- String identifier of the label
  x: <float> -- x coordinate of the upper left corner.
  y: <float> -- y coordinate of the upper left corner.
  width: <float> -- number of pixels in the x direction
  height: <float> -- number of pixels in the y direction
}
```
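A sketch of a single 2D bounding box record as it might appear in annotation.values (hypothetical instance, label and pixel values):

```
{
  "instance_id": "8b6d4f5a-2a1e-4c2b-9f3e-0d2a9c7b1e44",
  "label_id": 27,
  "label_name": "car",
  "x": 30.0,
  "y": 50.0,
  "width": 100.0,
  "height": 80.0
}
```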
##### 3D bounding box

A json file that stores collections of 3D bounding boxes.
Each bounding box record maps a tuple of (instance, label) to translation, size and rotation that draws a 3D bounding box, as well as velocity and acceleration (optional) of the 3D bounding box.
All location data is given with respect to the **sensor coordinate system**.

```
bounding_box_3d {
  instance_id: <str> -- UUID of the instance.
  label_id: <int> -- Integer identifier of the label
  label_name: <str> -- String identifier of the label
  translation: <float,float,float> -- Center of the 3D bounding box in meters as (x, y, z) with respect to the sensor coordinate system.
  size: <float,float,float> -- Size of the 3D bounding box in meters.
  rotation: <float,float,float,float> -- Orientation as quaternion: w, x, y, z.
  velocity: <float,float,float> -- Velocity in meters per second as v_x, v_y, v_z.
  acceleration: <float,float,float> [optional] -- Acceleration in meters per second^2 as a_x, a_y, a_z.
}
```
##### instance segmentation - grayscale image

A grayscale PNG file that stores integer values of labeled instances at each pixel.

* Consider cases for object tracking, where instances need to remain consistent across different captures/annotations.
* Consider cases not used for object tracking, so that instances do not need to be consistent across different captures/annotations.
### metrics

Metrics store extra metadata that can be used to describe a particular sequence, capture or annotation.
Metric records are stored as an arbitrary number (M) of key-value pairs.
For a sequence metric, capture_id, annotation_id and step should be null.
For a capture metric, annotation_id can be null.
For an annotation metric, all four columns of sequence_id, capture_id, annotation_id and step are not null.
Metrics files might be generated in parallel from different simulation instances.
```
metric {
  capture_id: <str> -- Foreign key which points to capture.id. Null for sequence metrics.
  annotation_id: <str> -- Foreign key which points to annotation.id. Null for sequence and capture metrics.
  sequence_id: <str> -- Foreign key which points to capture.sequence_id.
  step: <int> -- Foreign key which points to capture.step.
  metric_definition: <int> -- Foreign key which points to metric_definition.id
  values: [<obj>,...] -- List of all metric records stored as json objects.
}
```
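For illustration (hypothetical ids and values), an object_count metric attached to a capture might look like the record below; the structure of each entry in `values` is governed by the corresponding metric_definition.spec, so the label/count pairs shown here are purely illustrative:

```
{
  "capture_id": "6e18fd26-94f8-4c6e-9f3a-1c2a52e0e0a1",
  "annotation_id": null,
  "sequence_id": "1e3b44a1-7fd4-4a4b-b3a5-9a8464cbbd21",
  "step": 25,
  "metric_definition": 1,
  "values": [
    { "label_name": "car", "count": 3 },
    { "label_name": "human.adult", "count": 1 }
  ]
}
```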
### definitions

Ego, sensor, annotation, and metric definition tables are static during the simulation.
#### egos.json

A json file containing a collection of egos. This file is an enumeration of all egos in this simulation.
A specific object with sensors attached to it is a commonly used ego in a driving simulation.

```
ego {
  id: <str> -- UUID of the ego.
}
```
#### sensors.json

A json file containing a collection of all sensors present in the simulation.
Each sensor is assigned a unique UUID. Each is associated with an ego and stores the UUID of the ego as a foreign key.

```
sensor {
  id: <str> -- UUID of the sensor.
  ego_id: <str> -- Foreign key pointing to ego.id.
  modality: <str> -- Sensor modality (e.g. camera, lidar, radar).
}
```
#### annotation_definitions.json

A json file containing a collection of annotation specifications (annotation_definition).
Typically, the `spec` key describes all label_id and label_name values used by the annotation.

```
annotation_definition {
  id: <int> -- Integer identifier of the annotation definition.
  name: <str> -- Human readable annotation spec name (e.g. semantic_segmentation, instance_segmentation, etc.)
  description: <str> [optional] -- Description of this annotation specification.
  format: <str> -- The format of the annotation files. (e.g. png, json, etc.)
  spec: [<obj>...] -- Format-specific specification for the annotation values (ex. label-value mappings for semantic segmentation images)
}

# semantic segmentation
annotation_definition.spec {
  label_id: <int> -- Integer identifier of the label
  label_name: <str> -- String identifier of the label
  pixel_value: <int> -- Grayscale pixel value
}
```
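As an illustration (the ids, names and pixel values are hypothetical), a semantic segmentation annotation definition and its spec could look like:

```
{
  "id": 1,
  "name": "semantic_segmentation",
  "description": "Pixel-wise semantic segmentation labels stored as a grayscale PNG",
  "format": "PNG",
  "spec": [
    { "label_id": 27, "label_name": "car", "pixel_value": 27 },
    { "label_id": 34, "label_name": "human.adult", "pixel_value": 34 }
  ]
}
```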
#### metric_definitions.json

A json file that stores collections of metric specification records (metric_definition).
Each specification record describes a particular metric stored in [metrics](#metrics) values.
Each metric_definition record is assigned a unique identifier and a collection of specification records, stored as a list of key-value pairs.

```
metric_definition {
  id: <int> -- Integer identifier of the metric definition.
  name: <str> -- Human readable metric spec name (e.g. object_count, average distance, etc.)
  description: <str> [optional] -- Description of this metric specification.
  spec: [<obj>...] -- Format-specific specification for the metric values
}
```
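A hypothetical object_count metric definition; the shape of its spec entries is an assumption, shown here as a simple enumeration of the labels being counted:

```
{
  "id": 1,
  "name": "object_count",
  "description": "Number of labeled objects visible to the sensor in each capture",
  "spec": [
    { "label_id": 27, "label_name": "car" },
    { "label_id": 34, "label_name": "human.adult" }
  ]
}
```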
## Schema versioning

* The schema uses [semantic versioning](https://semver.org/).
* Version info is placed at the root level of each json file that holds a collection of objects (e.g. captures.json,
metrics.json, annotation_definitions.json, ...). All json files in a dataset will share the same version.
* The version should only change when the Perception package changes (and even then, rarely).
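For example, a captures.json file could carry the version at its root (the `0.1.0` value and the top-level `captures` key are illustrative; collection contents are omitted):

```
{
  "version": "0.1.0",
  "captures": []
}
```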
A mockup of a synthetic dataset created according to this schema [design](https://docs.google.com/document/d/1lKPm06z09uX9gZIbmBUMO6WKlIGXiv3hgXb_taPOnU0) is available; see also the schema documentation in [Synthetic_Dataset_Schema.md](https://github.com/Unity-Technologies/com.unity.perception/tree/master/com.unity.perception/Documentation~/Schema/Synthetic_Dataset_Schema.md).