Dataset Generation with NVIDIA Omniverse™

Introduction

In the field of robotics, high-quality datasets are crucial for training accurate models and developing robust systems. However, generating these datasets can be a time-consuming and costly process. Traditional methods often rely on expensive data acquisition or limited resources, hindering the progress of robotic research and development.

As Figure 1 illustrates, automation tasks in agriculture and industrial food processing suffer from these challenges: either the window to acquire data is limited (growing and harvesting cycles), or the data lacks enough variance to train models effectively for the downstream automation tasks. And that is before considering the challenges of the data acquisition itself.

Figure 1: apple plantation.

In this case study we aimed to build an end-to-end software solution that allowed us to detect fruit. We wanted to answer the question of whether synthetic datasets can be used to train a model that is then integrated into a real-life object detection pipeline using ROS 2.

The challenge

Typically, these pipelines suffer from four different challenges:

  1. Lack of datasets: processes are custom and specific, with particular conditions and ranges. There is no ready-made dataset for every possible automation candidate in the world, which leads directly to the next challenge.
  2. Expensive data acquisition: because there is no data, as a practitioner you start acquiring it yourself. You then need to annotate it, the sample may not be representative of the process variance, and many other sampling and acquisition issues will surface down the road.
  3. Development: being onsite is expensive, so you end up building a substitute of the automation scene, or replicating the working conditions as best you can, to validate your solution.
  4. Integration: it typically requires going back over the previous stages, often more than once, to iterate on the solution.

We do not dismiss the need for physical, end-to-end testing, but what if there were a way to reduce the impact of the identified challenges?

What is NVIDIA Omniverse™?

NVIDIA Omniverse™ enables the creation of photorealistic virtual environments and objects. It provides a comprehensive platform for generating synthetic datasets, which can be used for training machine learning models and testing robotic systems. On top of that, the Isaac modules provide ROS and ROS 2 integrations which make it simple to exchange data with the simulation platform and use it as a traditional robotics simulator.

For this project, we are interested in the following characteristics:

  • Real-time rendering and physics simulations
  • Highly detailed 3D models and textures
  • Support for randomizing scenes and controlling environment and simulation asset variables

These features make NVIDIA Omniverse™ an ideal solution for dataset generation in robotics, enabling the creation of highly realistic and diverse synthetic environments. Note also that the same platform used for dataset generation can be used for simulation, which reduces the effort of maintaining multiple software tools that do similar things and potentially duplicating assets.

Our solution

Architecture

Starting with a simple but clean architecture, we wanted to quickly validate the perception pipeline in simulation and have the ability to switch back and forth between simulation and the real setup. We proposed the following architecture:

```mermaid
graph TD
    A[Real Camera / Simulation] --> B[Fruit Detection Node]
    B --> C[Visualization dashboard]
    A --> C
```

The arrows represent ROS 2 topics. In the case of the camera topics, these carry Image and CompressedImage messages. The real camera component can be swapped with the simulated camera sensor simply by providing the same ROS 2 interface; a minimal launch sketch of this swap follows the list below. For the real hardware, we used an olixVision Camera™ device, which has some particular and interesting features worth mentioning:

  • Native ROS 2 interface featuring multiple DDS vendors.
  • Integrated IMU sensor with ROS 2 interface as well.
  • Real-time Linux-based kernel.
  • Ethernet over USB interface, which provided both power and a quick interface setup.
  • A collection of lenses that could be adapted for our application.
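
Under the hood, the swap is just topic wiring. Here is a minimal ROS 2 launch sketch; the package, executable, and topic names are hypothetical placeholders, not the project's actual ones:

```python
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    # Hypothetical names: repointing the "image" remapping at the
    # simulated camera topic is the only change needed to switch
    # between the real device and the Omniverse simulation.
    return LaunchDescription([
        Node(
            package="fruit_detection",
            executable="detection_node",
            remappings=[("image", "/olix_camera/image_raw")],
        ),
    ])
```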

Together with NVIDIA Omniverse™, which allowed us to configure the lens and sensor parameters, we could accurately simulate the camera sensor we had in the real setup.
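
Those parameters live as attributes on the USD camera prim, so matching the simulation to the real sensor can be scripted. A minimal sketch with a hypothetical scene file and placeholder values (not the actual olixVision parameters):

```python
from pxr import Gf, Usd, UsdGeom

# Hypothetical scene file and camera prim path.
stage = Usd.Stage.Open("fruit_scene.usd")
camera = UsdGeom.Camera.Define(stage, "/World/Camera")

# Placeholder lens/sensor values; in practice they come from the real
# camera's datasheet and the selected lens.
camera.GetFocalLengthAttr().Set(4.8)           # focal length, mm
camera.GetHorizontalApertureAttr().Set(6.29)   # sensor width, mm
camera.GetVerticalApertureAttr().Set(4.71)     # sensor height, mm
camera.GetClippingRangeAttr().Set(Gf.Vec2f(0.05, 100.0))  # near/far, scene units

stage.Save()
```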

Finally, the fruit detection node hosts the detection model. It takes raw images from the camera, pre-processes them, and runs them through the detection algorithm to detect the fruit. Further discrimination parameters are introduced to tweak the final detection output.

Hardware setup

Figure 2 shows the real hardware setup used to try the model together with the camera. We mirrored this setup in simulation so we could validate the system and catch errors before deploying to the real hardware.

Figure 2: hardware setup.

Since we wanted to run the pipeline in real time, the ML code had to run on a GPU. For that, we used a powerful laptop that handled all the tasks involved in the pipeline. See the references section at the bottom for more details about it.

Dataset acquisition

This process required randomizing the scene variables to capture enough variance to fine-tune a detection algorithm later on. We focused on three main aspects:

  • Object pose randomization: given the asymmetry of the fruit models, image distortions, and other pose-related aspects, we needed to randomize both position and orientation. Total and partial occlusions matter in our case too, since we wanted a system robust to them.
  • Object scale randomization: some models are not robust to changes in scale, so we covered this by randomizing the distance to the camera as part of the pose randomization above.
  • Lighting conditions: the simulator offers multiple lighting parameters and light sources, which allowed us to cover a wide range of lighting conditions.
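
The sketch below shows how these randomizations can be scripted with the Omniverse Replicator API. It is a minimal example under a few assumptions: fruit assets tagged with an `apple` semantic class, illustrative pose and lighting ranges, and a placeholder output directory:

```python
import omni.replicator.core as rep

with rep.new_layer():
    # Camera and render product; the resolution is an illustrative choice.
    camera = rep.create.camera(focal_length=24.0)
    render_product = rep.create.render_product(camera, (1280, 720))

    # Assumes the fruit assets carry an "apple" semantic class label.
    apples = rep.get.prims(semantics=[("class", "apple")])

    def randomize_apples():
        with apples:
            rep.modify.pose(
                # Random position (including distance to the camera, which
                # also covers apparent scale) and orientation.
                position=rep.distribution.uniform((-0.5, -0.5, 0.2), (0.5, 0.5, 2.0)),
                rotation=rep.distribution.uniform((0, 0, 0), (360, 360, 360)),
            )
        return apples.node

    def randomize_lighting():
        lights = rep.create.light(
            light_type="Dome",
            intensity=rep.distribution.uniform(500, 3000),
            color=rep.distribution.uniform((0.8, 0.8, 0.8), (1.0, 1.0, 1.0)),
        )
        return lights.node

    rep.randomizer.register(randomize_apples)
    rep.randomizer.register(randomize_lighting)

    # One randomization pass per captured frame.
    with rep.trigger.on_frame(num_frames=5000):
        rep.randomizer.randomize_apples()
        rep.randomizer.randomize_lighting()

    # Write RGB images plus tight 2D bounding-box annotations.
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(output_dir="_out_fruit", rgb=True, bounding_box_2d_tight=True)
    writer.attach([render_product])
```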

All things considered, gathering thousands of annotated images ready for training becomes a matter of minutes of processing.

The model

We used the Faster R-CNN neural network available in PyTorch. Using the synthetic dataset generated in the previous step, we fine-tuned the model and validated it against a held-out portion of the dataset. We also included a web dashboard to visualize the evolution of the training metrics as the process ran, which is particularly useful for ML practitioners.
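
As a sketch of the fine-tuning step, torchvision's pre-trained Faster R-CNN can be adapted by replacing its box predictor. The `data_loader` and the two-class setup (background plus fruit) are assumptions, not the project's exact configuration:

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Assumed label set: background + fruit.
num_classes = 2

# Start from COCO-pretrained weights and swap the classification head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).train()

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=0.005, momentum=0.9, weight_decay=0.0005,
)

# `data_loader` (assumed) yields (images, targets), where each target dict
# holds the "boxes" and "labels" produced by the synthetic annotation writer.
for images, targets in data_loader:
    images = [img.to(device) for img in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    loss = sum(model(images, targets).values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```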

Once the model was ready, we integrated it into a ROS 2 Node and wired the interfaces.
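
A minimal sketch of that node follows. Topic names, the serialized model file, and the score threshold are illustrative placeholders, not the project's actual values:

```python
import rclpy
import torch
from cv_bridge import CvBridge
from rclpy.node import Node
from sensor_msgs.msg import Image


class FruitDetectionNode(Node):
    def __init__(self):
        super().__init__("fruit_detection_node")
        self.bridge = CvBridge()
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # Placeholder model file produced by the fine-tuning step.
        self.model = torch.load("fruit_detector.pt", map_location=self.device)
        self.model.eval()
        self.score_threshold = 0.7  # an example discrimination parameter
        self.create_subscription(Image, "image", self.on_image, 10)
        self.annotated_pub = self.create_publisher(Image, "detections_image", 10)

    def on_image(self, msg: Image) -> None:
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="rgb8")
        tensor = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            prediction = self.model([tensor.to(self.device)])[0]
        keep = prediction["scores"] > self.score_threshold
        boxes = prediction["boxes"][keep].cpu().numpy()
        # ...draw `boxes` on `frame`, convert back with the bridge, and
        # publish the annotated image on self.annotated_pub.
        self.get_logger().info(f"detected {len(boxes)} fruits")


def main():
    rclpy.init()
    rclpy.spin(FruitDetectionNode())
    rclpy.shutdown()
```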

Simulation

Using the same simulator scene, camera model, and synthetic assets, we could bring up a simulation and wire in the detection node to validate system integration and performance. We could also take latency measurements and collect other performance metrics to determine the minimum camera frequency and other detection parameters beforehand.
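
For instance, a rough end-to-end latency estimate can be taken inside the detection callback by comparing the image header stamp with the current time, assuming both clocks are synchronized (which holds trivially in simulation):

```python
from rclpy.time import Time

# Inside the detection callback, after inference has finished:
now = self.get_clock().now()
capture_time = Time.from_msg(msg.header.stamp)
latency_ms = (now - capture_time).nanoseconds / 1e6
self.get_logger().info(f"end-to-end latency: {latency_ms:.1f} ms")
```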

Results

This case study let us showcase how powerful NVIDIA Omniverse™ is, both as a synthetic dataset generation tool and as a robotics simulation platform. We fine-tuned a model with purely synthetic data and integrated it into a detection pipeline through ROS 2 interfaces. The result is a solution capable of real-time multiple object detection and image annotation with very little integration effort. The case study also helped us address the identified challenges:

  1. Lack of datasets: we could generate small to large annotated datasets with a few configuration changes, randomizing multiple variables over ranges we fully controlled.
  2. Expensive data acquisition: generating 5000 images takes only 15 minutes!
  3. Development: we created a development pipeline that allowed us to quickly assess the impact of changes in previous stages, fix bugs, and even fine-tune detection parameters before hardware integration.
  4. Integration: thanks to the olixVision Camera™ it took us just a few minutes to set everything up and run the system to sort out the integration details.

The following video shows how the system performed in simulation:

And here is the real-life system performing:

We hope this case study encourages organizations to try out synthetic dataset generation technologies to augment their data pipelines, and to adopt more simulation technologies to reduce operational costs. At Ekumen, we are always pleased to receive inquiries and collaboration proposals via our contact page to help you implement and leverage these technologies in your organization.

References

The project is available in this GitHub repository. The README contains multiple references to the hardware setup and detailed instructions to help you get started and run the code, either in simulation or with a similar hardware setup (you can even try it with your webcam!). See the contribution guidelines for bug reports, feature requests, or pull requests.