Achieving autonomous driving safely requires nearly endless hours of training software on every situation that could arise before a vehicle is put on the road. In the past, autonomy companies have collected troves of real-world data to train their algorithms on, but it’s impossible to train a system to deal with edge cases based on real-world data alone. Not only that, but it’s time-consuming to collect, sort, and label all of this data in the first place.
Most self-driving vehicle companies, such as Cruise, Waymo, and Waabi, use synthetic data to train and test perception models at a speed and level of control that is impossible with real-world data. Parallel Domain, a startup that has built a data generation platform for autonomy companies, says synthetic data is a critical component of scaling the AI that powers vision and perception systems and preparing it for the unpredictability of the physical world.
The startup just completed a $30 million Series B led by March Capital, with participation from returning investors Costanoa Ventures, Foundry Group, Calibrate Ventures and Ubiquity Ventures. Parallel Domain has focused on the automotive market, providing synthetic data to some of the major OEMs building advanced driver assistance systems and to autonomous driving companies building much more advanced self-driving systems. According to co-founder and CEO Kevin McNamara, Parallel Domain is now poised to expand into drones and mobile computer vision.
“We’re also really doubling down on generative AI approaches to content generation,” McNamara told TechCrunch. “How can we use some of the advances in generative AI to bring a much greater variety of things, people, and behaviors into our world? Because the hard part here really is, once you have a physically accurate renderer, how do you actually build the millions of different scenarios that a car has to encounter?”
According to McNamara, the startup also wants to hire a team to support its growing customer base in North America, Europe and Asia.
Building a virtual world
When Parallel Domain was founded in 2017, the startup had a strong focus on creating virtual worlds based on real-world map data. Over the past five years, Parallel Domain has expanded its world generation, filling those worlds with cars, people, different times of day, weather, and all the different behaviors that make them interesting. This allows customers, which include Google, Continental, Woven Planet and the Toyota Research Institute, to generate the dynamic camera, radar and lidar data they need to train and test their vision and perception systems, McNamara said.
Parallel Domain’s synthetic data platform consists of two modes: training and testing. During training, customers describe high-level parameters—for example, freeway driving with 50% rain, 20% at night, and an ambulance in each sequence—that they want to train their model on, and the system generates hundreds of thousands of examples that meet those parameters.
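To make the training mode concrete, here is a minimal Python sketch of how high-level parameters like those in the example could be turned into a batch of scenario descriptions. It is an illustration only, not Parallel Domain’s platform or API: the `ScenarioSpec` and `Sequence` types, their field names, and the `sample_sequences` function are all hypothetical, and the sketch assumes rain and night are sampled independently for each sequence.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioSpec:
    """High-level request. Every field name here is hypothetical."""
    road_type: str
    rain_fraction: float     # target share of sequences with rain
    night_fraction: float    # target share of sequences at night
    required_agents: tuple   # agents that must appear in every sequence


@dataclass(frozen=True)
class Sequence:
    """One generated scenario description (not rendered sensor data)."""
    road_type: str
    raining: bool
    night: bool
    agents: tuple


def sample_sequences(spec, count, seed=0):
    """Draw `count` scenario descriptions that follow the spec on average.

    Assumption: rain and night are independent coin flips per sequence.
    """
    rng = random.Random(seed)
    return [
        Sequence(
            road_type=spec.road_type,
            raining=rng.random() < spec.rain_fraction,
            night=rng.random() < spec.night_fraction,
            agents=spec.required_agents,
        )
        for _ in range(count)
    ]


# The example from the text: freeway, 50% rain, 20% night, an ambulance each time.
spec = ScenarioSpec("freeway", 0.5, 0.2, ("ambulance",))
sequences = sample_sequences(spec, count=10_000)
rain_share = sum(s.raining for s in sequences) / len(sequences)
night_share = sum(s.night for s in sequences) / len(sequences)
print(f"rain: {rain_share:.2f}, night: {night_share:.2f}")
```

In a real pipeline each description would then be handed to a renderer to produce camera, radar and lidar data. The sketch stops at the description step, since that is the only part the text describes.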
On the testing side, Parallel Domain offers an API that allows customers to control the placement of dynamic things in the world, which can then be plugged into their simulator to test specific scenarios.
Waymo, for example, is particularly interested in using synthetic data to test for different weather conditions, the company told TechCrunch. (To be clear, Waymo is not a confirmed Parallel Domain customer.) Waymo sees weather as a new lens that it can apply to all the miles it has driven in the real world and in simulation, since it would be impossible to re-collect all of those experiences under arbitrary weather conditions.
Whether it’s testing or training, whenever Parallel Domain’s software creates a simulation, it is able to automatically generate labels corresponding to each simulated agent. This helps machine learning teams perform supervised learning and testing without having to go through the tedious process of labeling data themselves.
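The reason labels can be generated automatically is that a simulator already knows the exact state of every agent it places in the scene. The Python sketch below illustrates that idea under simple assumptions: a toy 2D world, a constant-velocity motion model, and a made-up label format. The `Agent` class, `label_frame`, and `simulate` are hypothetical names for illustration and do not reflect Parallel Domain’s software.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """A simulated agent. All fields are illustrative assumptions."""
    agent_id: int
    category: str
    x: float        # position in metres
    y: float
    vx: float       # velocity in metres per second
    vy: float
    length: float   # footprint in metres
    width: float


def step(agent, dt):
    """Advance one agent with a constant-velocity motion model."""
    agent.x += agent.vx * dt
    agent.y += agent.vy * dt


def label_frame(frame_index, agents):
    """Emit one label per agent, read directly from the simulator state."""
    return [
        {
            "frame": frame_index,
            "agent_id": a.agent_id,
            "category": a.category,
            "center": (a.x, a.y),
            "size": (a.length, a.width),
        }
        for a in agents
    ]


def simulate(agents, frames, dt=0.1):
    """Run the toy simulation and collect ground-truth labels per frame."""
    labels = []
    for i in range(frames):
        labels.extend(label_frame(i, agents))  # label before moving
        for a in agents:
            step(a, dt)
    return labels


agents = [
    Agent(1, "car", 0.0, 0.0, 10.0, 0.0, 4.5, 1.8),
    Agent(2, "pedestrian", 5.0, 3.0, 0.0, -1.0, 0.5, 0.5),
]
labels = simulate(agents, frames=3)
print(len(labels), "labels generated without manual annotation")
```

Because the labels come from the same state that drives the simulation, they stay consistent with the scene by construction, which is the property that removes the manual labeling step.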
Parallel Domain envisions a world where autonomy companies use synthetic data for most, if not all, of their training and testing needs. Today, the ratio of synthetic to real data varies from company to company. Established companies with the historical resources to have collected lots of data use synthetic data for about 20% to 40% of their needs, while companies earlier in their product development process rely on 80% synthetic versus 20% real-world data, according to McNamara.
Julia Klein, a partner at March Capital and now a board member of Parallel Domain, said she believes synthetic data will play a critical role in the future of machine learning.
“Getting the real-world data you need to train computer vision models is often an obstacle, and there are delays in getting that data in, labeling it, and getting it into a position where it can actually be used,” Klein told TechCrunch. “What we’ve seen with Parallel Domain is that they significantly speed up this process and also address things that you might not even get in real-world datasets.”