We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways data teams are tackling the challenges of this new world to help their companies and their customers thrive.
Over the years, some bold predictions have been made about the impact autonomous vehicles (AVs) will have on our daily lives. Researchers from the National Highway Traffic Safety Administration estimated that fully autonomous cars could reduce traffic fatalities by up to 94% by eliminating accidents due to human error. Meanwhile, “Science” magazine reported that introducing even just a small number of AVs onto the roads could improve overall traffic flow and reduce trip times. And the European Commission stated that transport will be “…cleaner, cheaper and more accessible to the elderly and to people with reduced mobility” as a result of fully automated and connected mobility systems.
It’s difficult not to get excited about this future. However, according to experts, it might be further away than we’ve been led to believe. Real-life AVs are a huge undertaking, involving regulatory hurdles, programming and data challenges, and a massive culture shift. They will change the world in ways that only science fiction writers and futurists have envisioned, and data will play a huge role in that story.
Baby steps for self-driving vehicles
There’s been huge buzz around self-driving cars for years now, and countless startups and established car companies have set about solving every part of the AV puzzle. Despite the extraordinary efforts of many of the biggest automotive industry players, fully autonomous cars remain unavailable outside special pilot programs.
“While we are already seeing a small number of AVs being tested on our roads today, they have limited capabilities and can only drive in very specific conditions,” explained Ryan Pietzsch, a driver safety education expert with the National Safety Council, a not-for-profit organization promoting health and safety in the U.S. “I liken this to the advent of the cellphone. The first cellphones had limited abilities, and their coverage was extremely narrow. They still don’t work in all areas, but the network is much improved. We are finding the same with AVs. The reality is that most cars on our roads have a very low level of autonomy.”
The Society of Automotive Engineers defines six levels of driving automation, from 0 (fully manual) to 5 (fully autonomous). These levels have been adopted by the U.S. Department of Transportation.
“There really are no autonomous vehicles operating today,” said Mike Ramsey, vice president and analyst for the automotive and smart mobility division at analyst firm Gartner. “There are some automakers, like Tesla and GM, that offer systems that can handle some driving tasks but require people to be paying attention. These are so-called level 2 vehicles. What most people think of as autonomous vehicles are what we would call level 4 or 5, where the car drives itself and a human never has to pay attention. Level 3 is conditional automation, where the car could drive itself in specific areas, but where a person would have to take over when those conditions aren’t met. There are some researchers who think that this type of system won’t really be possible because people can’t be trusted to pay enough attention to the environment.”
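The distinctions Ramsey draws can be laid out in a few lines of Python. This is an illustrative sketch only: the level names follow the SAE J3016 standard, while the `human_role` helper and its wording are assumptions made for this example, paraphrasing the description above rather than quoting the standard.

```python
from enum import IntEnum


class SAELevel(IntEnum):
    """SAE J3016 levels of driving automation (0 = fully manual, 5 = fully autonomous)."""
    NO_AUTOMATION = 0
    DRIVER_ASSISTANCE = 1
    PARTIAL_AUTOMATION = 2
    CONDITIONAL_AUTOMATION = 3
    HIGH_AUTOMATION = 4
    FULL_AUTOMATION = 5


def human_role(level: SAELevel) -> str:
    """Hypothetical helper: what the person in the driver's seat must do.

    The wording is an assumption for illustration, not text from the standard.
    """
    if level <= SAELevel.PARTIAL_AUTOMATION:
        # Levels 0-2: the human is still driving and must pay attention.
        return "supervise constantly"
    if level == SAELevel.CONDITIONAL_AUTOMATION:
        # Level 3: the car drives in specific conditions; the human is the fallback.
        return "take over when conditions aren't met"
    # Levels 4-5: the car drives itself within whatever conditions it supports.
    return "no attention required"


for level in SAELevel:
    print(f"Level {level.value} ({level.name}): {human_role(level)}")
```

Seen this way, the jump from level 2 to level 3 is less about what the car can do than about who is responsible for watching the road.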
There are numerous challenges to the safe introduction of fully autonomous cars to roads filled with human drivers. However, the experts agree that one factor is critical to speeding their adoption: data.
Data is the dealbreaker
“Data is a critical factor in getting to where we need to be,” explained Ramsey. “AVs are the most advanced version of artificial intelligence (AI) that we are working on right now and require an enormous amount of data to do machine learning to improve the computer’s ability to understand the world and make decisions. Almost a limitless amount of data is required to train vehicles because what we are trying to do is duplicate how a human mind works.”
Cara Bloom, a senior cybersecurity engineer at not-for-profit organization Mitre Labs and former staff researcher at Carnegie Mellon University’s CyLab, agreed. “Computers are the new drivers,” she said. “Much of the data that is used to drive non-autonomous vehicles is the exact same data that AVs will use, but the difference is in what processes that data: a person or a computer. The road conditions, signage, weather, maps, and predictions about other cars on the road are all ‘data’ that both people and computers must process to drive safely. But AVs won’t just use data; they will create it and use the new information to make new decisions — some of which are not decisions we have been afforded before.”
“Data is extremely important to the improvement of all advanced driver-assistance systems and autonomous features,” added Pietzsch. “Specifically, data is advancing the improvement of sensor systems such as light detection and ranging (LiDAR), as well as sensor performance — reducing false activations and improving overall autonomous performance. When we see level 4 or level 5 AVs on our roads, it will be because of data engineers’ ability to collect the correct data, interpolate the data correctly, invest in hardware changes when needed, and implement successful changes and improvements.”
Data centers on wheels
For level 4 and 5 AVs to become commonplace on our roads, it’s clear that more computing power is needed, especially given that connected cars at today’s lower levels of autonomy already generate around 25 gigabytes of data per hour. AVs of the future will require different types of storage, and lots of it, to gather data from LiDAR, radar, cameras, and other sensors, as well as from in-vehicle infotainment, navigation systems, and maintenance data. In fact, according to forecasts by Western Digital, the storage capacity per vehicle could amount to 11 terabytes by 2030.
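To put those two figures side by side, here is a rough back-of-the-envelope calculation. It assumes decimal units (1 terabyte = 1,000 gigabytes) and, more loosely, that today's 25-gigabyte-per-hour rate can be compared with a 2030 storage forecast at all. The two numbers come from different sources, so treat the result as an order-of-magnitude illustration rather than a prediction.

```python
GB_PER_HOUR = 25   # data generated by a connected car today, per the figure above
STORAGE_TB = 11    # forecast per-vehicle storage capacity for 2030
GB_PER_TB = 1000   # assumption: decimal units (1 TB = 1,000 GB)


def hours_to_fill(storage_tb: float, rate_gb_per_hour: float) -> float:
    """Hours of driving needed to fill the given storage at a constant data rate."""
    return storage_tb * GB_PER_TB / rate_gb_per_hour


hours = hours_to_fill(STORAGE_TB, GB_PER_HOUR)
print(f"{hours:.0f} hours of driving to fill {STORAGE_TB} TB")  # 440 hours
```

At today's rate, 11 terabytes would hold about 440 hours of driving data. A vehicle with more sensors and a higher level of autonomy would presumably fill it much faster.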
“The most advanced prototypes of level 4 and 5 AVs carry huge computers,” said Ramsey. “These computers need to get smaller so that processing can be done in the car itself — this is important to reduce the amount of time lag and the cost of transferring data to the cloud.”
Ramsey said that, while all real AI and machine learning (ML) processing is done in the cloud right now, this will change. “While we won’t get to the stage where cars will do most of the heavy lifting and ML onboard, what we will see is real-time data analytics in vehicles. For example, an AV driving down a street will recognize a feature of the neighborhood that isn’t in its HD map and react accordingly. If it has to do this repeatedly, then it will make an adjustment on board and send information to the cloud, but it will have already adjusted its behavior based on what it sees in its environment.”
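The behavior Ramsey describes (react in real time, then adapt on board and report to the cloud once something keeps recurring) can be sketched as a simple counter. Everything here is hypothetical: the class name, the threshold, the feature identifier, and the action strings are assumptions chosen for illustration, not a description of any real vehicle's software.

```python
from collections import Counter


class UnmappedFeatureTracker:
    """Hypothetical on-board tracker for features missing from the HD map."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold       # sightings before adapting (assumed value)
        self.sightings = Counter()       # feature id -> number of times seen
        self.local_adjustments = set()   # features the car has already adapted to
        self.cloud_outbox = []           # reports queued for upload to the cloud

    def observe(self, feature_id: str) -> str:
        """Record one sighting of an unmapped feature and return the action taken."""
        if feature_id in self.local_adjustments:
            # Behavior was already adjusted on board; no need to wait for the cloud.
            return "use local adjustment"
        self.sightings[feature_id] += 1
        if self.sightings[feature_id] >= self.threshold:
            # Seen repeatedly: adapt locally and tell the cloud about it.
            self.local_adjustments.add(feature_id)
            self.cloud_outbox.append(feature_id)
            return "adjust on board and queue cloud report"
        # First few sightings: just react to what the sensors see.
        return "react in real time"


tracker = UnmappedFeatureTracker(threshold=2)
actions = [tracker.observe("speed_bump@elm_st") for _ in range(3)]
print(actions)
```

The point of the design is the ordering: the car changes its own behavior first and informs the cloud second, so it never has to wait on a round trip to respond to its surroundings.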
Meanwhile, Pietzsch said, further advances need to be made in how data is retrieved remotely. “Some progress is being made already,” he said. “Advancements in data sharing to the cloud will greatly improve accuracy and advancement of ML. We are starting to see software updates based on ML being sent directly to vehicles through satellites. This provides the most up-to-date technology to the vehicle, which is important if we ever move away from having an engineer physically in the AV directly plugged into the computer.”
Security and privacy concerns
For Bloom, however, the biggest hurdles to getting autonomous cars on our roads revolve around the privacy and security of data. “Because AVs collect data in public where there is little ‘reasonable expectation of privacy’, they are not subject to many of the privacy laws in the U.S. and abroad,” she explained. “The data collected by AVs in the U.S. will likely be owned by the collector of the data, not the data subject. The data subjects themselves are unlikely to have the option to opt out of data collection on public roads by AVs or other sensors, except to avoid such sensors entirely.”
Bloom also pointed out that if AVs are in a collective fleet, such as for ride-sharing, the data could be centralized, stored, analyzed, and sold for profit (as has happened with other centralized data aggregators). “If AVs have facial recognition and license plate recognition systems, that data could be used to surveil populations and sold for profit — in addition to being used for socially beneficial purposes such as safety and traffic management,” she said. “For example, is it OK if a fleet of AVs collect license plate data to track down a vehicle that’s involved in an Amber Alert? What if this data is also used for open warrants? For insurance company premiums? Advertising? Since it is infeasible for people to opt out of all data collection by AVs, it is essential to fulfill their expectations upfront to prevent harm. Makers of AVs will need to determine what acceptable and safe data use is before implementing these technologies. If not, they could face backlash from consumers and regulators.”
AVs will also need advanced encryption schemes and stringent technical and policy measures to protect the location privacy of the passengers, Bloom said. “Without security, the vehicles will not be safe or trustworthy: They could be rendered inoperable by ransomware, used to surveil populations, or intentionally endanger passengers and others.”
A call for collaboration
Beyond these concerns, the experts agreed that the industry needs to collaborate more closely to make faster progress and realize the many benefits of AVs.
“Industry collaboration is undoubtedly key to future success,” said Ramsey. “Labeled data is so critical to train machine learning models to develop and deploy AVs.”
Thankfully, progress is being made in this direction. Earlier this year, Waymo (formerly the Google self-driving car project) and Ford released open datasets of information collected during AV tests and challenged developers to use them to come up with faster and smarter self-driving algorithms. Meanwhile, U.S. startup Scale AI, in collaboration with LiDAR manufacturer Hesai, launched an open-source dataset called PandaSet that can be used for training machine learning models for autonomous driving.
The U.S. Department of Transportation has also been working with stakeholders to prioritize and facilitate the iterative development of voluntary data exchanges to accelerate safe integration of AVs. Improving access to work zone data is one of the top needs identified.
“We launched the WZDx Specification to jump-start the voluntary adoption of a basic work zone data specification through collaboration with data producers and data users,” explained a spokesperson from the department. “Longer term, the goal is to enable collaborative maintenance and expansion of the specification to meet the emerging needs of [automated driving systems].”
Work is also underway to facilitate the sharing of key mapping data. “All players operating in the self-driving vehicle industry need to agree on defining how mapping data can be shared between companies and authorities, to speed up the development of safe self-driving vehicles, without hindering competition,” stated a recent report by British-government-backed AV accelerator organization Zenzic. “Merging mapping data from regional sources requires streamlining to avoid multiple different ways of processing and handling data. Mapping data quality, specifically accuracy and precision of such data, is seen to be more important than resolution.”
The Zenzic report advises the connected and self-driving technology industries to follow the gaming, weather, and building information modeling sectors in a quest for common terminology.
The road toward an autonomous future
Creating a safe and successful AV industry is likely to bring huge economic and social benefits to consumers and industry alike. Major automakers, technology giants, and specialized startups have already invested more than $50 billion in AVs over the past five years, and their investments will only continue in the years to come.
For Pietzsch, it’s money well spent. “There’s a lot still to learn, but as our knowledge of data science expands, so too will the development of AVs,” he said. “This will also have far-reaching implications for other areas. After all, data science impacts our daily lives. The lessons that NASA has given us over the years have spawned countless opportunities and have influenced industries from adhesives to transportation. Similarly, I expect that the science that is occurring in the development of AVs will create opportunities and spark advancements in data analytics that can be used in other industries, and other industries that are not currently in the AV area may find themselves in it.”
Lindsay James is a journalist and copywriter with over 20 years’ experience writing for enterprise business audiences. She has had the privilege of creating all sorts of copy for some of the world’s biggest companies and is a regular contributor to The Record, Compass, and IT Pro.