Driverless Cars Get Confused by Poses of Objects, Sit Up Straight Only

Dr. Lance B. Eliot, AI Insider

Researchers at Auburn University showcase the object poses problem

Take an object nearby you and turn it upside down.

Assuming that you’ve turned an object upside down, look at it. Do you still know what the object is? I’d bet that you do.

But why would you? If you were used to seeing it right-side up, presumably you should be baffled at what the object is, now that you’ve turned it upside down.

I’m guessing that you are puzzled that I would even suggest that you should be puzzled. Of course, you recognize what the object is. No big deal. It seems silly perhaps to assert that the mere act of turning the object upside down should impact your ability to recognize the object. You might insist that the object is still the same object that it was a moment ago. No change has occurred. It is simply reoriented.

Not so fast. Your abilities as a grown adult are helping you quite a bit on this seemingly innocuous task. Years upon years of cognitive maturation have made it easy for you to perceive an object when it is reoriented.

I could get you to falter somewhat by hiding an object behind my back and then suddenly presenting it to you upside down. Without first seeing it in a right-side up posture, the odds are that it would take you a few moments to figure out what the upside-down object was.

Turning an object upside down before presenting it can make identification a challenge, even for adults. Your mind examines the upside-down object and perhaps reorients it mentally, creating a picture in your mind and flipping that picture to a right-side up orientation to make sense of it.

AI Self-Driving Cars and Object Orientations in Street Scenes

What does this have to do with AI self-driving cars?

At the Cybernetic AI Self-Driving Car Institute, we are developing AI software for self-driving cars. One of the major concerns that we have, and the auto makers have, and tech firms have, pertains to Machine Learning or Deep Learning that we are all using today, and which tends to be ultra-brittle when it comes to objects that are reoriented.

This is bad because it means that the AI system might either fail to recognize an object due to its orientation, or misclassify the object, and end up tragically getting the self-driving car into a precarious situation.

I’d like to first clarify and introduce the notion that there are varying levels of AI self-driving cars. The topmost level is considered Level 5. A Level 5 self-driving car is one that is being driven by the AI and there is no human driver involved.

For self-driving cars less than a Level 5, there must be a human driver present in the car. The human driver is currently considered the responsible party for the acts of the car. The AI and the human driver are co-sharing the driving task.

Another key aspect of AI self-driving cars is that they will be driving on our roadways in the midst of human driven cars too.

Artificial Neural Networks (ANN) and Deep Neural Networks (DNN)

Returning to the topic of object orientation, let’s consider how today’s Machine Learning and Deep Learning works, along with why it is considered at times to be ultra-brittle. We’ll also mull over how this ultra-brittleness can spell sour outcomes for the emerging AI self-driving cars.

Take a look at Figure 1.


Suppose I decide to craft an Artificial Neural Network (ANN) that will aid in finding street signs, cars, and pedestrians in images or streaming video from a camera on a self-driving car. Typically, I would start by finding a large dataset of traffic-setting images that I could use to train my ANN. We want this ANN to be as full-bodied as we can make it, so we'll have a multitude of layers composed of a large number of artificial neurons; thus, we might refer to this kind of more robust ANN as a Deep Neural Network (DNN).
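As a rough illustration only, not the production pipeline an auto maker would use, such a network is at heart a stack of layers that transforms a flattened image into class scores. Here's a minimal numpy sketch, with made-up layer sizes and random weights standing in for trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Made-up sizes: 32x32 RGB images, two hidden layers, 3 classes
# (say, street sign / car / pedestrian).
layer_sizes = [32 * 32 * 3, 256, 64, 3]
weights = [rng.normal(0, 0.01, (m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]

def forward(images):
    """Flatten a batch of images and push it through the layers."""
    x = images.reshape(len(images), -1)
    for w in weights[:-1]:
        x = relu(x @ w)
    return softmax(x @ weights[-1])  # per-class probabilities

batch = rng.random((4, 32, 32, 3))  # 4 dummy "camera frames"
probs = forward(batch)              # shape (4, 3), rows sum to 1
```

A real self-driving system would of course use trained convolutional layers rather than random dense ones; the point is only the layered shape of the computation.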

You might wonder how I will come upon the thousands upon thousands of images of traffic scenes. I need a rather large set of images to be able to appropriately train the DNN. My best bet would be to go ahead and use datasets that already exist.

Indeed, some would say that the reason we’ve seen such great progress in the application of Deep Learning and Machine Learning is because of the efforts by others to create large-scale datasets that we can all use to do our training of the ANN or DNN.

This also means there is a kind of potential vulnerability taking place, one that is not so obvious. In a moment, I'll provide you with an example involving military equipment images that highlights this vulnerability.

Once I’ve got my dataset or datasets and readied my DNN to be trained, I would run the DNN over and over, trying to get it to find patterns in the images. For example, those yellow lengthy blobs that have big tires and lots of windows are school buses.

I am hoping that the DNN is generalizing sufficiently about the objects, in the sense that if a yellow school bus is bright yellow, it is still a school bus, while if it is maybe a dull yellow due to faded paint and dirt and grime, the DNN should still be classifying it into the school bus category.
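One common way to nudge the network toward that kind of generalization is to augment the training images so it sees duller and brighter variants of the same object. A hedged numpy sketch (the scale range here is an illustrative assumption, not a recommendation from any study):

```python
import numpy as np

def jitter_brightness(image, rng, low=0.5, high=1.2):
    """Randomly scale pixel intensities so a bright yellow bus
    also shows up in training as a faded, dimmer version of itself."""
    scale = rng.uniform(low, high)
    return np.clip(image * scale, 0.0, 1.0)

rng = np.random.default_rng(42)
bus = np.full((8, 8, 3), [1.0, 0.85, 0.1])  # a crude "bright yellow" patch
faded = jitter_brightness(bus, rng)          # same patch, random brightness
```

Applied across the dataset, the network is less likely to latch onto "bright yellow" as the defining feature of a school bus.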

There is a famous story that highlights the dangers of making this kind of assumption about the manner in which the pattern matching takes place. The story goes that there were pictures of Russian military equipment, like tanks and cannons, and there were pictures of United States military equipment. Thousands of images containing those kinds of equipment were fed into an ANN. The ANN seemed able to discern between the Russian military equipment and the United States military equipment, presumably due to the differences in the shapes and designs of their respective tanks and cannons.

Turns out that upon further inspection, the pictures of the Russian military equipment were all grainy and slightly out of focus, while the United States military equipment pictures were crisp and bright. The ANN pattern matched on the background and lighting aspects, rather than the shape of the military equipment itself. This was not readily discerned at first because the same set of images was used to train the ANN and then test it. Thus, the test set was also grainy for the Russian equipment and crisp for the U.S. equipment, misleading one into believing that the ANN was doing a generalized job of gauging the object differences, when it was not.

This highlights an important aspect for those using Machine Learning and Deep Learning, namely trying to ferret out how your ANN or DNN is achieving its pattern matching. If you treat it utterly like a black box, there might be ways in which the pattern matching has landed that won't be satisfactory when the ANN or DNN is used in real-world ways. You might have thought that you did a great job, but once the ANN or DNN is exposed to other images, beyond your datasets, it could be that the characteristics used to classify objects are revealed as brittle and not what you had hoped for.
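One simple way to peek inside the black box is occlusion probing: slide a blank patch across the image and watch how the classifier's score changes. If the score collapses when the background is covered rather than the object, the network has latched onto the wrong cue, as in the tank story. A toy sketch, using a stand-in scoring function in place of a real DNN:

```python
import numpy as np

def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Score drop when each patch-sized region is blanked out."""
    base = score_fn(image)
    h, w = image.shape
    drops = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill
            drops[i // patch, j // patch] = base - score_fn(occluded)
    return drops

# Stand-in "classifier": scores by brightness of the top-left corner,
# mimicking a model that keys on background rather than the object.
score_fn = lambda img: img[:4, :4].mean()

image = np.ones((8, 8))
drops = occlusion_map(image, score_fn)
# Large drops appear only in the top-left cells, exposing what the
# "model" actually relies on.
```

With a real DNN, the same loop over an image of a tank would reveal whether the score depends on the tank or on the grainy sky behind it.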

Considering Deep Learning as Brittle and Ultra-Brittle

By the word “brittle” I am referring to the notion that the ANN or DNN is not doing a full-bodied kind of pattern matching and will therefore falter or fall down on doing what you presumably want it to do. In the case of the tanks and cannons, you likely wanted the patterns to be about the shape of the tank, its turret, its muzzle, its treads, etc. Instead, the pattern matching was about the graininess of the images.

Let’s liken this to my point about the yellow school bus. If the ANN or DNN is pattern matching on the color of yellow, and if perchance all or most of the images in my dataset were of bright yellow school buses, it could be that the matching is being done by that bright yellow color. This means that if I think that my ANN or DNN is good to go, and it encounters a school bus that is old, faded in yellow color, and perhaps covered with grime, the ANN or DNN might declare that the object is not a school bus.

One of the ways in which the brittleness of the ANN or DNN can be exploited involves making use of adversarial images. The notion is to confuse or mislead the trained ANN or DNN into misclassifying an object. This might be done by a bad actor, someone hoping to cause the ANN or DNN to falter.

One of the more startling examples of this adversarial trickery involved a one-pixel change that caused an apparent image of a dog to be classified by a DNN as a cat, which goes to show how potentially brittle these systems can be.
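To make the idea concrete, here is a deliberately toy version of a one-pixel attack: a random search for a single pixel whose change flips a classifier's label. The "classifier" here is a made-up brightness threshold, not a real DNN, and serves only to show the search mechanics:

```python
import numpy as np

def toy_classifier(image):
    """Stand-in classifier: 'dog' if mean brightness > 0.5, else 'cat'."""
    return "dog" if image.mean() > 0.5 else "cat"

def one_pixel_attack(image, rng, tries=200):
    """Random search for a single-pixel change that flips the label."""
    original = toy_classifier(image)
    for _ in range(tries):
        i = rng.integers(0, image.shape[0])
        j = rng.integers(0, image.shape[1])
        candidate = image.copy()
        candidate[i, j] = rng.random()  # overwrite just one pixel
        if toy_classifier(candidate) != original:
            return candidate, (i, j)
    return None, None

rng = np.random.default_rng(0)
image = np.full((3, 3), 0.52)          # barely classified as "dog"
adv, pixel = one_pixel_attack(image, rng)
# adv differs from image in exactly one pixel, yet gets a different label
```

Real one-pixel attacks against DNNs use smarter search (e.g., differential evolution) over far larger images, but the brittleness they exploit is the same: a decision boundary that a single input value can cross.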

Remember that I said earlier that by using the same datasets we are somewhat vulnerable; well, a bad actor can study those datasets too and try to find ways to undermine or undercut an ANN or DNN that has been trained on them. The dataset giveth and it taketh away, one might say. By having large-scale datasets readily available, the good actors can more readily develop their ANNs or DNNs, but the bad actors can also try to figure out ways to subvert those ANNs and DNNs by discovering devious adversarial perturbations.

Suppose we trained an ANN or DNN with thousands upon thousands of images of yellow buses. The odds are that the pictures of these yellow buses are primarily all of the same overall orientation, namely driving along on a flat road or maybe in a parked spot, sitting perfectly upright. The bus is right-side up.

If we were to tilt the bus, a human would likely still be able to tell you that it is a school bus. I could probably turn the bus completely upside down, if I could do so, and you’d still be able to discern that it is a school bus.

Fascinating Study of Poses Problems in Machine Learning

A fascinating new study by researchers at Auburn University and Adobe provides a handy warning that orientation should not be taken for granted when training your Deep Learning or Machine Learning system. Researchers Michael Alcorn, Qi Li, Zhitao Gong, Chengfei Wang, Long Mai, Wei-Shinn Ku, and Anh Nguyen investigated the vulnerability of DNNs by using adversarial techniques, primarily involving rotating or reorienting objects in images. These were mainly DNNs that had been trained on rather popular datasets, such as ImageNet and MS COCO.

Given that most real-world objects, like school buses and cars, are 3D objects, you can do the rotations or reorienting in three dimensions, altering the yaw, pitch, and roll of the object.
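Yaw, pitch, and roll are just rotations about the three axes, and composing the three rotation matrices gives the full 3D reorientation. A numpy sketch (the axis convention chosen here is illustrative; conventions vary):

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Compose rotations about the z (yaw), y (pitch), and x (roll)
    axes. Angles are in radians."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz,  cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

# A point on the front of the "bus", yawed 90 degrees to the left:
front = np.array([1.0, 0.0, 0.0])
turned = rotation_matrix(np.pi / 2, 0.0, 0.0) @ front  # -> [0, 1, 0]
```

Sweeping these three angles over a 3D model of an object is essentially how reoriented poses of it can be rendered into test images.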

The researchers used a Photoshop-like technique, and took an image of a yellow school bus and tilted it a few degrees, and went further to conjure up an image of the bus turned on its side. To the human eye, these adversarial changes are blatantly obvious.

For the school bus, some of the reorientations caused the ANN or DNN to report that it was a garbage truck, or that it was a punching bag, or that it was a snowplow.

For the objects that they converted from their normal or canonical poses in the images to a different pose stance, they were able to get the selected DNNs to misclassify 97% of the time. You might assume that this only happens when the pose is radically altered. You’d be wrong. They tried various pose changes and found that an approximate 10% yaw change, an 8% pitch change, or a 9% roll change was enough to fool the DNN.

Objects they studied included a school bus, park bench, bald eagle, beach wagon, tiger cat, German shepherd, motor scooter, jean, street sign, moving van, umbrella, police van, and a trailer truck. That’s enough of a variety that I think we can reasonably suggest that it showcases a diversity of objects and therefore is generalizable as a potential concern.

Variant Poses Suggest Ultra-Brittleness

Many people refer to today’s Machine Learning and Deep Learning as brittle. I’ll go even further and claim that it is ultra-brittle. I do so to emphasize the dangers we face with today’s ANN and DNN applications. Not only are they brittle with respect to feature detection of objects, such as a bright yellow versus a faded yellow, they are brittle when you simply rotate or reorient an object. That’s why I call this ultra-brittle.

In the real-world, when an AI self-driving car is zooming along at 80 miles per hour, you certainly don’t want the on-board AI and ANN or DNN to misclassify objects due to their orientation.

I remember one harrowing time when I was driving my car and another car, going in the opposing direction, came across a tilted median that was intended to separate the two directions of traffic. The other car was on an upper street and I was on a lower street.

I don’t know whether the driver was drunk or had perhaps fallen asleep, but in any case, he dove down toward the lower street. His car came down at quite a steep angle.

What would an AI self-driving car have determined?

What To Do About the Poses Problem

There are several ways we can gradually deal with this issue of the poses problem.

They include:

  • Improve the ANN or DNN algorithms being used
  • Improve the mix of poses in the datasets used for training

The Key 4 A’s of Datasets for Deep Learning

When putting together Machine Learning datasets that you’ll use for training purposes, you should think of the mixture in the following way:

  • Anticipated poses
  • Adaptation poses
  • Aberration poses
  • Adversarial poses

It’s the 4 A’s of poses or orientations.

We want to have some portion of the dataset with the anticipated poses, which are usually the right-side up or canonical orientations.

We want to have some portion of the dataset with the adaptation poses, namely postures that you could reasonably expect to occur from time-to-time in the real-world. It’s not the norm, but nor is it something that is extraordinary or unheard of in terms of orientation.

We want to ensure that there are a sufficient number of aberration poses, entailing orientations that are quite rare and seemingly unlikely.

And we want to have some inclusion of adversarial poses that are, let’s say, concocted and would not seem to ever happen naturally, but which we want to use so that if someone is determined to attack the ANN or DNN, it has already encountered those orientations. Note that this is not preparation for pixel-level attacks, which are handled in other ways.

You need to be reviewing your datasets to ascertain what mix you have of the 4 A’s. Is it appropriate for what you are trying to achieve with your ANN or DNN? Does the ANN or DNN have enough sensitivity to pick-up on the variants? And so on.
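That review can be as simple as tagging each training image with its pose category and counting the shares. A hypothetical sketch (the category labels and target mix are illustrative, not prescriptions):

```python
from collections import Counter

# Hypothetical pose-category labels attached to each training image.
dataset = (["anticipated"] * 700 + ["adaptation"] * 200 +
           ["aberration"] * 70 + ["adversarial"] * 30)

def audit_mix(labels):
    """Report each pose category's share of the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: counts[cat] / total for cat in
            ("anticipated", "adaptation", "aberration", "adversarial")}

mix = audit_mix(dataset)
# mix now maps each of the 4 A's to its fraction of the dataset,
# e.g. 70% anticipated, 20% adaptation, 7% aberration, 3% adversarial.
```

Whatever the right proportions are for your application, making the mix explicit and auditable is the point; a dataset that is 100% anticipated poses is exactly the setup the Auburn study exploited.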


Those of us in AI know that the so-called “object recognition” that today’s ANN and DNN are doing is not anything close to what humans are able to do in terms of object recognition.

Contemporary automated systems are still rudimentary. This could be an impediment to the advent of AI self-driving cars. The objects orientation poses problem is real and needs to be dealt with for real-world applications.



To follow Lance Eliot on Twitter: @LanceEliot

Copyright 2019 Dr. Lance Eliot

Written by

Dr. Lance B. Eliot is a renowned global expert on AI, a Stanford Fellow, was a professor at USC, headed an AI Lab, and was a top exec at a major VC.
