Dr. Lance B. Eliot, AI Insider
When my children were about kindergarten age, I told them about the mammal known as a platypus. I described how it has fur, has webbed feet like an otter, lives mainly in the water, has the tail of a beaver and a snout like a duck, and that they would be unlikely to spot one here in California. From my description, I’m sure they were dubious that such a creature actually existed, since it seemed like a descriptive mishmash of other animals that they were familiar with, and perhaps I was trying to pull a fast one on them (I had told them earlier about grunion and, after numerous grunion hunts, we had yet to see one!).
A few months later, we went on vacation to a zoo, and the moment we came upon an actual pen of platypuses, I was pleasantly surprised that the children immediately pointed and exclaimed that we were seeing real platypuses in person. I had not prompted them to look for any platypuses at the zoo. I had not mentioned anything at all about platypuses beyond my 15-second description, casually offered while we were driving home from school one day those several months earlier.
Smart kids, I reasoned.
Let me give you another example of their genius (proud father, you betcha!).
We had coyotes that were sometimes near where we lived, and the children had seen them from time-to-time at a nearby open preserve. There was even one occasion whereby a coyote dared to come into the local community of homes and wandered throughout the neighborhood late one night. This created quite a stir and there was an impetus by the community leaders to establish ways to try and keep the coyote and any wandering brethren from coming in.
After my children had seen coyotes in and around our neighborhood and become accustomed to seeing these creatures, one day I showed the kids a textbook picture of a coyote and I also showed them a textbook picture of a wolf. I offered no verbal explanation of the similarities and differences between a coyote and a wolf. I let them observe the picture for themselves. I merely pointed out to them that there are coyotes, which they had already seen with their actual eyes, and there are wolves (we didn’t have any wolves nearby where we lived, thankfully).
You likely know that wolves tend to have rounded ears, while coyotes tend to have taller, pointed ears. Wolves tend to be larger than coyotes. From there, the differences start to get less distinguishable, since the fur of both animals is quite similar and in many other physical ways they appear very much the same. I could have mentioned that wolves tend to howl while coyotes tend to make a yapping sound, but in this case, I merely silently showed them a picture of the two types of animals.
Fast forward to a trip to the local snow-capped mountains, where we would go to try and get some skiing in (note, the city of Los Angeles itself gets no snow and thus if you want to ski outdoors, you need to go up to the local mountains, which is about a 2 hour drive or so; on some days, you can go surfing in the morning at the beach and then get up to the mountains to go skiing for the afternoon).
We were walking through the thick snow and suddenly a wolf came out of the woods and stood in front of us, maybe 20 yards away. It was surprising that the wolf would appear like this, since there were usually a lot of humans wandering around in this area. But we had stayed late, and it was getting dark, plus we were the only humans left in this particular spot, so perhaps the wolf felt like it was not in any particular risk or danger of making an appearance. I wasn’t sure what the intentions of the wolf were. It certainly startled me and took my breath away as I tried to decide what to do next.
Meanwhile, the kids both whispered “wolf” and they knew this was a dangerous predicament. I was somewhat surprised they had not said “coyote,” since we were generally used to seeing coyotes and it probably should have been the closest match to what we were now seeing in front of us.
Of course, they were right that it was a wolf. We waited a few moments and fortunately the wolf retreated back into the woods. I skedaddled out of there with the kids in rapid tow.
Why do I tell these two stories?
In the case of the wolf, the children had seen coyotes and so knew what a coyote looked like. I had shown them one picture of a wolf. From that one picture, they were able to identify a wolf when they saw one during our snowy adventure. You might say this is an example of one-shot learning. They had learned about wolves by merely having seen one picture of a wolf.
In the case of the platypuses, they had not seen any picture of a platypus and I had merely provided a verbal description. Yet, when seeing platypodes at the zoo, they right away recognized them. You might say this is an example of zero-shot learning. They had not seen any example of a platypus, thus they had zero visual examples to extrapolate from, but had used the description to be able to match what they saw at the zoo to the definition of the animal.
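The attribute-matching the kids performed resembles how attribute-based zero-shot recognition is often sketched: classes are described by named attributes, and a new observation is matched against those descriptions. Here is a minimal illustrative sketch in Python; the class names, attributes, and the Jaccard-overlap scoring are all invented for illustration, not drawn from any real system:

```python
# Attribute-based zero-shot recognition: each class is described by named
# attributes (a stand-in for the verbal platypus description), and a new
# observation is matched to whichever description it overlaps with most.
# All class names and attributes here are illustrative, not a real dataset.

CLASS_DESCRIPTIONS = {
    "platypus": {"fur", "webbed_feet", "beaver_tail", "duck_snout"},
    "otter":    {"fur", "webbed_feet", "long_body"},
    "beaver":   {"fur", "beaver_tail", "buck_teeth"},
}

def zero_shot_classify(observed_attributes):
    """Return the class whose description best overlaps the observation."""
    def overlap(cls):
        desc = CLASS_DESCRIPTIONS[cls]
        # Jaccard similarity between the description and the observation.
        return len(desc & observed_attributes) / len(desc | observed_attributes)
    return max(CLASS_DESCRIPTIONS, key=overlap)

# A zoo sighting: fur, webbed feet, a flat tail, and a duck-like snout.
sighting = {"fur", "webbed_feet", "beaver_tail", "duck_snout"}
print(zero_shot_classify(sighting))  # platypus
```

No visual exemplars were used; the description alone carried the learning, which is the essence of the zero-shot idea.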
In traditional machine learning of today, most of the time we need to make use of thousands and upon thousands of examples of something to be able to train a Deep Neural Network or Deep Learning system on the item of interest. If you want to train an image processing system about platypuses via current Machine Learning (ML) techniques, you would gather up many thousands of pictures of platypuses and feed them into the system you had setup. Likewise, if you wanted to train the ML or DL on what wolves look like, you would need thousands of pictures of wolves.
When I say thousands, it could take hundreds of thousands of such pictures to get a solid matching capability out of the ML or DL. This also would take a fair amount of computer processing time. You’d also want to have screened the pictures to make sure you are feeding the right kinds of pictures into the system. If you are feeding pictures of platypuses that also have, say, alligators in them, and if you aren’t carefully scrutinizing the ML or DL, it could end-up mathematically conjuring up a notion that a “platypus” can look like a platypus or like an alligator.
That won’t do you much good when trying to find a platypus somewhere inside a picture that you later feed into the trained ML or DL system. Sure, it might identify platypuses, but if there happens to also be an alligator in any of those pictures, the ML or DL might falsely report that another platypus has been found in the picture that you submitted.
In fact, one of the dangers about blindly feeding inputs into a DL or ML during its training is that it might pattern match onto aspects that you did not intend to be included. There is a famous story of the pictures of military tanks that were fed into a DL or ML system. Some of the pictures were of United States tanks and some of the pictures were of Russian tanks.
At first, after the training was seemingly completed, the DL or ML could readily discern other test pictures of U.S. tanks and Russian tanks. The researchers thought they had finished the job. Turns out that the pictures of the U.S. tanks were pristine photos, while the Russian tanks were mainly grainy photos. The ML or DL had considered the background and overall look of the photos as part of the pattern matching effort, doing so in a mathematical way. Thus, if it was shown pictures of a Russian tank that was in a pristine photo, the DL or ML would sometimes classify it as a U.S. tank. Similarly, if it was shown a picture of a U.S. tank that was in a cloudy kind of photo, the DL or ML would sometimes mistake it as a Russian tank.
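The tank anecdote can be sketched in miniature: a nearest-neighbor “classifier” trained on examples where photo graininess accidentally correlates with the label will latch onto graininess rather than the tank itself. All feature values below are made up purely for illustration:

```python
# A toy illustration of the tank anecdote: a 1-nearest-neighbor classifier
# trained on examples where photo quality (graininess) accidentally
# correlates with the label. The feature values are invented.

# Each example: ((graininess 0..1, turret_width), label)
training = [
    ((0.1, 3.0), "US"),       # pristine photos of US tanks
    ((0.1, 3.2), "US"),
    ((0.9, 3.1), "Russian"),  # grainy photos of Russian tanks
    ((0.9, 2.9), "Russian"),
]

def classify(features):
    """1-nearest-neighbor over raw features, graininess included."""
    def dist(example):
        f, _label = example
        return sum((a - b) ** 2 for a, b in zip(f, features))
    return min(training, key=dist)[1]

# A Russian tank photographed in pristine conditions: the low graininess
# dominates the distance, so the classifier wrongly calls it a US tank.
print(classify((0.1, 2.9)))  # US
```

The fix in practice is to curate the dataset so that the spurious feature (here, graininess) no longer correlates with the label.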
Many Examples Train Today’s DL or ML Systems
In any case, the point is that to train today’s DL or ML systems, you typically need to assemble a whole bunch of examples. This can be arduous to do. It can be costly to do. You need to make sure that the examples are representative of what you are trying to train for. You need to make sure that there is nothing extraneous that can potentially throw-off the pattern matching. You need to run the DL or ML for many iterations and chew-up lots of computer processing cycles, which can be costly. You then need to try and verify that what the pattern matching has found is something sensible.
Suppose instead that you could show a DL or ML an example based on just one picture. Imagine how easy it would be to then train the DL or ML. Here’s a picture of a wolf. Via that one picture alone, it would be great if the DL or ML was then essentially done being trained. With one-shot learning, you had avoided having to collect thousands of examples and dealing with all the other troubles of doing the training.
Maybe you don’t even have a photo of what you are trying to train the DL or ML on. Wouldn’t it be great if you could somehow just describe what you want the DL or ML to pattern toward, and henceforth it could find that for you? This would be akin to my description of a platypus, from which the children were able to recognize one when they actually saw it in person.
You have now been introduced to one of the most vexing problems facing today’s Machine Learning and Deep Learning. Namely, if humans can learn something by a one-shot or by a zero-shot, why can’t we get ML or DL systems to do the same?
It is said that by the age of 6, children know roughly 1 × 10⁴ (about ten thousand) categories of objects. Based on these categories, when they are presented with something new, children seem cognitively able to model it, even without having to see thousands of examples of whatever the item is.
Think about it. Have you seen children sitting quietly and studying thousands upon thousands of photos of elephants to be able to figure out what an elephant most likely looks like? I don’t think so. And yet that’s what we are doing today to train the ML and DL systems that are being used in all kinds of ways.
There are some who suggest that one-shot can be flexible and that doing something with just a handful of examples is about the same as doing so with one example only. Therefore, they lump those few-shots into a one-shot. They justify this by pointing out that they are not stretching the one-shot to be, say, a hundred examples or a thousand examples. Maybe a half-dozen or a dozen, they suggest, makes it pretty much the same as a one-shot.
I’ll be a bit of a stickler herein and suggest that one-shot should literally mean one shot, and offer that we can use these ways of depicting the aspects of Machine Learning in terms of the number of “shots” or examples that are needed:
- Zero Shot Learning (ZSL) = there are no learning exemplars used per se
- One-Shot Learning (OSL) = one exemplar is used to learn from
- Few-Shots Learning (FSL) = more than one exemplar is used for learning, up to roughly ten exemplars or a dozen or so
- Many-Shots Learning (MSL) = more than the FSL, let’s say tens to perhaps hundreds to thousands of exemplars
- Mega-Many Shots Learning (MMSL) = more than MSL, let’s say many thousands and possibly millions of exemplars
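The taxonomy above can be expressed as a simple lookup from exemplar count to category; the boundary values (a dozen for FSL, thousands for MSL) are the loose ones suggested in the list, not standard cutoffs:

```python
# The article's shot taxonomy as a lookup from exemplar count to category.
# The boundaries (12, 10_000) are illustrative, matching the loose ranges
# suggested in the list above, not established standards.

def shot_category(num_exemplars):
    if num_exemplars == 0:
        return "ZSL"   # Zero-Shot Learning
    if num_exemplars == 1:
        return "OSL"   # One-Shot Learning
    if num_exemplars <= 12:
        return "FSL"   # Few-Shots Learning (up to roughly a dozen)
    if num_exemplars <= 10_000:
        return "MSL"   # Many-Shots: tens to thousands
    return "MMSL"      # Mega-Many Shots: many thousands to millions

print([shot_category(n) for n in (0, 1, 5, 500, 1_000_000)])
# ['ZSL', 'OSL', 'FSL', 'MSL', 'MMSL']
```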
The desire would be to always aim at the least number of exemplars needed to do Machine Learning, which makes sense because the more exemplars you need, the greater, generally, the effort and cost involved in finding the exemplars, preparing the exemplars, and otherwise undertaking the whole ML process.
If possible, we’d like to minimize the effort/cost to arrive at the needed ML.
Is it always going to be possible to find a means to get the number of exemplars down to the zero (ZSL), one (OSL), or few (FSL) categories of shot learning? Maybe yes, maybe no.
Cognitive development studies of children tend to suggest that how words are learned via sounds involves babies hearing hundreds and thousands of words and sentences that are spoken to them or near to them. When you talk to a baby, even though you might assume the baby is not “understanding” what you are saying, it is actually pattern matching your spoken sounds.
When you speak aloud a sentence, there are short gaps of silence between your words, and another slightly longer gap of silence between your sentences. You are so accustomed to these gaps that you don’t even realize they exist. Babies hearing your spoken utterances are noting these silence gaps and garnering a kind of pattern matching about the nature of the spoken word. They grasp that words are shorter, and sentences are longer, and that sentences can have some number of these shorter sounding things in them.
I remember when my children were first born, people would speak to them in baby-talk, such as cooing at them and saying nonsense sounds like ba-ba and boo-boo. Supposedly, these kinds of sounds are not going to help the baby formulate the kinds of “learning” best needed to understand true spoken language. You are making up a strange and out-of-sorts kind of nonsense language, which doesn’t do them much good; you instead should speak to the baby in normal adult language, which allows the baby to begin to learn true spoken language.
The point is that formulating an understanding of spoken language appears to require a budding mind to hear hundreds and likely thousands upon thousands of exemplars of spoken words and sentences. Can this be reduced to just zero, one, or a few exemplars? It seems unlikely.
One aspect that we also need to keep in mind is the nature of the learning outcome.
Let’s consider my wolf example earlier. The kids said that the animal we saw in the snowy woods was a wolf. They got this right. Does this imply they learned what a wolf looks like? It is a bit overly generous to say so, because they might have just been wildly guessing. Maybe they had no idea of the differences between a coyote and a wolf. Instead, they might have somehow else labeled this creature that came out of the woods as a wolf.
We’ll also use the platypus example. I assumed that the children had mentally calculated that the creature we were seeing at the zoo had the requisite features of the otter’s webbed feet, the beaver’s tail, and the duck’s snout. Suppose instead the kids used solely the duck-like snout to label the animal as a platypus. This is not going to be handy for future circumstances of them encountering other kinds of animals that also have a snout-like feature, which my children might decide to call platypuses too.
Maybe if I had shown the kids pictures of platypuses, they might have realized that all three of the features were needed (webbed feet, snout, beaver’s tail). Could I have achieved this with just one such picture of a platypus? Or would I have needed a few such pictures? Or would I have needed hundreds or thousands of pictures?
Effort to Minimize Number of Exemplars Used for Machine Learning
The crux is that we want to try and minimize the number of exemplars used for Machine Learning, but the question arises as to whether we can get the same kind of learning outcomes by doing so. If you are able to get the ML system to “learn” based on one exemplar, such as a picture of a wolf, but if the learnt result is narrow and unlikely to be robust enough for our needs, the learning itself has been insufficient and the minimal number of exemplars hasn’t really aided our learning hopes.
Does this imply that the more the exemplars, the better off you will be? Suppose we line-up a dataset of a million pictures of dogs. All kinds of dogs. Big dogs, small dogs. Dogs that are happy, dogs that are sad. Dogs running, dogs walking, dogs sleeping. We feed these pictures into a Machine Learning system that we’ve setup.
After doing the training, we test the ML by feeding it some pictures of dogs that it had not been trained on. Let’s assume the ML reports that those are pictures of dogs. Great! Meanwhile, we decide to also now feed a few pictures of cats into the ML system. It reports that they are dogs! What’s this, a cat being mistaken for being a dog? The global union of cats will protest in droves, as they don’t want to be miscast as dogs.
It could be that the ML opted to identify that any four-legged creature was a dog. Thus, when it received the pictures of some cats, after having done the training on the million dog pictures, it found that the cats had four legs and therefore reported they were dogs. Easy peasy.
Using tons of training exemplars does not guarantee the kind of learning outcomes we desire. Presumably, the more exemplars the better off you will be in terms of potentially getting good learning outcomes, but it is not axiomatic that larger datasets mean you’ll get more robust learning outcomes.
There’s something else we need to factor into the Machine Learning aspects, namely time.
Have you ever done one of those “Escape The Room” challenges? You go into a locked room and need to find your way out. The first time you do so, the odds are that you might at first be confused as to what to do. How are you supposed to find your way out? If you’ve never done one before, you might be completely bewildered as to what to do and where to even start to find a way out.
Upon seeing someone else in the room that opts to look for clues, you likely realize that you too need to try and find clues. You are a fast learner! Yes, you went from being utterly baffled to the realization that there are clues hidden in the room and you must find the clues, from which you can then potentially find a way out of the room.
In this case you were time-boxed in that the room escape is usually timed and you only have a limited amount of time to find the clues and ferret out how to escape. There is the time needed to actually discover the clues, decipher them, and then use those clues to escape. There is also the time needed to “learn” how to cope with being inside an escape room and learning how to proceed to escape it.
Upon seeing an exemplar of the other person in the escape room that was feverishly looking for a clue, you quickly learned how to play the game. Sometimes we might sit in classrooms for weeks or months learning something, such as say calculus or chemistry. Sometimes we need to learn on-the-fly, meaning that there is a time crunch involved.
The learning can be unsupervised or it can be supervised. Inside the escape room, suppose the other person was so intent on finding clues that they did not explain to you what they were doing. All you had to go on was the aspect that the person was fervently looking around the room. In that sense, you learned they were looking for clues and did so in an unsupervised manner, namely the person did not guide you or explain what to learn. If the other person had told you that you needed to start looking for clues, and then perhaps told you to look behind the painting hanging on the wall and look under the desk, this would be more of a supervised kind of learning.
Back to the Machine Learning aspects, there is a trade-off between doing supervised versus unsupervised learning. It could be that if the ML is supervised and given direction and pointers, it will have a better learning outcome, but this also usually requires added effort and cost versus the unsupervised approach. In the escape room, every moment that the other person spends telling you what to do is perhaps depriving them of seeking clues and aiding the escape; therefore, there is a “cost” involved in their supervising you versus if they had not done so.
Another factor involves what you already know and how your prior knowledge plays into what you are trying to learn anew.
Suppose my children had already known something about wolves. Perhaps they had seen cartoons on the Saturday morning TV shows that depicted wolves. These might have been simply cartoon-like wolves. Upon seeing the picture of an actual wolf, which I showed them along with the picture of the coyote, they now could connect together the actual wolf picture with the cartoon images of wolves they had already seen. In that case, they were leveraged in the learning because they already had prior background that was useful to the item they were newly learning.
Once you’ve done an escape room challenge, the odds are that the next time you do one, you’ll be more proficient. Furthermore, it might also mean that when you do the second one, you’ll be able to learn new tricks about how to escape a room, which layers onto the tricks you learned from the first time you did an escape room. Our prior foundation of what we know can be a significant factor in how well and how fast we can learn something new.
There are numerous attempts underway of trying to find ways to improve Machine Learning and Deep Learning to be able to do one-shot or few-shots kind of learning.
Siamese Neural Network Tries for One-Shot Goal
For example, the Siamese neural network is a variant on the use of neural networks that tries to deal with the one-shot goal. Taking its name from the concept of Siamese twins, you have two (or more) neural networks that you setup and train in the same manner. They are twins. You then have a conjoining element which is going to measure the “distance” of their outputs in terms of whether their outputs are considered quite similar versus being quite dissimilar.
Using a pair-wise comparison technique, you can use the Siamese neural network to compare two (or more) inputs and try to determine if they are likely the same or different. Let’s say I provide a picture of a dog and a cat. Based on a numeric vector that is output from each of the two neural networks, one receiving the dog picture and the other receiving the cat picture, the conjoining distance estimator would hopefully indicate that there is a large numeric difference between the outputs, which suggests the cat is not the same as the dog.
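The pair-wise comparison step can be sketched as follows; the embedding function here is a stand-in for the trained twin networks (which share their weights), and the feature vectors and distance threshold are invented purely for illustration:

```python
# Schematic of the Siamese comparison step: the same (shared-weight)
# embedding is applied to both inputs, and the L1 distance between the two
# output vectors decides same vs. different. The embedding below is a
# stand-in for a trained twin network; vectors and threshold are invented.

def embed(features):
    """Shared embedding applied to both inputs (the 'twin' networks)."""
    # In a real Siamese network this would be a learned neural network;
    # here we pass the feature vector through unchanged.
    return list(features)

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def same_class(x1, x2, threshold=1.0):
    """Pairwise verdict: small embedding distance => likely same class."""
    return l1_distance(embed(x1), embed(x2)) < threshold

dog_a = [0.9, 0.1, 0.2]   # made-up feature vectors
dog_b = [0.8, 0.2, 0.1]
cat   = [0.1, 0.9, 0.8]

print(same_class(dog_a, dog_b))  # True  (small distance)
print(same_class(dog_a, cat))    # False (large distance)
```

The one-shot payoff is that a single stored exemplar can be compared against any new input, rather than retraining the network per class.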
Another promising approach involves augmenting a Deep Neural Network with external memory. These Memory Augmented Neural Networks (MANN) leverage the connected external memory as a means to avoid various kinds of difficulties associated with neural networks that are being retrained. There is a chance during retraining of inadvertently “forgetting” prior aspects, a deficiency the external memory can potentially make-up for.
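In the spirit of a MANN, here is a minimal sketch of an external memory: new items are written as (key, label) pairs rather than being folded into network weights, so storing a new class never overwrites earlier entries. The “keys” below are plain feature vectors standing in for learned embeddings:

```python
# A minimal external-memory sketch in the spirit of a MANN: items are
# written as (key, label) pairs instead of being absorbed into weights,
# so adding a class never overwrites what was stored before. The keys
# are plain invented vectors standing in for learned embeddings.

class ExternalMemory:
    def __init__(self):
        self.slots = []  # list of (key_vector, label)

    def write(self, key, label):
        self.slots.append((list(key), label))

    def read(self, query):
        """Return the label of the nearest stored key (1-NN lookup)."""
        def dist(slot):
            key, _label = slot
            return sum((a - b) ** 2 for a, b in zip(key, query))
        return min(self.slots, key=dist)[1]

memory = ExternalMemory()
memory.write([1.0, 0.0], "wolf")    # one-shot: a single wolf exemplar
memory.write([0.0, 1.0], "coyote")

print(memory.read([0.9, 0.1]))  # wolf
```

One write suffices to make a new class retrievable, which is why external memory is attractive for the one-shot goal.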
There are other approaches, such as Hierarchical Bayesian Program Learning (HBPL) and other kinds of Bayesian one-shot algorithms, that are being explored. One of the most popular datasets used in examining one-shot learning is the famous Omniglot dataset, which consists of various handwritten characters and involves trying to do handwriting recognition from a sparse set of exemplars.
Efforts to achieve one-shot learning are ongoing and eagerly pursued so as to reduce the burden involved in having to gather lots of exemplars; plus, it is hoped or assumed that needing fewer exemplars will also reduce the amount of learning time needed.
Humans seem to have a capacity to do one-shot learning. It is not always perfect and people can readily learn “the wrong thing” based on a one-shot approach. Nonetheless, it seems to be a crucial cognitive capability and one that we humans depend upon greatly.
What does this have to do with AI self-driving cars?
At the Cybernetic AI Self-Driving Car Institute, we are developing AI software for self-driving cars. One aspect that we are exploring involves the use of one-shot learning for AI self-driving cars.
I’d like to first clarify and introduce the notion that there are varying levels of AI self-driving cars. The topmost level is considered Level 5. A Level 5 self-driving car is one that is being driven by the AI and there is no human driver involved.
For self-driving cars less than a Level 5, there must be a human driver present in the car. The human driver is currently considered the responsible party for the acts of the car.
Another key aspect of AI self-driving cars is that they will be driving on our roadways in the midst of human driven cars too.
Returning to the topic of one-shot learning, let’s consider how this kind of learning comes to play with AI self-driving cars.
When you first learned to drive a car, the odds are that much of what you learned was new to you, though it occurred within the context of a lot of other things that you already knew. You did not wake-up one morning with an empty noggin and suddenly find yourself sitting in the driver’s seat of a car. Instead, you brought to your learning about how to drive a car the various prior experiences of life and dealing with all kinds of aspects of being in this world.
For example, no one needed to likely explain to you that there are these things called streets and that cars can drive on them. I’d bet that you already knew this before you opted to turn the key and start the engine. You knew that there are other cars on the roadways. You knew that cars can go fast and they can go slow. You knew that there are turns to be made and various rules-of-the-road are to be observed and abided by. You likely had been a passenger in a car, many times before, and knew somewhat the nature of the act of driving. And so on.
Imagine if we found one of those hidden-in-the-jungle humans that has never had contact with the outside world, and we opted to put them behind the wheel of a car. They’d have no particular knowledge about streets, cars, and all of the rest of those aspects. It would be a steep learning curve for them to cope with how to drive a car. I don’t know of any such situations wherein someone from the hidden jungles has suddenly been asked to drive a car, and so for now let’s assume that by-and-large most people learned to drive a car when they already had a lot of prior knowledge generally about cars and what happens when you drive a car.
You typically learn to drive a car over an extended period of time, perhaps weeks or months in duration. With my children, I would take them to an empty parking lot at a mall, and they’d drive round and round for an hour or so. We’d do this repeatedly, a few days a week. Gradually, we’d build up towards trying to drive in the local neighborhood, doing so when the streets were relatively empty. After a while, I’d have them drive into community traffic situations and get used to that kind of driving. Eventually, we worked-up the nerve to go onto the crazed freeways at high speeds.
AI Self-Driving Cars Like a Hidden-in-the-Jungle Human
In terms of AI for self-driving cars, one of the key problems is that unlike humans, the AI we’re starting with has no semblance of what a teenager has about the nature of the world around them. The AI is like the hidden-in-the-jungle human, since it has essentially no background or prior knowledge per se about cars, streets, and all the rest. I would assert that the AI is even worse off than the human from the jungle, since the human from the jungle presumably has cognitive capabilities and we could likely readily teach the person about the nature of streets, cars, etc.
For Machine Learning aspects, the primary focus to-date in AI for self-driving cars has been the processing of sensory data. When the AI receives sensory data, the data needs to be analyzed to ascertain what it has to indicate about the world surrounding the self-driving car. There are visual images coming from the cameras and image processing needs to occur in an effort to ferret out whether there is a car ahead and whether there are pedestrians in the roadway. The same kind of sensory processing needs to be done for the radar, the LIDAR, the ultrasonic sensors, and any other kind of sensory devices on the self-driving car.
Somehow, we need to have the AI system “learn” to find in that sensory data the aspects needed to then be able to properly and safely drive the car. This involves being able to extract from the massive amounts of sensory data the elements that are important to be considered. Where is the street ahead? Where are other cars? Are those cars coming toward the self-driving car or away from it? Are there any potential collisions that might happen? Etc.
Let’s use the aspect of road signs to consider the kind of learning involved. We might setup a Deep Neural Network that we feed with thousands upon thousands of pictures of road signs for training purposes. This includes stop signs, caution signs, deer crossing signs, and so on. We are seeking to have the Machine Learning be able to find the patterns associated with each of these signs and therefore be able to spot it when we capture images from the cameras on the AI self-driving car.
Assuming we’ve done a good job of training the neural network, we’ll go ahead and include it into the on-board system of the AI self-driving car. Sure enough, when images are being fed from the cameras, the on-board neural network is crunching the image data and able to ascertain that it found say a stop sign. The sensory analysis portion of the AI system doesn’t especially act on the fact that it found a stop sign and merely passes this detection onward to the other processes of the AI system (it is up to the AI Action Planning portion to ascertain what to do about the detected stop sign, such as issuing car commands to bring the car to a stop).
Once we’ve loaded-up the neural network into the on-board system, we’re going to freeze it from learning anything new, which we might do because we’re concerned that if it were allowed to continue to “learn” while in-the-wild, it might learn the wrong things. We are worried that it could somehow change from considering stop signs to be stop signs to instead interpreting them as merely caution signs (in which case, this would be passed along to the AI Action Planner, which would not likely bring the car to a stop since it has been misinformed about the nature of the posted sign).
One of the problems with not allowing the Deep Neural Network to learn “on the fly” is that it might encounter posted signs it has not yet seen and thus not try to figure out what the sign signifies. It might simply report that there is an unknown sign up ahead and let the AI Action Planner figure out what to do about it.
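One common way such a classifier can report an unknown sign, rather than force a wrong label, is to threshold the top softmax score. The scores, labels, and threshold below are invented for illustration and are not from any deployed system:

```python
# Reporting "unknown sign" instead of forcing a wrong label: threshold the
# classifier's top softmax probability. The scores and threshold here are
# invented; a real system would take scores from the network's output layer.

import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_sign(scores, labels, min_confidence=0.8):
    """Return the top label, or 'unknown sign' if confidence is too low."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    if probs[best] < min_confidence:
        return "unknown sign"
    return labels[best]

LABELS = ["stop", "caution", "deer crossing"]

print(classify_sign([9.0, 1.0, 0.5], LABELS))  # stop (confident)
print(classify_sign([1.2, 1.0, 1.1], LABELS))  # unknown sign (uncertain)
```

Thresholding only flags the unknown; it does not interpret it, which is exactly the gap that on-the-fly learning would need to fill.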
I remember one time while my children were still novice drivers that we came up to a quite unusual road sign (one that I had not seen before either). The road sign said, “Turn Right to Go Left.” What’s that? Seemingly an oxymoron. But, it actually did make sense due to the aspect that there was a kind of dog leg to the right that curved back and around a partial roundabout, allowing you to ultimately go to the left, which otherwise you could not directly make a left turn legally.
It was the kind of roadway sign that you figure won’t make-or-break you, meaning that it wasn’t a life or death kind of matter. If you didn’t notice the sign, it meant that you would not be able to make a rather immediate left and would need to go down another block to make a left turn. When I first spotted the sign, I looked and could see that some drivers either did not see the sign or ignored it, and they proceeded up a block to make a desired left turn.
With further pride in my heart, I watched as my novice driver detected the road sign, offering a bit of a startled look about it, carefully judged what it meant, and opted to make the right in order to make the left. This was done smoothly and without any apparent confusion. I would also bet that in the future, if such a sign was ever detected again, it would be a natural now for my child to know what it intended.
I’d say it was a one-shot learning instance. The roadway sign was detected, interpreted, utilized, and now has become part of the repertoire of road signs known by my offspring.
What would an AI self-driving car do?
Assuming it had not already been trained on such a road sign, which I’d wager was unlikely to be in a normal training dataset, the Deep Neural Network would have likely detected that the sign existed, and would have identified where it was positioned, but otherwise would not have been able to categorize what the sign was. It would certainly indicate that it was probably not a stop sign, and not a caution sign, and not a deer crossing sign, and so on. It would be considered unlike those signs and instead be a sign that was unknown as to what was intended by the sign.
The AI Action Planner could take a chance and assume that the sign had no significance to the driving task at hand. Suppose the AI Action Planner was hoping to turn left. It might opt to do what I had seen some other humans do, namely just proceed up a block and then make a normal left turn. In that manner, the AI would be getting kind of lucky that the sign wasn’t something more ominous like “Abyss in 5 Feet.”
If possible, it would be handy if the AI system could learn on-the-fly and figure out the meaning of the road sign. My novice teenage drivers were able to do so.
We Need One-Shot Learning for AI Self-Driving Cars
Essentially, we need to have one-shot learning for AI self-driving cars. I’d also welcome the possibility of zero-shot learning and few-shot learning. Any of those would be quite handy.
In this case, it was not a life-or-death situation, but there might be circumstances the AI encounters for which the lack of a one-shot learning mechanism could lead to complications or even injuries or deaths. I realize that some AI developers balk at my example. They say that if the data from the self-driving car is being fed to the cloud of the automaker or tech firm, using OTA (Over-The-Air) electronic communications, the cloud-based system might be able to better interpret the newly encountered road sign and then push back into the AI self-driving car the aspects of what to do.
Realistically, it is not likely that the OTA would have sufficient time to transmit the data, have the data crunched someplace in the cloud, devise an indication of what the sign meant, and then push a patch down into the AI self-driving car. That’s some kind of magic we don’t yet have.
Sure, ultimately, the hope is that the cloud-based systems will be collecting tons of data from fleets of AI self-driving cars, and that the collected data will be analyzed and the “learnings” about the driving task will be shared among the fleet of cars. This, though, is something that will take days or maybe weeks of the system analyzing those large volumes of data. Plus, the odds are that the AI developers will need to be in-the-loop as part of the analysis, ascertaining what makes sense to add as “new learnings” into the on-board AI of the self-driving cars in their fleet.
Here’s then where we are at on this topic. The bulk of the core “learning” for driving a self-driving car is most likely going to take place before the self-driving car gets onto the roadway. It will have the core essentials of the driving task.
Once it is on the roadway, we want the AI to have the capability to do one-shot learning so that it can hopefully better cope with the driving task.
The one-shot learning is likely to occur in real-time. Therefore, there is a severe time constraint involved.
We are only likely to get one exemplar and not have the luxury of somehow having dozens or hundreds of them in-hand (there weren’t any other “Turn Right to Go Left” signs anywhere nearby and none that I had ever seen before in my many miles of driving).
The AI is going to need to “learn” in a likely unsupervised setting. There is no one and nothing around that can guide or explain to the AI what the one-shot signifies.
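One common research approach to those constraints is to treat one-shot learning as nearest-neighbor matching in an embedding space: a single exemplar is stored, and future inputs are compared against it by distance. Here is a minimal sketch of that idea; in practice the embeddings would come from a trained network (for instance a Siamese model), whereas the small hand-made vectors below are purely illustrative:

```python
# Sketch: one-shot learning as nearest-neighbor matching in an embedding space.
# The embeddings here are tiny hand-made vectors for illustration only; a real
# system would produce them with a trained encoder network.
import math

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class OneShotSignMemory:
    """Stores one embedding per sign class; classifies by nearest stored exemplar."""
    def __init__(self):
        self.memory = {}  # label -> single stored embedding

    def learn_one_shot(self, label, embedding):
        # A single exemplar is enough to add a brand-new class on the fly.
        self.memory[label] = embedding

    def classify(self, embedding, max_distance=1.0):
        if not self.memory:
            return "unknown"
        label, d = min(((lbl, distance(embedding, e)) for lbl, e in self.memory.items()),
                       key=lambda t: t[1])
        return label if d <= max_distance else "unknown"

memory = OneShotSignMemory()
memory.learn_one_shot("stop", [1.0, 0.0, 0.0])
memory.learn_one_shot("turn_right_to_go_left", [0.0, 1.0, 0.9])

# A slightly different view of the novel sign still matches its single exemplar.
print(memory.classify([0.1, 0.9, 1.0]))  # -> turn_right_to_go_left
```

Note that this only handles the recognition side under a severe time constraint (storing the exemplar is instantaneous); it says nothing yet about what the sign means for the driving task.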
You might suggest that the AI could ask the passenger in the self-driving car and find out if the passenger knows what the sign means. Yes, this might be possible via the use of a Natural Language Processing (NLP) interface with the occupants of the self-driving car. But suppose the only occupants are small children who don’t have any clue about driving or road signs.
Or maybe the occupants misinterpret the road sign and tell the AI that it needs to make a radical right turn immediately. Should the AI obey such a suggestion? Also, consider that it might be somewhat disconcerting to the occupants that the AI has no clue what the sign says. I suppose you would weigh this reveal against the chances that the road sign is important and might lead to harming or killing the occupants; revealing that the AI doesn’t know what the sign means might be a last-gasp attempt to avoid calamity.
Much of the one-shot learning being researched by AI developers focuses on image recognition.
This makes sense, as image processing is a use case that we can all readily agree has potential value. If you are doing facial recognition, it would be better to do one-shot learning than to have to get a multitude of pictures of someone’s face. Humans seem to be able to see a person’s face one time and then have a remarkable ability to pick that face out of a crowd, even though they may only have seen the face that one time, perhaps long ago.
For an AI self-driving car, having one-shot learning for sign recognition as an image-processing-only solution is not quite sufficient. The rest of the AI driving tasks also need to “learn” what the sign means. The AI Action Planner won’t have any set of driving aspects that apply to the newly detected sign, and yet it is the part of the AI processing that must decide what driving tasks to take next upon detecting the sign.
Thus, the one-shot learning has to permeate the entire set of AI tasks being undertaken while driving the self-driving car. This is a much harder problem. Even if you were only dealing with being able to “recognize” the sign and categorize it, the question becomes what category to apply, and whether the AI Action Planner has anything ready to do when encountering this potential new category.
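The perception-versus-planning gap can be made concrete with a toy sketch: the classifier may now emit a new sign category, but the planner’s action table has no entry for it, so it needs a cautious fallback. The sign labels and action names below are entirely hypothetical, not a real planner’s vocabulary:

```python
# Sketch: recognizing a new sign category is not enough; the planning side
# also needs an action for it. Hypothetical action table for illustration.

SIGN_ACTIONS = {
    "stop": "brake_to_full_stop",
    "yield": "slow_and_check_traffic",
    "turn_right_to_go_left": None,  # newly learned category: no action yet
}

def plan_for_sign(sign_label):
    """Choose a driving action; fall back to a cautious default for any sign
    that has been recognized but has no learned action attached to it."""
    action = SIGN_ACTIONS.get(sign_label)
    if action is None:
        # The category exists (or is brand new), but the planner has
        # nothing ready for it, so it degrades gracefully.
        return "reduce_speed_and_defer_maneuver"
    return action

print(plan_for_sign("stop"))                   # -> brake_to_full_stop
print(plan_for_sign("turn_right_to_go_left"))  # -> reduce_speed_and_defer_maneuver
```

The hard part that this sketch glosses over is exactly the point of the article: filling in that `None` entry correctly, in real time, from a single encounter.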
A rather significant downside of any one-shot learning will be whether what has been learned is “correct” or not. I mentioned earlier that we might have a Machine Learning system that pattern-matches any four-legged animal as a dog, and therefore classifies cats as dogs. Suppose the AI on-board the self-driving car is able to do one-shot learning and, in the case of this turn-right-to-go-left sign, the AI “learns” that it should come to a stop and then try to make a left.
You might be shaking your head and asking why in the world would the AI “learn” that based on the sign it should come to a stop and try to make a left turn? Suppose that the AI self-driving car witnesses a human driven car up ahead that does just that, and the AI then falsely assumes that the posted sign and the action of that other car were correlated to each other. It might then henceforth assign the action of making a stop and an immediate left as the appropriate action when encountering the turn right to turn left sign.
This is the kind of difficulty associated with doing one-shot learning and doing so on-the-fly. It has rather obvious and potentially adverse consequences on the safety of the AI self-driving car and what it might do.
One-shot Machine Learning and its close cousins are a vaunted goal of AI.
There is still plenty of exploration and research to be done on this topic. It is worth pursuing because it will not only hopefully improve the capabilities of Machine Learning, but will likely also force us to further figure out how humans do one-shot learning. The more we can crack how humans think, the better the chance of getting AI imbued with human-like intelligence.
Next time you are trying to learn something, consider how many exemplars you need to figure out the matter. Our approach today of needing thousands upon thousands of exemplars for ML and DL does not seem like a viable way to always approach learning. Depending upon the foundation you are starting with, you should potentially be able to leverage that basis and do sensible and on-target one-shot learning. I think about this all the time, especially when I see a platypus.
For free podcast of this story, visit: http://ai-selfdriving-cars.libsyn.com/website
The podcasts are also available on Spotify, iTunes, iHeartRadio, etc.
More info about AI self-driving cars, see: www.ai-selfdriving-cars.guru
To follow Lance Eliot on Twitter: @LanceEliot
Copyright 2018 Dr. Lance Eliot