Open Source Can Be Hacked And Undermine Self-Driving Cars

Dr. Lance Eliot, AI Insider

[Ed. Note: For readers interested in Dr. Eliot’s ongoing business analyses about the advent of self-driving cars, see his online Forbes column: https://forbes.com/sites/lanceeliot/]

You’ve likely had to enter a series of numbers and letters when accessing a website that wanted “proof” that you are a human being and not some kind of Internet bot.

These challenge-response tests are known as CAPTCHA, which is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.”

The idea is that if a website wants to keep automated bots out, it needs some means of differentiating between a human trying to access the site and some kind of automation.

There are numerous variations nowadays of CAPTCHA algorithms.

Some use just letters and numbers, while some also add into the mix a variety of special characters such as an ampersand and a percentage symbol.

You’ve likely also encountered CAPTCHA that ask you to pick images that have something in common. For example, you are presented with six images of a grassy outdoor field, and are asked to mark the images that have a horse shown in the image. These aren’t so easy because the horse will often be obscured or only a small portion of a horse appears in any given image.

The reason why the acronym of CAPTCHA mentions a Turing test is that there is a famous test in the field of AI that was proposed by the mathematician Alan Turing about how to determine whether an automated system could exhibit intelligence.

The test consists of having a human interviewer pose questions to both another human and a separate AI system, without knowing beforehand which is which; if the interviewer is unable to tell the difference between the two interviewees, we could presumably declare that the automation has exhibited intelligent behavior.

There are some who are critical of this test and don’t believe it to be sufficient per se, but nonetheless it is quite famous and regarded by many as a key test for ascertaining AI.

In the case of CAPTCHA, the Turing test approach is being used to see if humans can outwit a bot that might be trying to also pass the same test.

Whoever is able to figure out the letters and numbers is considered, or assumed, to be a human. Thus, if the bot can indeed figure out the CAPTCHA, it has momentarily won this kind of Turing test. I think we would all agree that even if some kind of automation can succeed at a CAPTCHA contest, we would be hard pressed to say that it has exhibited human intelligence. In that sense, this is a small and extremely narrow version of a Turing test and not really what we truly intend a Turing test to achieve.

In fact, because the human is having to essentially prove they are a human by passing a CAPTCHA, some refer to this test as a Reverse Turing test. The focus of a conventional Turing test is on the automation proving it has human-like capabilities. In this reverse Turing test, it is up to the human to prove that they are a human and can perform better than the automation.

There is a popular CAPTCHA algorithm used as a plug-in for many WordPress developed websites that is known as “Really Simple CAPTCHA.”

In a recent article about it, a developer showed how easy it can be to build a simple AI system that cracks its CAPTCHA challenges.

The developer used the popular Python programming language, along with OpenCV, a freely available library for image processing, and Keras, a deep learning library written in Python. He also used TensorFlow, Google’s machine learning library (Keras can run on top of TensorFlow). I mention the tools here to emphasize that the developer used off-the-shelf programming tools. He didn’t need to resort to some secretive “dark web” code to crack this CAPTCHA.

The CAPTCHA program was readily available as open source and therefore the developer could inspect the code at will.

He then used the CAPTCHA code itself to generate numerous sample CAPTCHA images, creating a set of training data.
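To give a rough sense of what that data-generation step can look like, here is a minimal sketch in Python. Since the original plugin is written in PHP, the sketch uses the Python “captcha” package as a stand-in generator; the character set, the four-character length, the image size, and the sample count are all assumptions for illustration, not details from the article.

```python
# A minimal sketch of building a labeled training set of CAPTCHA images.
# The Python "captcha" package stands in for the original PHP plugin's
# generator; character set, length, and counts are illustrative assumptions.
import os
import random
from captcha.image import ImageCaptcha

CHARS = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # assumed character set
OUTPUT_DIR = "training_images"
os.makedirs(OUTPUT_DIR, exist_ok=True)

generator = ImageCaptcha(width=160, height=60)

for i in range(10000):  # generate plenty of labeled samples
    text = "".join(random.choices(CHARS, k=4))
    # Encode the correct answer in the filename so it doubles as the label.
    generator.write(text, os.path.join(OUTPUT_DIR, f"{text}_{i}.png"))
```

The key point is that because the generator itself is openly available, an attacker can mint an essentially unlimited supply of perfectly labeled training data.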

After doing some transformations on the images, the developer fed them into a neural network that he set up with two convolutional layers and two fully connected hidden layers. According to his article, after just ten passes through the training data set, the neural network was able to achieve full accuracy. He then tried it on fresh CAPTCHAs generated by the “Really Simple CAPTCHA” code, and his efforts paid off, as it was able to figure out the letters and numbers. This particular article caught my eye due to the claim that the whole project, from start to finish, took just 15 minutes.
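For readers curious what such a network can look like, here is a minimal sketch (not the original article’s code) of a Keras model with two convolutional layers and two fully connected layers. It assumes each CAPTCHA has already been split into individual character images, say via OpenCV thresholding and contour detection, and resized to 20x20 grayscale; the character-set size and layer widths are illustrative assumptions.

```python
# A minimal sketch of a small CNN for classifying single CAPTCHA characters:
# two convolutional layers followed by two fully connected layers.
# Input size and class count are assumptions, not the article's exact values.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

NUM_CLASSES = 32           # assumed number of distinct letters/digits
INPUT_SHAPE = (20, 20, 1)  # assumed per-character grayscale image size

model = Sequential([
    Conv2D(20, (5, 5), padding="same", activation="relu", input_shape=INPUT_SHAPE),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(50, (5, 5), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(500, activation="relu"),             # first fully connected hidden layer
    Dense(NUM_CLASSES, activation="softmax"),  # one output per possible character
])

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Roughly ten passes through the labeled training data:
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10, batch_size=32)
```

Once trained, each segmented character from a fresh CAPTCHA is run through the model and the per-character guesses are simply concatenated into the full answer.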

Now please keep in mind that this was a very simple kind of CAPTCHA.

What does this have to do with AI self-driving cars?

At the Cybernetic AI Self-Driving Car Institute, we are using open source software to develop AI self-driving systems, as are most of the self-driving car makers and tech firms. This is both a boon and a danger.

As discussed, the CAPTCHA algorithm was available as open source, meaning that its source code was publicly available. Anyone who wants to look at the source code can do so.

By looking at the source code, you can figure out how it works. By figuring out how it works, you have a leg up on finding ways to crack it.

If you don’t use open source code, and instead develop your own proprietary code, you can try to keep the source code secret, making it much harder for someone else to figure out how it works.

If an attacker does not know how the code works, it becomes much harder to try to crack it. This does not mean it is impossible to crack, merely that it is likely to be harder.

Some refer to the open source approach as a white box method and the proprietary code approach as a black box method. With a black box, you know what goes into it and what comes out, but you don’t know what is going on inside the box. With a white box, you know what goes in and what comes out, along with how it does its magic.

Today, open source code is prevalent and found in an estimated 95% of all computer servers, along with being used in high profile systems such as those that run stock exchanges and the International Space Station. Some estimates say that there are at least 30 billion lines of open source code available, and even that number might be understated.

Notably, open source is extensively used for AI software and many of the most popular AI packages today are available as open source.

Generally, there is an ongoing debate about whether open source is unsafe, because nefarious hackers can readily inspect the code and find ways to exploit it, or whether it is perhaps safer than proprietary software, because so many eyes can inspect it.

Presumably, something that is open for anyone to inspect can be seen by hundreds, thousands, maybe millions of developers, and such a large number of reviewers will help ensure that the open source code is safe and sound to use.

One caveat about using open source is the classic use-it-and-forget-it problem that arises for many developers who decide to use open source code in their own systems.

Developers will wrap the open source into a system they are building and pretty much move on to other things. Meanwhile, if a hole is spotted in the publicly posted open source and a fix is applied to it, the developer who grabbed the open source at an earlier time might not be aware of the need to apply the fix in their own instance. This can readily happen because the developer forgets they used that particular open source, or never hears about the fix, or no longer has anything to do with the proprietary code and those now maintaining it don’t know that it includes the open source portions.
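As one small illustration of how a team might guard against this, here is a sketch of recording which open source packages were embedded, and at which versions, so that any drift from the recorded versions prompts someone to review upstream fixes. The manifest contents and the check_manifest helper are hypothetical examples for this article, not a standard tool.

```python
# A minimal sketch, assuming the team keeps a simple manifest of the open
# source packages it embedded and the versions at which it embedded them.
# The manifest and helper below are hypothetical illustrations.
from importlib.metadata import version, PackageNotFoundError

EMBEDDED_AT = {
    "opencv-python": "4.5.5.64",
    "keras": "2.8.0",
    "tensorflow": "2.8.0",
}

def check_manifest(manifest):
    """Flag packages whose installed version no longer matches the recorded one,
    as a reminder to review the upstream changelog for security fixes."""
    for name, recorded in manifest.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: recorded {recorded}, but not installed here at all")
            continue
        if installed != recorded:
            print(f"{name}: recorded {recorded}, installed {installed} -- review upstream fixes")

check_manifest(EMBEDDED_AT)
```

Even a crude record like this beats the use-it-and-forget-it default, because it makes the embedded open source visible to whoever maintains the system later.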

Currently, most of the automakers and tech firms are feverishly incorporating all sorts of open source into the AI of their self-driving car systems.

It makes sense to do so, since otherwise you would need to reinvent the wheel on all sorts of software aspects that are needed for a self-driving car.

The cost to develop that same code from scratch would be enormous. It would also take time, lots of time, which nobody has. Indeed, there is a madcap rush today to achieve a true self-driving car, and no one developing self-driving cars wants to be left behind by writing code they could otherwise easily and freely get.

We do need to ask some serious questions about this.

Does the use of open source in the AI and other software of self-driving cars mean that we are laying ourselves bare to a substantial and really ugly security problem down the road, so to speak?

Are there nefarious hackers that are right now inspecting the self-driving car open source code and looking for exploits?

This open source conundrum exists for all aspects of self-driving cars, including:

  • Sensors — open source software for sensor device control and use
  • Sensor Fusion — open source software for sensor fusion
  • Virtual World Model — open source software for virtual world modeling
  • Action Planning — open source software for creating AI action plans
  • Controls Activation — open source software to activate the car controls
  • Tactical AI — open source software for self-driving car tactical AI
  • Strategic AI — open source software for self-driving car strategic AI
  • Self-Aware AI — open source software for self-driving car self-aware AI

Depending upon how a particular car maker or tech firm is building their self-driving car, each element is likely to either have open source in it, or be based upon some open source.

It is incumbent upon the self-driving car industry to realize the potential for exposures and risks due to the use of open source.

Self-driving car developers need to make sure they are closely inspecting their open source code and not just blindly making use of it.

They need to stay on top of any patches or fixes. We need more audits of the open source code being used in self-driving cars. And, overall, we need more eyeballs reviewing the open source code that underlies self-driving cars. As mentioned earlier, the hope is that the more “good” eyeballs involved, the more likely any holes or issues will be caught and fixed before the “bad” eyeballs find and exploit them.

If the bad eyeballs have their way, it will be not so much a CAPTCHA as a GOTCHA.

For free podcast of this story, visit: http://ai-selfdriving-cars.libsyn.com/website

The podcasts are also available on Spotify, iTunes, iHeartRadio, etc.

For more info about AI self-driving cars, see: www.ai-selfdriving-cars.guru

To follow Lance Eliot on Twitter: https://twitter.com/@LanceEliot

For his Forbes.com blog, see: https://forbes.com/sites/lanceeliot/

For his AI Trends blog, see: www.aitrends.com/ai-insider/

For his Medium blog, see: https://medium.com/@lance.eliot

For Dr. Eliot’s books, see: https://www.amazon.com/author/lanceeliot

Copyright © 2019 Dr. Lance B. Eliot

Written by

Dr. Lance B. Eliot is a renowned global expert on AI, Stanford Fellow at Stanford University, was a professor at USC, headed an AI Lab, top exec at a major VC.
