What Drives Research in Self-driving Cars? (Part 2: Surprisingly not Machine Learning)

In the first part of this article, I wrote about how two major events shaped research in self-driving cars: the DARPA Grand Challenges and Google’s Self-driving Car (hereafter: SDC) project. In this post, I will talk about my surprise at the unfulfilled yet pervasive promises of machine learning in SDC research.

SDC researchers often attribute the advancement in autonomous driving to the successful deployment of machine learning. For instance, Sebastian Thrun, a former robotics researcher at Stanford and one of the driving forces behind the “Google driverless car” project, told Wired Magazine that “data can make better rules”, implying that SDCs ought to learn from experience rather than from pre-programmed rules. In a technology assessment of self-driving cars, the authors conclude that “allowing the vehicles to write their ‘own code’ through machine learning” (Moore/Lu 2011:6) will lead to more dependable SDCs because the behavior is derived from the experiences of the robot rather than from the human developer’s intuitions.

The promises of machine learning in the field of self-driving cars are based on the belief that the computer will do most of the necessary programming by itself, thereby leading to more efficient and dependable programming solutions. In the field, any human involvement is often figured as a harmful and “inelegant” way of getting things done.

How do you teach a computer to see pedestrians?

For the remainder of this post, I will try to capture the practitioner’s perspective when deploying machine learning. For example, how do the researchers teach a self-driving car to detect pedestrians? The short answer is that computer systems are trained on labeled data sets. The first step is obtaining the training data. Researchers take the research vehicle for a spin in real-life traffic and record a video with a fixed-mounted camera. The data may look like the video stills in the middle row of the collection of pictures below. (These stills are actually taken from a freely available training set provided by the German car manufacturer Daimler, better known for its Mercedes brand.)

Image taken from Dariu M. Gavrila

The second step is labeling the data. Positive samples, i.e. regions of a still that show a pedestrian (top row), are labeled as such by a programmer. Negative samples, i.e. stills that do not include pedestrians, are labeled as well (middle row).

During the third step, the researchers train a classifier. The classifier is a piece of software which settles the question of whether or not a pedestrian is present in a region of the video stream. The training is conducted by feeding the software with positive and negative samples, i.e. pictures with and without pedestrians in them. A mathematical function is derived from all the samples that best mimics the prior labeling of the pictures.
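The training step can be illustrated with a minimal sketch. It assumes that each image region has already been reduced to two numeric features, and it uses a simple perceptron as the classifier. The feature values, the function names and the choice of a perceptron are assumptions made for illustration; they are not the method used by the researchers described here.

```python
# Toy feature vectors standing in for image regions (hypothetical values).
# Label 1 = pedestrian, label 0 = no pedestrian.
SAMPLES = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.7, 0.7), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
    ((0.3, 0.2), 0),
]


def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, epochs=100, learning_rate=0.1):
    """Adjust weights until the function reproduces the human-made labels."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                # Nudge the weights toward the label the human assigned.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every training sample is reproduced correctly
            break
    return weights, bias


weights, bias = train_perceptron(SAMPLES)
reproduced = [predict(weights, bias, f) for f, _ in SAMPLES]
```

The learned weights are the "mathematical function" in miniature: nobody writes them by hand, and they only encode what the labeled samples happen to contain.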

To use the classifier for pedestrian detection, you ask it to process certain regions of the video stream. The successful detection of a pedestrian is usually visualized by a bounding box, as you can see in the pictures in the bottom row.
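One common way of choosing which regions to process is a sliding window, sketched below. The source does not say which scheme the researchers use, so this is an assumption. The frame is a tiny grid of brightness values, and `toy_classifier` is a hypothetical stand-in for a trained classifier.

```python
def iter_windows(frame_width, frame_height, win_w, win_h, stride):
    """Yield (left, top, width, height) for every window position."""
    for top in range(0, frame_height - win_h + 1, stride):
        for left in range(0, frame_width - win_w + 1, stride):
            yield left, top, win_w, win_h


def crop(frame, left, top, width, height):
    """Cut a rectangular region out of a frame given as a list of rows."""
    return [row[left:left + width] for row in frame[top:top + height]]


def toy_classifier(region):
    """Hypothetical stand-in: 'pedestrian' if the region is mostly bright."""
    pixels = [p for row in region for p in row]
    return sum(pixels) / len(pixels) > 0.5


def detect(frame, classifier, win_w, win_h, stride):
    """Return the bounding boxes of all windows the classifier accepts."""
    height, width = len(frame), len(frame[0])
    return [box
            for box in iter_windows(width, height, win_w, win_h, stride)
            if classifier(crop(frame, *box))]


# A 6x4 "frame" with one bright 2x2 blob playing the role of a pedestrian.
FRAME = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

boxes = detect(FRAME, toy_classifier, win_w=2, win_h=2, stride=1)
```

Each entry in `boxes` corresponds to one bounding box that would be drawn onto the still.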

Now, to complicate this story of how machine learning is used, I would like to share three surprising discoveries which occurred during my fieldwork and which shed a different light on the promises of machine learning. That is, the researchers refrain from using machine learning even though they are members of an AI department. How do they justify their decisions?

Labeling is “inhumane” work

As we have seen in the previous example, machine learning may involve a considerable amount of work done by humans. Labeling the data in cases like these cannot be automated. During an ethnographic interview, a researcher told me that he single-handedly labeled 3000 pictures of cars to generate a large enough training set. When I asked whether he would delegate the work to student researchers in the future, he shook his head. He described this kind of work as “inhumane” (German: unmenschlich). He would neither ask students to perform it nor hire Amazon’s Mechanical Turkers to do it. In conclusion, contrary to the promise of simply delegating work to the machine, new forms of undesirable work crop up.

Obscurity of the Classifier

Another disadvantage of machine learning compared to what practitioners call “manual programming” is that machine learning algorithms are often black boxes. One researcher explained that machine learning is unreliable and possibly dangerous because

You don’t know what it learns (German: “Du weißt nicht, was erⁱ lernt”)

The researcher argues that there might be situations in real-life traffic which are not covered by the training samples. Hence, in these cases, the behavior of the classifier and, by implication, of the SDC is unpredictable. For example, an SDC might make an emergency stop because it falsely detects a pedestrian in its way or, even worse, it might fail to detect a pedestrian. Both cases might lead to accidents. What surprised me here was how machine learning is seen as a source of uncertainty and danger: the behavior of the classifier is, at least in part, obscure to the developer who designed it.
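The two kinds of error mentioned here are usually called false positives (a pedestrian is detected where there is none) and false negatives (a pedestrian is missed). The sketch below counts both on a handful of samples. The labels and predictions are invented for illustration and do not come from the fieldwork.

```python
def count_errors(predictions, labels):
    """Count false positives and false negatives (1 = pedestrian)."""
    false_positives = sum(
        1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    false_negatives = sum(
        1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    return false_positives, false_negatives


# Hypothetical ground truth and classifier output for six image regions.
labels = [1, 0, 1, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1]

# A false positive may trigger a needless emergency stop;
# a false negative means a pedestrian went undetected.
false_positives, false_negatives = count_errors(predictions, labels)
```

Such counts can only be computed for situations that have been recorded and labeled, which is the researcher's point: they say little about traffic situations the training samples never covered.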

Machine Learning impedes tweaking

The opacity of the classifier (discussed above) has another crucial implication. In machine learning, unlike “manual” programming, tweaking the parameters of the classifier is difficult. Tweaking is a common, if somewhat contested, practice in robotics. Although it is considered “inelegant”, it is often necessary in order to get the robot to work. However, making changes to the classifier or to the training set may eradicate not only unwanted behaviors but also the desired characteristics of the classifier. Hence, adjusting a classifier to new situations involves the risk of weakening its performance.

Footnotes:

i The literal translation from German is “he” (er) not “it”. This particular gendering of the classifier may be attributed to German grammar. However, it may also reflect the longstanding history of the computer/robot as a masculine artifact.