Four Questions on Deciphering the World of Machine Learning Models

How would you compare and contrast explainability and interpretability in AI and ML models?

By now, most people who buy, work in, or follow artificial intelligence (AI) and machine learning (ML) are aware of two broad categories of ML approaches: transparent models and “Black Box” models. Transparent models can be understood by a human, while “Black Box” models cannot. It’s actually a little more nuanced than that: it’s a spectrum, not a binary.

As an example, let’s consider a classic ML classification problem: we have a training corpus consisting of many loan applications, plus whether or not a human loan officer decided to grant each loan. We now want to build an ML model that, given a new loan application, can predict whether or not a human would have granted the loan.

Transparent models

These are in a form that a human could follow exactly. Decision trees are fully transparent: a decision tree model of our problem would represent explicit decisions, for example “IF total_net_worth < 20,000 THEN deny loan”.
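A transparent model like this can be written down as plain code that any reader can trace by hand. Here is a minimal sketch; the rules and thresholds are hypothetical illustrations, not an actual lending policy:

```python
# A transparent model: the decision logic is explicit, human-readable rules.
# Thresholds are hypothetical, chosen only to mirror the example in the text.
def grant_loan(total_net_worth: float, loan_amount: float) -> bool:
    """Decide a loan application with fully inspectable rules."""
    if total_net_worth < 20_000:
        return False  # IF total_net_worth < 20,000 THEN deny loan
    if loan_amount > total_net_worth * 5:
        return False  # deny loans far larger than the applicant's net worth
    return True       # otherwise grant

print(grant_loan(total_net_worth=15_000, loan_amount=10_000))  # False (denied)
print(grant_loan(total_net_worth=80_000, loan_amount=50_000))  # True (granted)
```

Every path through the function is a rule a human could state aloud, which is exactly what makes the model transparent.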

Interpretable models

A human could look at the model and straightforwardly understand what it is doing. In our example, we could build our model using a “bag of words” representation with TF-IDF weighting (term frequency–inverse document frequency, a measure of how important a word is) feeding a simple linear model.

This would give us a big list of words that appear in applications, each with an associated score that makes the system more or less likely to grant the loan. For example, positive words might look like {“family wealth”: 0.9462, “expansion”: 0.84716, …} and negative words might look like {“bankruptcy”: -0.89817, “default”: -0.85617}. After the model is trained, a person could look at that list of words and “interpret” what the system is looking for and how heavily it weights those words when considering a loan.
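An interpretable word-scoring model of this kind can be sketched in a few lines. The weights below are hypothetical stand-ins for values a trained linear model might learn; the point is that a human can read them directly to see what the model rewards and penalizes:

```python
# Hypothetical learned weights for an interpretable bag-of-words model.
# Positive weights push toward granting the loan, negative toward denying.
WORD_WEIGHTS = {
    "family wealth": 0.9462,
    "expansion": 0.84716,
    "bankruptcy": -0.89817,
    "default": -0.85617,
}

def score_application(text: str) -> float:
    """Sum the weights of every known term that appears in the application."""
    text = text.lower()
    return sum(w for term, w in WORD_WEIGHTS.items() if term in text)

app = "Seeking funds for expansion; prior bankruptcy in 2012."
print(round(score_application(app), 5))  # ≈ 0.84716 - 0.89817
```

Because the entire model is just this weight table, “interpreting” it amounts to reading the table.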

Explainable models

A human can do additional work to try to figure out what the model is doing. This is typically done as a separate, out-of-band process because the underlying model isn’t interpretable. Instead, the human creates a second model that tries to determine and explain what the actual model is doing. In our example, you could model the loan granting/denying process with deep learning (DL). Deep learning is not itself explainable. Once you had your DL model, you would create a second model that (hopefully) provides information like “The DL model considers the person’s net worth and the size of the loan as very important when deciding whether or not to provide a loan.” More details on that later.
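One simple version of that out-of-band process is to treat the model as an opaque function and probe it: perturb each input feature and measure how often the output changes. This is a deliberately simplified, hypothetical sketch of the idea behind real explanation methods, with a toy function standing in for the black box:

```python
import random

# Pretend this is an opaque deep-learning model: we can only call it, not
# inspect it. (Here it secretly weights net worth and loan size.)
def black_box(net_worth, loan_amount, zip_code):
    return 1.0 if net_worth - 0.2 * loan_amount > 10_000 else 0.0

def sensitivity(model, baseline, deltas, trials=200):
    """Estimate each feature's importance by perturbing it and counting
    how often the model's output flips relative to the baseline input."""
    random.seed(0)  # deterministic probing for reproducibility
    base_out = model(*baseline)
    scores = {}
    for i, name in enumerate(["net_worth", "loan_amount", "zip_code"]):
        flips = 0
        for _ in range(trials):
            probe = list(baseline)
            probe[i] += random.uniform(-deltas[i], deltas[i])
            if model(*probe) != base_out:
                flips += 1
        scores[name] = flips / trials
    return scores

print(sensitivity(black_box, baseline=(12_000, 15_000, 60614),
                  deltas=(10_000, 10_000, 10_000)))
```

Even without seeing inside the model, the probe reveals that net worth and loan amount drive the decision while the zip code does not, which is the kind of statement a separate explanation model produces.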

Black Box models

A human literally can’t understand what the model is doing. When I first learned about black box models, I thought the idea was ridiculous: how could we use math – that we know – to create a model that works well, but is impossible to understand? It didn’t click for me until I realized how much “thinking” I do in a day that I also can’t explain. I picked up a mug of coffee just now and took a sip. My brain did a bunch of work to figure out just how to form and close my fingers in order to pick up the mug by its handle. I can’t explain any of it to you, and none of it felt like a decision or reasoning. I have a model in my brain of how to safely pick up a mug of hot coffee, but I can’t explain it. It’s a black box.

What do you see as some of the biggest challenges in implementing these in practice?

The conventional wisdom here is something like: “transparent models are better from development, deployment, and privacy perspectives. Black box models perform better.”

This conventional wisdom isn’t right, though. There are definitely specific problems that we only know how to solve well using black box models. These are in areas like machine translation, speech-to-text/text-to-speech, and robotics. These are the kinds of problems that we humans solve unconsciously. If you say “Hello” and I hear you, my mind does many marvelous things to make me consciously aware of you saying “Hello”; my brain converts the vibrations of my ear bones into conscious understanding of your speech, and it generally works stunningly well. But I can’t tell you how I did it: it all happens un/subconsciously. These kinds of problems are great for black box techniques. They work well, and it’s not even clear what a transparent approach would mean here: what set of explicit instructions would a human being follow to transform a soundwave into a word?

But there are many other problems where a more transparent approach performs about as well. If the performance is similar, then the advantages of the transparent model generally win out. It’s easier to inspect more transparent approaches to confirm they’re picking up sensible features, and it’s always better to be able to answer questions like “Why did the model make this particular decision?”

Because deep learning (a black box approach) has been so successful, there’s been a rush to apply it everywhere, sometimes with impressive results. But this means applying black box models everywhere, which is often tough and can result in various negative consequences for the users of the system and society as a whole. As an example, I recently read an article about HireVue, which uses deep learning techniques to try to predict whether someone is a good candidate for a particular job. Applicants get zero feedback from the system, and even HireVue doesn’t know what the system is doing:

HireVue offers only the most limited peek into its interview algorithms, both to protect its trade secrets but also because the company doesn’t always know how the system decides on who gets labeled a ‘future top performer.’

This utter lack of feedback for the applicant is understandably disheartening, and would be resolved if HireVue used a more transparent approach that was able to provide feedback like “You had insufficient work experience” or “Your response to this question was not encouraging”.

As more transparent methods continue to advance and improve, I expect they will catch up to deep learning on problems where humans are required to make conscious decisions, such as hiring an employee, granting a loan, or determining parole status.

What are some of the best practices for implementing explainable AI?

In general, you want to be as transparent as you can. The first step is choosing an approach that outputs transparent/interpretable models, or making an investment in explaining your otherwise black box models.

But there are two other considerations that also weigh heavily: understanding why you want transparent models, and making a commitment to actually doing something with that transparency.

There are a lot of reasons to prefer transparent models. They’re often more straightforward to develop and debug because you can see what the model is actually learning and course correct if appropriate. Transparent models also make it easier to determine if you have biased data. There’s a (probably apocryphal) story where the Department of Defense was trying to train a computer vision system to distinguish between ally and enemy tanks. They trained a black box model which worked great on the training data but then failed dramatically in the real world. The problem? The training photos of the enemy tanks were taken on a rainy day, while the training photos of friendly tanks were taken on a sunny day. It turns out the system wasn’t learning to distinguish between friendly and enemy tanks: it was distinguishing between rainy and sunny days! An explanation of the black box model, which would show which parts of the image the system was focusing on when making its determination, would have made the issue obvious to the developers before it was deployed in tanks.

Transparent models also help avoid or correct embarrassing, creepy, or distressing behaviors learned by the model. One example here is an internal AI system that Amazon developed to screen CVs from applicants. The system was trained on current employees and was in limited use for years before anyone realized that it was heavily biased against women. The system learned to hire employees whose CVs were similar to the CVs of current successful employees, which had the effect of institutionalizing and operationalizing sexism. Amazon employees eventually discovered the issue because the model was interpretable: they could actually see the low scores the model applied to graduates of all-women’s colleges and how it penalized activities called things like “women’s club chess champion”.

Which raises the other important point: you have to actually do something with the transparency for it to have any value. Amazon had an interpretable system for years, but no one ever actually took the time to interpret it and realize they had inadvertently created a sexist system. Transparent or interpretable models aren’t enough; they need to be paired with an intentional effort around examining and evaluating the model and making decisions based on those findings.

What do you see as some of the leading tools and projects that make it easier to implement explainability or interpretability and why?

Cynthia Rudin out of Duke is a key player in the academic field of interpretable models. She’s led a lot of research focusing on interpretable models that are competitive with black box deep learning models. Regarding explanations, there’s a lot of interest academically in what are called SHAP values. These are a way of understanding which features a black box neural net is using to make its determination. Rudin and SHAP values are too much to get into here, but it’s easy to learn more online if you’re interested.
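SHAP values build on Shapley values from cooperative game theory: each feature’s contribution is its average marginal effect over all possible coalitions of the other features. Here is a minimal sketch that computes exact Shapley values by brute-force enumeration for a tiny hypothetical model; real SHAP implementations approximate this efficiently for large models:

```python
from itertools import combinations
from math import factorial

FEATURES = ["net_worth", "loan_amount", "zip_code"]

def model_on_subset(present, x, background):
    """Evaluate a toy linear 'model', substituting background values for
    features absent from the coalition."""
    vals = {f: (x[f] if f in present else background[f]) for f in FEATURES}
    # Hypothetical model: only net worth and loan amount matter.
    return 0.001 * vals["net_worth"] - 0.0005 * vals["loan_amount"]

def shapley_values(x, background):
    """Exact Shapley values: average each feature's marginal contribution
    over every coalition of the remaining features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            for coal in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                marginal = (model_on_subset(set(coal) | {f}, x, background)
                            - model_on_subset(set(coal), x, background))
                total += weight * marginal
        phi[f] = total
    return phi

x = {"net_worth": 50_000, "loan_amount": 20_000, "zip_code": 60614}
bg = {"net_worth": 30_000, "loan_amount": 10_000, "zip_code": 60614}
print(shapley_values(x, bg))
```

For this linear toy model the Shapley value of each feature reduces to its weight times its deviation from the background, and the zip code correctly comes out with zero attribution; the same attribution idea is what SHAP applies to black box neural nets.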

In general, we need to use black box deep learning approaches where they’re useful (the kinds of things we do unconsciously, without real consideration or understanding) and use more semantically meaningful, interpretable approaches everywhere else we can. This requires advancements in semantic reasoning, and in particular reasoning over the output of lower-level, black box models.
