You wouldn’t trust someone who lies to you on a regular basis, would you? Of course not. That is why we must tread carefully when using large language models (LLMs) or other machine learning algorithms to answer questions for us. They might lie (or hallucinate, if you want to put it nicely) about anything and everything, with no rhyme or reason. Major details in a response may be completely false, or tiny inaccuracies may be sprinkled throughout an otherwise sound response.
The problem is that these falsehoods can be very hard to spot. LLMs, in particular, are notorious for giving confident answers that sound correct, even when they are not. If you want to be sure about a response, the only safe thing to do is independently fact check everything in it. But doing that negates the main reason that one would choose to use these tools in the first place — convenience.
In an effort to make machine learning more trustworthy, researchers have developed tools that can give a detailed explanation of a model’s reasoning process so that a trained eye can spot anything that may be of concern. But these explanations are often so dense and technical that they can be nearly impossible to make heads or tails of. Researchers at MIT recognized that if we are to trust machine learning models, we will need explanations of the explanations. Toward that goal, they have developed what they call EXPLINGO, which converts machine learning explanations into human-readable narratives.
This system uses LLMs, but limits their role to transforming existing SHAP explanations, which assign each input feature a value quantifying its contribution to a model’s prediction, into readable text. This approach minimizes inaccuracies, because the LLM does not generate explanations from scratch but instead translates an existing, verifiable explanation into human-friendly language.
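To make that division of labor concrete, here is a minimal Python sketch of the pattern, not the authors' implementation: SHAP computes the feature attributions for one prediction, and the LLM is only asked to put those precomputed numbers into words. The model, dataset, prompt wording, and the commented-out llm_complete() call are all illustrative assumptions.

```python
# Sketch: compute a SHAP explanation for one prediction, then hand only the
# precomputed attributions to an LLM to verbalize. Not EXPLINGO's actual code.

import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import fetch_california_housing

# Train a small model purely for illustration.
data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# SHAP assigns each feature a signed contribution to a single prediction.
explainer = shap.TreeExplainer(model)
explanation = explainer(X.iloc[[0]])            # explain one row
contribs = dict(zip(X.columns, explanation.values[0]))

# The LLM is only asked to restate these numbers in prose, never to reason
# about the model itself -- that constraint is what limits hallucination.
prompt = (
    "Rewrite the following feature attributions as a short narrative. "
    "Do not add any information that is not listed.\n"
    + "\n".join(f"{name}: {value:+.3f}" for name, value in contribs.items())
)

# llm_complete() stands in for whatever LLM API is available (hypothetical).
# narrative = llm_complete(prompt)
print(prompt)
```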
EXPLINGO operates in two parts: NARRATOR and GRADER. NARRATOR generates a natural language description of a SHAP explanation, mimicking a user-preferred style that is defined through three to five example explanations provided by the user. This customization allows it to adapt to different use cases and specific requirements. Once a narrative is created, GRADER evaluates its quality on four metrics: conciseness, accuracy, completeness, and fluency. GRADER takes both the original SHAP explanation and the generated text as input and assesses whether the narrative faithfully captures the explanation. The evaluation can also be customized, prioritizing specific metrics depending on how much accuracy or readability matters in the given context.
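As a rough illustration of how the two stages fit together, the sketch below builds a few-shot NARRATOR prompt from user-supplied example narratives and a GRADER prompt that scores the result on the four metrics. The prompt wording and the llm() call are invented stand-ins, not EXPLINGO’s actual prompts.

```python
# Sketch of the NARRATOR -> GRADER pattern described above, with invented
# prompt text and a placeholder llm() call.

METRICS = ["accuracy", "completeness", "fluency", "conciseness"]

def narrator_prompt(shap_text: str, example_narratives: list[str]) -> str:
    """Few-shot prompt: 3-5 user-written examples define the target style."""
    shots = "\n\n".join(f"Example narrative:\n{ex}" for ex in example_narratives)
    return (
        f"{shots}\n\n"
        "Write a narrative in the same style for this SHAP explanation, "
        "using only the values provided:\n"
        f"{shap_text}"
    )

def grader_prompt(shap_text: str, narrative: str) -> str:
    """Ask an LLM to score the narrative against the original explanation."""
    return (
        "Given the SHAP explanation and the narrative below, rate the narrative "
        f"from 1-4 on each of: {', '.join(METRICS)}. "
        "Penalize any claim not supported by the explanation.\n\n"
        f"SHAP explanation:\n{shap_text}\n\nNarrative:\n{narrative}"
    )

# Usage (llm() is a stand-in for any chat-completion call):
# narrative = llm(narrator_prompt(shap_text, user_examples))
# scores    = llm(grader_prompt(shap_text, narrative))
```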
Testing EXPLINGO on various datasets showed that the system could produce reliable, stylistically adaptable explanations, provided the example narratives were carefully crafted. Looking forward, the team aims to improve EXPLINGO’s handling of comparative language and expand it into an interactive tool, enabling users to ask follow-up questions about model predictions. This additional work would empower users to critically evaluate and better trust machine learning predictions in real-world scenarios.

EXPLINGO explains AI predictions to increase trust (📷: Jose-Luis Olivares, MIT)
Sample inputs and outputs to NARRATOR (📷: A. Zytek et al.)
A test run of GRADER (📷: A. Zytek et al.)