Briefing note

Explainable AI in fraud detection

How to use explainability to fight fraud

By Inès Zitouni, Jan Thiemen Postema, Dominik Sznajder, and Raymond van Es

22 December 2022

Artificial intelligence (AI) is a revolutionary tool that is transforming virtually every industry by automating an increasing variety of complex tasks. However, AI models are often difficult to explain, and results that cannot be verified slow down their adoption. As a result, there is a growing need for eXplainable Artificial Intelligence (XAI), whose results can be understood by humans.

XAI is a set of techniques that make AI models and their outcomes comprehensible to humans. In many contexts, it is useful or even necessary to understand the process behind the results of an AI decision. In fraud detection, for example, explanations of how or why the AI arrived at a particular conclusion can help fraud investigators pinpoint the source and type of a fraud.

In this brief, we provide a general overview of a fraud detection XAI model that we investigated for a health insurance company in the Netherlands¹.

Health insurance fraud is estimated to cost hundreds of millions of euros each year in the Netherlands. In particular, the decentralisation of the healthcare system since 2015 has led to an increase in fraudulent claims from healthcare providers.² Registering as a healthcare provider has also become simpler: it is, for example, not necessary to have a medical degree or to have patients. As a result, it is more difficult to verify the legitimacy of claims. Health insurance fraud can involve, for example, a healthcare provider billing for care that was not provided, or billing for services that are reimbursed at a higher rate than is appropriate for treating the patient’s condition or disease.

How is AI used in fraud detection?

To tackle fraudulent claims, health insurers traditionally employ fraud investigators to verify the legitimacy of claims. However, fraud is relatively rare, so most claims are legitimate, and fraud investigators spend a great deal of time filtering out non-fraudulent claims. A typical fraud detection process therefore consists of two steps: first, identifying suspect claims, and then reviewing them to determine whether they are legitimate. With advances in AI, insurance companies incorporate fraud detection models into the identification step to reduce operational costs and make the verification step more efficient.

A straightforward fraud detection process could use an AI model to flag legitimate claims, which would then be accepted automatically. Claims flagged as potentially fraudulent would be reviewed by claim investigators to determine whether they are legitimate and, more importantly, to provide proof of fraud.

Flowchart of an explainable AI model incorporated into a fraud detection process.
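The routing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the set-up used in the study: it assumes the model outputs a fraud probability per claim and that a single hypothetical cut-off, `REVIEW_THRESHOLD`, separates automatic acceptance from manual review. The claim IDs and scores are made up.

```python
from dataclasses import dataclass

# Hypothetical cut-off: claims scoring at or above it are sent to an investigator.
REVIEW_THRESHOLD = 0.2


@dataclass
class Claim:
    claim_id: str
    fraud_score: float  # model's estimated probability that the claim is fraudulent


def route(claims: list[Claim], threshold: float = REVIEW_THRESHOLD) -> tuple[list[str], list[str]]:
    """Split claims into automatically accepted IDs and IDs flagged for manual review."""
    auto_accept, review = [], []
    for claim in claims:
        if claim.fraud_score >= threshold:
            review.append(claim.claim_id)  # potentially fraudulent: an investigator checks it
        else:
            auto_accept.append(claim.claim_id)  # treated as legitimate: reimbursed automatically
    return auto_accept, review


claims = [Claim("C1", 0.03), Claim("C2", 0.45), Claim("C3", 0.19), Claim("C4", 0.20)]
accepted, flagged = route(claims)
print(accepted, flagged)  # ['C1', 'C3'] ['C2', 'C4']
```

In practice the threshold is a business decision; the next paragraphs explain why it tends to be set low.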

While fraud detection models reduce the number of legitimate claims that fraud investigators must process, this number can still remain relatively high. In fraud detection, false negatives carry the highest risk: a fraudulent claim flagged as legitimate is processed automatically for reimbursement. For claims above a certain size, it can be less expensive to flag a legitimate claim as fraudulent, because a human will review it anyway. The cost of missing a genuinely fraudulent claim can often outweigh the cost of investigating additional non-fraudulent ones.
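The cost trade-off can be made concrete with a simple expected-cost rule. As a sketch under strong simplifying assumptions (a review always uncovers fraud, each review costs a fixed amount of investigator time, and an undetected fraudulent claim loses its full amount), reviewing is worthwhile whenever the fraud probability times the claim amount exceeds the review cost, so the break-even probability falls as claims get larger. The function names and the €100 review cost are hypothetical.

```python
def break_even_probability(claim_amount: float, review_cost: float) -> float:
    """Fraud probability above which a manual review is cheaper than auto-accepting."""
    return review_cost / claim_amount


def should_review(fraud_prob: float, claim_amount: float, review_cost: float) -> bool:
    """Flag a claim when the expected loss of paying it out exceeds the cost of a review.

    Simplifying assumptions (hypothetical): a review always uncovers fraud and costs a
    fixed amount, while an auto-accepted fraudulent claim loses its full amount.
    """
    expected_loss_if_accepted = fraud_prob * claim_amount
    return expected_loss_if_accepted > review_cost


REVIEW_COST = 100.0  # hypothetical cost of one manual review, in euros

# The same 2% fraud probability: accepted for a small claim, reviewed for a large one.
for amount in (500.0, 20_000.0):
    print(amount, break_even_probability(amount, REVIEW_COST), should_review(0.02, amount, REVIEW_COST))
```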

Additionally, non-fraudulent claims are far more common than fraudulent ones, i.e., the models are built on highly imbalanced data. For these reasons, fraud flags are usually set to trigger even when there is only a slight suspicion of fraud. Fraud detection models are therefore tuned to make few errors on fraudulent claims, which leads them to often mistake legitimate claims for fraudulent ones. These model errors are called false positives. As a result, operational costs remain high, as fraud experts still face a large number of legitimate claims to review.
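The effect of a low flagging threshold on imbalanced data can be illustrated with synthetic scores. The sketch is purely illustrative: it assumes a 1% fraud rate and draws model scores from beta distributions chosen by hand so that fraudulent claims tend to score higher; none of the numbers come from the study. Lowering the threshold from 0.5 to 0.1 tends to catch more fraud, at the price of flagging many more legitimate claims.

```python
import random


def confusion_counts(scores, labels, threshold):
    """Return (TP, FP, FN, TN) when claims with score >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += 1
        elif flagged:
            fp += 1  # legitimate claim sent to an investigator
        elif is_fraud:
            fn += 1  # fraudulent claim paid out automatically
        else:
            tn += 1
    return tp, fp, fn, tn


# Synthetic, illustrative data: 20 fraudulent claims out of 2,000 (1% fraud rate).
rng = random.Random(42)
labels = [i < 20 for i in range(2000)]
# Hand-picked score distributions: fraudulent claims tend to score higher.
scores = [rng.betavariate(5, 3) if is_fraud else rng.betavariate(1, 9) for is_fraud in labels]

for threshold in (0.5, 0.1):
    tp, fp, fn, tn = confusion_counts(scores, labels, threshold)
    print(f"threshold {threshold}: caught {tp}/{tp + fn} frauds, flagged {fp} legitimate claims")
```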

XAI can further improve this part of the fraud detection process. Namely, it can explain to the fraud investigator why a certain claim was flagged. This information aids the process of determining whether the claim is fraudulent, improving the efficiency of the investigation.

When investigating the legitimacy of a claim flagged as fraudulent by the AI model, fraud experts now have access to the model's explanations in addition to the model's probability, the information provided in the claim and other information such as historical data.

How can XAI help detect false positives?

To help fraud experts detect false positives, we applied XAI techniques to a meta-learning model. XAI techniques such as the popular SHapley Additive exPlanations (SHAP) do not directly help to detect false positives. Because the model classifies both false and true positives as positives, their explanations are likely to be similar: SHAP focusses on explaining why the model classifies a claim as positive. For example, SHAP output for a fraud detection model explains which features moved a claim from the average predicted probability of fraud across the data to its own predicted probability of fraud. It could therefore explain how a false positive and a true positive each moved from that average to their respective final probabilities, possibly with similar contributing features for both. This does not help the fraud investigator determine whether a particular flagged claim is actually fraudulent, so both cases consume investigatory resources.
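The additive behaviour described above, and why it cannot separate false from true positives, can be shown with exact Shapley values on a toy model. This sketch does not use the `shap` library: it enumerates every feature coalition directly (feasible only for a handful of features) and takes the base value to be the average prediction over a small background set. The model, feature names and numbers are all hypothetical. Note that a claim's true label never enters the calculation.

```python
import itertools
import math
from statistics import mean


def fraud_model(x):
    """Hypothetical toy fraud model returning a fraud probability for one claim."""
    z = -4.0 + 3.0 * x["claims_per_patient"] + 2.0 * x["new_provider"] + 0.001 * x["amount"]
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(model, x, background):
    """Exact Shapley values of model(x), relative to the mean prediction on `background`.

    A coalition's value is the average prediction when features in the coalition take
    the claim's values and the remaining features take each background row's values.
    """
    features = list(x)
    n = len(features)

    def value(coalition):
        return mean(model({f: x[f] if f in coalition else row[f] for f in features}) for row in background)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude f
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {f}) - value(s))
        phi[f] = contribution
    return phi


background = [
    {"claims_per_patient": 0.2, "new_provider": 0, "amount": 300.0},
    {"claims_per_patient": 0.5, "new_provider": 0, "amount": 800.0},
    {"claims_per_patient": 0.3, "new_provider": 1, "amount": 500.0},
]
base_value = mean(fraud_model(row) for row in background)

# Two flagged claims with similar features; suppose investigation later shows that
# the first is fraudulent (true positive) and the second legitimate (false positive).
claim_tp = {"claims_per_patient": 1.5, "new_provider": 1, "amount": 2000.0}
claim_fp = {"claims_per_patient": 1.4, "new_provider": 1, "amount": 1900.0}

for name, claim in (("true positive", claim_tp), ("false positive", claim_fp)):
    phi = shapley_values(fraud_model, claim, background)
    # Additivity: base_value + sum(phi.values()) equals fraud_model(claim).
    print(name, round(fraud_model(claim), 3), {f: round(v, 3) for f, v in phi.items()})
```

Both claims receive positive contributions from the same features, so in this toy case the explanation alone gives the investigator no signal about which flag is the model error.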

A meta-learning model is an additional model that learns from the verified predictions of an AI model (predictions whose correctness is known) rather than from the raw observations. We can use a meta-learning model to help fraud investigators identify false positives among the claims that the AI model predicts as fraudulent. The result is a meta-learning model that predicts which of the AI model's fraud predictions are true positives and which are false positives.

The meta-learning model is built from the subset of false and true positives in the data set used to train the AI model. Based on the AI model's predictions on this training data set, the outcome variable is relabelled as either false positive or true positive. The meta-learning model is then trained on the relabelled data set and thereby learns to distinguish false positives from true positives.
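The relabelling and training steps can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes the base model's predictions on the training set are available as booleans, and it uses a plain gradient-descent logistic regression as a stand-in for whatever classifier the meta-learning model actually is. Claims the base model did not flag, including missed frauds, drop out. The two features (`claims_per_patient`, `has_referral`) and all data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_meta_dataset(rows, is_fraud, flagged):
    """Keep only claims the base model flagged; relabel them 1 = true positive, 0 = false positive."""
    meta_rows, meta_labels = [], []
    for x, fraud, flag in zip(rows, is_fraud, flagged):
        if flag:
            meta_rows.append(x)
            meta_labels.append(1 if fraud else 0)
    return meta_rows, meta_labels


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient-descent logistic regression; a stand-in for any classifier."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_true_positive(x, w, b):
    """Meta-model's probability that a flagged claim is a true positive."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical training claims: [claims_per_patient, has_referral]
rows = [[1.4, 0], [1.6, 0], [1.2, 1], [0.3, 0], [1.5, 1], [1.3, 1], [0.2, 1], [1.7, 0]]
is_fraud = [True, True, False, False, False, False, True, True]  # known after investigation
flagged = [True, True, True, False, True, True, False, True]  # base model's predictions
# The seventh claim is a missed fraud (false negative); the meta-model never sees it.

meta_rows, meta_labels = build_meta_dataset(rows, is_fraud, flagged)
w, b = train_logistic(meta_rows, meta_labels)
print(meta_labels)
print(round(predict_true_positive([1.5, 0], w, b), 2), round(predict_true_positive([1.5, 1], w, b), 2))
```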

Applying an XAI technique to the meta-learning model allows fraud investigators to quickly understand whether an alert is more likely to be actual fraud or a model error. The explanations of the original model concern its prediction of whether a claim is fraudulent, a prediction on which it makes many errors. By contrast, the explanations of the meta-learning model concern its prediction of whether a flagged claim is a true positive or a false positive.

XAI techniques can be applied to the meta-learning model in the same way as to a regular model, but what the output tells us is different. For instance, SHAP on the original model explains why the model predicts a certain probability that a claim is fraudulent, whereas SHAP on the meta-learning model explains why that model predicts a certain probability that a claim is a false positive.

Take the example of a legitimate claim that has been flagged as fraudulent by the AI model. A fraud investigator needs to review it to determine whether it is fraudulent. The SHAP explanations from the meta-learning model tell the expert why this claim is likely to be a model error, which directly addresses the question the fraud expert is trying to answer.

Applying SHAP to the meta-learning model explains how a false positive and a true positive diverge from the average probability of being a true positive to their respective predicted probabilities. Using SHAP on the meta-learning model therefore makes it clearer which claim is likely to be a false positive and what the main drivers of that prediction are.
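For a linear meta-model, this divergence from the average can be written down in closed form. Assuming a logistic meta-model and independent features, the SHAP value of feature f in log-odds space is `w_f * (x_f - E[x_f])`, and these values add up to the difference between the claim's log-odds and the log-odds at the feature averages. The weights, feature means and claims below are hypothetical. The model predicts the probability that a flagged claim is a true positive; one minus it is the probability of a false positive.

```python
import math

# Hypothetical fitted meta-model: log-odds that a flagged claim is a TRUE positive.
META_WEIGHTS = {"claims_per_patient": 1.2, "has_referral": -3.0, "provider_age_years": -0.4}
META_BIAS = 0.5

# Hypothetical feature averages over the flagged claims used to train the meta-model.
BACKGROUND_MEANS = {"claims_per_patient": 1.4, "has_referral": 0.5, "provider_age_years": 3.0}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def meta_log_odds(claim):
    return META_BIAS + sum(w * claim[f] for f, w in META_WEIGHTS.items())


def linear_shap(claim):
    """Log-odds SHAP values for a linear model, assuming independent features."""
    return {f: w * (claim[f] - BACKGROUND_MEANS[f]) for f, w in META_WEIGHTS.items()}


flagged_claims = {
    "A": {"claims_per_patient": 1.6, "has_referral": 0, "provider_age_years": 1.0},
    "B": {"claims_per_patient": 1.5, "has_referral": 1, "provider_age_years": 6.0},
}
base_log_odds = meta_log_odds(BACKGROUND_MEANS)  # average claim, in log-odds

for name, claim in flagged_claims.items():
    phi = linear_shap(claim)
    top = max(phi, key=lambda f: abs(phi[f]))  # feature with the largest contribution
    p_tp = sigmoid(meta_log_odds(claim))
    print(f"claim {name}: P(true positive)={p_tp:.2f}, main driver {top} ({phi[top]:+.2f} log-odds)")
```

In this made-up example the referral indicator is the main driver for both claims: its absence pushes claim A towards a true positive, while its presence pushes claim B towards a false positive, which is the kind of concrete pointer an investigator can check.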

Applying XAI techniques to the meta-learning model can therefore substantially reduce the time fraud investigators spend reviewing legitimate claims.

Conclusion

Models for highly imbalanced data, where minimising false negatives is the top priority, may by design produce large numbers of false positives. In such cases, XAI techniques can be harnessed to provide further explanation by differentiating false positives from true positives.

For a fraud detection process, this means that XAI techniques can indicate why a model expects a claim to be genuinely fraudulent. Such explanations can serve as a starting point for claim investigators, reducing their inspection time.

In other words, XAI techniques can further reduce the operational costs of the whole fraud process by going beyond automated fraud labelling. In particular, we found that combining XAI techniques with a meta-learning approach provides useful insights.

Milliman consultants have extensive experience in helping clients understand their complex models. We provide individually tailored analyses and advise on developing models that use cutting-edge technologies. In that respect, the explainability of recent modelling advances is crucial for establishing their applicability in the insurance industry.

¹ The full research is described in the thesis “Explainable AI in Practice. Explaining fraud detection model for decision support system” by Inès Zitouni, Master of Statistics and Data Science programme, KU Leuven.

² NOS (2 January 2020). Controle zorgfraude faalt, OM en verzekeraars luiden noodklok. Retrieved 8 December 2022 from https://nos.nl/nieuwsuur/artikel/2317099-controle-zorgfraude-faalt-om-en-verzekeraars-luiden-noodklok.
