Precision, effectiveness, and the compliance dilemma!

Attending the FinCrime World Forum (virtually) today and listening in to one of the panel sessions, I was reminded how often people confuse system precision with system effectiveness. The confusion is made worse in the world of anti-money laundering (AML) and compliance as the industry lacks a reliable way to measure the true effectiveness of systems.

Precision: Precision is a measurement of how efficient a system is: the share of generated alerts that are genuine hits. In the world of compliance, people usually quote the complement of precision, the false positive rate of a system, which is the share of alerts raised against legitimate customers. In simple terms, precision is measured as follows:

Precision = Bad Actor Alerts / Total Alerts
Precision = True Positives / (True Positives + False Positives)

As an example, if my bank’s AML solution generates 1000 alerts a month in total, and operational teams find 100 alerts related to bad actors (true positives) which are escalated for reporting or further investigation, then the precision of this system is 100/1000 or 10%, and its false positive rate is 900/1000 or 90%. The system’s precision is 10% as it gets the right answer (finds a true positive) on average once for every ten alerts generated (1 in 10).
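The arithmetic above can be sketched in a few lines of Python. The numbers are the illustrative ones from the example, not real data:

```python
def precision(true_positives: int, total_alerts: int) -> float:
    """Share of generated alerts that are genuine (true positive) hits."""
    return true_positives / total_alerts

# Example from the text: 1000 alerts a month, 100 of them against bad actors.
p = precision(true_positives=100, total_alerts=1000)
print(f"Precision: {p:.0%}, false positive rate: {1 - p:.0%}")
# Precision: 10%, false positive rate: 90%
```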

Improving precision: This, in theory, is easy! Remove the unwanted alerts generated against legitimate customers (the erroneous false positives) and maintain the same number of alerts generated against the bad guys (true positives). Many vendors are now offering artificial intelligence and machine learning methods that attempt to do this.

In the example, if we can reduce the total alerts generated each month to 500, but still capture the same 100 alerts on the bad guys, then the system precision becomes 1 in 5 or 20%. Great news, the precision of the system has improved!

The system is now more precise, investigations can be performed more efficiently as there are fewer alerts to review, but the improved precision has done nothing to change the effectiveness of the system. Before optimization, the system generated 100 alerts against bad actors and after optimization, it still generates 100 alerts against these same characters and so the system effectiveness is unchanged. The system is more precise but no more effective.

A subtlety, and an error I have seen a number of times at institutions that should have known better, is that improving precision can often make effectiveness worse!

Improved precision can mean lower effectiveness: A naive team of data scientists might run an algorithm that reduces the alert rate to 250 alerts each month but now catches only 75 of the bad actor alerts (true positives). The precision is now 75/250 or 30%, which means even more efficiency and potential cost savings, but this comes at a penalty: the system is now less effective. Only 75 true positive alerts are detected, and the system is missing 25 bad actor alerts (true positives) that it would previously have detected. So be careful!
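The trap can be made concrete with a short sketch. The scenario numbers are the illustrative ones from the text, not real data:

```python
def precision(true_positives: int, total_alerts: int) -> float:
    return true_positives / total_alerts

# scenario -> (bad actors caught, total alerts generated)
scenarios = {
    "baseline":     (100, 1000),
    "good tuning":  (100, 500),   # fewer alerts, same bad actors caught
    "naive tuning": (75, 250),    # best precision, but 25 bad actors now missed
}

baseline_tp = scenarios["baseline"][0]
for name, (tp, total) in scenarios.items():
    print(f"{name:12s} precision={precision(tp, total):4.0%} "
          f"bad actors caught vs baseline={tp / baseline_tp:4.0%}")
```

Precision improves in both tuned scenarios (20% and 30% against the 10% baseline), but only the "naive tuning" row pays for it with lost detections: it catches just 75% of the bad actors the baseline system found.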

Now that we’ve discussed how system precision can be measured and what it means for efficiency and false positive rates, we can turn to the more difficult issue of measuring effectiveness. This is where it gets tricky!

Effectiveness: System effectiveness is a measure of the total number of accurate bad actor alerts (true positives) that are generated by a system as a ratio of the complete set of bad actor alerts that should have been detected. The formula can be expressed as:

Effectiveness = Bad Actor Alerts / All Bad Actor Alerts
Effectiveness = True Positives / (True Positives + False Negatives)
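In machine learning terms this is the recall of the system. A minimal sketch, using hypothetical counts (the false negative count is exactly the number the rest of this section argues we cannot actually know):

```python
def effectiveness(true_positives: int, false_negatives: int) -> float:
    """Recall: detected bad actor alerts over all bad actor alerts that exist."""
    return true_positives / (true_positives + false_negatives)

# If 100 bad actor alerts are raised and only 100 bad actors exist:
print(f"{effectiveness(100, 0):.0%}")          # 100%
# If a million bad actors are active and we still catch only 100:
print(f"{effectiveness(100, 999_900):.2%}")    # 0.01%
```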

Here’s where we run into the big issue, the one that is at the crux of all compliance debates. People talk endlessly about the need to measure system effectiveness, but to know how effective a system is we also need to know how many bad actors are operating at our bank, so that we can see how many we should have detected! There is a circularity here: if we knew who these bad actors were, we would not need a system to detect them! It is only once we know the complete number of bad actors that we can actually assess whether our AML system is 100% or 0.01% effective.

Returning to our example, if we find 100 bad actor alerts each month and there are only 100 bad actors active at our institution, then our system could be 100% effective. But if a million bad actors are abusing the institution, our effectiveness rate would be just 100/1,000,000, or 0.01%.

“Without knowing the unknown it is impossible to accurately assess what we do know.”

The Compliance Officer’s Dilemma

Compliance officer’s dilemma: This leads us to the compliance officer’s dilemma which, to paraphrase Donald Rumsfeld, is that without knowing the unknown we cannot accurately assess what we do know. Or to put it another way, without knowing about all the bad actor cases that our systems should have detected it is impossible to get an accurate measure of overall system effectiveness.

In practice, you can use trade-off graphs and other styles of analysis to estimate system effectiveness. These work the way a gold prospector does, looking at the rate of return of detections as you dig deeper into the pile of potential alerts that could be generated. Even with these approaches, it is still impossible to know all the unknowns.
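One way to picture that prospector-style analysis is the sketch below, with entirely made-up alert scores and investigation outcomes: rank potential alerts by model score, then watch the marginal hit rate fall off as you dig deeper into the pile. Where the marginal yield flattens out is where the trade-off bites.

```python
# Hypothetical: each potential alert has a model score and a labelled outcome
# (True = confirmed bad actor). These values are invented for illustration.
alerts = sorted(
    [(0.95, True), (0.90, True), (0.80, False), (0.75, True),
     (0.60, False), (0.55, False), (0.40, True), (0.30, False),
     (0.20, False), (0.10, False)],
    key=lambda a: a[0], reverse=True,
)

batch = 5  # review the pile in batches, highest scores first
for start in range(0, len(alerts), batch):
    chunk = alerts[start:start + batch]
    hits = sum(outcome for _, outcome in chunk)
    print(f"alerts {start + 1}-{start + len(chunk)}: "
          f"{hits}/{len(chunk)} confirmed ({hits / len(chunk):.0%} marginal yield)")
```

In this toy pile the first batch yields 3 confirmed hits out of 5, the second only 1 out of 5: each extra layer of digging returns less gold, which is the shape these trade-off analyses exploit, without ever revealing how much gold is still in the ground.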

Two takeaways …

First, next time you are asked how effective your AML transaction monitoring solution is, perhaps you should give the real answer, “it is impossible to know”, and then qualify it with the evidence you have for why your teams review the number of alerts that they do and the trade-offs that this represents.

Second, as an industry, we should focus on relative measures of effectiveness and look towards incremental improvement of these over time. You may not know the absolute effectiveness of your systems and processes, but if they improve incrementally then, wherever that goal lies, you will be moving in the right direction.

Finally, if you have found this interesting you might also like my article on the challenges of non-verifiable judgments and why fast feedback loops are essential to improve the performance of compliance (and other) systems.

Compliance risk and non-verifiable judgments

I’ve recently started reading Daniel Kahneman’s new book “Noise”. Like his previous book “Thinking, Fast and Slow” it’s what I would call a contemplative read, one that introduces concepts and stimulates thinking. I like these kinds of books. One of the concepts in “Noise” that he considers (in chapter 4, I’m still reading!) is that of verifiable and non-verifiable judgments.

Noise, by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein

In short, a verifiable judgment is one where the outcome can be verified. So predicting tomorrow’s weather is a verifiable judgment, as you can very quickly validate whether the prediction of rain or shine was correct. In business and life, many decisions are verifiable, but others, many of the most important ones you will make, are non-verifiable. This can be because, at the point the judgment is made, it is impossible to test, because of dependencies in how it plays out, or because the time frame for validation is just too long.

Making a decision on the right business strategy. Selecting your partner for life. Lowering emissions to address global warming. At the critical point that these decisions are made, these judgments all fall into the non-verifiable category. They may have been informed by the best available evidence, business trends, dating history, or scientific principles, but the timeline and dependencies make the actual judgments non-verifiable. As Steve Jobs suggested, you need the luxury of hindsight to really prove you are right, so you have to trust in the judgments that you make.

“You can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future”

Steve Jobs, 2005 Commencement Address

The world of risk, compliance, and financial crime prevention is full of non-verifiable judgments. As a financial crime officer, you have to make judgments on your risk policy and decide if it sufficiently protects your institution from financial crime or future regulatory action. In the last few years, we have seen a significant regulatory push for attestation and senior management accountability. This is all about trying to make those non-verifiable judgments verifiable, or at least to ensure sufficient due diligence is done in policy and process implementation and ongoing review.

Life would be easier if every judgment was verifiable. For this to happen we need things to be measurable, testable, and have rapid feedback to assess results and outcomes. We proved years ago that this is possible for sanctions filters, where outcomes can be measured against synthesized data and matched to the risk policy. There have been a few attempts to do the same for other areas of AML such as transaction monitoring, but these are more difficult problems. It is possible to validate the thresholds and settings of transaction monitoring tools, but answering the question of whether those systems are keeping money launderers at bay puts us back into the land of non-verifiable judgments. This is especially true given a global regulatory framework that provides limited direct feedback on the millions of suspicious activity reports filed by banks and financial institutions annually.

In the new digital world, it is possible to verify the impact of website and mobile app changes, marketing campaigns, and sales initiatives in days rather than months or years. The speed of feedback for regulatory compliance looks archaic in comparison.

There are two takeaways here.

The first: there is still a huge market opportunity for someone who can really crack the challenge of creating tools that make AML transaction monitoring and other compliance systems truly verifiable. Vendors continue to try; BAE Systems, Cable, AML Analytics, and others are moving in this direction, but no one is doing it well yet. And anyway, shouldn’t these capabilities be embedded in the AML transaction monitoring systems themselves?

The second: it is no surprise that we already have evidence that fast feedback and qualified outcomes work. The UK National Crime Agency reports significantly better outcomes for Defence Against Money Laundering (DAML) requests than for traditional suspicious activity reporting. DAMLs provide a fast feedback loop that allows the iteration and improvement that helps make some of those non-verifiable judgments verifiable. One day all compliance will work this way!