NextGen SupTech: Machine Learning for the Regulatory Life Cycle

Author: Hugh Purcell, Solution Architect

“Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years.”

Andrew Ng

One of the world’s leading artificial intelligence (AI) researchers, Andrew Ng, current CEO and founder of, compares the transformative effects awaiting those who embrace AI techniques to those brought about by the safe availability of electricity in the early 20th century. These techniques are already taking effect in our everyday lives: the widespread use of virtual assistants such as Siri and Alexa, targeted adverts which make one question whether the internet is reading one’s mind, and even the virtual backgrounds showing where we would rather be on our now-normal video calls on applications like Teams and Zoom.

"Over the last two years alone 90 percent of the data in the world was generated." (2018)

The driving force behind this rise in AI is the vast availability of data. We have well and truly arrived in the age of data, with more and more interactions between humans and machines happening every moment and with these interactions largely being recorded and stored in the vast datasphere. The increase in these interactions, together with the value now associated with data, has seen the volume of data generated grow exponentially, with an estimated 90% of all the digital data in existence having been generated between 2016 and 2018 alone. The numbers are literally astronomical, with 44 zettabytes of digital data existing at the start of 2020, that is, 44 trillion gigabytes. This vast availability of accessible data has led to an explosion in AI research, which is powered by data, and has seen the sector evolve from the research lab to the mainstream.

Indeed, as AI techniques have become more mainstream, their usability and accessibility have similarly improved. This in turn has led to a surge in industry confidence in taking advantage of AI, and specifically its subfield machine learning (ML), which is well suited to identifying trends in the large data sets currently being collected and to making predictions based on these trends. We will investigate the need for adding ML to the regulatory lifecycle and the benefits of doing so, and highlight the hurdles that recent advancements have overcome, which now give regulators the confidence to take advantage of this technological shift by integrating ML techniques into their regulatory lifecycle.

Machine Learning in Regulatory Reporting

In this section, we will highlight the increase in the scale and complexity of the regulatory data being collected since the global financial crisis, together with the challenges this presents. We highlight the benefits of utilizing ML to extract insight from the collected data and investigate the recent advances in accessibility and transparency which have allowed ML to evolve to the virtual commodity it is today.

Data Growth

“In 2004, banks faced an average of 10 regulatory changes per day. This has jumped to over 100 in 2020.” (2020)

After the Global Financial Crisis in 2008, regulators introduced additional stringent requirements for banks and other financial institutions to meet in order to ensure the transparency of their operations and their financial security. These requirements have led to an explosion in the volume of data being requested by and provided to regulators, with data requirements being of higher complexity and delivered at much greater frequency than ever before. To accommodate these requirements, the data has in some cases become more granular, growing in scale, and in other cases has grown in complexity. Essentially, the data collected has either entered the realm of big data or reached a high level of complexity, both of which mean it is no longer easily managed or interpreted by a human. These requirements have made regulatory compliance more burdensome for regulators and industry alike. Even with the introduction of standardised collections, such as the Solvency II and Basel initiatives across the EU 27, the gains from using a common standard have been outpaced by the complexity of the data being requested.

IMAGE: Figure 1: Bank of England forms to staff comparison 1981 – 2013. Source: Modelling metadata in Central Banks

This expansion in the data collected is shown above in a comparison, from a study by the Bank of England, of its staff numbers against the number of forms collected. We see exponential growth in the number of forms (data) reported after the global financial crisis, with staff numbers having to grow to process this new data. While it is feasible to imagine that the volume of data collected will continue to grow, it is infeasible to continue similar growth in staff numbers. This highlights the need for new and innovative solutions to allow regulators to take advantage of the insights contained within this data.

No single solution exists to reduce these additional burdens but as we have seen in other sectors where there has been a boom in the amount of available data, technological advancements in the form of AI and ML have presented opportunities to harness the power contained within this data.

Data Journey

Machine learning (ML) provides a level of insight into the underlying trends contained within data which is not easily achieved even by subject matter experts completing exploratory data analysis. The traditional exploratory data analysis process, as shown in the Traditional Data Journey (red line) section in the figure below, follows the path:

  • Review data in the form of information.
  • Query the information to extract knowledge.
  • Interpret the gained knowledge using wisdom obtained by experience to make a decision.

IMAGE: Figure 2: Decision Details measured against Decision Time when using ML vs Traditional Data Analysis

In the ML world (grey line), the data is automatically interrogated to find the trends it contains, and these are then provided to the human decision-maker with the significance of each trend also highlighted. This insight can be achieved in a much shorter time frame and at a much greater level of detail than that provided by traditional methods. In other words, it enables better decisions to be made in a shorter time frame.

This paradigm allows regulators to collect data at a much lower level, often referred to as granular data or transaction-level data, where the regulator receives the records of each transaction completed and collates, slices and dices this data as required. Advancements in ML have made the use of granular data possible, as ML techniques can be used to highlight the areas of most interest and to extract insight not previously considered, allowing the regulator to operate more effectively and closer to real time.
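As a minimal sketch of what "slicing and dicing" granular data can mean in practice, the example below aggregates transaction-level records along whichever dimensions are of interest. The records, the field names (`entity`, `product`, `amount`) and the `aggregate` helper are hypothetical and chosen only for illustration; a real collection would define its own schema.

```python
from collections import defaultdict

# Hypothetical transaction-level records; field names are illustrative only.
transactions = [
    {"entity": "Bank A", "product": "mortgage", "amount": 250_000},
    {"entity": "Bank A", "product": "consumer_loan", "amount": 15_000},
    {"entity": "Bank B", "product": "mortgage", "amount": 400_000},
    {"entity": "Bank B", "product": "mortgage", "amount": 100_000},
]


def aggregate(records, *keys):
    """Sum `amount` over every combination of the given key fields."""
    totals = defaultdict(int)
    for record in records:
        group = tuple(record[k] for k in keys)
        totals[group] += record["amount"]
    return dict(totals)


# The same granular data can be viewed at different levels of detail.
by_entity = aggregate(transactions, "entity")
by_entity_product = aggregate(transactions, "entity", "product")
```

Because the regulator holds the underlying records, a new view of the data only requires a different choice of keys rather than a new form to be collected.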

Machine Learning Evolution

Early iterations of ML as we know it today were difficult to implement, as we discuss below, but with recent advances, there has been an expansion in its use as it has become:

1. Accessible: ML knowledge is easier to obtain.

2. Commoditised: Multiple machine learning models are now available off the shelf.

3. Transparent: New techniques allow previously opaque decision logic to be provided.

Until recently, ML was confined to research labs, with the knowledge of how to implement such techniques held by a small number of highly specialised individuals with access to an even smaller pool of supercomputers powerful enough to train and validate the models. The advancements in computing power, brought about by the fulfilment of Moore’s Law over the past 50 years, have seen ML techniques move from requiring supercomputing hardware to many models being successfully trained and deployed on desktop and laptop computers.

This has extended the reach of learning these techniques from the basement labs to online learning platforms providing hands-on tutorials which can be completed using one’s personal computer. With the added availability of high-performance computing environments in the cloud, even models which require additional computing power to be trained are accessible without the need for large infrastructure investment.

As these techniques have become more usable and the channels for learning them have grown, there has been a monumental shift in how models are developed. Until recently, each model had to be built from scratch, requiring advanced mathematical, statistical and computing know-how.

This has now changed, with a plethora of available software solutions which no longer require a deep understanding of advanced mathematics but rather an understanding of how the data relates to the business process to which they can add so much value. This gives regulators the opportunity to harness the power of the data they are already collecting by combining their existing area of expertise, regulatory requirements, with a small amount of professional development in how to feed this data to existing ML methods and interpret the results.

We already have evidence of this paradigm in action: it is only a relatively short time since regulators successfully combined their knowledge of regulatory requirements with the availability of digital submissions in place of the previous paper submissions.

IMAGE: Figure 3: Performance v Explainability of widely used ML models. Source: DARPA

ML has shown its power through significant increases in model accuracy. However, the most accurate models gained that accuracy at the cost of transparency. Although the results produced by advanced ML techniques such as support vector machines, deep neural networks and deep random forests significantly surpassed the accuracy of humans and other AI systems, the reasoning behind their decisions for individual instances was unknown, leading to these models being categorised as opaque black boxes.

In other words, modern ML techniques are extremely good at telling you what to do but not so good at telling you why you should do it. This presents a major challenge, as transparency is a cornerstone of regulation and regulators are required to evidence any regulatory action they take. As such, using black-box algorithms which give no details of the logic behind their decisions to assist in the regulatory lifecycle is not an option.

This has changed with the recent development of the “eXplainable AI” (XAI) discipline, which provides interpretable and explainable techniques, such as LIME, DeepLIFT and SHAP, to extract and highlight the logic used by these black-box models in making a decision. Regulators can now take advantage of the existing highly accurate ML methods while using these new techniques to provide the transparency they require. The growth in this area can be seen below in the rising number of contributions to the literature over the past decade, with Interpretable AI, XAI and Explainable AI all working to present the logic used in making automated predictions.

IMAGE: Figure 4: Growth in Explainable and Interpretable AI Research. Source: Information Fusion
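To give a flavour of how such explanations work, the sketch below computes exact Shapley values, the game-theoretic quantity that SHAP approximates, for a single prediction of a toy model. It is a brute-force illustration under stated assumptions and not the SHAP library itself: the `risk_score` function, the feature names `leverage` and `npl_ratio`, and the all-zero baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley attribution of model(instance) - model(baseline).

    Features not yet added to a coalition keep their baseline value. The cost
    is factorial in the number of features, so this suits tiny examples only.
    """
    names = list(instance)
    contributions = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in ordering:
            # Switch one feature from its baseline value to its actual value
            # and credit the change in output to that feature.
            current[name] = instance[name]
            output = model(current)
            contributions[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in contributions.items()}


# Hypothetical risk score with an interaction term; not a real supervisory model.
def risk_score(f):
    return 2.0 * f["leverage"] + 1.0 * f["npl_ratio"] + 0.5 * f["leverage"] * f["npl_ratio"]


baseline = {"leverage": 0.0, "npl_ratio": 0.0}
instance = {"leverage": 2.0, "npl_ratio": 4.0}
explanation = shapley_values(risk_score, instance, baseline)
```

The attributions always sum to the difference between the score for the instance and the score for the baseline, which is what allows a reviewer to see how much each input contributed to an individual prediction.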


In this section, we highlight existing published research and implementations which have been completed by regulators around the globe. We follow this with an investigation into the areas of the regulatory data lifecycle in which ML is currently being used and the areas where it can be unleashed in the near future.

Existing Implementations


The Bank of England has been one of the regulators leading the charge in investigating how ML can be used to assist in its supervisory process and regulatory data lifecycle. It initially published a foundational paper, Machine learning at central banks, in 2017 and has built on this to show ML use cases in a number of additional publications, with an evident move towards a greater focus on the need for explainability in the regulatory use of ML, such as in:

1. Shapley regressions: A framework for statistical inference on machine learning models (2019)

2. Predicting bank distress in the UK with machine learning (2019)

3. Machine learning explainability in finance: an application to default risk analysis (2019)

4. Credit growth, the yield curve and financial crisis prediction: evidence from a machine learning approach (2020)


In the U.S., the SEC has been ratcheting up its use of ML since the global financial crisis and, more recently, has started to use the decisions of machine learning models to guide regulators to the entities whose behaviour requires more detailed analysis. This approach, known as keeping the human in the loop, takes advantage of the power of ML but maintains a level of human accountability by ensuring that all privileged decisions are ultimately made by a human actor rather than autonomously by a machine.
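The human-in-the-loop pattern described above can be sketched as follows. This is an illustrative outline only and does not represent any regulator's actual system: the `Submission` type, the `review_threshold` value and the audit record fields are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    entity: str
    risk_score: float  # hypothetical model output in [0, 1]


def triage(submissions, review_threshold=0.7):
    """Rank submissions for human review; the model itself takes no action."""
    flagged = [s for s in submissions if s.risk_score >= review_threshold]
    return sorted(flagged, key=lambda s: s.risk_score, reverse=True)


def record_decision(submission, reviewer, action, rationale):
    """Store the human decision alongside the model score as an audit record."""
    return {
        "entity": submission.entity,
        "model_score": submission.risk_score,
        "reviewer": reviewer,
        "action": action,
        "rationale": rationale,
    }


queue = triage([
    Submission("Firm A", 0.91),
    Submission("Firm B", 0.35),
    Submission("Firm C", 0.78),
])
```

The model only orders the work. Every action is taken and justified by a named reviewer through `record_decision`, which preserves accountability and the evidence trail.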

Areas of Implementation


The first port of call for those applying ML techniques to regulatory data has been to extract large historical data sets from their data warehouses. ML is then used to verify previously considered trends and to identify previously unknown trends within this data, which may assist in predicting future issues and forecasting future performance, all of which works to enhance industry stability. There have been a number of successful implementations in this area, such as those by the Bank of England and the SEC discussed previously.


Building on the successes that have been found by training ML models on downstream data, the time is now for the transfer of this knowledge upstream to assist in real-time regulation. This is achievable by integrating trained ML models into the data collection platform and exposing regulators to predictions which these ML models make while each submission is being reviewed. The ideal time in the regulatory cycle to introduce ML is as early as possible, so as to inform the human in the loop. From a regulatory point of view, the human in the loop is vitally important for a number of reasons:

1. Ensure accountability for decisions.

2. Ensure privileged information and decisions are processed in a safe and secure manner.

3. Ensure adequate evidence exists to deliver consequential regulatory actions.

4. Build trust in ML models.

While we do not envisage machine learning replacing regulators, it is expected that the job of a human regulator will be streamlined by the availability of insight from ML models integrated into their decision space. This enables the regulator to be more agile, to focus on the areas of most risk and to identify concerns much earlier than was previously possible. The table below shows how data usage can be optimised by applying ML at the points in the lifecycle where it realises the most value.

IMAGE: Table 1: Comparison of Benefits of Downstream vs Upstream Data


As previously stated, it is hard to see an area of the industry which will not be affected by AI advances. One area which has seen recent significant growth is artificial intelligence for IT operations (AIOps). For the regulator this offers two key benefits:

  1. AIOps can predict operational issues, leading to better operational systems with less downtime.

  2. AIOps can provide an insight into the interactions a user has with the gateway used to provide the regulator with data.

Until recently these gateways have been predominantly web portals, but they are now moving to a hybrid approach in which a web portal is combined with APIs to streamline the ways in which regulators are provided with the data they require.

It can be expected that the users of one entity will broadly follow the same patterns of interaction with these gateways as their peers. ML provides many methods to identify outliers, and these can be used to highlight entities, users and behaviours which fall outside the norm when compared to their peers and may require more detailed investigation by the regulator.
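One simple way to flag behaviour outside the peer norm is a robust outlier score. The sketch below uses the modified z-score, based on the median and the median absolute deviation, as a stand-in for the many outlier detection methods available. The metric (resubmissions per entity), the figures and the `peer_outliers` helper are hypothetical, and the 3.5 cut-off is a commonly used convention rather than a recommendation.

```python
from statistics import median


def peer_outliers(values, threshold=3.5):
    """Return keys whose modified z-score (median/MAD based) exceeds the threshold."""
    centre = median(values.values())
    mad = median(abs(v - centre) for v in values.values())
    if mad == 0:
        return []  # the score is undefined when the median absolute deviation is zero
    return [
        key for key, v in values.items()
        if abs(0.6745 * (v - centre) / mad) > threshold
    ]


# Hypothetical count of resubmissions per entity in one reporting period.
resubmissions = {"Firm A": 2, "Firm B": 3, "Firm C": 2, "Firm D": 4, "Firm E": 19}
unusual = peer_outliers(resubmissions)
```

Using the median rather than the mean keeps a single extreme entity from distorting the definition of normal behaviour for its peers.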


We have seen how the world has undergone a monumental shift into the digital age. This has brought an explosion in the amount of data being generated, with a corresponding increase in complexity. This realm of data has arrived at the financial regulator’s door, and the time is now for innovation to further unleash the power of the insight contained within this data.

Initially, the solution sought by many regulators was to hire more personnel to process and analyse the data being collected. However, the rate of growth in the digital age has made the continuation of this trend untenable. There are many disruptors vying for attention to help deal with these challenges, and it can be difficult to pick the right ones. ML is one of the more realistic options, one which can add real value to the regulatory life cycle and address some of the challenges around data.

Furthermore, recent advances in the commoditisation of algorithms, in computing power and in the field of explainable AI have opened up the possibility of regulators using AI, and in particular its subset ML, to a greater degree to streamline their data processing pipeline, with gains available in the level of insight which can be automatically extracted from data in near real time.

Within financial supervision, the next step will be to inject this technology into the existing data collection pipeline. Exposing the available insight to the regulator as early as possible in this pipeline will be the key to success. Vizor Software has been providing software for effective regulation for the past twenty years and has been investing in credible disruptive solutions, including ML, to this end, providing regulators with a complete toolset for modern supervision and, importantly, helping them keep ahead of the tide of data and the challenges therein.