How Experian solved the Big Data problem of Open Banking through categorisation

Today’s digital, data-fuelled economy means many businesses are struggling to extract the right level of insight to inform their operations and decision-making strategies. At the same time, the introduction of Open Banking in the UK and the second Payment Services Directive (PSD2), along with the FCA’s consultation on creditworthiness, mean that the way decisions are made, or should be made, is changing significantly.

Many are assessing whether open data has the potential to change lending processes. Together, these factors mean lenders need to think differently and apply new methods. The Experian DataLab set out to solve this challenge.

What is the scale of the problem?

The future of banking: shifting demographics and regulations

Shifting demographics, such as the rise of Millennials, plus emerging regulations including Open Banking, PSD2, the General Data Protection Regulation (GDPR) and the Data Protection Bill, are affecting lenders, according to our research. The rise of digital channels and new technologies such as blockchain are also disrupting the sector, leaving many playing catch-up.

Many believe banks have undergone a huge transformation already. But true digital banking will mean instant switching, instant applications, greater transparency and much more personalised experiences. It’s when we’ve achieved this that we’ll see the true extent of the transformation.

A data-driven future

Businesses today, particularly established players, hold huge amounts of data. Equally, new entrants are likely to accumulate more data than they can manage without the right architecture in place. APIs have accelerated the flow of data into and out of businesses, and Open Banking means organisations’ consumption of data will only increase over the next decade.

Digitising banking

Digital is moving at such a rate that many are struggling to keep up. Many banks have made headway in the front layer via portals or apps. As this is the layer the customer sees, the approach appears logical; in reality, the other layers in the system must be able to support it.

Digital means operating in real time, with advanced capabilities. If an appropriate architecture is built, systems and services will keep pace with expectations and, more importantly, be sustainable. This would move us to a place where banking and financial services lead the evolution, and help define the revolution too. That will require not only a digital interface, but also the ability to maintain pace and stay one step ahead. Self-learning will be essential.

The DataLab challenge

We were challenged to find a solution whereby banks and other financial service providers could make the most of data – specifically Open Banking data and shared transactional data – to better inform decisions. That could mean credit application decisions, or deciding which products best suit which customers throughout the customer lifetime. However, in its raw state, transactional data just adds complexity.

It was immediately evident that the ability to categorise the data would be essential. Categories that correlated to actual spending and income patterns – such as utility bills, or recreational spending – would make the data much more meaningful.
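To make the idea concrete, a toy, rule-based categoriser is sketched below. The category names and keyword lists are invented for illustration and are not Experian’s actual taxonomy; a real engine would be far more robust, but the principle of mapping raw transaction descriptions to meaningful categories is the same.

```python
# Illustrative only: a toy keyword-based categoriser mapping raw bank
# transaction descriptions to spending/income categories. The categories
# and keywords below are hypothetical, not Experian's actual taxonomy.

RULES = {
    "utility_bills": ["electric", "gas", "water", "broadband"],
    "recreation": ["cinema", "gym", "restaurant"],
    "income": ["salary", "payroll"],
}

def categorise(description: str) -> str:
    """Return the first category whose keywords appear in the description."""
    desc = description.lower()
    for category, keywords in RULES.items():
        if any(keyword in desc for keyword in keywords):
            return category
    return "uncategorised"

print(categorise("BRITISH GAS DIRECT DEBIT"))  # utility_bills
print(categorise("ODEON CINEMA LONDON"))       # recreation
```

Hand-written rules like these quickly break down at scale, which is why the approach described below moves to machine learning.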

The strategy of approach

We created a strategy using a small set of test data and gathered enough evidence that our approach would scale to larger datasets.

We planned to use machine learning to group expenses, reducing the overall flow of data and creating meaningful, usable categories that gave insight into financial behaviours. This informed the taxonomy we used. (The taxonomy is the set of categories used within the system to group income and expenses based on a common factor in each transaction.)
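The shift from hand-written rules to learning categories from labelled examples can be sketched minimally. The classifier below (a simple nearest-match-by-token-overlap scheme, using only the standard library) and all of its training data are invented for illustration; a production engine would use a proper machine learning model trained on far richer features.

```python
# Minimal sketch: learning transaction categories from labelled examples
# instead of hand-written rules. The training data and the simple
# token-overlap classifier are illustrative only, not the actual method.

from collections import defaultdict

TRAINING = [
    ("thames water dd", "utility_bills"),
    ("edf energy payment", "utility_bills"),
    ("vue cinema", "recreation"),
    ("puregym monthly", "recreation"),
    ("acme ltd salary", "income"),
]

def train(examples):
    """Collect the vocabulary of tokens seen for each category."""
    vocab = defaultdict(set)
    for text, label in examples:
        vocab[label].update(text.lower().split())
    return vocab

def predict(vocab, description):
    """Score each category by shared tokens; pick the best match."""
    tokens = set(description.lower().split())
    scores = {label: len(tokens & words) for label, words in vocab.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorised"

model = train(TRAINING)
print(predict(model, "EDF ENERGY DD"))     # utility_bills
print(predict(model, "VUE CINEMA LEEDS"))  # recreation
```

The advantage over fixed rules is that the taxonomy can grow simply by adding labelled examples, rather than by maintaining keyword lists by hand.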

We intended to use the data to give a granular depth of insight into customer behaviours, meaning the taxonomy required a certain level of granularity: the level of detail presented back had to be enough to deliver the required insights.

Connecting the data was also important. As well as matching the data with our own, we combined it, updated it, completed it where needed, and created a single customer view. We then enhanced it by categorising transactions through their similarities, aligning them with a clear taxonomy.
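The idea of combining and completing records into a single customer view can be sketched as follows. The field names and the merge rule (keep the first non-empty value per field) are illustrative assumptions; real entity resolution involves matching logic well beyond a shared identifier.

```python
# Hedged sketch of a "single customer view": records for the same customer
# arriving from different sources are merged, and missing fields are
# completed from whichever source has them. Field names are illustrative.

def merge_records(records):
    merged = {}
    for rec in records:
        view = merged.setdefault(rec["customer_id"], {})
        for field, value in rec.items():
            # Keep the first non-empty value seen for each field.
            if value and not view.get(field):
                view[field] = value
    return merged

sources = [
    {"customer_id": "c1", "name": "A. Smith", "email": ""},
    {"customer_id": "c1", "name": "", "email": "a.smith@example.com"},
]
view = merge_records(sources)
print(view["c1"])
# {'customer_id': 'c1', 'name': 'A. Smith', 'email': 'a.smith@example.com'}
```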

From here, we created Trusso, a categorisation engine built using machine learning that sorts bank account transaction data into the categories defined by the taxonomy.

The outcomes and conclusion

The potential business value of categorisation through Trusso is enormous – especially in the context of customer churn, retention and cross-selling.

Very soon banks – and every business – will need to compete in a real-time environment at a much more sophisticated level. This means moving to a place where they can decide, in seconds, whether to offer a loan and what type of loan it should be. This is a massive change from where the market is today.

Machine learning is often seen as a black box, offering quick and easy solutions to those who know little or nothing of its inner workings. So while the results are transformational, we believe they should be applied with discretion. Machine learning offers a much more granular approach for cases that require more accuracy than classical models can provide. However, where classical models suffice, we would advocate using them instead.

We are in a data insight environment where we are not only catching up with society, but trying to innovate, balance risk and reward, and compete too. This is a huge opportunity and can be achieved through implementing the right advanced analytics framework and tools.

Machine learning and artificial intelligence have long been talked of as a trend, but this trend is not going away. In fact, they offer a solution that responds to the trend: an opportunity to accelerate business models and, ultimately, better serve businesses and communities while reducing unnecessary risk.