Leverage graph technology for real-time Fraud Detection and Prevention

Written by Deepak Patankar and Mathijs de Jong

Introduction

At Booking.com we are dedicated to maintaining a secure and trustworthy platform for both our customers and partners. Our work involves addressing a multitude of threats, ranging from payment fraud to the proliferation of fake hotels and reviews, as well as the abuse of marketing campaigns. It’s only by effectively managing these challenges that we make it easier for everyone to experience the world.

Detecting and combating fraud presents a formidable challenge. Fraudsters continually evolve their tactics, demanding adaptable and agile fraud detection and prevention systems that immediately leverage newly available information. At Booking.com, we employ innovative and adaptable approaches such as machine learning models, manual investigations by fraud domain experts, and heuristic rules.

Problem statement

A key insight guiding our efforts is the interconnected nature of fraud attacks. Often, there are links between various actors, identifiers, and requests. For instance, knowing that an email address was previously associated with fraudulent activity provides valuable context for our systems. A natural way to represent such link data is through the mathematical concept of a graph. Recognizing the power of this interconnected data, we have therefore invested in a graph technology service. This service enables us to provide real-time link information crucial for effective fraud detection and prevention.

How do we represent requests in a graph?

We utilize historical data, such as reservation requests, to construct a graph representation. In this graph, nodes represent transaction identifiers like account number and credit card details, while edges connect identifiers that have previously been observed together. This data is stored within a graph database. When it’s time to assess fraud risk, we query the graph database to construct a local graph centered around the request identifier.
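To make the representation concrete, here is a minimal in-memory sketch. It is not our production implementation: a real deployment queries a graph database, and the names `build_graph`, `local_graph`, `max_hops` and the `account:1` style identifiers are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(requests):
    """Return an adjacency map linking identifiers observed in the same request."""
    graph = defaultdict(set)
    for identifiers in requests:
        for node in identifiers:
            graph[node]  # register the node even if it has no links yet
        for a, b in combinations(identifiers, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def local_graph(graph, center, max_hops=2):
    """Map every node within max_hops of center to its hop distance."""
    distances = {center: 0}
    frontier = [center]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                if neighbor not in distances:
                    distances[neighbor] = hop
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return distances


# Hypothetical historical requests: each tuple lists identifiers seen together.
history = [
    ("account:1", "card:1"),
    ("account:1", "card:2"),
    ("account:2", "card:2"),
]
graph = build_graph(history)
neighborhood = local_graph(graph, "account:2", max_hops=2)
```

Bounding the search with `max_hops` is an assumption of this sketch; it keeps the local graph small enough to build at request time.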

To illustrate, let’s explore how a graph evolves over time and how it can reveal suspicious patterns.

Time step 1

A reservation request comes in from ‘account 1’ with ‘credit card 1’. We create nodes for those entities, and an edge between them (Figure 1).

Figure 1: evolution of a graph over time; time step 1.

Time step 2

A reservation request comes in for ‘account 1’, but now with a different card: ‘credit card 2’. We add a new node for the new credit card, and create the relevant edge (Figure 2).

Figure 2: evolution of a graph over time; time step 2.

Time step 3

A reservation request comes in for ‘account 2’ with ‘credit card 2’. We add a new node for the new account, and create the relevant edge (Figure 3).

Figure 3: evolution of a graph over time; time step 3.

Time step 4

We receive a notice from the issuing bank of ‘credit card 1’ that this card was used fraudulently. We add a node that indicates the fraudulent status (Figure 4).

Figure 4: evolution of a graph over time; time step 4.

Time step 5

We receive a reservation request from ‘account 2’ with ‘credit card 3’, and are tasked with estimating the fraud risk (Figure 5). Neither of these entities (in purple) is directly associated with previous fraudulent activity, but such activity has been observed only a few links away (in red). This may or may not indicate participation in a larger fraud scheme. The graph service surfaces this information in real time, and it can serve as one of the inputs to risk estimation. Without a graph, this type of information is difficult to surface.

Figure 5: evolution of a graph over time; time step 5.
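The five time steps can be replayed in a few lines of code. The sketch below is illustrative only: it assumes the fraud notice is modeled as a marker node attached to the reported card, and the identifiers, the `fraud:` prefix, and the `hops_to_fraud` helper are hypothetical names rather than part of our service.

```python
from collections import deque

# One edge per time step of the walk-through above.
edges = [
    ("account:1", "card:1"),          # time step 1
    ("account:1", "card:2"),          # time step 2
    ("account:2", "card:2"),          # time step 3
    ("card:1", "fraud:bank-notice"),  # time step 4: marker node for the notice
    ("account:2", "card:3"),          # time step 5: the request being assessed
]

adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


def hops_to_fraud(adjacency, start, max_hops=6):
    """Breadth-first search from start; return hops to the nearest fraud marker.

    Returns None when no marker is reachable within max_hops.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node.startswith("fraud:"):
            return hops
        if hops == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


account_distance = hops_to_fraud(adjacency, "account:2")
card_distance = hops_to_fraud(adjacency, "card:3")
```

In this toy graph the fraud marker sits four links from ‘account 2’ and five from ‘credit card 3’. A distance like this is a signal for risk estimation, not a verdict on its own.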

Figure 6 shows two examples of real graphs generated by our graph service. Each graph is built for a request around a given identifier (green star in the graph), which could for example be the account number used to make the reservation. Starting from this central identifier, the graph includes the identifiers connected to it. For the request on the left, the central identifier was connected to only two other identifiers previously observed on our platform (blue), which were not associated with any other identifier. For the request on the right, the identifier was connected to many other identifiers, resulting in a large and deep graph.

The shape and size of these graphs can reveal information about the likelihood that a given request is fraudulent. In this particular instance, it later turned out that the request that generated the small graph on the left was legitimate, whereas the request that generated the large graph on the right was fraudulent. Size and shape can therefore contribute to the fraud risk estimation.
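As a rough illustration of how size and shape could be turned into numbers, the sketch below computes node count, edge count, and depth for a local graph. The feature names, the `graph_features` helper, and both toy graphs are assumptions made for this example; they do not describe the features our models actually use.

```python
def to_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def graph_features(adjacency, center, max_hops=4):
    """Summarize the local graph around center as simple size and shape features."""
    distances = {center: 0}
    frontier = [center]
    hop = 0
    while frontier and hop < max_hops:
        hop += 1
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in distances:
                    distances[neighbor] = hop
                    next_frontier.append(neighbor)
        frontier = next_frontier
    # Count each undirected edge once, using string ordering to break the tie.
    edge_count = sum(
        1
        for node in distances
        for neighbor in adjacency.get(node, ())
        if neighbor in distances and node < neighbor
    )
    return {
        "nodes": len(distances),
        "edges": edge_count,
        "depth": max(distances.values()),
    }


# Toy stand-ins for a small, shallow graph and a larger, deeper one.
small = to_adjacency([("account:A", "card:A"), ("account:A", "email:A")])
large = to_adjacency([
    ("account:B", "card:B1"),
    ("account:B", "card:B2"),
    ("card:B1", "account:C"),
    ("account:C", "email:C"),
    ("card:B2", "account:D"),
])

small_features = graph_features(small, "account:A")
large_features = graph_features(large, "account:B")
```

Values like these could be passed to a model or a heuristic rule alongside other signals. A large graph is not proof of fraud, and a small one is not proof of legitimacy.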