Tech For Good - Could graphs stop coronavirus fraud?

Attempts to defraud coronavirus financial aid schemes are increasing across the world. Scammers are using fake identities to drain governmental emergency funds. The problem is particularly acute in Germany, where fraudsters are using fake websites to harvest the details of businesses applying for emergency funds, then using that information to divert payments from state coffers into their own accounts.

According to the FT, authorities are investigating no fewer than 104 such fake sites. Governments have been keen to make the funds as easily accessible to businesses as possible, but these scams present a major obstacle to getting the support to those who need it most.

Can the financial services industry offer insight?

Financial services companies are used to comparing and checking transactional and personal data to monitor for fraud. As with fraudulent applications for financial aid, those who defraud banks use synthetic identities when opening accounts or applying for loans. Information such as home addresses, phone numbers and email details is stolen from real people and then reassembled and augmented into a fictitious persona.

Conventional fraud detection solutions are not sufficient to expose synthetic identities. They can relate only two or three pieces of information at a time, such as name, home address or bank account. While this is useful for catching individual perpetrators, it falls short when it comes to uncovering the distinguishing features and behaviour of a fraudulent network. These systems also produce an alarming number of false positives. According to research by Microsoft, banks report false-positive rates as high as 95 to 99%, which can be detrimental to customer relations.

Revealing the hidden networks of a fraudster community

The underlying reason why conventional approaches are ineffectual is that most fraud detection systems are based on a relational database model. Information is stored in predefined tables and columns. With large, unstructured datasets, they quickly reach their limits. Queries turn out to be too complex and the response times drag on. Furthermore, these systems are trying to detect fraud with no real context. Banks and government authorities need the ability to trace a trail from one account to another, viewing a fraud network as the whole, complex entity that it is, to determine how activities are related.

Graph database technology may be the answer to combating these bad players. In contrast to relational databases, graphs depict not only individual data items (e.g. person, account number, home address), but also their relationships with one another (e.g. “resident in”, “requested”, “transacted with”). The data model can thus express a statement such as “account number B belongs to person A, who resides at address C, and received a transaction from D”. The data items are referred to as “nodes”, and the connections between them as “edges” or “relationships”. Any number of qualitative or quantitative properties can be assigned to both, showing complex relationships in an understandable and descriptive manner.
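To make the node-and-edge idea concrete, here is a minimal sketch in plain Python of the example sentence above. It is not how any particular graph database stores data. The class names, relationship labels and the transfer amount are assumptions made up for illustration, as is the reading that account B (rather than person A) received the transaction from D.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                      # e.g. "Person", "Account", "Address"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                     # node_id the relationship starts from
    rel_type: str                   # e.g. "BELONGS_TO"
    target: str                     # node_id the relationship points to
    properties: dict = field(default_factory=dict)


# Hypothetical data mirroring the sentence in the text.
nodes = {
    "A": Node("A", "Person"),
    "B": Node("B", "Account"),
    "C": Node("C", "Address"),
    "D": Node("D", "Account"),
}
edges = [
    Edge("B", "BELONGS_TO", "A"),
    Edge("A", "RESIDES_AT", "C"),
    # The amount is an invented value showing a quantitative edge property.
    Edge("D", "TRANSACTED_WITH", "B", {"amount": 9000}),
]


def neighbours(node_id):
    """Return (relationship, other node) pairs touching node_id, in either direction."""
    result = []
    for edge in edges:
        if edge.source == node_id:
            result.append((edge.rel_type, edge.target))
        elif edge.target == node_id:
            result.append((edge.rel_type, edge.source))
    return result


# Everything directly connected to account B: its owner and its counterparty.
print(neighbours("B"))
```

The point of the sketch is that relationships are stored as first-class records, so asking “what is connected to B?” is a direct lookup rather than a join across several tables.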

One of the best-known graph algorithms that could help fend off coronavirus fraudsters is ‘PageRank’, which scores each node (object) according to the number and importance of the relationships pointing to it, giving every node a relative rank. For fraud detection in banking, the algorithm identifies important or influential customers who sit on the receiving end of many money transactions. Nodes with a high PageRank score can be drawn larger in a visualisation tool, so that they are immediately noticeable.
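As a rough illustration of how PageRank surfaces an account that many others pay into, the following sketch implements the standard power-iteration form of the algorithm in plain Python. It is a simplified teaching version, not the implementation used by any graph database. The account names and transfers are invented, and the damping factor of 0.85 is the conventional default rather than a value taken from this article.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank scores for a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is shared evenly among all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        # Every node passes its rank on in equal shares along its outgoing edges.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical transfers: three accounts all pay into acct_9.
transfers = [
    ("acct_1", "acct_9"),
    ("acct_2", "acct_9"),
    ("acct_3", "acct_9"),
    ("acct_3", "acct_4"),
]

scores = pagerank(transfers)
for account in sorted(scores, key=scores.get, reverse=True):
    print(account, round(scores[account], 3))
```

In this toy data the account receiving the most transfers ends up with the highest score, which is the node an analyst would see drawn largest in a visualisation. A high score on its own is not evidence of fraud; it only indicates where to look first.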

Another important algorithm is ‘Weakly Connected Components’, which works to reveal the hidden networks that form a fraudster community based on common identity features (e.g. a telephone number being used by more than one person, or multiple applicants with different names apparently living at the same address). Likewise, if person A and person B have the same phone number but do not live at the same address, that’s a big clue as to a possible attempt to defraud. Spotting patterns like these allows analysts to identify suspicious activity involving synthetic and stolen identities. These hidden connections provide valuable information when searching for fraudsters.
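The shared-identifier idea can be sketched with a small union-find routine, one common way to compute weakly connected components. This is a minimal illustration under stated assumptions, not a production fraud rule. The applicant names, phone numbers and address are invented, and treating any group with more than one applicant as worth reviewing is a simplifying assumption.

```python
from collections import defaultdict


def weakly_connected_components(edges):
    """Group nodes into components, ignoring edge direction (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving keeps trees shallow
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical applications: each edge links an applicant to an identifier they supplied.
uses = [
    ("applicant:Anna", "phone:555-0101"),
    ("applicant:Ben", "phone:555-0101"),        # same phone number as Anna
    ("applicant:Ben", "address:12 Example St"),
    ("applicant:Cara", "address:12 Example St"),  # same address as Ben
    ("applicant:Dev", "phone:555-0199"),
]

components = weakly_connected_components(uses)

# Flag components in which several applicants are tied together by shared details.
for component in components:
    applicants = sorted(n for n in component if n.startswith("applicant:"))
    if len(applicants) > 1:
        print("review together:", applicants)
```

Note that Anna and Cara share nothing directly, yet they land in the same component because each is linked to Ben. That transitive grouping is what lets the algorithm expose a network rather than just pairwise matches.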

A notable example of what graphs can uncover with such connections is highlighted by the International Consortium of Investigative Journalists, the group behind the Panama and Paradise Papers. These reports exposed the rogue offshore finance industry after the group opted to go with graph technology because of its powerful capabilities in mapping complex financial connections and picking out irregularities. Graph technology has had a big part to play in recouping more than $1.2 billion in resulting fines and back taxes since the original 2016 investigation.

Do you really ‘know your customer’?

Graph technology has the same capacity to help stop financial trickery such as coronavirus aid fraud. Business technology and data company Dun & Bradstreet is using graphs to get smart about fraud detection. To establish more quickly who the ultimate (real) economic owners of a company are, it runs extensive ‘know-your-customer’ queries. Before its new graph-based system was introduced, this research required highly qualified personnel, and a single query could keep employees busy for up to 15 days. By using graphs, Dun & Bradstreet can now perform customer reviews faster and more accurately, surfacing fraud and other crimes far more quickly.

Like Dun & Bradstreet, everyone from law enforcement to proactive chief information security officers could use graph-based queries to check the legitimacy of applications and investigate suspicious information. Whether the application is for immediate coronavirus aid or for any other financial product, fast, accurate, real-time data analysis is essential for combating fraud. With fraud attempts becoming more complex and faster to execute, keeping up is not an easy task for authorities and financial service providers.

In order to stay one step ahead of fraudsters and get more aid to the people that really need it, the use of graph analysis is turning out to be key.

Amy Hodler is Director, Analytics and AI Program at Neo4j, the world’s leading graph database company, and co-author of Graph Algorithms: Practical Examples in Apache Spark & Neo4j, published by O’Reilly Media