This past July, Capital One was hit by one of the largest financial hacks in history, impacting the personal data of over 100 million people. Unlike other attacks, however, this one was notable because it originated from a cloud vendor. The attack targeted data stored on AWS servers, with access coming through a misconfigured firewall.

Security is only as strong as the weakest link in your organization’s line of defense, and the threats facing the financial sector are escalating in both quantity and complexity. By 2021, damages due to malware are projected to hit $6 trillion. It’s not surprising, then, that a new report from the World Economic Forum again ranks cyberattacks as the number one risk to doing business, and by a very large margin.

There is immense pressure on cybersecurity and fraud detection experts to identify and neutralize threats in near real-time. Yet today’s detection tools – including graph search databases – face daunting challenges, particularly when parsing and analyzing the massive datasets upon which so many financial institutions now rely.

In response, banks and other large organizations are exploring new graph search technologies to identify malware patterns. Cutting-edge in-memory computing and graph search tools can identify cyber risks in near real-time, condensing what typically takes weeks down to just minutes.

So, if you’re a security expert who works with especially large datasets, there are now three steps you can take to reduce the time it takes to find and neutralize malware:

Migrate from graph databases to graph search tools.

There is an overwhelming amount of data out there, too much for conventional tools to scan in a practical time frame. Companies must regularly scan their network log data to identify lateral movement. At the same time, banks easily generate terabytes of network log data per day, which means conventional tools cannot identify threats in a reasonable time frame. They will simply never catch up with the amount of data being generated and the number of incoming threats hitting the network. This is why the average dwell time for malware on a bank’s network is an astounding 71 days.

To dramatically reduce this figure, it’s essential to look beyond conventional graph databases. Graph databases are well suited to smaller datasets, where they can scale both vertically and horizontally without introducing data consistency or integrity issues, but they simply don’t scale well once you have terabytes of data to analyze. At this scale, graph database performance declines for two reasons:

  1. Scaling horizontally results in practically every memory fetch (edge traversal) requiring a message to be sent across the network to another node.
  2. Storing data on disk, and working with only a small fraction of that data in-memory, results in data thrashing between RAM and disk to traverse edges.
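The first of these two effects can be illustrated with a small sketch. It is not how any particular graph database places data: the modulo placement rule, the tiny ring-shaped graph, and the function names `partition_of` and `count_remote_traversals` are all assumptions chosen to make the point visible. The sketch walks a graph breadth-first and counts how many edge traversals stay on one machine versus how many would need a network message.

```python
from collections import deque


def partition_of(vertex: int, num_nodes: int) -> int:
    """Hypothetical placement rule: assign each vertex to a cluster node by modulo."""
    return vertex % num_nodes


def count_remote_traversals(adjacency, start, num_nodes):
    """Breadth-first search that tallies local vs. cross-node edge traversals."""
    seen = {start}
    queue = deque([start])
    local = remote = 0
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            # An edge whose endpoints live on different cluster nodes
            # would require a message across the network.
            if partition_of(u, num_nodes) == partition_of(v, num_nodes):
                local += 1
            else:
                remote += 1
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return local, remote


# Illustrative graph: a ring of 8 hosts, each connected to the next one.
ring = {i: [(i + 1) % 8] for i in range(8)}

single = count_remote_traversals(ring, 0, num_nodes=1)     # one shared-memory machine
clustered = count_remote_traversals(ring, 0, num_nodes=4)  # four-machine cluster
print(single, clustered)  # (8, 0) (0, 8)
```

The ring with modulo placement is a deliberately unfavorable case, so every traversal becomes remote once the graph is split. Real placements do better, but graph edges rarely respect machine boundaries, which is why the share of remote fetches tends to be high.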

Conversely, graph search tools are specifically built for very large datasets. For example, the Department of Defense helped develop the Trovares graph search tool, which adopts supercomputing techniques such as extreme multithreading and fine-grain locks to achieve substantial increases in both scale and speed. A team of data scientists applied analytics and supercomputing expertise to deliver a significantly different graph search tool, one that returns query results hundreds of times faster than conventional graph tools while holding large graphs in memory. It also enables the direct ingest of data into the system to avoid database performance issues.

Don’t search just a slice of your data. Search all of it.

A common solution to scanning large datasets is to slice-and-dice, or analyze just a piece of the overall dataset at a time, to try and find malware patterns. But this is no longer the best option.

Graph search tools can ingest more data while answering complex searches. The performance boost of graph search, particularly when combined with symmetric multiprocessor (SMP) systems, lets it leapfrog conventional search tools when searching much larger datasets. It can therefore quickly find intrusions that have continued to reside in the data. Critically, increased speed and scale allow organizations to scan the full dataset, in a matter of seconds or minutes, to neutralize malware.
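To see why slicing can hide a threat, consider a minimal sketch in which a lateral movement chain is spread across three days of logs. The host names, the day-based slicing, and the helper functions `build_graph` and `reachable` are all made up for illustration, and the search ignores the time ordering of events for simplicity.

```python
def build_graph(events):
    """Build a directed graph (source host -> set of destination hosts) from log events."""
    graph = {}
    for src, dst, _day in events:
        graph.setdefault(src, set()).add(dst)
    return graph


def reachable(graph, start, target):
    """Depth-first search: is there any path from start to target?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Hypothetical log events: (source host, destination host, day observed).
events = [
    ("laptop", "fileserver", 1),
    ("fileserver", "database", 2),
    ("database", "payments", 3),
]

# Searching the full dataset finds the chain from the laptop to the payments host.
full_hit = reachable(build_graph(events), "laptop", "payments")

# Searching one day at a time never sees more than one hop of the chain.
slice_hits = [
    reachable(build_graph([e for e in events if e[2] == day]), "laptop", "payments")
    for day in (1, 2, 3)
]

print(full_hit, slice_hits)  # True [False, False, False]
```

Each daily slice looks harmless on its own. Only the graph built from all three days contains the full path, which is the case for searching everything at once.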

Consider migrating to an SMP system.

Lastly, you should consider which compute platform will meet your need for extreme performance. Server clusters are popular, but they are not ideal for graph search. Indeed, the typical computation over a graph data structure is among the worst workloads for clusters.

SMPs, on the other hand, are excellent for graph search. Implementations from the team commercializing the DoD technology were built on SMP systems such as HPE’s Superdome Flex.

A single SMP system today can offer 1,000+ threads of execution and 3 to 48 terabytes of memory, providing the balance of processing capacity and storage needed to scale graph search performance. These platforms enable high-performance data ingest and are built on industry-standard x86 processor technology and PCIe-based I/O. They support the full range of software needed to complete a workflow around graph search.

Benchmark data for these systems shows near linear scalability when querying 3 terabytes of data with 20 billion graph edges and 212 billion edge properties. The combination of the graph search tool and an SMP system demonstrates orders of magnitude improvements in speed, reducing query time from days to minutes.
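A quick back-of-the-envelope calculation puts those benchmark figures in perspective. It assumes decimal terabytes and treats the 3 terabytes as the total data size. The source does not say whether that figure is the raw input or the in-memory footprint, so the per-edge number is only a rough guide.

```python
# Figures quoted in the benchmark above.
edges = 20_000_000_000             # 20 billion graph edges
edge_properties = 212_000_000_000  # 212 billion edge properties
data_bytes = 3 * 10**12            # 3 terabytes (assuming decimal TB)

properties_per_edge = edge_properties / edges  # average properties attached to each edge
bytes_per_edge = data_bytes / edges            # average data volume per edge

print(properties_per_edge, bytes_per_edge)  # 10.6 150.0
```

At roughly 10.6 properties and 150 bytes per edge on average, a dataset of this size sits at the low end of the 3 to 48 terabyte memory range quoted for SMP systems.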

You should expect graph search tools on SMP systems to outperform conventional tools on datasets of all sizes… but they become particularly compelling when your dataset reaches a billion or more records.

Why this matters

The malware challenge continues to evolve and grow, and organizations must adopt new technologies to remain one step ahead. Today’s leading enterprises can find the performance needed to overcome the speed and scale challenges of identifying and neutralizing malware by adding graph search and SMP systems to their cybersecurity roadmaps. With these tools, it’s now possible for organizations to quickly scan their entire datasets and address data breaches in near real time.

James Rottsolk is co-founder, president and CEO of Trovares