Big Data can fight malware

Today's malware is a moving target. Cyber criminals change the appearance of malware samples by using sophisticated polymorphic techniques and leveraging the dynamic nature of interpreted languages, such as JavaScript. In addition, the malware infrastructure (i.e., the “malscape”) is in continuous flux in an effort to avoid detection and deter actions from “the good guys.” As a result, some of the sites used in the delivery of attacks and in the management of infected machines have a lifetime of just a few hours.

These evasive techniques are effective when malware detection is performed from a single observation point. However, the ever-changing nature of malware and the malscape generates anomalous network behavior that can be detected by leveraging large corpora of data collected from many observation points: hundreds, thousands or even millions of them. For example, if multiple machines in different networks download different executables from the same URL, this can indicate that the server is distributing polymorphic malware. As another example, if many hosts distributed across the globe regularly connect to the same list of hosts using the same access patterns, this can be evidence of migrating command-and-control servers.
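The first example can be made concrete with a minimal sketch. It assumes download events arrive as `(network_id, url, file_hash)` tuples; that record layout, the function name `find_polymorphic_urls`, and both thresholds are illustrative choices, not part of any particular product.

```python
from collections import defaultdict


def find_polymorphic_urls(events, min_hashes=3, min_networks=2):
    """Return URLs that served many distinct files to several networks.

    events: iterable of (network_id, url, file_hash) download records
    (a hypothetical layout chosen for this sketch).
    """
    hashes_by_url = defaultdict(set)
    networks_by_url = defaultdict(set)
    for network_id, url, file_hash in events:
        hashes_by_url[url].add(file_hash)
        networks_by_url[url].add(network_id)

    # A URL is suspicious when it yields many different checksums
    # AND is seen from more than one observation point.
    return sorted(
        url
        for url in hashes_by_url
        if len(hashes_by_url[url]) >= min_hashes
        and len(networks_by_url[url]) >= min_networks
    )


# Made-up sample data: one URL serves a different file every time,
# the other always serves the same file.
events = [
    ("net-a", "http://example.test/setup.exe", "h1"),
    ("net-b", "http://example.test/setup.exe", "h2"),
    ("net-c", "http://example.test/setup.exe", "h3"),
    ("net-a", "http://example.org/tool.exe", "h9"),
    ("net-b", "http://example.org/tool.exe", "h9"),
]
suspicious = find_polymorphic_urls(events)
```

No single network in the sample sees more than one checksum per URL, so the pattern only appears once the records are pooled.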

To identify these network behaviors, it is necessary to collect large volumes of information about the events in a network (such as passive DNS information, network flows, checksums of downloaded files, etc.), which then need to be analyzed using highly distributed algorithms.
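One way to picture the distributed analysis is as a map-and-merge: each observation point summarizes its own flows, and the summaries are then combined. The sketch below assumes flows are `(source_host, destination_host)` pairs; the helper names `summarize`, `merge` and `widely_contacted` and the threshold are hypothetical, and a production system would run the same steps on a distributed framework rather than in one process.

```python
from collections import defaultdict
from functools import reduce


def summarize(network_id, flows):
    """Map step, run at one observation point.

    flows: iterable of (source_host, destination_host) pairs
    (a hypothetical layout). Returns destination -> {network_id}.
    """
    summary = defaultdict(set)
    for _source, destination in flows:
        summary[destination].add(network_id)
    return summary


def merge(left, right):
    """Reduce step: union the network sets for each destination.

    The operation is associative and commutative, so partial
    results can be combined in any order and on any machine.
    """
    merged = defaultdict(set)
    for part in (left, right):
        for destination, networks in part.items():
            merged[destination] |= networks
    return merged


def widely_contacted(summaries, min_networks):
    """Destinations contacted from at least min_networks networks."""
    total = reduce(merge, summaries, defaultdict(set))
    return sorted(d for d, nets in total.items() if len(nets) >= min_networks)


# Made-up sample data from three unrelated networks.
per_network = {
    "net-a": [("10.0.0.5", "c2.example.test"), ("10.0.0.6", "cdn.example.org")],
    "net-b": [("192.168.1.9", "c2.example.test")],
    "net-c": [("172.16.0.2", "c2.example.test")],
}
summaries = [summarize(name, flows) for name, flows in per_network.items()]
flagged = widely_contacted(summaries, min_networks=3)
```

A destination seen from many networks is not malicious by itself, since popular legitimate services look the same at this level; the sketch only shows how the counting can be split across observation points.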

This is exactly what “Big Data” analysis is. Pioneered by search engines like Google that need to process internet-wide information efficiently, these new approaches to storing and processing large data sets are a promising countermeasure against the continuous growth of the malware threat. By collecting billions of records about the activity of computers across multiple networks, it is possible to spot anomalous network behavior in near real time by comparing hundreds of seemingly unrelated networks.

Maybe not a silver bullet – but, for sure, one that could hit a moving target.