Machine Learning has the potential to be a crucial weapon in the fight against cybercrime – particularly when it’s augmented with full packet capture

There are two big advantages that cyberattackers have over the security teams that defend corporate networks: agility and amplification.

Agility – because an adversary is limited only by imagination and is armed with tools that allow attacks to change appearance and avoid detection.

Amplification – because a single adversary has access to powerful tools that can wreak havoc on organizations across the globe and tie up hundreds of cyber defenders for days, weeks or months.

Artificial Intelligence (AI) and Machine Learning (ML) offer the potential to level the playing field, allowing cyber defense systems to learn and adapt to new threats, amplifying detection protocols and even automating aspects of threat response.

If machines can continuously monitor the network and learn the difference between normal and threatening behavior, they can help by distinguishing real threats from the noise of network activity, and by identifying and prioritizing the threats that security analysts need to investigate and respond to.

And if machine learning can be trusted to take appropriate and immediate action on threats, it offers the potential to redress some of the current imbalance between attackers and defenders: allowing more agile response and amplifying the productivity of security teams.

Machine Learning, the Engine for AI

ML works by identifying patterns and anomalies in data. When applied to security analysis, ML can be used to identify anomalous behavior that indicates a likely threat. Broadly speaking there are two main approaches to ML: Supervised and Unsupervised.

Supervised ML involves feeding the learning algorithm examples of both malicious and innocent activity that have been categorized (“labelled”) by humans. From these examples, the ML algorithm learns how new, unseen occurrences should be classified.
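To make this concrete, here is a minimal, purely illustrative sketch of supervised classification – not any vendor’s actual implementation. The per-flow features (packets per second, average payload size) and the labelled examples are hypothetical; a nearest-centroid classifier stands in for the far more sophisticated models real products use:

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical labelled training data: (packets/sec, avg payload bytes)
labelled_flows = [
    ((10.0, 500.0), "innocent"),
    ((12.0, 480.0), "innocent"),
    ((900.0, 40.0), "malicious"),  # e.g. a flood of tiny packets
    ((850.0, 60.0), "malicious"),
]

def train_centroids(examples):
    """Average the feature vectors for each label (nearest-centroid training)."""
    sums, counts = {}, {}
    for features, label in examples:
        s = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in s) for label, s in sums.items()}

def classify(centroids, features):
    """Assign a new flow to the label whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train_centroids(labelled_flows)
print(classify(centroids, (880.0, 50.0)))  # → malicious
```

A new flow resembling the labelled malicious examples is classified accordingly, even though that exact flow was never seen during training.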

Unsupervised ML does not learn from labelled data. Instead, it inspects various features of the data set and applies statistical techniques, such as clustering, to find natural groupings and relationships among them. Clusters usually represent normal behavior, while outliers may represent abnormal, potentially threatening behavior.
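As a simple stand-in for these statistical techniques, the sketch below flags outliers using a z-score test on a single hypothetical feature (bytes transferred per session). Real unsupervised engines cluster many features at once; this only illustrates the idea that statistically unusual behavior surfaces without any labels:

```python
from statistics import mean, stdev

# Hypothetical observed feature: bytes transferred per session.
# One session moves vastly more data than the rest (exfiltration-like).
sessions = [1200, 1350, 1100, 1280, 1220, 1310, 1190, 250000]

def find_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

print(find_outliers(sessions))  # → [250000]
```

No human labelled the 250,000-byte session as suspicious; it stands out purely because it deviates from the statistical norm of the data set.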

ML-based security solutions may use one approach or the other, or both: a “hybrid” approach.

Why Packet Capture is Important to ML

Cyberattacks typically take place on the network, meaning evidence of those attacks can be seen in the packet data that traverses the network. For this reason, ML-based cybersecurity solutions rely heavily on analyzing network traffic to identify threats: packets provide the truth about what has really taken place on the network.

An accurate source of network traffic is essential both to ML-based systems and to the human analysts who rely on their output. It provides source data for training Supervised ML systems, as well as the packet data that Supervised, Unsupervised and Hybrid ML systems need to analyze to detect threats.

To provide accurate data for ML-based systems to analyze, packet data must be complete. If packets are missing, the ML engine may reach erroneous conclusions or simply fail to detect threats it would otherwise have seen.
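One simple way to see why completeness matters: a gap in TCP sequence numbers corresponds to bytes the analysis engine never saw. The hypothetical sketch below reports such gaps for one direction of a TCP conversation, given (sequence number, payload length) pairs from a capture:

```python
def find_capture_gaps(segments):
    """Given (seq, length) pairs for one TCP direction, report byte ranges
    that were never captured - a gap means analysis ran on incomplete data."""
    gaps = []
    expected = None
    for seq, length in sorted(segments):
        if expected is not None and seq > expected:
            gaps.append((expected, seq))  # bytes [expected, seq) are missing
        expected = max(expected or 0, seq + length)
    return gaps

# Hypothetical capture where the segment starting at 2920 was dropped
segments = [(0, 1460), (1460, 1460), (4380, 1460)]
print(find_capture_gaps(segments))  # → [(2920, 4380)]
```

The 1,460 missing bytes could contain exactly the payload an ML engine needed to recognize a threat, which is why lossless capture matters.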

Hardware-based packet capture is necessary for highly accurate, lossless capture, particularly at high speed. This hardware may be built into ML-based systems, or it may be provided by a packet-capture platform that captures and records traffic and feeds it to ML-based tools – as well as other security tools – for analysis.

The advantage of the latter approach is that recorded packet data can be made available for analysts to use in investigating threats that are detected by their security tools – including ML-based tools. With access to the actual recorded network packets analysts can pinpoint precisely what took place so they can respond to the threat accurately and quickly.

Building Trust in AI

For humans to trust AI/ML-based security tools, it’s important that the decisions these tools make can be verified.

Packet data allows analysts to validate the alerts the ML engine raises and build confidence in the ability of the tool to correctly identify real threats and ignore false positives. Analysts can also provide accurate feedback to ML-based tools when false positives are detected, continuously improving the tool’s accuracy over time.
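One hypothetical form this feedback loop can take is sketched below – it is not how any particular product works. Analyst verdicts on recent alerts (real threat or false positive) are used to tune an anomaly-score threshold, tightening it when too many alerts turn out to be noise:

```python
def adjust_threshold(threshold, verdicts, target_precision=0.9, step=0.05):
    """Raise the anomaly-score threshold when analysts mark too many alerts
    as false positives; lower it again when precision is comfortably high.
    `verdicts` is a list of booleans: True = analyst confirmed a real threat."""
    if not verdicts:
        return threshold  # no feedback yet, leave the threshold alone
    precision = sum(verdicts) / len(verdicts)
    if precision < target_precision:
        return threshold + step        # stricter: fewer, higher-confidence alerts
    return max(0.0, threshold - step)  # more sensitive: catch weaker signals

# Half of the recent alerts were false positives, so the threshold rises
print(adjust_threshold(0.70, [True, False, False, True]))
```

The key point is that the feedback only improves the tool if the analyst verdicts themselves are accurate – which is exactly what access to the recorded packet data makes possible.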

It’s important that human analysts can see exactly the same traffic that the ML engine analyzes so they can accurately validate the decisions their ML-based system makes.

Building trust in the accuracy of AI/ML-based tools is essential if we are to trust them to autonomously – and accurately – respond to the threats they detect. Providing the ability for human validation is essential in building that trust and reaping the productivity benefits that AI/ML-based systems promise.

Conclusion

AI/ML technology has the potential to tip cybercrime’s scales back in favor of the good guys by learning from our collective wisdom and amplifying the ability of our security teams to respond quickly to threats.

Combining a full packet capture solution alongside AI/ML-based tools is key to unlocking this capability.

John Attala is the Senior Director Americas, Endace