It can be easy to forget that there are people behind just about everything that happens on the internet. However complicated a technology may be, it was engineered by people and is used by people too, and people are predictable. The British codebreakers at Bletchley Park knew this when they set about decrypting messages from the German Enigma machines during World War II. Insight into human behavior was critical to the codebreakers' success, and it can be an important tool in building modern, predictive cyber intelligence programs as well.
Admittedly, the cryptanalysts at Bletchley Park were aided by their knowledge of the enemy: they often probed for predictable phrases such as “heil Hitler” and ruled out possibilities based on the knowledge that Enigma never enciphered a letter as itself. By contrast, predicting cyber attack behavior on a global, 21st-century scale is significantly more intricate. Networks carry millions of transactions a day and sustain attacks from thousands of IP addresses and sites. Tactics are constantly changing, and attack vectors are becoming more sophisticated and harder to foil. Yet it stands to reason that there should be human fingerprints in network data that can help us predict and protect against future compromises.
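The rule that no letter could be enciphered as itself is easy to illustrate. The Python sketch below is only an illustration of that one elimination step, not a reconstruction of the Bletchley Park workflow: it slides a guessed phrase along an intercepted message and discards every alignment in which a guessed letter coincides with the ciphertext letter at the same position. The function name and the sample ciphertext are made up for the example.

```python
def possible_crib_positions(ciphertext: str, crib: str) -> list[int]:
    """Return the offsets at which a guessed phrase (crib) could line up.

    Assumption for this sketch: the cipher never maps a letter to itself,
    so any alignment where a crib letter equals the ciphertext letter in
    the same position is impossible and can be ruled out.
    """
    ciphertext = ciphertext.upper()
    crib = crib.upper()
    positions = []
    # Slide the crib across every window of the same length.
    for offset in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[offset:offset + len(crib)]
        # Keep the offset only if no letter coincides with the crib.
        if all(c != p for c, p in zip(window, crib)):
            positions.append(offset)
    return positions


if __name__ == "__main__":
    # Hypothetical intercepted text, invented for illustration only.
    sample = "QHKILRWITLXRZHEOLMNP"
    print(possible_crib_positions(sample, "HEILHITLER"))
```

Every offset that survives is only a candidate; the value of the step is in how many positions it eliminates before any harder work begins.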
To simplify, imagine an oil company; let's call it Clampett Oil. Clampett Oil's executives suspect that their information networks are under attack and that data is being stolen. The executives want a detailed report and a solution, so Clampett Oil's infosec analysts start with the company's risk profile: What does the company have worth stealing or attacking, who might be doing it, and what might they be doing with it? Clampett Oil not only has sales, exploration, development and acquisition data worth stealing; it is also an innovator in extractive technologies and a frequent target for environmental activists.
Depending on the sophistication of the attacks and the geostrategic importance of the information, Clampett Oil could be targeted by a nation-state, or by organized criminals who steal data for sale to competitors. Defacement and distributed denial-of-service (DDoS) attacks are more likely to be motivated by ideology than by espionage or theft. On the other hand, Clampett Oil's crown jewels, its most protected intellectual property, can probably only be stolen through a combination of social engineering (phishing or insider attacks) and sophisticated malware.
Using information from this initial analysis, Clampett analysts can look back months and even years, scrutinizing cycles of attacks and compromises and looking for patterns. Perhaps Clampett Oil was hit by a phishing attack not long after announcing a new partnership, or after a competitor launched a major new project that will stretch its technical capabilities. In that case, Clampett might want to provide refresher training and remind employees of best practices for avoiding phishing and spam before the next such announcement.
Clampett analysts might also notice a rhythm of suspicious activity that marks time with the holidays of a certain country, follows the working hours of a particular time zone, or subsides during commute time on the other side of the world. Based on their evolving theory of the attacker, they might start watching social media for geopolitical evidence that supports it.
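A rough way to look for that kind of rhythm is to count suspicious events by hour of day and find when the activity goes quiet. The Python sketch below assumes the analysts already have a list of timezone-aware event timestamps; the function names and the eight-hour window are illustrative choices, not a standard method.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_activity(timestamps):
    """Count events for each UTC hour of day (index 0-23).

    Assumes every timestamp is timezone-aware; naive datetimes would be
    interpreted as local time by astimezone() and skew the counts.
    """
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_window(hourly, width=8):
    """Return the UTC start hour of the quietest `width`-hour window.

    The window wraps past midnight. If several windows tie, the earliest
    start hour is returned.
    """
    best_start, best_total = 0, None
    for start in range(24):
        total = sum(hourly[(start + i) % 24] for i in range(width))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start


if __name__ == "__main__":
    # Hypothetical events: one per hour between 06:00 and 17:00 UTC.
    events = [datetime(2024, 1, 1, h, tzinfo=timezone.utc) for h in range(6, 18)]
    hourly = hourly_activity(events)
    print(hourly)
    print(quietest_window(hourly))
```

A consistent quiet window hints at when the operators are away from their keyboards, but it is weak evidence on its own: automated tooling or deliberate scheduling can produce the same pattern.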
Of course, relying solely on stereotypical geographic, cultural, or political lines often results in misinformation, as cultural references in malware code may be strategically placed to cover the tracks of criminal actors. An organized crime group in one country may be acting at the request of a competitor in another country. Code written by a state-backed hacker may be copied and repurposed by an inexperienced individual motivated by ideology on the other side of the world. Moreover, our assumptions about how the world looks from someone else's vantage point are likely to be distorted.
These fingerprints in our data, then, are just one part of the picture. Given the increasing complexity of malicious actors, targets and attack vectors, our conclusions and actions must begin and end with the data itself. A strong cyber threat intelligence program should include proactive analysis of network traffic, testing of theories based on our understanding of human behavior, and, ultimately, letting the data speak for itself and lead us where it will.