Magic Lantern: Shining a light on the AV numbers game?

You don't hear anything about the FBI's Magic Lantern spyware – sorry, policeware – for years, and then suddenly it's all over the place. Media-wise, at any rate: I don't have any exciting news of an epidemic of electronic surveillance, but there seems to be a lot of interest in Computer and Internet Protocol Address Verifier (CIPAV) again...

Only a week ago I wrote about it in Cybercrime Corner (following some enquiries from Kevin Townsend that he subsequently blogged about here), while I was in the early stages of a European workshop/conference grand tour of AMTSO, CARO and EICAR. And this week, at that selfsame EICAR conference (where I presented a paper, but I'll come back to that another time), Magic Lantern was referenced again in an interesting paper by Eric Filiol and Alan Zaccardelle called “Magic Lantern... Reloaded/Anti-Viral psychosis McAfee Case.”

The paper – and presentation – referred to persistent rumors that McAfee and Symantec modified their products so as not to detect Magic Lantern, though I remain sceptical about that, much as I love a government conspiracy theory. But that was actually peripheral to the main thrust of the paper. The authors claimed to have uncovered some weaknesses in McAfee detection that I'm unable to comment on: no doubt McAfee themselves will have something to say on that in due course. And they (Filiol and Zaccardelle, that is) made some interesting points on phased infection and a hypothetical attack by reactivating quarantined files.

Having had a couple of days to think about it, though, I think there was one issue on which they came to a wrong conclusion. They reported some inconsistent naming of McAfee detections: again, I'm not in a position to confirm or refute their data, but their conclusion seemed to be that McAfee – and, by implication, the rest of the industry – exaggerates the size of the malware threat by detecting the same threat under more than one name. Well, it can happen that a single malicious binary is detected by different names at different times. There are obvious cases where this might happen: for instance, where a detection name is changed for consistency with a consensus of vendors, or where ongoing analysis of a threat uncovers new information that leads to reclassification by family. Less obviously, a different detection name may be triggered by the same base code according to context (vector, AV configuration, changes in the obfuscator, and so on).

Does this mean, then, as was suggested in the presentation, that because a single binary might be detected by different names, an AV company's estimate of the volume of the malware threat is likely to be exaggerated?

In general, no, it doesn't, because we're talking about two different things. The number of threats a company processes on a daily basis has nothing to do with the total number of threats it detects. ESET will not quote a total number of threats it detects. I can't speak for McAfee, but I doubt if it does either, because any such figure would, however accurate it might be in terms of the way a specific lab measures, be subjective and meaningless, and of value only as a marketing/PR exercise.

For every binary that might conceivably be detected under more than one name, there are detection names that might be applied to tens or hundreds of thousands of unique binaries. We have detection names that are applied to every sample that uses specific self-concealment or infection techniques known to be used by malware, and which we wouldn't expect to see used by legitimate software. That doesn't mean they're all the same program. It means we can use similarities in the way that they are presented to detect them generically. Thus, we detect a massive quantity of otherwise unrelated malicious programs as INF/Autorun simply because they use the same primary infection vector.
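To make the counting argument concrete, here is a minimal Python sketch. It is purely illustrative and assumes a made-up list of samples: the file contents, the names `Win32/Example.A` and `Win32/Example.Gen`, and the `summarize` helper are all hypothetical, and this is not how any vendor's lab actually counts. It simply shows that the number of detection names and the number of unique binaries are different measures, and that the relationship runs in both directions.

```python
import hashlib
from collections import defaultdict

# Hypothetical data: (file contents, detection name reported for that file).
# Only INF/Autorun is a name taken from the text; the rest are invented.
samples = [
    (b"autorun sample one", "INF/Autorun"),
    (b"autorun sample two", "INF/Autorun"),
    (b"autorun sample three", "INF/Autorun"),
    (b"autorun sample four", "INF/Autorun"),
    # The same binary reported under two names (e.g. after reclassification).
    (b"renamed sample", "Win32/Example.A"),
    (b"renamed sample", "Win32/Example.Gen"),
]


def summarize(samples):
    """Map each unique binary (by SHA-256) to its names, and each name to its binaries."""
    names_by_hash = defaultdict(set)
    hashes_by_name = defaultdict(set)
    for content, name in samples:
        digest = hashlib.sha256(content).hexdigest()
        names_by_hash[digest].add(name)
        hashes_by_name[name].add(digest)
    return names_by_hash, hashes_by_name


names_by_hash, hashes_by_name = summarize(samples)

unique_binaries = len(names_by_hash)
distinct_names = len(hashes_by_name)
# Binaries that carry more than one detection name.
multi_named = sum(1 for names in names_by_hash.values() if len(names) > 1)
# Unique binaries covered by the single generic name.
autorun_binaries = len(hashes_by_name["INF/Autorun"])

print(f"unique binaries:      {unique_binaries}")
print(f"distinct names:       {distinct_names}")
print(f"multi-named binaries: {multi_named}")
print(f"INF/Autorun binaries: {autorun_binaries}")
```

In this toy data set, counting names would overstate one binary and understate four others, which is why a tally of detection names tells you very little about the number of distinct malicious programs.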

While some scanners use more heuristic and/or generic detections than others, no anti-virus product worth its salt uses the “one malicious program, one signature” model today.