
Is an NSA algorithm really killing ‘thousands of innocent people’…probably not


Questions about metadata have long plagued the NSA, which has repeatedly come under fire for its controversial collection and use of such data.

The NSA's SKYNET program earned more unwanted attention following a report from Ars Technica that analyzed NSA documents published by The Intercept last year. The report quoted a data scientist who said the NSA's use of a machine learning algorithm in monitoring the metadata of Pakistan's mobile network is “ridiculously optimistic” and “completely bull***t.”

Human Rights Data Analysis Group director of research Patrick Ball was more diplomatic in his correspondence. He noted via Twitter that the Random Forests machine learning method used by the NSA's SKYNET program is a good algorithm, but he stated that the training data used by the NSA is insufficient and unrepresentative, and said the validation is inadequate.

There are no machine learning programs that have “anything close to 100 percent accuracy,” ProtectWise vice president of security research Jim Treinen said.

He echoed some of the concerns raised by Ball, noting that with a weakness in the size of the training corpus, the NSA could “run the risk of getting overly optimistic.”
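One way to see why a small sample invites optimism is to look at how uncertain a measured hit rate is when only a handful of known cases are available. The sketch below is illustrative only: it uses the standard Wilson score interval, and the sample sizes and the function name `wilson_interval` are hypothetical choices, not figures or methods taken from the NSA documents.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for an observed proportion.

    `successes` out of `trials` might be, for example, how many known
    cases a classifier flagged correctly in a validation set.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical validation sets: a perfect score on 10 cases
    # versus a perfect score on 1,000 cases.
    for trials in (10, 1000):
        low, high = wilson_interval(trials, trials)
        print(f"{trials}/{trials} correct: plausible true rate {low:.3f} to {high:.3f}")
```

Under these assumed numbers, a perfect score on 10 cases is still consistent with a true rate near 72 percent, while the same score on 1,000 cases narrows the range to above 99 percent.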

The challenge of striking the right balance between false positives and false negatives is common among machine learning programs. For this reason, machine learning algorithms are seldom the sole determinant in crucial decisions, whether in business or military situations. “It's more likely that the algorithm is being used in an advisory function,” Treinen said. “I would expect that human analysts are making the final decisions, or at least I would hope so.”
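The arithmetic behind that trade-off can be sketched in a few lines. The example below is a rough illustration with made-up round numbers: the population size, the count of genuine targets, both error rates, and the function name `screening_outcomes` are all assumptions chosen for readability, not values from the NSA documents or from either expert.

```python
def screening_outcomes(population, true_targets, false_positive_rate, false_negative_rate):
    """Expected counts when a classifier screens a whole population.

    All inputs are hypothetical. Rates are fractions between 0 and 1.
    """
    innocents = population - true_targets
    false_positives = innocents * false_positive_rate
    true_positives = true_targets * (1 - false_negative_rate)
    flagged = false_positives + true_positives
    # Precision: the share of flagged people who are genuine targets.
    precision = true_positives / flagged if flagged else 0.0
    return {
        "false_positives": false_positives,
        "true_positives": true_positives,
        "flagged": flagged,
        "precision": precision,
    }


if __name__ == "__main__":
    # Assumed scenario: 1,000,000 people, 100 genuine targets,
    # a 0.1% false positive rate and a 50% false negative rate.
    result = screening_outcomes(1_000_000, 100, 0.001, 0.5)
    print(f"Innocent people flagged: {result['false_positives']:.0f}")
    print(f"Genuine targets flagged: {result['true_positives']:.0f}")
    print(f"Share of flags that are genuine: {result['precision']:.1%}")
```

Even with a false positive rate that sounds tiny, the innocent people flagged in this assumed scenario outnumber the genuine targets by roughly 20 to one, which is why a human review step matters.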
