The NSA's SKYNET program drew more unwanted attention following an Ars Technica report that analyzed NSA documents published by The Intercept last year. The report quoted a data scientist who called the NSA's use of a machine learning algorithm to monitor the metadata of Pakistan's mobile network “ridiculously optimistic” and “completely bull***t.”
Human Rights Data Analysis Group director of research Patrick Ball was more diplomatic in his correspondence with SCMagazine.com. He noted via Twitter that the Random Forests machine learning method used by the NSA's SKYNET program is a good algorithm, but said the training data used by the NSA is insufficient and unrepresentative – and that the validation is inadequate.
There are no machine learning programs that have “anything close to 100 percent accuracy,” ProtectWise vice president of security research Jim Treinen told SCMagazine.com.
He echoed some of the concerns raised by Ball, noting that the limited size of the training corpus used by the NSA means the program could “run the risk of getting overly optimistic.”
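The failure mode Treinen describes is easy to demonstrate. The sketch below is a purely illustrative toy, with made-up data and a deliberately naive memorizing classifier, not the NSA's actual pipeline: a model fit to a handful of records can score perfectly when evaluated on its own training set while performing far worse on unseen data.

```python
import random

random.seed(0)

# Hypothetical toy records: (feature, label). Labels correlate only
# weakly with the feature -- the numbers are illustrative, not real.
def make_record():
    x = random.gauss(0, 1)
    label = 1 if x + random.gauss(0, 2) > 0 else 0
    return (x, label)

train = [make_record() for _ in range(7)]     # tiny training corpus
test = [make_record() for _ in range(1000)]   # what deployment sees

def predict(x, data):
    # 1-nearest-neighbour: effectively memorizes the training set.
    nearest = min(data, key=lambda r: abs(r[0] - x))
    return nearest[1]

def accuracy(dataset, data):
    hits = sum(1 for x, y in dataset if predict(x, data) == y)
    return hits / len(dataset)

# Evaluating on the training data itself always yields 100 percent,
# because each point's nearest neighbour is itself.
print(f"training accuracy: {accuracy(train, train):.2f}")
# Held-out accuracy is far lower -- the "overly optimistic" gap.
print(f"held-out accuracy: {accuracy(test, train):.2f}")
```

The gap between the two printed numbers is exactly why validation on data the model never saw, which Ball argued was inadequate here, matters.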
The challenge of striking the right balance between false positives and false negatives is common among machine learning programs, and for this reason, machine learning algorithms are seldom the sole determinant in crucial decisions, either in business or military situations. “It's more likely that the algorithm is being used in an advisory function,” Treinen said. “I would expect that human analysts are making the final decisions, or at least I would hope so.”
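The false-positive/false-negative tradeoff Treinen refers to can be shown with a few lines of code. The scores and labels below are invented for illustration; the point is only that moving the decision threshold trades one error type for the other, so a human must still choose where the line sits.

```python
# Hypothetical classifier outputs: (score, actually_positive).
# Higher score means "more suspicious". Numbers are illustrative only.
scored = [(0.95, True), (0.90, False), (0.80, True), (0.75, False),
          (0.60, True), (0.40, True), (0.30, False), (0.10, False)]

def confusion(threshold):
    """Count errors when flagging every score at or above the threshold."""
    fp = sum(1 for s, pos in scored if s >= threshold and not pos)
    fn = sum(1 for s, pos in scored if s < threshold and pos)
    return fp, fn

for t in (0.2, 0.5, 0.85):
    fp, fn = confusion(t)
    print(f"threshold {t}: {fp} false positives, {fn} false negatives")
```

Lowering the threshold flags more innocents (false positives); raising it misses more true targets (false negatives). No single threshold eliminates both, which is why such scores are better suited to an advisory role than to final decisions.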