NSA's website outage due to lack of topological "diversity"
An easy-to-fix -- but often overlooked -- problem most likely took the National Security Agency's website and its mail services down for six or seven hours on Thursday, according to a security researcher at Arbor Networks.
Visitors couldn't reach the agency's NSA.gov site because of misconfigured Domain Name System (DNS) servers, Danny McPherson, chief research officer with Arbor, told SCMagazineUS.com on Friday. McPherson wouldn't speculate on the precise nature of the misconfiguration, but he had some ideas about why the outage occurred.
His analysis indicates that NSA committed several basic mistakes in the configuration of its DNS systems.
First, a web server was running on the same computer or the same IP address as one of the authoritative name servers for nsa.gov. The authoritative name servers are the primary and secondary servers that translate the web addresses humans understand (e.g., NSA.gov) to machine-readable IP addresses (in the NSA.gov case, 18.104.22.168).
Moreover, the primary and secondary authoritative name servers were both downstream from the Qwest edge access router in Washington, D.C. They should have been separated topologically within the network infrastructure, according to McPherson.
This indicated that the architecture of the NSA's authoritative name servers lacked what McPherson called "diversity." In this context, diversity means placing the authoritative DNS servers in both geographically and topologically diverse locations, he explained.
The Internet Engineering Task Force's (IETF) RFC 2182 outlines what McPherson called industry best practices for deploying DNS servers. It covers both configuration, such as ensuring that all authoritative servers serve an identical copy of the zone file (the file that maps names within the domain), and the physical and topological placement of authoritative servers.
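One rough way to spot authoritative servers that are not serving identical zone data is to compare the serial number each one reports in the zone's SOA record. The sketch below is illustrative only: the server names and serial values are hypothetical, the serials are supplied by hand rather than queried over the network, and a plain numeric comparison stands in for the wrap-around serial arithmetic that real DNS software uses.

```python
def inconsistent_servers(serials):
    """Return the servers whose SOA serial differs from the highest one seen.

    `serials` maps a name server to the SOA serial it reported.
    Simplification: serials are compared as plain integers.
    """
    if not serials:
        return {}
    newest = max(serials.values())
    return {server: s for server, s in serials.items() if s != newest}


# Hypothetical observations, not real NSA data.
observed = {
    "ns1.example.net": 2008051501,
    "ns2.example.net": 2008051501,
    "ns3.example.net": 2008042201,
}

stale = inconsistent_servers(observed)
print(stale)
```

Matching serials do not prove the zone contents are byte-for-byte identical, but a mismatch is a quick signal that a secondary has fallen behind the primary.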
In addition, the guidelines suggest that authoritative servers not be placed within the same building or even city to avoid power-related outages. Moreover, they should not be connected to the same switch or router within an organization's network infrastructure, he said.
Google is a major violator of the IETF's diversity guidelines, according to McPherson. "If you take a look at Google, it doesn't have diversity because all of its name servers are on a single network block of IP addresses," he explained.
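The "single network block" condition McPherson describes can be roughly approximated by grouping name server addresses by a shared prefix. The sketch below assumes a /24 prefix as the unit of comparison and uses reserved documentation addresses, not real name servers. Sharing a prefix is only a proxy: servers in different blocks can still sit behind the same upstream router, as the NSA's reportedly did.

```python
import ipaddress
from collections import defaultdict


def group_by_network(addresses, prefix_len=24):
    """Group IPv4 addresses by their enclosing network of the given prefix length."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False accepts a host address and returns its enclosing network.
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[network].append(addr)
    return dict(groups)


def lacks_diversity(addresses, prefix_len=24):
    """Return True when all addresses fall into at most one network block."""
    return len(group_by_network(addresses, prefix_len)) <= 1


# Documentation-only addresses (hypothetical name servers).
clustered = ["192.0.2.10", "192.0.2.11", "192.0.2.53"]
spread = ["192.0.2.10", "198.51.100.10", "203.0.113.10"]

print(lacks_diversity(clustered))
print(lacks_diversity(spread))
```

A fuller check would also compare the network paths and upstream providers of each server, which address prefixes alone cannot reveal.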
McPherson pointed out that YouTube recently experienced intermittent outages because of a similar lack of DNS diversity. And Microsoft's Hotmail web mail service suffered a similar blackout about 10 years ago due to the same lack of DNS diversity, he said.
"The NSA people had their name servers close together from a network topological perspective, and if they'd had diversity in their DNS architecture, the outage wouldn't have occurred," McPherson said. "This isn't something that takes much capital to fix."
McPherson said that the primary "control plane" protocols in use on the internet, namely DNS for name resolution and BGP [Border Gateway Protocol] for internet routing, are two of the weakest links in the availability and security chain.
"They are often overlooked simply because of their ubiquitous use and characteristics that make them essentially transparent to most users. Diversity and security of this infrastructure is critical."