
The failure of URL filtering in an increasingly dangerous web world

The web is a big place, and thanks to the dynamic nature of Web 2.0 applications and user-contributed content it grows bigger by the minute. According to Netcraft's latest survey, there are over 127 million active websites.

While most malware in the past was distributed via email, more recently the web has become the primary attack vector used by malware authors to distribute malicious code. The mushrooming of the web and the browser's status as a critical tool have made it a prime target for cybercriminals. In a recent study, Google reported that in an in-depth analysis of 4.5 million websites over a 12-month period, it discovered 450,000 sites were successfully launching drive-by downloads of malware code. A University of Washington study found that of all the sites with downloadable content it examined, six percent contained malware.

The most widely deployed gateway web products today are used for URL filtering, in use in up to 95 percent of enterprise networks. URL filtering products were originally designed to increase employee productivity and limit legal liability by enforcing acceptable use policies for the web.

As trojans, keystroke loggers, rootkits and other web malware have become a major enterprise security issue, URL filtering companies have sought to remake themselves into security companies, and their URL filtering products have been repositioned from web productivity solutions to web security solutions. These products often claim comprehensive protection from web malware through their sizable URL databases and offer regular updates that are downloaded daily or more frequently from a central server.

But it has become glaringly apparent that URL filtering, which, after all, was not designed to stop malware, cannot keep pace with today's web threat environment.

URL filtering can only be as effective as its database of categorized websites. URL filtering solutions rely on visiting each URL, or “crawling” the web, in an effort to inventory “bad” sites. In a recent ad, a leading URL filtering vendor claimed it crawled 80 million websites a day. While this sounds like a big number, keep in mind that it still leaves roughly 47 million sites unexamined.
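
To see why the database is the weak point, consider what a URL filter actually does on each request: a lookup against a static table of pre-categorized sites. Here is a minimal sketch in Python; the domains and categories are hypothetical.

```python
# Minimal sketch of URL filtering: a lookup against a static table of
# pre-categorized sites. All domains and categories are hypothetical.
from urllib.parse import urlparse

URL_DATABASE = {
    "known-gambling-site.example": "gambling",
    "known-malware-site.example": "malware",
    "news-site.example": "news",
}
BLOCKED_CATEGORIES = {"gambling", "malware"}

def filter_url(url: str) -> str:
    domain = urlparse(url).hostname or ""
    category = URL_DATABASE.get(domain)   # None if never crawled
    if category is None:
        return "allow"                    # uncategorized sites sail through
    return "block" if category in BLOCKED_CATEGORIES else "allow"

# A compromised "good" site is still allowed: the verdict reflects the
# category assigned at crawl time, not the page content served right now.
print(filter_url("http://news-site.example/compromised-page"))  # allow
```

Nothing in that lookup ever touches the content of the page, which is precisely the gap the rest of this article is about.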

More troubling is that these figures don't take into account Web 2.0 websites that are powered by third-party and user-contributed content and, as a result, are constantly changing. For example, MySpace alone has 100 million accounts, each with several different web pages, and Wikipedia hosts more than 7.9 million individual articles. So crawling 80 million websites per day is really just a drop in the bucket and leaves much of the web uncategorized. More importantly, if you are relying on a URL database, your users may be exposed to threats residing on “good” sites whenever there is a gap between when a site was last scanned and when malware is posted to it.

According to Gartner, “URL filtering suffers a fundamental flaw to be an effective security filter: It does not monitor threats in real time.”

In fact, a third-party test using 200 known spyware samples revealed that an enterprise URL filtering product significantly underperformed a signature-based gateway anti-malware scanner, missing 31 percent of the keystroke loggers, browser hijackers, adware and other malicious code.

The only way to ensure that users are not infected by malware is to scan all content in real time.

Using URL filtering to defend against malware is like reading yesterday's newspaper to find the current price of your favorite stock.

So why are most companies still relying on URL filtering to deliver protection from malware?

There was a time when most malware resided on suspect URLs, like porn or gambling sites. So deploying a URL filter and blocking user access to dodgy sites might have offered some protection from web-based malware. That day has come and gone.

Web threats are no longer restricted to dodgy sites. In today's web security world, threats are just as likely to be found on reputable, trusted sites.

In fact, over the past year there have been countless incidents of legitimate sites found to be hosting malware. MySpace, the Miami Dolphins website, Wikipedia and the Samsung website have all been contaminated with malware.

The decentralization of website content has made it easy for cybercriminals to inject malware onto unsuspecting sites. Malware is being inserted on web pages via insecure ad servers, compromised hosting networks, user-contributed content, and even third-party widgets commonly found on many legitimate sites.

In May, the Tom's Hardware website, a popular technical product review site visited by thousands of tech-savvy users, unknowingly hosted a malware-infected ad that used the animated cursor (ANI) vulnerability to spread a trojan.

Of even more concern is that cybercriminals are using very sophisticated tactics to seed malware on all types of sites.

In late June, a fast-flux network, a disturbing advance in the development and use of bot networks, was used to spread malware via a Flash movie on MySpace. Possibly 100,000 MySpace accounts were affected. In effect, this June attack was a double whammy, combining the insecurities inherent in many Web 2.0 sites with a powerful, new and incredibly stealthy distribution technique.

Unlike traditional “bot” networks, fast-flux networks abuse DNS to dynamically resolve an address to any number of infected PCs, and they use the same technique to hide the control servers, which makes them much harder to shut down. This high-tech game of whack-a-mole ensures that the offending sites remain active for a much longer period of time.
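
Fast-flux domains can often be spotted from the outside by their DNS behavior: very low TTLs and a set of A records that churns on nearly every query. A rough sketch of such an observation, assuming the third-party dnspython library and a hypothetical domain name:

```python
# Rough sketch: watch whether a domain's A records churn over time, a
# telltale of fast-flux hosting. Requires the dnspython package; the
# domain name is hypothetical.
import time
import dns.resolver

def observe_flux(domain: str, rounds: int = 5, delay: int = 60) -> None:
    seen = set()
    for _ in range(rounds):
        answer = dns.resolver.resolve(domain, "A")
        ips = {rr.address for rr in answer}
        print(f"TTL={answer.rrset.ttl:>5}  IPs={sorted(ips)}")
        seen |= ips
        time.sleep(delay)
    # Dozens of distinct IPs and single-digit TTLs over a few minutes
    # would be consistent with a fast-flux network.
    print(f"{len(seen)} distinct IPs observed")

observe_flux("suspect-domain.example")
```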

If URL filtering provides good policy enforcement but not security, what does?

It's simple: Real-time scanning of web traffic is the only true defense against malware.

There's been a lot of hype surrounding real-time scanning of web traffic, but what does it mean, and what does it need to encompass in order to be an effective defense against web-based malware?

First and foremost, real-time scanning means that all content on a URL is scanned in real time every time it is requested. This is an important distinction from URL filtering, which merely filters URLs and compares them to a limited database of known, categorized URLs.
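
In gateway terms, the difference is where the verdict comes from: a URL filter decides before any content is seen, while a real-time scanner pulls the response body and inspects it on every request. A simplified sketch; the fetch and detection functions are stand-ins, and the sample data is hypothetical.

```python
# Simplified contrast between the two gateway models. fetch() and
# looks_malicious() are stand-ins; all data is hypothetical.
URL_DATABASE = {"http://bad.example/": "bad"}

def fetch(url: str) -> bytes:
    return b"<html>...freshly injected <iframe> payload...</html>"

def looks_malicious(payload: bytes) -> bool:
    return b"<iframe" in payload   # placeholder for real engines

def url_filter_gateway(url: str) -> str:
    # Verdict decided before any content is seen.
    return "block" if URL_DATABASE.get(url) == "bad" else "allow"

def realtime_gateway(url: str) -> str:
    # Content is fetched and inspected on every request, so a "good"
    # page that turned bad since the last crawl is still caught.
    return "block" if looks_malicious(fetch(url)) else "allow"

url = "http://good-but-compromised.example/"
print(url_filter_gateway(url))  # allow -- not in the database
print(realtime_gateway(url))    # block -- payload inspected now
```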

Effective real-time scanning should be powered by a combination of multiple detection technologies, each of which can fall short when used on its own to combat malware. When these techniques are combined in a cocktail approach, however, their strengths are leveraged and their shortcomings are mitigated.

Signature-based detection: Signature-based engines are extremely effective at identifying and blocking known threats. Multiple signature-based engines form an important part of a multi-layered cocktail approach to real-time scanning.

However, signature-based malware detection only works for known malware. It is not useful for new threats. Additionally, in order to be effective, signatures must be delivered and propagated quickly, a time-consuming task.
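
In its simplest form, signature matching is a lookup of a file hash or byte pattern against a feed of known-bad indicators; a toy sketch, with all signature values made up:

```python
# Toy signature engine. The hash and byte patterns are made up.
import hashlib

KNOWN_BAD_SHA256 = {"0" * 64}                # placeholder digest
KNOWN_BAD_PATTERNS = [b"EVIL_PACKER_STUB"]   # placeholder pattern

def signature_scan(payload: bytes) -> bool:
    if hashlib.sha256(payload).hexdigest() in KNOWN_BAD_SHA256:
        return True
    return any(p in payload for p in KNOWN_BAD_PATTERNS)

# Exact and fast for known samples...
print(signature_scan(b"junk EVIL_PACKER_STUB junk"))  # True
# ...but blind to anything not yet in the feed.
print(signature_scan(b"brand new malware variant"))   # False
```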

Heuristics: Using a rule of thumb to detect variants of known malware is an effective tool in the fight against malware. However, if your heuristics are too aggressive, you experience false positives. Also, heuristics are designed to increase the probability of detecting something that is similar to something you have seen before. This means that a heuristic won't detect completely novel malware.
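
A heuristic engine typically scores suspicious traits and flags anything over a threshold. In the sketch below the rules, weights and threshold are invented purely for illustration; tuning the threshold is exactly the aggressiveness trade-off described above.

```python
# Toy heuristic scorer for a web payload. Rules, weights and the
# threshold are invented for illustration only.
import re

HEURISTIC_RULES = [
    (re.compile(rb"eval\s*\("), 2),               # dynamic code execution
    (re.compile(rb"unescape\s*\("), 2),           # classic JS deobfuscation
    (re.compile(rb"fromCharCode"), 1),            # char-code string building
    (re.compile(rb"(%[0-9a-fA-F]{2}){20,}"), 3),  # long escaped blobs
]
THRESHOLD = 4

def heuristic_scan(payload: bytes) -> bool:
    score = sum(w for rx, w in HEURISTIC_RULES if rx.search(payload))
    return score >= THRESHOLD

print(heuristic_scan(b"x=unescape('%41%42');eval(x)"))  # True (score 4)
```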

Code analysis: The behavior of code can be determined by modeling program logic, behavioral rules, and contextual system-call analysis techniques that suggest good or bad intentions.
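
One way to realize this is to execute the sample in an instrumented sandbox and check the observed system-call trace against behavioral rules. A sketch, where the trace vocabulary and the rules are hypothetical:

```python
# Sketch: judge intent from an already-captured system-call trace.
# The trace vocabulary and the rules are hypothetical.
SUSPICIOUS_SEQUENCES = [
    ("write_registry_run_key", "connect_remote_host"),  # persist, then phone home
    ("hook_keyboard", "write_file"),                    # keystroke logging
]

def analyze_trace(trace: list) -> bool:
    for first, second in SUSPICIOUS_SEQUENCES:
        if first in trace and second in trace:
            if trace.index(first) < trace.index(second):
                return True
    return False

print(analyze_trace(["open_file", "hook_keyboard", "write_file"]))  # True
```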

Code reputation: Unlike URLs, whose content can change, a binary can in fact have a reputation based on historical analysis. “Good” code can be treated differently than unknown or bad code.
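
Because a binary is immutable, its hash makes a stable key for a reputation record. A sketch of such a lookup; the reputation store and its entries are hypothetical.

```python
# Sketch of a code-reputation lookup keyed by file hash. The store
# and its entries are hypothetical.
import hashlib

CODE_REPUTATION = {
    # sha256 -> (verdict, first_seen_days_ago, observed_installs)
    "a" * 64: ("good", 400, 2_000_000),  # widely deployed installer
    "b" * 64: ("bad", 2, 17),            # young, rare, flagged binary
}

def code_reputation(payload: bytes) -> str:
    digest = hashlib.sha256(payload).hexdigest()
    verdict, _, _ = CODE_REPUTATION.get(digest, ("unknown", 0, 0))
    return verdict

# "unknown" code can be routed to heavier analysis rather than being
# allowed or blocked outright.
print(code_reputation(b"never-before-seen binary"))  # unknown
```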

URL reputation: URL reputation is derived by examining parameters such as IP address information, country of the web server, history and age of the URL, domain registration information, network owner information, URL categorization information, and the types of content present.
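
A minimal scoring sketch over the kinds of parameters just listed; the features, weights and cut-off are invented:

```python
# Minimal URL-reputation scorer. Features, weights and the cut-off
# are invented for illustration.
def url_reputation(domain_age_days: int, high_risk_country: bool,
                   registrant_hidden: bool, prior_malware_hits: int) -> str:
    score = 0
    score += 2 if domain_age_days < 30 else 0   # very young domain
    score += 2 if high_risk_country else 0      # risky hosting location
    score += 1 if registrant_hidden else 0      # anonymized registration
    score += 3 * prior_malware_hits             # past incidents
    return "suspect" if score >= 4 else "reputable"

print(url_reputation(domain_age_days=12, high_risk_country=True,
                     registrant_hidden=True, prior_malware_hits=0))
# suspect -- but note this is entirely historical signal, as the next
# paragraph argues.
```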

URL reputation provides a “credit history” of sorts for a URL, but it does not provide current information about a URL's safety. When looking at web safety, it is useful to remember what you learned in Investment 101: “Past performance does not predict future performance.” As we've seen, “good” websites today may host malware tomorrow. In the Web 2.0 world there are few examples of “good” websites that are guaranteed to be good forever.

Using URL reputation alone to defend against malware is like trying to know whether it will rain today by checking whether it rained yesterday.

Traffic behavioral analysis: Traffic behavioral analysis identifies suspicious, atypical traffic that would suggest, for example, a new phishing scam or perhaps active malware communications from an infected notebook computer to a command-and-control server.

Unlike reputation techniques, which are based on past behavior and provide valuable historical context, actively monitoring web traffic patterns and anomalies provides a real-time look into emerging threats. Behavioral analysis of traffic, however, is only effective if it is based on a large volume of real-world traffic.
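
One concrete pattern such analysis can surface is beaconing: an infected host calling home at suspiciously regular intervals. A sketch over hypothetical connection timestamps:

```python
# Sketch: flag command-and-control "beaconing" by looking for
# near-constant gaps between outbound connections to one host.
# Timestamps are hypothetical, in seconds.
from statistics import mean, pstdev

def looks_like_beaconing(timestamps: list) -> bool:
    if len(timestamps) < 5:
        return False                          # not enough evidence
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Human browsing is bursty; malware check-ins are metronomic.
    return pstdev(gaps) < 0.05 * mean(gaps)

print(looks_like_beaconing([0, 300, 600.2, 899.8, 1200.1, 1500]))  # True
```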

Using this cocktail of threat detection techniques in real time provides a 360-degree view of the current web threat environment, compared to the limited view you get when relying solely on URL filtering. It's the difference between seeing the full picture and seeing just one piece of the puzzle.
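
Wired together, the cocktail can be as simple as running every engine on each request and blocking on the first confident verdict; a sketch, with trivial stand-ins for the techniques described above:

```python
# Sketch of the "cocktail": every engine sees every request, and any
# confident detection blocks it. The engines are trivial stand-ins
# for the techniques described above.
def scan_request(url: str, payload: bytes, engines) -> str:
    for name, engine in engines:
        if engine(url, payload):
            return f"block ({name})"
    return "allow"

ENGINES = [
    ("signatures",     lambda u, p: b"EVIL_PACKER_STUB" in p),
    ("heuristics",     lambda u, p: p.count(b"eval(") > 3),
    ("url-reputation", lambda u, p: u.endswith(".suspect.example")),
]

print(scan_request("http://news-site.example/x",
                   b"eval(eval(eval(eval(x))))", ENGINES))
# block (heuristics)
```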

The 360-degree view of web threats that real-time scanning delivers also allows the dots to be connected. Simply crawling the web for dangerous sites yields a random collection of bad sites that are seemingly unrelated. By relying on multiple techniques, including those described above, real-time scanning can provide critical information on the source of a malware infection.

- Dan Nadir is vice president of product strategy at ScanSafe.
