Dan Nadir
The web is a big place — and thanks to the dynamic nature of Web 2.0 applications and user-contributed content, it grows bigger by the minute. According to Netcraft's latest survey, there are over 127 million active websites.

While most malware in the past was distributed via email, more recently the web has become the primary attack vector used by malware authors to distribute malicious code. The mushrooming of the web and the browser's status as a critical tool have made the browser a prime target for cyber criminals. In a recent study, Google reported that in an in-depth analysis of 4.5 million websites over a 12-month period, it discovered 450,000 sites successfully launching drive-by downloads of malware code. A University of Washington study found that of all the sites with downloadable content the researchers examined, six percent contained malware.

The most widely deployed web gateway products today are URL filters, which are in use in up to 95 percent of enterprise networks. URL filtering products were originally designed to increase employee productivity and limit legal liabilities by enforcing acceptable use policies for the web.

As trojans, keystroke loggers, rootkits and other web malware have become a major enterprise security issue, URL filtering companies have sought to remake themselves into security companies, and their URL filtering products have been repositioned from web productivity solutions to web security solutions. These products often claim comprehensive protection from web malware through their sizable URL databases, offering regular updates that are downloaded daily or more frequently from a central server.

But it has become glaringly apparent that URL filtering, which, after all, was not designed to stop malware, cannot keep pace with today's web threat environment. 

URL filtering can only be as effective as its database of categorized websites. URL filtering solutions rely on visiting each URL, or “crawling” the web, in an effort to inventory “bad” sites. In a recent ad, a leading URL filtering vendor claimed it crawled 80 million websites a day. While this sounds like a big number, keep in mind that against Netcraft's count of more than 127 million active sites it still leaves roughly 47 million sites unexamined.

More troubling is that these figures don't take into account Web 2.0 websites that are powered by third-party and user-contributed content and as a result are constantly changing. For example, MySpace alone has 100 million accounts, each with several different web pages, and Wikipedia hosts more than 7.9 million individual articles. So crawling 80 million websites per day is really just a drop in the bucket and leaves much of the web uncategorized. More importantly, if you are relying on a URL database, your users may be exposed to threats on “good” sites whenever malware is posted after the site was last scanned.

According to Gartner, “URL filtering suffers a fundamental flaw to be an effective security filter: It does not monitor threats in real time.”

In fact, a third-party test using 200 known spyware samples revealed that an enterprise URL filtering product significantly underperformed a signature-based gateway anti-malware scanner, missing 31 percent of the keystroke loggers, browser hijackers, adware and other malicious code.

The only way to ensure that users are not infected by malware is to scan all content in real time.

Using URL filtering to defend against malware is like reading yesterday's newspaper to find the current price of your favorite stock.

So why are most companies still relying on URL filtering to deliver protection from malware?

There was a time when most malware resided on suspect URLs, like porn or gambling sites. So deploying a URL filter and blocking user access to dodgy sites might have offered some protection from web-based malware. That day has come and gone.

Web threats are no longer restricted to dodgy sites. In today's web security world, threats are just as likely to be found on reputable, trusted sites. 

In fact, over the past year there have been countless incidents of legitimate sites found hosting malware. MySpace, the Miami Dolphins website, Wikipedia and the Samsung website have all been contaminated with malware.

The decentralization of website content has made it easy for cybercriminals to inject malware onto unsuspecting sites. Malware is being inserted on web pages via insecure ad servers, compromised hosting networks, user-contributed content, and even through third-party widgets, commonly found on many legitimate sites.   

In May, the Tom's Hardware website, a popular technical product review site visited by thousands of tech-savvy users, unknowingly hosted a malware-infected ad that used the animated cursor (ANI) vulnerability to spread a trojan.

Of even more concern is that cybercriminals are using very sophisticated tactics to seed malware on all types of sites. 

In late June, a fast-flux network, a disturbing advance in the development and use of bot networks, was used to spread malware via a flash movie on MySpace. Possibly 100,000 MySpace accounts were affected by the attack. In effect, this MySpace attack in June was a double-whammy, combining the insecurities inherent in many Web 2.0 sites with a powerful, new and incredibly stealthy distribution technique.

Unlike traditional “bot” networks, fast-flux networks abuse DNS to dynamically resolve a domain name to any number of infected PCs, and they use the same technique to hide the control servers, which makes them much harder to shut down. This high-tech game of Whack-a-Mole ensures that the offending site(s) stay active for a much longer period of time.
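
To make the mechanics concrete, here is a minimal, purely illustrative sketch in Python (standard library only) of the kind of check an analyst might run: a hostname that resolves to an unusually large and shifting set of IP addresses over a short window is behaving the way a fast-flux domain does. The hostname and thresholds below are hypothetical.

import socket
import time

def distinct_ips(hostname, attempts=10, delay=5):
    # Resolve the hostname repeatedly and collect every distinct address seen.
    seen = set()
    for _ in range(attempts):
        try:
            for info in socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP):
                seen.add(info[4][0])  # the IP address part of the sockaddr tuple
        except socket.gaierror:
            pass  # failed lookups are common for rapidly churning domains
        time.sleep(delay)
    return seen

# A legitimate site normally maps to a small, stable set of addresses;
# a fast-flux domain can rotate through dozens of compromised hosts in minutes.
addresses = distinct_ips("suspect.example.com")  # hypothetical hostname
if len(addresses) > 20:
    print(f"Possible fast-flux behavior: {len(addresses)} distinct addresses observed")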

If URL filtering provides good policy enforcement but not security, what does?

It's simple: Real-time scanning of web traffic is the only true defense against malware.

There's been a lot of hype surrounding real-time scanning of web traffic, but what does it mean and what does it need to encompass in order to be an effective defense against web-based malware?

First and foremost, real-time scanning means that all content on a URL is scanned in real time every time it is requested. This is an important distinction from URL filtering — which merely compares the requested URL against a limited database of categorized URLs and never inspects the content itself.
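
The distinction can be sketched in a few lines of hypothetical Python. The first function behaves like a URL filter, a lookup against a static list; the second behaves like real-time scanning, fetching the response and inspecting its bytes on every request. The blocklist, signature set and function names are illustrative, not any vendor's actual implementation.

import hashlib
import urllib.request

URL_DATABASE = {"badsite.example.net": "malware"}  # static, periodically updated list
KNOWN_BAD_SHA256 = set()                           # populated from signature feeds in practice

def url_filter_check(hostname):
    # URL filtering: a lookup against a pre-built database; the content is never inspected.
    return hostname in URL_DATABASE

def real_time_scan(url):
    # Real-time scanning: fetch the response and inspect the bytes on every request.
    body = urllib.request.urlopen(url, timeout=10).read()
    digest = hashlib.sha256(body).hexdigest()
    return digest in KNOWN_BAD_SHA256  # hash matching is just one of several layers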

Effective real-time scanning should be powered by a combination of multiple detection technologies, each of which, when used on its own to combat malware, can often fall short. When these techniques are combined in a cocktail approach, their strengths are leveraged and their shortcomings are mitigated.

Signature-based detection: Signature-based engines are extremely effective at identifying and blocking known threats. Multiple signature-based engines form an important part of a multi-layered cocktail approach to real-time scanning.

However, signature-based malware detection only works for known malware; it is not useful for new threats. Additionally, to be effective, signatures must be delivered and propagated quickly, which is a time-consuming task.
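
As a rough illustration of how a signature layer works, the hypothetical Python sketch below searches content for known byte patterns. The only “signature” included is the industry-standard, harmless EICAR test string; real engines carry millions of continuously updated entries.

import re

EICAR_PREFIX = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR"
SIGNATURES = {"EICAR-Test-File": re.compile(re.escape(EICAR_PREFIX))}

def signature_scan(payload: bytes):
    # Return the names of any known signatures found in the payload.
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]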

Heuristics: Using rules of thumb to detect variants of known malware is an effective technique in the fight against malware. However, if your heuristics are too aggressive, you experience false positives. Also, heuristics are designed to increase the probability of detecting something that is similar to something you have seen before, which means a heuristic won't detect completely novel malware.
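
A hypothetical sketch of the heuristic idea: assign weights to traits commonly seen in obfuscated or injected web content and flag the page when the combined score crosses a threshold. The rules and weights below are illustrative only, and the threshold is exactly where the false-positive trade-off described above shows up.

import re

HEURISTIC_RULES = [
    (re.compile(rb"eval\s*\("), 2),                 # dynamic code execution
    (re.compile(rb"unescape\s*\("), 2),             # classic decoder for packed exploits
    (re.compile(rb"%u[0-9a-fA-F]{4}"), 3),          # unicode-escaped shellcode-style blobs
    (re.compile(rb"<iframe[^>]+height=.?0\b"), 3),  # near-invisible iframe injection
]

def heuristic_score(payload: bytes) -> int:
    # Sum the weights of every rule that matches the content.
    return sum(weight for pattern, weight in HEURISTIC_RULES if pattern.search(payload))

def looks_suspicious(payload: bytes, threshold: int = 5) -> bool:
    # A lower threshold catches more variants but produces more false positives.
    return heuristic_score(payload) >= threshold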

Code Analysis: The behavior of the code can be determined by modeling program logic, behavioral rules, and contextual system call analysis techniques that suggest good or bad intentions. 

Code reputation: Unlike URLs whose content can change, a binary can, in fact, have a reputation based on historical analysis. “Good” code can be treated differently than unknown or bad code.

URL Reputation: URL reputation is derived by examining parameters such as IP address information, country of the web server, history and age of the URL, domain registration information, network owner information, URL categorization information, and types of content present. 
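
A toy scoring function along these lines might combine a few of the parameters just listed. The inputs, weights and thresholds are hypothetical; production reputation systems weigh far more signals and refresh them continuously.

def url_reputation_score(domain_age_days, hosting_country_risk, newly_registered, category):
    score = 0
    if domain_age_days < 30:
        score += 40                # very young domains are disproportionately abused
    if newly_registered:
        score += 20
    score += hosting_country_risk  # e.g. 0-30, based on the hosting country's track record
    if category in {"uncategorized", "parked"}:
        score += 20
    return score                   # higher means riskier; the caller picks a cut-off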

URL reputation provides a “credit history” of sorts for a URL, but it does not provide current information about the safety of a URL. When looking at web safety, it is useful to remember what you learned in Investment 101: “Past performance does not predict future performance.” As we've seen, “good” websites today may host malware tomorrow. In the Web 2.0 world there are few examples of “good” websites that are guaranteed to be good forever.

Using URL reputation alone to defend against malware is like trying to know if it will rain today by checking to see if it rained yesterday.

Traffic Behavioral Analysis: Traffic behavior analysis identifies suspicious, atypical traffic that would suggest, for example, a new phishing scam or perhaps active malware communications from an infected notebook computer to a command-and-control computer.

Unlike reputation techniques, which are based on past behavior and provide valuable historical context, actively monitoring web-traffic patterns and anomalies provides a real-time look into emerging threats. Behavioral analysis of traffic, however, is only effective if it is based on a large volume of real-world traffic.
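
One hypothetical way to turn traffic patterns into a real-time signal: flag a client that contacts the same host at suspiciously regular intervals, the check-in rhythm typical of an infected machine beaconing to its controller. The thresholds below are illustrative.

from statistics import pstdev

def looks_like_beaconing(request_times, max_jitter=2.0, min_requests=10):
    # request_times: timestamps (in seconds) of one client's requests to one host.
    if len(request_times) < min_requests:
        return False
    intervals = [b - a for a, b in zip(request_times, request_times[1:])]
    return pstdev(intervals) < max_jitter  # near-constant spacing is suspicious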

Using this cocktail of threat detection techniques in real time provides a 360-degree view of the current web threat environment compared to the limited view you get when relying solely on URL filtering. It's the difference between seeing the full picture and just seeing one piece of the puzzle.
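
Putting it together, a hypothetical aggregation layer, reusing the sketch functions above, might look like the following. The weights and threshold are illustrative; the point is simply that no single verdict is trusted on its own.

def cocktail_verdict(payload, reputation_score, traffic_flagged):
    # A confident signature hit blocks outright; weaker signals are combined.
    if signature_scan(payload):
        return "block"
    suspicion = heuristic_score(payload)
    if reputation_score > 60:  # poor URL "credit history"
        suspicion += 3
    if traffic_flagged:        # anomalous traffic pattern observed for this client
        suspicion += 3
    return "block" if suspicion >= 5 else "allow"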

The 360-degree view of web threats that real-time scanning delivers also allows the dots to be connected. Simply crawling the web for dangerous sites yields a random collection of bad sites that are seemingly unrelated. By relying on multiple techniques, including those described above, real-time scanning can instead provide critical information on the source of a malware infection.

- Dan Nadir is vice president of product strategy at ScanSafe.