A Practical and Realistic Approach for Testing the Performance of Firewalls: What Your Security Vendors Don’t Want You to Know

While Internet security has always mattered, only recently has it become a pressing concern for almost everyone who uses the Internet.

Rigorous security procedures and practices can prevent, or at least mitigate, most attacks on computer networks. Firewalls are usually an important part of a secure network.

In many networks, the first line of defense is a firewall. Firewalls allow limited access to networks from the Internet, only allowing approved inbound and outbound traffic according to rules set by an administrator. Think of them as fences with a few, well-chosen gates that restrict access to only these particular points.

Because firewalls form a crucial part of many networks, their performance and stability must be trusted. The tips below will help determine firewall performance and failure characteristics in practical installations with the help of real-world capacity assessment tools. Those who also use intrusion detection systems (IDSs) can assess them by following the same ideas.

Testing Is Key

How do administrators determine the right devices for their network? Until recently, short of configuring a firewall and "going live" with it, system administrators had to base purchase decisions on marketing literature, word of mouth and magazine comparative tests. As expected, marketing literature always shines the best light on its products. Word of mouth can be unreliable - it may be based upon a single experience, an ulterior motive, or a different and inapplicable network design.

Comparative tests strive for objective, well-designed test methodologies. Conventional firewall testing focuses on long and short packet tests that attempt to find the highest no-drop rate. However, real Internet traffic is mostly asymmetrical, with data packets significantly larger than TCP ACK packets (or UDP request packets). Firewalls also have unique performance behaviors. Some perform better with inbound traffic than with outbound. Each will degrade differently when configured with complex rule sets and additional features like network/port address translation, URL logging, user authentication, and content filtering.

Because of lab limitations, a desire to highlight peak performance, and the pressure to publish comparative results, most firewall reviews are based on simple dual-homed configurations with unidirectional traffic. Real-world applications rarely resemble the simple rule sets used in these reviews. Furthermore, even with realistic rule sets, each setting can have profound effects on firewall performance. Inferring the performance of a configuration different from the published one is risky: while published results represent a best effort to simulate specific environments, they will not apply to every network.

Given these limitations and risks, a reliable, realistic understanding of firewall and IDS performance requires comparative capacity assessments that use your own environment and settings.

A Step by Step Methodology

We propose using HTTP as the main testing protocol. HTTP constitutes the largest percentage of IP traffic, combines long and short-lived connections, and has measurable application-level performance with data integrity checking.

Several devices are available for conducting assessments with HTTP, and choosing the right testing tool will not only increase your confidence in its results, but also save time and money. You will derive the most benefit from a device that will not only have the performance to scale well beyond the anticipated growth of your network, but also contains the right features and realism to ensure proper and reliable results. Important features include:

protocol support: HTTP 1.0/1.1, HTTP GET, HTTP POST, SSL;
realistic connection behavior (explained below in the connection rate test);
browser emulation;
realistic link speeds;
packet loss simulation;
user profiling abilities;
load stepping ability;
URL scripts;
stateful user simulation;
ability to abort transactions;
real time statistics;
test results over time.

The more closely the tool simulates real-world Internet traffic, the more reliable the results will be.

Identifying Thresholds

Some tips to consider while assessing firewalls:

Configure the firewall to reflect settings used in production.
Firewalls exhibit different errors when overloaded, summarized in the section below. Keep track of these errors during an assessment to determine the load that causes these errors.
A firewall often increases its time to TCP SYN/ACK significantly when it reaches its maximum connection rate, leading to a growing number of outstanding connections. Use a test tool with real-time statistics that shows current connections to find the firewall's limitations.
Take care when determining the load applied to the firewall at the point of failure. Because of TCP retransmission and application-level timeouts, the load shown when the firewall exhibits its first failure may be higher than the load at which the failure actually began. Using small load increments and holding each load for a significant period of time will determine the load threshold more accurately.
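
The stepping advice in the last tip can be sketched in a few lines of Python. This is only an illustration: run_step is a hypothetical hook into whatever test tool is in use (it is assumed to apply a given load for the hold period and return the number of errors seen), and the function names are ours, not those of any product.

```python
from typing import Callable, Iterator, Optional


def load_steps(start: int, stop: int, step: int) -> Iterator[int]:
    """Yield target loads (e.g. connections per second) in small, equal increments."""
    if step <= 0:
        raise ValueError("step must be positive")
    load = start
    while load <= stop:
        yield load
        load += step


def find_threshold(
    run_step: Callable[[int, float], int],
    start: int,
    stop: int,
    step: int,
    hold_seconds: float,
) -> Optional[int]:
    """Return the highest load that survived a full hold period with zero errors.

    run_step(load, hold_seconds) is a hypothetical hook: it applies `load`
    for `hold_seconds` and returns the number of errors observed.
    """
    last_clean = None
    for load in load_steps(start, stop, step):
        if run_step(load, hold_seconds) > 0:
            break  # first failing step; the threshold is the step before it
        last_clean = load
    return last_clean
```

Smaller steps and longer hold periods narrow the gap between the reported threshold and the load at which failures really begin, at the cost of a longer test run.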

Failure Characteristics of a Firewall

When firewalls overload, expect one or more of the following behaviors:

stops passing valid traffic;
leaks disallowed traffic;
application timeouts;
HTTP errors;
new incoming connections time out;
resets incoming connections and continues to process valid traffic;
accepts and maintains incoming connections but refuses to forward traffic;
accepts and maintains incoming user authentications but refuses to proxy;
failover of the primary firewall to the backup firewall.

Connection Rate Test

In assessing firewall capacity, start with the maximum connection rate test to establish a baseline for subsequent tests. This test opens and closes connections at a high rate while quickly retrieving small objects. It approximates frequent accesses to the main page of a web server behind a firewall, as well as the outbound traffic a firewall sees from a site with many active users retrieving, say, stock quotes. Using small object sizes limits bandwidth utilization, which could otherwise become another limiting factor.

To properly gauge performance, be sure to use a robust test tool with realistic connections and data transfers. Some tools unrealistically open all the connections first, then transfer data, then close the connections. Other tools do not properly respond to connection errors, simply continuing to send traffic as if nothing happened. These tools make the firewall do only one thing at a time, producing performance results higher than those seen in the real world. Real traffic will have a myriad of connections opening, transferring data and closing all at the same time, and the tool should support that.
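
As a rough illustration of overlapping connections, the Python sketch below starts new transactions at a fixed arrival rate while earlier ones are still transferring or closing, and counts connection errors instead of ignoring them. It is a minimal sketch, not a replacement for a capacity assessment tool: the function names, the bare HTTP/1.0 request, and the target host and port are all assumptions made for the example.

```python
import asyncio


async def one_transaction(host: str, port: int, stats: dict) -> None:
    """Open a connection, fetch one small object, and close."""
    try:
        reader, writer = await asyncio.open_connection(host, port)
    except OSError:
        stats["errors"] += 1  # connect failed: record it and stop
        return
    try:
        writer.write(b"GET / HTTP/1.0\r\n\r\n")
        await writer.drain()
        reply = await reader.read()  # HTTP/1.0: read until the server closes
        stats["ok" if reply else "errors"] += 1
    except OSError:
        stats["errors"] += 1  # reset or broken mid-transfer
    finally:
        writer.close()


async def run_at_rate(host: str, port: int, rate_per_s: float, total: int) -> dict:
    """Start `total` transactions at `rate_per_s` without waiting for earlier ones."""
    stats = {"ok": 0, "errors": 0}
    tasks = []
    for _ in range(total):
        tasks.append(asyncio.create_task(one_transaction(host, port, stats)))
        await asyncio.sleep(1.0 / rate_per_s)  # pace arrivals, not completions
    await asyncio.gather(*tasks)
    return stats
```

A call such as asyncio.run(run_at_rate("192.0.2.10", 80, 500, 5000)) would aim 500 new connections per second at a placeholder address. Because each transaction runs as its own task, opens, transfers and closes overlap as they do in real traffic.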

The test should start at a rate known to produce error-free results. Configure the test with a predetermined percentage of illegal traffic, ensuring that the firewall does not leak disallowed traffic when overloaded. Step up the rate of connection arrivals until one or more failure behaviors described above are observed.
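
One way to build such a traffic mix and check for leaks is sketched below in Python. The Probe type, the port lists and the function names are hypothetical; a real test would derive them from the production rule set and from the test tool's own result logs.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Probe:
    port: int
    allowed: bool  # True if the production rule set should pass this traffic


def build_probes(
    total: int,
    illegal_fraction: float,
    allowed_ports: Sequence[int],
    blocked_ports: Sequence[int],
    seed: int = 0,
) -> List[Probe]:
    """Mix a fixed fraction of disallowed probes into otherwise legal traffic."""
    rng = random.Random(seed)  # seeded so a test run can be repeated exactly
    n_illegal = round(total * illegal_fraction)
    probes = [Probe(rng.choice(blocked_ports), False) for _ in range(n_illegal)]
    probes += [Probe(rng.choice(allowed_ports), True) for _ in range(total - n_illegal)]
    rng.shuffle(probes)  # interleave illegal probes with legal ones
    return probes


def count_leaks(probes: Sequence[Probe], passed: Sequence[bool]) -> int:
    """Count disallowed probes that got through; passed[i] is the outcome of probes[i]."""
    return sum(1 for probe, got_through in zip(probes, passed)
               if got_through and not probe.allowed)
```

Any nonzero leak count at a given load step is a security failure, even if throughput at that step still looks healthy.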

This test gauges the maximum arrival rate that the firewall can successfully bear with low throughput and a low number of open connections. It also helps determine the firewall's resilience under stress by characterizing its failure behavior. For example, does it reset new connections when overloaded? Does it continue accepting new connections, hurting the performance and reliability of existing connections?

Maximum Connections Test

"Maximum open connections" is commonly used in evaluating firewall capacity. However, a firewall that passes only 9600 bps while holding 10,000 connections is inferior to one that maintains at most 1,000 connections with no throughput degradation. Therefore, basing results solely on this measure can be misleading, as a rapidly climbing number of open connections usually points to an overloaded system or one that is exhibiting latency.

During this test, verify that the outstanding connections still pass traffic after the firewall reaches a certain number of open connections. Since an HTTP transaction can stall mid-conversation if a packet from the server is not received by the client, have the test tool simulate an application-level timeout where it aborts transactions after a certain amount of time, resets the connection, and logs the failure as an application timeout. Run this test at the maximum rate determined in the previous test.

What are some ways to increase connection count? Inject long client and/or server latencies, or use persistent HTTP connections with long think times between subsequent accesses. For the latter method, configure the simulated user to sequentially access two or more URLs using a persistent HTTP connection. Each connection will retrieve an object from the server, remain open, and retrieve another object after a given think time, say 60 seconds. After retrieving the last URL in the script, the usual TCP FIN/FIN-ACK sequence closes the connection.
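
To estimate how many connections a scripted user will hold open, Little's law (mean concurrency equals arrival rate times mean lifetime) gives a useful first approximation. The Python sketch below applies it; the function names and the sample fetch time are assumptions for illustration, and TCP setup and teardown time are ignored.

```python
def script_lifetime_s(url_count: int, fetch_time_s: float, think_time_s: float) -> float:
    """Seconds one persistent connection stays open while a user walks a URL script."""
    if url_count < 1:
        raise ValueError("a script needs at least one URL")
    # One fetch per URL, plus one think time between consecutive URLs.
    return url_count * fetch_time_s + (url_count - 1) * think_time_s


def expected_open_connections(arrivals_per_s: float, lifetime_s: float) -> float:
    """Little's law: mean concurrent connections = arrival rate x mean lifetime."""
    return arrivals_per_s * lifetime_s
```

For a two-URL script with an assumed 0.5 second fetch and a 60 second think time, each connection lives about 61 seconds, so 100 new users per second settle at roughly 6,100 open connections. The same relationship explains why rising latency alone inflates the open connection count at a constant arrival rate.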

Bandwidth Test

While different sources disagree on the mean object size of HTTP transactions on the Internet, the number tends to be around 8-13 KB. Increasing the size of the object retrieved increases the utilized bandwidth in the test. Using the maximum error-free connection rate and open connections determined previously as constraints, run a test that gradually increases the connections per second until the maximum expected bandwidth or maximum open connections is reached.
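
The relationship between connection rate, object size and bandwidth is simple arithmetic, sketched here in Python. The figures count payload bytes only and assume one object per connection; TCP/IP and HTTP header overhead are ignored, so real link utilization will be somewhat higher. The function names are ours.

```python
def payload_bandwidth_bps(connections_per_s: float, object_bytes: int) -> float:
    """Payload bits per second when every connection retrieves one object."""
    return connections_per_s * object_bytes * 8


def rate_for_bandwidth(target_bps: float, object_bytes: int) -> float:
    """Connections per second needed to reach a target payload bandwidth."""
    return target_bps / (object_bytes * 8)


def bounded_rate(target_bps: float, object_bytes: int, max_clean_rate: float) -> float:
    """Never plan a rate above the error-free maximum found in the earlier test."""
    return min(rate_for_bandwidth(target_bps, object_bytes), max_clean_rate)
```

With a 10,000-byte object, which falls within the range quoted above, filling 100 Mbps of payload takes 1,250 connections per second. If the connection rate test found a lower error-free maximum, a larger object size is needed to reach the same bandwidth.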

Long-Term Stability Test

Firewalls not only perform an important function in a secure network, they also lie in the path of network data traffic. As a result, high performance alone is not enough: they must also continue to function over the long term without degradation in security, performance and uptime. Using the maximum traffic levels determined above, run a long-term (several days), constant-load assessment against the firewall, checking that it maintains its performance, continues to enforce the security rules, and runs without failure.
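
A simple way to check for performance drift in such a run is to compare the start of the test with the end. The Python sketch below does this for a series of periodic throughput samples; the window size and the 5 percent tolerance are arbitrary assumptions to be tuned for each environment.

```python
from statistics import mean
from typing import Sequence


def degraded(samples: Sequence[float], window: int, tolerance: float = 0.05) -> bool:
    """True if the last `window` samples average more than `tolerance` below the first."""
    if window < 1 or len(samples) < 2 * window:
        raise ValueError("need at least two full windows of samples")
    baseline = mean(samples[:window])   # performance at the start of the run
    recent = mean(samples[-window:])    # performance at the end of the run
    return recent < baseline * (1 - tolerance)
```

This check covers performance only. Rule enforcement and uptime still need their own checks, such as continuing to send disallowed traffic throughout the run and logging any failover events.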

Considering Realism

While the previous tests define specific, universally acceptable benchmarks for networks, they don't consider how realistic traffic will affect them. Realistic traffic depends on several factors, including the number and complexity of the security rules, number and rate of connections, bandwidth, and important characteristics of Internet traffic (e.g. packet loss, link latency, think time, application timeouts/click away).

Network Characteristics

The network that the firewall protects will have its own characteristics. A reliable capacity assessment will consider these characteristics, using them to ensure accurate results. Items to consider include network topology and packet loss rates. The network topology should reflect practical installation scenarios, including two and three tiered networks, DMZs, and inbound and outbound traffic rates. Packet loss, common on large networks like the Internet, can decimate network performance, especially for stateful devices like firewalls. The type of traffic will also affect performance, e.g. ICMP, GRE tunnels, IPsec tunnels, etc.

User Behavior Characteristics

Internet users often exhibit particular behaviors that affect performance, especially for stateful systems like a firewall. These behaviors include think times, click aways, protocol usage (UDP, TCP), link speeds, file mix, SSL usage, browser version, etc. Be sure to use a capacity assessment tool that realistically simulates all these factors for the most accurate results.

User Mix

During the deployment of a particular system, the mix and behavior of users can usually be ascertained. Keep in mind that the mix/behavior will vary from system to system, and more importantly, will change over time. As a result, the mix of traffic being simulated should evolve along with the changes occurring on a deployed system to improve the results. Capacity assessment tools will vary in their ability to support diverse mixes of traffic during a test - ones that provide the most flexibility will have the most longevity and hence, improved return on investment.
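
A user mix can be expressed as a set of weighted profiles that is updated as the deployed system changes. The Python sketch below shows one possible shape; the UserProfile fields, the sample profiles and the weights are invented for illustration and are not measurements.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class UserProfile:
    name: str
    link_kbps: int       # simulated access link speed
    think_time_s: float  # pause between page requests
    uses_ssl: bool


def pick_users(mix: Dict[UserProfile, float], count: int,
               seed: Optional[int] = None) -> List[UserProfile]:
    """Draw `count` simulated users according to the weights in `mix`."""
    rng = random.Random(seed)
    profiles = list(mix)
    weights = [mix[p] for p in profiles]
    return rng.choices(profiles, weights=weights, k=count)


# Hypothetical mix; revise the weights as production traffic changes.
EXAMPLE_MIX = {
    UserProfile("dial-up browser", 56, 30.0, False): 0.5,
    UserProfile("broadband browser", 1500, 10.0, False): 0.4,
    UserProfile("broadband shopper", 1500, 20.0, True): 0.1,
}
```

Keeping the mix in one data structure makes it easy to rerun the same assessment later with updated weights and compare the results.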

Summary

The rising tide of network attacks has made network security crucial to a well-run network, with firewalls playing an important part in this security. Because of the complexities in securing network traffic, adding a firewall almost certainly reduces overall network performance. Understanding how the network performs under load with its complex components and security rules will not only help increase service levels, but also reduce surprises. The most reliable way to gauge this performance before failures occur is through capacity assessment.

Capacity assessment now forms a critical part in robust system deployments. With the help of a robust capacity assessment tool and some quick tests, the network will benefit from increased reliability, efficiency and return on investment.

Johnson Wu, Philip Joung and John Kenney are members of the engineering team at California-based Caw Networks, an authority in L4-7 real-world capacity assessment.