Firewall
Load Balancing
John Paolillo
About four years ago, I was asked to design a completely redundant,
scalable e-commerce infrastructure consisting of multiple demilitarized
zones (DMZs), with no single point of failure. In other words,
this infrastructure would be broken into layers of firewalls: one
layer protecting the DMZ for the Web server cluster, the next layer
of firewalls protecting the application server cluster, and finally
the internal network consisting of database servers, LDAP servers,
and backend systems.
In this article, I will describe how to set up your firewalls
and separate DMZ subnets. I will show how to create a single firewall
policy that can be loaded to each cluster of load-balanced firewalls
across separate data centers to maximize your company's return
on investment and lower your cost of ownership. I will also discuss
how to avoid the common mistakes that come with re-creating the same
firewall objects and policies when you have to manage multiple firewall
policies.
Clustering (or load-balancing technology) has been around for
a while, but there haven't always been a lot of options. You
could easily load balance a Web server farm with software solutions
(such as Resonate) or hardware solutions (such as Cisco's LocalDirector).
Databases could be clustered in failover mode (High Availability)
with Veritas Cluster or Sun Cluster. My goal, however, was to truly
load-balance firewalls much like you would with a cluster of Web
servers.
The challenge was to ensure that a session that originally traversed
one firewall would return through that same firewall; otherwise, it
would be blocked by a different firewall in that cluster. This setup becomes more
complicated if you want to leverage NAT (Network Address Translation)
to protect some of your services.
The problem with using NAT through a firewall is that the original
IP addresses that are used to determine through which firewall to
send a session will change during the NAT phase in that firewall.
Therefore, the algorithm used to determine the return path may fail
to select the same firewall.
To resolve this, the packet must be returned through the original
firewall using the MAC source address (SA) stored in the firewall load balancer's
(FWLB) session table. You will also need redundant FWLBs on both
sides of your network, thereby removing that as a single point of
failure. What if you also want to make this as transparent as possible
for your firewalls so they can share the same routing rules even
though they are connected to different firewall load balancers?
For a firewall to route traffic through its multiple interfaces,
the IP interfaces of the corresponding devices to which it will
be routing that traffic must be defined. The challenge is that in
a normal setup, each firewall would have the IP address of the interface
of the corresponding FWLB to which it is connected in each firewall's
routing table. Since these IP addresses are different, you wouldn't
be able to share these tables across your firewalls at each cluster.
The solution is to implement the Virtual Router Redundancy Protocol
(VRRP) at the FWLB layer, which I will address in this article.
Thus, I wanted to build a fully load balanced, vertically scalable
firewall solution without increasing the complexity of managing
these firewalls. To do this, I chose a layer 4 switching solution
(provided at that time by Alteon, which was acquired by Nortel Networks
in 2000) to handle the management of the distribution of load balancing
sessions through the firewall layer(s), while minimizing
routing table maintenance on the firewalls. I wanted to further
hide the firewalls by configuring the internal FWLB network to an
un-routable address space per RFC 1918 (http://www.ietf.org/rfc/rfc1918.txt?number=1918),
such as 10.0.0.0 for a Class A network. This would allow for the
creation of a single firewall policy not only across a firewall
cluster, but across multiple clusters across data centers. Administration
of the firewalls was to be via a separate management network, which
both simplified routing access and tightened the policy rules.
I said "through" because in this solution, we test continuity
across the firewall to the other side to ensure that not only is
the firewall (the one directly connected to the FWLB) available,
but so is the network behind it. And I specified firewall layer(s)
because you may need to define a load-balancing rule that traverses
more than one layer of firewalls -- for example, a rule to allow internal
staff to access a proxy server on the outermost DMZ by simply specifying
a FWLB filter that tests across both firewall layers instead of one
layer at a time. This solution has provided 100% uptime throughout,
regardless of firewall, FWLB, or router/switch failures.
Design and Implementation
For the sake of simplicity and clarity, I'll focus
on a single firewall cluster layer. You may notice in this article
that a multi-layered solution is nothing more than managing your
routes and redirection rules to achieve your desired goals. I will
start with basic FWLB design, where the Internet will act as the
dirty network and the Web DMZ will act as the clean network (Figure
1). I'll then jump to the finished design consisting of the
application servers (DMZ2) and connection to the internal corporate
network. Keep in mind that at each DMZ, you can have many services
and load balance them, or cluster them independently as I did. In
this article, however, I'll define how to do one FWLB group
and provide steps for the rest.
The FWLBs used were Alteon's layer 4-7 switches. The firewalls
in this example are Checkpoint on Solaris. You can also do a subset
of this to load balance, for example, your merchant network if your
company performs transactions with other companies via PVCs
(permanent virtual circuits) or dedicated secure networks.
I will set up Virtual Router Redundancy Protocol (VRRP) to simplify
the ingress traffic from the external routers, the default and static
routes for the firewalls, and the default route of the
servers in the clean DMZ. VRRP works much like Hot Standby Router
Protocol (HSRP) -- if you're familiar with Cisco products --
and essentially provides a Virtual Router ID (VRID) with alternate route
paths to eliminate a single point of failure. Note that the L4 switches, on their internal
interfaces (facing the firewalls, physical port 7), have two IP interfaces
defined -- one with a 24-bit mask and the other with a 32-bit mask -- to guarantee that traffic between
the upper and lower switches always travels via the same route.
For example, in Figure 1, when "Internal Switch 1" attempts
to connect to "External Switch 2", it will contact that
switch's IP 10.0.1.21 as the destination IP (DIP) and will
have a route through "Internal Switch 2" with its source
IP (SIP) = 10.0.3.11. Likewise, the return packet from "External
Switch 2" will have an SIP of 10.0.1.21 and DIP of 10.0.3.11
and that switch will have a static route via firewall FW-B's
interface = 10.0.1.31, thereby guaranteeing the return packet will
traverse the same firewall it originally went through.
I will tackle this setup in the following order, starting with
the external switches:
- Configure firewalls and routers with their IP definitions and
routing requirements.
- On the switches, configure VLANs, IP interfaces, trunks, static
routes, and default gateway information.
- Configure VRRP settings, real interfaces, SLB group definitions,
and filters.
- Repeat for the internal (clean side) switches and test.
A common step in these types of installations is to install the
firewall software on the Sun systems first and then configure the switches. I find
that the reverse order saves time and reduces possible points of failure
when testing your load-balancing rules. I want to eliminate anything
that may block the services the FWLB uses to test
and determine the correct path for a session (e.g., ICMP, HTTP,
etc.), as well as routing exceptions such as IP spoofing definitions on network
segments.
To begin, configure the interfaces on the UNIX system and create
the routes as necessary. Enable ip_forwarding (not in.routed or
any other routing daemon) on your UNIX system to test connectivity
and load balancing across the switches. Since the firewall software
is not installed yet, ensure that traffic is returning on the same
firewall it came from by snooping on those interfaces. Once this
has been tested, install your firewall software and create the appropriate
rules to define all the interfaces, switches, and health monitoring
protocols you will need to allow between the switches. In the case
of FW-A, create a script in /etc/rc3.d (say, S95routing_table) that
adds the static routes and includes the ip_forwarding statement:
route add -net 192.168.100.0 10.0.3.1 1
route add -net 73.0.20.0 10.0.3.1 1
/usr/sbin/ndd -set /dev/ip ip_forwarding 1
Make sure to remove the ip_forwarding line once the firewall is installed
and operational, because you will want IP forwarding to be managed by Checkpoint.
In /etc/defaultrouter, you can just enter 10.0.1.1, since all unknown networks
would be reached via the Internet side. The same routes will be needed on FW-B.
Since we're using VRRP to create a VRID to ease routing for
our firewalls, we do not need to differentiate between the IP addresses
of each switch for routing purposes.
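As a quick illustration (a sketch only -- the output is abridged and the interface column is omitted), netstat -rn on either firewall should show identical entries, all pointing at the shared VRID addresses rather than at any individual switch:

# netstat -rn   (abridged)
Destination          Gateway          Flags
192.168.100.0        10.0.3.1         UG
73.0.20.0            10.0.3.1         UG
default              10.0.1.1         UG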
Next, we can configure the interfaces, VLANs, static routes,
STP, trunks, etc. on our switches. For the sake of speed, I'll
address the necessary settings for the externally facing switches.
Make the appropriate changes for "External Switch 2" matching
the diagram.
Note: I recommend redundancy of your trunk ports. The manuals
will suggest the use of port 9, but I have seen failure of a trunk
port, which will affect your FWLB completely; therefore I suggest
using ports 8 and 9 on each switch as your trunk ports. Bear this
in mind when defining your VLANs (Figure 2). Tag ports 8 and 9 so
you can use them in other VLANs:
/cfg/port 8
tag ena
/cfg/port 9
tag ena
You need two VLANs for your external switches (dirty side):
/cfg/vlan 2
add 1
add 8
add 9
ena
Set up your trunk definition:
/cfg/trunk 1
add 8
add 9
ena
By default, VLAN 1 will have all of the other ports. Ensure that ports
8 and 9 are in that VLAN as well since they are tagged. Next, set
up your interface definitions on those same switches:
/cfg/ip/if 1
addr 73.0.10.251
mask 255.255.255.0
broad 73.0.10.255
vlan 2
ena
/cfg/ip/if 2
addr 10.0.1.10
mask 255.255.255.0
broad 10.0.1.255
ena
/cfg/ip/if 3
addr 10.0.1.11
mask 255.255.255.255 (not a typo! The reason was explained earlier)
broad 10.0.1.11
ena
Note that if you configure the mask prior to the IP address, the broadcast
address will be calculated automatically. I did it the hard way here so that
when you see this, you know it isn't an error.
Define the static routes. Depending on whether you're using
version 8.3.X or 9.X, some of the commands may differ slightly.
But for simplicity, I'll use 8.3 conventions for /cfg/ip/route;
whereas in 9.X it would be /cfg/ip/frwd/route (Figure 3).
/cfg/ip/route   (the trailing number binds the route to a source interface)
add 10.0.3.10 255.255.255.255 10.0.1.30 2
add 10.0.3.20 255.255.255.255 10.0.1.30 2
add 10.0.3.11 255.255.255.255 10.0.1.31 3
add 10.0.3.21 255.255.255.255 10.0.1.31 3
Set up the default gateway, which in this case is the HSRP address of the
Internet routers:
/cfg/ip/gw 1
ena
addr 73.0.10.249
Ensure STP (Spanning Tree) is off:
/cfg/stp/off
Apply, save, and reset the switch:
apply
save
/boot/reset
Perform similar settings for "External Switch 2"
based on the diagram. Similarly, configure "Internal Switch 1"
with the required settings. I'll speed things up and use shortcut
conventions to keep this document short.
/cfg/port 8/tag ena
/cfg/port 9/tag ena
/cfg/vlan 2/add 1/add 8/add 9/ena
Ensure that ports 8 and 9 are in the default VLAN as well!
/cfg/trunk 1/add 8/add 9/ena (set up your trunk definition)
(mask before address, so there's no need to set the broadcast -- save yourself a step!)
/cfg/ip/if 1/mask 255.255.255.0/addr 192.168.100.2/ena
/cfg/ip/if 2/mask 255.255.255.0/addr 10.0.3.10/ena
/cfg/ip/if 3/mask 255.255.255.255/addr 10.0.3.11/ena
/cfg/ip/if 4/mask 255.255.255.255/addr 73.0.20.2/ena
/cfg/ip/route
add 10.0.1.10 255.255.255.255 10.0.3.30 2
add 10.0.1.20 255.255.255.255 10.0.3.30 2
add 10.0.1.11 255.255.255.255 10.0.3.31 3
add 10.0.1.21 255.255.255.255 10.0.3.31 3
/cfg/stp/off
apply
save
/boot/reset
Perform similar settings on the secondary switch. Be sure that your
definitions reference the IP interfaces assigned to the appropriate
switch.
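As a sketch of the secondary switch (the if 4 address, 73.0.20.3, is my assumption based on the addressing pattern -- confirm it against your diagram), "Internal Switch 2" would look like this:
/cfg/port 8/tag ena
/cfg/port 9/tag ena
/cfg/vlan 2/add 1/add 8/add 9/ena
/cfg/trunk 1/add 8/add 9/ena
/cfg/ip/if 1/mask 255.255.255.0/addr 192.168.100.3/ena
/cfg/ip/if 2/mask 255.255.255.0/addr 10.0.3.20/ena
/cfg/ip/if 3/mask 255.255.255.255/addr 10.0.3.21/ena
/cfg/ip/if 4/mask 255.255.255.255/addr 73.0.20.3/ena (assumed address)
/cfg/ip/route
add 10.0.1.10 255.255.255.255 10.0.3.30 2
add 10.0.1.20 255.255.255.255 10.0.3.30 2
add 10.0.1.11 255.255.255.255 10.0.3.31 3
add 10.0.1.21 255.255.255.255 10.0.3.31 3
/cfg/stp/off
apply
save
/boot/reset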
Next, you should verify that you can ping across the firewalls,
and snoop on the firewalls' interfaces to ensure that traffic is
returning properly across the same firewall from which it was initiated.
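As a quick sketch of that check (the interface names are illustrative placeholders -- substitute the actual interfaces on your firewalls): snoop on both firewalls at once, then generate traffic from a host on the clean DMZ toward a dirty-side address. For any given source/destination pair, both the request and its reply should appear on only one of the two firewalls.

# On FW-A (qfe0 is a placeholder for the dirty-side interface):
snoop -d qfe0 icmp

# On FW-B, in a second session, watch its dirty-side interface the same way:
snoop -d qfe0 icmp

# From a test host on the clean DMZ, ping a dirty-side address,
# for example the Internet routers' HSRP address:
ping 73.0.10.249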
A shortcut for keeping your load-balancing definitions consistent
across the two switches is to set them up as peers. This way,
when you create a filter that you want activated, you simply
sync to the peer switch to have that definition created and applied
there as well. On the primary dirty-side switch (External Switch 1):
/cfg/slb
on
sync/peer 1
addr 73.0.10.252
ena
apply
save
Perform the same for External Switch 2, but change the address to 73.0.10.251:
/cfg/slb/sync/peer 1/ena/addr 73.0.10.251
Repeat for the internal switches but use 192.168.100.2 and 192.168.100.3,
respectively. Then when you create a filter on one switch and want
to synchronize the adjacent switch, just type: /oper/slb/sync.
Although this setup will allow bi-directional synchronization,
it is a good rule to always modify the same switch at each layer
and perform the sync command from there. Likewise, you must perform
the same procedure for the clean-side switches and their corresponding
peer addresses.
The FWLB switches manage firewall health monitoring by checking,
across the firewalls, the real IP (rip, not to be confused with the
routing protocol) addresses assigned to the interfaces of the
opposite FWLB switches. You define them in your local switch, include
them in a newly created real server group to be used later on in
a L4 load balancing filter, and assign which method will handle
your distribution algorithm. We'll use hash, and for health
checking, we'll use ICMP (layer 3 echo).
There is an important difference between 8.3.X and 9.X when performing
NAT through the firewalls or when using a load-distribution metric
other than hash. As mentioned previously, the
problem with using NAT through a firewall is that the original IP
addresses that Alteon uses to determine the firewall through which
to send a session will change during the NAT phase in that firewall.
Thus, the hash algorithm will fail to choose the same firewall when
returning that packet. In the earlier release, Alteon introduced
an option called VPN, which essentially returns that packet to the
corresponding firewall using the MAC source address stored in its session table.
In version 9.X, Alteon more appropriately changed the naming convention
to RTS (Return To Sender). Also, the ports on which you need to
manage this session state must not have L4 filters assigned
to them. In our case:
/cfg/slb/port 7/vpn ena (or "rts ena" if using version 9.X)
/cfg/slb/port 8/vpn ena
/cfg/slb/port 9/vpn ena
On the dirty (Internet) side switches, define the real IP addresses --
the clean-side switch interfaces reached across the firewalls (Figure 4):
/cfg/slb/on
/cfg/slb/real 1/rip 10.0.3.10/ena
/cfg/slb/real 2/rip 10.0.3.11/ena
/cfg/slb/real 3/rip 10.0.3.20/ena
/cfg/slb/real 4/rip 10.0.3.21/ena
/cfg/slb/group 1/metric hash/health icmp/add 1/add 2/add 3/add 4/
apply
save
Repeat these steps on the adjacent switch. A good command to know
in order to check the state of your real servers, redirect filters,
ports, and such is: /info/slb/dump.
For simplicity, let's say that our external network is the small
subnet 73.0.10.0/29 and our ISP has provided 73.0.20.0/24 for our
DMZ-based services. It could have been a single 73.0.10.0/24 network,
which would have introduced additional complexity in the routing
and filters required to avoid redirecting those IP
addresses through the firewalls. For this simplified approach, we
simply need three filters -- the first two to avoid redirecting
the local traffic and VRRP across the firewalls and the third to
redirect anything else to the predefined group (Web cluster) through
the firewalls. We could have set up that last filter with the actual
dip (destination IP) and dmask (destination mask).
/cfg/slb/filt 10/dip 73.0.10.0/dmask 255.255.255.248/ena
/cfg/slb/filt 20/dip 224.0.0.0/dmask 255.255.255.0/ena
/cfg/slb/filt 30/group 1/action redir/ena
Next, bind these filters to the port that is connected to the Internet
side (port 1):
/cfg/slb/port 1/filt ena/add 10/add 20/add 30
Remember to either manually perform these steps on other switches
or synchronize them via your peer definition.
We need to set up the VRRP configurations for the VRID (virtual
router id), to which the external routers will be passing the 73.0.20.0
traffic (73.0.10.250), and the VRID to which the firewalls will
be passing their traffic (10.0.1.1). (See Figure 5.) Note that,
like HSRP, one of the switches must have a higher priority for the
VRIDs. We'll make "External Switch 1" priority 101
and "External Switch 2" priority 100.
/cfg/vrrp/on
/cfg/vrrp/vr 1/ena/addr 73.0.10.250/if 1/share dis/prio 101/track/ifs ena/ports ena/share dis
/cfg/vrrp/vr 2/ena/addr 10.0.1.1/if 2/share dis/prio 101/track/ifs ena/ports ena/share dis
Be sure to configure the secondary switch's VRRP priority definition
to 100!
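Assuming the interface numbering on "External Switch 2" mirrors that of switch 1, its VRRP configuration is identical except for the priority (the VRID addresses are shared by design):
/cfg/vrrp/on
/cfg/vrrp/vr 1/ena/addr 73.0.10.250/if 1/share dis/prio 100/track/ifs ena/ports ena/share dis
/cfg/vrrp/vr 2/ena/addr 10.0.1.1/if 2/share dis/prio 100/track/ifs ena/ports ena/share dis
apply
save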
On the internal switches, you will perform everything I've
shown so far, but using the external switches' information. Therefore:
Enable VPN or RTS:
/cfg/slb/port 7/vpn ena (or "rts ena" if using version 9.X)
/cfg/slb/port 8/vpn ena
/cfg/slb/port 9/vpn ena
Define the real IP addresses of the dirty (Internet) side switches --
the interfaces reached across the firewalls:
/cfg/slb/on
/cfg/slb/real 1/rip 10.0.1.10/ena
/cfg/slb/real 2/rip 10.0.1.11/ena
/cfg/slb/real 3/rip 10.0.1.20/ena
/cfg/slb/real 4/rip 10.0.1.21/ena
/cfg/slb/group 1/metric hash/health icmp/add 1/add 2/add 3/add 4/
apply
save
Repeat these steps on the adjacent switch.
Set up your redirection filters:
/cfg/slb/filt 10/dip 192.168.100.0/dmask 255.255.255.0/ena
/cfg/slb/filt 15/dip 73.0.20.0/dmask 255.255.255.0/ena
/cfg/slb/filt 20/dip 224.0.0.0/dmask 255.255.255.0/ena
/cfg/slb/filt 30/group 1/action redir/ena
Bind the filters to the port that is connected to the Web server
cluster (port 1):
/cfg/slb/port 1/filt ena/add 10/add 15/add 20/add 30
Ensure that you are performing these steps on the other switches or
synchronizing them.
Set up VRRP definitions and priority:
/cfg/vrrp/on
/cfg/vrrp/vr 1/ena/addr 192.168.100.1/if 1/share dis/prio 101/track/ifs ena/ports ena/share dis
/cfg/vrrp/vr 2/ena/addr 73.0.20.1/if 4/share dis/prio 101/track/ifs ena/ports ena/share dis
/cfg/vrrp/vr 3/ena/addr 10.0.3.1/if 2/share dis/prio 101/track/ifs ena/ports ena/share dis
When you do the secondary switch, be sure to change that priority
to 100!
You may be wondering why the internal switches have interface
definitions and a VRID in the 73.0.20.0 network. If you've
ever worked with IPSec solutions (for example, a VPN), you know you
cannot NAT such an address, since some applications carry it within
the session and the encrypted session will therefore not establish.
In this case, you can keep the 73.0.20.X address on that device and
point its default route at your pre-defined VRID, 73.0.20.1. In your
firewall policy, be sure to include an allow rule above the NAT rule
so this traffic passes through without being altered.
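Conceptually, the translation rules would contain something like the following no-translation entry above the general NAT rule (the object name vpn-gw is illustrative, not an actual Checkpoint object; adapt it to your own rulebase):

Original packet:    Source = vpn-gw (73.0.20.X)   Destination = Any        Service = Any
Translated packet:  Source = Original             Destination = Original   Service = Original

A matching entry for traffic destined to vpn-gw keeps the return direction untranslated as well.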
I strongly suggest protecting your switches as much as possible,
which is why I advise creating ACLs on your Internet routers to
block any incoming traffic to the interface definitions on those
switches. On the external switches, you should also create deny
filters on the externally facing ports.
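As a hedged sketch of such deny filters on "External Switch 1" (the filter numbers are illustrative and assumed to be processed in ascending order, so keep them below your redirection filters, and remember to repeat or sync this on the peer switch; the router ACLs are left out since their syntax depends on your router vendor):
/cfg/slb/filt 5/dip 73.0.10.251/dmask 255.255.255.255/action deny/ena
/cfg/slb/filt 6/dip 73.0.10.252/dmask 255.255.255.255/action deny/ena
/cfg/slb/port 1/filt ena/add 5/add 6
apply
save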
Okay, that was easy, but what if what we really wanted was a network
that resembles Figure 6, where the Intranet-connected switches can
load balance across both layers of firewalls to the primary DMZ
(192.168.100.0) without having to load balance each layer independently?
Believe it or not, it's not that hard. If you have followed
this article so far, you will realize that all you need to
do is set up the required routing, VRIDs, VLANs, and real IP interfaces
across the intervening switches, and then create an SLB group consisting
of the rip addresses defined on the far-end switches (not every
single switch in between).
Therefore, the Intranet-connected switches would additionally
require real IP addresses consisting of the interface information
of the switches connected to DMZ1, plus an SLB group to use in a filter --
for example, if your proxy server or mail relay happened to be on
DMZ1. The same holds true in the other direction: the switches connected
to DMZ1 would require the real IP interfaces of the switches connected
to the Intranet network. Routing would be defined across the layers
to allow the ICMP health checks to work (Figure 7).
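As a hedged sketch of the Intranet-side piece only (this assumes the DMZ1-connected switches keep the 192.168.100.2 and 192.168.100.3 interfaces configured earlier and that port 1 faces the Intranet -- check both against your own diagram; the real, group, and filter numbers are illustrative):
/cfg/slb/on
/cfg/slb/real 5/rip 192.168.100.2/ena
/cfg/slb/real 6/rip 192.168.100.3/ena
/cfg/slb/group 2/metric hash/health icmp/add 5/add 6
/cfg/slb/filt 40/dip 192.168.100.0/dmask 255.255.255.0/group 2/action redir/ena
/cfg/slb/port 1/filt ena/add 40
apply
save
The intermediate switches only need the routing, VLAN, and VRID definitions to pass this traffic along; the ICMP health checks from the Intranet switches then exercise both firewall layers and the network in between.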
The next image (Figure 8) reflects the load balancing groups created
in order to redirect traffic bi-directionally from the Internet
to DMZ1 (HTTP(S), SMTP, DNS, VPN, etc.), from DMZ1 (Web cluster)
to DMZ2 (application server cluster), from DMZ1 to Intranet (mail
relay, proxy services, messaging services, authentication protocols,
etc.), and from DMZ2 to Intranet (LDAP, database access, legacy systems
access, messaging services, etc.).
What is important to take away from this article is the set of concepts
required to achieve the goal of firewall load balancing. Whatever
load balancer you choose should be able to perform the same
functionality (or better) in its own way.
John Paolillo specializes in the design and implementation of Internet
e-commerce infrastructure and network security solutions for SIS
(Secure Internet Solutions) out of New Jersey. He has a 16-year
background in UNIX and networking solutions, from police/fire departments
to Fortune 200 financial institutions. He can be reached at: Paolillo@s--i--s.com.