Network Management with Overlapping IP Address Ranges
Scott Kirkwood
Every UNIX systems administrator has some knowledge of networks
and routing, and most have basic experience with network devices.
For example, configuring NFS, DNS, or NIS is a common network
configuration task for a sys admin, but the network infrastructure
does not typically require more than an Ethernet cable, an IP address,
and an entry in the DNS server for each new UNIX system. There are,
however, certain implementations in which the network infrastructure
must be designed as an integral component of the systems architecture.
This article will detail just such a scenario, and will show how
expanding your knowledge of network infrastructure configuration
can make the difference between a dead project and a functional
architecture.
The Scenario
Service providers have always been necessary in any enterprise
WAN, supplying the telecommunications circuits and WAN bandwidth
needed to operate a business network. However, the advances in transmission
technology along with increased competition in the telecom sector
have resulted in gluts of bandwidth available to enterprise clients
for ever-decreasing costs. A service provider can no longer sell
simple T1, T3, and OC3 service to companies and expect to stay in
business.
Most service providers have looked deeper into the enterprise
network for additional services to sell, and network management
is an obvious target. The result is managed network services from
telecommunications carriers. The enterprise client benefits by reducing
its need for expensive WAN savvy employees, and gains the peace
of mind that the service provider has a cutting-edge NMS system
monitoring its WAN 24x7. That's how the marketing brochures
read, and it sounds simple -- until a UNIX administrator actually
tries to implement the network management system. This case study
outlines the technological pitfalls that are inherent in such a
systems design and the network solutions that can make it all work
the way the marketing brochures advertise.
The Problem
The problem of managing hundreds of separate networks from a single
NMS point appears to be one of scale and security. The most likely
issue seems to be getting a system with enough power to monitor
hundreds of thousands of network devices in real time. However,
a bit of research reveals a greater problem with the configuration
that cannot be overcome with more processing power or extra memory.
Most enterprise companies are not in possession of large blocks
of registered public IP addresses, and have implemented RFC 1918 private
IP addressing (e.g., 10.0.0.0/8 or 192.168.0.0/16) on their internal
corporate network. This solution, along with Network Address Translation
or proxy service for Internet access, works fine for a single enterprise
company. However, the service provider supplying NMS service may
have hundreds of companies with overlapping IP address ranges, and
hundreds of devices to manage with identical IP addresses.
This results in a fundamental problem for the systems administrator,
and one that cannot be overcome with additional software or hardware.
The issue is actually external to the UNIX system, but will impact
the systems design if not properly corrected. Neither the network
architect nor the systems architect can individually provide a
solution to this problem. It takes a combination of the two, and
the UNIX administrator will likely need to drive the effort to completion.
The problem is not one of network management, because most modern
NMS software actually references devices in its database by the
hostname collected during SNMP polling. The NMS software does not
care whether there are hundreds of devices with identical IP addresses,
because the devices are being indexed in a database by other means.
The issue here is one of basic IP routing, as depicted by Figure
1.
The basic questions being asked in this configuration are: How
do you get network packets to find the proper destination when there
may be 200 devices with the same IP address, and how do the response
packets from these devices find their way back to the NMS system?
Network Address Translation (NAT) would be the logical solution,
but the nature of the Simple Network Management Protocol (SNMP)
makes that solution impossible. SNMP carries the IP addresses of
managed devices as payload data within its packets, and NAT cannot
effectively perform payload translation for SNMP. At the time of
this implementation, there were a few proprietary solutions in existence
that could perform this task, but none were available in a commercially
viable product.
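To illustrate why header-only translation falls short, consider a
walk of the standard MIB-II ipAddrTable on a managed device. The
addresses below are hypothetical and the output format is that of a
typical command-line SNMP tool, but the point holds generally: the
device reports its own RFC 1918 addresses as data values inside the
SNMP payload, values that a NAT device rewriting only IP headers
would leave untouched and therefore inconsistent:

  $ snmpwalk -v1 -c public 10.1.1.1 ipAdEntAddr
  IP-MIB::ipAdEntAddr.10.1.1.1 = IpAddress: 10.1.1.1
  IP-MIB::ipAdEntAddr.192.168.5.1 = IpAddress: 192.168.5.1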
So the task is this: Design a UNIX system and network architecture
that will allow NMS software to perform SNMP monitoring of hundreds
of networks with overlapping IP addresses, and correlate all the
information into a single display for centralized support. The following
requirements were placed on the design because of corporate hardware
standards and the platform specifications of the NMS software:
- Sun hardware for UNIX systems
- Solaris 8 operating system
- Cisco routers and switches
- SMARTS In-Charge NMS software
- Complete separation of customer network traffic for security
The Solution
This problem can be solved neither in the network nor in the
systems configuration alone. It requires a combination of complex
routing techniques within the network infrastructure, along with
a corresponding configuration in the UNIX system. As in that overused
business mantra, you have to think outside the box (the UNIX box,
that is).
Conquering this problem involves taking the overlapping IP addresses
out of the general mix. The solution is possible due mainly to the
multi-tiered architecture of the NMS software being deployed. Most
modern NMS software allows for a multi-tiered installation with
multiple SNMP collection systems forwarding information to a central
correlation and display engine. An example of this type of installation
is shown in Figure 2.
In such a deployment, the Tier-1 NMS systems perform active SNMP
polling of managed devices and systems within a given area of the
network. This includes periodic polling of set parameters, such
as interface status and performance metrics, as well as the receipt
of asynchronous SNMP traps sent by the devices whenever a fault
or preset event occurs. The Tier-2 NMS system consolidates information
from all of the Tier-1 systems for event correlation and eventual
display in a single management console.
This design is intended to allow for a more scalable implementation
of NMS software by distributing polling and correlation duties across
multiple systems, as well as reducing SNMP traffic on WAN links.
However, when combined with an appropriate network configuration,
this design can be modified to create a workaround for the overlapping
IP address issue.
In the overlapping IP address scenario, the multi-tiered NMS architecture
described above allows a local polling station to perform active
SNMP polling of a particular client network while passing events
back to the central reporting and display station. Events passed
back to the central station identify managed devices by fully qualified
domain names, thus eliminating IP addresses from the relationship.
This means that the customer networks can be monitored with SNMP
using local IP addresses, while any event notifications forwarded to
the central monitoring station carry hostname information
rather than IP addresses.
In short, each network can have its own NMS system for local monitoring,
while a centralized system consolidates and displays the information
to a central support group within the managed services provider.
The only drawback of this solution is that every single customer
network to be monitored requires its own dedicated SNMP polling
station as depicted in Figure 3.
Although the solution is workable, it is far from scalable, and
the cost of hundreds of customer monitoring stations (to fulfill
the requirements of this particular case) quickly outweighs any
potential profit in the business case for such a setup. Thus, it
is necessary to reduce the number of polling stations in the design
to attain a viable solution.
SNMP Monitoring
The specific SNMP monitoring product selected for this implementation
provided a portion of the solution through its basic design. This
product, known as SMARTS In-Charge, allows the running of multiple
instances of its collection engine on a single UNIX system. Each
instance requires its own virtual IP interface from which to send
and receive SNMP requests. Each virtual interface is assigned a
unique IP address, which forms the basis for routing packets despite
overlapping IP addresses. Most NMS software, including several freeware
packages, has followed this model to allow for multiple instances
of the software on a single system, so the solution is reasonably software
independent, assuming this requirement is met.
Fulfilling the requirement for a virtual interface for each instance
of the NMS software is nothing new to most systems administrators.
Admins have been using virtual interfaces, or multiple IP addresses
per interface, for many years. It is a common configuration for
Web servers and shared LAN scenarios. However, simply using additional
IP addresses per physical interface would not fulfill the requirements
of this design for security. Since each customer requires complete
separation of network traffic, it is necessary to keep all traffic
segregated by customer until it reaches the NMS software on the
system. Within the UNIX system itself, customer data can be secured
by running each instance of the NMS software under its own UID and by using
standard UNIX security to prevent information leaks. However, on
the network, the data cannot share a common LAN or security could
be compromised.
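For reference, the familiar logical-interface approach on Solaris 8
looks something like the following (interface name, hostname, and
addresses are hypothetical). It adds a second IP address to one
physical interface, but all traffic still shares the same LAN, which
is exactly what this design must avoid:

  # add a second logical interface to hme0 for the current session
  ifconfig hme0:1 plumb
  ifconfig hme0:1 192.168.10.5 netmask 255.255.255.0 up

  # or make it persistent across reboots
  # (the hostname must also have an entry in /etc/hosts)
  echo "web-vip" > /etc/hostname.hme0:1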
VLANs
Implementing a combination of policy-based routing and Virtual
LANs (VLANs) within the network architecture (and extending this
into the polling systems by means of 802.1q-compliant network interface
cards) handles the requirement of an individual interface for each
instance of SMARTS In-Charge, while keeping customer networks completely
segregated for security reasons.
VLAN technology allows multiple Ethernet networks to be carried
across a single cable with true frame-level separation. It requires
a network interface capable of supporting the 802.1q standard as
well as connection to an 802.1q-compliant switch. Configuration
of this component of the UNIX system is described later in the article.
The result is similar to that pictured in Figure 4.
In this scenario, policy-based routing is implemented in the access
router, which allows the network architect to pre-define a set of
routing policies based on source interface or IP address. IP datagrams
sent by the monitoring systems are inspected at the access router
for their source interface. Policies defined within the router
correlate a source interface (carrying traffic from a given instance of the
NMS software) with the router interface that leads to a particular
customer network. Thus, two packets with identical destination IP
addresses can be routed to the proper outgoing interface based on
pre-defined parameters. Figure 5 shows the policy defined within
the access router.
The problem of outgoing packets sent to duplicate IP addresses
is now solved. Response packets, or packets sent by the monitored
device back to the Tier-1 NMS system, are not a problem because they
will have a destination IP address for their particular instance
of the NMS software. Thus, the access routers can simply forward
these packets back to the proper virtual interface based on standard
IP routing and existing VLANs within the network architecture. The
same principle applies to SNMP trap packets that are occasionally
sent by the devices.
The resulting solution requires a combination of systems and network
configurations to circumvent the overlapping IP address issue. Despite
the apparent complexity of the design, the implementation is straightforward,
and requires very little specialized network knowledge. However,
given the correlation between virtual interface, router interfaces,
and customer networks, a strict change management process is essential
to avoid configuration and security issues once the architecture
is implemented.
The design is capable of supporting hundreds of overlapping IP
addresses within a single NMS system. The only limiting component
of the architecture is the maximum of 64 VLANs per interface on
a Sun system. Based on loading estimates and managed node counts,
it was determined that a maximum of 20 client networks could be
managed by a single Sun E420R system. This particular case required
an initial installation of 14 Tier-1 NMS systems to handle the customer
load. Two Tier-2 NMS systems (Sun Enterprise 4500) were installed
in a high-availability configuration to handle event correlation.
Additionally, 4 display servers (Sun Netra T1) were installed for
accessing the network information from the operations centers via
a Web interface. Figure 6 shows the final systems design and layout.
The design called for a centralized storage system using a Fibre
Channel-based Storage Area Network (SAN), so no NFS configuration
was necessary. For centralized administration, the systems were
configured in an NIS+ domain with NIS+ access specifically denied
from the customer networks. This last configuration note, along
with packet filtering at the access routers (deny all except SNMP
and SNMP-Trap) on each customer-facing interface, could have removed
the need for expensive firewalls in the design. However, this design
ultimately did include firewalls between the access routers and
NMS systems for additional inspection and logging of network traffic.
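A minimal sketch of that kind of filter on a Cisco IOS access
router, with a hypothetical NMS polling address and customer-facing
interface, might look like the following; a production filter would
also need to account for any other management protocols in use:

  ! permit SNMP responses and traps destined for the NMS polling address
  access-list 110 permit udp any eq snmp host 172.31.101.10
  access-list 110 permit udp any host 172.31.101.10 eq snmptrap
  access-list 110 deny ip any any
  !
  interface Serial1/0
   ip access-group 110 in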
Configuring the Systems
The first requirement of the UNIX systems in this configuration
is to ensure that the systems being used include network interfaces
capable of supporting VLANs and the 802.1q standards. When using
802.1q VLANs, every frame transmitted on an interface has a four-byte
"tag" inserted into the Ethernet header to identify the VLAN to which
the frame corresponds. Support for the tagging of frames must be
contained within the driver for the network interface, as these additional
four bytes can push an Ethernet frame beyond the standard maximum
frame length of 1518 bytes (to 1522 bytes), causing an interface without
tagging support to report an error on VLAN-tagged frames.
In this scenario, the E420R systems were equipped with standard
Sun Fast Ethernet adapters, which do not support 802.1q tagging.
Because our solution required VLANs only on the customer-network-facing
side of the Tier-1 NMS systems, the standard Fast Ethernet
interfaces were used for connection to the internal Tier-2 NMS LAN.
The Tier-1 systems each had a Sun Gigabit Ethernet adapter with
802.1q support installed for connection to the customer access network.
Currently, most Sun Gigabit Ethernet adapters are 802.1q capable.
Configuration of the VLANs on the Solaris systems is surprisingly
straightforward. It involves creating a file named /etc/hostname.vge<n>
for each VLAN interface. This is a flat text file that contains
a hostname for that VLAN interface. An IP address for this hostname
must be present in the /etc/hosts file. The 802.1q standard allows
up to 4094 VLAN IDs. However, Sun adapters are currently
limited to 64 VLANs per interface. The number of VLANs on a system
can be increased beyond 64 by using multiple Gigabit Ethernet adapters.
In our scenario, the limiting factor is the number of NMS instances
that the system can support, which is far less than the 64-VLAN
limit for a single interface. Numbering of the /etc/hostname.vge<n>
files is shown in Figure 7.
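As a minimal sketch (hostnames, addresses, and interface numbers
here are purely hypothetical), the per-VLAN interface files and
their matching /etc/hosts entries might look like this:

  /etc/hostname.vge1 contains:   nms-cust101
  /etc/hostname.vge2 contains:   nms-cust102

  /etc/hosts entries:
  172.31.101.10   nms-cust101
  172.31.102.10   nms-cust102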
VLAN tag numbers must then be assigned to each of the VLAN interfaces
configured. The 64 VLANs that can be configured on a Sun adapter
can correspond to any of the 4094 VLAN IDs that are possible in 802.1q.
Therefore, it is necessary to specifically assign the VLAN tag numbers
to each VLAN interface configured. This is done by creating files
named /etc/vlan.vge<n> for each VLAN interface. The numbering
scheme for these files is identical to the hostname.vge files. Each
/etc/vlan.vge<n> file must contain the VLAN tag number in
decimal, octal, or hexadecimal format with no additional characters,
lines, or spaces.
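Continuing the hypothetical example above, the tag files would then
assign VLAN tags 101 and 102 to those interfaces:

  /etc/vlan.vge1 contains:   101
  /etc/vlan.vge2 contains:   102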
Once the /etc/hostname.vge<n> and /etc/vlan.vge<n>
files are created, a simple reboot of the system will activate the
VLAN configuration. It is important to note that on most hardware,
configuring even one VLAN turns on VLAN tagging for that entire
physical interface. At that point, the system must be connected
to an 802.1q-compliant device with tagging enabled in order to communicate
on the network. Connecting a system using VLAN tagging to a standard
network hub or switch port will result in a complete inability to
communicate with the network because all packets leaving the physical
interface will have a four-byte VLAN header that can only be interpreted
by an 802.1q-compliant device.
Network configuration of the NMS software requires the hostname
and IP address of a valid VLAN interface to be entered into the
startup configuration files for each instance of the NMS software.
Every NMS package has different configuration specifications for
this information, depending on the software installation instructions.
Configuration simply requires that the IP address, VLAN tag, and
NMS instance correspond correctly to the customer network to be
monitored, and that this information correspond to the policy-based
routing configuration in the access routers.
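In practice, it helps to keep a single mapping that ties each
customer to its VLAN tag, virtual interface, polling address, and
NMS instance, and to verify the policy-based routing configuration
against it. A hypothetical excerpt (all values invented for
illustration) might look like this:

  Customer   VLAN tag   Interface   Polling address   NMS instance
  CUST101    101        vge1        172.31.101.10     incharge-cust101
  CUST102    102        vge2        172.31.102.10     incharge-cust102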
Configuring the Network Devices
There are two components of the network devices that must be configured
to match this system configuration: the Ethernet switches and the
customer access routers. The switches used in this scenario were
Cisco Catalyst 5000 series switches running CatOS 6.1(1) software,
with blades for Gigabit and Fast Ethernet interfaces installed.
Configuration of these switches involved simply establishing the
VLANs within the switch and enabling trunking on the ports to which
the Tier-1 NMS servers attach. Listing 1 is a sample configuration
with the VLANs defined and trunking enabled on the Tier-1
NMS ports. Note that this is not a complete configuration file for
the switch as it shows only the portions related to VLAN configuration.
This configuration excerpt defines the VLANs that correspond to
the VLANs established on the Solaris systems. The "set vtp
mode transparent" command refers to the VLAN Trunking Protocol used
between switches and simply indicates that VLANs are defined locally
rather than learned from a VTP server. Each of the "set vlan" commands establishes
a VLAN with its corresponding tag number and VLAN name. Note that
Cisco switches utilize VLAN tag number 1 for a
default VLAN within the switch. It is therefore not possible to
assign VLAN 1 for the purposes described in this case study. A common
workaround to this situation is to start the VLAN numbering at VLAN
100 so that VLAN tag numbers can correspond more directly with customer
numbers.
The "set trunk" command enables VLAN trunking on a particular
switch port; in this case, Gigabit Ethernet ports 1 and 2 in module
3 are enabled to use 802.1q trunking for connection to Tier-1 NMS
systems or the access routers. Cisco also allows the use of its
proprietary ISL (Inter-Switch Link) trunking for VLAN trunk ports,
but this is not compliant with 802.1q and will not communicate properly
with the Sun adapters.
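A minimal sketch of the commands just described, with hypothetical
module, port, and VLAN numbers, might look like this in CatOS:

  set vtp mode transparent
  set vlan 101 name CUST101
  set vlan 102 name CUST102
  set trunk 3/1 on dot1q
  set trunk 3/2 on dot1q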
Configuration of the access routers is significantly more complex
because it contains the policy-based routing that circumvents the
duplicate IP address issue. Listing 2 shows a sample configuration
for a customer access router.
Although this configuration file is quite complex, it is still
only a scaled-down version of the full configuration, presented to
demonstrate the policy-based routing technique used for this case
study. Complete configuration requires significant experience with
these devices. Two different components make up the routing used
here.
Each of the incoming VLAN interfaces references a policy route-map
that is applied to all packets arriving on that interface.
Each customer will have one VLAN interface (denoted by the GigabitEthernet0/0.<n>
definitions) and one corresponding route-map reference. This ensures
that all packets arriving on a VLAN interface circumvent the IP
routing engine and are forwarded according to the defined policy.
The route maps referenced simply point all packets to the correct
outgoing interface for the customer.
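A minimal sketch of that pairing, with hypothetical VLAN, address,
and interface numbers, might look like the following on the access
router. The route-map carries no match clause, so it applies to
every packet arriving on the subinterface and simply forces it out
the customer's serial interface:

  interface GigabitEthernet0/0.101
   description VLAN from NMS instance for customer 101
   encapsulation dot1Q 101
   ip address 172.31.101.1 255.255.255.0
   ip policy route-map cust101
  !
  interface Serial1/0
   description WAN link to customer 101
   ip address 10.255.1.1 255.255.255.252
  !
  route-map cust101 permit 10
   set interface Serial1/0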
Packets arriving on the customer network interfaces (serial interfaces)
can be routed using standard IP routing tables because the Tier-1
NMS systems each have unique addresses. The IP routing table maintained
by the router will point these packets directly to the proper VLAN
interface without policy intervention.
Conclusion
The example presented here is specific to the network management
scenario discussed, but the solution can be applied to numerous
other situations that a systems administrator may face. One of the
most common applications for this type of network configuration
is for vendor or third-party access to internal corporate systems.
This typically involves a huge outlay for firewall equipment, NAT
engines, and dedicated vendor servers to access corporate application
systems such as inventory databases or customer information databases.
In such a scenario, the systems administrator faces the same problem
seen here: access to a single system from multiple networks that
may include overlapping IP address ranges. NAT is a useful tool,
but not always a perfect solution to this scenario. Through careful
application of a network configuration in cooperation with the systems
configuration, these problems can be overcome in a scalable and
affordable manner. When faced with a complex network application
issue such as this, extend your administration and architecture
knowledge beyond the UNIX box and into the network to find your
solution.
Scott Kirkwood specializes in designing and implementing network
and systems operations centers for enterprise and service provider
companies. His previous positions have included Network Architect,
UNIX Engineer, IT Business Analyst, and Jazz Musician. All of these
skills (except the Jazz Musician) are combined in his current role
as a business consultant to the IT industry. He currently works
for International Network Services, and can be reached at: scott.kirkwood@ins.com.