Troubleshooting
Network Performance with Alpine
Jeffrey Papen
As a Network Engineer, I am often annoyed by slow Internet performance
caused by network issues like congestion, fiber cuts, and packet
loss. I would love to wave a magic wand at my DSL modem to re-route
traffic down the fastest available path, but my only recourse is
opening trouble tickets with my ISP. Waiting for my ISP to fix the
problem often leaves me frustrated with their inability to avoid
problem networks and identify faster paths.
Single-homed networks (those having only one ISP, like my home
LAN) must wait for their ISP to resolve network issues. Multi-homed
networks (those having connections to multiple ISPs, like Yahoo!
and Peak Web Hosting) offer significantly reduced problem resolution
times by routing traffic around poorly performing networks.
Previously, when users discovered problems getting to external
networks, finding unique alternate paths was a complex process.
Learning all available paths required logging into multiple routers
and repeatedly executing the same diagnostic commands. Depending
on the situation, rerouting traffic to test a candidate path could
be as simple as adding static routes or could involve all kinds
of network voodoo. Also, this highly invasive testing method carried
the risk of degrading all network performance. I have often spent
more than an hour finding a faster path to an important external
network. Because rerouting traffic and testing alternate paths consumes
extensive network engineering time and resources, giving users the
ability to test available networking options without changing router
configurations or affecting live network traffic is clearly advantageous
for any multi-homed environment.
In this article, I will describe the freeware application Alpine,
which makes this troubleshooting process faster, easier, and less
invasive. Alpine does not affect existing network traffic and requires
no special user privileges, router access, or router configuration
changes. Troubleshooting tasks that previously required an hour
of manual configuration can now be done in less than 5 minutes.
Alpine can be used by network engineers, sys admins, and users to
research the performance for each available option and find the
fastest path.
How Alpine Works
Alpine's Web interface collects data, processes routes, and
reports network path performance statistics. (See sidebar "Alpine
Step by Step".)
Zebra is a software-based routing daemon, similar to gated, that
runs on UNIX servers. Zebra acts identically to other BGP routers,
allowing it to learn each router's full routing table. Alpine
uses the zebra daemon to learn Border Gateway Protocol (BGP) routes
from all externally connected routers. (See Figure 1.) Although
routers connected to different ISPs may learn multiple paths to
the same destination, they only re-advertise their best path to
BGP neighbors. While it would be convenient for routers to know
every possible path for each Internet prefix, keeping the BGP routing
table size manageable requires BGP to advertise only its best path
for each routing table entry.
Alpine will search zebra's BGP table for the prefix that
most specifically matches the target IP address. (See Figure 2.)
Because zebra's BGP table contains only the best route from
each router, Alpine employs SNMP to learn every router's complete
list of paths to the target prefix. SNMP read-only permission is
the only router access Alpine requires. This routing data could
also be obtained by accessing a router's command-line interface
via telnet or SSH, however this method represents a larger security
risk and the additional complexity of parsing multiple vendors'
command-line interface output.
After SNMP polling, Alpine displays each routing option's
list of ISPs used to reach the target destination. Included are
links to test each path's performance. Clicking on these links
sends ping, traceroute, or mtr (Matt's Traceroute) diagnostic
traffic out the specified ISP. (See Figures 3-5.)
Filter-Based Forwarding
When routers receive a packet, they examine its destination IP,
look up in their routing table the correct outgoing interface based
on that destination IP, and forward the packet out that interface.
Filter-based forwarding allows routers to alter this default process.
With filter-based forwarding, a router uses a firewall, or Access
Control List (ACL), to match a packet's source IP, destination
IP, protocol, port, or other fields. When the firewall matches a
packet, the router overrides the default routing table's choice,
forwarding the packet out a user-defined interface.
Filter-based forwarding directs only selected packets to non-default
paths, enabling testing of each candidate path's performance
without affecting existing traffic. Without filter-based forwarding,
traffic always follows the same path and network-wide rerouting
remains the only method to test alternate paths. Rerouting all traffic
is potentially dangerous and may utilize worse performing paths
or induce other network problems.
Every major router vendor supports some filter-based forwarding.
However, implementation differences can affect Alpine installations.
While Juniper Networks implements hardware-based policy routing,
imposing no performance impact, Cisco's software implementation
requires the router's CPU to make each routing decision, potentially
slowing overall system performance. (See Listings 1 and 2.) Alpine
servers should connect to Cisco routers on a dedicated interface
to avoid impacting other traffic. (See Figure 6.)
Alpine Configuration
The Alpine server is configured with one IP address for every
BGP neighbor. The Web interface explicitly sets the correct source
IP for the user-defined test application: ping, traceroute, or mtr.
The router's filter-based forwarding matches on the source
IP and reroutes traffic out the requested ISP rather than to the
default BGP path. Because filter-based forwarding only matches on
the packet's source IP, Alpine performance testing easily can
be expanded to include any network application, such as HTTP requests
or streaming media applications.
All server traffic destined to external networks uses a single
static default gateway. If the Alpine server's default gateway
is the only router connected to external ISPs, the Alpine server
requires no additional configuration. If there are multiple egress
routers, the Alpine server needs the additional configuration of
IPFW or IP chains. Rather than following the default gateway, IPFW
explicitly sets the correct egress router.
Understanding BGP and Alpine's Output
Every BGP network belongs to an Autonomous System (AS) number.
As networks throughout the Internet learn and exchange routing information,
each BGP network prepends its AS number to create the prefix's
AS path. The AS path is essentially a map showing which path traffic
to this prefix will follow. Because the AS path appears as a cryptic
string of numbers, Alpine displays (next to each AS number) either
a user-defined description or a link to look up the AS number. For
example, the AS path to my Alpine server might look like [701 2914
22208]. Traffic to the Alpine server would pass through UUNet to
Verio and then to Peak Web Hosting.
Unfortunately, BGP has no knowledge of round-trip latency or packet
loss. When determining the "best" path to a prefix, BGP
often relies on the misleading metric of AS path length. Because
AS path length is the most common method for BGP route selection,
many BGP networks influence other's routing decisions by prepending,
or AS padding, their AS number multiple times. AS padding may appear
as a mistake or routing loop, but it is commonly used to make certain
paths appear less preferable.
When reading Alpine's output, the shortest AS path is not
always the best performing path. To cross a single AS may require
between 1 and 20 individual router hops. Only running traceroute
or mtr exposes the hidden details behind each AS hop and the relative
performance of each candidate path. Selecting an AS path containing
well-known ISPs may not always produce the fastest results either.
These large networks face the same challenges and suffer the same
outages as smaller or lesser-known ISPs.
While Alpine offers a great tool for discovering ISPs' interconnectivity
and network route performance, no single Internet view produces
complete and accurate results. Alpine troubleshoots only the path
taken from the local network to the destination. Asymmetric routing
often directs return traffic via a completely different ISP. Accurate
network troubleshooting would require multiple geographically dispersed
Alpine servers testing traffic to and from the source and destination
networks.
Alpine cannot reroute traffic or change router configurations
after discovering a better path. Filter-based forwarding reroutes
only Alpine's diagnostic traffic. Administrators must manually
configure routers to update the network path selection for all traffic.
Because shifting traffic may saturate a previously well-performing
network, always re-test the new path's performance after updating
router configurations.
Alpine versus Looking Glass
Looking glass sites are Web sites that allow anyone to test performance
and connectivity from outside their network. The typical looking
glass server collects BGP route information from a single router
that may not know every available path to the target network. A
looking glass's traceroute and ping follow only the preferred
BGP route to the target even when other routing options exist.
Conclusion
Alpine provides any user the ability to proactively discover and
diagnose network problems. Alpine's simple GUI, flexible testing
methods, and extensive diagnostic data empowers users to research
and present solutions, not problems, to their network engineers
and systems administrators.
References
http://alpine.peakwebhosting.com -- Alpine server and
source code repository
http://www.zebra.org -- BGP daemon for UNIX servers
http://www.freebsd.org -- IPFW
http://www.juniper.net
http://www.cisco.com http://www.bitwizard.nl/mtr/
-- Matt's Traceroute, arguably the best traceroute implementation
to date.
http://www.traceroute.org -- Excellent resource for
traceroute tools and looking glass servers across the Internet.
Jeffrey Papen is a UCLA Computer Science Engineering graduate
and configures Yahoo!'s BGP peering and policy routing. Jeffrey
also authored the NetFlow analysis tool TUNDRA and the bandwidth
utilization and billing reconciliation package HappyDog. When not
evangelizing Juniper Networks or multi-homing Peak Web Hosting,
Jeffrey's three Siberian Huskies Alpine, Tundra, and Glacier
take him for walks along the California coast. He can be reached
at: jeffrey@papen.com.
|