Cover V11, I12


Troubleshooting Network Performance with Alpine

Jeffrey Papen

As a network engineer, I am often annoyed by slow Internet performance caused by network issues such as congestion, fiber cuts, and packet loss. I would love to wave a magic wand at my DSL modem and reroute traffic down the fastest available path, but my only recourse is opening trouble tickets with my ISP. Waiting for my ISP to fix the problem often leaves me frustrated by its inability to avoid problem networks and find faster paths.

Single-homed networks (those with only one ISP, like my home LAN) must wait for their ISP to resolve network issues. Multi-homed networks (those connected to multiple ISPs, like Yahoo! and Peak Web Hosting) can resolve problems much faster by routing traffic around poorly performing networks.

Previously, when users discovered problems reaching external networks, finding unique alternate paths was a complex process. Learning all available paths required logging into multiple routers and repeatedly executing the same diagnostic commands. Depending on the situation, rerouting traffic to test a candidate path could be as simple as adding static routes or could involve all kinds of network voodoo. This testing method was also highly invasive: because it moved live traffic, it risked degrading performance for the entire network. I have often spent more than an hour finding a faster path to an important external network. Because rerouting traffic and testing alternate paths consume extensive network engineering time and resources, any multi-homed environment benefits from giving users a way to test the available network paths without changing router configurations or affecting live traffic.

In this article, I will describe Alpine, a freeware application that makes this troubleshooting process faster, easier, and less invasive. Alpine does not affect existing network traffic, and its users need no special privileges, router access, or router configuration changes. Troubleshooting tasks that previously required an hour of manual configuration can now be done in less than 5 minutes. Network engineers, systems administrators, and end users can all use Alpine to measure the performance of each available path and find the fastest one.

How Alpine Works

Alpine's Web interface collects data, processes routes, and reports network path performance statistics. (See sidebar "Alpine Step by Step".)

Zebra is a software-based routing daemon, similar to gated, that runs on UNIX servers. Alpine uses zebra to learn Border Gateway Protocol (BGP) routes from all externally connected routers. (See Figure 1.) Because zebra peers with those routers just as any other BGP router would, each router sends it a full copy of its BGP routing table. Although routers connected to different ISPs may learn multiple paths to the same destination, they re-advertise only their best path to BGP neighbors. While it would be convenient for every router to hear every possible path to each Internet prefix, keeping routing tables to a manageable size requires each BGP router to advertise only its best path for each prefix.
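
To make the "best path only" behavior concrete, here is a minimal Python sketch of what one edge router passes to a BGP peer such as zebra. The router data, prefixes, and AS numbers are hypothetical (documentation ranges), and best-path selection is reduced to "shortest AS path." Real BGP applies other tie-breakers, such as local preference, before AS path length. The point is only that alternative paths never leave the router, which is why Alpine needs another way to see them.

```python
"""Sketch: why a BGP peer sees only one path per prefix from each router.

All data below is hypothetical. Best-path selection is simplified to
"fewest AS hops"; real BGP evaluates other attributes first.
"""


def best_path(paths):
    # Simplified selection: fewest AS hops wins; ties go to the first path listed.
    return min(paths, key=len)


def advertise_to_peer(rib):
    # A router re-advertises only its single best path for each prefix.
    return {prefix: best_path(paths) for prefix, paths in rib.items()}


# Hypothetical table of every path one edge router has learned.
router_a_rib = {
    "192.0.2.0/24": [[701, 2914, 64500], [1239, 64500], [3356, 3549, 64500]],
    "198.51.100.0/24": [[701, 64501], [1239, 7018, 64501]],
}

if __name__ == "__main__":
    for prefix, path in advertise_to_peer(router_a_rib).items():
        print(prefix, path)
```
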

Alpine searches zebra's BGP table for the prefix that most specifically matches the target IP address (the longest matching prefix). (See Figure 2.) Because zebra's BGP table contains only the best route from each router, Alpine uses SNMP to learn each router's complete list of paths to the target prefix. SNMP read-only permission is the only router access Alpine requires. The same routing data could be obtained by logging into each router's command-line interface via telnet or SSH. However, that method poses a larger security risk and adds the complexity of parsing multiple vendors' command-line output.
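
The "most specific match" step is a longest-prefix match. The following is a small sketch of the idea using Python's standard `ipaddress` module. The table contents are hypothetical, and this is not Alpine's own code, which searches zebra's table rather than a Python list.

```python
import ipaddress


def most_specific_match(target, prefixes):
    """Return the longest prefix in `prefixes` that contains `target`, or None."""
    addr = ipaddress.ip_address(target)
    best = None
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        # Ignore prefixes of the other address family and ones not covering addr.
        if net.version != addr.version or addr not in net:
            continue
        # A longer prefix length means a more specific route.
        if best is None or net.prefixlen > best.prefixlen:
            best = net
    return best


# Hypothetical BGP table prefixes.
TABLE = ["0.0.0.0/0", "192.0.2.0/24", "192.0.2.128/25", "198.51.100.0/22"]

if __name__ == "__main__":
    print(most_specific_match("192.0.2.200", TABLE))
```

A linear scan is fine for illustration. A full Internet table would normally use a radix (Patricia) trie for this lookup.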

After SNMP polling, Alpine displays, for each routing option, the list of ISPs used to reach the target destination, along with links to test each path's performance. Clicking one of these links sends ping, traceroute, or mtr (Matt's Traceroute) diagnostic traffic out through the specified ISP. (See Figures 3-5.)
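
The article does not say which SNMP objects Alpine polls. One plausible source, assuming the routers implement the standard BGP4-MIB (RFC 1657), is its received-path-attribute table. That table has a row for each path received from each BGP peer, and it encodes the AS path as an octet string of <type, length, value> segments: a 1-octet type (1 = AS_SET, 2 = AS_SEQUENCE), a 1-octet count of ASes, and 2-octet AS numbers. The following sketch decodes only that value. The SNMP query itself is out of scope, and 4-octet AS numbers are not handled.

```python
SEGMENT_TYPES = {1: "AS_SET", 2: "AS_SEQUENCE"}


def decode_as_path_segment(raw):
    """Decode an RFC 1657-style AS path octet string into (type, [ASNs]) tuples."""
    segments = []
    i = 0
    while i < len(raw):
        if i + 2 > len(raw):
            raise ValueError("truncated segment header")
        seg_type, count = raw[i], raw[i + 1]
        if seg_type not in SEGMENT_TYPES:
            raise ValueError(f"unknown segment type {seg_type}")
        start, end = i + 2, i + 2 + 2 * count
        if end > len(raw):
            raise ValueError("truncated AS list")
        # Each AS number is two octets: high byte first, then low byte.
        asns = [int.from_bytes(raw[j:j + 2], "big") for j in range(start, end, 2)]
        segments.append((SEGMENT_TYPES[seg_type], asns))
        i = end
    return segments


def flatten(segments):
    # Show AS_SET members in braces, since their order carries no meaning.
    parts = []
    for seg_type, asns in segments:
        if seg_type == "AS_SET":
            parts.append("{" + ",".join(map(str, asns)) + "}")
        else:
            parts.extend(map(str, asns))
    return " ".join(parts)
```
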

Filter-Based Forwarding

When a router receives a packet, it examines the destination IP address, looks up the correct outgoing interface for that destination in its routing table, and forwards the packet out that interface. Filter-based forwarding lets a router alter this default process. With filter-based forwarding, the router uses a firewall filter, or Access Control List (ACL), to match a packet's source IP, destination IP, protocol, port, or other fields. When the filter matches a packet, the router overrides the routing table's choice and forwards the packet out a user-defined interface.
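
The forwarding decision just described can be modeled in a few lines. This toy Python model is not router configuration, and the interface names, addresses, and rules are hypothetical. A filter keyed on source IP is checked first; only packets that do not match fall through to the normal destination-based lookup.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Hypothetical routing table (IPv4 only): prefix -> outgoing interface.
ROUTES = {"0.0.0.0/0": "to-isp-a", "192.0.2.0/24": "to-isp-a"}

# Hypothetical filter-based forwarding rules: source IP -> forced interface.
FBF_RULES = {"10.1.1.11": "to-isp-b", "10.1.1.12": "to-isp-c"}


def routing_table_lookup(dst):
    # Default behavior: longest matching destination prefix wins.
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in ROUTES if addr in ipaddress.ip_network(p)]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[str(best)]


def forward(pkt):
    # Filter first: a matching source IP overrides the routing table's choice.
    if pkt.src in FBF_RULES:
        return FBF_RULES[pkt.src]
    return routing_table_lookup(pkt.dst)
```

Ordinary traffic is untouched. Only packets carrying one of the test source addresses take a different exit.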

Filter-based forwarding sends only selected packets down non-default paths, so each candidate path's performance can be tested without affecting existing traffic. Without filter-based forwarding, all traffic to a destination follows the same path, and network-wide rerouting remains the only way to test alternate paths. Rerouting all traffic is potentially dangerous: it may move traffic onto worse-performing paths or induce other network problems.

Every major router vendor supports some form of filter-based forwarding. However, implementation differences can affect Alpine installations. Juniper Networks implements policy routing in hardware with no performance impact, whereas Cisco's software implementation requires the router's CPU to make each routing decision, potentially slowing overall system performance. (See Listings 1 and 2.) To avoid impacting other traffic, Alpine servers should connect to Cisco routers on a dedicated interface. (See Figure 6.)

Alpine Configuration

The Alpine server is configured with one IP address for every BGP neighbor. When a user selects a test application (ping, traceroute, or mtr), the Web interface explicitly sets the source IP that corresponds to the chosen ISP. The router's filter-based forwarding matches on that source IP and sends the traffic out the requested ISP rather than along the default BGP path. Because filter-based forwarding matches only on the packet's source IP, Alpine's performance testing can easily be extended to any network application, such as HTTP requests or streaming media.
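
The following Python sketch shows how a Web front end might turn "tool + ISP + target" into a command with the right source address. The ISP names and addresses are hypothetical, and this is not Alpine's actual code. The source-address options shown are those documented for FreeBSD ping(8) (`-S`) and traceroute(8) (`-s`) and for mtr (`-a`). Check your local man pages, because Linux iputils ping, for example, uses `-I` instead.

```python
import ipaddress

# Hypothetical: one local source address per ISP, each matched by a
# filter-based forwarding rule on the router.
SOURCE_IP_BY_ISP = {"isp-a": "10.1.1.10", "isp-b": "10.1.1.11", "isp-c": "10.1.1.12"}

# Source-address options (FreeBSD ping/traceroute, mtr); verify locally.
SOURCE_FLAG = {"ping": "-S", "traceroute": "-s", "mtr": "-a"}


def build_test_command(tool, isp, target):
    if tool not in SOURCE_FLAG:
        raise ValueError(f"unsupported tool: {tool}")
    # Reject anything that is not a bare IP address, so user input from the
    # Web form cannot smuggle in extra arguments.
    ipaddress.ip_address(target)
    argv = [tool, SOURCE_FLAG[tool], SOURCE_IP_BY_ISP[isp]]
    if tool == "ping":
        argv += ["-c", "5"]  # send a fixed number of probes, then stop
    argv.append(target)
    return argv
```

The resulting list can be run with `subprocess.run(argv, capture_output=True, text=True)`, which avoids a shell entirely. For other applications the same idea applies: Python's `socket.create_connection` accepts a `source_address` argument, so an HTTP test could bind to the ISP-specific address in the same way.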

Normally, all of the server's traffic to external networks goes to a single static default gateway. If that default gateway is the only router connected to external ISPs, the Alpine server requires no additional configuration. If there are multiple egress routers, the Alpine server additionally needs IPFW (FreeBSD) or IP chains (Linux) rules. Rather than sending every packet to the default gateway, these rules forward each test packet, based on its source IP, to the egress router connected to the requested ISP.
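
As an illustration of the multi-router case, here is a Python sketch that generates FreeBSD IPFW rules using the `fwd` action, which sends matching packets to a chosen next hop. The addresses and rule numbers are hypothetical, and the article does not show Alpine's actual rules. On older FreeBSD releases, the kernel must also be built with support for the `fwd` action.

```python
# Hypothetical mapping: Alpine source address -> egress router facing the ISP
# that this source address is meant to test.
EGRESS_BY_SOURCE = {
    "10.1.1.11": "10.1.1.2",  # router facing ISP B
    "10.1.1.12": "10.1.1.3",  # router facing ISP C
}


def ipfw_fwd_rules(egress_by_source, first_rule=1000, step=10):
    """Return one 'ipfw add ... fwd' command per source address."""
    rules = []
    for offset, (src, router) in enumerate(sorted(egress_by_source.items())):
        number = first_rule + offset * step
        # Packets sourced from `src` go to `router` instead of the default gateway.
        rules.append(f"ipfw add {number} fwd {router} ip from {src} to any")
    return rules


if __name__ == "__main__":
    print("\n".join(ipfw_fwd_rules(EGRESS_BY_SOURCE)))
```

Source addresses with no rule (including the server's normal address) keep using the default gateway.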

Understanding BGP and Alpine's Output

Every BGP network is identified by an Autonomous System (AS) number. As networks throughout the Internet learn and exchange routing information, each BGP network prepends its own AS number to a prefix's AS path before passing the route along. The AS path is essentially a map of the networks that traffic to this prefix will traverse. Because the AS path appears as a cryptic string of numbers, Alpine displays next to each AS number either a user-defined description or a link to look up the AS number. For example, the AS path to my Alpine server might look like [701 2914 22208]: traffic to the Alpine server would pass through UUNet (AS 701) to Verio (AS 2914) and then to Peak Web Hosting (AS 22208).
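
The following Python sketch retraces the example path as the route propagates outward from its origin, then labels each AS the way Alpine's display does. The description table is a hypothetical stand-in for Alpine's user-defined descriptions, and unknown ASes are simply marked "unknown" rather than linked to a lookup service.

```python
# Hypothetical user-defined descriptions.
AS_DESCRIPTIONS = {701: "UUNet", 2914: "Verio", 22208: "Peak Web Hosting"}


def propagate(origin_as, transit_ases):
    """Build an AS path as a route is passed outward from its origin."""
    path = [origin_as]
    for asn in transit_ases:
        path.insert(0, asn)  # each AS prepends its own number
    return path


def describe(path):
    return " -> ".join(f"{asn} ({AS_DESCRIPTIONS.get(asn, 'unknown')})" for asn in path)


if __name__ == "__main__":
    # Peak originates the route; Verio hears it first, then UUNet.
    print(describe(propagate(22208, [2914, 701])))
```

Reading the finished path left to right gives the direction traffic flows toward the destination, which is the reverse of the order in which the route was advertised.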

Unfortunately, BGP has no knowledge of round-trip latency or packet loss. When determining the "best" path to a prefix, BGP often relies on the misleading metric of AS path length. Because AS path length is the most common deciding factor in BGP route selection, many networks influence others' routing decisions by prepending their own AS number multiple times, a practice known as AS padding. A padded path may look like a mistake or a routing loop, but padding is commonly used to make certain paths appear less preferable.
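
The effect of padding on a length-based choice can be shown with a small Python sketch. All ASNs are hypothetical documentation numbers, and selection is again simplified to AS path length alone. The origin pads its announcement toward ISP X, so the path through ISP Y wins. Collapsing consecutive repeats shows that the padded path is not actually a loop.

```python
from itertools import groupby


def select_by_as_path_length(candidates):
    """Pick the (name, as_path) candidate with the shortest AS path; ties go to the first."""
    return min(candidates, key=lambda c: len(c[1]))


def collapse_padding(path):
    # Consecutive duplicates are prepends, not distinct networks.
    return [asn for asn, _ in groupby(path)]


via_x = ("isp-x", [64496, 64500])
via_y = ("isp-y", [64497, 64500])
# Origin AS 64500 prepends itself two extra times toward ISP X.
padded_x = ("isp-x", [64496, 64500, 64500, 64500])

if __name__ == "__main__":
    print(select_by_as_path_length([via_x, via_y])[0])
    print(select_by_as_path_length([padded_x, via_y])[0])
```
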

When reading Alpine's output, remember that the shortest AS path is not always the best-performing path. Crossing a single AS may take anywhere from 1 to 20 router hops, and only traceroute or mtr exposes the details hidden behind each AS hop and the relative performance of each candidate path. Choosing an AS path that contains well-known ISPs may not produce the fastest results either: these large networks face the same challenges and suffer the same outages as smaller or lesser-known ISPs.
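
To see how measurement and AS path length can disagree, consider the following Python sketch. It ranks candidates by probe results (packet loss first, then mean round-trip time) instead of by AS path length. The candidates and RTT samples are invented for illustration and are not Alpine output. `None` marks a lost probe.

```python
from statistics import mean

# Hypothetical candidates: AS path plus per-probe RTTs in ms (None = lost).
CANDIDATES = {
    "isp-a": {"as_path": [64496, 64500], "rtt_ms": [80, 82, None, 81]},
    "isp-b": {"as_path": [64497, 64498, 64499, 64500], "rtt_ms": [40, 42, 41, 43]},
}


def summarize(samples):
    """Return (loss_percent, mean_rtt_ms) for a list of probe results."""
    if not samples:
        raise ValueError("no probes")
    received = [s for s in samples if s is not None]
    loss = 100.0 * (len(samples) - len(received)) / len(samples)
    avg = mean(received) if received else float("inf")
    return loss, avg


def shortest_as_path(candidates):
    return min(candidates, key=lambda name: len(candidates[name]["as_path"]))


def rank_by_measurement(candidates):
    # Lower loss first; among equal loss, lower mean RTT first.
    return sorted(candidates, key=lambda name: summarize(candidates[name]["rtt_ms"]))
```

With this data, BGP-style selection favors `isp-a` (two AS hops), while the measurements favor the longer path through `isp-b`.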

Although Alpine is a great tool for discovering ISPs' interconnectivity and network route performance, no single view of the Internet produces complete and accurate results. Alpine troubleshoots only the path from the local network to the destination, and asymmetric routing often sends return traffic via a completely different ISP. Accurate network troubleshooting would require multiple geographically dispersed Alpine servers testing traffic to and from both the source and destination networks.

Alpine cannot reroute traffic or change router configurations after discovering a better path. Filter-based forwarding reroutes only Alpine's diagnostic traffic. Administrators must manually configure routers to update the network path selection for all traffic. Because shifting traffic may saturate a previously well-performing network, always re-test the new path's performance after updating router configurations.

Alpine versus Looking Glass

Looking glass sites are Web sites that let anyone test performance and connectivity from the operator's network. The typical looking glass server collects BGP route information from a single router, which may not know every available path to the target network. A looking glass's traceroute and ping follow only the preferred BGP route to the target, even when other routing options exist. Alpine, by contrast, collects paths from every externally connected router and can test each of them.

Conclusion

Alpine gives any user the ability to proactively discover and diagnose network problems. Alpine's simple GUI, flexible testing methods, and extensive diagnostic data empower users to research and present solutions, not problems, to their network engineers and systems administrators.

References

http://alpine.peakwebhosting.com -- Alpine server and source code repository

http://www.zebra.org -- BGP daemon for UNIX servers

http://www.freebsd.org -- IPFW

http://www.juniper.net -- Juniper Networks

http://www.cisco.com -- Cisco Systems

http://www.bitwizard.nl/mtr/ -- Matt's Traceroute, arguably the best traceroute implementation to date.

http://www.traceroute.org -- Excellent resource for traceroute tools and looking glass servers across the Internet.

Jeffrey Papen is a UCLA Computer Science Engineering graduate who configures Yahoo!'s BGP peering and policy routing. He also authored the NetFlow analysis tool TUNDRA and HappyDog, a bandwidth utilization and billing reconciliation package. When he is not evangelizing Juniper Networks or multi-homing Peak Web Hosting, his three Siberian Huskies, Alpine, Tundra, and Glacier, take him for walks along the California coast. He can be reached at: jeffrey@papen.com.