Cover V10, I01

Disruptions of Service: Types and Effects

Peter H. Salus

By its very nature, a packet-switching network is difficult to destroy, and the network of networks that is the Internet is harder still. However, a number of events have disrupted Internet service -- some large, some limited in extent. This article is an attempt at a typology and a spur to discussion.

We all know that in a packet-switching network, there are multiple routes between any two points. As a consequence, such networks are more robust when under attack than direct-line connections or circuit-switching networks (like the telephone system).
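To make the point about multiple routes concrete, here is a minimal sketch in Python. The four-node mesh and the node names are purely hypothetical, and the breadth-first search stands in for real routing protocols, which are far more elaborate. It shows only that when one link is cut, a path between the same two points can still be found.

```python
from collections import deque

def find_route(links, src, dst):
    """Return one shortest path from src to dst (by hop count), or None."""
    # Build an undirected adjacency map from the list of links.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    previous = {src: None}          # node -> node we reached it from
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk back from the destination to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical mesh: two independent routes between A and D.
links = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")]
route_before = find_route(links, "A", "D")

# Simulate a cut of the B-D link and look for a route again.
remaining = [link for link in links if link != ("B", "D")]
route_after = find_route(remaining, "A", "D")

print(route_before)
print(route_after)
```

A direct-line connection corresponds to a graph with a single link between the two points: cut it, and find_route returns None.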

But disruptive events have visible effects on the Internet. Over the years, many of these events have been featured in the press. But the press has never looked at the disruptions as "natural phenomena." Nor have important distinctions been drawn.

I talk about disruptive events because these things are quite different from virus attacks or denial-of-service attacks. On the one hand, virus attacks afflict individual PCs (sometimes hundreds of thousands of them), but have little visible effect on the Internet. On the other hand, denial-of-service attacks may put an individual site out of commission, but only influence the Net because such attacks add to the flow of bits in the Net as a whole.

Since 1990, Matrix.Net (aka Matrix Information and Directory Services) has been surveying traffic and mapping the Internet and other networks. We thus have a solid corpus of data to look at. Here are some instances.

The 1994 Northridge Earthquake

On 17 January 1994, at 4:31 AM PST (12:31 PM GMT), in the Northridge suburb 20 miles northwest of Los Angeles, a magnitude 6.7 earthquake struck. In 15 seconds it led to the deaths of 15 people and injuries to more than 9,000. This was a major disaster. What effect did it have on the Internet?

The Internet Weather Report (IWR, http://www.mids.org/weather/) was already running back then. It ran scans at 2AM and 6AM PST, thus taking before and after snapshots of the Internet. At that time, we noted:

Comparing the two series of scans, effects of the earthquake can be seen as far north as Lawrence Livermore Laboratories northeast of San Francisco (which shows higher latencies immediately after the earthquake) and as far south as San Diego (which shows more traffic several hours later). Ensenada in Baja California, Mexico, disappears just after the quake, but that is probably coincidence, since the host we are pinging in Ensenada sometimes doesn't respond.

Here are some Matrix.Net graphs of the IWR data from the Northridge Earthquake event. You can see the initial earthquake event both in packet loss and on the Internet at large. The effect is short-lived, however, and by the next scan, most of it is gone. Apparently the earthquake crashed quite a few computers briefly, but many of them were soon back up (see Figures 1-3).

This quake did tens to hundreds of billions of dollars of property damage, taking the San Fernando Valley off the Internet for more than a day. Most of the affected areas were back on line completely after a week. The Internet as a whole was not affected beyond the immediate geographical vicinity: there was no visible difference in latency or packet loss for the Internet overall.

The main reason that much of the local Internet was down was simple: computers don't run without electricity. The only long-term Internet infrastructure damage was one router that fell over. All the others came back up when power was restored.

But let's move up a few years.

Hurricane Floyd

Hurricane Floyd threatened the Caribbean islands and the eastern coast of the United States for the first two weeks of September 1999. It achieved Category 4 status before it made landfall. It missed Hispaniola and Cuba completely, passed over the Bahamas, missed Florida and Georgia, grazed South Carolina, and went ashore in North Carolina. By then it had tapered off to tropical storm classification, still generating high winds and much rain, but causing far less damage than anticipated.

Overall Internet Effects

The expected overall effects on the Internet were negligible. These graphs represent average Internet performance. The darker line represents the Internet as a whole. The lighter one represents the WWW, a subset of the Internet.

Looking at the Internet Average is informative. The Matrix.Net Internet Average is a high-level summary of performance data measured from hosts all around the world. It provides one baseline against which more specialized Internet performance data might be compared, serving a role similar to that of the Dow Jones Industrial Average in the financial world.
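As a rough illustration of what such a summary involves, the sketch below reduces a set of ping results to the three quantities discussed throughout this article: latency, packet loss, and reachability. It is not Matrix.Net's actual method. The function name summarize, the host names, the sample values, and the definition of a host as reachable if it answers at least one probe are all assumptions made for the example.

```python
from statistics import mean

def summarize(probes):
    """Reduce ping results to average latency, packet loss, and reachability.

    probes maps a host name to a list of round-trip times in milliseconds,
    with None standing for a probe that got no reply (an assumed format).
    """
    sent = sum(len(rtts) for rtts in probes.values())
    replies = [rtt for rtts in probes.values() for rtt in rtts if rtt is not None]
    # Assumption: a host counts as reachable if it answered at least once.
    reachable = sum(
        1 for rtts in probes.values() if any(rtt is not None for rtt in rtts)
    )
    return {
        "latency_ms": mean(replies) if replies else None,
        "packet_loss_pct": 100.0 * (sent - len(replies)) / sent if sent else None,
        "reachability_pct": 100.0 * reachable / len(probes) if probes else None,
    }

# Hypothetical scan of four hosts, four probes each.
sample = {
    "host-a.example": [150.0, 170.0, None, 160.0],
    "host-b.example": [None, None, None, None],
    "host-c.example": [100.0, 120.0, 110.0, None],
    "host-d.example": [200.0, 200.0, 200.0, 200.0],
}
summary = summarize(sample)
print(summary)
```

Note that a host that never answers lowers reachability and raises packet loss but contributes nothing to the latency figure, which is one reason the three curves can tell different stories about the same event.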

However, negligible isn't the same as undetectable. There was, in fact, a surge in both latency and packet loss on Monday 13 September 1999, starting around 8AM EDT and continuing the entire working day (see Figure 4).

The Internet "pinglist" shows a pointed latency spike about noon EDT. The peak of that spike is at 186.4 ms, which is about 16% higher than the usual 160 ms for a Monday afternoon. The WWW pinglist shows a smaller increase, to about 169 ms over the usual 158 ms of a Monday afternoon, or about 7%. There's a smaller spike on both pinglists about 5PM Monday.
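The percentages quoted above follow directly from the peak and usual latencies. A short check, using only the figures given in the text (the helper name percent_increase is ours):

```python
def percent_increase(peak_ms, usual_ms):
    """Relative increase of a latency peak over the usual value, in percent."""
    return 100.0 * (peak_ms - usual_ms) / usual_ms

# Figures quoted in the text for Monday 13 September 1999.
internet = percent_increase(186.4, 160.0)   # Internet pinglist
www = percent_increase(169.0, 158.0)        # WWW pinglist

print(f"Internet pinglist: {internet:.1f}% above usual")
print(f"WWW pinglist: {www:.1f}% above usual")
```

The Internet pinglist works out to about 16.5% and the WWW pinglist to just under 7%, in line with the rounded values given above.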

What we don't see is much effect on Wednesday or Thursday, when Floyd actually made landfall. If anything, there may be a decrease in latency on Wednesday.

Looking more closely at the underlying data, and looking only at nodes in Florida as an example, we can see that there was a drop in reachability at noon EDT Monday, and another at 5PM. Reachability ratchets back up by the same amounts at 7AM and 9AM Wednesday. That looks as though a number of people turned off their machines as they were being evacuated, and turned them back on when they returned.

ISP Effects

One might have expected brief spikes in ISP performance either when Floyd passed over an area or later when batteries were drained. Most ISPs these days have pretty good backup power, so this would seem the most likely scenario. For big ISPs, the battery would only be on for about five minutes, and then emergency generators would take over. Then it's just a question of availability of fuel for the generators. Smaller ISPs might have bigger problems, as they have fewer resources for UPSs and generators.

Or, if damage were sufficiently severe and direct to actually take routers off the Net, we might see an outage of a day or more, with a sharp beginning and a gradual end as ISPs dug out and replaced broken equipment. This would be similar to the effects of the Northridge Earthquake, except probably more severe because the earthquake effect was caused more by power outages than by direct damage.

What we see instead is a lot of small latency and packet loss events on Monday, before Floyd hit anything, and negligible effects thereafter. The only big ISP that seems to have significant later effects was Global-One.

Friday 17 September 1999

This hurricane's bark was worse than its bite, where the Internet was concerned. Evacuation and fear of the hurricane had much more effect on the Internet on Monday than the hurricane itself. However, there was finally a brief but noticeable event on Friday (see Figures 5 and 6).

ISPs these days are quite hardened to problems with electricity and telephone service, and even flooding. Certain kinds of problems still affect the Internet, however, such as hysteria, seen here; cable cuts, which are likely the cause of the Friday event; and configuration problems inside the ISPs themselves, which were not observed in this particular week.

Earthquakes and hurricanes are genuinely natural events; but there are other types of physical disruption.

Fiber Cut of 29 September 1999

About noon EDT on 29 September 1999 there was a massive fiber cut in Ohio, which took more than four hours to fix. This infrastructure outage was noticeable across the entire Internet, as illustrated by the Matrix.Net Internet Average.

Not much damage is visible in latency, although interestingly enough most of what is visible is in the curve for Top Level Domain (TLD) Domain Name System (DNS) servers.

Packet loss clearly shows the event in all three curves (Figure 7). Reachability shows the most dramatic effects (Figure 8).

We also examined the top 30 ISPs one by one, and found that only a few of them (AboveNet, GTE Internet, and PSINet) were noticeably affected. AboveNet got it worst, and took a day to completely recover. All the other ISPs were essentially unaffected.

Outage of October 7, 1999

About 3AM EDT (7AM GMT) on October 7, 1999 there was a massive Internet outage, bigger than the one caused by the fiber cut of September 29, 1999. This infrastructure outage was noticeable for the entire Internet. Packet loss shows the outage clearly. And reachability shows it even more clearly (Figures 9-11).

Denial of Service Attacks

On Monday and Tuesday, February 7 and 8, 2000, a large number of major sites across the US were assaulted by "Denial of Service" (DoS) attacks. These attacks were the result of millions of messages flooding a particular host or gateway, overwhelming the resources and backing up traffic in a domino fashion (Figure 12).

In the days since, Attorney General Janet Reno has said that the FBI would track and punish the miscreants. The President has called another meeting. All the "usual suspects" were rounded up.

There have been security warnings, as well. In general, the newspapers and the TV reporters appear clueless on the differences between cracking a site and blocking a site.

"Since smurf attacks originate from a variety of sources at unpredictable times, an Internet-wide system analysis is necessary," said John Quarterman, CEO of Matrix.Net, in Austin, TX. He continued, "Just look at the 'Internet Average' for 8 February. It's completely clear that the entire Internet had higher packet loss and far lower reachability for several hours. It's like a shark took a bite out of the Net."

There have been other interesting events: the Victoria's Secret lingerie show slowed down AOL, but had no real effect on the Internet; the release of the Starr Report caused several spikes in latency, at the release time and at about 5PM EDT and thereafter. Presumably, folks wanted to see the report when it went up, and then read parts of it when working hours were over.

Even more recently, on September 25, 2000, the ISP Applied Theory showed a wild fluctuation in reachability. This was caused by a "router flap" at Sprint. Rather than the absolute drop shown in the cases of DoS attacks, or the massive outages shown by fiber cuts, the up-down seesaw here reflects the router trying to come up and succumbing again. When it was swapped out, everything returned to normal.
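The difference between a seesaw and a sustained drop is easy to express in code. The sketch below is one simple way to tell the two shapes apart in a series of reachability measurements. It is not the analysis we run: the function name, the 90% threshold, the crossing count, and the sample series are all illustrative assumptions.

```python
def classify_reachability(samples, threshold=90.0, flap_crossings=4):
    """Label a series of reachability percentages as normal, outage, or flap.

    threshold and flap_crossings are illustrative assumptions,
    not values taken from any measured data.
    """
    below = [s < threshold for s in samples]
    if not any(below):
        return "normal"
    # Count how often the series crosses the threshold in either direction.
    crossings = sum(1 for prev, cur in zip(below, below[1:]) if prev != cur)
    # Many crossings suggest a device repeatedly coming up and going down.
    if crossings >= flap_crossings:
        return "flap"
    # Otherwise treat it as a single sustained drop.
    return "outage"

# Hypothetical reachability series, one value per scan.
steady = [99, 98, 99, 99, 98, 99]
sustained_drop = [99, 98, 60, 55, 58, 97]
seesaw = [99, 60, 98, 55, 97, 58, 99]

for name, series in [("steady", steady),
                     ("sustained drop", sustained_drop),
                     ("seesaw", seesaw)]:
    print(name, "->", classify_reachability(series))
```

A single drop and recovery crosses the threshold twice. A flapping router crosses it again and again, and that repetition is what the crossing count picks up.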

The important thing is to recognize just how limited the effects of all of these events have been, as well as how different from each other natural disasters, fiber cuts, blown routers, and denial of service attacks are. We are currently datamining our records to ascertain exactly what the early warning signs are.

Knowledge is power: it may be that with early recognition, the ISPs will be able to route around disruptions and thus ameliorate the consequences.

Peter H. Salus, the author of A Quarter Century of UNIX and Casting the Net, is Chief Knowledge Officer at Matrix.Net. He can be reached at: peter@matrix.net.