Disruptions of Service: Types and Effects
Peter H. Salus
Because of its very nature, it is difficult to destroy a packet-switching
network, much less the network of networks that is the Internet.
However, a number of events, some large and some limited in extent, have
disrupted Internet service. This article is an attempt at
a typology and a spur to discussion.
We all know that in a packet-switching network, there are multiple
routes between any two points. As a consequence, such networks are
more robust when under attack than direct-line connections or circuit-switching
networks (like the telephone system).
Disruptive events do, however, have visible effects on the Internet,
and over the years many of them have been featured in the press. But
the press has never looked at these disruptions as "natural phenomena,"
nor has it drawn important distinctions among them.
I speak of disruptive events because these things are quite
different from virus attacks or denial-of-service attacks. On the
one hand, virus attacks afflict individual PCs (sometimes hundreds
of thousands of them), but have little visible effect on the Internet.
On the other hand, denial-of-service attacks may put an individual
site out of commission, but influence the Net only insofar as such
attacks add to the flow of bits in the Net as a whole.
Since 1990, Matrix.Net (aka Matrix Information and Directory Services)
has been surveying traffic and mapping the Internet and other networks.
We thus have a solid corpus of data to look at. Here are some instances.
The 1994 Northridge Earthquake
On 17 January 1994, at 4:31 AM PST (12:31 PM GMT), a magnitude 6.7
earthquake struck Northridge, a suburb 20 miles northwest of Los Angeles.
In 15 seconds it led to the deaths of 15 people and injuries
to more than 9,000. This was a major disaster. What effect did it
have on the Internet?
The Internet Weather Report (IWR, http://www.mids.org/weather/)
was already running back then. It ran scans at 2AM and 6AM PST,
thus taking before and after snapshots of the Internet. At that
time, we noted:
Comparing the two series of scans, effects of the earthquake can
be seen as far north as Lawrence Livermore Laboratories northeast
of San Francisco (which shows higher latencies immediately after
the earthquake) and as far south as San Diego (which shows more
traffic several hours later). Ensenada in Baja California, Mexico,
disappears just after the quake, but that is probably coincidence,
since the host we are pinging in Ensenada sometimes doesn't
respond.
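The before-and-after comparison described in that note can be sketched as follows. This is a minimal illustration, not the IWR's actual code: it assumes each scan is a hypothetical mapping from host name to round-trip time in milliseconds (None when the host did not answer), and the 25% slowdown threshold and host names are arbitrary choices for the example. As with Ensenada, a host that "vanishes" may simply be one that sometimes fails to respond.

```python
from typing import Optional

# Hypothetical scan format: host name -> round-trip time in ms, or None if no reply.
Scan = dict[str, Optional[float]]


def compare_scans(before: Scan, after: Scan, ratio: float = 1.25) -> dict[str, list[str]]:
    """Sort hosts into slower / vanished / recovered between two scans."""
    result: dict[str, list[str]] = {"slower": [], "vanished": [], "recovered": []}
    for host in sorted(before.keys() | after.keys()):
        b, a = before.get(host), after.get(host)
        if b is not None and a is None:
            # Stopped answering: a real outage, or just an unreliable host.
            result["vanished"].append(host)
        elif b is None and a is not None:
            result["recovered"].append(host)
        elif b is not None and a is not None and a > b * ratio:
            # Still answering, but latency rose by more than the chosen ratio.
            result["slower"].append(host)
    return result


# Invented sample data, loosely echoing the hosts named above.
before = {"livermore": 40.0, "san-diego": 30.0, "ensenada": 55.0}
after = {"livermore": 70.0, "san-diego": 31.0, "ensenada": None}
print(compare_scans(before, after))
```

A real comparison would also have to decide how many missed replies count as "gone," which is exactly the judgment the Ensenada caveat is about.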
Here are some Matrix.Net graphs of the IWR data from the Northridge
Earthquake event. The initial earthquake event is visible both in
packet loss and on the Internet at large. The effect is short-lived,
however, and by the next scan, most of it is gone. Apparently the
earthquake crashed quite a few computers briefly, but many of them
were soon back up (see Figures 1-3).
This quake did tens to hundreds of billions of dollars of property
damage, taking the San Fernando Valley off the Internet for more
than a day. Most of the affected areas were back on line completely
after a week. The Internet as a whole was not affected beyond the
immediate geographical vicinity: there was no visible difference
in latency or packet loss for the Internet overall.
The main reason that much of the local Internet was down was simple:
computers don't run without electricity. The only long-term
Internet infrastructure damage was one router that fell over. All
the others came back up when power was restored.
Now let's move forward a few years.
Hurricane Floyd
Hurricane Floyd threatened the Caribbean islands and the eastern
coast of the United States for the first two weeks of September
1999. It achieved Category 4 status before it made landfall. It
missed Hispaniola and Cuba completely, passed over the Bahamas,
missed Florida and Georgia, grazed South Carolina, and went ashore
in North Carolina. By then it had tapered off to tropical storm
classification, still generating high winds and much rain, but causing
far less damage than anticipated.
Overall Internet Effects
As expected, the overall effects on the Internet were negligible.
These graphs represent average Internet performance. The darker
line represents the Internet as a whole. The lighter one represents
the WWW, a subset of the Internet.
Looking at the Internet Average is informative. The Matrix.Net
Internet Average is a high-level summary of performance data measured
from hosts all around the world. It provides one baseline against
which more specialized Internet performance data might be compared,
serving a role similar to that of the Dow Jones Industrial Average
in the financial world.
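To make the idea of a single summary figure concrete, here is a minimal sketch of how per-host ping samples could be boiled down to an average. Matrix.Net's actual method is not described here, so this is an assumption throughout: latency is taken as an unweighted mean of all replies, packet loss as the share of unanswered pings, and reachability as the share of hosts answering at least once. The function name internet_average and the data layout are hypothetical.

```python
from statistics import mean
from typing import Optional


def internet_average(pings: dict[str, list[Optional[float]]]) -> dict[str, float]:
    """Summarize per-host ping samples (ms; None = no reply) into three figures.

    Hypothetical method; assumes at least one host with at least one sample.
    """
    samples = [s for host_samples in pings.values() for s in host_samples]
    replies = [s for s in samples if s is not None]
    reachable = [h for h, host_samples in pings.items()
                 if any(s is not None for s in host_samples)]
    return {
        # Unweighted mean over every reply received.
        "latency_ms": mean(replies) if replies else float("nan"),
        # Share of pings that got no answer.
        "packet_loss_pct": 100.0 * (len(samples) - len(replies)) / len(samples),
        # Share of hosts that answered at least once.
        "reachability_pct": 100.0 * len(reachable) / len(pings),
    }


# Invented samples for three hypothetical hosts.
summary = internet_average({
    "host-a": [100.0, 120.0],
    "host-b": [None, None],
    "host-c": [200.0, None],
})
print(summary)
```

Like a stock index, such a figure hides which hosts moved; its value is as a baseline against which a single region or ISP can be compared.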
However, negligible isn't the same as undetectable. There
was, in fact, a surge in both latency and packet loss on Monday
13 September 1999, starting around 8AM EDT and continuing the entire
working day (see Figure 4).
The Internet "pinglist" shows a pointed latency spike
about noon EDT. The peak of that spike is at 186.4 ms, which is
about 16% higher than the usual 160 ms for a Monday afternoon. The
WWW pinglist shows a smaller increase, to about 169 ms over the
usual 158 ms of a Monday afternoon, or about 7%. There's a
smaller spike on both pinglists about 5PM Monday.
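The percentages above are simple relative increases over the usual Monday-afternoon baseline. A quick check using the figures quoted in the text (the helper name pct_increase is ours, not Matrix.Net's):

```python
def pct_increase(peak_ms: float, usual_ms: float) -> float:
    """Percentage by which a latency peak exceeds its usual baseline."""
    return 100.0 * (peak_ms - usual_ms) / usual_ms


# Figures quoted for Monday afternoon, 13 September 1999.
print(f"Internet pinglist: {pct_increase(186.4, 160):.1f}%")  # 16.5%
print(f"WWW pinglist: {pct_increase(169, 158):.1f}%")         # 7.0%
```

Both results agree with the approximate "16%" and "7%" given above.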
What we don't see is much effect on Wednesday or Thursday,
when Floyd actually made landfall. If anything, there may be a decrease
in latency on Wednesday.
Looking more closely at the underlying data, and looking only
at nodes in Florida as an example, we can see that there was a drop
in reachability at noon EDT Monday, and another at 5PM. Reachability
ratchets back up by the same amounts at 7AM and 9AM Wednesday. That
looks as though a number of people turned off their machines as
they were being evacuated, and turned them back on when they returned.
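The reasoning here, matching drops in reachability to later recoveries of the same size, can be sketched as a simple step detector. This assumes a hypothetical time series of reachable-host counts for one region; the counts and the min_step threshold below are invented for illustration and are not Matrix.Net data.

```python
def step_changes(series: list[tuple[str, int]], min_step: int) -> list[tuple[str, int]]:
    """Return (time, delta) wherever the reachable-host count jumps by at least min_step."""
    changes = []
    for (_, prev), (t, cur) in zip(series, series[1:]):
        delta = cur - prev
        if abs(delta) >= min_step:
            changes.append((t, delta))
    return changes


# Invented counts of reachable hosts in one region; the -1 at Wed 6AM is noise.
florida = [("Mon 11AM", 120), ("Mon 12PM", 110), ("Mon 5PM", 102),
           ("Wed 6AM", 101), ("Wed 7AM", 109), ("Wed 9AM", 119)]
for when, delta in step_changes(florida, min_step=5):
    print(when, delta)
```

Drops and rises of matching size, as described above, suggest machines being switched off and on deliberately rather than failing at random.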
ISP Effects
One might have expected brief spikes in ISP performance either when
Floyd passed over an area or later, when backup batteries were drained.
Most ISPs these days have pretty good backup power, so this would seem
the most likely scenario. For big ISPs, the batteries would only be on
for about five minutes before emergency generators took over; then it
is just a question of fuel availability for the generators. Smaller
ISPs might have bigger problems, as they have fewer resources for UPSs
(uninterruptible power supplies) and generators.
Alternatively, if damage were severe and direct enough to take
routers off the Net, we might see an outage of a day or more,
with a sharp beginning and a gradual end as ISPs dug out and replaced
broken equipment. This would be similar to the effects of the Northridge
Earthquake, except probably more severe because the earthquake effect
was caused more by power outages than by direct damage.
What we see instead is a lot of small latency and packet loss
events on Monday, before Floyd hit anything, and negligible effects
thereafter. The only big ISP that seems to have significant later
effects was Global-One.
Friday 17 September 1999
This hurricane's bark was worse than its bite, where the
Internet was concerned. Evacuation and fear of the hurricane had
much more effect on the Internet on Monday than the hurricane itself.
However, there was finally a brief but noticeable event on Friday
(see Figures 5 and 6).
ISPs these days are quite hardened to problems with electricity
and telephone service, and even flooding. Certain kinds of problems
still affect the Internet, however, such as hysteria, seen here;
cable cuts, which are likely the cause of the Friday event; and
configuration problems inside the ISPs themselves, which were not
observed in this particular week.
Earthquakes and hurricanes are genuinely natural events, but there
are other types of physical disruption.
Fiber Cut of 29 September 1999
About noon EDT on 29 September 1999 there was a massive fiber
cut in Ohio, which took more than four hours to fix. This infrastructure
outage was noticeable across the entire Internet, as illustrated
by the Matrix.Net Internet Average.
Not much damage is visible in latency, although interestingly
enough most of what is visible is in the curve for Top Level Domain
(TLD) Domain Name System (DNS) servers.
Packet loss clearly shows the event in all three curves (Figure
7). Reachability shows the most dramatic effects (Figure 8).
We also examined the top 30 ISPs one by one, and found that only
a few of them (AboveNet, GTE Internet, and PSINet) were noticeably
affected. AboveNet got it worst, and took a day to completely recover.
All the other ISPs were essentially unaffected.
Outage of October 7, 1999
About 3AM EDT (7AM GMT) on October 7, 1999 there was a massive
Internet outage, bigger than the one caused by the fiber cut of
September 29, 1999. This infrastructure outage was noticeable for
the entire Internet. Packet loss shows the outage clearly. And reachability
shows it even more clearly (Figures 9-11).
Denial-of-Service Attacks
On Monday and Tuesday, February 7 and 8, 2000, a large number
of major sites across the US were assaulted by "Denial of Service"
(DoS) attacks. These attacks were the result of millions of messages
flooding a particular host or gateway, overwhelming the resources
and backing up traffic in a domino fashion (Figure 12).
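The mechanism described above, more traffic arriving than a host or gateway can handle, can be illustrated with a textbook drop-tail queue. This is a deliberately simplified model, not a description of the February 2000 attacks: the service rate, buffer size, and traffic volumes are hypothetical. It shows two effects: packets are dropped (packet loss) while the flood lasts, and the backlog takes time to drain after it stops.

```python
def simulate_queue(arrivals: list[int], service_rate: int, buffer_size: int) -> tuple[list[int], int]:
    """Drop-tail queue model: returns the backlog after each tick and total drops."""
    backlog, dropped, history = 0, 0, []
    for arriving in arrivals:
        # Only as many packets as fit in the remaining buffer are accepted.
        accepted = min(arriving, buffer_size - backlog)
        dropped += arriving - accepted
        backlog += accepted
        # The host or gateway forwards up to service_rate packets per tick.
        backlog -= min(backlog, service_rate)
        history.append(backlog)
    return history, dropped


# Normal load of 80 packets per tick; a flood adds 500 per tick for three ticks.
arrivals = [80, 80, 580, 580, 580, 80, 80]
history, dropped = simulate_queue(arrivals, service_rate=100, buffer_size=200)
print(history, dropped)
```

In this toy run the queue is empty before the flood, but still holds a backlog in the two ticks after it ends, a small-scale version of traffic "backing up."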
In the days since, Attorney General Janet Reno has said that the
FBI would track and punish the miscreants. The President has called
another meeting. All the "usual suspects"
were rounded up.
There have been security warnings as well. In general, the newspapers
and TV reporters appear clueless about the difference between
cracking a site and blocking a site.
"Since smurf attacks originate from a variety of sources
at unpredictable times, an Internet-wide system analysis is necessary,"
said John Quarterman, CEO of Matrix.Net, in Austin, TX. He continued,
"Just look at the 'Internet Average' for 8
February. It's completely clear that the entire Internet had
higher packet loss and far lower reachability for several hours.
It's like a shark took a bite out of the Net."
There have been other interesting events: the Victoria's
Secret lingerie show slowed down AOL, but had no real effect on
the Internet; the release of the Starr Report caused several spikes
in latency, at the release time and at about 5PM EDT and thereafter.
Presumably, folks wanted to see the report when it went up, and
then read parts of it when working hours were over.
Even more recently, on September 25, 2000, the ISP Applied Theory
showed a wild fluctuation in reachability. This was caused by
a "router flap" at Sprint. Rather than the absolute drop
shown in the cases of DoS attacks, or the massive outages shown
by fiber cuts, the up-down seesaw here reflects the router trying
to come up and succumbing again. When it was swapped out, everything
returned to normal.
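The difference between a flap and an outage shows up in the shape of the reachability signal: an outage goes down and (eventually) comes back up once, while a flapping router oscillates. A minimal sketch, assuming a hypothetical series of up/down samples for one route and an arbitrary threshold of four transitions, might distinguish the two like this:

```python
def count_transitions(states: list[bool]) -> int:
    """Number of up<->down changes in a series of reachability samples."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


def classify(states: list[bool], flap_threshold: int = 4) -> str:
    """Label a route's behavior; flap_threshold is an arbitrary, hypothetical cutoff."""
    transitions = count_transitions(states)
    if transitions == 0:
        return "steady"
    if transitions >= flap_threshold:
        return "flapping"
    return "outage"


print(classify([True, True, False, False, False, True, True]))  # outage
print(classify([True, False, True, False, True, False, True]))  # flapping
```

Counting transitions is only one possible feature; duration and depth of each drop would also help separate fiber cuts from DoS attacks.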
The important thing is to recognize just how limited the effects
of all of these events have been, as well as how different from
each other natural disasters, fiber cuts, blown routers, and
denial-of-service attacks are. We are currently data-mining our records
to ascertain exactly what the early warning signs are.
Knowledge is power: it may be that with early recognition, the
ISPs will be able to route around disruptions and thus ameliorate
the consequences.
Peter H. Salus, the author of A Quarter Century of UNIX
and Casting the Net, is Chief Knowledge Officer at Matrix.Net.
He can be reached at: peter@matrix.net.