More Nodes or a Higher-Performance Interconnect?
If you want to boost the performance of your cluster, you typically
have two choices: buy more nodes or buy a higher-performance interconnect.
However, interconnect cost often does not scale linearly with cluster size.
Smaller systems can be built with tightly integrated, unmanaged switches
that can cost less than $100 per port. However, these inexpensive
switches are not suitable for building larger networks. If the cluster
size dictates managed Layer 3 switches, the cost jumps to $500-$1,000
per port. Switches often provide 2^n or 2^m + 2^n ports. If the switch
enclosure is fully equipped, expanding the switch carries more cost
per port. For example, if a single switch enclosure supports 64
ports, you will need six fully populated enclosures to support 128
ports while maintaining full bisection bandwidth -- a three-fold
price premium. In other words, the last 64 ports will cost five
times as much as the first 64. (See Figure 3.)
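To see where the six-enclosure figure comes from, here is a minimal back-of-the-envelope sketch in Python. It assumes a two-level fat-tree built from 64-port enclosures, with each leaf switch splitting its ports evenly between hosts and uplinks; the port counts come from the example above, and the function name is purely illustrative.

import math

def fat_tree_enclosures(host_ports, switch_ports):
    """Enclosures needed for a two-level fat-tree with full bisection bandwidth.

    Assumes each leaf switch uses half its ports for hosts and half for
    uplinks to the spine, and the spine must terminate every uplink.
    """
    if host_ports <= switch_ports:
        return 1                        # a single enclosure is enough
    down_per_leaf = switch_ports // 2   # host-facing ports per leaf switch
    leaves = math.ceil(host_ports / down_per_leaf)
    uplinks = leaves * (switch_ports - down_per_leaf)
    spines = math.ceil(uplinks / switch_ports)
    return leaves + spines

# 128 host ports from 64-port enclosures: 4 leaves + 2 spines = 6 enclosures,
# i.e. three times the per-port cost of a single 64-port switch.
print(fat_tree_enclosures(128, 64))   # -> 6
print(fat_tree_enclosures(64, 64))    # -> 1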
Applications can also be very sensitive to latency, part of which is
driven by the processing power available on the cluster, since sending
and receiving messages consumes CPU cycles. This
is particularly true when the system is scaled. What is the overhead
for the application to send or receive messages, and how long are
packets in transit? How do these numbers change as the cluster is
scaled up in size? It is this communication overhead that will ultimately
constrain the scalability of any application. At some point, it
is not cost-effective to add more hardware to support the application.
However, larger clusters do offer two benefits. First, systems
administration cost can be reduced by running fewer but larger
clusters. Second, if the primary use of the cluster is throughput-oriented,
several instances of the application can run on different
parts of the cluster at the same time.
That said, different applications have different requirements
for the interconnect; they can be sensitive to latency, bandwidth,
or both. How you scale the application is therefore very important.
Scaling the system to reduce the turnaround time often changes the
application's characteristics. With few nodes, the application
will send a few large messages infrequently, whereas when you scale
it to many nodes (hundreds or thousands), the application will typically
send short messages at tighter intervals. Thus, frequent exchange
of small messages puts a larger burden on the interconnect than
infrequent exchange of large messages does.
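A simple latency-plus-bandwidth (alpha-beta) cost model makes the difference concrete. The sketch below uses hypothetical figures, roughly a Gigabit Ethernet-class network versus a low-latency fabric, rather than measured values:

def transfer_time(bytes_, latency_s, bandwidth_Bps):
    """Alpha-beta model: time = latency + message size / bandwidth."""
    return latency_s + bytes_ / bandwidth_Bps

# Hypothetical interconnects (illustrative numbers, not vendor data):
gige   = dict(latency_s=50e-6, bandwidth_Bps=125e6)   # legacy GigE-class
lowlat = dict(latency_s=2e-6,  bandwidth_Bps=1.25e9)  # low-latency fabric

for size in (1_000_000, 1_000):   # one large vs. one small message
    t1 = transfer_time(size, **gige)
    t2 = transfer_time(size, **lowlat)
    print(f"{size:>9} B: GigE {t1*1e6:8.1f} us, low-latency {t2*1e6:8.1f} us")

# For the 1 MB message the two fabrics differ by roughly the bandwidth ratio;
# for the 1 KB message the transfer is dominated by latency, so the
# low-latency fabric is roughly 20x faster even though the payload is tiny.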
This restriction on the scalability of the application is captured
by Amdahl's law -- for systems administrators, Amdahl's
law implies that the behavior of an application on a small cluster
cannot simply be extrapolated to a larger cluster. An application
that appears to run well using a legacy interconnect on a handful
of nodes may well require an exotic, low-latency interconnect
to take advantage of a larger cluster.
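As a reminder, Amdahl's law bounds the speedup on N nodes by the fraction of the runtime that actually parallelizes. The fractions in the following sketch are invented purely to illustrate the point:

def amdahl_speedup(parallel_fraction, nodes):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes)

# With 95% of the runtime parallelizable (an assumed figure), 8 nodes already
# deliver a 5.9x speedup, but 1024 nodes top out below 20x: the serial 5%
# caps the benefit of adding hardware, before communication overhead is
# even counted.
for n in (8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 1))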
You can also scale the system to solve a larger problem. Findings
from a small cluster can often be carried over to
a larger cluster, in particular for the computational phase. On
the other hand, an application's input and output phase might
be handled sequentially, in which case special consideration is needed to
avoid a bottleneck that in turn restricts the performance gain
offered by the larger cluster.
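A small weak-scaling sketch, with entirely made-up per-phase timings, shows how a sequential input/output phase erodes the gain when the problem grows with the cluster:

def weak_scaling_efficiency(compute_s_per_unit, io_s_per_unit, nodes):
    """Each node adds one problem 'unit'. Compute parallelizes perfectly;
    input/output is handled sequentially by a single node."""
    work_units = nodes
    parallel_time = work_units * compute_s_per_unit / nodes   # stays constant
    serial_io_time = work_units * io_s_per_unit               # grows with size
    ideal_time = compute_s_per_unit + io_s_per_unit           # one unit, one node
    return ideal_time / (parallel_time + serial_io_time)

# Assumed 100 s of compute and 1 s of I/O per problem unit:
for n in (1, 16, 128, 1024):
    print(n, f"{weak_scaling_efficiency(100.0, 1.0, n):.2f}")
# Efficiency drops from 1.00 on one node to roughly 0.09 on 1024 nodes,
# because the serial I/O phase ends up dominating the runtime.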