The
Simplicity of Sockets
Randal L. Schwartz
The Internet connects people to other people and to services.
These connections are made with applications talking to each other
on separate machines, over various media such as Ethernet, dialup,
and 802.11b wireless. These applications, in turn, are using a simple
but powerful tool called a "socket" to provide the datastream
between them. In this column, I'll look at sockets and how
to describe and use them.
A socket is a connection from a process to another process. The
two processes are typically on different machines, but they can
also be on the same machine.
The most common use of a socket is to connect a client
with a server, similar to placing a phone call to some phone
service. The client creates a socket (picks up the phone), then
connects it to the server (dials the phone, waits for an
answer).
Once the sockets are connected, communication is generally bi-directional,
using some agreed-upon protocol. You must have a protocol to know
who is talking at any given time, and what it means. For example,
the HyperText Transfer Protocol (HTTP) defines how to get a Web
page, including defining precisely each side's role in the
conversation. Simple Mail Transfer Protocol (SMTP) defines the transmission
of email, and so on. When the connection is finished, both sides
hang up.
To Perl, a socket is presented as a filehandle, and is read and
written using ordinary filehandle operations. Just as a phone might
be identified by its phone number, a socket is identified using
its Internet (IP) address and port number.
The address is generally unique to a particular machine (although
a machine might have multiple addresses). In the current IPv4 scheme,
an address has a dotted quad value, like 10.1.2.3. Each number
within the address represents a byte value, and must therefore range
between 0 and 255. Because addresses can be difficult to remember,
or might change from time to time, we generally also assign names
as aliases, such as www.stonehenge.com. (One of the functions
of the DNS system is to map between names and addresses, but let's
not get distracted.)
Associated with a particular address are a range of ports, numbered
from 1 to 65535, although usually only a small number of these ports
are ever in use at one time. Ports below 1000 are reserved for "system
processes". Ports above 1023 are available for "user processes".
(Yes, there's a gap, because people disagreed about the meaning
of "1K" in some of the TCP/IP stack implementations.)
Many port numbers have pre-defined agreed-upon meanings, such
as port 25 for email, port 119 for news, port 80 for Web, and so
on. These pre-defined port numbers are important for well-known
services because knowing the port number is essential to establishing
the connection. (A server port can be established using a transient
port number, but then the port number must be communicated by other
means.)
All connections have both an address and a port number at both
ends. (You're always calling from a phone to a phone.) Generally,
you let the operating system pick a temporary port number for the
client end of the connection.
The easiest way to create a simple client socket is with the IO::Socket::INET
module, part of the libnet distribution in the CPAN. A sample
socket creation looks like:
use IO::Socket::INET;
my $connection = IO::Socket::INET->new(@parameters)
or die "Cannot connect: $@";
The @parameters will be described later. Note that the error
comes back in $@, not $!. (Some have argued that this
is a mis-design, either on Perl's part or on the author's
part, but we're stuck with the inconsistency either way.)
The resulting value in $connection acts like a filehandle;
we can print to it to send data over the wire:
print $connection "Some value\n";
Or read from it to accept the next data to be received:
my $result = <$connection>;
By default, if data is not yet ready, the read will block the process,
just like reading from an I/O device (like a terminal) or a pipe.
For a typical connection, the data won't necessarily arrive in
the same chunks as it was sent; you could print that first string
as 11 separate characters, and yet the receiver will likely get just
one full line instead.
Let's look at a simple connection to a known server. Many
systems provide a "daytime" service. This service provides
the time of day in a format similar to the output of the date
command, and was originally used to help synchronize system clocks
on various machines on a network. (These days, nearly all clock
synchronization is performed with NTP, which permits much more detailed
timing information to be accumulated and processed.)
To connect to the daytime service, we need two things: the IP
address on which the service resides, and the port of the service.
The port number is easy -- by agreement, it's port 13 (although
we don't need to know that because Perl can figure it out).
The IP address is a bit trickier, because many modern UNIX systems
have turned off this service, as suggested by the security theory
of "don't run anything unnecessary".
When we connect to the daytime service, there's nothing to
send. The server just notices the connection, immediately sends
the time string to our client, and drops the connection. The code
to connect to our machine's daytime service looks like this:
use strict;
use IO::Socket::INET;
my $client = IO::Socket::INET->new('localhost:daytime')
or die "new: $@";
{
local $/;
$_ = <$client>;
}
if (defined $_) {
print
"response from ",
$client->peerhost,
" port ",
$client->peerport,
" was:\n$_";
}
If you are experimenting with this code, and can't make a connection
to localhost, try the http://www.time.gov machine's
port 13 instead. Don't abuse this address though -- lots
of people share this resource. The output of connecting to http://www.time.gov
looks something like this:
response from 132.163.4.203 port 13 was:
52765 03-05-06 19:48:17 50 0 0 666.7 UTC(NIST) *
The format of the response from a given daytime port may vary from
host to host. The value here is described at:
http://www.boulder.nist.gov/timefreq/service/its.htm
Notice the use of the peeraddr and peerport method calls
against the connection handle. The connection object's method
calls are described in perldoc IO::Socket::INET. Here, we're
asking for the IP address and port number of the other end of the
connection. We can get our own address similarly by calling $client->sockhost
and $client->sockport.
Yes, that was a pretty simple connection. Most of the hard work
of setting up the socket -- specifying the protocol and port
information, and making the connection -- was all done within
the IO::Socket::INET object methods. Also, for daytime,
we didn't have to say anything, just listen.
Let's look at a slightly more difficult application: using
HTTP to talk to a Web server. At its simplest, we fetch a Web page
by connecting to the Web server's port 80, sending it the right
string, and then waiting for a response. To get the top-level page
of a server, we send:
GET / HTTP/1.0
followed by two return/linefeed pairs. Let's get the top-level
page of the http://www.time.gov Web server:
use strict;
use IO::Socket::INET qw(CRLF);
my $client = IO::Socket::INET->new('www.time.gov:http')
or die "new: $@";
print $client "GET / HTTP/1.0", CRLF, CRLF;
print while <$client>;
This program starts the same way; note, however, the extra import
for the CRLF constant, which is just a return/linefeed pair.
(This is not necessarily the same as "\r\n" on some platforms,
hence the constant.) Once the socket connection is open, we push the
text of the HTTP request down the socket, and then start listening
for the response. As the lines are returned, they're printed
to STDOUT. When the remote server closes the connection, that
appears as an I/O error, causing the filehandle read to return undef,
and drops us out of the loop (and out of the program).
In a real application, I wouldn't write code like this, because
we're leaving a lot out, like the host header on the
request, parsing of the headers in the response, and dealing with
Web proxy servers if needed. But on top of code like this, others
have come along to implement the full protocol. In particular, the
HTTP protocol is well managed in the LWP library, so we have
to merely write:
use LWP::Simple;
getprint("http://www.time.gov");
And that's a lot easier to remember. I hope you can see that
there's not really a lot of mysterious stuff going on. We just
identify the host and port, connect to it, send data, and receive
data.
Although I've said a lot this month, there's a lot I
had to leave out for space. For example, with code very similar
to this, you can set up a server that can listen for other client
connections. You can also talk over UDP (datagram) protocol, rather
than TCP (connection) protocol. In fact, Perl's networking
ability gets very close to the flexibility provided by systems implementation
languages, such as C. But that's a story for another time.
Until then, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration,
security, technical writing, and training. He has coauthored the
"must-have" standards: Programming Perl, Learning
Perl, Learning Perl for Win32 Systems, and Effective
Perl Programming. He's also a frequent contributor to the
Perl newsgroups, and has moderated comp.lang.perl.announce since
its inception. Since 1985, Randal has owned and operated Stonehenge
Consulting Services, Inc.
|