3.1 - How do I speak { HTTP, POP3, SMTP, FTP, Telnet, NNTP, etc. } with Winsock?
Winsock proper does not provide a way for you to speak these protocols,
because it only deals with the layers underneath these application-level
protocols. However, there are many ways for you to get your program to
speak these protocols.
The easiest method is to use a third-party library. The Resources section lists several of
these.
If you only need to speak the HTTP, FTP or gopher protocols, you can
use the WinInet library exposed by Microsoft's Internet Explorer. Newer
versions of Microsoft's development tools include components that make
accessing WinInet simple.
Finally, you can always roll your own. You should
start by reading the specification for the protocol you
want to implement. Most of the Internet's protocols are
documented in RFCs. The Important RFCs page links to the
most commonly referenced application-level RFCs. The complexity of
these protocols varies widely, and the only way to gauge the difficulty
of implementing one is to read the relevant RFC(s). HTTP, for
example, is a pretty simple protocol, but the authors of its RFC managed
to fill 176 pages talking about it. Luckily, most RFCs aren't that
long-winded.
If you've read the RFC and still can't figure the protocol
out, try asking on Usenet. There are many newsgroups dedicated to
particular application protocols: most are in the comp.protocols.*
hierarchy. Failing that, you can ask in one of the general Winsock and TCP/IP mailing lists
and newsgroups.
3.2 - How can I encrypt my TCP stream with SSL/TLS?
At this time, only Windows NT 4.0 SP4+, Windows 2000 and Windows
CE have a generic built-in SSL mechanism. For other operating systems,
your options are WinInet (limited in various ways) or a
third-party library.
Windows NT 4.0 SP4+ and Windows 2000 offer SSL through their security
APIs. You can find sample code to show how these mechanisms work in the Win32 Platform
SDK. The SSL samples are underneath the Platform SDK directory in the
"Samples\WinBase\Security\SSL" subdirectory.
Windows CE has a different SSL mechanism. There is an
article in MSDN that describes how to use the
functionality. The article also goes into the WinInet method.
WinInet is a feature in Internet Explorer version 3 and higher that
lets you use some of Internet Explorer's networking functionality in your
own programs. Since IE3 is on Windows 95 OSR2 and newer, and Microsoft
is doing lots to make sure IE remains part of Windows (tic), this may
be a reasonable option for you. The main disadvantages of WinInet's
SSL feature are that it only works with HTTP, and WinInet is not very
flexible. Also, 128-bit IE is not available worldwide. MS Knowledge Base
article Q168151 shows how to use this feature.
3.3 - How do I get my IP address from within a Winsock program?
There are three methods, each with its own advantages and disadvantages:
- To get the local IP address if you already have a bound or
connected socket, call
getsockname() on the socket. Note
that if you've just bound that socket to an address (even
a generic address like INADDR_ANY), getsockname() will
return that address.
- To get your address without opening a socket first,
do a
gethostbyname() on the value gethostname()
returns. This will return a list of all the host's
interfaces, as shown in this
example. (See the example page for problems with the
method.)
- The third method only works on Winsock
2. The new
WSAIoctl() API supports the
SIO_GET_INTERFACE_LIST option, and one of the
bits of information returned is the addresses of each of the
network interfaces in the system. [C++
Example] (Again, see the example page for caveats.)
3.4 - What's the proper way to impose a packet scheme on a stream protocol like TCP?
The two most common methods are delimiters and length-prefixing.
An example of delimiters is separating packets with, say, a caret
(^). Naturally your delimiter must never occur in regular data, or you
must have some way of "escaping" delimiter characters.
An example of length-prefixing is prepending a two-byte integer
containing the packet length on every packet. See the FAQ article How to Use TCP Effectively for the proper way to
send integers over the network.
There are hybrid methods, too. The HTTP protocol, for example,
separates header lines with CRLF pairs (a kind of delimiting), but when
an HTTP reply contains a block of binary data, the server also sends the
Content-length header before it sends the data, which is a kind
of length-prefixing.
I favor simple length-prefixing, because as soon as you read the length
prefix, you know how many more bytes to expect. By contrast, delimiters
require that you blindly read until you find the end of the packet.
3.5 - How do I write my program to work through a firewall?
The SOCKS protocol allows a client program on a protected internal
network to ask the firewall to act as a relay between it and a host
outside the firewall. SOCKS version 4 is the "basic" protocol: most
programs don't need anything more complicated. SOCKS version 5 adds UDP
support, end-to-end encryption, and secure logins. Beware that even if
your program only needs SOCKS v4 features, some firewalls require that
you use a SOCKS5-compatible login to get through.
NEC's SOCKSifier (called SOCKSCap) allows almost any Winsock program
to run over a SOCKS firewall with no changes to the application. (There
are other vendors that offer SOCKSifiers, and a few non-Microsoft Winsock
stacks are SOCKSified.) The problem with SOCKSCap is that your users must
set up SOCKSCap by hand, and they must run your program through
the SOCKSifier. It's more user-friendly to implement the SOCKS protocol
within your program.
There is some BSD sockets client-side SOCKS code at the NEC site
that you can easily modify to work with Winsock. This is included as
part of the Unix SOCKS server package as a library that the included
utility programs (rtelnet, rping, etc.) all link to.
SOCKS inherently has trouble with some protocols because of
the way they operate. Examples of this are RealAudio and almost any
"multiuser" protocol, such as multiplayer online games and conferencing
programs. These programs require some kind of intelligent proxy at the
firewall that understands the protocol. Unfortunately, these types
of proxies are only available for well-established protocols like
RealAudio.
3.6 - What if the firewall does not support SOCKS?
Technical people often look at a firewall as a technical solution
to a technical problem. In fact, a firewall is a political construct.
Like a country's border, it exists to protect an organization's turf
from invaders. The permeability of a network firewall depends on the
paranoia level of the people who control the firewall: there's a lot to
be paranoid about in today's Internet.
Many firewalls are set up to block all outbound connections except
to a very few ports. For example, the firewall might be set up to
allow outbound connections on ports 80, 21 and 25 only (web, FTP and
outbound email). If your application uses a port that the firewall blocks,
you will have to persuade the network administrators to open up that
port on their firewall. Network admins are often very territorial
about their firewalls. Getting your way will require a certain amount
of political finesse.
The administrators at such a site have decided that they'd rather
open ports on a case-by-case basis than block ports as new security risks
get discovered. You have to convince the admins to open up your port. I
suggest that you prepare a written argument explaining what your program
does, why it does it, and an analysis of potential risks. Don't fudge
on the argument: be completely honest and open. Present the benefits of
the application separately; you're not trying to convince the admins to
use the program, just to show that the application is useful to users on
their network. Try to get some of those users to speak with the network
admins directly to help you press your case.
Why a "written" argument? Partly because it shows professionalism,
partly because you might not be able to speak with the admins directly,
and partly because it forces you to clarify and organize your thoughts.
If you just call the admin up on the phone and make up your arguments as
you speak with him, you may just come across as argumentative.
Even if you do end up speaking with the administrator directly, having
a written argument to refer to will help you to make your case.
Some people suggest a different tactic: change your program so that it
sends its packets out on a port that's open on even the most restrictive
firewalls. Because some firewalls examine the contents of packets to
make sure they're legitimate, some people even suggest encapsulating
your data in the protocol that usually runs on that port. For example,
you could send your data out on port 80, and put your program data in
the payload area of HTTP packets.
To extend the "politics" analogy, this is the "smuggling" tactic. Just
like smuggling in the real world, it's likely to get you into trouble. You
won't go to jail for it, but you might generate a lot of antagonism
toward your program; network admins have been known to band together
and ban programs that annoy them.
3.7 - I'm writing a server. What's a good network port to use?
If you're writing a server for an existing, popular Internet
protocol, it's already got a port number assigned to it. You can
find the most common of these numbers at the website for the Internet Assigned Numbers Authority
(IANA).
If you're writing a server for a new protocol, there are a few rules
and suggestions you should obey when choosing your server's port:
- Ports 1-1023 are off-limits to people inventing
new protocols. They are reserved by the IANA for new "standard"
protocols. Important protocols like POP3 and HTTP have low numbers
(110 and 80, respectively), but your new K-RAD game server shouldn't. Note
that id Software is going to Hell for using port 666 with their DOOM
network server. They cleaned up their act with Quake, though.
- Ports 1024 through 49151 are Registered Ports, which are a good
range to choose your ports from. Just beware that the entire world is
choosing from ports in this range, so it may make sense for you to register
your port, or at least check the current list of assigned ports. Be aware,
though, that no one is obligated to check that list before making up
their app's port number.
- Ports 49152 through 65535 are Dynamic Ports, meaning that operating
systems use ports in this range when choosing random ports. (The
FTP protocol, for example, uses random ports in the data transfer
phase.) This is a poor range to choose ports from, because there's a
fairly decent chance that your program and the OS will fight over a
given port eventually.
- Many OSes pick local ports for client programs from the 1024-5000
range. You would do well to pick server ports higher than 5000, but this
is not as rigid a rule as the previous ones.
- Within the "safe" 5000-49151 range, there are many numbers the IANA
shows as unregistered. Of these, you should avoid port numbers with
patterns to them, or a widely-recognized meaning. People tend to pick
these since they're easy to remember, but this increases the chances of
a collision. Ports 6969, 5150 and 22222 are bad choices, for example.
You should also give some thought to making your program's port
configurable, in case your program is run on a machine where another
server is already using that port. One way to do this is through Winsock's
getservbyname() function: if that function returns a port
number, use that, otherwise use the default port number. Then users
can change your program's port by editing the SERVICES file, located in
%SystemRoot%\System32\DRIVERS\ETC on Windows NT/2000 systems and in the
Windows directory on Win9x machines.
3.8 - What is UDP? What are its limitations?
The User Datagram Protocol is part of the
TCP/IP protocol suite; it is an alternative to TCP.
("TCP/IP" includes UDP, but it can also mean "TCP over IP", so in
some discussions you see the term "UDP/IP".) Winsock gives you a UDP
socket when you pass SOCK_DGRAM as the second argument to
socket().
UDP is an "unreliable" protocol: the stack does not
make any effort to handle lost, duplicated, or out-of-order packets. UDP
packets are checked for corruption, but a corrupt UDP packet is simply
dropped silently.
The stack will fragment a UDP datagram when it's larger than the
network's MTU. The remote peer's stack will reassemble
the complete datagram from the fragments before it delivers it to the
receiving application. If a fragment is missing or corrupted, the whole
datagram is thrown away. This makes large datagrams impractical: an 8K
UDP datagram will be broken into 6 fragments when sent over Ethernet,
for example. If any of those 6 fragments is lost or corrupted, the stack
throws away the entire 8K datagram.
Datagram loss can also occur within the stack at the sender or the
receiver, usually due to lack of buffer space. It is even possible for
two communicating programs running on the same machine to have data
loss if they use UDP. (This actually happens on Windows under high load
conditions, because it starts dropping datagrams when the stack buffers
get full.) This limits UDP's value as a local IPC mechanism.
If any of these types of loss occur, no notification will be sent to
the sender or receiver, even if the loss happens within the network
stack.
Duplicated datagrams are not dropped: they are delivered to the
receiver. It is up to the application to detect this problem, and it is
the program's choice what to do with the duplicate datagram.
UDP datagrams can be delivered in any order. Datagrams often get
reordered on the network when two datagrams get delivered via different
routes, and the second datagram's route happens to be quicker.
3.9 - What is UDP good for?
From the above discussion, UDP looks pretty
useless, right? Well, it does have a few advantages over reliable
protocols like TCP:
- UDP is a slimmer protocol: its protocol header is fixed
at 8 bytes, whereas TCP's is 20 bytes at minimum and can be
more.
- UDP has no congestion control and no data coalescing. This
eliminates the delays caused by the delayed
ACK and Nagle algorithms. (This is
also a disadvantage in many situations, of course.)
- There is less code in the UDP section of the stack than
the TCP section. This means that there is less latency between
a packet arriving at the network card and being delivered to
the application.
- Only UDP packets can be broadcast
or multicast.
This makes UDP good for applications where timeliness and control are
more important than reliability. Also, some applications are inherently
tolerant of UDP problems: data loss in a streaming video program just
means a frame or two is dropped.
Be careful not to let UDP's advantages blind you to its bad points: too many application writers have started
with UDP, and then later been forced to add reliability features. When
considering UDP, ask yourself whether it would be better to use TCP
from the start than to reinvent it, or whether a protocol like RTP
(RFC 1889), which adds sequence numbers and timestamps on top of UDP,
would solve the problem for you.
3.10 - How do I send a broadcast packet?
With the UDP protocol you can send a packet so that all workstations
on the network will see it. (TCP doesn't allow broadcasting.)
To send broadcast packets, you must first enable the
SO_BROADCAST option with the setsockopt()
function. Next you have to figure out the "directed broadcast" address,
which means "send this packet to all stations on this LAN". To construct
the directed broadcast address, use the following C code:
u_long host_addr = inet_addr("172.16.77.88"); // local IP addr
u_long net_mask = inet_addr("255.255.224.0"); // LAN netmask
u_long net_addr = host_addr & net_mask; // 172.16.64.0
u_long dir_bcast_addr = net_addr | (~net_mask); // 172.16.95.255
Potential Problems: Broadcasts can be useful at times,
but keep in mind that this creates a load on all the machines on the
network, even on machines that aren't listening for the packet. This
is because the part of the stack
that can reject the packet is several layers down. As a result,
most routers drop simple broadcast packets, and sometimes
even drop directed broadcasts to nearby networks. (A simple
broadcast is one sent to address 255.255.255.255.) The practical
upshot of this is that sometimes broadcasts won't work at all,
and even when they do work they cause unnecessary loads on the
network. To get around these problems, you may want to consider multicasting
instead.
3.11 - Is Winsock thread-safe?
The Winsock specification does not mandate that a Winsock
implementation be thread-safe, but it does allow an implementor
to create a thread-safe version of Winsock.
Bob Quinn says, on this subject:
- "WinSock, any implementation, is thread safe if the WinSock
implementation developer makes it so (it doesn't just happen)."
- "I don't know of any implementations from Microsoft (or any
other vendors) that are not thread safe."
- "If a WinSock application developer creates a multi-threaded
application that shares sockets among the threads, it is that
developer's responsibility to synchronize activities between
the threads."
By "synchronize activities", I believe Bob means that it may cause
problems if, for example, two threads repeatedly call send()
on the same socket. There is no guarantee in the Winsock specification
about how the data will be interleaved in this situation. Similarly, if
one thread calls closesocket() on a socket, it must somehow
signal other threads using that socket that the socket is now invalid.
Anecdotal evidence suggests that one thread calling send()
and another thread calling recv() on a single socket is safe,
but I have not tested this. Hard information, demonstration code and/or
more anecdotal evidence either way would be appreciated.
Instead of multiple threads accessing a single socket, you may
want to consider setting up a pair of network I/O queues. Then, give
one thread sole ownership of the socket: this thread sends data from
one I/O queue and enqueues received data on the other. Then other
threads can access the queues (with suitable synchronization).
Applications that use some kind of non-synchronous socket typically
have some I/O queue already. Of particular interest in this case is
overlapped I/O or I/O completion ports, because these I/O strategies
are also thread-friendly. You can tell Winsock about several OVERLAPPED
blocks, and Winsock will finish sending one before it moves on to the
next. This means you can keep a chain of these OVERLAPPED blocks, each
perhaps added to the chain by a different thread. Each thread can also
call WSASend() on the block they added, making your main
loop simpler.
3.12 - If two threads in an application call recv() on a socket, will they each get the same data?
No. Winsock does not duplicate data among threads.
Note that if you do call recv() at the same time
on a single socket from two different threads, havoc may result. See
the previous question for more info.
3.13 - Is there any way for two threads to be notified when something happens on a socket?
No. If two threads call WSAAsyncSelect() on
a single socket, only the thread that made the last call to
WSAAsyncSelect() will receive further notification
messages. Similarly, if two threads call WSAEventSelect()
on a socket, only the event object used in the last call will
be signaled when an event occurs on that socket. You also can't
call WSAAsyncSelect() on a socket in one thread and
WSAEventSelect() on that same socket in another thread,
because the calls are mutually exclusive for any single socket. Finally,
you cannot reliably call select() on a single socket from
two threads and get the same notifications in each, because one thread
could clear or cause an event, which would change the events that the
other thread sees.
3.14 - How do I detect if the modem is connected?
It is sometimes useful for a Winsock program to only do its thing
if the computer is already connected to the Internet. The Remote Access
Service (RAS) API gives an application access to the dial-up networking
subsystem. In particular, the RasEnumConnections() call
lets you easily get a list of the connected modems.
RAS/DUN is not installed on all systems. Therefore, it's safest not
to link your application with the rasapi32.dll import library. Instead,
use the LoadLibrary() call to check for rasapi32.dll. If
it succeeds, you can use GetProcAddress() to call the
RasEnumConnections() function. If, on the other hand, the
LoadLibrary() call fails, you know RAS isn't installed,
so there couldn't be a modem connection in the first place.
Warning: Before using
RasEnumConnections() to check for an Internet connection,
keep in mind that many computers are not connected to the Internet through
a modem. Often, they are connected to a LAN, and that LAN is somehow
gatewayed to the Internet. For
these situations, using RAS will fool your program into believing that
the user either has no Internet connection, or that it is never up.
3.15 - How can I get the local user name?
There are a few ways. The easiest is to use the Win32 call
GetUserName(). [C++ Example]
The other way is shown in the Microsoft Knowledge Base article
Q155698. It is much more complex, and it shows two completely
different methods, one for Windows 9x/Windows 3.1 and one for Windows
NT. Unless you need Windows 3.1 support or the LAN Manager domain name
(as opposed to the DNS domain name), I suggest you give this article
a miss.
3.16 - Windows 9x's Dial Up Networking keeps popping up an automatic dial window, even when it isn't necessary. Can I make it stop?
On some Windows 9x systems with more than one network interface,
Dialup Networking (DUN) sometimes pops up an automatic-dial window
even when it is obviously not required. An example of such a setup is a
machine on a LAN that also has a modem for connecting to the Internet.
The most common trigger for the DUN dial window is a Winsock program
calling the gethostbyname() function, which initiates a
DNS lookup. Even if the name is that of a LAN machine and there's a DNS
server on the LAN, DUN will still try to bring up the Internet link to
try that first.
If you try messing with the DNS configuration of a multihomed Win9x box, it's clear
that the network subsystem just isn't designed to support a local DNS
server in addition to a remote one. The best solution, then, is to just
use straight IP addresses, and write your programs to recognize an IP
address, so they don't have to call gethostbyname().
I've heard that DUN 1.3 and/or the Winsock 2 updates fix this
problem, but other reports say they don't help.
3.17 - I've heard that asynchronous sockets are unreliable. Is this true?
Asynchronous sockets are reliable if your program obeys the letter
of the Winsock specification.
Every so often, you hear stories about a program that loses asynch
notification messages. As far as I can tell, it's always due to a bug in
the complainer's program, due to misunderstanding Winsock's parsimonious
notification policy.
Consider the FD_WRITE notification. That only gets sent when a client's
connection is accepted by the remote peer, and from then on only when
output buffer space becomes available after Winsock gives you a
WSAEWOULDBLOCK error. To put
it another way, FD_WRITE only gets sent to say, "Before now, it was not
okay to write data on this socket; now it's okay." The conservative way
to handle this is to always try to send data when you have it, whether
you've received an FD_WRITE or not. You might get a WSAEWOULDBLOCK error,
but that's harmless and easy to handle. Your handler for FD_WRITE then
just tries to send everything queued up until it sends it all or gets
another WSAEWOULDBLOCK.
Win16 message queues are fixed-length and fairly short, so it is
at least possible to lose notifications in 16-bit programs. If Winsock
fails to send you a notification because the message queue is full, it is
supposed to keep trying, but empirical evidence suggests that this does
not always happen. Keep in mind that when we speak of "16-bit Winsock"
we're talking about stacks from a dozen different vendors, each with
many versions spanning many years.
For my own part, I've been using asynchronous sockets almost
exclusively for years now with no problems. Others who've been using
asynchronous notification for years longer than I have agree. If you
believe you're losing notifications, you have to ask yourself whether
it's more likely that we've overlooked a bug in the stack or that there's
a bug in your program.
3.18 - What is the Nagle algorithm?
The Nagle algorithm is an optimization to TCP that makes the stack
wait until all data is acknowledged on the connection before it sends
more data. The exception is that Nagle will not cause the stack to wait
for an ACK if it has enough enqueued data that it can fill a network
frame. (Without this exception, the Nagle algorithm
would effectively disable TCP's sliding window
algorithm.) For a full description of the Nagle algorithm, see RFC 896.
So, you ask, what's the purpose of the Nagle algorithm?
The ideal case in networking is that each program always sends a
full frame of data with each call to send(). That
maximizes the percentage of useful program data in a packet.
The basic TCP and IPv4 headers are 20 bytes each. The
worst case protocol overhead percentage, therefore, is 40/41, or
about 98%. Since the maximum amount of data in an Ethernet frame is 1500 bytes,
the best case protocol overhead percentage is 40/1500, less than 3%.
While the Nagle algorithm is causing the stack to wait for data to
be ACKed by the remote peer, the local program can make more calls to
send() . Because TCP is a stream protocol,
it can coalesce the data in those send() calls into a single TCP
packet, increasing the percentage of useful data.
Imagine a simple Telnet program: the bulk of a Telnet conversation
consists of sending one character, and receiving an echo of that character
back from the remote host. Without the Nagle algorithm, this results
in TCP's worst case: one byte of user data wrapped in dozens of bytes
of protocol overhead. With the Nagle algorithm enabled, the TCP stack
won't send that one Telnet character out until the previous characters
have all been acknowledged. By then, the user may well have typed another
character or two, reducing the relative protocol overhead.
This simple optimization interacts with other features of the TCP
protocol suite, too:
- Most stacks implement the delayed
ACK algorithm: this causes the remote stack to delay ACKs
under certain circumstances, which allows the local stack a bit
of time to "Nagle" some more bytes into a single packet.
- The Nagle algorithm tends to improve the percentage of useful
data in packets more on slow networks than on fast networks,
because ACKs take longer to come back.
- TCP allows an ACK packet to also contain data. If the local
stack decides it needs to send out an ACK packet and the Nagle
algorithm has caused data to build up in the output buffer,
the enqueued data will go out along with the ACK packet.
The Nagle algorithm is on by default in Winsock, but it can
be turned off on a per-socket basis with the TCP_NODELAY option of
setsockopt(). This option should not be
turned off except in a very few situations.
Beware of depending on the Nagle algorithm too heavily. send()
is a kernel-level call, so every call to send() takes much more
time than a regular function call. Your application should coalesce
its own data as much as is practical to minimize the number of calls to
send().
3.19 - When should I turn off the Nagle algorithm?
Generally, almost never.
Inexperienced Winsockers usually try disabling the Nagle algorithm when
they are trying to impose some kind of packet
scheme on a TCP data stream. That is, they want to be able to send,
say, two packets, one 40 bytes and the other 60, and have the receiver
get a 40-byte packet followed by a separate 60-byte packet. (With the
Nagle algorithm enabled, TCP will often coalesce these two packets
into a single 100 byte packet.) Unfortunately, this is futile, for the
following reasons:
- Even if the sender manages to send its packets individually,
the receiving TCP/IP stack may still coalesce the received packets
into a single packet. This can happen any time the sender can
send data faster than the receiver can deal with it.
- Winsock Layered Service Providers (LSPs) may coalesce or
fragment stream data, especially LSPs that modify the data as it
passes.
- Turning off the Nagle algorithm in a client program will
not affect the way that the server sends packets, and vice versa.
- Routers and other intermediaries on the network can fragment
packets, and there is no guarantee of "proper" reassembly with
stream protocols.
- If a packet arrives that is larger than the available space
in the stack's buffers, the stack may fragment it, queuing up
as many bytes as it has buffer space for and discarding the
rest. (The remote peer will resend the remaining data later.)
- Winsock is not required to give you all the data it has
queued on a socket even if your
recv() call gave Winsock
enough buffer space. It may require several calls to get all
the data queued on a socket.
Aside from these problems, disabling the Nagle algorithm almost always
causes a program's throughput to degrade. The only time you should disable
the algorithm is when some other consideration, such as packet timing,
is more important than throughput.
Often, programs that deal with real-time user input will disable
the Nagle algorithm to achieve the snappiest possible response, at the
expense of network bandwidth. Two examples are X Windows servers and
multiplayer network games. In these cases, it is more important that
there be as little delay between packets as possible than it is to
conserve network bandwidth.
For more on this topic, see the Lame List and the FAQ article How to Use TCP Effectively.
3.20 - What is TCP's sliding window?
In a naïve implementation of TCP, every packet is immediately acknowledged
with an ACK packet. Until the ACK arrives from the receiver (in this
naïve implementation, at any rate), the sender does not send another
packet. If the ACK does not arrive within some particular time frame,
the sending stack retransmits the packet.
The problem with this is that all that waiting limits network
throughput drastically. The minimum time between packets with such a
scheme is at least the network's round trip time: the time for the
packet to reach the receiver plus the time for its ACK to travel
back. Add in processing time on each end,
temporary hardware faults (e.g. Ethernet collisions), retransmissions,
routing delays, and who knows what else: the stacks end up spending more
time waiting for ACKs than sending data. This is a problem because it
means you can't effectively fill a network pipe with a single socket.
The TCP window fixes this. It allows a sender to have several
unacknowledged packets "in flight" at a time. When the TCP connection is
established, the stacks tell each other how much buffer space they've
allocated for this connection: this is the maximum window size. As the
window fills up, the receiver sends the new size of the window back to
the sender in an ACK packet. This tells the sender when it cannot send
any more data without overflowing the receiver's window. When the sender
sees that the receiver's window is full, it stops sending data until it
gets an ACK saying that space has become available in the window.
"Why is it called a sliding window," you ask? Imagine a TCP data
stream as a long line of bytes. The sliding window is how the sender
sees the receiver's buffer: as a fixed-size "window" sliding along the
stream of bytes. One edge of the window is between the last byte the
receiver has read and the next byte to be read, and the other edge is
between the last byte in the receiver's input buffer and the first byte
to be sent from the sender's output buffer. As the receiver reads bytes
out of the network buffers, the window slides down the stream; any time
it slides into the sender's buffer, the sender sends more data to fill
up the window.
See the next two items for related discussion.
3.21 - What is the silly window syndrome?
The silly window syndrome results when the sender can send data
faster than the receiver can handle it, and the receiver calls
recv() with very small buffer sizes.
The fast sender will quickly fill the receiver's TCP window. The receiver then reads
N bytes, N being a relatively small number compared to the network frame size. A naïve stack will immediately send an ACK to
the sender to tell it that there are now N bytes available in its TCP
window. This will cause the sender to send N bytes of data; since N is
smaller than the frame size, there's relatively more protocol overhead in
the packet compared to a full frame. Because the receiver is slow (and,
in fact, stupid for calling recv() with small buffer sizes)
the TCP window stays very small, and thus hurts throughput because the
ratio of protocol overhead to application data goes up.
The solution to this problem is the delayed
ACK algorithm. This causes the window advertisement ACK to be
delayed a bit, hopefully allowing the slow receiver to read more of
the enqueued data before the ACK goes out. This results in a larger
window advertisement, so the fast sender can send more data in a single
frame.
3.22 - What is the delayed ACK algorithm?
In a simpleminded implementation of TCP, every data packet that comes in is immediately
acknowledged with an ACK packet. (ACKs help to provide the reliability
TCP promises.)
In modern stacks, ACKs are delayed for a short time (up
to 200ms, typically) for three reasons: a) to avoid the silly window syndrome; b) to allow ACKs to
piggyback on a reply frame if one is ready to go when the stack
decides to do the ACK; and c) to allow the stack to send one ACK for
several frames, if those frames arrive within the delay period.
The stack is only allowed to delay ACKs for up to two full-sized
frames of data: RFC 1122 requires an ACK for at least every second
segment.
3.23 - What platform should I deploy my server on?
Your only real choice is Windows NT/2000 Server. It has been shown
that Windows NT Workstation uses a kernel identical to NT Server's;
presumably this is still the case with Windows 2000. At startup time,
however, NT Workstation's kernel limits itself relative to NT Server's
run-time behavior.
There are some minor tuning differences that make NT Workstation and
Server more responsive for different tasks, but the important difference
is that NT Workstation's connection
backlog is limited to 5 slots. This means that your program has to
call accept() fast enough that not more than 5 connections build
up in the network stack's connection backlog. The stack rejects new
connections as long as the queue is full. For a well-written server,
this is not normally a problem, but it does mean that a concerted
attack (a SYN flood, for example) can fill the queue, denying service
to legitimate users. NT Server, when its dynamic backlog feature
is enabled, has an effectively unlimited backlog queue size,
the better to withstand SYN attacks.
The difference Microsoft wants you to hear about is the one in the
license: you cannot run a server on NT Workstation that accepts more
than 10 connections at once. The kernel does not currently enforce this
limit, but they did during the NT 4.0 beta cycle. (Public outcry caused
Microsoft to remove the kernel enforcement.) Microsoft may well try to
enforce the limit on future products. Microsoft has been known to put
such limits in add-on products like SQL Server, which won't even run on
NT Workstation.
The other alternatives for servers, Windows 95/98/ME,
are also fatally limited. They share the 5-slot backlog limit of NT
Workstation, for one thing. More importantly, their kernels are obviously
inferior. It's trivial to prove this objectively: just set up a basic
server (any of the ones in the FAQ's basic
Winsock examples section will do fine) and time the connection
accepting speed and the data rates. Then do the same with an NT
Workstation or Server machine on the same network with the same client. A
server running on a Win9x kernel feels intentionally crippled compared
to the same program running on Windows NT.
Win9x has other problems as a server. The most obvious is its lack of
stability. Also, Win9x does not have in-kernel support for overlapped
I/O, and I/O completion ports are totally missing: these features are
required to maximize network bandwidth in high-load situations. As if you
needed more evidence, Win9x has several problems when given more than one
network card, a common technique for increasing throughput with cheap
network technologies. (This is as opposed to putting in a single ATM,
gigabit Ethernet or FDDI card.)