3.1 - How do I speak { HTTP, POP3, SMTP, FTP, Telnet, NNTP, etc. } with Winsock?
Winsock proper does not provide a way for you to speak these protocols,
because it only deals with the layers underneath these application-level
protocols. However, there are many ways for you to get your program to
speak these protocols.
The easiest method is to use a third-party library. The Resources section lists several of
these.
If you only need to speak the HTTP, FTP or gopher protocols, you can
use the WinInet library exposed by Microsoft's Internet Explorer. Newer
versions of Microsoft's development tools include components that make
accessing WinInet simple.
Finally, you can always roll your own. You should
start by reading the specification for the protocol you
want to implement. Most of the Internet's protocols are
documented in RFCs. The Important RFCs page links to the
most commonly referenced application-level RFCs. The complexity of
these protocols varies widely, and the only way to gauge the difficulty
of implementing one is to read the relevant RFC(s). HTTP, for
example, is a pretty simple protocol, but the authors of its RFC managed
to fill 176 pages talking about it. Luckily, most RFCs aren't that
long-winded.
If you've read the RFC and still can't figure the protocol
out, try asking on Usenet. There are many newsgroups dedicated to
particular application protocols: most are in the comp.protocols.*
hierarchy. Failing that, you can ask in one of the general Winsock and TCP/IP mailing lists
and newsgroups.
3.2 - How can I encrypt my TCP stream with SSL/TLS?
At this time, only Windows NT 4.0 SP4+, Windows 2000 and Windows
CE have a generic built-in SSL mechanism. For other operating systems,
your options are WinInet (limited in various ways) or a
third-party library.
Windows NT 4.0 SP4+ and Windows 2000 offer SSL through their security
APIs. You can find sample code to show how these mechanisms work in the Win32 Platform
SDK. The SSL samples are underneath the Platform SDK directory in the
"Samples\WinBase\Security\SSL" subdirectory.
Windows CE has a different SSL mechanism. There is an
article in MSDN that describes how to use the
functionality. The article also goes into the WinInet method.
WinInet is a feature in Internet Explorer version 3 and higher that
lets you use some of Internet Explorer's networking functionality in your
own programs. Since IE3 is on Windows 95 OSR2 and newer, and Microsoft
is doing lots to make sure IE remains part of Windows (tic), this may
be a reasonable option for you. The main disadvantages of WinInet's
SSL feature are that it only works with HTTP, and WinInet is not very
flexible. Also, 128-bit IE is not available worldwide. MS Knowledge Base
article Q168151 shows how to use this feature.
3.3 - How do I get my IP address from within a Winsock program?
There are three methods, each with its own advantages and disadvantages:
- To get the local IP address if you already have a bound or
connected socket, call
getsockname() on the socket. Note
that if you've just bound that socket to an address (even
a generic address like INADDR_ANY), getsockname() will
return that address.
- To get your address without opening a socket first,
do a
gethostbyname() on the value gethostname()
returns. This will return a list of all the host's
interfaces, as shown in this
example. (See the example page for problems with the
method.)
- The third method only works on Winsock
2. The new
WSAIoctl() API supports the
SIO_GET_INTERFACE_LIST option, and one of the
bits of information returned is the addresses of each of the
network interfaces in the system. [C++
Example] (Again, see the example page for caveats.)
3.4 - What's the proper way to impose a packet scheme on a stream protocol like TCP?
The two most common methods are delimiters and length-prefixing.
An example of delimiters is separating packets with, say, a caret
(^). Naturally your delimiter must never occur in regular data, or you
must have some way of "escaping" delimiter characters.
An example of length-prefixing is prepending a two-byte integer
containing the packet length on every packet. See the FAQ article How to Use TCP Effectively for the proper way to
send integers over the network.
There are hybrid methods, too. The HTTP protocol, for example,
separates header lines with CRLF pairs (a kind of delimiting), but when
an HTTP reply contains a block of binary data, the server also sends the
Content-length header before it sends the data, which is a kind
of length-prefixing.
I favor simple length-prefixing, because as soon as you read the length
prefix, you know how many more bytes to expect. By contrast, delimiters
require that you blindly read until you find the end of the packet.
3.5 - How do I write my program to work through a firewall?
The SOCKS protocol allows a client program on a protected internal
network to ask the firewall to act as a relay between it and a host
outside the firewall. SOCKS version 4 is the "basic" protocol: most
programs don't need anything more complicated. SOCKS version 5 adds UDP
support, end-to-end encryption, and secure logins. Beware that even if
your program only needs SOCKS v4 features, some firewalls require that
you use a SOCKS5-compatible login to get through.
NEC's SOCKSifier (called SOCKSCap) allows almost any Winsock program
to run over a SOCKS firewall with no changes to the application. (There
are other vendors that offer SOCKSifiers, and a few non-Microsoft Winsock
stacks are SOCKSified.) The problem with SOCKSCap is that your users must
set up SOCKSCap by hand, and they must run your program through
the SOCKSifier. It's more user-friendly to implement the SOCKS protocol
within your program.
There is some BSD sockets client-side SOCKS code at the NEC site
that you can easily modify to work with Winsock. This is included as
part of the Unix SOCKS server package as a library that the included
utility programs (rtelnet, rping, etc.) all link to.
SOCKS inherently has trouble with some protocols because of
the way they operate. Examples of this are RealAudio and almost any
"multiuser" protocol, such as multiplayer online games and conferencing
programs. These programs require some kind of intelligent proxy at the
firewall that understands the protocol. Unfortunately, these types
of proxies are only available for well-established protocols like
RealAudio.
3.6 - What if the firewall does not support SOCKS?
Technical people often look at a firewall as a technical solution
to a technical problem. In fact, a firewall is a political construct.
Like a country's border, it exists to protect an organization's turf
from invaders. The permeability of a network firewall depends on the
paranoia level of the people who control the firewall: there's a lot to
be paranoid about in today's Internet.
Many firewalls are set up to block all outbound connections except
to a very few ports. For example, the firewall might be set up to
allow outbound connections on ports 80, 21 and 25 only (web, FTP and
outbound email). If your application uses a port that the firewall blocks,
you will have to persuade the network administrators to open up that
port on their firewall. Network admins are often very territorial
about their firewalls. Getting your way will require a certain amount
of political finesse.
The administrators at such a site have decided that they'd rather
open ports on a case-by-case basis than block ports as new security risks
get discovered. You have to convince the admins to open up your port. I
suggest that you prepare a written argument explaining what your program
does, why it does it, and an analysis of potential risks. Don't fudge
on the argument: be completely honest and open. Present the benefits of
the application separately; you're not trying to convince the admins to
use the program, just to show that the application is useful to users on
their network. Try to get some of those users to speak with the network
admins directly to help you press your case.
Why a "written" argument? Partly because it shows professionalism,
partly because you might not be able to speak with the admins directly,
and partly because it forces you to clarify and organize your thoughts.
If you just call the admin up on the phone and make up your arguments as
you speak with him, you may just come across as argumentative.
Even if you do end up speaking with the administrator directly, having
a written argument to refer to will help you to make your case.
Some people suggest a different tactic: change your program so that it
sends its packets out on a port that's open on even the most restrictive
firewalls. Because some firewalls examine the contents of packets to
make sure they're legitimate, some people even suggest encapsulating
your data in the protocol that usually runs on that port. For example,
you could send your data out on port 80, and put your program data in
the payload area of HTTP packets.
To extend the "politics" analogy, this is the "smuggling" tactic. Just
like smuggling in the real world, it's likely to get you into trouble. You
won't go to jail for it, but you might generate a lot of antagonism
toward your program; network admins have been known to band together
and ban programs that annoy them.
3.7 - I'm writing a server. What's a good network port to use?
If you're writing a server for an existing, popular Internet
protocol, it's already got a port number assigned to it. You can
find the most common of these numbers at the website for the Internet Assigned Numbers Authority
(IANA).
If you're writing a server for a new protocol, there are a few rules
and suggestions you should obey when choosing your server's port:
- Ports 1-1023 are off-limits to people inventing
new protocols. They are reserved by the IANA for new "standard"
protocols. Important protocols like POP3 and HTTP have low numbers
(110 and 80, respectively), but your new K-RAD game server shouldn't. Note
that id Software is going to Hell for using port 666 with their DOOM
network server. They cleaned up their act with Quake, though.
- Ports 1024 through 49151 are Registered Ports, which are a good
range to choose your ports from. Just beware that the entire world is
choosing from ports in this range, so it may make sense for you to register
your port, or at least check the current list of assigned ports. Be aware,
though, that no one is obligated to check that list before making up
their app's port number.
- Ports 49152 through 65535 are Dynamic Ports, meaning that operating
systems use ports in this range when choosing random ports. (The
FTP protocol, for example, uses random ports in the data transfer
phase.) This is a poor range to choose ports from, because there's a
fairly decent chance that your program and the OS will fight over a
given port eventually.
- Many OSes pick local ports for client programs from the 1024-5000
range. You would do well to pick server ports higher than 5000, but this
is not as rigid a rule as the previous ones.
- Within the "safe" 5000-49151 range, there are many numbers the IANA
shows as unregistered. Of these, you should avoid port numbers with
patterns to them, or a widely-recognized meaning. People tend to pick
these since they're easy to remember, but this increases the chances of
a collision. Ports 6969, 5150 and 22222 are bad choices, for example.
You should also give some thought to making your program's port
configurable, in case your program is run on a machine where another
server is already using that port. One way to do this is through Winsock's
getservbyname() function: if that function returns a port
number, use that, otherwise use the default port number. Then users
can change your program's port by editing the SERVICES file, located in
%SystemRoot%\System32\DRIVERS\ETC on Windows NT/2000 systems and in the
Windows directory on Win9x machines.
3.8 - What is UDP? What are its limitations?
The User Datagram Protocol is part of the
TCP/IP protocol suite; it is an alternative to TCP.
("TCP/IP" includes UDP, but it can also mean "TCP over IP", so in
some discussions you see the term "UDP/IP".) Winsock gives you a UDP
socket when you pass SOCK_DGRAM as the second argument to
socket().
UDP is an "unreliable" protocol: the stack does not
make any effort to handle lost, duplicated, or out-of-order packets. UDP
packets are checked for corruption, but a corrupt UDP packet is simply
dropped silently.
The stack will fragment a UDP datagram when it's larger than the
network's MTU. The remote peer's stack will reassemble
the complete datagram from the fragments before it delivers it to the
receiving application. If a fragment is missing or corrupted, the whole
datagram is thrown away. This makes large datagrams impractical: an 8K
UDP datagram will be broken into 6 fragments when sent over Ethernet,
for example. If any of those 6 fragments is lost or corrupted, the stack
throws away the entire 8K datagram.
Datagram loss can also occur within the stack at the sender or the
receiver, usually due to lack of buffer space. It is even possible for
two communicating programs running on the same machine to have data
loss if they use UDP. (This actually happens on Windows under high load
conditions, because it starts dropping datagrams when the stack buffers
get full.) This limits UDP's value as a local IPC mechanism.
If any of these types of loss occur, no notification will be sent to
the sender or receiver, even if the loss happens within the network
stack.
Duplicated datagrams are not dropped: they are delivered to the
receiver. It is up to the application to detect this problem, and it is
the program's choice what to do with the duplicate datagram.
UDP datagrams can be delivered in any order. Datagrams often get
reordered on the network when two datagrams get delivered via different
routes, and the second datagram's route happens to be quicker.
3.9 - What is UDP good for?
From the above discussion, UDP looks pretty
useless, right? Well, it does have a few advantages over reliable
protocols like TCP:
- UDP is a slimmer protocol: its protocol header is fixed
at 8 bytes, whereas TCP's is 20 bytes at minimum and can be
more.
- UDP has no congestion control and no data coalescing. This
eliminates the delays caused by the delayed
ACK and Nagle algorithms. (This is
also a disadvantage in many situations, of course.)
- There is less code in the UDP section of the stack than
the TCP section. This means that there is less latency between
a packet arriving at the network card and being delivered to
the application.
- Only UDP packets can be broadcast
or multicast.
This makes UDP good for applications where timeliness and control are
more important than reliability. Also, some applications are inherently
tolerant of UDP problems: data loss in a streaming video program just
means a frame or two is dropped.
Be careful not to let UDP's advantages blind you to its bad points: too many application writers have started
with UDP, and then later been forced to add reliability features. When
considering UDP, ask yourself whether it would be better to use TCP
from the start than to reinvent it, or whether a protocol like RTP
(RFC 1889), which adds sequence numbers and timestamps on top of UDP,
would solve the problem for you.
3.10 - How do I send a broadcast packet?
With the UDP protocol you can send a packet so that all workstations
on the network will see it. (TCP doesn't allow broadcasting.)
To send broadcast packets, you must first enable the
SO_BROADCAST option with the setsockopt()
function. Next you have to figure out the "directed broadcast" address,
which means "send this packet to all stations on this LAN". To construct
the directed broadcast address, use the following C code:
u_long host_addr = inet_addr("172.16.77.88"); // local IP addr
u_long net_mask = inet_addr("255.255.224.0"); // LAN netmask
u_long net_addr = host_addr & net_mask; // 172.16.64.0
u_long dir_bcast_addr = net_addr | (~net_mask); // 172.16.95.255
Potential Problems: Broadcasts can be useful at times,
but keep in mind that this creates a load on all the machines on the
network, even on machines that aren't listening for the packet. This
is because the part of the stack
that can reject the packet is several layers down. As a result,
most routers drop simple broadcast packets, and sometimes
even drop directed broadcasts to nearby networks. (A simple
broadcast is one sent to address 255.255.255.255.) The practical
upshot of this is that sometimes broadcasts won't work at all,
and even when they do work they cause unnecessary loads on the
network. To get around these problems, you may want to consider multicasting
instead.
3.11 - Is Winsock thread-safe?
The Winsock specification does not mandate that a Winsock
implementation be thread-safe, but it does allow an implementor
to create a thread-safe version of Winsock.
Bob Quinn says, on this subject:
- "WinSock, any implementation, is thread safe if the WinSock
implementation developer makes it so (it doesn't just happen)."
- "I don't know of any implementations from Microsoft (or any
other vendors) that are not thread safe."
- "If a WinSock application developer creates a multi-threaded
application that shares sockets among the threads, it is that
developer's responsibility to synchronize activities between
the threads."
By "synchronize activities", I believe Bob means that it may cause
problems if, for example, two threads repeatedly call send()
on the same socket. There is no guarantee in the Winsock specification
about how the data will be interleaved in this situation. Similarly, if
one thread calls closesocket() on a socket, it must somehow
signal other threads using that socket that the socket is now invalid.
Anecdotal evidence suggests that one thread calling send()
and another thread calling recv() on a single socket is safe,
but I have not tested this. Hard information, demonstration code and/or
more anecdotal evidence either way would be appreciated.
Instead of multiple threads accessing a single socket, you may
want to consider setting up a pair of network I/O queues. Then, give
one thread sole ownership of the socket: this thread sends data from
one I/O queue and enqueues received data on the other. Then other
threads can access the queues (with suitable synchronization).
Applications that use some kind of non-synchronous socket typically
have some I/O queue already. Of particular interest in this case is
overlapped I/O or I/O completion ports, because these I/O strategies
are also thread-friendly. You can tell Winsock about several OVERLAPPED
blocks, and Winsock will finish sending one before it moves on to the
next. This means you can keep a chain of these OVERLAPPED blocks, each
perhaps added to the chain by a different thread. Each thread can also
call WSASend() on the block they added, making your main
loop simpler.
3.12 - If two threads in an application call recv() on a socket, will they each get the same data?
No. Winsock does not duplicate data among threads.
Note that if you do call recv() at the same time
on a single socket from two different threads, havoc may result. See
the previous question for more info.
3.13 - Is there any way for two threads to be notified when something happens on a socket?
No. If two threads call WSAAsyncSelect() on
a single socket, only the thread that made the last call to
WSAAsyncSelect() will receive further notification
messages. Similarly, if two threads call WSAEventSelect()
on a socket, only the event object used in the last call will
be signaled when an event occurs on that socket. You also can't
call WSAAsyncSelect() on a socket in one thread and
WSAEventSelect() on that same socket in another thread,
because the calls are mutually exclusive for any single socket. Finally,
you cannot reliably call select() on a single socket from
two threads and get the same notifications in each, because one thread
could clear or cause an event, which would change the events that the
other thread sees.
3.14 - How do I detect if the modem is connected?
It is sometimes useful for a Winsock program to only do its thing
if the computer is already connected to the Internet. The Remote Access
Service (RAS) API gives an application access to the dial-up networking
subsystem. In particular, the RasEnumConnections() call
lets you easily get a list of the connected modems.
RAS/DUN is not installed on all systems. Therefore, it's safest not
to link your application with the rasapi32.dll import library. Instead,
use the LoadLibrary() call to check for rasapi32.dll. If
it succeeds, you can use GetProcAddress() to call the
RasEnumConnections() function. If, on the other hand, the
LoadLibrary() call fails, you know RAS isn't installed,
so there couldn't be a modem connection in the first place.
Warning: Before using
RasEnumConnections() to check for an Internet connection,
keep in mind that many computers are not connected to the Internet through
a modem. Often, they are connected to a LAN, and that LAN is somehow
gatewayed to the Internet. For
these situations, using RAS will fool your program into believing that
the user either has no Internet connection, or that it is never up.
3.15 - How can I get the local user name?
There are a few ways. The easiest is to use the Win32 call
GetUserName(). [C++ Example]
The other way is shown in the Microsoft Knowledge Base article
Q155698. It is much more complex, and it shows two completely
different methods, one for Windows 9x/Windows 3.1 and one for Windows
NT. Unless you need Windows 3.1 support or the LAN Manager domain name
(as opposed to the DNS domain name), I suggest you give this article
a miss.
3.16 - Windows 9x's Dial Up Networking keeps popping up an automatic dial window, even when it isn't necessary. Can I make it stop?
On some Windows 9x systems with more than one network interface,
Dialup Networking (DUN) sometimes pops up an automatic-dial window
even when it is obviously not required. An example of such a setup is a
machine on a LAN that also has a modem for connecting to the Internet.
The most common trigger for the DUN dial window is a Winsock program
calling the gethostbyname() function, which initiates a
DNS lookup. Even if the name is that of a LAN machine and there's a DNS
server on the LAN, DUN will still try to bring up the Internet link to
try that first.
If you try messing with the DNS configuration of a multihomed Win9x box, it's clear
that the network subsystem just isn't designed to support a local DNS
server in addition to a remote one. The best solution, then, is to just
use straight IP addresses, and write your programs to recognize an IP
address, so they don't have to call gethostbyname().
I've heard that DUN 1.3 and/or the Winsock 2 updates fix this
problem, but other reports say they don't help.
3.17 - I've heard that asynchronous sockets are unreliable. Is this true?
Asynchronous sockets are reliable if your program obeys the letter
of the Winsock specification.
Every so often, you hear stories about a program that loses asynch
notification messages. As far as I can tell, it's always due to a bug in
the complainer's program, due to misunderstanding Winsock's parsimonious
notification policy.
Consider the FD_WRITE notification. That only gets sent when a client's
connection is accepted by the remote peer, and from then on only when
output buffer space becomes available after Winsock gives you a
WSAEWOULDBLOCK error. To put
it another way, FD_WRITE only gets sent to say, "Before now, it was not
okay to write data on this socket; now it's okay." The conservative way
to handle this is to always try to send data when you have it, whether
you've received an FD_WRITE or not. You might get a WSAEWOULDBLOCK error,
but that's harmless and easy to handle. Your handler for FD_WRITE then
just tries to send everything queued up until it sends it all or gets
another WSAEWOULDBLOCK.
Win16 message queues are fixed-length and fairly short, so it is
at least possible to lose notifications in 16-bit programs. If Winsock
fails to send you a notification because the message queue is full, it is
supposed to keep trying, but empirical evidence suggests that this does
not always happen. Keep in mind that when we speak of "16-bit Winsock"
we're talking about stacks from a dozen different vendors, each with
many versions spanning many years.
For my own part, I've been using asynchronous sockets almost
exclusively for years now with no problems. Others who've been using
asynchronous notification for years longer than I have agree. If you
believe you're losing notifications, you have to ask yourself whether
it's more likely that we've overlooked a bug in the stack or that there's
a bug in your program.
3.18 - What is the Nagle algorithm?
The Nagle algorithm is an optimization to TCP that makes the stack
wait until all data is acknowledged on the connection before it sends
more data. The exception is that Nagle will not cause the stack to wait
for an ACK if it has enough enqueued data that it can fill a network
frame. (Without this exception, the Nagle algorithm
would effectively disable TCP's sliding window
algorithm.) For a full description of the Nagle algorithm, see RFC 896.
So, you ask, what's the purpose of the Nagle algorithm?
The ideal case in networking is that each program always sends a
full frame of data with each call to send(). That
maximizes the percentage of useful program data in a packet.
The basic TCP and IPv4 headers are 20 bytes each. The
worst case protocol overhead percentage, therefore, is 40/41, or
about 98%. Since the maximum amount of data in an Ethernet frame is 1500 bytes,
the best case protocol overhead percentage is 40/1500, less than 3%.
While the Nagle algorithm is causing the stack to wait for data to
be ACKed by the remote peer, the local program can make more calls to
send() . Because TCP is a stream protocol,
it can coalesce the data in those send() calls into a single TCP
packet, increasing the percentage of useful data.
Imagine a simple Telnet program: the bulk of a Telnet conversation
consists of sending one character, and receiving an echo of that character
back from the remote host. Without the Nagle algorithm, this results
in TCP's worst case: one byte of user data wrapped in dozens of bytes
of protocol overhead. With the Nagle algorithm enabled, the TCP stack
won't send that one Telnet character out until the previous characters
have all been acknowledged. By then, the user may well have typed another
character or two, reducing the relative protocol overhead.
This simple optimization interacts with other features of the TCP
protocol suite, too:
- Most stacks implement the delayed
ACK algorithm: this causes the remote stack to delay ACKs
under certain circumstances, which allows the local stack a bit
of time to "Nagle" some more bytes into a single packet.
- The Nagle algorithm tends to improve the percentage of useful
data in packets more on slow networks than on fast networks,
because ACKs take longer to come back.
- TCP allows an ACK packet to also contain data. If the local
stack decides it needs to send out an ACK packet and the Nagle
algorithm has caused data to build up in the output buffer,
the enqueued data will go out along with the ACK packet.
The Nagle algorithm is on by default in Winsock, but it can
be turned off on a per-socket basis with the TCP_NODELAY option of
setsockopt(). This option should not be
turned off except in a very few situations.
Beware of depending on the Nagle algorithm too heavily. send()
is a kernel-level call, so every call to send() takes much more
time than a regular function call. Your application should coalesce
its own data as much as is practical to minimize the number of calls to
send().
3.19 - When should I turn off the Nagle algorithm?
Generally, almost never.
Inexperienced Winsockers usually try disabling the Nagle algorithm when
they are trying to impose some kind of packet
scheme on a TCP data stream. That is, they want to be able to send,
say, two packets, one 40 bytes and the other 60, and have the receiver
get a 40-byte packet followed by a separate 60-byte packet. (With the
Nagle algorithm enabled, TCP will often coalesce these two packets
into a single 100 byte packet.) Unfortunately, this is futile, for the
following reasons:
- Even if the sender manages to send its packets individually,
the receiving TCP/IP stack may still coalesce the received packets
into a single packet. This can happen any time the sender can
send data faster than the receiver can deal with it.
- Winsock Layered Service Providers (LSPs) may coalesce or
fragment stream data, especially LSPs that modify the data as it
passes.
- Turning off the Nagle algorithm in a client program will
not affect the way that the server sends packets, and vice versa.
- Routers and other intermediaries on the network can fragment
packets, and there is no guarantee of "proper" reassembly with
stream protocols.
- If a packet arrives that is larger than the available space
in the stack's buffers, the stack may fragment it, queuing up
as many bytes as it has buffer space for and discarding the
rest. (The remote peer will resend the remaining data later.)
- Winsock is not required to give you all the data it has
queued on a socket even if your
recv() call gave Winsock
enough buffer space. It may require several calls to get all
the data queued on a socket.
Aside from these problems, disabling the Nagle algorithm almost always
causes a program's throughput to degrade. The only time you should disable
the algorithm is when some other consideration, such as packet timing,
is more important than throughput.
Often, programs that deal with real-time user input will disable
the Nagle algorithm to achieve the snappiest possible response, at the
expense of network bandwidth. Two examples are X Windows servers and
multiplayer network games. In these cases, it is more important that
there be as little delay between packets as possible than it is to
conserve network bandwidth.
For more on this topic, see the Lame List and the FAQ article How to Use TCP Effectively.
3.20 - What is TCP's sliding window?
In a naïve implementation of TCP, every packet is immediately acknowledged
with an ACK packet. Until the ACK arrives from the receiver (in this
naïve implementation, at any rate), the sender does not send another
packet. If the ACK does not arrive within some particular time frame,
the sending stack retransmits the packet.
The problem with this is that all that waiting limits network
throughput drastically. The minimum time between packets with such a
scheme is at least the network's round trip time: the time for the
packet to reach the receiver plus the time for its ACK to travel
back. Add in processing time on each end,
temporary hardware faults (e.g. Ethernet collisions), retransmissions,
routing delays, and who knows what else: the stacks end up spending more
time waiting for ACKs than sending data. This is a problem because it
means you can't effectively fill a network pipe with a single socket.
The TCP window fixes this. It allows a sender to have several
unacknowledged packets "in flight" at a time. When the TCP connection is
established, the stacks tell each other how much buffer space they've
allocated for this connection: this is the maximum window size. As the
window fills up, the receiver sends the new size of the window back to
the sender in an ACK packet. This tells the sender when it cannot send
any more data without overflowing the receiver's window. When the sender
sees that the receiver's window is full, it stops sending data until it
gets an ACK saying that space has become available in the window.
"Why is it called a sliding window," you ask? Imagine a TCP data
stream as a long line of bytes. The sliding window is how the sender
sees the receiver's buffer: as a fixed-size "window" sliding along the
stream of bytes. One edge of the window is between the last byte the
receiver has read and the next byte to be read, and the other edge is
between the last byte in the receiver's input buffer and the first byte
to be sent from the sender's output buffer. As the receiver reads bytes
out of the network buffers, the window slides down the stream; any time
it slides into the sender's buffer, the sender sends more data to fill
up the window.
See the next two items for related discussion.
3.21 - What is the silly window syndrome?
The silly window syndrome results when the sender can send data
faster than the receiver can handle it, and the receiver calls
recv() with very small buffer sizes.
The fast sender will quickly fill the receiver's TCP window. The receiver then reads
N bytes, N being a relatively small number compared to the network frame size. A naïve stack will immediately send an ACK to
the sender to tell it that there are now N bytes available in its TCP
window. This will cause the sender to send N bytes of data; since N is
smaller than the frame size, there's relatively more protocol overhead in
the packet compared to a full frame. Because the receiver is slow (and,
in fact, stupid for calling recv() with small buffer sizes)
the TCP window stays very small, and thus hurts throughput because the
ratio of protocol overhead to application data goes up.
The solution to this problem is the delayed
ACK algorithm. This causes the window advertisement ACK to be
delayed a bit, hopefully allowing the slow receiver to read more of
the enqueued data before the ACK goes out. This results in a larger
window advertisement, so the fast sender can send more data in a single
frame.
3.22 - What is the delayed ACK algorithm?
In a simpleminded implementation of TCP, every data packet that comes in is immediately
acknowledged with an ACK packet. (ACKs help to provide the reliability
TCP promises.)
In modern stacks, ACKs are delayed for a short time (up
to 200ms, typically) for three reasons: a) to avoid the silly window syndrome; b) to allow ACKs to
piggyback on a reply frame if one is ready to go when the stack
decides to do the ACK; and c) to allow the stack to send one ACK for
several frames, if those frames arrive within the delay period.
The stack is only allowed to delay ACKs for up to two full-sized
frames of data: RFC 1122 requires an ACK for at least every second
segment.
3.23 - What platform should I deploy my server on?
Your only real choice is Windows NT/2000 Server. It has been shown
that Windows NT Workstation uses a kernel identical to NT Server's;
presumably this is still the case with Windows 2000. At startup time,
however, NT Workstation's kernel limits itself relative to NT Server's
run-time behavior.
There are some minor tuning differences that make NT Workstation and
Server more responsive for different tasks, but the important difference
is that NT Workstation's connection
backlog is limited to 5 slots. This means that your program has to
call accept() fast enough that not more than 5 connections build
up in the network stack's connection backlog. The stack rejects new
connections as long as the queue is full. For a well-written server,
this is not normally a problem, but it does mean that a concerted
attack (a SYN flood, for example) can fill the queue, denying service
to legitimate users. NT Server, when its dynamic backlog feature
is enabled, has an effectively unlimited backlog queue size,
the better to withstand SYN attacks.
The difference Microsoft wants you to hear about is the one in the
license: you cannot run a server on NT Workstation that accepts more
than 10 connections at once. The kernel does not currently enforce this
limit, but they did during the NT 4.0 beta cycle. (Public outcry caused
Microsoft to remove the kernel enforcement.) Microsoft may well try to
enforce the limit on future products. Microsoft has been known to put
such limits in add-on products like SQL Server, which won't even run on
NT Workstation.
The other alternatives for servers, Windows 95/98/ME,
are also fatally limited. They share the 5-slot backlog limit of NT
Workstation, for one thing. More importantly, their kernels are obviously
inferior. It's trivial to prove this objectively: just set up a basic
server (any of the ones in the FAQ's basic
Winsock examples section will do fine) and time the connection
accepting speed and the data rates. Then do the same with an NT
Workstation or Server machine on the same network with the same client. A
server running on a Win9x kernel feels intentionally crippled compared
to the same program running on Windows NT.
Win9x has other problems as a server. The most obvious is its lack of
stability. Also, Win9x does not have in-kernel support for overlapped
I/O, and I/O completion ports are totally missing: these features are
required to maximize network bandwidth in high-load situations. As if you
needed more evidence, Win9x has several problems when given more than one
network card, a common technique for increasing throughput with cheap
network technologies. (This is as opposed to putting in a single ATM,
gigabit Ethernet or FDDI card.)