Winsock Programmer's FAQ
Section 2: Information for New Winsockers

2.1 - Are there any sample apps on the Net?

Yes. There are several listed on the Resources page, and the FAQ's Examples section has several more. If you're just getting started with Winsock, you may be especially interested in these samples.

2.2 - Do I need to initialize the WSAData structure before calling WSAStartup?

No, WSAStartup() fills this structure in for you.

2.3 - I'm getting link errors when compiling Winsock programs. What's wrong?

You're most likely not linking with the proper Winsock import library. For 16-bit Windows systems, this is winsock.lib. For 32-bit Windows systems using Winsock 1.1 functions only, it is wsock32.lib. And for programs needing Winsock 2 support, you need to link with ws2_32.lib.

2.4 - If I write a Winsock program, will I be able to communicate with a Unix sockets program?

Absolutely! This common question is the result of confusing protocols with the APIs. Communicating programs need not have been created with the same APIs, as long as they are using the same transport and network protocols.

Before dealing with cross-platform networking, please read the FAQ article "How to Use TCP Effectively". It covers several issues that bite cross-platform programs, like structure padding and data representation.

2.5 - Can I use Winsock with { My Favorite Language }?

Most programming languages these days have some way of accessing Winsock, but Winsock is rarely used directly except from C or C++. There are several reasons for this.

Reason 1: Some languages simply lack the language features to call the Winsock API. Your language needs the following to fully use the Winsock API:

Pointers. (The ability to access a specific piece of memory by its address.)
Bitwise operators. (The ability to change specific bits in a byte.)
Structures or records. (The ability to define a block of memory that is an aggregate of simple data elements, such as two characters followed by a 16-bit integer. This feature must also allow some measure of control as to how the data is laid out in memory.)

Reason 2: Many languages rely on some form of component architecture (e.g. ActiveX) to provide outside services like network access. Often the language environment comes with basic networking components, sufficient for most tasks. If your tool didn't come with the necessary components, or the ones it does come with aren't powerful enough, you may be able to find the functionality you need in a third-party library, rather than writing the necessary Winsock code yourself.

Reason 3: Many newer languages, especially cross-platform scripting languages, include built-in support for networking. (Examples include Java, Perl, Python and Tcl.) From the programmer's point of view, Winsock is rarely a concern when working in these languages.

For these reasons and others, this FAQ is biased towards C++.

If your language allows direct access to the Winsock API, you may be able to translate the C++ code in the FAQ into equivalent code in your chosen language. However, I recommend that you look for sample code in your chosen language via the Web Pages section of the FAQ, so you can study working code before you begin translating.

2.6 - Are there any tools available for debugging Winsock programs?

There are two categories of debugging tools: network analyzers (colloquially known as "sniffers") and Winsock shims.

Sniffers are usually software packages that run on one of the LAN's workstations and, due to the way typical LANs work, capture all of the traffic going over the LAN. Good sniffers will also decode that traffic to varying degrees. One advantage of a sniffer is that it literally sees everything about the conversation, including low-level protocol details that aren't available from the Winsock layer. Another is that the good ones are extremely powerful and configurable. For example, some allow you to write "protocol plugins" that will decode any protocol (such as a custom protocol that you've developed).

The disadvantages of sniffers are several:

Software critics call a sniffer "inexpensive" when it costs less than $1000, and they positively gush when one costs less than $500. This is because hardware sniffers, which are still around for the really tough problems, are about an order of magnitude more expensive.
There are a few truly inexpensive sniffers, but settling for cheap or free necessarily limits your choices, and you generally end up giving up features.
Some sniffers can't sniff conversations going over PPP or other WAN links.
Ethernet switches are beginning to replace simple hubs, because they allow a LAN to carry more traffic at a given speed. One of the few downsides to switched Ethernet is that sniffers can basically only see the traffic directly addressed to the host running the sniffer. (Some call this a feature.) This doesn't make a sniffer totally useless, but it does reduce its potential value.

All that aside, however, sometimes a sniffer is the only way to find out what you want to know about your program's behavior. There are plenty of free packages and plenty of demos available: it's worth it to spend a day investigating your choices. You can start in the FAQ's own Network Sniffers review section.

The other tool category is "Winsock shims." A shim sits between your program and Winsock, usually by "hooking" the Winsock API. These tools are limited to monitoring events on the Winsock layer itself, and can only monitor traffic to or from a single host. (That is, they can't see the "big picture" of simultaneous conversations between many machines, and they can't see details below your application's protocol, like the state of TCP header fields.) Their advantages are that this is usually sufficient, and that a shim will run you $150 or less. There are links to a few shims in the FAQ's Winsock Shims section.

You may also find the FAQ article Debugging TCP useful for some less-automated methods of debugging a TCP program.

Methods That Don't Work: There are a couple of debugging tools that are supposed to work but either don't, or are too flaky to deal with. The first is the SO_DEBUG socket option. It simply doesn't work on Microsoft stacks. The other is the Winsock DLL debugging plugin dt_dll.dll; this method is just flaky. Bob Quinn has an article that goes into the details.

2.7 - How do I get a readable error message from a Winsock error number?

The problem with this question is that it assumes that there is a "good" canned error message for every situation. The reality is that many times, you need to know the program's context before you can turn an error value into a meaningful error message. For example, WSAEFAULT can mean "Bad pointer passed," or "Passed buffer too small," or even "That version of the API is not supported." Since the Winsock spec documents the most likely error values that each function will return, you should use this information to construct intelligent error handlers.

Still, sometimes an API call returns something unexpected, so a cryptic error message is better than none at all. In that case, you can just build a stringtable in your resource file mapping error numbers to error messages. There is one such RC file for the Winsock 1.1 error values available here. Alternately, the basic Winsock tutorial programs in the FAQ include a utility module (ws_util.cpp) that defines a function for translating Winsock error numbers into strings.
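As a minimal sketch of the lookup-table idea, the function below maps a handful of Winsock error numbers to short strings. The name describe_winsock_error() and the message wording are my own illustration, not the code from ws_util.cpp. The numbers are the values winsock2.h assigns to the named constants, written as literals so the sketch compiles without the Winsock headers.

```cpp
#include <cstddef>
#include <string>

// Illustrative table: a few Winsock error numbers with short descriptions.
// The numbers match the WSA* constants in winsock2.h; the wording is ours.
struct ErrorEntry {
    int code;
    const char* text;
};

static const ErrorEntry kErrorTable[] = {
    {10004, "WSAEINTR: interrupted function call"},
    {10014, "WSAEFAULT: bad address"},
    {10035, "WSAEWOULDBLOCK: operation would block"},
    {10048, "WSAEADDRINUSE: address already in use"},
    {10054, "WSAECONNRESET: connection reset by peer"},
    {10060, "WSAETIMEDOUT: connection timed out"},
    {10061, "WSAECONNREFUSED: connection refused"},
    {10093, "WSANOTINITIALISED: WSAStartup() not yet called"},
};

// Return a description for code, or a generic message for unlisted values.
std::string describe_winsock_error(int code)
{
    const std::size_t count = sizeof(kErrorTable) / sizeof(kErrorTable[0]);
    for (std::size_t i = 0; i < count; ++i) {
        if (kErrorTable[i].code == code) {
            return kErrorTable[i].text;
        }
    }
    return "unknown Winsock error " + std::to_string(code);
}
```

You would typically pass it the value of WSAGetLastError() and combine the result with whatever context your program has about the failed call.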

Note that some people will tell you that the Win32 FormatMessage() API can be coerced into returning error messages for Winsock error numbers. At best, this is undocumented behavior that only works with some implementations of Winsock. I personally have not been able to get it to work, despite significant time devoted to the problem. My advice is that you're much better off spending your time constructing meaningful error messages than chasing something that could never work very well even if it was documented behavior.

2.8 - Winsock keeps returning the error WSAEWOULDBLOCK. What's wrong with my program?

Not a thing. WSAEWOULDBLOCK is a perfectly normal occurrence in programs using non-blocking and asynchronous sockets. It's Winsock's way of telling your program "I can't do that right now, because I would have to block to do so."

The next question is, how do you know when it's safe to try again? In the case of asynchronous sockets, Winsock will send you an FD_WRITE message after a failed send() call when it is safe to write again; it will send you an FD_READ message after a failed recv() call when more data arrives on that socket. Similarly, in a non-blocking sockets program that uses select(), the socket will be set in writefds when it's okay to write, and in readfds when there is data to read.

Note that Win9x has a bug where select() can fail to block on a nonblocking socket. It will signal one of the sockets, which will cause your program to call recv() or send() or similar. That function will return WSAEWOULDBLOCK, which can be quite a surprise. So, a program using select() under Win9x has to be able to deal with this error at any time.

This gets to a larger issue: whenever you use some form of nonblocking sockets, you have to be prepared for WSAEWOULDBLOCK at any time. It's simply a matter of defensive programming, just like checking for null pointers.

2.9 - How can I test my Winsock application without setting up a network?

There is a special address called the loopback or localhost address, 127.0.0.1. This lets two programs running on a single machine talk to each other. The server usually listens for connections on all available interfaces, and the client connects to the localhost address. (See the Examples section for basic client and server program code.)
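For illustration, this is roughly how a client might build the loopback address before calling connect(). The helper name loopback_address() is hypothetical, and the port is whatever your server listens on. The #else branch holds POSIX headers only so the sketch builds off Windows.

```cpp
#include <cstring>
#ifdef _WIN32
#include <winsock2.h>
#else
// Assumption: POSIX headers, present only so the sketch builds off Windows.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#endif

// Build an address structure for 127.0.0.1 and the given port.
// The port is passed in host byte order.
sockaddr_in loopback_address(unsigned short port)
{
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                    // network byte order
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
    return addr;
}
```

The result can be passed to connect() after casting its address to a sockaddr pointer. The server side would normally bind to INADDR_ANY instead, so that it listens on all interfaces.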

If you have an Internet or LAN connection on your development machine, you're already set up for this.

For machines without networks, you have to set up a "dummy" network. Windows NT/2000 has the "Microsoft Loopback Adapter" for this very purpose: just add it in the network control panel, and you'll be able to use the loopback address.

For Windows 9x, you can try installing Dial-Up Networking (DUN) and pointing it at an unused serial port. This can be quirky, but it's possible to limp by with this method. The main problem arises when DUN decides it needs to dial the modem, and finds that there is no modem on the port you chose. To minimize this problem, never use name lookup calls like gethostbyname(), and turn off DUN's "automatic dial" feature.

Be warned: behavior through the loopback interface may well be different from behavior on a network, if only because conditions are much simpler within a single machine than over a LAN or WAN. You should try to test your application on a real network, even if you do primary development on a single machine.

2.10 - What's the proper way to close a TCP socket?

The proper sequence for closing a TCP connection is:

1. Finish sending data.
2. Call shutdown() with the how parameter set to 1 (SD_SEND in Winsock 2).
3. Loop on recv() until it returns 0.
4. Call closesocket().

Skipping the first and third steps above can cause data loss.
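The sequence above can be sketched as follows for a blocking socket. The function name graceful_close() is hypothetical, and the sketch assumes the caller has already finished its send() calls (step 1). The #else branch holds POSIX stand-ins only so the code builds off Windows.

```cpp
#ifdef _WIN32
#include <winsock2.h>
#else
// Assumption: POSIX stand-ins, present only so the sketch builds off Windows.
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;
#define SD_SEND SHUT_WR
#define closesocket close
#endif

// Gracefully close a blocking TCP socket after all data has been sent.
// Returns true if the peer also closed its side cleanly.
bool graceful_close(SOCKET s)
{
    const int kBufSize = 1024;
    char discard[kBufSize];
    bool clean = false;

    if (shutdown(s, SD_SEND) == 0) {           // step 2: we are done sending
        for (;;) {                              // step 3: drain until recv() == 0
            int n = static_cast<int>(recv(s, discard, kBufSize, 0));
            if (n == 0) { clean = true; break; }  // peer closed its sending half
            if (n < 0) break;                     // error: stop waiting
            // n > 0: late data from the peer; a real program might process it
        }
    }
    closesocket(s);                             // step 4
    return clean;
}
```

Here the late data is simply discarded. Whether that is acceptable depends on your protocol, so treat the drain loop as a placeholder for your normal receive handling.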

Nonblocking or asynchronous sockets complicate the first and third steps. You can either build "finish sending/receiving" logic into your normal I/O loop, or you can temporarily put the socket in blocking mode and do the last bits of I/O that way. The proper choice depends on your program's architecture and requirements.

2.11 - Is it possible to close the connection "abnormally"?

Sure, but it's an evil thing to do. :) The simplest way is to enable the SO_LINGER option with a linger time of 0, using setsockopt(), before you call closesocket(). Another method is to call shutdown() with the how parameter set to 2 ("both directions"), possibly followed by a closesocket() call.
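For completeness, here is a sketch of the SO_LINGER variant. The helper name set_abortive_close() is hypothetical; the option takes a linger structure with lingering switched on and a timeout of zero. The #else branch exists only so the sketch builds off Windows.

```cpp
#ifdef _WIN32
#include <winsock2.h>
#else
// Assumption: POSIX stand-ins, present only so the sketch builds off Windows.
#include <sys/socket.h>
typedef int SOCKET;
#endif

// Configure s so that a later closesocket() resets the connection instead of
// closing it gracefully. Returns true if the option was set.
bool set_abortive_close(SOCKET s)
{
    struct linger lg;
    lg.l_onoff = 1;    // turn SO_LINGER on...
    lg.l_linger = 0;   // ...with a zero timeout, which means "hard close"
    return setsockopt(s, SOL_SOCKET, SO_LINGER,
                      reinterpret_cast<const char*>(&lg), sizeof(lg)) == 0;
}
```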

"Slamming the connection shut" is only justifiable in a very small number of cases. You must have fairly deep knowledge of the way TCP works before you can properly decide to use this technique. Generally, the perceived need to slam the connection shut comes from a broken program, either yours or the remote peer. I recommend that you try to fix the broken program so you don't have to resort to such a questionable technique.

2.12 - How do I detect when my TCP connection is closed?

All of the I/O strategies discussed in the I/O strategies article have some way of indicating that the connection is closed.

First, keep in mind that TCP is a full-duplex network protocol. That means that you can close the connection half-way and still send data on the other half. An example is a web browser: it sends a short request to the web server, then closes its half of the connection. The web server then sends back the requested data on the other half of the connection, and closes its sending side, which terminates the TCP session.

Normal TCP programs only close the sending half, which the remote peer perceives as the receiving half. So, what you normally want to detect is whether the remote peer closed its sending half, meaning you won't be receiving data from them any more.

With asynchronous sockets, Winsock sends you an FD_CLOSE message when the connection drops. Event objects are similar: the system signals the event object with an FD_CLOSE notification.

With blocking and non-blocking sockets, you probably have a loop that calls recv() on that socket. recv() returns 0 when the remote peer closes the connection. As you would expect, if you are using select(), the socket gets set in the readfds parameter when the connection drops. As normal, you'll call recv() and see the 0 return value.
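A receive loop for a blocking socket that distinguishes a normal close from an error might look like this. The names read_until_closed() and ReadStatus are hypothetical, and collecting everything into one string is just for illustration. The #else branch exists only so the sketch builds off Windows.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <winsock2.h>
#else
// Assumption: POSIX stand-ins, present only so the sketch builds off Windows.
#include <sys/socket.h>
typedef int SOCKET;
#endif

enum ReadStatus { PEER_CLOSED, READ_ERROR };

// Read from a blocking socket until the peer closes its sending half.
// Received bytes are appended to out.
ReadStatus read_until_closed(SOCKET s, std::string& out)
{
    const int kBufSize = 512;
    char buf[kBufSize];
    for (;;) {
        int n = static_cast<int>(recv(s, buf, kBufSize, 0));
        if (n > 0) {
            out.append(buf, static_cast<std::size_t>(n));
        } else if (n == 0) {
            return PEER_CLOSED;   // normal close: no more data will arrive
        } else {
            return READ_ERROR;    // check WSAGetLastError() for the reason
        }
    }
}
```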

As you might have guessed from the discussion above, it is also possible to close the receiving half of the connection. If the remote peer then tries to send you data, the stack will drop that data on the floor and send a TCP RST to the remote peer.

See below for information on handling abnormal disconnects.

2.13 - How do I detect an abnormal network disconnect?

The previous question deals with detecting when a protocol connection is dropped normally, but what if you want to detect other problems, like unplugged network cables or crashed workstations? In these cases, the failure prevents notifying the remote peer that something is wrong. My feeling is that this is usually a feature, because the broken component might get fixed before anyone notices, so why force everyone to restart?

If you have a situation where you must be able to detect all network failures, you have two options:

The first option is to give the protocol a command/response structure: one host sends a command and expects a prompt response from the other host when the command is received or acted upon. If the response does not arrive, the connection is assumed to be dead, or at least faulty.

The second option is to add an "echo" function to your protocol, where one host (usually the client) is expected to periodically send out an "are you still there?" packet to the other host, which it must promptly acknowledge. If the echo-sending host doesn't receive its response or the receiving host fails to see an echo request for a certain period of time, the program can assume that the connection is bad or the remote host has gone down.
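The bookkeeping for such an echo scheme is mostly a matter of tracking two timestamps. The class below is a sketch only: the name EchoMonitor and its methods are hypothetical, and it deliberately leaves the actual sending and receiving of echo packets to your protocol code.

```cpp
#include <chrono>

// Hypothetical helper that tracks an application-level
// "are you still there?" exchange. It does no I/O itself.
class EchoMonitor {
public:
    using Clock = std::chrono::steady_clock;

    EchoMonitor(Clock::duration interval, Clock::duration timeout,
                Clock::time_point now)
        : interval_(interval), timeout_(timeout),
          last_sent_(now), last_heard_(now) {}

    // True when it is time to send another echo request.
    bool should_send(Clock::time_point now) const
    {
        return now - last_sent_ >= interval_;
    }

    // Call after an echo request has been sent.
    void on_sent(Clock::time_point now) { last_sent_ = now; }

    // Call whenever an echo reply (or any other traffic) arrives.
    void on_heard(Clock::time_point now) { last_heard_ = now; }

    // True when the peer has been silent for longer than the timeout.
    bool peer_presumed_dead(Clock::time_point now) const
    {
        return now - last_heard_ > timeout_;
    }

private:
    Clock::duration interval_;
    Clock::duration timeout_;
    Clock::time_point last_sent_;
    Clock::time_point last_heard_;
};
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the logic easy to test. The timeout should be a few multiples of the interval so that one lost or late reply doesn't kill the connection.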

If you choose the "echo" alternative, avoid the temptation to use the ICMP "ping" facility for this. If you did it this way, you would have to send pings from both sides, because Microsoft stacks won't let you see the other side's echo requests, only responses to your own echo requests. Another problem with ping is that it's outside your protocol, so it won't detect a failed TCP connection if the hardware connection remains viable. A final problem with the ping technique is that ICMP is an unreliable protocol: does it make a whole lot of sense to use an unreliable protocol to add an assurance of reliability to another protocol?

Another option you should not bother with is the TCP keepalive mechanism. This is a way to tell the stack to send a packet out over the connection at specific intervals whether there's real data to send or not. If the remote host is up, it will send back a similar reply packet. If the TCP connection is no longer valid (e.g. the remote host has rebooted since the last keepalive), the remote host will send back a reset packet, killing the local host's connection. If the remote host is down, the local host's TCP stack will time out waiting for the reply and kill the connection.

There are two problems with keepalives:

Only Windows 2000 allows you to change the keepalive time on a per-process basis. On older versions of Windows, changing the keepalive time changes it for all applications on the machine that use keepalives. (Changing the keepalive time is almost a necessity since the default is 2 hours.)
Each keepalive packet is 40 bytes of more-or-less useless data, and there's one sent in each direction as long as the connection remains valid. Contrast this with a command/response type of protocol, where there is effectively no useless data: all packets are meaningful. In fairness, however, TCP keepalives are less wasteful on Windows 2000 than the "are you still there" strategy above.

Note that different types of networks handle physical disconnection differently. Ethernet, for example, establishes no link-level connection, so if you unplug the network cable, a remote host can't tell that its peer is physically unable to communicate. By contrast, a dropped PPP link causes a detectable failure at the link layer, which propagates up to the Winsock layer for your program to detect.

2.14 - How can I change the timeout for a Winsock function?

Some of the blocking Winsock functions (e.g. connect()) have a timeout embedded into them. The theory behind this is that only the stack has all the information necessary to set a proper timeout. Yet, some people find that the value the stack uses is too long for their application; it can be a minute or longer.

Under Winsock 2, you can set the SO_SNDTIMEO and SO_RCVTIMEO options with setsockopt() to change the timeouts for send() and recv().

Unfortunately, the Winsock spec does not document a way to change many other timeout values, and the above advice doesn't apply to Winsock 1.1.

The solution is to avoid blocking sockets altogether. All of the non-blocking socket methods lend themselves to timeouts:

Non-blocking sockets with select(): The fifth parameter to the select() function is a timeout value.
Asynchronous sockets: Use the Windows API SetTimer().
Event objects: In addition to the Winsock event object, have your networking code also block on a regular Win32 semaphore that is signalled by a separate thread that calls the Win32 Sleep() function.
Waitable timers: These are a new feature in Windows 98 and NT 4.0 SP3 and higher. A waitable timer is an object like a semaphore, except that the OS signals it at a future time that you specify. You create them with the Win32 function CreateWaitableTimer(). So, you could wait on a 5-second timer as well as your event objects; if nothing happens on the sockets within 5 seconds, Windows will signal the timer, thus breaking you out of the WaitForMultipleObjects() call.
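As an example of the first method in the list above, this sketch waits for a socket to become readable, but no longer than a given number of milliseconds. The function name wait_readable() is hypothetical, and the #else branch exists only so the code builds off Windows.

```cpp
#include <cstddef>
#ifdef _WIN32
#include <winsock2.h>
#else
// Assumption: POSIX stand-ins, present only so the sketch builds off Windows.
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
typedef int SOCKET;
#endif

// Wait up to timeout_ms for s to become readable.
// Returns 1 if readable, 0 on timeout, and a negative value on error.
int wait_readable(SOCKET s, long timeout_ms)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(s, &readfds);

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    // Winsock ignores the first parameter; BSD-style stacks need it.
    int rc = select(static_cast<int>(s) + 1, &readfds, NULL, NULL, &tv);
    return rc > 0 ? 1 : rc;
}
```

Keep in mind that "readable" also covers a closed connection, so a return of 1 should be followed by a recv() call whose result you check as usual.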

Note that with asynchronous and non-blocking sockets, you may be able to avoid handling timeouts altogether. Your program continues working even while Winsock is busy. So, you can leave it up to the user to cancel an operation that's taking too long, or just let Winsock's natural timeout expire rather than taking over this functionality in your code.

2.15 - What is peeking (MSG_PEEK), and why is it bad?

Peeking is looking ahead in the TCP data stream: when you use the MSG_PEEK flag with recv(), it returns bytes from the stack's buffer without removing them from the buffer. (You can also do a form of peeking with the ioctlsocket() option FIONREAD.)

Peeking is essentially never necessary: you can always read data into your own buffers and process it there. This is good, because peeking often causes problems. Indeed, it's so problematic it's earned a place on the Lame List and in Microsoft's Knowledge Base: see article Q192599 for specific info on the problems peeking causes with their Winsock stack.
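To show what "read into your own buffers and process it there" can look like, here is a small sketch. It assumes a made-up protocol in which every message is one length byte followed by that many payload bytes; the class name MessageBuffer and the framing are hypothetical, so substitute your own protocol's rules.

```cpp
#include <cstddef>
#include <string>

// Accumulates bytes handed over from recv() and extracts complete messages.
// Assumed (hypothetical) framing: 1 length byte, then that many payload bytes.
class MessageBuffer {
public:
    // Append the bytes that recv() just returned.
    void append(const char* data, std::size_t len)
    {
        pending_.append(data, len);
    }

    // If a whole message is buffered, move its payload into msg and return
    // true. Otherwise leave the buffer untouched and return false.
    bool next_message(std::string& msg)
    {
        if (pending_.empty()) {
            return false;
        }
        const std::size_t need = static_cast<unsigned char>(pending_[0]);
        if (pending_.size() < 1 + need) {
            return false;                  // wait for more data
        }
        msg = pending_.substr(1, need);
        pending_.erase(0, 1 + need);
        return true;
    }

private:
    std::string pending_;
};
```

The program calls recv() normally, feeds whatever arrives to append(), and then loops on next_message() until it returns false. Nothing ever needs to look at data while it's still inside the stack.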

2.16 - What is out-of-band data (MSG_OOB), and why is it bad?

Out-of-band (OOB) data is like a second data channel. The intent is to use the regular TCP data stream for most data and the OOB stream for "emergency" messages. The telnet protocol uses this for "interrupt" keystrokes like Ctrl-C, so that they don't have to wait on the remote peer to handle regular TCP data before the interrupt occurs. You can send OOB data by passing the MSG_OOB flag to send() and receive it by passing MSG_OOB to recv(). You can also have OOB data delivered in the normal data stream by setting the SO_OOBINLINE option with setsockopt().

OOB data is a useful concept, but unfortunately there are two conflicting interpretations of how OOB data should be handled at the stack level: the original description of OOB in the TCP protocol specification (RFC 793) was superseded by the "host requirements" spec (RFC 1122), but there are still many machines with RFC 793 OOB implementations. Section 3.5 in the Winsock 2 spec (version 2.2.2, as of this writing) discusses OOB, with details on why RFC 793 vs. RFC 1122 is a problem in section 3.5.2.

OOB also isn't a fully functional second data channel: it's rather limited. So, never use OOB except when implementing legacy protocols like telnet which demand it. You can get reliable OOB-like behavior by simply using two data connections: one for normal data, and the second for emergency data.

2.17 - If MSG_PEEK and MSG_OOB are bad, what do I pass for send() and recv()'s flags parameter?

It's perfectly valid to pass 0 for send() and recv()'s flags parameter.

Last modified on 10 August 2000 at 03:33 UTC-7	Please send corrections to tangent@cyberport.com.
