True story --> I've had deadlocks in printf! I had to stop using printf in parts of the code!

True story --> I've had "weird" program terminations in popen! I rewrote popen so it would call the execve system call in Linux using Intel ASM! I then had the same problem with the system() call! I rewrote system() in ASM as well! popen/system/exec/stdio have races in them when mixing threading and forking under linuxthreads (POSIX supports this btw).

True story --> The BSS ended up with the malloc atfork_handlers when using linuxthreads! glibc malloc has a race in its initialization when using threads.

True story --> This one SMP box was using 0x80000000 for the user stack base on Intel. No problem, except that the code we were using mmap's at 0x77700000! Note that linuxthreads didn't support guard pages on thread stacks at that time either.

True story --> OK. It's bad practice, but who says you can't memcpy a pthread_mutex_t? Apparently linuxthreads does, once every 1000 runs! Ever see a segfault on a spinlock?

True story --> select returns yes, read NEVER returns! That's what you get for running SMP on Linux 2.2. At least it was reproducible (after about a year of running fine).

True story --> An IDS must be secure, right!? Not if it uses a certain packet capture library in certain versions, which was able to crash our code on an occasion or two! Not to mention the fact that it's not thread safe..

True story --> Ever see a segfault in a __libc_nanosleep?

True story --> Debugging tools are nice, aren't they? How about with multithreaded programs? Hmm.. guess you can't use mcheck in glibc or Electric Fence without a custom patch or two.

True story --> Electric Fence is so nice once it's working.. However, if the Linux kernel decides that the number of allowable vm_area_structs isn't related to the possible virtual memory, it's not exactly "good", let's say.

True story --> A popular network mapping tool must use good code, right?
If 7 meg of memory allocated, with almost none of it freed, is nice, I guess it is good code!

How many people remember that when you do something like socklen_t fromlen; foo(&fromlen), fromlen actually has to be initialized? socklen_t is another story anyway.. Lots of functions use this type of convention in *BSD sockets. This doesn't really matter anyway, unless you want your code to work 100% of the time. 100% one day, 0% the next day, and cycling back is good enough for me, I say.

memset(&sin, 0, sizeof(sin)); is useless, right, and everything will work if you set up sin_addr/sin_family/sin_port only? Depends on which OS you use, dear chap.

Does anyone really follow all those silly C99 standards anyway? Just don't assume snprintf return values stay the same between different versions of glibc..

Let's not even talk about signals and POSIX threads in Linux. OK. For a couple of weeks, we've been getting a HUGE number of crashes, with most of them pointing to some problem handling SIGPIPE (we have lots of coredumps with something like 70,000 stack frames in each backtrace, in of all things a printf.. we sometimes have a few deadlocks writing to stdout also ;-).

Problem 1 is the number of SIGPIPEs we were producing. This came from the fact that when we were terminating a program, we didn't have the correct permissions to send a signal to the process group, but we were able to kill our own process, which terminated an ssh connection that the program we wanted to terminate was connected to.

Problem 2 is that we were using printf and snprintf in our signal handlers! Yes.. a non-reentrant function, but seriously, 30 core dumps and hangs a day because of a f*cking printf? ;-)

BTW, want to find some more signal races in system level code also? Look at linuxthreads and tell me what happens if another signal is delivered during the signal handler wrapper in glibc. Yes.. these races are everywhere, in basically every bit of code in the world..
waiting to crash ;-)

--> The next debugging session is from one night of debugging. And BTW, it's Sunday night and we were meant to have a build on Friday so we could release on Monday (tomorrow)!

Debugging buggy debuggers to debug buggy code is fun! -->
-----------------------------------------------------

DEBUG:10251:Using default handler on port 31792
DEBUG:10251:DEBUG:decrease_delay: current timer == { 0, 655000 }
Error accessing memory address 0x4032c8c0: No such process.
(gdb) where
warning: Couldn't get registers
warning: Couldn't get registers
#0  0x40038dcb in __sigsuspend (set=Error accessing memory address 0xbe9ffa18: No such process.
) at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
Error accessing memory address 0x40038d8c: No such process.
(gdb) info thread
Segmentation fault
[root@scan1 bin]#

Backtraces point you to the buggy code immediately -->
--------------------------------------------------

Interface User Mode Idle Peer Address
]
DEBUG:4101:new info created
[Switching to Thread 23850]

Program received signal SIGSEGV, Segmentation fault.
0x0 in ?? ()
(gdb) where
#0  0x0 in ?? ()
(gdb)

Another _really_ useful backtrace (this one ain't so bad though) -->
---------------------------------------------------------------

(gdb) where
#0  0x400a2f5d in time () from /lib/libc.so.6
#1  0x42ce1f2b in timeout (timez=1015832833, i=6) at tcp_scanner.c:189
#2  0x42ce2774 in mad_scan (mthread=0x446e2fe4, mad=0xbe3ffcc0) at tcp_scanner.c:535
.
.
(gdb) x/i $pc
0x400a2f5d :    mov    %edx,%ebx
(gdb)
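Back to Problem 2 (printf and snprintf inside signal handlers) for a second. One common way out, sketched here and not what our code actually did, is to have the handler do nothing except set a flag, and do all the printing later from normal code. The names on_sigpipe, install_sigpipe_handler and consume_sigpipe are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, read by normal code. volatile sig_atomic_t is the
 * type the C standard guarantees a handler can write safely. */
static volatile sig_atomic_t got_sigpipe = 0;

/* Handler: no printf, no snprintf, no malloc. Record the event and return. */
static void on_sigpipe(int signo)
{
    (void)signo;
    got_sigpipe = 1;
}

/* Hypothetical helper: install the handler with sigaction().
 * Returns 0 on success, -1 on failure (same as sigaction). */
static int install_sigpipe_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted syscalls where possible */
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical helper: call this from the main loop, where stdio is fine.
 * Returns 1 if a SIGPIPE was seen since the last call, otherwise 0. */
static int consume_sigpipe(void)
{
    if (!got_sigpipe)
        return 0;
    got_sigpipe = 0;
    return 1;
}
```

The main loop would call install_sigpipe_handler() once at startup, then check consume_sigpipe() on each pass and do its printf there. Several SIGPIPEs arriving between two checks collapse into one report, which is probably fine if all you want is to know the pipe went away.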