0 Replies Latest reply: Feb 6, 2013 2:51 PM by drizzay
drizzay Level 1 Level 1 (0 points)

Hi all,

 

I have been trying to track down some odd behavior in a multithreaded mac os x program in which the program gets stuck waiting for events on files forever.  I wrote a small C program that exhibits the same problem and I think that this may be causing the problem in my original program, too.

 

The test program is a tiny "web server" and you can find it here: https://github.com/AndreasVoellmy/epollbug/blob/master/kqueueserver.c

 

The test program simply creates NUM_WORKER+1 threads where 1 thread runs a loop to accept new connections and the other NUM_WORKER "worker" threads actually serve the connections. Note that the accept thread distributes the accepted TCP connections among the different workers and the workers serve disjoint sets of connections.  Each worker thread runs a loop that uses kevent64 to wait for any of its sockets to be readable and uses the EV_ONESHOT flag when adding sockets to the watched files.

 

I test the program using the http server benchmarking program weighttp (http://redmine.lighttpd.net/projects/weighttp/wiki). I run it like this:

 

./weighttp -n 100 -c 10 -t 4 -k "10.12.0.8:8080"

 

where the IP address and port are where my test program are runnning. -n is the number of requests to make, -c is the number of concurrent requests to make and -t is the number of threads that weighttp will use to generate the http requests. weighttp makes the requests but gets stuck with some responses never coming back. For example, on a run with the above parameters, I get:

 

weighttp - a lightweight and simple webserver benchmarking tool

 

 

starting benchmark...

spawning thread #1: 3 concurrent requests, 25 total requests

spawning thread #2: 3 concurrent requests, 25 total requests

spawning thread #3: 2 concurrent requests, 25 total requests

spawning thread #4: 2 concurrent requests, 25 total requests

progress:   8% done

progress:  16% done

progress:  24% done

progress:  32% done

progress:  40% done

progress:  48% done

progress:  56% done

progress:  64% done

progress:  72% done

progress:  80% done

progress:  88% done

progress:  96% done

 

 

finished in 728 sec, 373 millisec and 310 microsec, 0 req/s, 0 kbyte/s

requests: 100 total, 100 started, 100 done, 95 succeeded, 5 failed, 0 errored

status codes: 95 2xx, 0 3xx, 0 4xx, 0 5xx

traffic: 30685 bytes total, 16340 bytes http, 14345 bytes data

 

In other words, 5 requests failed to get responses. On the test program (i.e. server) side, after 10 seconds, the test program prints out (using ioctl) which sockets have any data ready. Sure enough, I see that 5 sockets have a request waiting (weighttp requests are 90 bytes long):

 

socket 6 has 90 bytes of data ready

socket 8 has 90 bytes of data ready

socket 10 has 90 bytes of data ready

socket 12 has 90 bytes of data ready

socket 14 has 90 bytes of data ready

 

I would really appreciate getting some advice and help on this problem. I may be misunderstanding the semantics of kqueue operations and flags, or this may be a kernel bug.  By the way, this program is based on a very similar program I wrote, which uncovered a bug in Linux, which has since been fixed (http://lkml.indiana.edu/hypermail/linux/kernel/1301.0/00065.html).  To Linux's credit, the bug only appeared using lots of cores, whereas on os x it appears to occur using just two cores and two threads!

 

Regards,

Andi

 

P.S. in case it helps I also took a "sample" of the kqueueserver program at the point that it was waiting on the sockets while the sockets have data ready. Here is the result:

 

Sampling process 17982 for 3 seconds with 1 millisecond of run time between samples

Sampling completed, processing symbols...

Analysis of sampling kqueueserver (pid 17982) every 1 millisecond

Process:         kqueueserver [17982]

Path:            /Users/andre/repos/epollbug/kqueueserver

Load Address:    0x10b22b000

Identifier:      kqueueserver

Version:         ??? (???)

Code Type:       X86-64 (Native)

Parent Process:  bash [17853]

 

Date/Time:       2013-02-06 17:22:26.909 -0500

OS Version:      Mac OS X 10.8.2 (12C3012)

Report Version:  7

 

Call graph:

    2665 Thread_568458   DispatchQueue_1: com.apple.main-thread  (serial)

    + 2665 start  (in libdyld.dylib) + 1  [0x7fff9099c7e1]

    +   2665 main  (in kqueueserver) + 85  [0x10b22cb15]

    +     2665 acceptLoop  (in kqueueserver) + 334  [0x10b22c6ee]

    +       2665 __accept  (in libsystem_kernel.dylib) + 10  [0x7fff8eff9996]

    2665 Thread_568459

    + 2665 thread_start  (in libsystem_c.dylib) + 13  [0x7fff85c63181]

    +   2665 _pthread_start  (in libsystem_c.dylib) + 327  [0x7fff85c76742]

    +     2665 workerLoop  (in kqueueserver) + 95  [0x10b22c8cf]

    +       2665 kevent64  (in libsystem_kernel.dylib) + 10  [0x7fff8effad2a]

    2665 Thread_568460

      2665 thread_start  (in libsystem_c.dylib) + 13  [0x7fff85c63181]

        2665 _pthread_start  (in libsystem_c.dylib) + 327  [0x7fff85c76742]

          2665 workerLoop  (in kqueueserver) + 95  [0x10b22c8cf]

            2665 kevent64  (in libsystem_kernel.dylib) + 10  [0x7fff8effad2a]

 

Total number in stack (recursive counted multiple, when >=5):

 

Sort by top of stack, same collapsed (when >= 5):

        kevent64  (in libsystem_kernel.dylib)        5330

        __accept  (in libsystem_kernel.dylib)        2665

 

Binary Images:

       0x10b22b000 -        0x10b22cfff +kqueueserver (??? - ???) <10550045-081D-3F86-A765-EE0D08E3F398> /Users/andreas/repos/epollbug/kqueueserver

    0x7fff6ae2b000 -     0x7fff6ae5f93f  dyld (210.2 - ???) <A40597AA-5529-3337-8C09-D8A014EB1578> /usr/lib/dyld

    0x7fff857d2000 -     0x7fff85800ff7  libsystem_m.dylib (??? - ???) <B434BE5C-25AB-3EBD-BAA7-5304B34E3441> /usr/lib/system/libsystem_m.dylib

    0x7fff859c0000 -     0x7fff859c0fff  libkeymgr.dylib (25.0.0 - compatibility 1.0.0) <CC9E3394-BE16-397F-926B-E579B60EE429> /usr/lib/system/libkeymgr.dylib

    0x7fff85ac4000 -     0x7fff85b13ff7  libcorecrypto.dylib (??? - ???) <CE0C29A3-C420-339B-ADAA-52F4683233CC> /usr/lib/system/libcorecrypto.dylib

    0x7fff85c62000 -     0x7fff85d2efe7  libsystem_c.dylib (825.25.0 - compatibility 1.0.0) <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib

    0x7fff8614d000 -     0x7fff86158fff  libsystem_notify.dylib (98.5.0 - compatibility 1.0.0) <C49275CC-835A-3207-AFBA-8C01374927B6> /usr/lib/system/libsystem_notify.dylib

    0x7fff86159000 -     0x7fff86161fff  liblaunch.dylib (442.26.2 - compatibility 1.0.0) <2F71CAF8-6524-329E-AC56-C506658B4C0C> /usr/lib/system/liblaunch.dylib

    0x7fff87246000 -     0x7fff8724dfff  libcopyfile.dylib (89.0.0 - compatibility 1.0.0) <876573D0-E907-3566-A108-577EAD1B6182> /usr/lib/system/libcopyfile.dylib

    0x7fff87437000 -     0x7fff8745cff7  libc++abi.dylib (24.4.0 - compatibility 1.0.0) <E7BD9363-1D25-3551-A68A-2E2FF6ABECD7> /usr/lib/libc++abi.dylib

    0x7fff89df0000 -     0x7fff89df8ff7  libsystem_dnssd.dylib (??? - ???) <F0AE5B8E-C7CC-3393-927F-B65DB04016AA> /usr/lib/system/libsystem_dnssd.dylib

    0x7fff89df9000 -     0x7fff89e61ff7  libc++.1.dylib (65.1.0 - compatibility 1.0.0) <20E31B90-19B9-3C2A-A9EB-474E08F9FE05> /usr/lib/libc++.1.dylib

    0x7fff8a2a7000 -     0x7fff8a3bf92f  libobjc.A.dylib (228.0.0 - compatibility 1.0.0) <90D31928-F48D-3E37-874F-220A51FD9E37> /usr/lib/libobjc.A.dylib

    0x7fff8a8ff000 -     0x7fff8a901fff  libquarantine.dylib (52.0.0 - compatibility 1.0.0) <4BE2E642-A14F-340A-B482-5BD2AEFD9C24> /usr/lib/system/libquarantine.dylib

    0x7fff8af64000 -     0x7fff8af65ff7  libremovefile.dylib (23.1.0 - compatibility 1.0.0) <DBBFAF35-AC78-3856-92F6-6E4FD9DF14A2> /usr/lib/system/libremovefile.dylib

    0x7fff8bc91000 -     0x7fff8bc9ffff  libcommonCrypto.dylib (50000.0.0 - compatibility 1.0.0) <DA98AE56-BC3F-3AD0-9660-8CC79E4AECE6> /usr/lib/system/libcommonCrypto.dylib

    0x7fff8c0db000 -     0x7fff8c111fff  libsystem_info.dylib (??? - ???) <4FFCA242-7F04-365F-87A6-D4EFB89503C1> /usr/lib/system/libsystem_info.dylib

    0x7fff8c61d000 -     0x7fff8c61eff7  libSystem.B.dylib (169.3.0 - compatibility 1.0.0) <365477AB-D641-389D-B8F4-A1FAE9657EEE> /usr/lib/libSystem.B.dylib

    0x7fff8c6a9000 -     0x7fff8c6abff7  libunc.dylib (25.0.0 - compatibility 1.0.0) <92805328-CD36-34FF-9436-571AB0485072> /usr/lib/system/libunc.dylib

    0x7fff8c6ac000 -     0x7fff8c6b2ff7  libunwind.dylib (35.1.0 - compatibility 1.0.0) <21703D36-2DAB-3D8B-8442-EAAB23C060D3> /usr/lib/system/libunwind.dylib

    0x7fff8c9de000 -     0x7fff8ca00ff7  libxpc.dylib (140.41.0 - compatibility 1.0.0) <FAC04D8B-680E-325F-8F0C-DD69859D0E01> /usr/lib/system/libxpc.dylib

    0x7fff8de1f000 -     0x7fff8de25fff  libmacho.dylib (829.0.0 - compatibility 1.0.0) <BF332AD9-E89F-387E-92A4-6E1AB74BD4D9> /usr/lib/system/libmacho.dylib

    0x7fff8efe8000 -     0x7fff8f003ff7  libsystem_kernel.dylib (2050.20.9 - compatibility 1.0.0) <EC0A9F5B-C9F5-336B-A7DD-49A718042F39> /usr/lib/system/libsystem_kernel.dylib

    0x7fff8f02e000 -     0x7fff8f033fff  libcache.dylib (57.0.0 - compatibility 1.0.0) <65187C6E-3FBF-3EB8-A1AA-389445E2984D> /usr/lib/system/libcache.dylib

    0x7fff8f14e000 -     0x7fff8f15cff7  libsystem_network.dylib (??? - ???) <0D99F24E-56FE-380F-B81B-4A4C630EE587> /usr/lib/system/libsystem_network.dylib

    0x7fff8f1aa000 -     0x7fff8f1abfff  libsystem_blocks.dylib (59.0.0 - compatibility 1.0.0) <D92DCBC3-541C-37BD-AADE-ACC75A0C59C8> /usr/lib/system/libsystem_blocks.dylib

    0x7fff8f71c000 -     0x7fff8f731ff7  libdispatch.dylib (228.23.0 - compatibility 1.0.0) <D26996BF-FC57-39EB-8829-F63585561E09> /usr/lib/system/libdispatch.dylib

    0x7fff8f732000 -     0x7fff8f733ff7  libsystem_sandbox.dylib (??? - ???) <3C3B03CF-C525-3CB3-8557-62E91B93AC95> /usr/lib/system/libsystem_sandbox.dylib

    0x7fff8f742000 -     0x7fff8f747fff  libcompiler_rt.dylib (30.0.0 - compatibility 1.0.0) <08F8731D-5961-39F1-AD00-4590321D24A9> /usr/lib/system/libcompiler_rt.dylib

    0x7fff8fd39000 -     0x7fff8fd85ff7  libauto.dylib (??? - ???) <73CDC482-16E3-3FC7-9BB4-FBA2DA44DBC2> /usr/lib/libauto.dylib

    0x7fff9099a000 -     0x7fff9099dff7  libdyld.dylib (210.2.3 - compatibility 1.0.0) <F59367C9-C110-382B-A695-9035A6DD387E> /usr/lib/system/libdyld.dylib

    0x7fff91130000 -     0x7fff91131ff7  libdnsinfo.dylib (453.18.0 - compatibility 1.0.0) <E7595861-ECF9-336E-9901-BED2620FAA80> /usr/lib/system/libdnsinfo.dylib

Sample analysis of process 17982 written to file /dev/stdout


MacBook Pro, OS X Mountain Lion (10.8.2)