So, I used DTrace to diagnose a pretty subtle performance problem with Ecelerity a while ago and just got around to implementing an enhancement to obviate that bottleneck: asynchronous socket shutdowns and closes in situations that were "challenging" before.
I built out a generalized asynchronous socket shutdown and close framework and deployed it throughout the application. On Solaris, our event system still handles the socket() calls, the port_associate() and port_dissociate() calls, and read/write/readv/writev/send/recv/etc. Now, though, wherever possible we hand the shutdown() and close() off to other threads to avoid some minor performance issues. Basically, it's the same thing the lingerd patch to Apache does.
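To make the handoff idea concrete, here is a minimal sketch in Python. It is not Ecelerity's code (that is C, built on Solaris event ports). The `AsyncCloser` class and its `close_later` and `drain` methods are hypothetical names invented for illustration. The sketch assumes the caller has already removed the socket from its event system before handing it over.

```python
import queue
import socket
import threading


class AsyncCloser:
    """Hypothetical helper: runs shutdown()/close() on a worker thread."""

    def __init__(self):
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def close_later(self, sock):
        # Assumption: the caller has already dissociated sock from the
        # event system, so no other thread will touch it after this call.
        self._pending.put(sock)

    def drain(self):
        # Block until every queued socket has been shut down and closed.
        self._pending.join()

    def _run(self):
        while True:
            sock = self._pending.get()
            try:
                try:
                    sock.shutdown(socket.SHUT_RDWR)
                except OSError:
                    pass  # the peer may already be gone; close anyway
                sock.close()
            finally:
                self._pending.task_done()


if __name__ == "__main__":
    a, b = socket.socketpair()
    closer = AsyncCloser()
    closer.close_later(a)   # returns immediately; the worker does the close
    closer.drain()
    print(b.recv(1))        # the peer sees end-of-file
    b.close()
```

The event thread only pays for a queue insert, and any blocking inside shutdown() or close() lands on the worker. The ordering matters: the descriptor is dissociated from the event system first and closed second, so the two operations should never run concurrently on the same descriptor.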
However... I launched the new code in our test environment and BOOM!
# mdb unix.2 vmcore.2
Loading modules: [ unix krtld genunix specfs ufs ip sctp usba fctl nca
lofs nfs random ipc crypto sppp ]
> ::stack
vpanic(fe96c300)
turnstile_block+0x2ff(d07ea5f0, 0, d5a07700, fec022f8, 0, 0)
mutex_vector_enter+0x2d4(d5a07700)
getf+0x3f(3dc)
port_dissociate_fd+0x42(d3842180, 3dc)
portfs+0x131()
sys_sysenter+0xdc()
and
# mdb unix.5 vmcore.5
Loading modules: [ unix krtld genunix specfs ufs ip sctp usba fctl nca
lofs nfs random ipc crypto sppp ]
> ::stack
vpanic(fe96c300)
turnstile_block+0x2ff(d07be000, 0, d2188348, fec022f8, 0, 0)
mutex_vector_enter+0x2d4()
port_close_pfd+0x2f(d43742c0)
port_close_fd+0x58(d43742c0, 45)
closeandsetf+0x2db(45, 0)
close+0xd()
sys_sysenter+0xdc()
6 panics in 60 minutes. Yikes?! Time to call Sun on the tele.
Tuesday, January 24. 2006 at 02:06
When I was first porting httpd 2.1-dev to a Solaris Express Build last year to use the Event Ports, I was able to get a panic in the same general area:
list_remove+0x14(300013a3e18, 30001421c20, 0, 0, 4, 300019cfd00)
port_remove_done_event+0x4c(300013a3e18, 3000258a858, 30001421c28, 30001421c20, 0, 12)
port_associate_fd+0x1ac(30001421bc0, 4, 3000258a858, 1, 13ff20, 51)
portfs+0x234(4, a, 3, 0, 1, 13ff20)
syscall_trap32+0xcc(1, a, 4, 3, 1, 13ff20)
I reported it, and they said it was fixed in later versions... I can only assume you have all the possible patches applied :)
-Paul
Tuesday, January 24. 2006 at 18:16
This is a known problem and will be fixed shortly. FYI, the bug number is 6357796.
-Prakash.