Re: [Xen-devel] help with xenstored 'hang'
I was recently struggling with what sounds like a not-too-dissimilar
problem while working with a disaggregated version of xenstore. The
ultimate solution for me was to disable pthreads in xenstore/libxs. I
just commented out the following line in tools/xenstore/Makefile:

    xs.opic: CFLAGS += -DUSE_PTHREAD

After I removed that line and rebuilt and installed xenstore, it worked
just fine. I would be curious to know if this also solves your problem.

Patrick

On 30 June 2010 15:15, Jim Fehlig <jfehlig@xxxxxxxxxx> wrote:
> I'm trying to debug an 'xm list' hang on a large (~700 hosts) Xen 3.2
> production installation. The hang occurs randomly, on a random host.
> User has provided cores of xend and xenstored processes when hang
> occurs. After poking at these cores I have discovered
>
> In xend process, a thread is blocked on a cond variable, waiting for a
> response to XS_TRANSACTION_START from xenstored. A reader thread
> responsible for reading from xenstored is blocked on read(2).
>
> In the xenstored process, the lone thread is blocked on select(2),
> waiting for IO. I examined the connections list and see that it contains
> a connection for the XS_TRANSACTION_START request. Dumping the
> connection object:
>
> (gdb) p *(struct connection *)0x526c70
> $48 = {list = {next = 0x517c30, prev = 0x5151f0}, fd = 13, id = 0,
>   can_write = true, in = 0x523600,
>   out_list = {next = 0x526c98, prev = 0x526c98}, transaction = 0x0,
>   transaction_list = {next = 0x523560, prev = 0x523560},
>   next_transaction_id = 60231445, transaction_started = 1, domain = 0x0,
>   watches = {next = 0x51daa0, prev = 0x5267b0},
>   write = 0x402460 <writefd>, read = 0x405180 <readfd>}
>
> Notice transaction_started is set to 1, but out_list is empty. AFAICT,
> that means the reply has been sent to xend. The reader thread in xend
> should have received the response and signaled the cond variable -
> allowing execution to progress.
> Ultimately, xend would send a
> XS_TRANSACTION_END message, freeing the connection object in xenstored
> and removing it from connections list.
>
> Does my understanding of this code sound correct? Anyone have
> suggestions or further debugging tips? Examining cores is about my only
> debug option as user does not want to deploy debug patches, enable
> tracing, etc. across 700 hosts.
>
> Interestingly, when user strace's or attaches to xenstored process with
> gdb, xenstored "awakes", the hung 'xm list' returns, and xenstored
> continues normally. A new connection to xenstored (e.g. running xmtop)
> seems to poke it along as well. Would a timeout on select(2) in main
> loop of xenstored help at all?
>
> Thanks for any insights!
> Jim
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
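
On the question of a timeout on select(2): the following is only a
minimal sketch of what a bounded wait looks like in C. It is not taken
from xenstored; the helper name wait_readable and its timeout_sec
parameter are made up for illustration, and the real main loop watches
many descriptors rather than one. A timeout like this would make the
loop re-check its state periodically, which might paper over the stall
but would not by itself explain it.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of xenstored): wait until fd becomes
 * readable or until timeout_sec seconds have elapsed.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, long timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int ret;

        /* select() may modify both the set and the timeout,
         * so rebuild them on every pass. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        ret = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ret < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        return ret > 0 ? 1 : 0;
    }
}
```

A caller would treat a return of 0 as "nothing arrived in time" and go
around its loop again, re-examining any pending connections, instead of
blocking indefinitely.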