Re: [Xen-devel] xen-4.6: xenstored crashes during domain->interface access
On 28.01.2016 10:39, Ian Campbell wrote:
> On Thu, 2016-01-28 at 09:50 +0100, Stefan Bader wrote:
>> On 26.01.2016 11:58, Stefan Bader wrote:
>>> Hi,
>>>
>>> while playing around with xen-4.6 I stumbled over an odd problem and am
>>> wondering whether anybody has seen the same. A method to relatively
>>> quickly reproduce this for me seems to be:
>>>
>>> - Start one domU (PV or HVM does not seem to matter)
>>> - Repeatedly call xenstore-ls a few times
>>>
>>> I think I never got beyond 10 repeats when the xenstore-ls call
>>> suddenly locks up and xenstored crashes with a SIGBUS error. In the
>>> majority of cases (I think I saw one different), the crash happens
>>> while accessing conn->domain->interface in
>>> tools/xenstore/xenstored_domain.c:domain_can_read().
>>> Looking at the corefile produced by xenstored I now got at least one
>>> case where the pointer still matches the previously mapped value.
>>> Though I think I had also at least one run (with less debugging added)
>>> where it seemed to be really wrong.
>>> There is more info at [1] in case someone is interested.
>>>
>>> I need to repeat a few more times to see how consistent the whole thing
>>> is. Does this happen for anybody else? Any advice what I should look at
>>> (in the sense of gathering better data)?
>>
>> Just as an update and confirmation for Ian and Bastian: Debian testing is
>> fine. I have not dug into the specifics but it's not the Xen package side
>> at all. Something in our 4.3 kernel causes this. Unfortunately without any
>> hint in dmesg. But since we move to 4.4 soon and I cannot reproduce it
>> with the pending 4.4 build it seems good enough to me.
>
> Ah, this is probably fixed by 9c17d96500f78 "xen/gntdev: Grant maps should
> not be subject to NUMA balancing" then.

Oh right. That sounds very possible. Maybe paired with balancing being done
even on a non-NUMA system (because I saw the same happen on a non-NUMA host,
too). And I cannot remember ever having this with 4.2, so 4.3 seems to have
introduced the additional (or maybe more aggressive) balancing. The result was
pretty much what I saw: from one second to the next, the grant-table page of
xenstored for the running domU was invalid, without the daemon having done any
unmap. So yeah, likely the balancing got rid of it.

-Stefan

>
> Ian.
>

Attachment: signature.asc

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
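
For anyone who wants to retry the reproduction steps quoted in the message above (start one domU, then call xenstore-ls repeatedly until it locks up), a minimal sketch of such a loop could look like the following. This is not from the thread: the function name `run_until_failure`, the run limit of 50 and the 10 second timeout are all assumptions chosen for illustration. The only thing taken from the report is that the command to repeat is `xenstore-ls` and that the symptom is a hang of that call.

```python
import subprocess


def run_until_failure(cmd, max_runs=50, timeout_s=10.0):
    """Run cmd up to max_runs times and report the first hang or failure.

    Returns (iteration, reason) for the first run that times out or exits
    with a non-zero status, or None if every run completed successfully.
    The name and defaults are hypothetical, not taken from the thread.
    """
    for i in range(1, max_runs + 1):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,  # the listing itself is not needed
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,          # treat a stuck call as a hang
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising the timeout
            return (i, "timeout")
        if result.returncode != 0:
            return (i, "exit status %d" % result.returncode)
    return None
```

In dom0, with one domU running, this would be called as `run_until_failure(["xenstore-ls"])`. A `(n, "timeout")` result would correspond to the lock-up described in the report, at which point the xenstored process and any corefile can be inspected. The loop only detects the hang on the client side; it does not by itself tell whether xenstored received SIGBUS.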