
Re: [Xen-devel] xen-4.6: xenstored crashes during domain->interface access

On 28.01.2016 10:39, Ian Campbell wrote:
> On Thu, 2016-01-28 at 09:50 +0100, Stefan Bader wrote:
>> On 26.01.2016 11:58, Stefan Bader wrote:
>>> Hi,
>>> while playing around with xen-4.6 I stumbled over an odd problem and am
>>> wondering whether anybody has seen the same. A method to relatively
>>> quickly reproduce this for me seems to be:
>>> - Start one domU (PV or HVM does not seem to matter)
>>> - Repeatedly call xenstore-ls a few times
>>> I think I never got beyond 10 repeats when the xenstore-ls call suddenly
>>> locks up and xenstored crashes with a SIGBUS error. In the majority of
>>> cases (I think I saw one different), the crash happens while accessing
>>> conn->domain->interface in
>>> tools/xenstore/xenstored_domain.c:domain_can_read().
>>> Looking at the corefile produced by xenstored I now got at least one case
>>> where the pointer still matches the previously mapped value. Though I
>>> think I had also at least one run (with less debugging added) where it
>>> seemed to be really wrong.
>>> There is more info at [1] in case someone is interested.
>>> I need to repeat a few more times to see how consistent the whole thing
>>> is. Does this happen for anybody else? Any advice what I should look at
>>> (in the sense of gathering better data)?
>> Just as an update and confirmation for Ian and Bastian: Debian testing is
>> fine. I have not dug into the specifics but it's not the Xen package side
>> at all. Something in our 4.3 kernel causes this. Unfortunately without any
>> hint in dmesg. But since we move to 4.4 soon and I cannot reproduce it with
>> the pending 4.4 build it seems good enough to me.
> Ah, this is probably fixed by 9c17d96500f78 "xen/gntdev: Grant maps should
> not be subject to NUMA balancing" then.

Oh right. That sounds very plausible. Maybe paired with balancing being done
even on a non-NUMA system (because I saw the same happen on a non-NUMA host,
too). And I cannot remember ever seeing this with 4.2, so 4.3 seems to have
introduced the additional (or maybe more aggressive) balancing.
The result matches pretty much what I saw: from one second to the next the
grant-table page xenstored had mapped for the running domU was invalid,
without the daemon having done any unmap. So yeah, likely the balancing got
rid of it.

> Ian.

