[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] QEMU bumping memory bug analysis
On Tue, 9 Jun 2015, Ian Campbell wrote: > On Tue, 2015-06-09 at 11:14 +0100, Stefano Stabellini wrote: > > I don't think that the current solution is inherently racy. If we are > > really interested in an honest evaluation of the current solution in > > terms of races, I would be happy to do so. > > Consider a domain with 1M of RAM (==target and maxmem for the sake of > argument) and two simultaneous calls to libxl_set_memory_target, both > with relative=0 and enforce=1, one with target=3 and the other with > target=5. > > target=3 call target=5 call > > get ptr.max_memkb, gets 1 > > get ptr.max_memkb, gets 1 > start transaction > do xenstore stuff, seeing target=1, setting target=3 > memorykb = ptr.max_memkb - current_target_memkb + new_target_memkb; > memorykb = 1 - 1 + 3 > xc_setmaxmem(3) > transaction commit (success) > > Now target=maxmem=3 > > start transaction > do xenstore stuff, seeing target=3, setting > target=5 > memorykb = ptr.max_memkb - current_target_memkb > + new_target_memkb; > memorykb = 1 - 3 + 5 > xc_setmaxmem(3) > transaction commit (success) > > Now target=5, maxmem=3. Yes, I see the issue. Pretty nasty! > The obvious fix of moving the get ptr.max_memkb inside the transaction > fails in a different way in the case where the first transaction commit > fails and is retried after the second one, I think. The underlying problem is that xc_domain_setmaxmem only takes absolute values. Retrieving the old maxmem value and setting the new one cannot be done atomically, so they need to be protected by a lock or a transaction. You are right that moving get ptr.max_memkb inside the transaction would only fix the problem if we also properly rolled back the old maxmem value in case of failures. But I don't think it is actually possible because it could race against other libxl_set_memory_target calls. Am I wrong? This would also be a problem before 0c029c4da21. > Prior to 0c029c4da21 "libxl_set_memory_target: retain the same maxmem > offset on top of the current target" this issue didn't exist because > memorykb was just memorykb = new_target_memkb + videoram. It is true that reverting 0c029c4da21 would improve the situation. However I think that you found a problem here that goes beyond 0c029c4da21 :-( > BTW, I noticed some other (unrelated) dubious stuff in there while > looking at this, in particular the setmaxmem is not rolled back if > set_pod_target fails. Yes, you are right! It is also not rolled back in case the xenstore transaction fails with errno != EAGAIN, even before 0c029c4da21. > Also the transiently "wrong" maxmem between the > setmaxmem and a failed transaction getting redone might possibly give > rise to some interesting cases, especially if anything else fails the > second time around the loop. I am thinking that the while idea of an "enforce" option was a bad to begin with. Not only we should revert 0c029c4da21, but actually we should just get rid of "enforce" altogether? I am suggesting it because I don't think that reverting 0c029c4da21 would fix all the issues. In a pre-0c029c4da21 world: target=3 enforce=1 start transaction do xenstore stuff, seeing target=1, setting target=3 memorykb = new_target_memkb + videoram; xc_setmaxmem(3+videoram) transaction commit (fail errno=ESOMETHING) Now target=1, maxmem=3+videoram Unless we say that we don't care about leaving the system in a consistent state in case of failures != EAGAIN. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |