[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen 4.3 + tmem = Xen BUG at domain_page.c:143
>>> On 12.06.13 at 13:00, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote: > create ^ > title it map_domain_page second-stage emergency fallback path never taken > thanks > > On Tue, Jun 11, 2013 at 7:52 PM, konrad wilk <konrad.wilk@xxxxxxxxxx> wrote: >> >>> The BUG_ON() here is definitely valid - a few lines down, after the >>> enclosing if(), we use it in ways that requires this to not have >>> triggered. It basically tells you whether an in range idx was found, >>> which apparently isn't the case here. >>> >>> As I think George already pointed out - printing accum here would >>> be quite useful: It should have at least one of the low 32 bits set, >>> given that dcache->entries must be at most 32 according to the >>> data you already got logged. >> >> >> With extra debugging (see attached patch) >> >> (XEN) domain_page.c:125:d1 mfn: 1eb483, [0]: bffff1ff, ~ffffffff40000e00, >> idx: 9 garbage: 40000e00, inuse: ffffffff >> (XEN) domain_page.c:125:d1 mfn: 1eb480, [0]: fdbfffff, ~ffffffff02400000, >> idx: 22 garbage: 2400000, inuse: ffffffff >> (XEN) domain_page.c:125:d1 mfn: 2067ca, [0]: fffff7ff, ~ffffffff00000800, >> idx: 11 garbage: 800, inuse: ffffffff >> (XEN) domain_page.c:125:d1 mfn: 183642, [0]: ffffffff, ~ffffffff00000000, >> idx: 32 garbage: 0, inuse: ffffffff > > So regardless of the fact that tmem is obviously holding what are > supposed to be short-term references for so long, there is something > that seems not quite right about this failure path. > > It looks like the algorithm is: > 1. Clean the garbage map and update the inuse list > 2. If anything has been cleaned up, use the first not-inuse entry > 3. Otherwise, do something else ("replace a hash entry" -- not sure > exactly what that means). > > What we see above is that this failure path succeeds three times, but > fails the fourth time: there are, in fact, no zero entries after the > garbage clean-up; however, because "inuse" is 32-bit (effectively) and > "accum" is 64-bit, ~inuse always has bits 32-63 set, and so will > always return true and never fall back to the "something else" Right, that's what occurred to me too yesterday, but the again I knew I had seen this code path executed. Now that I look again, I think I understand why: All of my Dom0-s and typical DomU-s have a vCPU count divisible by 4, and with MAPCACHE_VCPU_ENTRIES being 16, the full unsigned long would always be used. > This is probably not something we need to fix for 4.3, but we should > put it on our to-do list. Actually I think we should fix this right away. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |