[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Balloon driver bug in increase_reservation
On Mon, 2 Sep 2013, Ian Campbell wrote: > On Mon, 2013-09-02 at 16:13 +0100, Wei Liu wrote: > > On Mon, Sep 02, 2013 at 04:09:26PM +0100, Ian Campbell wrote: > > > On Mon, 2013-09-02 at 16:04 +0100, Wei Liu wrote: > > > > On Mon, Sep 02, 2013 at 03:48:43PM +0100, Ian Campbell wrote: > > > > > On Mon, 2013-09-02 at 15:43 +0100, Wei Liu wrote: > > > > > > Hi, Stefano > > > > > > > > > > > > I found another bug in the balloon scratch page code. As I didn't > > > > > > follow > > > > > > the discussion on scratch page so I cannot propose a proper fix at > > > > > > the > > > > > > moment. > > > > > > > > > > > > The problem is that in balloon.c:increase_reservation, when a > > > > > > ballooned > > > > > > page is resued, it can have a valid P2M entry pointing to the > > > > > > scratch, > > > > > > hitting the BUG_ON > > > > > > > > > > > > BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) && > > > > > > phys_to_machine_mapping_valid(pfn)); > > > > > > > > > > > > As balloon worker might run by a CPU other then the one that > > > > > > returns the > > > > > > page, checking pfn_to_mfn(pfn) == local_cpu_scratch_page_mfn > > > > > > wouldn't > > > > > > work. Checking pfn_to_mfn(pfn) belongs to the set of all scratch > > > > > > page > > > > > > mfns is not desirable. > > > > > > > > > > This makes me think that whoever suggested that pfn_to_mfn for a > > > > > ballooned page out to return INVALID_MFN was right. > > > > > > > > > > > > > If there are many balloon pages the check can be expensive. > > > > > > IIRC the suggestion was that the p2m for a ballooned out page would > > > contain INVALID_MFN, so the expense is just the lookup you would be > > > doing anyway. > > > > > > > That was David's idea. Stefano was worried that other PVOPS hooks would > > need to know about the MFN. I don't know how much it holds true. Need > > some input from Konrad I think. > > Hrm, aren't those expected to not be operating on ballooned out pages > anyway? Prior to this change those callers would have got INVALID_MFN, I > think? > > My gut feeling is that places which accidentally/unexpectedly have a > ballooned out page in their hand have either a virtual address or an mfn > (e.g. a cr2 fault address) and not a pfn in their hands, and won't in > general be that interested in the pfn and/or are already geared up to > deal with INVALID_MFN because previously that's what they would have > gotten. There's every chance I'm wrong though. I think we should just remove the BUG_ON: - if the guest is autotranslate, then the BUG_ON is already irrelevant; - if the guest is not autotranslate, then we are sure that the p2m of the page is pointing to a scratch page; either way the BUG_ON is wrong. Otherwise could simply implement a is_balloon_scratch_page function that checks whether a given pfn corresponds to any of the scratch pages (it doesn't need to be the scratch page of this cpu). _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |