
Re: [Xen-devel] Balloon driver bug in increase_reservation



On Wed, 4 Sep 2013, Wei Liu wrote:
> On Wed, Sep 04, 2013 at 02:15:51PM +0100, Stefano Stabellini wrote:
> > On Mon, 2 Sep 2013, Ian Campbell wrote:
> > > On Mon, 2013-09-02 at 16:13 +0100, Wei Liu wrote:
> > > > On Mon, Sep 02, 2013 at 04:09:26PM +0100, Ian Campbell wrote:
> > > > > On Mon, 2013-09-02 at 16:04 +0100, Wei Liu wrote:
> > > > > > On Mon, Sep 02, 2013 at 03:48:43PM +0100, Ian Campbell wrote:
> > > > > > > On Mon, 2013-09-02 at 15:43 +0100, Wei Liu wrote:
> > > > > > > > Hi, Stefano
> > > > > > > > 
> > > > > > > > I found another bug in the balloon scratch page code. As I
> > > > > > > > didn't follow the discussion on the scratch page, I cannot
> > > > > > > > propose a proper fix at the moment.
> > > > > > > > 
> > > > > > > > The problem is that in balloon.c:increase_reservation, when a
> > > > > > > > ballooned page is reused, it can have a valid P2M entry pointing
> > > > > > > > to the scratch page, hitting the BUG_ON
> > > > > > > > 
> > > > > > > >    BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
> > > > > > > >           phys_to_machine_mapping_valid(pfn));
> > > > > > > > 
> > > > > > > > As the balloon worker might run on a CPU other than the one that
> > > > > > > > returned the page, checking pfn_to_mfn(pfn) ==
> > > > > > > > local_cpu_scratch_page_mfn wouldn't work.  Checking whether
> > > > > > > > pfn_to_mfn(pfn) belongs to the set of all scratch page mfns is
> > > > > > > > not desirable either.
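
For illustration only, a rough, untested sketch of the per-cpu comparison
described above and why it is not sufficient (the per-cpu variable name
balloon_scratch_page is an assumption here, not necessarily the real one):

	/*
	 * Only catches pages handed back on the CPU that is currently
	 * running the balloon worker; a page ballooned out on another CPU
	 * still maps to that CPU's scratch page, so the BUG_ON still fires.
	 */
	unsigned long scratch_mfn =
		pfn_to_mfn(page_to_pfn(this_cpu_read(balloon_scratch_page)));

	BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
	       phys_to_machine_mapping_valid(pfn) &&
	       pfn_to_mfn(pfn) != scratch_mfn);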
> > > > > > > 
> > > > > > > This makes me think that whoever suggested that pfn_to_mfn for a
> > > > > > > ballooned-out page ought to return INVALID_MFN was right.
> > > > > > > 
> > > > > > 
> > > > > > If there are many ballooned pages, the check can be expensive.
> > > > > 
> > > > > IIRC the suggestion was that the p2m for a ballooned out page would
> > > > > contain INVALID_MFN, so the expense is just the lookup you would be
> > > > > doing anyway.
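
For illustration, a rough, untested sketch of what that buys, assuming a
ballooned-out pfn keeps INVALID_P2M_ENTRY in the p2m (INVALID_P2M_ENTRY
being the Linux-side spelling of the INVALID_MFN idea):

	/*
	 * phys_to_machine_mapping_valid(pfn) is roughly this test, so the
	 * existing BUG_ON keeps working with no knowledge of the scratch
	 * pages, local or remote.
	 */
	BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
	       pfn_to_mfn(pfn) != INVALID_P2M_ENTRY);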
> > > > > 
> > > > 
> > > > That was David's idea. Stefano was worried that other PVOPS hooks would
> > > > need to know about the MFN. I don't know how much that holds true. We
> > > > need some input from Konrad, I think.
> > > 
> > > Hrm, aren't those expected to not be operating on ballooned out pages
> > > anyway? Prior to this change those callers would have got INVALID_MFN, I
> > > think?
> > > 
> > > My gut feeling is that places which accidentally/unexpectedly have a
> > > ballooned-out page in their hands have either a virtual address or an
> > > mfn (e.g. a cr2 fault address), not a pfn, and in general won't be that
> > > interested in the pfn and/or are already geared up to deal with
> > > INVALID_MFN because that's what they would have gotten previously.
> > > There's every chance I'm wrong though.
> > 
> > 
> > I think we should just remove the BUG_ON:
> > 
> > - if the guest is autotranslate, then the BUG_ON is already irrelevant;
> > - if the guest is not autotranslate, then we are sure that the p2m of
> >   the page is pointing to a scratch page;
> > 
> > either way the BUG_ON is wrong.
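
Concretely, that would just mean dropping these two lines from
increase_reservation (an untested sketch of the proposal above, nothing
else would change):

	BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
	       phys_to_machine_mapping_valid(pfn));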
> 
> Is it there to guard against other misuse of the function? IOW can we be
> confident that all calls are legit?

The ballooned_pages list would need to be broken for this BUG_ON to
trigger.


> Messing up the P2M table would be very hard to debug, wouldn't it?
>
> > Otherwise we could simply implement an is_balloon_scratch_page function
> > that checks whether a given pfn corresponds to any of the scratch pages
> > (it doesn't need to be the scratch page of this cpu).
> 
> That's quite expensive IMHO, especially when you have lots of CPUs and
> lots of ballooned pages.

As a compromise we could run the check on the pfn only in debug builds?
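
A rough, untested sketch of what that compromise could look like (the
config symbol and the balloon_scratch_page per-cpu name are assumptions
for illustration, not actual driver code):

	#ifdef CONFIG_XEN_DEBUG_FS
	/*
	 * Debug builds only: scan every CPU's scratch page so the check
	 * works no matter which CPU originally ballooned out the page.
	 */
	static bool is_balloon_scratch_page(unsigned long pfn)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			struct page *scratch = per_cpu(balloon_scratch_page, cpu);

			if (scratch &&
			    pfn_to_mfn(pfn) == pfn_to_mfn(page_to_pfn(scratch)))
				return true;
		}
		return false;
	}
	#endif

	/* then in increase_reservation, guarded the same way: */
	#ifdef CONFIG_XEN_DEBUG_FS
	BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
	       phys_to_machine_mapping_valid(pfn) &&
	       !is_balloon_scratch_page(pfn));
	#endif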

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

