
Re: [Xen-devel] QEMU bumping memory bug analysis



On Fri, 5 Jun 2015, Wei Liu wrote:
> On Fri, Jun 05, 2015 at 06:10:17PM +0100, Stefano Stabellini wrote:
> > On Fri, 5 Jun 2015, Wei Liu wrote:
> > > Hi all
> > > 
> > > This bug is now considered a blocker for 4.6 release.
> > > 
> > > The premises of the problem remain the same (George's translated
> > > version):
> > > 
> > > 1. QEMU may need extra pages from Xen to implement option ROMs, and so at
> > >    the moment it calls set_max_mem() to increase max_pages so that it can
> > >    allocate more pages to the guest.  libxl doesn't know what max_pages a
> > >    domain needs prior to qemu start-up.
> > > 
> > > 2. Libxl doesn't know max_pages even after qemu start-up, because there
> > >    is no mechanism to communicate between qemu and libxl.
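
For concreteness, a minimal sketch (not QEMU's actual code) of the pattern
point 1 above describes: before populating extra pages for something like an
option ROM, the device model reads the current limit from Xen, raises it by
what it needs, then populates the new pfns. The extra_pages and base_pfn
parameters are made up for illustration.

    #include <stdlib.h>
    #include <xenctrl.h>

    static int bump_and_populate(xc_interface *xch, uint32_t domid,
                                 unsigned long extra_pages, xen_pfn_t base_pfn)
    {
        xc_dominfo_t info;
        xen_pfn_t *pfns;
        unsigned long i;
        int rc;

        /* Read the current limit so we only grow it by what we need. */
        if (xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid)
            return -1;

        /* max_memkb is in KiB; grow it by extra_pages worth of 4 KiB pages. */
        rc = xc_domain_setmaxmem(xch, domid,
                                 info.max_memkb +
                                 (extra_pages << (XC_PAGE_SHIFT - 10)));
        if (rc < 0)
            return rc;

        pfns = malloc(extra_pages * sizeof(*pfns));
        if (!pfns)
            return -1;
        for (i = 0; i < extra_pages; i++)
            pfns[i] = base_pfn + i;

        /* Now the allocation no longer trips over the old max_pages. */
        rc = xc_domain_populate_physmap_exact(xch, domid, extra_pages,
                                              0, 0, pfns);
        free(pfns);
        return rc;
    }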
> > 
> > I might not know what is the right design for the overall solution, but
> > I do know that libxl shouldn't have its own state tracking for
> > max_pages, because max_pages is kept, maintained and enforced by Xen.
> > 
> > Ian might still remember, but at the beginning of the xl/libxl project,
> > we had few simple design principles. One of which was that we should not
> > have two places where we keep track of the same thing. If Xen keeps
> > track of something, libxl should avoid it.
> > 
> > In this specific case, libxl can ask Xen at any time what max_pages is
> > for the domain, so I don't think that libxl should store it or have its
> > own tracking for it.
> > 
> > Even if QEMU called into libxl to change max_pages, I don't think that
> > libxl should store max_pages anywhere. It is already stored in Xen and
> > can be retrieved at any time.
> > 
> 
> I think you're talking about keeping track of that record permanently. I
> only care about getting the right value at the right time and transferring
> it to the other end. Getting the value whenever needed is OK.
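
As a concrete version of "getting the value whenever needed": anything holding
a privileged xc handle can read the current limit straight back from Xen. A
hedged sketch, with error handling trimmed:

    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <xenctrl.h>

    int main(int argc, char **argv)
    {
        uint32_t domid = argc > 1 ? atoi(argv[1]) : 1;
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        xc_dominfo_t info;

        if (!xch)
            return 1;

        /* Ask Xen for this one domain's info; getinfo can return a later
         * domid, so confirm we got the domain we asked about. */
        if (xc_domain_getinfo(xch, domid, 1, &info) == 1 && info.domid == domid)
            printf("dom%u max_memkb = %" PRIu64 "\n",
                   domid, (uint64_t)info.max_memkb);

        xc_interface_close(xch);
        return 0;
    }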
> 
> > 
> > > 3. QEMU calls xc_domain_setmaxmem to increase max_pages by N pages.
> > >    Those pages are only accounted for in the hypervisor.  Libxl
> > >    (currently) does not extract that value from the hypervisor.
> > > 
> > > Several solutions were proposed:
> > > 
> > > 1. Add a new record type to the libxc migration stream and call
> > >    setmaxmem in the middle of the xc migration stream.
> > > 
> > > The main objection is that calling xc_domain_setmaxmem in the middle of
> > > the xc migration stream is a layering violation. It also prevents us
> > > from disaggregating domain construction to a less privileged domain.
> > 
> > It seems to me that max_pages is one of the memory properties of the
> > domain, so it should be saved and restored together with the rest of
> > memory.
> > 
> 
> #1 was actually referring to Don's patch. When processing a libxc record
> in the middle of the stream, we should not alter the size of memory.
> It's not safe because we don't know whether that record comes early
> enough, i.e. before we exceed the limit.
> 
> The only safe way of doing it is to mandate that specific record at
> the beginning of libxc stream, which might have other implications.

I see, that makes sense.
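
To make the ordering constraint concrete, such a record might look like the
sketch below. The name and type number are invented for illustration; only
the 8-byte header layout follows the existing migration v2 convention. The
restore side would have to apply it (via xc_domain_setmaxmem) before any page
data is processed, hence "at the beginning of the stream".

    #include <stdint.h>

    /* Hypothetical record type number; not an assigned v2 record type. */
    #define REC_TYPE_MAX_PAGES  0x0000000fU

    struct rec_hdr {
        uint32_t type;      /* REC_TYPE_* */
        uint32_t length;    /* body length in bytes, excluding padding */
    };

    struct rec_max_pages {
        uint64_t max_pages; /* limit to apply before any PAGE_DATA records */
    };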


> > 
> > > 2. Use the libxl toolstack save/restore blob to transmit max_pages
> > >    information to the remote end.
> > > 
> > > This is considered a bodge and has been proven not to work because
> > > toolstack blob restore happens after xc_domain_restore.
> > 
> > Saving and restoring max_pages in libxl seems to me like a layering
> > violation. I would avoid 2. 3. 4. and 5.
> > 
> 
> No, not really, if we follow the principle that "libxl is the
> arbitrator": it needs to have thorough information on every aspect of
> the domain and set up the limits.

I am not sure I buy this "libxl is the arbitrator" concept. I am not
seeing libxl as adding much value in this context.

 
> > > 3. Add a libxl layer that wraps the necessary information, taking over
> > >    Andrew's work on libxl migration v2.  Having a libxl layer that's not
> > >    part of migration v2 is a waste of effort.
> > > 
> > > There are several obstacles for libxl migration v2 at the moment. The
> > > libxl layer in migration v2 still has unresolved issues, and it has
> > > inter-dependencies with Remus / COLO.
> > > 
> > > Most importantly, it doesn't inherently solve the problem. It still
> > > requires the current libxl JSON blob to contain information about max
> > > pages (or information used to derive max pages).
> > > 
> > > Andrew, correct me if I'm wrong.
> > > 
> > > 4. Add a non-user-configurable field to the current libxl JSON structure
> > >    to record max_pages information.
> > > 
> > > This is not desirable. All fields in libxl JSON should be
> > > user-configurable.
> > > 
> > > 5. Add a user-configurable field to the current libxl JSON structure to
> > >    record how much more memory this domain needs. The admin is required
> > >    to fill in that value manually. In the meantime we revert the change
> > >    in QEMU and declare QEMU with that change buggy.
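
For concreteness, on the libxl side solution 5 would boil down to something
like the sketch below at domain build time. The max_extra_memkb field name is
hypothetical, and the fixed slack only mirrors the spirit of the fudge factor
libxl already adds today.

    #include <stdint.h>
    #include <xenctrl.h>

    /* Sketch of solution 5: libxl, as the only caller of the maxmem domctl,
     * folds an admin-supplied allowance into the limit it sets, so QEMU
     * never needs to call xc_domain_setmaxmem() itself. */
    static int set_build_maxmem(xc_interface *xch, uint32_t domid,
                                uint64_t target_memkb,
                                uint64_t video_memkb,
                                uint64_t max_extra_memkb /* hypothetical */)
    {
        uint64_t slack_kb = 1024;  /* small fixed slack, as libxl uses today */

        return xc_domain_setmaxmem(xch, domid,
                                   target_memkb + video_memkb +
                                   max_extra_memkb + slack_kb);
    }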
> > 
> > QEMU 2.3.0 was released with that change in it, so it is not quite
> > possible to revert it. Also I think it is the right change for QEMU.
> > 
> 
> It has security implications. Here is my reply copied from my mail to
> Ian:
> 
> I'm considering removing the need for xc_domain_setmaxmem regardless of
> this bug, because it is going to cause problems for QEMU upstream stubdom
> with a strict XSM policy and for deprivileged QEMU (which may not have
> the privilege to call setmaxmem).

QEMU running in the stubdom should be able to set the maxmem for its
target domain, but not for the others.


> The security implication as it stands is big enough. A malicious guest
> that controls QEMU has a vector to DoS the hypervisor by setting its own
> max_pages to -1.
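
Spelled out, the vector is nothing more than the call below: a compromised
device model that still holds an unrestricted privileged handle lifts its
target domain's limit to "unlimited" and can then keep populating pages until
the host runs dry. Only XSM (or an arbitrating toolstack) would stop it.

    #include <xenctrl.h>

    /* Illustration only: with an unrestricted handle, nothing prevents a
     * compromised QEMU from removing its own domain's memory cap. */
    static void lift_limit(xc_interface *xch, uint32_t target_domid)
    {
        /* ~0 KiB: the XEN_DOMCTL_max_mem limit effectively never triggers. */
        xc_domain_setmaxmem(xch, target_domid, ~0UL);
    }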

Is that an issue in the implementation of XEN_DOMCTL_max_mem on the Xen
side?  Could you please give me a bit more info on this security issue?


> > 
> > > No response to this so far. But in fact I consider this the most viable
> > > solution.
> > > 
> > > It's a simple enough solution that is achievable within the 4.6 time
> > > frame. It doesn't prevent us from doing useful work in the future
> > > (disaggregated architecture with stricter security policy). It provides
> > > a way to work around buggy QEMU (the admin sets that value to prevent
> > > QEMU from bumping the memory limit). It's orthogonal to migration v2,
> > > which means it won't be blocked by migration v2 or block migration v2.
> > > 
> > > I tend to go with solution 5. Speak up if you don't agree with my
> > > analysis or you think I've missed some aspects.
> > 
> > I disagree.
> > 
> > 
> > > For the long term we need to:
> > > 
> > > 1. Establish libxl as the arbitrator of how many pages a domain can
> > >    have. Anything else doing stuff behind the arbitrator's back is
> > >    considered buggy. This principle probably applies to other aspects
> > >    of a domain as well.
> > 
> > I disagree that libxl should be the arbitrator of a property that is
> > stored, maintained and enforced by Xen. Xen should be the arbitrator.
> > 
> 
> But it would be a burden for Xen once the arbitration goes beyond the
> issue we discuss here. It certainly doesn't have as much information as
> libxl has. In the long run we would still end up with libxl doing most
> of the work, so we might as well establish the principle now.

I would like to see some concrete examples of how libxl is adding value
in this context instead of just adding layers of indirection.

 
> > 
> > > 2. Work out a way for QEMU and libxl to communicate. This can be
> > >    expanded to cover other components in a Xen setup, but currently we
> > >    only have QEMU.
> > 
> > Even if QEMU called into libxl to change maxmem, I don't think that
> > libxl should store maxmem anywhere. It is already stored in Xen.
> 
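
As a thought experiment for long-term point 2, the communication could simply
be QEMU routing the request through libxl instead of hitting the domctl
directly, so libxl gets a chance to arbitrate or veto. A hedged sketch,
assuming libxl's existing libxl_domain_setmaxmem entry point (or something
shaped like it) were used as the channel; whether that is the right entry
point is exactly what is being debated here.

    #include <libxl.h>

    /* Sketch: the device model asks the toolstack library, not Xen
     * directly, to grow the limit. */
    static int request_extra_memory(libxl_ctx *ctx, uint32_t domid,
                                    uint64_t new_max_kb)
    {
        return libxl_domain_setmaxmem(ctx, domid, new_max_kb);
    }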

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

