
Re: [Xen-devel] Xen 4.7 crash



On Tue, Jun 14, 2016 at 09:38:22AM -0400, Aaron Cornelius wrote:
> On 6/14/2016 9:26 AM, Aaron Cornelius wrote:
> >On 6/14/2016 9:15 AM, Wei Liu wrote:
> >>On Tue, Jun 14, 2016 at 09:11:47AM -0400, Aaron Cornelius wrote:
> >>>On 6/9/2016 7:14 AM, Ian Jackson wrote:
> >>>>Aaron Cornelius writes ("Re: [Xen-devel] Xen 4.7 crash"):
> >>>>>I am not that familiar with the xenstored code, but as far as I can tell
> >>>>>the grant mapping will be held by the xenstore until the xs_release()
> >>>>>function is called (which is not called by libxl, and I do not
> >>>>>explicitly call it in my software, although I might now just to be
> >>>>>safe), or until the last reference to a domain is released and the
> >>>>>registered destructor (destroy_domain), set by talloc_set_destructor(),
> >>>>>is called.
> >>>>
> >>>>I'm not sure I follow.  Or maybe I disagree.  ISTM that:
> >>>>
> >>>>The grant mapping is released by destroy_domain, which is called via
> >>>>the talloc destructor as a result of talloc_free(domain->conn) in
> >>>>domain_cleanup.  I don't see other references to domain->conn.
> >>>>
> >>>>domain_cleanup calls talloc_free on domain->conn when it sees the
> >>>>domain marked as dying in domain_cleanup.
> >>>>
> >>>>So I still think that your acl reference ought not to keep the grant
> >>>>mapping alive.
> >>>
> >>>It took a while to complete the testing, but we've finished trying to
> >>>reproduce the error using oxenstored instead of the C xenstored.  When the
> >>>condition occurs that caused the error with the C xenstored (on
> >>>4.7.0-rc4/8478c9409a2c6726208e8dbc9f3e455b76725a33), oxenstored does not
> >>>cause the crash.
> >>>
> >>>So for whatever reason, it would appear that the C xenstored does keep the
> >>>grant allocations open, but oxenstored does not.
> >>>
> >>
> >>Can you provide some easy to follow steps to reproduce this issue?
> >>
> >>AFAICT your environment is very specialised, but we should be able to
> >>trigger the issue with plain xenstore-* utilities?
> >
> >I am not sure if the plain xenstore-* utilities will work, but here are
> >the steps to follow:
> >
> >1. Create a non-standard xenstore path: /tool/test
> >2. Create a domU (mini-os/mirage/something small)
> >3. Add the new domU to the /tool/test permissions list (I'm not 100%
> >sure how to do this with the xenstore-* utilities)
> >    a. call xs_get_permissions()
> >    b. realloc() the permissions block to add the new domain
> >    c. call xs_set_permissions()
> >4. Delete the domU from step 2
> >5. Repeat steps 2-4
> >
> >Eventually the xs_set_permissions() function will return an E2BIG error
> >because the list of domains has grown too large.  Sometime after that is
> >when the crash occurs with the C xenstored and the 4.7.0-rc4 version of
> >Xen.  It usually takes around 1200 iterations for the crash to occur.
> 
> After writing up those steps I suddenly realized that I think I have a bug
> in my test that might have been causing the bug in the first place. Once I
> get errors returned from xs_set_permissions() I was not properly cleaning up
> the created domains.  So I think this was just a simple case of VMID
> exhaustion by creating more than 255 domUs at the same time.
> 
> In which case this is completely unrelated to xenstore holding on to grant
> allocations, and the C xenstore most likely behaves correctly.
> 

OK, so I will treat this issue as resolved for now. Let us know if you
discover something new.

Wei.

> - Aaron Cornelius
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
