Re: [Xen-devel] Xen 4.7 crash
On Mon, Jun 06, 2016 at 03:05:47PM +0100, Julien Grall wrote:
> (CC Ian, Stefano and Wei)
>
> Hello Aaron,
>
> On 06/06/16 14:58, Aaron Cornelius wrote:
> > On 6/2/2016 5:07 AM, Julien Grall wrote:
> >> Hello Aaron,
> >>
> >> On 02/06/2016 02:32, Aaron Cornelius wrote:
> >>> This is with a custom application; we use the libxl APIs to interact
> >>> with Xen. Domains are created using the libxl_domain_create_new()
> >>> function and destroyed using the libxl_domain_destroy() function.
> >>>
> >>> The test in this case creates a domain, waits a minute, then
> >>> deletes/creates the next domain, waits a minute, and so on. So I
> >>> wouldn't be surprised to see the VMID occasionally indicate there
> >>> are 2 active domains, since one could be being created while another
> >>> is being destroyed within a very short window. However, I wouldn't
> >>> expect to ever have 256 domains.
> >>
> >> Your log has:
> >>
> >> (XEN) grant_table.c:3288:d0v1 Grant release (0) ref:(9) flags:(2) dom:(0)
> >> (XEN) grant_table.c:3288:d0v1 Grant release (1) ref:(11) flags:(2) dom:(0)
> >>
> >> which suggests that some grants are still mapped in DOM0.
> >>
> >>> The CubieTruck only has 2GB of RAM, and I allocate 512MB for dom0,
> >>> which means that only 48 of the Mirage domains (with 32MB of RAM
> >>> each) would fit at the same time anyway, and that doesn't account
> >>> for the various inter-domain resources or the RAM used by Xen
> >>> itself.
> >>
> >> All the pages that belong to the domain could have been freed except
> >> the ones still referenced by DOM0, so the footprint of the domain
> >> should be limited at that point.
> >>
> >> I would recommend checking how many domains are running at that time
> >> and whether DOM0 has effectively released all the resources.
> >>
> >>> If the p2m_teardown() function checked for NULL it would prevent the
> >>> crash, but I suspect Xen would be just as broken, since all of my
> >>> resources have leaked away.
> >>> More broken, in fact, since if the board reboots at least the
> >>> applications will restart and domains can be recreated.
> >>>
> >>> It certainly appears that some resources are leaking when domains
> >>> are deleted (possibly only on the ARM or ARM32 platforms). We will
> >>> try to add some debug prints and see if we can discover exactly
> >>> what is going on.
> >>
> >> The leakage could also happen in DOM0. FWIW, I have been able to
> >> cycle 2000 guests overnight on an ARM platform.
> >
> > We've done some more testing regarding this issue. Further testing
> > shows that it doesn't matter whether we delete the vchans before the
> > domains are deleted; those appear to be cleaned up correctly when the
> > domain is destroyed.
> >
> > What does stop this issue from happening (using the same version of
> > Xen the issue was detected on) is removing any non-standard xenstore
> > references before deleting the domain. In this case our application
> > grants created domains permissions on non-standard xenstore paths.
> > Making sure to remove those domain permissions before deleting the
> > domain prevents this issue from happening.
>
> I am not sure I understand what you mean here. Could you give a quick
> example?
>
> > It does not appear to matter whether we delete the standard domain
> > xenstore path (/local/domain/<id>), since libxl handles removing this
> > path when the domain is destroyed.
> >
> > Based on this I would guess that xenstore is hanging onto the VMID.

This is a somewhat strange conclusion. I guess the root cause is still
unclear at this point. Is it possible that something else relies on
those xenstore nodes to free up resources?

Wei.

> Regards,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
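[Editorial sketch] The create/wait/destroy cycle Aaron describes can be
approximated from dom0 with the xl toolstack. This is only an
illustration of the test pattern, not the original custom libxl
application; the config file name "mirage.cfg" and the guest name
"mirage" are assumptions, and the script must run on a Xen host.

```shell
#!/bin/sh
# Approximate the reported test: create a guest, wait a minute,
# destroy it, and repeat, watching whether domids/VMIDs keep climbing.
# 'mirage.cfg' stands in for the 32MB Mirage guest configuration,
# whose name= entry is assumed to be 'mirage'.
i=0
while [ "$i" -lt 300 ]; do
    xl create mirage.cfg        # each create allocates a fresh domid
    sleep 60
    domid=$(xl domid mirage)    # look up the domid by guest name
    xl destroy "$domid"
    xl list                     # should show only Domain-0 between cycles
    i=$((i + 1))
done
```

If the leak discussed in this thread is present, `xl list` (or the
domids printed at each iteration) would show the ids climbing toward
the VMID limit instead of being reused after each destroy.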
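[Editorial sketch] The workaround Aaron describes, dropping the
domain's permissions on non-standard xenstore paths before destroying
it, can be sketched with the xenstore command-line clients. The path
/custom/app/state and the layout under it are hypothetical stand-ins
for the application's private nodes; this requires a running Xen host.

```shell
#!/bin/sh
# Hypothetical cleanup before destroying domain $1.
# /custom/app/state stands in for the application's non-standard path.
DOMID="$1"

# Reset the node's permission list so it no longer references $DOMID
# (n0 = owned by dom0, readable/writable by no other domain).
xenstore-chmod /custom/app/state n0

# Or remove the application's per-domain node outright (assumed layout).
xenstore-rm /custom/app/state/"$DOMID"

# libxl removes /local/domain/$DOMID itself during destroy, so only
# the custom paths need explicit cleanup.
xl destroy "$DOMID"
```

The point, per the thread, is ordering: clearing the custom permissions
and nodes before the destroy is what avoided the leak; the standard
/local/domain/&lt;id&gt; path needs no manual handling.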