Re: [Xen-devel] Xen 4.7 crash
On 01/06/2016 23:18, Julien Grall wrote:
> Hi Andrew,
>
> On 01/06/2016 22:24, Andrew Cooper wrote:
>> On 01/06/2016 21:45, Aaron Cornelius wrote:
>>>>
>>>>> However, since I only have 1 domain active at a time, I'm not sure
>>>>> why I should run out of VM IDs.
>>>>
>>>> Sounds like a VMID resource leak.  Check to see whether it is freed
>>>> properly in domain_destroy().
>>>>
>>>> ~Andrew
>>> That would be my assumption.  But as far as I can tell,
>>> arch_domain_destroy() calls p2m_teardown() which calls
>>> p2m_free_vmid(), and none of the functionality related to freeing a
>>> VM ID appears to have changed in years.
>>
>> The VMID handling looks suspect.  It can be called repeatedly during
>> domain destruction, and it will repeatedly clear the same bit out of the
>> vmid_mask.
>
> Can you explain how p2m_free_vmid can be called multiple times?
>
> We have the following path:
>     arch_domain_destroy -> p2m_teardown -> p2m_free_vmid.
>
> And I can find only 3 calls of arch_domain_destroy, which should only be
> done once per domain.
>
> If arch_domain_destroy is called multiple times, p2m_free_vmid will not
> be the only place where Xen will be in trouble.

You are correct.  I was getting my phases of domain destruction mixed
up.  arch_domain_destroy() is called strictly once, after the RCU
reference of the domain has dropped to 0.

>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 838d004..7adb39a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1393,7 +1393,10 @@ static void p2m_free_vmid(struct domain *d)
>>      struct p2m_domain *p2m = &d->arch.p2m;
>>      spin_lock(&vmid_alloc_lock);
>>      if ( p2m->vmid != INVALID_VMID )
>> -        clear_bit(p2m->vmid, vmid_mask);
>> +    {
>> +        ASSERT(test_and_clear_bit(p2m->vmid, vmid_mask));
>> +        p2m->vmid = INVALID_VMID;
>> +    }
>>
>>      spin_unlock(&vmid_alloc_lock);
>>  }
>>
>> Having said that, I can't explain why that bug would result in the
>> symptoms you are seeing.  It is also possible that your issue is memory
>> corruption from a separate source.
>>
>> Can you see about instrumenting p2m_alloc_vmid()/p2m_free_vmid() (with
>> vmid_alloc_lock held) to see which vmid is being allocated/freed?
>> After the initial boot of the system, you should see the same vmid being
>> allocated and freed for each of your domains.
>
> Looking quickly at the log, the domain is dom1101. However, the
> maximum number of VMIDs supported is 256, so the exhaustion might be a
> race somewhere.
>
> I would be interested to get a reproducer. I wrote a script to cycle a
> domain (create/domain) in a loop, and I have not seen any issue after
> 1200 cycles (and counting).

Given that my previous thought was wrong, I am going to suggest that
some other form of memory corruption is a more likely cause.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
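
For readers following the instrumentation suggestion in this thread, here is a minimal standalone sketch of a bitmap VMID allocator that logs each allocation and free. It is not the Xen implementation: the names vmid_alloc() and vmid_free(), the -1 sentinel, and the stderr logging are hypothetical choices for illustration, and locking is left out (in Xen the equivalent code runs with vmid_alloc_lock held). The bit is cleared outside the assertion, so the clear still happens in builds where assertions are compiled out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_VMID      256   /* limit mentioned in the thread */
#define INVALID_VMID  (-1)  /* hypothetical sentinel for this sketch only */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define VMID_WORDS    ((MAX_VMID + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per VMID; a set bit means "in use". */
static unsigned long vmid_mask[VMID_WORDS];

/* Set the bit for vmid and return its previous value. */
static bool vmid_test_and_set(int vmid)
{
    unsigned long bit = 1UL << (vmid % BITS_PER_WORD);
    unsigned long *word = &vmid_mask[vmid / BITS_PER_WORD];
    bool old = (*word & bit) != 0;

    *word |= bit;
    return old;
}

/* Clear the bit for vmid and return its previous value. */
static bool vmid_test_and_clear(int vmid)
{
    unsigned long bit = 1UL << (vmid % BITS_PER_WORD);
    unsigned long *word = &vmid_mask[vmid / BITS_PER_WORD];
    bool old = (*word & bit) != 0;

    *word &= ~bit;
    return old;
}

/* Return the lowest free VMID, or INVALID_VMID if all are in use. */
int vmid_alloc(void)
{
    for (int v = 0; v < MAX_VMID; v++) {
        if (!vmid_test_and_set(v)) {
            fprintf(stderr, "vmid alloc: %d\n", v);
            return v;
        }
    }
    fprintf(stderr, "vmid alloc: exhausted\n");
    return INVALID_VMID;
}

/* Release *vmid and invalidate it so a second call is a harmless no-op. */
void vmid_free(int *vmid)
{
    if (*vmid == INVALID_VMID)
        return;

    /* Clear first, then assert on the result: the side effect must not
     * live inside the assertion expression. */
    bool was_set = vmid_test_and_clear(*vmid);

    fprintf(stderr, "vmid free: %d (was_set=%d)\n", *vmid, (int)was_set);
    assert(was_set);
    *vmid = INVALID_VMID;
}
```

With a single domain cycled repeatedly, the log from a sketch like this would be expected to show the same VMID allocated and freed on every cycle. A steadily increasing VMID would point at a leak, while an assertion failure would point at a double free or a corrupted mask.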