Re: [Xen-devel] null domains after xl destroy
On 13/05/17 06:02, Glenn Enright wrote:
> On 09/05/17 21:24, Roger Pau Monné wrote:
>> On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:
>>> On 04/05/17 00:17, Glenn Enright wrote:
>>>> On 04/05/17 04:58, Steven Haigh wrote:
>>>>> On 04/05/17 01:53, Juergen Gross wrote:
>>>>>> On 03/05/17 12:45, Steven Haigh wrote:
>>>>>>> Just wanted to give this a little nudge now people seem to be
>>>>>>> back on deck...
>>>>>>
>>>>>> Glenn, could you please give the attached patch a try?
>>>>>>
>>>>>> It should be applied on top of the other correction, the old debug
>>>>>> patch should not be applied.
>>>>>>
>>>>>> I have added some debug output to make sure we see what is happening.
>>>>>
>>>>> This patch is included in kernel-xen-4.9.26-1
>>>>>
>>>>> It should be in the repos now.
>>>>>
>>>>
>>>> Still seeing the same issue. Without the extra debug patch all I see in
>>>> the logs after destroy is this...
>>>>
>>>> xen-blkback: xen_blkif_disconnect: busy
>>>> xen-blkback: xen_blkif_free: delayed = 0
>>>
>>> Hmm, to me it seems as if some grant isn't being unmapped.
>>>
>>> Looking at gnttab_unmap_refs_async() I wonder how this is supposed to
>>> work:
>>>
>>> I don't see how a grant would ever be unmapped in case of
>>> page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it
>>> does is deferring the call to the unmap operation again and again. Or
>>> am I missing something here?
>>
>> No, I don't think you are missing anything, but I cannot see how this can be
>> solved in a better way, unmapping a page that's still referenced is certainly
>> not the best option, or else we risk triggering a page-fault elsewhere.
>>
>> IMHO, gnttab_unmap_refs_async should have a timeout, and return an error at
>> some point. Also, I'm wondering whether there's a way to keep track of who has
>> references on a specific page, but so far I haven't been able to figure out how
>> to get this information from Linux.
>>
>> Also, I've noticed that __gnttab_unmap_refs_async uses page_count, shouldn't it
>> use page_ref_count instead?
>>
>> Roger.
>>
>
> In case it helps, I have continued to work on this. I noticed processes
> left behind (under 4.9.27). The same issue is ongoing.
>
> # ps auxf | grep [x]vda
> root      2983  0.0  0.0      0     0 ?        S    01:44   0:00  \_ [1.xvda1-1]
> root      5457  0.0  0.0      0     0 ?        S    02:06   0:00  \_ [3.xvda1-1]
> root      7382  0.0  0.0      0     0 ?        S    02:36   0:00  \_ [4.xvda1-1]
> root      9668  0.0  0.0      0     0 ?        S    02:51   0:00  \_ [6.xvda1-1]
> root     11080  0.0  0.0      0     0 ?        S    02:57   0:00  \_ [7.xvda1-1]
>
> # xl list
> Name          ID   Mem VCPUs  State   Time(s)
> Domain-0       0  1512     2  r-----    118.5
> (null)         1     8     4  --p--d     43.8
> (null)         3     8     4  --p--d      6.3
> (null)         4     8     4  --p--d     73.4
> (null)         6     8     4  --p--d     14.7
> (null)         7     8     4  --p--d       30
>
> Those all have...
>
> [root 11080]# cat wchan
> xen_blkif_schedule
>
> [root 11080]# cat stack
> [<ffffffff814eaee8>] xen_blkif_schedule+0x418/0xb40
> [<ffffffff810a0555>] kthread+0xe5/0x100
> [<ffffffff816f1c45>] ret_from_fork+0x25/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff

And found another reference count bug.

Would you like to give the attached patch (to be applied additionally to
the previous ones) a try?

Juergen

Attachment:
0003-xen-blkback-don-t-use-xen_blkif_get-in-xen-blkback-k.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
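
The deferral behaviour discussed in the quoted thread (the unmap is postponed for as long as the page count stays above 1, and Roger suggests a timeout that returns an error) can be illustrated with a toy user-space model. This is only a sketch and not the kernel code: `Page`, `unmap_with_retry_limit`, `max_attempts` and `on_retry` are hypothetical names made up for the illustration, and a plain integer stands in for the page reference count.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Page:
    """Toy stand-in for a granted page; refs models its reference count."""
    refs: int


def unmap_with_retry_limit(
    page: Page,
    max_attempts: int,
    on_retry: Optional[Callable[[Page, int], None]] = None,
) -> bool:
    """Model of a deferred unmap that gives up after max_attempts.

    Returns True once only one reference is left (safe to unmap) and
    False if the page is still referenced after max_attempts checks.
    Without the limit, a leaked reference would defer the unmap forever.
    """
    for attempt in range(1, max_attempts + 1):
        if page.refs <= 1:
            return True  # nobody else holds the page, unmap can proceed
        if on_retry is not None:
            # Hypothetical hook: lets a caller simulate another holder
            # dropping its reference between two attempts.
            on_retry(page, attempt)
    return False  # assumed timeout path: report an error instead of looping


if __name__ == "__main__":
    leaked = Page(refs=2)  # an extra reference that is never dropped
    print(unmap_with_retry_limit(leaked, max_attempts=5))
```

With a leaked reference the model reports failure after five attempts, which corresponds to the case where the real code would keep deferring and the domain would stay in the `(null)` state. If the hook drops the extra reference at some attempt, the next check succeeds.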