Re: [Xen-devel] null domains after xl destroy
On 03/05/17 12:45, Steven Haigh wrote:

Just wanted to give this a little nudge now people seem to be back on deck...

On 04/05/17 01:53, Juergen Gross wrote:

Glenn, could you please give the attached patch a try?

It should be applied on top of the other correction, the old debug patch should not be applied.

I have added some debug output to make sure we see what is happening.

On 04/05/17 04:58, Steven Haigh wrote:

This patch is included in kernel-xen-4.9.26-1. It should be in the repos now.

On 04/05/17 00:17, Glenn Enright wrote:

Still seeing the same issue. Without the extra debug patch all I see in the logs after destroy is this...

    xen-blkback: xen_blkif_disconnect: busy
    xen-blkback: xen_blkif_free: delayed = 0

On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:

Hmm, to me it seems as if some grant isn't being unmapped.

Looking at gnttab_unmap_refs_async() I wonder how this is supposed to work: I don't see how a grant would ever be unmapped in case of page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it does is defer the call to the unmap operation again and again. Or am I missing something here?

On 09/05/17 21:24, Roger Pau Monné wrote:

No, I don't think you are missing anything, but I cannot see how this can be solved in a better way. Unmapping a page that's still referenced is certainly not the best option, or else we risk triggering a page-fault elsewhere.

IMHO, gnttab_unmap_refs_async should have a timeout, and return an error at some point. Also, I'm wondering whether there's a way to keep track of who has references on a specific page, but so far I haven't been able to figure out how to get this information from Linux.

Also, I've noticed that __gnttab_unmap_refs_async uses page_count, shouldn't it use page_ref_count instead?

Roger.

On 13/05/17 06:02, Glenn Enright wrote:

In case it helps, I have continued to work on this. I noticed processes left behind (under 4.9.27). The same issue is ongoing.

    # ps auxf | grep [x]vda
    root      2983  0.0  0.0      0     0 ?        S    01:44   0:00  \_ [1.xvda1-1]
    root      5457  0.0  0.0      0     0 ?        S    02:06   0:00  \_ [3.xvda1-1]
    root      7382  0.0  0.0      0     0 ?        S    02:36   0:00  \_ [4.xvda1-1]
    root      9668  0.0  0.0      0     0 ?        S    02:51   0:00  \_ [6.xvda1-1]
    root     11080  0.0  0.0      0     0 ?        S    02:57   0:00  \_ [7.xvda1-1]

    # xl list
    Name          ID   Mem VCPUs   State   Time(s)
    Domain-0       0  1512     2   r-----    118.5
    (null)         1     8     4   --p--d     43.8
    (null)         3     8     4   --p--d      6.3
    (null)         4     8     4   --p--d     73.4
    (null)         6     8     4   --p--d     14.7
    (null)         7     8     4   --p--d       30

Those all have...

    [root 11080]# cat wchan
    xen_blkif_schedule

    [root 11080]# cat stack
    [<ffffffff814eaee8>] xen_blkif_schedule+0x418/0xb40
    [<ffffffff810a0555>] kthread+0xe5/0x100
    [<ffffffff816f1c45>] ret_from_fork+0x25/0x30
    [<ffffffffffffffff>] 0xffffffffffffffff

On 15/05/17 21:57, Juergen Gross wrote:

And found another reference count bug. Would you like to give the attached patch (to be applied additionally to the previous ones) a try?

Juergen

On 2017-05-16 10:49, Glenn Enright wrote:

This seems to have solved the issue in 4.9.28, with all three patches applied. Awesome!

On my main test machine I can no longer replicate what I was originally seeing, and in dmesg I now see this flow...

    xen-blkback: xen_blkif_disconnect: busy
    xen-blkback: xen_blkif_free: delayed = 1
    xen-blkback: xen_blkif_free: delayed = 0

xl list is clean, xenstore looks right. No extraneous processes left over.

Thank you Juergen, so much. Really appreciate your persistence with this. Anything I can do to help push this upstream please let me know. Feel free to add a reported-by line with my name if you think it appropriate.

[Reply from Steven Haigh:]

This is good news.

Juergen, can I request a full patch set posted to the list (please CC me)? I'll ensure we can build the kernel with all 3 (?) patches applied and test properly.

I'll build up a complete kernel with those patches and give a tested-by if all goes well.
--
Steven Haigh

Email: netwiz@xxxxxxxxx
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
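[Illustrative note on the retry behaviour discussed in the thread]

The thread observes that __gnttab_unmap_refs_async() keeps deferring the unmap while page_count() is greater than 1, and Roger suggests it should time out and return an error at some point. The following is only a minimal user-space model of that retry policy in Python, not kernel code and not any of the patches mentioned above. The names Page, try_unmap, drop_one_ref and max_attempts are hypothetical, and the loop stands in for what the kernel would do by re-queueing delayed work.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a granted page and its reference count."""
    refcount: int = 1  # 1 models "only the grant mapping holds the page"


def drop_one_ref(pages):
    """Simulate another holder releasing one extra reference per page."""
    for page in pages:
        if page.refcount > 1:
            page.refcount -= 1


def try_unmap(pages, max_attempts, on_retry=None):
    """Check up to max_attempts times whether every page is safe to unmap.

    Returns (ok, checks_used). Without the bound, a page whose extra
    reference is never dropped would be deferred forever, which matches
    the stuck state described in the thread.
    """
    for attempt in range(1, max_attempts + 1):
        if all(page.refcount <= 1 for page in pages):
            return True, attempt      # nobody else holds a reference
        if on_retry is not None:
            on_retry(pages)           # model time passing between retries
    return False, max_attempts        # bounded: report failure to the caller


if __name__ == "__main__":
    # Extra reference is eventually dropped: the unmap succeeds.
    print(try_unmap([Page(refcount=2)], max_attempts=5, on_retry=drop_one_ref))
    # Extra reference is never dropped (a leak): the bound turns a silent
    # endless deferral into a visible error.
    print(try_unmap([Page(refcount=2)], max_attempts=5))
```

In this model a leaked reference shows up as an explicit failure after a fixed number of checks. Whether a bounded retry is appropriate in the real driver is a separate question, since, as Roger notes, unmapping a page that is still referenced risks a page fault elsewhere.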