Xen project Mailing List

Re: [Xen-devel] [PATCH] mmap_vmcore: skip non-ram pages reported by hypervisors

To: David Vrabel <david.vrabel@xxxxxxxxxx>

From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

Date: Wed, 09 Jul 2014 11:17:13 +0200

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Michael Holzheu <holzheu@xxxxxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, Vivek Goyal <vgoyal@xxxxxxxxxx>

Delivery-date: Wed, 09 Jul 2014 09:17:44 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

David Vrabel <david.vrabel@xxxxxxxxxx> writes:

> On 07/07/14 21:33, Andrew Morton wrote:
>> On Mon, 7 Jul 2014 17:05:49 +0200 Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> wrote:
>>
>>> we have a special check in read_vmcore() handler to check if the page was
>>> reported as ram or not by the hypervisor (pfn_is_ram()). However, when
>>> vmcore is read with mmap() no such check is performed. That can lead to
>>> unpredictable results, e.g. when running Xen PVHVM guest memcpy() after
>>> mmap() on /proc/vmcore will hang processing HVMMEM_mmio_dm pages creating
>>> enormous load in both DomU and Dom0.
>
> Does make forward progress though? Or is it ending up in a repeatedly
> retrying the same instruction?

If memcpy is using the SSE2 optimization, the 16-byte 'movdqu' instruction
never finishes (it keeps retrying to issue two 8-byte requests to qemu-dm).
qemu-dm decides that it's hitting 'Neither RAM nor known MMIO space' and
returns 8 0xff bytes for both of these requests (I was testing with
qemu-traditional).

>
> Is it failing on a ballooned page in a RAM region? Or is mapping non-RAM
> regions as well?

I wasn't using ballooning. It happens that oldmem has several (two in my
test) pages which are HVMMEM_mmio_dm, but qemu-dm considers them to be
neither ram nor mmio.

>
>>> Fix the issue by mapping each non-ram page to the zero page. Keep direct
>>> path with remap_oldmem_pfn_range() to avoid looping through all pages on
>>> bare metal.
>>>
>>> The issue can also be solved by overriding remap_oldmem_pfn_range() in
>>> xen-specific code, as remap_oldmem_pfn_range() was been designed for.
>>> That, however, would involve non-obvious xen code path for all x86 builds
>>> with CONFIG_XEN_PVHVM=y and would prevent all other hypervisor-specific
>>> code on x86 arch from doing the same override.
>
> The oldmem_pfn_is_ram() is Xen-specific but this problem (ballooned
> pages) must be common to KVM. How does KVM handle this?

As far as I'm concerned, the issue was never hit with KVM.

I *think* the issue has something to do with the conjunction of 16-byte
'movdqu' emulation for io pages in the xen hypervisor, 8-byte event channel
requests and qemu-traditional. But even if it gets fixed on the hypervisor
side, I believe fixing the issue kernel-side is still worth it as there are
non-fixed hypervisors out there (e.g. AWS EC2).

>
> David

--
  Vitaly
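The approach described in the quoted patch description (check each page when a hypervisor callback is registered, back non-ram pages with the zero page, and keep a single direct remap on bare metal) can be modelled in userspace roughly as follows. This is only an illustrative sketch, not the patch itself: `plan_mapping`, the action labels and the callback signature are hypothetical names invented for the model, and the real change operates on kernel VMAs rather than Python tuples.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical action labels for this model; they are not kernel identifiers.
REMAP = "remap"  # map the old-memory PFNs directly
ZERO = "zero"    # back the pages with the shared zero page instead

Plan = List[Tuple[str, int, int]]  # (action, first_pfn, page_count)


def plan_mapping(pfn_start: int, nr_pages: int,
                 pfn_is_ram: Optional[Callable[[int], bool]] = None) -> Plan:
    """Decide how each page of an mmap() request would be backed."""
    if nr_pages <= 0:
        return []

    if pfn_is_ram is None:
        # No hypervisor callback registered (the bare-metal case):
        # keep the direct path and remap the whole range in one step.
        return [(REMAP, pfn_start, nr_pages)]

    plan: Plan = []
    for pfn in range(pfn_start, pfn_start + nr_pages):
        action = REMAP if pfn_is_ram(pfn) else ZERO
        if plan and plan[-1][0] == action:
            # Extend the current run of pages that share the same action.
            prev_action, first, count = plan[-1]
            plan[-1] = (prev_action, first, count + 1)
        else:
            plan.append((action, pfn, 1))
    return plan


if __name__ == "__main__":
    holes = {102, 103}  # pages the (modelled) hypervisor reports as non-ram
    print(plan_mapping(100, 6, lambda pfn: pfn not in holes))
    # prints [('remap', 100, 2), ('zero', 102, 2), ('remap', 104, 2)]
```

In this model a reader of the mapping never touches the two non-ram pages, so the hanging access described above would not be issued for them; reads in that range would simply return zeroes.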
