Re: [Xen-devel] Linux kernel tmem regression v4.1 -> v4.4
On 28/09/17 10:42, James Dingwall wrote:
> Hi,
>
> I am trying to migrate my domU instances from v4.1.44 to v4.4.88 and it
> seems that whether or not e820_host = 1 in the domU configuration is the
> cause of the following stack trace. Please note I have #define MC_DEBUG 1
> in arch/x86/xen/multicall.c so the failed hypervisor call is logged.
> I'm unsure on which side of the kernel/xen boundary this really falls.
>
> Sep 25 22:02:50 [kernel] 1 multicall(s) failed: cpu 0
> Sep 25 22:02:50 [kernel] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 4.4.88 #157
> Sep 25 22:02:50 [kernel] Workqueue: events balloon_process
> Sep 25 22:02:50 [kernel] 0000000000000000 ffff88001e31fa78 ffffffff812f9a28 ffff88001f80a220
> Sep 25 22:02:50 [kernel] ffff88001f80a238 ffff88001e31fab0 ffffffff81004d79 0000000000115bb7
> Sep 25 22:02:50 [kernel] ffff88001f80a270 ffff88001f80b330 ffff880195bb7000 0000000000000000
> Sep 25 22:02:50 [kernel] Call Trace:
> Sep 25 22:02:50 [kernel] [<ffffffff812f9a28>] dump_stack+0x61/0x7e
> Sep 25 22:02:50 [kernel] [<ffffffff81004d79>] xen_mc_flush+0xfd/0x1a0
> Sep 25 22:02:50 [kernel] [<ffffffff81006be5>] xen_alloc_pte+0x176/0x18e
> Sep 25 22:02:50 [kernel] [<ffffffff8154521b>] phys_pmd_init+0x23c/0x2af
> Sep 25 22:02:50 [kernel] [<ffffffff8154549b>] phys_pud_init+0x20d/0x2d4
> Sep 25 22:02:50 [kernel] [<ffffffff81546022>] kernel_physical_mapping_init+0x15e/0x233
> Sep 25 22:02:50 [kernel] [<ffffffff81542694>] init_memory_mapping+0x1c7/0x264
> Sep 25 22:02:50 [kernel] [<ffffffff810411be>] arch_add_memory+0x50/0xda
> Sep 25 22:02:50 [kernel] [<ffffffff81543191>] add_memory_resource+0x9c/0x12d
> Sep 25 22:02:50 [kernel] [<ffffffff8137462f>] reserve_additional_memory+0x125/0x16b
> Sep 25 22:02:50 [kernel] [<ffffffff8137482d>] balloon_process+0x1b8/0x2c5
> Sep 25 22:02:50 [kernel] [<ffffffff8107df27>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
> Sep 25 22:02:50 [kernel] [<ffffffff81060c18>] process_one_work+0x19d/0x2a9
> Sep 25 22:02:50 [kernel] [<ffffffff8106162a>] worker_thread+0x27d/0x36e
> Sep 25 22:02:50 [kernel] [<ffffffff810613ad>] ? rescuer_thread+0x2a2/0x2a2
> Sep 25 22:02:50 [kernel] [<ffffffff8106575b>] kthread+0xda/0xe2
> Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
> Sep 25 22:02:50 [kernel] [<ffffffff8154c57f>] ret_from_fork+0x3f/0x70
> Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
> Sep 25 22:02:50 [kernel] call 1/2: op=14 arg=[ffff880115bb7000] result=0 xen_alloc_pte+0x81/0x18e
> Sep 25 22:02:50 [kernel] call 2/2: op=26 arg=[ffff88001f80b330] result=-1 xen_alloc_pte+0xd7/0x18e
> Sep 25 22:02:50 [kernel] ------------[ cut here ]------------
>
> The xen version is 4.8.1-r3 from Gentoo; dom0 is 4.1.44. I have seen the
> same trace logged in an Ubuntu 16.04 guest with a 4.4 kernel. I don't
> have a specific test case which triggers this, but it will usually appear
> within 24 hours, depending on how much work the domU has been
> performing (so probably on how much ballooning it has been doing). Setting
> e820_host = 0 in the config seems to prevent this from happening.
>
> In the kernel, git log v4.1.44..v4.4.89 -- :/arch/x86/xen/mmu.c shows
> some commits which seem to relate to the failed hypervisor operation and
> to working around the e820 map. I have not done a bisect to isolate
> this more definitively. I suspect this could be a more general balloon
> issue which is perhaps revealed more easily with tmem, since the rate of
> ballooning up/down is higher than with occasional manual changes.
>
> This is the guest /proc/iomem with e820_host = 0:
>
> KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
> TMEM MODULE PARAMS:
> /sys/module/tmem/parameters/cleancache: Y
> /sys/module/tmem/parameters/frontswap: Y
> /sys/module/tmem/parameters/selfballooning: Y
> /sys/module/tmem/parameters/selfshrinking: Y
> KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
> /proc/iomem:
> 00000000-00000fff : reserved
> 00001000-0009ffff : System RAM
> 000a0000-000fffff : reserved
>   000f0000-000fffff : System ROM
> 00100000-3fffffff : System RAM
>   01000000-015509ad : Kernel code
>   015509ae-01807ebf : Kernel data
>   01914000-019c1fff : Kernel bss
> fee00000-fee00fff : Local APIC
>
> And with e820_host = 1:
>
> KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
> TMEM MODULE PARAMS:
> /sys/module/tmem/parameters/cleancache: Y
> /sys/module/tmem/parameters/frontswap: Y
> /sys/module/tmem/parameters/selfballooning: Y
> /sys/module/tmem/parameters/selfshrinking: Y
> KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
> /proc/iomem:
> 00000000-00000fff : reserved
> 00001000-0009ffff : System RAM
> 000a0000-000fffff : reserved
>   000f0000-000fffff : System ROM
> 00100000-1fffffff : System RAM
>   01000000-015509ad : Kernel code
>   015509ae-01807ebf : Kernel data
>   01914000-019c1fff : Kernel bss
> 20000000-d7feffff : Unusable memory
> d7ff0000-d7ffdfff : ACPI Tables
> d7ffe000-d7ffffff : ACPI Non-volatile Storage
> fee00000-fee00fff : Local APIC
> 100000000-11fffffff : System RAM
>
> If other information about the environment is useful please let me know.

Cc-ing Konrad, who should be much more familiar with tmem than I am.

Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel