
[Xen-devel] Linux kernel tmem regression v4.1 -> v4.4



Hi,

I am trying to migrate my domU instances from v4.1.44 to v4.4.88, and it seems that setting e820_host = 1 in the domU configuration triggers the following stack trace. Please note I have #define MC_DEBUG 1 in arch/x86/xen/multicall.c so that the failed hypervisor call is logged. I'm unsure which side of the kernel/Xen boundary this really falls on.

Sep 25 22:02:50 [kernel] 1 multicall(s) failed: cpu 0
Sep 25 22:02:50 [kernel] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 4.4.88 #157
Sep 25 22:02:50 [kernel] Workqueue: events balloon_process
Sep 25 22:02:50 [kernel] 0000000000000000 ffff88001e31fa78 ffffffff812f9a28 ffff88001f80a220
Sep 25 22:02:50 [kernel] ffff88001f80a238 ffff88001e31fab0 ffffffff81004d79 0000000000115bb7
Sep 25 22:02:50 [kernel] ffff88001f80a270 ffff88001f80b330 ffff880195bb7000 0000000000000000
Sep 25 22:02:50 [kernel] Call Trace:
Sep 25 22:02:50 [kernel]  [<ffffffff812f9a28>] dump_stack+0x61/0x7e
Sep 25 22:02:50 [kernel]  [<ffffffff81004d79>] xen_mc_flush+0xfd/0x1a0
Sep 25 22:02:50 [kernel]  [<ffffffff81006be5>] xen_alloc_pte+0x176/0x18e
Sep 25 22:02:50 [kernel]  [<ffffffff8154521b>] phys_pmd_init+0x23c/0x2af
Sep 25 22:02:50 [kernel]  [<ffffffff8154549b>] phys_pud_init+0x20d/0x2d4
Sep 25 22:02:50 [kernel]  [<ffffffff81546022>] kernel_physical_mapping_init+0x15e/0x233
Sep 25 22:02:50 [kernel]  [<ffffffff81542694>] init_memory_mapping+0x1c7/0x264
Sep 25 22:02:50 [kernel]  [<ffffffff810411be>] arch_add_memory+0x50/0xda
Sep 25 22:02:50 [kernel]  [<ffffffff81543191>] add_memory_resource+0x9c/0x12d
Sep 25 22:02:50 [kernel]  [<ffffffff8137462f>] reserve_additional_memory+0x125/0x16b
Sep 25 22:02:50 [kernel]  [<ffffffff8137482d>] balloon_process+0x1b8/0x2c5
Sep 25 22:02:50 [kernel]  [<ffffffff8107df27>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Sep 25 22:02:50 [kernel]  [<ffffffff81060c18>] process_one_work+0x19d/0x2a9
Sep 25 22:02:50 [kernel]  [<ffffffff8106162a>] worker_thread+0x27d/0x36e
Sep 25 22:02:50 [kernel]  [<ffffffff810613ad>] ? rescuer_thread+0x2a2/0x2a2
Sep 25 22:02:50 [kernel]  [<ffffffff8106575b>] kthread+0xda/0xe2
Sep 25 22:02:50 [kernel]  [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
Sep 25 22:02:50 [kernel]  [<ffffffff8154c57f>] ret_from_fork+0x3f/0x70
Sep 25 22:02:50 [kernel]  [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
Sep 25 22:02:50 [kernel] call 1/2: op=14 arg=[ffff880115bb7000] result=0_xen_alloc_pte+0x81/0x18e
Sep 25 22:02:50 [kernel] call 2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
Sep 25 22:02:50 [kernel] ------------[ cut here ]------------
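The two "call N/M" lines at the end of the trace carry the hypercall number, its first argument and the result. As a reading aid, here is a minimal Python sketch that decodes such lines. The number-to-name table is a small subset taken from Xen's public xen.h header; the function name, the regular expression and the sample text are illustrative assumptions and not part of the kernel's MC_DEBUG code.

```python
import re

# Subset of hypercall numbers from Xen's public header (xen.h).
# Only the entries useful for this trace are listed; others map to "unknown".
HYPERCALL_NAMES = {
    1: "mmu_update",
    12: "memory_op",
    13: "multicall",
    14: "update_va_mapping",
    26: "mmuext_op",
}

# Matches the MC_DEBUG "call N/M" lines as they appear in this post.
CALL_RE = re.compile(
    r"call\s+(\d+)/(\d+): op=(\d+) arg=\[([0-9a-f]+)\] result=(-?\d+)"
)


def decode_multicall_lines(log_text):
    """Return one dict per 'call N/M' entry found in log_text."""
    entries = []
    for m in CALL_RE.finditer(log_text):
        index, total, op, arg, result = m.groups()
        op = int(op)
        entries.append({
            "index": int(index),
            "total": int(total),
            "op": op,
            "name": HYPERCALL_NAMES.get(op, "unknown"),
            "arg": int(arg, 16),
            "result": int(result),
        })
    return entries


# The two call lines from the trace above, without the syslog prefix.
SAMPLE = """\
call 1/2: op=14 arg=[ffff880115bb7000] result=0_xen_alloc_pte+0x81/0x18e
call 2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
"""

if __name__ == "__main__":
    for e in decode_multicall_lines(SAMPLE):
        status = "ok" if e["result"] == 0 else "FAILED"
        print(f"{e['index']}/{e['total']} {e['name']} "
              f"arg={e['arg']:#x} result={e['result']} {status}")
```

Read this way, the first call (update_va_mapping) succeeded and the second (mmuext_op) is the one that returned -1.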


The Xen version is 4.8.1-r3 from Gentoo and dom0 runs 4.1.44. I have seen the same trace logged in an Ubuntu 16.04 guest with a 4.4 kernel. I don't have a specific test case which triggers this, but it usually appears within 24 hours, depending on how much work the domU has been performing (so probably on how much ballooning it has been doing). Setting e820_host = 0 in the config seems to prevent this from happening.

In the kernel, git log v4.1.44..v4.4.89 -- :/arch/x86/xen/mmu.c shows some commits which seem to relate to the failed hypervisor operation and to working around the e820 map. I have not done a bisect to isolate this more definitively. I suspect this could be a more general balloon issue that is perhaps revealed more easily with tmem, as the rate of ballooning up/down is higher than with occasional manual changes.

This is the guest /proc/iomem with e820_host = 0:

KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
TMEM MODULE PARAMS:
/sys/module/tmem/parameters/cleancache: Y
/sys/module/tmem/parameters/frontswap: Y
/sys/module/tmem/parameters/selfballooning: Y
/sys/module/tmem/parameters/selfshrinking: Y
KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
/proc/iomem:
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-3fffffff : System RAM
  01000000-015509ad : Kernel code
  015509ae-01807ebf : Kernel data
  01914000-019c1fff : Kernel bss
fee00000-fee00fff : Local APIC

And with e820_host = 1:

KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
TMEM MODULE PARAMS:
/sys/module/tmem/parameters/cleancache: Y
/sys/module/tmem/parameters/frontswap: Y
/sys/module/tmem/parameters/selfballooning: Y
/sys/module/tmem/parameters/selfshrinking: Y
KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
/proc/iomem:
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-1fffffff : System RAM
  01000000-015509ad : Kernel code
  015509ae-01807ebf : Kernel data
  01914000-019c1fff : Kernel bss
20000000-d7feffff : Unusable memory
d7ff0000-d7ffdfff : ACPI Tables
d7ffe000-d7ffffff : ACPI Non-volatile Storage
fee00000-fee00fff : Local APIC
100000000-11fffffff : System RAM
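To compare the two listings more easily, here is a small Python sketch that sums the top-level "System RAM" ranges of a /proc/iomem dump and reports the highest RAM address. The helper names are hypothetical and the sample data is copied from the two listings above, reduced to their top-level entries.

```python
import re

# One /proc/iomem entry: "start-end : description" (addresses in hex).
RANGE_RE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def system_ram_ranges(iomem_text):
    """Return (start, end) tuples for top-level 'System RAM' entries."""
    ranges = []
    for line in iomem_text.splitlines():
        if line.startswith(" "):
            continue  # indented lines are nested inside another range
        m = RANGE_RE.match(line.strip())
        if m and m.group(3) == "System RAM":
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


def system_ram_bytes(iomem_text):
    """Total size in bytes of all top-level 'System RAM' ranges."""
    return sum(end - start + 1 for start, end in system_ram_ranges(iomem_text))


# Top-level entries from the e820_host = 0 listing.
IOMEM_E820_HOST_0 = """\
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
00100000-3fffffff : System RAM
fee00000-fee00fff : Local APIC
"""

# Top-level entries from the e820_host = 1 listing.
IOMEM_E820_HOST_1 = """\
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
00100000-1fffffff : System RAM
20000000-d7feffff : Unusable memory
d7ff0000-d7ffdfff : ACPI Tables
d7ffe000-d7ffffff : ACPI Non-volatile Storage
fee00000-fee00fff : Local APIC
100000000-11fffffff : System RAM
"""

if __name__ == "__main__":
    for label, text in (("e820_host=0", IOMEM_E820_HOST_0),
                        ("e820_host=1", IOMEM_E820_HOST_1)):
        top = max(end for _, end in system_ram_ranges(text))
        print(f"{label}: {system_ram_bytes(text):#x} bytes of System RAM, "
              f"highest RAM address {top:#x}")
```

For these two listings the totals are identical (0x3ff9f000 bytes, just under 1 GiB); the difference is only in layout, with e820_host = 1 placing half of the RAM above 4 GiB.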


If other information about the environment is useful please let me know.

Thanks,
James

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

