[Xen-devel] Linux kernel tmem regression v4.1 -> v4.4
Hi,

I am trying to migrate my domU instances from v4.1.44 to v4.4.88, and it seems that whether or not e820_host = 1 is set in the domU configuration determines whether the following stack trace appears. Please note that I have #define MC_DEBUG 1 in arch/x86/xen/multicall.c so that the failed hypervisor call is logged. I'm unsure which side of the kernel/Xen boundary this really falls on.

Sep 25 22:02:50 [kernel] 1 multicall(s) failed: cpu 0
Sep 25 22:02:50 [kernel] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 4.4.88 #157
Sep 25 22:02:50 [kernel] Workqueue: events balloon_process
Sep 25 22:02:50 [kernel] 0000000000000000 ffff88001e31fa78 ffffffff812f9a28 ffff88001f80a220
Sep 25 22:02:50 [kernel] ffff88001f80a238 ffff88001e31fab0 ffffffff81004d79 0000000000115bb7
Sep 25 22:02:50 [kernel] ffff88001f80a270 ffff88001f80b330 ffff880195bb7000 0000000000000000
Sep 25 22:02:50 [kernel] Call Trace:
Sep 25 22:02:50 [kernel] [<ffffffff812f9a28>] dump_stack+0x61/0x7e
Sep 25 22:02:50 [kernel] [<ffffffff81004d79>] xen_mc_flush+0xfd/0x1a0
Sep 25 22:02:50 [kernel] [<ffffffff81006be5>] xen_alloc_pte+0x176/0x18e
Sep 25 22:02:50 [kernel] [<ffffffff8154521b>] phys_pmd_init+0x23c/0x2af
Sep 25 22:02:50 [kernel] [<ffffffff8154549b>] phys_pud_init+0x20d/0x2d4
Sep 25 22:02:50 [kernel] [<ffffffff81546022>] kernel_physical_mapping_init+0x15e/0x233
Sep 25 22:02:50 [kernel] [<ffffffff81542694>] init_memory_mapping+0x1c7/0x264
Sep 25 22:02:50 [kernel] [<ffffffff810411be>] arch_add_memory+0x50/0xda
Sep 25 22:02:50 [kernel] [<ffffffff81543191>] add_memory_resource+0x9c/0x12d
Sep 25 22:02:50 [kernel] [<ffffffff8137462f>] reserve_additional_memory+0x125/0x16b
Sep 25 22:02:50 [kernel] [<ffffffff8137482d>] balloon_process+0x1b8/0x2c5
Sep 25 22:02:50 [kernel] [<ffffffff8107df27>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Sep 25 22:02:50 [kernel] [<ffffffff81060c18>] process_one_work+0x19d/0x2a9
Sep 25 22:02:50 [kernel] [<ffffffff8106162a>] worker_thread+0x27d/0x36e
Sep 25 22:02:50 [kernel] [<ffffffff810613ad>] ? rescuer_thread+0x2a2/0x2a2
Sep 25 22:02:50 [kernel] [<ffffffff8106575b>] kthread+0xda/0xe2
Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
Sep 25 22:02:50 [kernel] [<ffffffff8154c57f>] ret_from_fork+0x3f/0x70
Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
Sep 25 22:02:50 [kernel] call 1/2: op=14 arg=[ffff880115bb7000] result=0_xen_alloc_pte+0x81/0x18e
Sep 25 22:02:50 [kernel] call 2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
Sep 25 22:02:50 [kernel] ------------[ cut here ]------------

The Xen version is 4.8.1-r3 from Gentoo; dom0 is 4.1.44. I have seen the same trace logged in an Ubuntu 16.04 guest with a 4.4 kernel.

I don't have a specific test case that triggers this, but it usually appears within 24 hours, depending on how much work the domU has been performing (so probably on how much ballooning it has been doing). Setting e820_host = 0 in the config seems to prevent this from happening.

In the kernel, git log v4.1.44..v4.4.89 -- :/arch/x86/xen/mmu.c shows some commits which seem to relate to the failed hypervisor operation and to working round the e820 map. I have not done a bisect to try to isolate this more definitively.

I suspect this could be a more general balloon issue that is perhaps revealed more easily with tmem, as the rate of ballooning up/down is higher than with occasional manual changes.
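As a reading aid for the two "call N/M" lines that MC_DEBUG adds to the trace, here is a minimal Python sketch that extracts them and maps the op numbers to hypercall names. The name table is a partial copy of the __HYPERVISOR_* numbers from Xen's public xen.h header as I understand them (14 is update_va_mapping, 26 is mmuext_op). The function and variable names are invented for illustration and are not part of any kernel or Xen tooling.

```python
import re

# Partial table of Xen hypercall numbers (assumption: taken from the
# __HYPERVISOR_* definitions in Xen's public xen.h; extend as needed).
HYPERCALL_NAMES = {
    1: "mmu_update",
    12: "memory_op",
    13: "multicall",
    14: "update_va_mapping",
    26: "mmuext_op",
}

# Matches the MC_DEBUG lines, e.g.
#   call 2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
CALL_RE = re.compile(
    r"call\s+(\d+)/(\d+): op=(\d+) arg=\[([0-9a-f]+)\] result=(-?\d+)"
)


def decode_multicalls(log_text):
    """Return one dict per 'call N/M' line found in log_text."""
    calls = []
    for match in CALL_RE.finditer(log_text):
        index, total, op, arg, result = match.groups()
        calls.append({
            "index": int(index),
            "total": int(total),
            "op": int(op),
            "name": HYPERCALL_NAMES.get(int(op), "unknown"),
            "arg": int(arg, 16),
            "result": int(result),
            "failed": int(result) < 0,
        })
    return calls


# The two lines from the trace above, copied verbatim.
SAMPLE = """\
Sep 25 22:02:50 [kernel] call 1/2: op=14 arg=[ffff880115bb7000] result=0_xen_alloc_pte+0x81/0x18e
Sep 25 22:02:50 [kernel] call 2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
"""

if __name__ == "__main__":
    for call in decode_multicalls(SAMPLE):
        status = "FAILED" if call["failed"] else "ok"
        print(f'{call["index"]}/{call["total"]} {call["name"]}: {status}')
```

Read this way, the batch queued by xen_alloc_pte consists of an update_va_mapping call that succeeded followed by an mmuext_op call that returned -1. The sketch does not decode which mmuext sub-operation was requested, since the log only records the address of the argument structure.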
This is the guest /proc/iomem with e820_host = 0:

KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
TMEM MODULE PARAMS:
/sys/module/tmem/parameters/cleancache: Y
/sys/module/tmem/parameters/frontswap: Y
/sys/module/tmem/parameters/selfballooning: Y
/sys/module/tmem/parameters/selfshrinking: Y
KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
/proc/iomem:
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-3fffffff : System RAM
  01000000-015509ad : Kernel code
  015509ae-01807ebf : Kernel data
  01914000-019c1fff : Kernel bss
fee00000-fee00fff : Local APIC

And with e820_host = 1:

KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
TMEM MODULE PARAMS:
/sys/module/tmem/parameters/cleancache: Y
/sys/module/tmem/parameters/frontswap: Y
/sys/module/tmem/parameters/selfballooning: Y
/sys/module/tmem/parameters/selfshrinking: Y
KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
/proc/iomem:
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-1fffffff : System RAM
  01000000-015509ad : Kernel code
  015509ae-01807ebf : Kernel data
  01914000-019c1fff : Kernel bss
20000000-d7feffff : Unusable memory
d7ff0000-d7ffdfff : ACPI Tables
d7ffe000-d7ffffff : ACPI Non-volatile Storage
fee00000-fee00fff : Local APIC
100000000-11fffffff : System RAM

If other information about the environment is useful, please let me know.

Thanks,
James

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
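To relate the trace to the two /proc/iomem listings, the following Python sketch looks up which top-level region, if any, contains a given physical address. It rests on two assumptions that are not stated in the post: that the arg of the first multicall (ffff880115bb7000) is a direct-map virtual address, and that the direct map starts at 0xffff880000000000 as it does on x86_64 kernels of this era. The helper names are made up for the example, and only the top-level entries of each listing are reproduced.

```python
# Assumed base of the kernel direct map on x86_64 for a 4.4 kernel.
PAGE_OFFSET = 0xffff880000000000


def parse_iomem(text):
    """Parse 'start-end : name' lines into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        if " : " not in line:
            continue
        span, name = line.strip().split(" : ", 1)
        start, end = (int(part, 16) for part in span.split("-"))
        regions.append((start, end, name))
    return regions


def regions_containing(regions, phys):
    """Return the names of all regions whose range includes phys."""
    return [name for start, end, name in regions if start <= phys <= end]


# Top-level entries copied from the listing with e820_host = 0.
IOMEM_E820_HOST_0 = """\
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
00100000-3fffffff : System RAM
fee00000-fee00fff : Local APIC
"""

# Top-level entries copied from the listing with e820_host = 1.
IOMEM_E820_HOST_1 = """\
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
00100000-1fffffff : System RAM
20000000-d7feffff : Unusable memory
d7ff0000-d7ffdfff : ACPI Tables
d7ffe000-d7ffffff : ACPI Non-volatile Storage
fee00000-fee00fff : Local APIC
100000000-11fffffff : System RAM
"""

if __name__ == "__main__":
    va = 0xffff880115bb7000      # arg of call 1/2 in the trace
    phys = va - PAGE_OFFSET      # physical address under the assumption above
    print(hex(phys))
    print(regions_containing(parse_iomem(IOMEM_E820_HOST_0), phys))
    print(regions_containing(parse_iomem(IOMEM_E820_HOST_1), phys))
```

Under these assumptions the address resolves to 0x115bb7000, which lies inside the 100000000-11fffffff System RAM range of the e820_host = 1 listing and outside every range of the e820_host = 0 listing. Note that the listings were taken on 4.4.89 and the trace on 4.4.88, and that the map can change as the balloon driver adds memory, so this is only a lookup aid and not a diagnosis.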