Re: [Xen-devel] Linux kernel tmem regression v4.1 -> v4.4
On 28/09/17 15:31, Juergen Gross wrote:
> On 28/09/17 10:42, James Dingwall wrote:
>> Hi,
>>
>> I am trying to migrate my domU instances from v4.1.44 to v4.4.88 and it
>> seems that whether or not e820_host = 1 in the domU configuration is the
>> cause of the following stack trace. Please note I have #define MC_DEBUG
>> 1 in arch/x86/xen/multicall.c so the failed hypervisor call is logged.
>> I'm unsure which side of the kernel/xen boundary this really falls.
>>
>> Sep 25 22:02:50 [kernel] 1 multicall(s) failed: cpu 0
>> Sep 25 22:02:50 [kernel] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 4.4.88 #157
>> Sep 25 22:02:50 [kernel] Workqueue: events balloon_process
>> Sep 25 22:02:50 [kernel] 0000000000000000 ffff88001e31fa78 ffffffff812f9a28 ffff88001f80a220
>> Sep 25 22:02:50 [kernel] ffff88001f80a238 ffff88001e31fab0 ffffffff81004d79 0000000000115bb7
>> Sep 25 22:02:50 [kernel] ffff88001f80a270 ffff88001f80b330 ffff880195bb7000 0000000000000000
>> Sep 25 22:02:50 [kernel] Call Trace:
>> Sep 25 22:02:50 [kernel] [<ffffffff812f9a28>] dump_stack+0x61/0x7e
>> Sep 25 22:02:50 [kernel] [<ffffffff81004d79>] xen_mc_flush+0xfd/0x1a0
>> Sep 25 22:02:50 [kernel] [<ffffffff81006be5>] xen_alloc_pte+0x176/0x18e
>> Sep 25 22:02:50 [kernel] [<ffffffff8154521b>] phys_pmd_init+0x23c/0x2af
>> Sep 25 22:02:50 [kernel] [<ffffffff8154549b>] phys_pud_init+0x20d/0x2d4
>> Sep 25 22:02:50 [kernel] [<ffffffff81546022>] kernel_physical_mapping_init+0x15e/0x233
>> Sep 25 22:02:50 [kernel] [<ffffffff81542694>] init_memory_mapping+0x1c7/0x264
>> Sep 25 22:02:50 [kernel] [<ffffffff810411be>] arch_add_memory+0x50/0xda
>> Sep 25 22:02:50 [kernel] [<ffffffff81543191>] add_memory_resource+0x9c/0x12d
>> Sep 25 22:02:50 [kernel] [<ffffffff8137462f>] reserve_additional_memory+0x125/0x16b
>> Sep 25 22:02:50 [kernel] [<ffffffff8137482d>] balloon_process+0x1b8/0x2c5
>> Sep 25 22:02:50 [kernel] [<ffffffff8107df27>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
>> Sep 25 22:02:50 [kernel] [<ffffffff81060c18>] process_one_work+0x19d/0x2a9
>> Sep 25 22:02:50 [kernel] [<ffffffff8106162a>] worker_thread+0x27d/0x36e
>> Sep 25 22:02:50 [kernel] [<ffffffff810613ad>] ? rescuer_thread+0x2a2/0x2a2
>> Sep 25 22:02:50 [kernel] [<ffffffff8106575b>] kthread+0xda/0xe2
>> Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
>> Sep 25 22:02:50 [kernel] [<ffffffff8154c57f>] ret_from_fork+0x3f/0x70
>> Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
>> Sep 25 22:02:50 [kernel] call 1/2: op=14 arg=[ffff880115bb7000] result=0_xen_alloc_pte+0x81/0x18e
>> Sep 25 22:02:50 [kernel] call 2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
>> Sep 25 22:02:50 [kernel] ------------[ cut here ]------------
>>
>>
>> xen version is 4.8.1-r3 from Gentoo, dom0 is 4.1.44. I have seen the
>> same trace logged in an Ubuntu 16.04 guest with a 4.4 kernel. I don't
>> have a specific test case which triggers this but it will usually appear
>> within 24 hours but it depends on how much work the domU has been
>> performing (so probably how much ballooning it has been doing). Setting
>> e820_host = 0 in the config seems to prevent this happening.
>>
>> In the kernel git log v4.1.44..v4.4.89 -- :/arch/x86/xen/mmu.c shows
>> some commits which seem to relate to the failed hypervisor operation and
>> working round the e820 map. I have not done a bisect to try and isolate
>> this more definitively. I suspect this could be a more general balloon
>> issue but perhaps is revealed with tmem more easily as the rate of
>> ballooning up/down is higher than occasional manual changes.
>>
>> This is the guest /proc/iomem with e820_host = 0:
>>
>> KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
>> TMEM MODULE PARAMS:
>> /sys/module/tmem/parameters/cleancache: Y
>> /sys/module/tmem/parameters/frontswap: Y
>> /sys/module/tmem/parameters/selfballooning: Y
>> /sys/module/tmem/parameters/selfshrinking: Y
>> KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192
>> real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
>> /proc/iomem:
>> 00000000-00000fff : reserved
>> 00001000-0009ffff : System RAM
>> 000a0000-000fffff : reserved
>>   000f0000-000fffff : System ROM
>> 00100000-3fffffff : System RAM
>>   01000000-015509ad : Kernel code
>>   015509ae-01807ebf : Kernel data
>>   01914000-019c1fff : Kernel bss
>> fee00000-fee00fff : Local APIC
>>
>> And with e820_host = 1:
>>
>> KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
>> TMEM MODULE PARAMS:
>> /sys/module/tmem/parameters/cleancache: Y
>> /sys/module/tmem/parameters/frontswap: Y
>> /sys/module/tmem/parameters/selfballooning: Y
>> /sys/module/tmem/parameters/selfshrinking: Y
>> KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192
>> real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
>> /proc/iomem:
>> 00000000-00000fff : reserved
>> 00001000-0009ffff : System RAM
>> 000a0000-000fffff : reserved
>>   000f0000-000fffff : System ROM
>> 00100000-1fffffff : System RAM
>>   01000000-015509ad : Kernel code
>>   015509ae-01807ebf : Kernel data
>>   01914000-019c1fff : Kernel bss
>> 20000000-d7feffff : Unusable memory
>> d7ff0000-d7ffdfff : ACPI Tables
>> d7ffe000-d7ffffff : ACPI Non-volatile Storage
>> fee00000-fee00fff : Local APIC
>> 100000000-11fffffff : System RAM
>>
>>
>> If other information about the environment is useful please let me know.
>
> Cc-ing Konrad, who should be much more familiar with tmem than I am.

Strange, in my sent folder Konrad was still on Cc: Trying again.

Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
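
For anyone reading the MC_DEBUG output quoted above, the following is a minimal sketch (not part of the original report) that pulls the "call N/M: op=... arg=[...] result=..." lines out of a log and attaches a hypercall name to each op number. The name table is a small hand-copied subset of the numbering in Xen's public header xen/include/public/xen.h; the function name decode_multicall_lines and the SAMPLE string are hypothetical and exist only for illustration. The sketch assumes the mail-wrapped log lines have been rejoined so that each call entry sits on one line.

```python
import re

# Subset of hypercall numbers, copied by hand from Xen's public header
# (xen/include/public/xen.h). Only the entries relevant here are listed.
HYPERCALL_NAMES = {
    1: "mmu_update",
    12: "memory_op",
    13: "multicall",
    14: "update_va_mapping",
    26: "mmuext_op",
    38: "tmem_op",
}

# Matches the per-call lines printed when MC_DEBUG is enabled, e.g.
#   call 2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
CALL_RE = re.compile(
    r"call\s+(?P<idx>\d+)/(?P<total>\d+): op=(?P<op>\d+)\s+"
    r"arg=\[(?P<arg>[0-9a-f]+)\]\s+result=(?P<result>-?\d+)"
)


def decode_multicall_lines(log_text):
    """Return one dict per 'call N/M' entry found in log_text."""
    entries = []
    for match in CALL_RE.finditer(log_text):
        op = int(match.group("op"))
        entries.append(
            {
                "index": int(match.group("idx")),
                "total": int(match.group("total")),
                "op": op,
                # Fall back to a placeholder for numbers not in the subset.
                "name": HYPERCALL_NAMES.get(op, "unknown"),
                "arg": int(match.group("arg"), 16),
                "result": int(match.group("result")),
            }
        )
    return entries


# Hypothetical sample: the two call lines from the report, unwrapped.
SAMPLE = """\
Sep 25 22:02:50 [kernel] call 1/2: op=14 arg=[ffff880115bb7000] result=0_xen_alloc_pte+0x81/0x18e
Sep 25 22:02:50 [kernel] call 2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
"""

decoded = decode_multicall_lines(SAMPLE)
for entry in decoded:
    status = "ok" if entry["result"] == 0 else "FAILED"
    print(f"call {entry['index']}/{entry['total']}: {entry['name']} "
          f"arg=0x{entry['arg']:x} result={entry['result']} ({status})")
```

Run against the two lines in the report, this labels the first call (op=14) as update_va_mapping with result 0 and the second (op=26) as mmuext_op with result -1, i.e. the second call in the batch issued from xen_alloc_pte is the one that failed. It says nothing about why the hypervisor rejected it.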