
Re: [Xen-devel] Linux kernel tmem regression v4.1 -> v4.4



On 28/09/17 15:31, Juergen Gross wrote:
> On 28/09/17 10:42, James Dingwall wrote:
>> Hi,
>>
>> I am trying to migrate my domU instances from v4.1.44 to v4.4.88, and it
>> seems that setting e820_host = 1 in the domU configuration triggers the
>> following stack trace.  Please note I have #define MC_DEBUG 1 in
>> arch/x86/xen/multicall.c so the failed hypervisor call is logged.
>> I'm unsure which side of the kernel/xen boundary this really falls on.
>>
>> Sep 25 22:02:50 [kernel] 1 multicall(s) failed: cpu 0
>> Sep 25 22:02:50 [kernel] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 4.4.88 #157
>> Sep 25 22:02:50 [kernel] Workqueue: events balloon_process
>> Sep 25 22:02:50 [kernel]  0000000000000000 ffff88001e31fa78 ffffffff812f9a28 ffff88001f80a220
>> Sep 25 22:02:50 [kernel]  ffff88001f80a238 ffff88001e31fab0 ffffffff81004d79 0000000000115bb7
>> Sep 25 22:02:50 [kernel]  ffff88001f80a270 ffff88001f80b330 ffff880195bb7000 0000000000000000
>> Sep 25 22:02:50 [kernel] Call Trace:
>> Sep 25 22:02:50 [kernel]  [<ffffffff812f9a28>] dump_stack+0x61/0x7e
>> Sep 25 22:02:50 [kernel]  [<ffffffff81004d79>] xen_mc_flush+0xfd/0x1a0
>> Sep 25 22:02:50 [kernel]  [<ffffffff81006be5>] xen_alloc_pte+0x176/0x18e
>> Sep 25 22:02:50 [kernel]  [<ffffffff8154521b>] phys_pmd_init+0x23c/0x2af
>> Sep 25 22:02:50 [kernel]  [<ffffffff8154549b>] phys_pud_init+0x20d/0x2d4
>> Sep 25 22:02:50 [kernel]  [<ffffffff81546022>] kernel_physical_mapping_init+0x15e/0x233
>> Sep 25 22:02:50 [kernel]  [<ffffffff81542694>] init_memory_mapping+0x1c7/0x264
>> Sep 25 22:02:50 [kernel]  [<ffffffff810411be>] arch_add_memory+0x50/0xda
>> Sep 25 22:02:50 [kernel]  [<ffffffff81543191>] add_memory_resource+0x9c/0x12d
>> Sep 25 22:02:50 [kernel]  [<ffffffff8137462f>] reserve_additional_memory+0x125/0x16b
>> Sep 25 22:02:50 [kernel]  [<ffffffff8137482d>] balloon_process+0x1b8/0x2c5
>> Sep 25 22:02:50 [kernel]  [<ffffffff8107df27>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
>> Sep 25 22:02:50 [kernel]  [<ffffffff81060c18>] process_one_work+0x19d/0x2a9
>> Sep 25 22:02:50 [kernel]  [<ffffffff8106162a>] worker_thread+0x27d/0x36e
>> Sep 25 22:02:50 [kernel]  [<ffffffff810613ad>] ? rescuer_thread+0x2a2/0x2a2
>> Sep 25 22:02:50 [kernel]  [<ffffffff8106575b>] kthread+0xda/0xe2
>> Sep 25 22:02:50 [kernel]  [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
>> Sep 25 22:02:50 [kernel]  [<ffffffff8154c57f>] ret_from_fork+0x3f/0x70
>> Sep 25 22:02:50 [kernel]  [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
>> Sep 25 22:02:50 [kernel]   call  1/2: op=14 arg=[ffff880115bb7000] result=0_xen_alloc_pte+0x81/0x18e
>> Sep 25 22:02:50 [kernel]   call  2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
>> Sep 25 22:02:50 [kernel] ------------[ cut here ]------------
>>
>>
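For readers decoding the two logged calls, here is a minimal Python sketch. It assumes the hypercall numbers from Xen's public header xen/include/public/xen.h, where 14 is update_va_mapping and 26 is mmuext_op. The helper name decode_calls and the regular expression are illustrative only and are not part of any Xen or kernel tooling; the two log lines are pasted with the mail wrapping undone.

```python
import re

# Hypercall numbers as defined in Xen's public header
# (xen/include/public/xen.h). Only a few entries are listed here.
HYPERCALL_NAMES = {
    1: "mmu_update",
    13: "multicall",
    14: "update_va_mapping",
    26: "mmuext_op",
}

# Matches the per-call lines printed when MC_DEBUG is enabled, as they
# appear in the trace above (hypothetical pattern written for this log).
CALL_RE = re.compile(
    r"call\s+(\d+)/(\d+): op=(\d+) arg=\[([0-9a-f]+)\]\s+result=(-?\d+)"
)


def decode_calls(log_text):
    """Return (index, hypercall name, first argument, result) per call."""
    decoded = []
    for match in CALL_RE.finditer(log_text):
        idx, _total, op, arg, result = match.groups()
        name = HYPERCALL_NAMES.get(int(op), "unknown(%s)" % op)
        decoded.append((int(idx), name, int(arg, 16), int(result)))
    return decoded


LOG = """\
call  1/2: op=14 arg=[ffff880115bb7000] result=0_xen_alloc_pte+0x81/0x18e
call  2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
"""

if __name__ == "__main__":
    for idx, name, arg, result in decode_calls(LOG):
        print("call %d: %s arg=%#x result=%d" % (idx, name, arg, result))
```

On the trace above this reports update_va_mapping with result 0 and mmuext_op with result -1, so the second call in the batch is the one that failed.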
>> The Xen version is 4.8.1-r3 from Gentoo, and dom0 is 4.1.44.  I have seen
>> the same trace logged in an Ubuntu 16.04 guest with a 4.4 kernel.  I don't
>> have a specific test case which triggers this.  It will usually appear
>> within 24 hours, depending on how much work the domU has been
>> performing (so probably how much ballooning it has been doing).  Setting
>> e820_host = 0 in the config seems to prevent this happening.
>>
>> In the kernel, git log v4.1.44..v4.4.89 -- :/arch/x86/xen/mmu.c shows
>> some commits which seem to relate to the failed hypervisor operation and
>> to working round the e820 map.  I have not done a bisect to try and isolate
>> this more definitively.  I suspect this could be a more general balloon
>> issue which is perhaps revealed more easily with tmem, as the rate of
>> ballooning up/down is higher than with occasional manual changes.
>>
>> This is the guest /proc/iomem with e820_host = 0:
>>
>> KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
>> TMEM MODULE PARAMS:
>> /sys/module/tmem/parameters/cleancache: Y
>> /sys/module/tmem/parameters/frontswap: Y
>> /sys/module/tmem/parameters/selfballooning: Y
>> /sys/module/tmem/parameters/selfshrinking: Y
>> KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192
>> real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
>> /proc/iomem:
>> 00000000-00000fff : reserved
>> 00001000-0009ffff : System RAM
>> 000a0000-000fffff : reserved
>>   000f0000-000fffff : System ROM
>> 00100000-3fffffff : System RAM
>>   01000000-015509ad : Kernel code
>>   015509ae-01807ebf : Kernel data
>>   01914000-019c1fff : Kernel bss
>> fee00000-fee00fff : Local APIC
>>
>> And with e820_host = 1:
>>
>> KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
>> TMEM MODULE PARAMS:
>> /sys/module/tmem/parameters/cleancache: Y
>> /sys/module/tmem/parameters/frontswap: Y
>> /sys/module/tmem/parameters/selfballooning: Y
>> /sys/module/tmem/parameters/selfshrinking: Y
>> KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192
>> real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
>> /proc/iomem:
>> 00000000-00000fff : reserved
>> 00001000-0009ffff : System RAM
>> 000a0000-000fffff : reserved
>>   000f0000-000fffff : System ROM
>> 00100000-1fffffff : System RAM
>>   01000000-015509ad : Kernel code
>>   015509ae-01807ebf : Kernel data
>>   01914000-019c1fff : Kernel bss
>> 20000000-d7feffff : Unusable memory
>> d7ff0000-d7ffdfff : ACPI Tables
>> d7ffe000-d7ffffff : ACPI Non-volatile Storage
>> fee00000-fee00fff : Local APIC
>> 100000000-11fffffff : System RAM
>>
>>
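To compare the two /proc/iomem listings above, a small Python sketch can sum the top-level "System RAM" ranges. The function name system_ram_bytes is hypothetical, and the input strings are the top-level lines copied from the listings; nested (indented) entries such as "Kernel code" are skipped because they lie inside a parent range.

```python
def system_ram_bytes(iomem_text):
    """Sum the sizes of top-level 'System RAM' ranges in /proc/iomem text."""
    total = 0
    for line in iomem_text.splitlines():
        if line.startswith(" "):
            # Indented lines are children of the preceding range.
            continue
        span, _, label = line.partition(" : ")
        if label.strip() != "System RAM":
            continue
        start, end = (int(part, 16) for part in span.split("-"))
        total += end - start + 1  # ranges are inclusive
    return total


E820_HOST_0 = """\
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
00100000-3fffffff : System RAM
fee00000-fee00fff : Local APIC
"""

E820_HOST_1 = """\
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
00100000-1fffffff : System RAM
20000000-d7feffff : Unusable memory
d7ff0000-d7ffdfff : ACPI Tables
d7ffe000-d7ffffff : ACPI Non-volatile Storage
fee00000-fee00fff : Local APIC
100000000-11fffffff : System RAM
"""

if __name__ == "__main__":
    for name, text in (("e820_host=0", E820_HOST_0),
                       ("e820_host=1", E820_HOST_1)):
        print(name, system_ram_bytes(text))
```

Both listings add up to 1073344512 bytes of System RAM. The difference is placement: with e820_host = 1, 512 MiB of it sits at 100000000-11fffffff, above the "Unusable memory" hole.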
>> If other information about the environment is useful please let me know.
> 
> Cc-ing Konrad, who should be much more familiar with tmem than I am.

Strange, in my sent folder Konrad was still on Cc:

Trying again.


Juergen


