[Xen-users] xen domU stall on 4.12.1
Hello,
I've tried upgrading one of my long-running Xen dom0 machines from Xen 4.11.3 to 4.12.1. It worked fine for several days, but then one of the domUs failed its monitoring checks and could no longer be reached via ssh. The monitoring shows the load starting to grow linearly (memory consumption grows too, as new monitoring processes are spawned and never finish) and the machine is simply stuck. This has happened 3 times during the last 3 weeks. The dom0 keeps working fine; only one of the domUs gets stuck (always the same domU). Xen was upgraded on 19.12.2019, the first lockup happened on 26.12.2019, then again on 28.12.2019 and 5.1.2020.

The domU kernel log is full of these messages:

Jan 5 13:19:20 kernel: [680493.141103] INFO: rcu_sched detected stalls on CPUs/tasks:
Jan 5 13:19:20 kernel: [680493.141107] (detected by 12, t=147012 jiffies, g=72555998, c=72555997, q=89937)
Jan 5 13:19:20 kernel: [680493.141112] All QSes seen, last rcu_sched kthread activity 147012 (4975178416-4975031404), jiffies_till_next_fqs=3, root ->qsmask 0x0
Jan 5 13:19:20 kernel: [680493.141114] php-fpm R running task 14024 17581 2249 0x00000000
Jan 5 13:19:20 kernel: [680493.141120] Call Trace:
Jan 5 13:19:20 kernel: [680493.141124] <IRQ>
Jan 5 13:19:20 kernel: [680493.141131] sched_show_task.cold+0xb4/0xcb
Jan 5 13:19:20 kernel: [680493.141135] rcu_check_callbacks.cold+0x36d/0x3ba
Jan 5 13:19:20 kernel: [680493.141138] update_process_times+0x24/0x60
Jan 5 13:19:20 kernel: [680493.141143] tick_sched_handle+0x30/0x50
Jan 5 13:19:20 kernel: [680493.141145] tick_sched_timer+0x30/0x70
Jan 5 13:19:20 kernel: [680493.141147] ? tick_sched_do_timer+0x40/0x40
Jan 5 13:19:20 kernel: [680493.141149] __hrtimer_run_queues+0xbc/0x1f0
Jan 5 13:19:20 kernel: [680493.141153] hrtimer_interrupt+0xa0/0x1d0
Jan 5 13:19:20 kernel: [680493.141158] xen_timer_interrupt+0x1e/0x30
Jan 5 13:19:20 kernel: [680493.141162] __handle_irq_event_percpu+0x3d/0x160
Jan 5 13:19:20 kernel: [680493.141164] handle_irq_event_percpu+0x1c/0x60
Jan 5 13:19:20 kernel: [680493.141168] handle_percpu_irq+0x32/0x50
Jan 5 13:19:20 kernel: [680493.141171] generic_handle_irq+0x1f/0x30
Jan 5 13:19:20 kernel: [680493.141175] __evtchn_fifo_handle_events+0x13f/0x150
Jan 5 13:19:20 kernel: [680493.141181] __xen_evtchn_do_upcall+0x53/0x90
Jan 5 13:19:20 kernel: [680493.141186] xen_evtchn_do_upcall+0x22/0x40
Jan 5 13:19:20 kernel: [680493.141191] xen_hvm_callback_vector+0x85/0x90
Jan 5 13:19:20 kernel: [680493.141192] </IRQ>
Jan 5 13:19:20 kernel: [680493.141194] RIP: 0033:0x56398dc8a959
Jan 5 13:19:20 kernel: [680493.141195] RSP: 002b:00007ffdd588d3d0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
Jan 5 13:19:20 kernel: [680493.141197] RAX: 0000000000000060 RBX: 00007f6ea3aa02e0 RCX: 0000000000000000
Jan 5 13:19:20 kernel: [680493.141198] RDX: 00007f6ea3aa02a0 RSI: 00007ffdd588d3d8 RDI: 00007ffdd588d3e0
Jan 5 13:19:20 kernel: [680493.141199] RBP: 00007f6ea3a9b5b0 R08: 00007f6ea49be770 R09: 00007f6ea483cdc0
Jan 5 13:19:20 kernel: [680493.141200] R10: 00007f6eae520a40 R11: 00007f6eae4933c0 R12: 00005639902892a0
Jan 5 13:19:20 kernel: [680493.141201] R13: 0000000000000000 R14: 00007f6eae41e930 R15: 00007f6ea7077138
Jan 5 13:19:20 kernel: [680493.141204] rcu_sched kthread starved for 147012 jiffies! g72555998 c72555997 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x200 ->cpu=3
Jan 5 13:19:20 kernel: [680493.141205] rcu_sched R15016 8 2 0x80000000
Jan 5 13:19:20 kernel: [680493.141210] Call Trace:
Jan 5 13:19:20 kernel: [680493.141215] ? __schedule+0x24e/0x710
Jan 5 13:19:20 kernel: [680493.141216] schedule+0x2d/0x80
Jan 5 13:19:20 kernel: [680493.141219] schedule_timeout+0x16c/0x340
Jan 5 13:19:20 kernel: [680493.141221] ? call_timer_fn+0x130/0x130
Jan 5 13:19:20 kernel: [680493.141222] rcu_gp_kthread+0x486/0xd60
Jan 5 13:19:20 kernel: [680493.141224] kthread+0xfd/0x130
Jan 5 13:19:20 kernel: [680493.141226] ? force_qs_rnp+0x170/0x170
Jan 5 13:19:20 kernel: [680493.141227] ? __kthread_parkme+0x90/0x90
Jan 5 13:19:20 kernel: [680493.141228] ret_from_fork+0x35/0x40

The dom0 is running kernel 4.14.158 with these Xen command line options:

GRUB_CMDLINE_XEN="dom0_mem=4G gnttab_max_frames=256 ucode=scan loglvl=all guest_loglvl=all console_to_ring console_timestamps=date conring_size=1m smt=true iommu=no-intremap"

The domU config:

name = "machine"
kernel = "kernel-4.14.159-gentoo-xen"
memory = 10000
vcpus = 16
vif = [ '' ]
disk = [
        '...root,raw,xvda,rw',
        '...opt,raw,xvdc,rw',
        '...home,raw,xvdb,rw',
        '...tmp,raw,xvdd,rw',
        '...var,raw,xvde,rw',
]
extra = "root=/dev/xvda net.ifnames=0 console=ttyS0 console=ttyS0,38400n8"
type = "hvm"
sdl = 0
vnc = 0
serial='pty'
xen_platform_pci=1
max_grant_frames = 256

I've had issues like this in the past with the grant frames (basically this issue:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554), so maybe some other value needs to be raised too?

Thanks,
Tomas
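
P.S. Next time it locks up I plan to dump the grant table usage from dom0 while the domU is hung, to see whether it is anywhere near the configured max_grant_frames = 256. A minimal sketch of what I intend to run (assuming the 'g' debug key, "print grant table usage", behaves the same on 4.12 as on 4.11, and that this build also installs the xen-diag helper; <domid> stands for the stuck domU's id):

xl list                               # find the domid of the stuck domU
xl debug-keys g                       # ask Xen to dump per-domain grant table usage
xl dmesg | tail -n 200                # the dump ends up in the Xen console ring
xen-diag gnttab_query_size <domid>    # assumption: xen-diag is shipped with this subcommand

If that shows the domU nowhere near its grant frame limit, then the grant frames are probably not what is biting me this time.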