
Re: [Xen-devel] Help with: Xen BUG at timer.c:189



On Wednesday, 7 June 2017, 04:28:33, Jan Beulich wrote:
> >>> On 07.06.17 at 11:28, <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> > On Friday, 19 May 2017, 06:41:36, Jan Beulich wrote:
> >> >>> On 19.05.17 at 11:52, <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> >> > I'm struggling with a hypervisor panic. The hypervisor version is 4.4.3,
> >> > yes I know - very old ;-), but the affected code hasn't changed much.
> >> 
> >> Well, at the very least I'd expect you to base your code on
> >> 4.4.4 plus the roughly 199 backports we've added there over time
> >> (assuming yours is based on ours), some of which are scheduler
> >> ones.
> > 
> > Don't you have older versions in the field?
> 
> We certainly do, but when people run into problems, the first thing
> support is supposed to do is tell them to update.
> 
> >> > The panic was never seen before and occurred while booting the system,
> >> > only dom0 was running.
> >> > 
> >> > (XEN) Xen BUG at timer.c:189
> >> > (XEN) ----[ Xen-4.4.3_02-26.2_x3  x86_64  debug=n  Tainted:    C ]----
> >> > (XEN) CPU:    14
> >> > (XEN) RIP:    e008:[<ffff82d08012c3e6>] remove_entry+0x26/0x80
> >> > (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
> >> > (XEN) rax: 00000000000000bc   rbx: ffff830839bcd5c8   rcx: ffff82d080343a80
> >> > (XEN) rdx: 000000000000d5f0   rsi: 0000002887fa0000   rdi: ffff830839bcd5c8
> >> > (XEN) rbp: 000000000000d5f0   rsp: ffff830839bbfe20   r8:  0000002887211398
> >> > (XEN) r9:  000000288721132f   r10: 00000000bf8b2994   r11: ffff82d08011cfa0
> >> > (XEN) r12: ffff830839bcd5c8   r13: ffff82d080343a80   r14: 000000000000d5f0
> >> > (XEN) r15: ffff82d080343a80   cr0: 000000008005003b   cr4: 00000000001526f0
> >> > (XEN) cr3: 0000000075679000   cr2: 000000000069b8ec
> >> > (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> >> > (XEN) Xen stack trace from rsp=ffff830839bbfe20:
> >> > (XEN)    ffff830839bcb020 ffff82d08012d3a8 0000002887fa0000 0000000000000286
> >> > (XEN)    ffff830839bcd5f0 ffff830839bcd9c0 0000002887c1915d 000000000000000e
> >> > (XEN)    ffff82d080343a70 0000000000000002 ffff830839bcda30 ffff82d0801affd0
> >> > (XEN)    000000010000000e 00000028872113f4 ffffffffffffffff ffff82d08030fc00
> >> > (XEN)    ffff82d0802ffc00 ffff82d080189a1c 0000000000000000 0000000000000000
> >> > (XEN)    ffffffffffffffff 0000107800076e92 ffff82d0801292aa ffff830839bb8000
> >> > (XEN)    ffff8300754fd000 ffff82d08031abe0 0000000000000000 ffff830839bcb048
> >> > (XEN)    0000000000000000 ffff82d080161ad6 0000000000000001 0000000000000000
> >> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000009
> >> > (XEN)    ffff8803dda20010 0000000000000246 0000000000000000 00000000fffffffe
> >> > (XEN)    0000000000000000 0000000000000000 ffffffff800033aa 0000000000000000
> >> > (XEN)    0000000000000000 0000000000000001 0000010000000000 ffffffff800033aa
> >> > (XEN)    000000000000e033 0000000000000246 ffff8803dda21f08 000000000000e02b
> >> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> >> > (XEN)    000000000000000e ffff8300754fd000 00000037b9887700 0000000000000000
> >> > (XEN) Xen call trace:
> >> > (XEN)    [<ffff82d08012c3e6>] remove_entry+0x26/0x80
> >> > (XEN)    [<ffff82d08012d3a8>] set_timer+0x178/0x230
> >> > (XEN)    [<ffff82d0801affd0>] mwait_idle+0x270/0x370
> >> > (XEN)    [<ffff82d080189a1c>] local_time_calibration+0x19c/0x1a0
> >> > (XEN)    [<ffff82d0801292aa>] __do_softirq+0x6a/0xa0
> >> > (XEN)    [<ffff82d080161ad6>] idle_loop+0x16/0x50
> >> > (XEN) 
> >> > (XEN) 
> >> > (XEN) ****************************************
> >> > (XEN) Panic on CPU 14:
> >> > (XEN) Xen BUG at timer.c:189
> >> > (XEN) ****************************************
> >> > (XEN) 
> >> > (XEN) Reboot in five seconds...
> >> > (XEN) Executing kexec image on cpu14
> >> > (XEN) Shot down all CPUs
> >> > 
> >> > This is the BUG() call in remove_entry() for the case that timer->status is
> >> > neither TIMER_STATUS_in_heap nor TIMER_STATUS_in_list.
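> >> > 
> >> > For reference, remove_entry() in xen/common/timer.c looks roughly like this
> >> > (trimmed; exact line numbers may differ a bit in our tree):
> >> > 
> >> >   static int remove_entry(struct timer *t)
> >> >   {
> >> >       struct timers *timers = &per_cpu(timers, t->cpu);
> >> >       int rc;
> >> > 
> >> >       switch ( t->status )
> >> >       {
> >> >       case TIMER_STATUS_in_heap:
> >> >           rc = remove_from_heap(timers->heap, t);
> >> >           break;
> >> >       case TIMER_STATUS_in_list:
> >> >           remove_from_list(&timers->list, t);
> >> >           rc = 0;
> >> >           break;
> >> >       default:
> >> >           rc = 0;
> >> >           BUG();    /* timer.c:189 - status is neither in_heap nor in_list */
> >> >       }
> >> > 
> >> >       t->status = TIMER_STATUS_invalid;
> >> >       return rc;
> >> >   }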
> >> > 
> >> > With the address of the timer from the vmcore:
> >> > crash> struct timer ffff830839bcd5c8
> >> > struct timer {
> >> >   expires = 0,
> >> >   {
> >> >     heap_offset = 968676784, 
> >> >     list_next = 0xffff830839bcd5b0, 
> >> >     inactive = {
> >> >       next = 0xffff830839bcd5b0, 
> >> >       prev = 0x200200200000050
> >> >     }
> >> >   }, 
> >> >   function = 0xffff830839bcd5e0, 
> >> >   data = 0xffff830839bcd5e0, 
> >> >   cpu = 54768, 
> >> >   status = 188 '\274'
> >> > }
> >> > 
> >> > This looks like the structure was overwritten.
> >> > 
> >> > The call path seems to be:
> >> >  mwait_idle()
> >> >    sched_tick_resume()
> >> >      csched_tick_resume()
> >> >        set_timer()
> >> >          remove_entry()
> >> > 
> >> > In sched_tick_resume():
> >> >  unsigned int cpu = smp_processor_id();
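> >> > 
> >> > Inside set_timer() the path down to remove_entry() is roughly (trimmed;
> >> > deactivate_timer() is evidently inlined, since the backtrace shows
> >> > remove_entry() called directly from set_timer()):
> >> > 
> >> >   void set_timer(struct timer *timer, s_time_t expires)
> >> >   {
> >> >       unsigned long flags;
> >> > 
> >> >       if ( !timer_lock_irqsave(timer, flags) )
> >> >           return;
> >> > 
> >> >       if ( active_timer(timer) )
> >> >           deactivate_timer(timer);  /* -> remove_entry(), where the BUG() hits */
> >> > 
> >> >       timer->expires = expires;
> >> >       activate_timer(timer);
> >> > 
> >> >       timer_unlock_irqrestore(timer, flags);
> >> >   }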
> >> > 
> >> > I extracted struct cpu_info from the stack and found:
> >> > crash> cpu_info ffff830839bbff18
> >> > struct cpu_info {
> >> >   ...
> >> >   processor_id = 14, 
> >> >   current_vcpu = 0xffff8300754fd000, 
> >> >   per_cpu_offset = 0x37b9887700,
> >> >   ...
> >> > 
> >> > And from the code in sched_credit.c:  spc = CSCHED_PCPU(cpu);
> >> > I calculated spc:
> >> > 
> >> > crash> csched_pcpu 0xffff830839be2b40
> >> > struct csched_pcpu {
> >> >   runq = {
> >> >     next = 0xffff830839be2b40, 
> >> >     prev = 0xffff830839be2b40
> >> >   }, 
> >> >   runq_sort_last = 0, 
> >> >   ticker = {
> >> >     expires = 174080000000, 
> >> >     {
> >> >       heap_offset = 1, 
> >> >       list_next = 0x100100100000001, 
> >> >       inactive = {
> >> >         next = 0x100100100000001, 
> >> >         prev = 0x200200200200200
> >> >       }
> >> >     }, 
> >> >     function = 0xffff82d08011f200 <csched_tick>, 
> >> >     data = 0xe, 
> >> >     cpu = 14, 
> >> >     status = 3 '\003'
> >> >   }, 
> >> >   tick = 0, 
> >> >   idle_bias = 47, 
> >> >   balance_mask = 0xffff830839be2bb0
> >> > }
> >> > 
> >> > And this ticker looks ok!
> >> > So the ticker address extracted from the vmcore in remove_entry()
> >> > is not the same as the expected one read from the code.
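> >> > 
> >> > For reference, CSCHED_PCPU() in sched_credit.c is essentially (per_cpu()
> >> > resolves through __per_cpu_offset[cpu]):
> >> > 
> >> >   #define CSCHED_PCPU(_c) \
> >> >       ((struct csched_pcpu *)per_cpu(schedule_data, _c).sched_priv)
> >> > 
> >> > so only a wrong cpu value or a corrupted sched_priv pointer could yield a
> >> > wrong &spc->ticker here.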
> >> 
> >> Right, and looking through raw stack contents I can't seem to
> >> find any slot holding 0xffff830839be????. If this was a hypervisor
> >> binary we delivered, I would even have gone and checked which
> >> slots correspond to saved copies of pointers derived from what
> >> was supposedly passed into set_timer() (rooted at
> >> CSCHED_PCPU(14)), but this looks to be a crash with a binary
> >> you've produced yourself. Quite likely corruption starts elsewhere.
> > 
> > I disassembled the affected code and found that the stack is not used that 
> > much.
> 
> Right, hence my reference to saved pointers (i.e. callee saved
> registers put on the stack in function prologues).
> 
> > The only usable value is ffff830839bcd5f0.
> 
> There are a few more ffff830839bc???? ones, and it is at least
> interesting that CPU14's stack is at ffff830839bb???? and the
> uncorrupted struct csched_pcpu is at ffff830839be????.

What happens on the stack:

mwait_idle()
  sub    $0x88,%rsp
  ...
  callq  0xffff82d0801253f0 <sched_tick_resume>
    (in sched_tick_resume)
                            -> ffff82d0801affd0 -> symbol mwait_idle+624
    ...
    jmp *%r11       -> csched_tick_resume
      (in csched_tick_resume)
      sub    $0x18,%rsp
      ...
      add    $0x18,%rsp
      jmpq   0xffff82d08012d230 <set_timer>
        (in set_timer)
        sp = sp - 0x48
        mov %r12,0x28(%rsp) -> ffff830839bcda30 -> from mwait_idle()
        mov %rbx,0x18(%rsp) -> 0000000000000002
        mov %rbp,0x20(%rsp) -> ffff82d080343a70 -> per_cpu__tasklet_work_to_do
        mov %r13,0x30(%rsp) -> 000000000000000e
        mov %r14,0x38(%rsp) -> 0000002887c1915d
        mov %r15,0x40(%rsp) -> ffff830839bcd9c0 -> from mwait_idle() -> struct acpi_processor_cx*

        mov %rax,0x10(%rsp) -> ffff830839bcd5f0 -> address &(timer->cpu)
        pushfq 
        popq   0x8(%rsp)    -> 0000000000000286
        mov %rsi,(%rsp)     -> 0000002887fa0000 -> argument "expires"

        call remove_entry()
          (in remove_entry)
                            -> ffff82d08012d3a8 -> symbol set_timer+376
          push   %rbx       -> ffff830839bcb020


The spc from CSCHED_PCPU(cpu) is fully handled in registers.
Below is the complete stack. cpu_info starts at ffff830839bbff18.

ffff830839bbfe20:  ffff830839bcb020 set_timer+376    
ffff830839bbfe30:  0000002887fa0000 0000000000000286 
ffff830839bbfe40:  ffff830839bcd5f0 ffff830839bcd9c0 
ffff830839bbfe50:  0000002887c1915d 000000000000000e 
ffff830839bbfe60:  per_cpu__tasklet_work_to_do 0000000000000002 
ffff830839bbfe70:  ffff830839bcda30 mwait_idle+624   
ffff830839bbfe80:  000000010000000e 00000028872113f4 
ffff830839bbfe90:  ffffffffffffffff softirq_handlers 
ffff830839bbfea0:  irq_stat         local_time_calibration+412 
ffff830839bbfeb0:  0000000000000000 0000000000000000 
ffff830839bbfec0:  ffffffffffffffff 0000107800076e92 
ffff830839bbfed0:  __do_softirq+106 ffff830839bb8000 
ffff830839bbfee0:  ffff8300754fd000 __per_cpu_offset 
ffff830839bbfef0:  0000000000000000 ffff830839bcb048 
ffff830839bbff00:  0000000000000000 idle_loop+22     
ffff830839bbff10:  0000000000000001 0000000000000000 
ffff830839bbff20:  0000000000000000 0000000000000000 
ffff830839bbff30:  0000000000000000 0000000000000009 
ffff830839bbff40:  ffff8803dda20010 0000000000000246 
ffff830839bbff50:  0000000000000000 00000000fffffffe 
ffff830839bbff60:  0000000000000000 0000000000000000 
ffff830839bbff70:  ffffffff800033aa 0000000000000000 
ffff830839bbff80:  0000000000000000 0000000000000001 
ffff830839bbff90:  0000010000000000 ffffffff800033aa 
ffff830839bbffa0:  000000000000e033 0000000000000246 
ffff830839bbffb0:  ffff8803dda21f08 000000000000e02b 
ffff830839bbffc0:  0000000000000000 0000000000000000 
ffff830839bbffd0:  0000000000000000 0000000000000000 
ffff830839bbffe0:  000000000000000e ffff8300754fd000 
ffff830839bbfff0:  00000037b9887700 0000000000000000 
ffff830839bc0000:  0000000000000000 0000000000000000

At ffff830839bbffe0 we can see cpu number 0xe. All seems to be fine.
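
The &(timer->cpu) slot at 0x10(%rsp) comes from timer_lock(), entered via
timer_lock_irqsave() right at the top of set_timer(). Roughly (trimmed; the
rcu_read_lock/unlock pair around the loop is left out):

  static inline bool_t timer_lock(struct timer *timer)
  {
      unsigned int cpu;

      for ( ; ; )
      {
          cpu = read_atomic(&timer->cpu);   /* &timer->cpu is the spilled value */
          if ( unlikely(cpu == TIMER_CPU_status_killed) )
              return 0;
          spin_lock(&per_cpu(timers, cpu).lock);
          if ( likely(timer->cpu == cpu) )
              break;
          spin_unlock(&per_cpu(timers, cpu).lock);
      }
      return 1;
  }

  #define timer_lock_irqsave(t, flags) ({       \
      bool_t __x;                               \
      local_irq_save(flags);                    \
      if ( !(__x = timer_lock(t)) )             \
          local_irq_restore(flags);             \
      __x;                                      \
  })

So whatever (possibly already wrong) timer pointer set_timer() receives is
dereferenced here first, which fits the wrong &timer->cpu on the stack.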

> 
> Also this value is surprisingly close to the garbage values in the
> apparently corrupted struct timer. Can you rule out that you've
> had a stack overflow somewhere (iirc these lead to immediate
> crashes only when using debug builds)?

As mentioned I analyzed the csched_pcpu areas of all processors.
The csched_pcpu of cpu 15 is ffff830839bcd550 - ffff830839bcd59f and
looks OK. The wrong timer address ffff830839bcd5c8 lies shortly after it.

IMHO the problem is not the stack but the address of the struct timer passed to
set_timer().
But this address is calculated from the cpu number returned by smp_processor_id(),
which looks OK, and CSCHED_PCPU(cpu) looks OK too.
So I have no idea.

> 
> > And this is &timer->cpu, which is used (and moved to the stack) in timer_lock()
> > via the macro timer_lock_irqsave().
> > Calculating back to the struct timer, this is our wrong address from the panic.
> > This would mean set_timer() was already called with the wrong address.
> > 
> > On the other hand I had a look at all 48 CSCHED_PCPU(cpu) areas and they all
> > seem to be fine. At least ticker->cpu always corresponds to the cpu used in
> > __per_cpu_offset[cpu].
> > 
> > So I think we have a really short path where the error can happen:
> > 
> > csched_tick_resume(const struct scheduler *ops, unsigned int cpu)
> >   ...
> >   spc = CSCHED_PCPU(cpu);
> >   ...
> >   set_timer(&spc->ticker, ...);
> > 
> > As already said above, the cpu used in csched_tick_resume() comes from
> > smp_processor_id() and looks fine. Very strange!
> 
> Indeed. Has this been seen more than once, i.e. is there hope
> that if you added some debugging code it could be caught earlier?

As mentioned earlier, this was never seen before, and unfortunately it happened
on a customer system.

Dietmar.

> 
> Jan
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

