Re: [Xen-devel] kernel panic with no call trace

Thank you for your reply, and apologies that this response is quite late.
As you mentioned, I know there is an error (or several) somewhere, but I cannot guess where it is, so I would like to know where I should start debugging.
However, even though I am using a serial console, the kernel log alone does not give me enough clues. All I can get from it is:
1) which line and file caused the panic, by using the call trace
2) which linear address caused the fault (the 'Faulting linear address')

I think the 'Faulting linear address' is the key point, because I have heard that it represents a bad pointer.
Given that pointer (it is just an address, and I cannot infer what it means), is there any way to find the data it refers to, or the corresponding line in the C source code?
If you know of any other approach that is useful in cases like this, could you please point me to it?

Below is the kernel log from serial console:

(XEN) ----[ Xen-4.5.0  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82d080120973>] csched_schedule+0x373/0x1180
(XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
(XEN) rax: 00000000ffffffff   rbx: ffff830087ffa000   rcx: ffff830461d20000
(XEN) rdx: ffff830088002c98   rsi: ffff830461d20000   rdi: 0000000000000000
(XEN) rbp: ffff830461ce2ae0   rsp: ffff830461d27d10   r8:  0000001e582339ec
(XEN) r9:  0000000000000004   r10: 000000000000003c   r11: 0000000000000004
(XEN) r12: 0000000000000001   r13: ffff82d0803f26a0   r14: ffff830461c53000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000003526f0
(XEN) cr3: 0000000086077000   cr2: ffff830088002c98
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff830461d27d10:
(XEN)    ffff830461d03950 ffff82d0804081e0 ffff830461c74068 ffff830461d27de0
(XEN)    ffff830461c24c30 ffff830461cec800 ffff82d0804081e0 0000000600000002
(XEN)    ffff830461ce29d0 ffff830461d20000 ffff82d0804081e0 ffff830461d3a720
(XEN)    0000000000000002 ffff830461d3a700 00ffffc000000000 ffff830461d27dd0
(XEN)    ffff830461d27e68 ffff82d080408180 0000001e5c106499 0000000001c9c380
(XEN)    0000000000000000 0000000000000000 ffff8300864e3000 ffff8302e1596fb0
(XEN)    ffff830461d27dd0 ffff830461d27dd0 000000000000004b 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff830461d3a738 ffff8300864e3000
(XEN)    ffff82d0804081e0 ffff830461d2e068 0000001e5c106499 ffff830461d2e060
(XEN)    ffff82d0803f26a0 ffff82d080128cb3 0000001e00000000 ffff830461d2e080
(XEN)    ffff82d080279944 ffff82d08015f295 0000001e5c0504ce ffff830461d3ad80
(XEN)    0000001e5c1054ba ffff82d08012f64e ffff82d0803f26a0 00000000ffffffff
(XEN)    ffff82d0803df880 0000000000000001 ffff82d0803df780 ffffffffffffffff
(XEN)    ffff830461d20000 ffff82d08012c03c ffffffffffffffff 00000000ffffffff
(XEN)    ffff830461d20000 ffff830461d2e068 0000001e5b762541 ffff830461d2e060
(XEN)    ffff82d0803f26a0 ffff82d080162e3a 0000000000000000 ffff8300864e3000
(XEN)    ffff8300864e3000 ffff8800f8bbbfd8 0000000000000000 ffff8800f8bbbfd8
(XEN)    0000000000000003 ffff8800f8bbbec0 0000000000000000 0000000000000246
(XEN)    0000000000007ff0 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffffffff810013aa 0000000000000001 0000000000000000 0000000000000001
(XEN) Xen call trace:
(XEN)    [<ffff82d080120973>] csched_schedule+0x373/0x1180
(XEN)    [<ffff82d080128cb3>] schedule+0xf3/0x590
(XEN)    [<ffff82d08015f295>] reprogram_timer+0x75/0xe0
(XEN)    [<ffff82d08012f64e>] timer_softirq_action+0x13e/0x210
(XEN)    [<ffff82d08012c03c>] __do_softirq+0x7c/0xd0
(XEN)    [<ffff82d080162e3a>] idle_loop+0x3a/0x70
(XEN) Pagetable walk from ffff830088002c98:
(XEN)  L4[0x106] = 0000000086075063 ffffffffffffffff
(XEN)  L3[0x002] = 0000000086071063 ffffffffffffffff
(XEN)  L2[0x040] = 0000000000000000 ffffffffffffffff
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff830088002c98
(XEN) ****************************************
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

I hope you can help.


On Wed, Sep 6, 2017 at 4:45 PM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
On 06/09/2017 03:39, Minjun Hong wrote:
> Hello~~
> I'm struggling to resolve a kernel panic problem during developing
> scheduler code.
> But I have not made any progress since I can not get any meaningful
> information from the serial log.
> When the panic occurred, always there is no call trace and only panic
> notification like following:
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) cpu:20, vcpu:20 in csched_schedule(1891)
> (XEN) cpu:21, vcpu:21 in csched_schedule(1891)
> (XEN) cpumask_test_cpu(cpu, prv->in_cosched) in csched_schedule(1907)
> (XEN) cpumask_test_cpu(cpu, prv->in_cosched) in csched_schedule(1907)
> (XEN) cpumask_test_cpu(cpu, prv->in_cosched) in csched_schedule(1907)
> (XEN) cpumask_test_cpu(cpu, prv->in_cosched) in csched_schedule(1907)
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: ffff830078efcc98
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> I'm using Xen-4.5.0 on my server having 2 Intel Xeon E5-2620 v4 cpus,
> 128 GB RAM(16 GB DDR4 * 4) and 1 TB HDD and, using Ubuntu 14.04 LTS.
> Is there any method to make the call trace show up or 
> when there is no call trace, please tell me from where I should start
> to debug.
> Thanks in advance and I wait for your comments.

There is a call trace, but as you've clearly added printk()'s to the
scheduler, the call trace will be getting lost in the spew of logging.

From what you've printed, you've fallen over a bad pointer which isn't
present, although the offset into the directmap does look semi
plausible.  Either way, you've got memory corruption of some kind.
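
As a rough illustration of the reasoning above, the faulting address from
the newer log (ffff830088002c98) can be split by hand into its page-table
indices and its offset into the directmap. This is only a minimal sketch:
the helper names are made up for illustration, and DIRECTMAP_VIRT_START is
an assumption based on Xen's x86_64 memory layout, so check it against the
headers of the tree you are building.

```python
# Sketch: decode an x86-64 linear address and a page-fault error code.
# ASSUMPTION: Xen's 1:1 directmap starts at 0xffff830000000000 on x86_64.
DIRECTMAP_VIRT_START = 0xFFFF830000000000


def pagetable_indices(addr):
    """Return the 4-level page-table indices and page offset of addr."""
    return {
        "l4": (addr >> 39) & 0x1FF,  # bits 47..39
        "l3": (addr >> 30) & 0x1FF,  # bits 38..30
        "l2": (addr >> 21) & 0x1FF,  # bits 29..21
        "l1": (addr >> 12) & 0x1FF,  # bits 20..12
        "offset": addr & 0xFFF,      # bits 11..0
    }


def decode_pf_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # False: the page was not present
        "write": bool(code & 0x2),    # False: the access was a read
        "user": bool(code & 0x4),     # False: fault came from supervisor mode
    }


fault = 0xFFFF830088002C98  # "Faulting linear address" from the log
idx = pagetable_indices(fault)
print("L4[%#05x] L3[%#05x] L2[%#05x]" % (idx["l4"], idx["l3"], idx["l2"]))
print("directmap offset: %#x" % (fault - DIRECTMAP_VIRT_START))
print(decode_pf_error_code(0x0000))
```

The indices line up with the "Pagetable walk" in the log (L4[0x106],
L3[0x002], L2[0x040]), where the L2 entry is zero, and error_code=0000
decodes to a supervisor-mode read of a page that is not present. Under the
assumed directmap base the offset is 0x88002c98, a little over 2 GiB.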


