Re: [Xen-devel] BUG: unable to handle kernel NULL pointer dereference - xen_spin_lock_flags

On 14/02/12 10:09, Ian Campbell wrote:
> On Tue, 2012-02-14 at 00:54 +0000, Christopher S. Aker wrote:
>> On Feb 13, 2012, at 5:41 PM, Christopher S. Aker wrote:
>>> Network stress testing (iperf, UDP, bidirectional) the above stack reliably 
>>> BUGs on both 3.0.4 and 3.2.5, and on both IGB or e1000e NICs with the 
>>> following:
>>> BUG: unable to handle kernel NULL pointer dereference at 00000474
>>> IP: [<c100e967>] xen_spin_lock_flags+0x27/0x70
>> This happens regardless of CONFIG_PARAVIRT_SPINLOCKS enabled or disabled.
> I think that rules out the recent pv spinlock bug (fixed by
> 7a7546b377bdaa25ac77f33d9433c59f259b9688, in various stable trees).
> What line of code does that IP correspond to within xen_spin_lock_flags?
> Likewise the one in xen_netbk_schedule_xenvif from the stack.
> I suspect this must be &netbk->net_schedule_list_lock but I don't see
> how that can ever be NULL nor does the offset appear to be 0x474, at
> least in my tree -- although that may depend on debug options.
> Are you rebooting guests or plug/unplugging vifs while this is going on?
> What about hotplugging CPUs (dom0 in particular)?
> Does this happen as soon as the test starts or does it work for a bit
> before failing?
> Ian.

I don't know if this is related, but it looks very similar to a bug a
friend of mine encountered. I tried to investigate but got nowhere.

The panic can be found at: http://pastebin.com/ExCwhzpy

The panic looks as if it is on the same logical instruction. (There is
a 32/64-bit difference, which would likely explain the out-by-one-byte
address in the dereference.)

The difference here is that this bug comes from the ext4 path, indicating
that it might be a spinlock problem rather than a network problem (assuming,
of course, that this is in fact the same bug).
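
For anyone following along, here is a rough sketch of why a fault address
such as 00000474 is usually read as "member offset from a NULL base
pointer", and why member offsets tend to move between 32-bit and 64-bit
builds. The struct layouts below are made up for illustration (they are not
the real xen_netbk or spinlock definitions), and the helper name
fault_address is hypothetical. Python's ctypes is used only to model C
struct layout; it says nothing definitive about either panic.

```python
import ctypes

# Hypothetical layout for illustration only; NOT the real struct xen_netbk.
# The padding is chosen so that the lock member lands at offset 0x474.
class HypotheticalOwner(ctypes.Structure):
    _fields_ = [
        ("other_fields", ctypes.c_uint8 * 0x474),
        ("lock", ctypes.c_uint32),
    ]

def fault_address(base, struct_type, member):
    """Address touched when `member` is accessed through pointer `base`."""
    return base + getattr(struct_type, member).offset

# With a NULL base pointer the faulting address is just the member offset.
null_fault = fault_address(0, HypotheticalOwner, "lock")
print(hex(null_fault))

# Model pointer-sized members explicitly as 4 or 8 bytes so the result
# does not depend on the machine running this script.
def make_layout(pointer_type):
    class Layout(ctypes.Structure):
        _fields_ = [
            ("next", pointer_type),
            ("prev", pointer_type),
            ("lock", ctypes.c_uint32),
        ]
    return Layout

Layout32 = make_layout(ctypes.c_uint32)  # stand-in for a 32-bit build
Layout64 = make_layout(ctypes.c_uint64)  # stand-in for a 64-bit build
print(Layout32.lock.offset, Layout64.lock.offset)
```

If the two panics really do share a cause, comparing the faulting address
against the member offsets in each build's own struct definitions (rather
than these invented ones) would be the thing to check.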

