
Re: [Xen-devel] Xen PVM: Strange lockups when running PostgreSQL load



>>> On 19.10.12 at 10:33, Stefan Bader <stefan.bader@xxxxxxxxxxxxx> wrote:
> On 19.10.2012 10:06, Jan Beulich wrote:
>>>>> On 18.10.12 at 22:52, Stefan Bader <stefan.bader@xxxxxxxxxxxxx> wrote:
>>> Actually I begin to suspect that it could be possible that I just
>>> overlooked the most obvious thing. Provoking question: are we sure we
>>> are on the same page about the purpose of the spin_lock_flags variant
>>> of the pv lock ops interface?
>>>
>>> I begin to suspect that it really is not for giving a chance to re-enable
>>> interrupts. Just what it should be used for I am not clear. Anyway it seems
>>> all other places more or less ignore the flags and map themselves back to
>>> an ignorant version of spinlock.
>>> Also I believe that the only high level function that would end up in
>>> passing any flags, would be the spin_lock_irqsave one. And I am pretty
>>> sure that this one will expect interrupts to stay disabled.
>> 
>> No - the only requirement here is that from the point on where
>> the lock is owned interrupts must remain disabled. Re-enabling
>> intermediately is quite okay (and used to be done by the
>> native kernel prior to the conversion to ticket locks iirc).
> 
> Though it seems rather dangerous then. Don't remember the old code, but imo
> it always opens up a (even microscopic) window to unexpected interruptions.

There just can't be unexpected interruptions. Whenever interrupts
are enabled, it is expected that they can occur.

>>> So I tried below approach and that seems to be surviving the previously
>>> breaking testcase for much longer than anything I tried before.
>> 
>> If that indeed fixes your problem, then (minus eventual problems
>> with the scope of the interrupts-enabled window) this rather
>> points at a bug in the users of the spinlock interfaces.
> 
> I would be pragmatic here, none of the other current implementations seem to
> re-enable interrupts and so this only affects xen pv.

I don't think you really checked - the first arch I looked at (s390,
as being the most obvious one to look at when it comes to
virtualization) quite prominently re-enables interrupts in
arch_spin_lock_wait_flags().

> And how much really is
> gained from enabling it compared to the risk of being affected by something
> that nobody else will be?

The main difference between the native and virtualized cases is
that, when virtualized, the period of time you spend waiting for the
lock to become available is pretty much unbounded (even more so
without ticket locks), and keeping interrupts disabled for such an
extended period of time is just asking for other problems.

And note that this isn't the case just for Xen PV - all virtualization
scenarios suffer from that.
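
To make the rule discussed above concrete (interrupts may be re-enabled
while waiting, but must be off from the moment the lock is owned), here
is a minimal sketch. It is a single-threaded userspace C model, not
kernel code and not the Xen PV implementation. Every model_* name is
hypothetical: a plain bool stands in for the CPU interrupt flag, and a
wait hook stands in for whatever happens elsewhere (an interrupt being
delivered, another CPU dropping the lock) while this CPU waits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool model_irqs_enabled = true;

typedef unsigned long model_flags_t;
#define MODEL_IF 1UL /* "interrupts were enabled" bit in the saved flags */

/* Save the current interrupt state, then disable interrupts. */
static model_flags_t model_irq_save(void)
{
    model_flags_t flags = model_irqs_enabled ? MODEL_IF : 0;
    model_irqs_enabled = false;
    return flags;
}

/* Put the interrupt state back to what the saved flags describe. */
static void model_irq_restore(model_flags_t flags)
{
    model_irqs_enabled = (flags & MODEL_IF) != 0;
}

typedef struct model_lock {
    atomic_flag held;
    /* Assumption: called once per failed attempt, models the outside world. */
    void (*wait_hook)(struct model_lock *lock);
} model_lock_t;

static void model_unlock(model_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/* Lock variant that is told the caller's previous interrupt state. */
static void model_spin_lock_flags(model_lock_t *lock, model_flags_t flags)
{
    assert(!model_irqs_enabled); /* callers enter with interrupts off */

    while (atomic_flag_test_and_set_explicit(&lock->held,
                                             memory_order_acquire)) {
        /* Lock not obtained: open the window only if the caller had
         * interrupts enabled before it asked for the lock. */
        model_irq_restore(flags);
        if (lock->wait_hook)
            lock->wait_hook(lock);
        /* Close the window again before the next acquisition attempt. */
        model_irqs_enabled = false;
    }
    /* Lock is owned here and interrupts are off. */
}

static model_flags_t model_spin_lock_irqsave(model_lock_t *lock)
{
    model_flags_t flags = model_irq_save();
    model_spin_lock_flags(lock, flags);
    return flags;
}

static void model_spin_unlock_irqrestore(model_lock_t *lock,
                                         model_flags_t flags)
{
    model_unlock(lock);
    model_irq_restore(flags);
}

/* Example hook: record what the waiter's interrupt state was, then
 * release the lock as if another CPU had finished with it. */
static bool model_saw_irqs_on_while_waiting = false;

static void model_release_from_other_cpu(model_lock_t *lock)
{
    model_saw_irqs_on_while_waiting = model_irqs_enabled;
    model_unlock(lock);
}
```

With the lock initially held and model_release_from_other_cpu installed
as the hook, model_spin_lock_irqsave() returns with the lock owned and
interrupts off, while the hook observes them on during the wait. Code
run from that window must not assume the lock is held, which is why a
failure here would point at the lock's users rather than at the lock.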

Jan




 

