[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] POD: soft lockups in dom0 kernel


  • To: David Vrabel <david.vrabel@xxxxxxxxxx>
  • From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
  • Date: Fri, 06 Dec 2013 14:52:47 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>
  • Delivery-date: Fri, 06 Dec 2013 13:52:51 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Friday, 06 December 2013 at 12:00:02, David Vrabel wrote:
> On 06/12/13 11:30, Jan Beulich wrote:
> >>>> On 06.12.13 at 12:07, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
> >> We do not want to disable the soft lockup detection here as it has found
> >> a bug.  We can't have tasks that are unschedulable for minutes as it
> >> would only take a handful of such tasks to hose the system.
> > 
> > My understanding is that the soft lockup detection is what its name
> > says - a mechanism to find cases where the kernel software locked
> > up. Yet that's not the case with long running hypercalls.
> 
> Well ok, it's not a lockup in the kernel but it's still a task that
> cannot be descheduled for minutes of wallclock time.  This is still a
> bug that needs to be fixed.
> 
> >> We should put an explicit preemption point in.  This will fix it for the
> >> CONFIG_PREEMPT_VOLUNTARY case which I think is the most common
> >> configuration.  Or perhaps this should even be a cond_resched() call to
> >> fix it for fully non-preemptible as well.
> > 
> > How do you imagine to do this? When the hypervisor preempts a
> > hypercall, all the kernel gets to see is that it drops back into the
> > hypercall page, such that the next thing to happen would be
> > re-execution of the hypercall. You can't call anything at that point,
> > all that can get run here are interrupts (i.e. event upcalls). Or do
> > you suggest to call cond_resched() from within
> > __xen_evtchn_do_upcall()?
> 
> I've not looked at how.
> 
> > And even if you do - how certain is it that what gets its continuation
> > deferred won't interfere with other things the kernel wants to do
> > (since if you'd be doing it that way, you'd cover all hypercalls at
> > once, not just those coming through privcmd, and hence you could
> > end up with partially completed multicalls or other forms of batching,
> > plus you'd need to deal with possibly active lazy modes).
> 
> I would only do this for hypercalls issued by the privcmd driver.

But I also got soft lockups when unmapping a larger chunk of guest memory
(our BS2000 OS) in the dom0 kernel via vunmap(). This ends up calling
HYPERVISOR_update_va_mapping() and can take a very long time.
From a kernel module I found no way to split the virtual address range so
that schedule() could be called in between, because the kernel functions
needed for this are not exported for use by modules. The only workable
solution was to turn off the soft lockup detection.

Dietmar.

> 
> David
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

