
Re: [Xen-devel] [PATCH] IOMMU: don't disable bus mastering on faults for devices used by Xen or Dom0



At 12:51 +0000 on 06 Nov (1352206269), Jan Beulich wrote:
> >>> On 06.11.12 at 13:06, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> > On Tue, 2012-11-06 at 09:44 +0000, Tim Deegan wrote:
> >> > In the context of analyzing the situation described in
> >> > "iommu=dom0-passthrough behavior"
> >> > (http://lists.xen.org/archives/html/xen-devel/2012-11/msg00140.html)
> >> > I suppressed the IOMMU setup for some device in Dom0, and
> >> > was quite puzzled to find that only a single fault would occur.
> >> 
> >> I think it would be better to allow some small number of faults per
> >> device before disabling it rather than give dom0 carte blanche.
> >> 
> >> This check is really there to stop a mad device from hosing the system
> >> rather than to contain a malicious OS, and a properly out-of-control
> >> device needs to be stopped or it will livelock Xen with iommu faults.
> >> In a uniprocessor system, dom0 might never get the chance to fix it.
> >> 
> > Right. But moving the fault handling code to softirq should already
> > have helped solve/mitigate that, shouldn't it?
> 
> It helps keep Xen alive, but doesn't help any specific domain
> (including Dom0).

Indeed.  (Intel) IOMMU interrupts are suppressed until the softirq
handler acknowledges the error, but if the softirq handler doesn't
disable the device, it will take another IOMMU interrupt immediately.
I thought the AMD side behaved the same, but clearly it doesn't -- I'll
try to take a look at that later in the week.
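
To make the "small number of faults per device" idea above concrete,
here is a minimal sketch in plain C.  Everything in it (the threshold,
the struct fields, disable_bus_mastering()) is a hypothetical stand-in
for illustration, not Xen's actual iommu code:

#include <stdbool.h>
#include <stdio.h>

#define IOMMU_FAULT_LIMIT 8   /* assumed threshold, not a tuned value */

struct pci_dev_state {
    unsigned int seg, bus, devfn;
    unsigned int fault_count;           /* faults seen so far */
    bool bus_mastering_disabled;
};

/* Stand-in for clearing PCI_COMMAND_MASTER in config space. */
static void disable_bus_mastering(struct pci_dev_state *pdev)
{
    pdev->bus_mastering_disabled = true;
    printf("%04x:%02x:%02x.%u: disabling bus mastering after %u faults\n",
           pdev->seg, pdev->bus, pdev->devfn >> 3, pdev->devfn & 7,
           pdev->fault_count);
}

/*
 * Called from the softirq fault handler for each logged fault.
 * Rather than stopping the device on the very first fault (which
 * gives Dom0 no chance to fix a transient misconfiguration), a few
 * faults are tolerated; past the limit the device is assumed to be
 * out of control and is stopped before it can livelock the system
 * with further interrupts.
 */
static void iommu_fault_seen(struct pci_dev_state *pdev)
{
    if (pdev->bus_mastering_disabled)
        return;
    if (++pdev->fault_count >= IOMMU_FAULT_LIMIT)
        disable_bus_mastering(pdev);
}

int main(void)
{
    struct pci_dev_state dev = { .seg = 0, .bus = 3, .devfn = 0 };
    unsigned int i;

    for (i = 0; i < 10; i++)            /* simulate a fault storm */
        iommu_fault_seen(&dev);
    return 0;
}

A per-device counter along these lines would keep the original intent
(containing a mad device) without killing a Dom0 device on its first
stray DMA.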

> > When implementing and testing that, I wasn't able to reproduce any
> > livelock situation (although I can't exclude that this was at least
> > partly due to my inexperience, especially at the time, with I/O
> > virtualization)... Jan, have you (after killing the 'disable
> > bus-mastering' part, of course)?
> 
> No, I haven't - you'd have to have a device that doesn't stop I/O
> after a finite amount was done (or program one that way, e.g. by
> handing it a cyclic list of SG descriptors or the like). Wasn't it at your
> (Citrix) end that the problem was actually observed/reported?
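
As a sketch of that reproducer, a never-terminating transfer can be
described by a descriptor ring whose tail links back to its head.  The
struct sg_desc layout below is invented for illustration and matches no
real device's descriptor format:

#include <stdint.h>
#include <stdio.h>

struct sg_desc {
    uint64_t dma_addr;          /* bus address the device will use */
    uint32_t len;
    struct sg_desc *next;       /* NULL-terminated in normal use */
};

int main(void)
{
    struct sg_desc ring[4];
    struct sg_desc *d;
    int i, step;

    for (i = 0; i < 4; i++) {
        ring[i].dma_addr = 0xdead0000ull + 0x1000ull * i; /* unmapped IOVA */
        ring[i].len = 0x1000;
        ring[i].next = &ring[(i + 1) % 4];  /* tail -> head: the cycle */
    }

    /* A DMA engine walking this chain never reaches the end; with
     * every dma_addr left unmapped in the IOMMU, each pass produces
     * a fresh burst of faults -- the livelock in question. */
    for (step = 0, d = ring; step < 8; step++, d = d->next)
        printf("step %d: DMA to %#llx\n", step,
               (unsigned long long)d->dma_addr);

    return 0;
}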

IIRC it was someone at Intel who reported it.  I never saw it myself.

Tim.



 

