[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown



>>> On 02.04.15 at 18:49, <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Wed, 25 Mar 2015, Jan Beulich wrote:
>> When a device gets detached from a guest, pciback will clear its
>> command register, thus disabling both memory and I/O decoding. The
>> disabled memory decoding, however, has an effect on the MSI-X table
>> accesses the hypervisor does: These won't have the intended effect
>> anymore. Even worse, for PCIe devices (but not SR-IOV virtual
>> functions) such accesses may (will?) be treated as Unsupported
>> Requests, causing respective errors to be surfaced, potentially in the
>> form of NMIs that may be fatal to the hypervisor or Dom0 is different
>> ways. Hence rather than carrying out these accesses, we should avoid
>> them where we can, and use alternative (e.g. PCI config space based)
>> mechanisms to achieve at least the same effect.
> 
> I don't think that it is a good idea for both Xen and Linux to access
> the command register simultaneously.  Working around Linux in Xen
> doesn't sound like an optimal solution.   Maybe we could just fix the
> pciback and that would be enough.

I'm afraid that would just eliminate the specific case, but not the
general issue. While we trust Dom0 to not do outright bad things,
the hypervisor should still avoid doing things that can go wrong
due to the state a device is put (or left) in by Dom0.

> In any case we should make it clear somewhere who is supposed to write
> to the command register (and other PCI reigsters) at any given time,
> otherwise it would be very easy for a new kernel update to break the
> hypervisor and we wouldn't even know why it happened.

We should, but at this point in time this is going to be rather
problematic. Such a separation of responsibilities should have been
done before all the pass-through code got written.

>> @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_
>>          }
>>          break;
>>      case PCI_CAP_ID_MSIX:
>> -    {
>> -        int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
>> -        writel(flag, entry->mask_base + offset);
>> -        readl(entry->mask_base + offset);
>> -        break;
>> -    }
>> +        if ( likely(memory_decoded(pdev)) )
>> +        {
>> +            writel(flag, entry->mask_base + 
>> PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
>> +            readl(entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
>> +            break;
>> +        }
>> +        if ( flag )
>> +        {
>> +            u16 control;
>> +            domid_t domid = pdev->domain->domain_id;
>> +
>> +            control = pci_conf_read16(seg, bus, slot, func,
>> +                                      
>> msix_control_reg(entry->msi_attrib.pos));
>> +            if ( control & PCI_MSIX_FLAGS_MASKALL )
>> +                break;
>> +            pci_conf_write16(seg, bus, slot, func,
>> +                             msix_control_reg(entry->msi_attrib.pos),
>> +                             control | PCI_MSIX_FLAGS_MASKALL);
> 
> How is that going to interact with Linux (Dom0) writing to the command
> register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for
> devices assigned to HVM guests. Could this cause any conflicts?

I don't see the relation to the comment register here - all quoted code
only deals with the MSI-X control register.

> One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but
> as a matter of fact QEMU has done that for years and we cannot break
> that behaviour without introducing regressions.  In fact as it stands
> QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM
> guests, not Xen.
> 
> Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed
> through devices to running domains?  I think that might be a good enough
> separation of responsibilities between Xen and QEMU.

No, the bits has to be under hypervisor control (just like any other
hardware bits controlling interrupts). Dealing with this is the subject
of patches I have ready, but didn't post yet.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.