
Re: [Xen-devel] [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0



On Thu, Aug 31, 2017 at 11:09:48AM +0100, Roger Pau Monne wrote:
>On Thu, Aug 31, 2017 at 04:45:23PM +0800, Chao Gao wrote:
>> On Thu, Aug 31, 2017 at 10:03:19AM +0100, Roger Pau Monne wrote:
>> >On Thu, Aug 31, 2017 at 03:32:42PM +0800, Chao Gao wrote:
>> >> On Tue, Aug 29, 2017 at 08:33:25AM +0100, Roger Pau Monne wrote:
>> >> >On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
>> >> >> > From: Roger Pau Monne [mailto:roger.pau@xxxxxxxxxx]
>> >> >> > Sent: Friday, August 25, 2017 9:59 PM
>> >> >> > 
>> >> >> > On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
>> >> >> > > >>> On 25.08.17 at 14:15, <roger.pau@xxxxxxxxxx> wrote:
>> >> >> > > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
>> >> >> > > >> >>> On 22.08.17 at 15:54, <roger.pau@xxxxxxxxxx> wrote:
>> >> >> > > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
>> >> >> > > >> >> >>> On 11.08.17 at 18:43, <roger.pau@xxxxxxxxxx> wrote:
>> >> >> > > >> >> > --- a/xen/arch/x86/dom0_build.c
>> >> >> > > >> >> > +++ b/xen/arch/x86/dom0_build.c
>> >> >> > > >> >> > @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
>> >> >> > > >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>> >> >> > > >> >> >      }
>> >> >> > > >> >> >
>> >> >> > > >> >> > +    /* For PVH prevent access to the MMCFG areas. */
>> >> >> > > >> >> > +    if ( dom0_pvh )
>> >> >> > > >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
>> >> >> > > >> >>
>> >> >> > > >> >> What about the ones reported by Dom0 later on? Which then
>> >> >> > > >> >> raises the question of whether ...
>> >> >> > > >> >
>> >> >> > > >> > This should be dealt with in the
>> >> >> > > >> > PHYSDEVOP_pci_mmcfg_reserved handler. But since you propose
>> >> >> > > >> > to do white listing, I guess it doesn't matter that much
>> >> >> > > >> > anymore.
>> >> >> > > >>
>> >> >> > > >> Well, a fundamental question is whether white listing would
>> >> >> > > >> work in the first place. I could see room for severe problems,
>> >> >> > > >> e.g. with ACPI methods wanting to access MMIO that's not
>> >> >> > > >> described by any PCI devices' BARs. Typically that would be
>> >> >> > > >> regions in the chipset which firmware is responsible for
>> >> >> > > >> configuring/managing, the addresses of which can be found/set
>> >> >> > > >> in custom config space registers.
>> >> >> > > >
>> >> >> > > > The question would also be what Xen would allow in such
>> >> >> > > > white-listing. Obviously you can end up mapping the same
>> >> >> > > > regions using either white-listing or black-listing (see
>> >> >> > > > below).
>> >> >> > >
>> >> >> > > Not really - what you've said there regarding MMCFG regions is
>> >> >> > > a clear indication that we should _not_ map reserved regions, i.e.
>> >> >> > > it would need to be full white listing with perhaps just the PCI
>> >> >> > > device BARs being handled automatically.
>> >> >> > 
>> >> >> > I've tried just mapping the BARs, and that sadly doesn't work;
>> >> >> > the box hangs after the IOMMU is enabled:
>> >> >> > 
>> >> >> > [...]
>> >> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
>> >> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
>> >> >> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
>> >> >> > 
>> >> >> > I will park this ATM and leave it for the Intel guys to diagnose.
>> >> >> > 
>> >> >> > For reference, the specific box I'm testing ATM has a Xeon(R) CPU
>> >> >> > E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
>> >> >> > 
>> >> >> 
>> >> >> +Chao who can help check whether we have such a box at hand.
>> >> >> 
>> >> >> btw please also give your BIOS version.
>> >> >
>It's a Precision T3600 with BIOS A14.
>> >> 
>> >> Hi, Roger.
>> >> 
>> >> I found an Ivy Bridge box with an E5-2697 v2 and tested with "dom0=pvh", and
>> >
>> >The ones I've seen issues with are Sandy Bridge or Nehalem; can you
>> >find some of that hardware?
>> 
>> As I expected, I was dropped from the recipients :(, which makes it
>> hard for me to notice your replies in time.
>
>Sorry, I have no idea why my MUA does that; it seems to deal fine
>with other recipients.
>
>> Yes, I will, but it may take some time (even Ivy Bridge boxes are rare).
>> 
>> >
>> >I haven't tested Ivy Bridge, but all Haswell boxes I've tested seem to
>> >work just fine.
>> 
>> Part of the reason I chose Ivy Bridge is that you said you found this
>> bug on almost all pre-Haswell boxes.
>
>I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
>(in fact I didn't even know about Ivy Bridge; that's why I said all
>pre-Haswell).
>
>In fact I'm now trying with a Nehalem processor that seems to work, so
>whatever this issue is, it certainly doesn't affect all models or
>chipsets.

Hi, Roger.

Last week I borrowed a Sandy Bridge box with an Intel(R) Xeon(R) E5-2690
@ 2.7GHz and tested it with 'dom0=pvh', but I didn't see the machine hang.

I also tested on Haswell and found that the RMRRs in the DMAR table are
incorrect on my Haswell box. The e820 on that machine is:
(XEN) [    0.000000] Xen-e820 RAM map:
(XEN) [    0.000000]  0000000000000000 - 000000000009a400 (usable)
(XEN) [    0.000000]  000000000009a400 - 00000000000a0000 (reserved)
(XEN) [    0.000000]  00000000000e0000 - 0000000000100000 (reserved)
(XEN) [    0.000000]  0000000000100000 - 000000006ff84000 (usable)
(XEN) [    0.000000]  000000006ff84000 - 000000007ac51000 (reserved)
(XEN) [    0.000000]  000000007ac51000 - 000000007b681000 (ACPI NVS)
(XEN) [    0.000000]  000000007b681000 - 000000007b7cf000 (ACPI data)
(XEN) [    0.000000]  000000007b7cf000 - 000000007b800000 (usable)
(XEN) [    0.000000]  000000007b800000 - 0000000090000000 (reserved)
(XEN) [    0.000000]  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN) [    0.000000]  00000000ff400000 - 0000000100000000 (reserved)
(XEN) [    0.000000]  0000000100000000 - 0000002080000000 (usable)

And the RMRRs in DMAR are:
(XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
(XEN) [    0.000000] [VT-D] endpoint: 0000:05:00.0
(XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723b4000 end_addr 7a3f3fff
(XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
(XEN) [    0.000000] [VT-D] endpoint: 0000:00:1d.0
(XEN) [    0.000000] [VT-D] endpoint: 0000:00:1a.0
(XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723ac000 end_addr 723aefff
(Endpoint 05:00.0 is a RAID bus controller; endpoints 00:1d.0 and
00:1a.0 are USB controllers.)
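
For illustration, a tiny standalone C sketch (the bounds are copied from
the log above; the struct and helper names are my own invention, not
Xen's) of the containment check these RMRRs imply: under RMRR semantics,
an address is only guaranteed an IOMMU identity mapping if it falls
inside one of the reported regions.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* The RMRRs exactly as reported in the DMAR above (bounds inclusive). */
struct rmrr { uint64_t base, end; const char *devs; };

static const struct rmrr rmrrs[] = {
    { 0x723b4000ULL, 0x7a3f3fffULL, "0000:05:00.0" },
    { 0x723ac000ULL, 0x723aefffULL, "0000:00:1d.0 0000:00:1a.0" },
};

/* True if firmware reserved addr via some RMRR, i.e. asked the
 * hypervisor to keep an identity mapping so device DMA to it works. */
static bool rmrr_covers(uint64_t addr)
{
    for (unsigned int i = 0; i < sizeof(rmrrs) / sizeof(rmrrs[0]); i++)
        if (addr >= rmrrs[i].base && addr <= rmrrs[i].end)
            return true;
    return false;
}

int main(void)
{
    /* The two addresses the USB controllers fault on below. */
    printf("%d %d\n", rmrr_covers(0x7a3f4000ULL), rmrr_covers(0x7a3f5000ULL));
    /* Prints "0 0": both pages lie just past end_addr 7a3f3fff. */
    return 0;
}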

After DMA remapping is enabled, two DMA translation faults are reported
by VT-d:
(XEN) [    9.547924] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
(XEN) [    9.550620] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021d000
(XEN) [    9.553327] [VT-D]iommu.c:921: iommu_fault_status: Primary Pending Fault
(XEN) [    9.555906] [VT-D]DMAR:[DMA Read] Request device [0000:00:1a.0] fault addr 7a3f5000, iommu reg = ffff82c00021d000
(XEN) [    9.558537] [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [    9.559860] print_vtd_entries: iommu #1 dev 0000:00:1a.0 gmfn 7a3f5
(XEN) [    9.561179]     root_entry[00] = 107277c001
(XEN) [    9.562447]     context[d0] = 2_1072c06001
(XEN) [    9.563776]     l4[000] = 9c0000202f171107
(XEN) [    9.565125]     l3[001] = 9c0000202f152107
(XEN) [    9.566483]     l2[1d1] = 9c000010727ce107
(XEN) [    9.567821]     l1[1f5] = 8000000000000000
(XEN) [    9.569168]     l1[1f5] not present
(XEN) [    9.570502] [VT-D]DMAR:[DMA Read] Request device [0000:00:1d.0] fault addr 7a3f4000, iommu reg = ffff82c00021d000
(XEN) [    9.573147] [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [    9.574488] print_vtd_entries: iommu #1 dev 0000:00:1d.0 gmfn 7a3f4
(XEN) [    9.575819]     root_entry[00] = 107277c001
(XEN) [    9.577129]     context[e8] = 2_1072c06001
(XEN) [    9.578439]     l4[000] = 9c0000202f171107
(XEN) [    9.579778]     l3[001] = 9c0000202f152107
(XEN) [    9.581111]     l2[1d1] = 9c000010727ce107
(XEN) [    9.582482]     l1[1f4] = 8000000000000000
(XEN) [    9.583812]     l1[1f4] not present
(XEN) [   10.520172] Unable to find XEN_ELFNOTE_PHYS32_ENTRY address
(XEN) [   10.521499] Failed to load Dom0 kernel
(XEN) [   10.532171] 
(XEN) [   10.535464] ****************************************
(XEN) [   10.542636] Panic on CPU 0:
(XEN) [   10.547394] Could not set up DOM0 guest OS
(XEN) [   10.553605] ****************************************

The fault addresses the devices failed to access are marked as reserved
in the e820 but aren't reserved for those devices by any RMRR in the
DMAR table. So I think we can conclude that some existing BIOSes don't
expose correct RMRRs to the OS via DMAR, and we need a workaround such
as iommu_inclusive_mapping to deal with that kind of BIOS, for both PV
dom0 and PVH dom0.
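
To illustrate, here is a minimal standalone sketch (the e820 entries are
copied from the log above and truncated; the types and names are mine,
not Xen's) checking that an inclusive-mapping style workaround, which as
I understand it identity-maps every non-RAM e820 range below 4GiB into
dom0's IOMMU context, would have covered both faulting pages.

#include <stdint.h>
#include <stdio.h>

/* Subset of the Xen-e820 map above; end is exclusive, as in the log. */
struct e820ent { uint64_t start, end; int ram; };

static const struct e820ent e820[] = {
    { 0x00000000ULL, 0x0009a400ULL, 1 },
    { 0x0009a400ULL, 0x000a0000ULL, 0 },
    { 0x000e0000ULL, 0x00100000ULL, 0 },
    { 0x00100000ULL, 0x6ff84000ULL, 1 },
    { 0x6ff84000ULL, 0x7ac51000ULL, 0 },   /* contains the fault pages */
    /* remaining entries elided */
};

int main(void)
{
    const uint64_t fault[] = { 0x7a3f4000ULL, 0x7a3f5000ULL };

    /* Check which reserved (non-RAM) range, if identity-mapped, would
     * have satisfied the faulting DMA reads. */
    for (unsigned int f = 0; f < 2; f++)
        for (unsigned int i = 0; i < sizeof(e820) / sizeof(e820[0]); i++)
            if (!e820[i].ram && fault[f] >= e820[i].start &&
                fault[f] < e820[i].end)
                printf("%#llx covered by reserved [%#llx, %#llx)\n",
                       (unsigned long long)fault[f],
                       (unsigned long long)e820[i].start,
                       (unsigned long long)e820[i].end);
    return 0;
}

Both addresses land in the reserved range starting at 6ff84000, which
matches the conclusion above.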

As to the machine hang Roger observed, I have no idea about the cause.
Roger, have you ever seen the VT-d unit on that machine report a DMA
translation fault? If not, can you trigger one on native? I think this
would tell us whether the hardware's fault-reporting logic works
correctly or whether there are bugs in the Xen code. What is your
opinion on this experiment?

Thanks
chao
