[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm



On Mon, Dec 08, 2014 at 11:16:07AM +0800, Chen, Tiejun wrote:
> 
> On 2014/12/3 3:39, Konrad Rzeszutek Wilk wrote:
> >On Mon, Dec 01, 2014 at 05:24:20PM +0800, Tiejun Chen wrote:
> >>This should be based on a new parameter globally, 'pci_rdmforce'.
> >>
> >>pci_rdmforce = 1 => Of course this should be 0 by default.
> >>
> >>'1' means we should force check to reserve all ranges. If failed
> >>VM wouldn't be created successfully. This also can give user a
> >>chance to work well with later hotplug, even if not a device
> >>assignment while creating VM.
> >>
> >>But we can override that by one specific pci device:
> >>
> >>pci = ['AA:BB.CC,rdmforce=0/1]
> >>
> >>But this 'rdmforce' should be 1 by default since obviously any
> >>passthrough device always need to do this. Actually no one really
> >>want to set as '0' so it may be unnecessary but I'd like to leave
> >>this as a potential approach.
> >>
> >>So this domctl provides an approach to control how to populate
> >>reserved device memory by tools.
> >>
> >>Note we always post a message to user about this once we owns
> >>RMRR.
> >>
> >>Signed-off-by: Tiejun Chen <tiejun.chen@xxxxxxxxx>
> >>---
> >>  docs/man/xl.cfg.pod.5              |  6 +++++
> >>  docs/misc/vtd.txt                  | 15 ++++++++++++
> >>  tools/libxc/include/xenctrl.h      |  6 +++++
> >>  tools/libxc/xc_domain.c            | 28 +++++++++++++++++++++++
> >>  tools/libxl/libxl_create.c         |  3 +++
> >>  tools/libxl/libxl_dm.c             | 47 
> >> ++++++++++++++++++++++++++++++++++++++
> >>  tools/libxl/libxl_internal.h       |  4 ++++
> >>  tools/libxl/libxl_types.idl        |  2 ++
> >>  tools/libxl/libxlu_pci.c           |  2 ++
> >>  tools/libxl/xl_cmdimpl.c           | 10 ++++++++
> >
> >In the past we had split the hypervisor and the
> >toolstack patches in two. So that one could focus
> >on the hypervisor ones first, and then in another
> >patch on the toolstack.
> >
> 
> Yes.
> 
> >But perhaps this was intended to be in one patch?
> 
> This change also involve docs so its little bit harder to understand the
> whole page if we split this.
> 
> >
> >>  xen/drivers/passthrough/pci.c      | 39 +++++++++++++++++++++++++++++++
> >>  xen/drivers/passthrough/vtd/dmar.c |  8 +++++++
> >>  xen/include/asm-x86/hvm/domain.h   |  4 ++++
> >
> >I don't see ARM here? Should there be an ARM variant of this? If not
> 
> ARM doesn't need this feature.
> 
> >should the toolstack ones only run under x86?
> 
> And I think this shouldn't broken current ARM path as well. I mean this
> would return simply since ARM hasn't such a hypercall handler.
> 
> >
> >>  xen/include/public/domctl.h        | 21 +++++++++++++++++
> >>  xen/xsm/flask/hooks.c              |  1 +
> >>  15 files changed, 196 insertions(+)
> >>
> >>diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> >>index 622ea53..9adc41e 100644
> >>--- a/docs/man/xl.cfg.pod.5
> >>+++ b/docs/man/xl.cfg.pod.5
> >>@@ -645,6 +645,12 @@ dom0 without confirmation.  Please use with care.
> >>  D0-D3hot power management states for the PCI device. False (0) by
> >>  default.
> >>
> >>+=item B<rdmforce=BOOLEAN>
> >>+
> >>+(HVM/x86 only) Specifies that the VM would force to check and try to
> >
> >s/force/forced/
> 
> I guess you're saying 'be forced'.
> 
> >>+reserve all reserved device memory, like RMRR, associated to the PCI
> >>+device. False (0) by default.
> >
> >Not sure I understand. How would the VM be forced to do this? Or is
> >it that the hvmloader would force to do this? And if it fails (as you
> 
> Yes.
> 
> >say 'try') ? What then?
> 
> In most cases we can reserve these regions but if these RMRR regions overlap
> with some fixed range, like guest BIOS, we can't succeed in this case.
> 
> >
> >>+
> >>  =back
> >>
> >>  =back
> >>diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
> >>index 9af0e99..23544d5 100644
> >>--- a/docs/misc/vtd.txt
> >>+++ b/docs/misc/vtd.txt
> >>@@ -111,6 +111,21 @@ in the config file:
> >>  To override for a specific device:
> >>    pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
> >>
> >>+RDM, 'reserved device memory', for PCI Device Passthrough
> >>+---------------------------------------------------------
> >>+
> >>+The BIOS controls some devices in terms of some reginos of memory used for
> >
> >Could you elaborate what 'some devices' are? Network cards? GPUs? What
> >are the most commons ones.
> 
> Some legacy USB device to perform PS2 emulation, and GPU has a stolen memory
> as I remember.
> 
> >
> >s/reginos/regions/
> 
> Fixed.
> 
> >
> >And by regions you mean BAR regions?
> 
> No. I guess you want to know some background about RMRR :)
> 
> There's a good brief description in Linux:
> 
> What is RMRR?
> -------------
> 
> There are some devices the BIOS controls, for e.g USB devices to perform
> PS2 emulation. The regions of memory used for these devices are marked
> reserved in the e820 map. When we turn on DMA translation, DMA to those
> regions will fail. Hence BIOS uses RMRR to specify these regions along with
> devices that need to access these regions. OS is expected to setup
> unity mappings for these regions for these devices to access these regions.
> 
> >
> >>+these devices. This kind of region should be reserved before creating a VM
> >>+to make sure they are not occupied by RAM/MMIO to conflict, and also we can
> >
> >You said 'This' but here you are using the plural ' are'. IF you want it 
> >plural
> >it needs to be 'These regions'
> 
> Thanks for your correction.
> 
> >>+create necessary IOMMU table successfully.
> >>+
> >>+To enable this globally, add "pci_rdmforce" in the config file:
> >>+
> >>+   pci_rdmforce = 1         (default is 0)
> >
> >The guest config file? Or /etc/xen/xl.conf ?
> 
> The guest config file. Here I just follow something about 'pci_msitranslate'
> since they have that usage in common.
> 
> >
> >>+
> >>+Or just enable for a specific device:
> >>+   pci = [ '01:00.0,rdmforce=1', '03:00.0' ]
> >>+
> >>
> >>  Caveat on Conventional PCI Device Passthrough
> >>  ---------------------------------------------
> >>diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> 
> [snip]
> 
> >>--- a/xen/drivers/passthrough/pci.c
> >>+++ b/xen/drivers/passthrough/pci.c
> >>@@ -34,6 +34,7 @@
> >>  #include <xen/tasklet.h>
> >>  #include <xsm/xsm.h>
> >>  #include <asm/msi.h>
> >>+#include <xen/stdbool.h>
> >>
> >>  struct pci_seg {
> >>      struct list_head alldevs_list;
> >>@@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
> >>          }
> >>          break;
> >>
> >>+    case XEN_DOMCTL_set_rdm:
> >>+    {
> >>+        struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
> >>+        struct xen_guest_pcidev_info *pcidevs = NULL;
> >>+        struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
> >>+
> >>+        if ( d == NULL )
> >>+            return -ESRCH;
> >>+
> >
> >What if this is called on an PV domain?
> 
> Currently we just support this in HVM, so I'd like to add this,
> 
>          if ( d == NULL )
>              return -ESRCH;
> 
> +        ASSERT( is_hvm_domain(d) );
> +

No. Please don't crash the hypervisor.

Just return -ENOSYS or such when done for PV guests.

> 
> >
> >You are also missing the XSM checks.
> 
> Just see this below.
> 
> >
> >What if this is called multiple times. Is it OK to over-ride
> >the 'pci_force' or should it stick once?
> 
> It should be fine since just xc/hvmloader need such an information while
> creating a VM.
> 
> And especially, currently we just call this one time to set. So why we need
> to call this again and again? I think if anyone want to extend such a case
> you're worrying, he should know any effect before he take a action, right?

Program defensively and also think about preemption. If this call end up
being preempted you might need to call it again. Or if the third-party
toolstack use this operation and call this with wacky values?
> 
> >
> >
> >>+        d->arch.hvm_domain.pci_force =
> >>+                            xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;
> >
> >Won't we crash here if this is called for PV guests?
> 
> After the line, 'ASSERT( is_hvm_domain(d) );', is added, this problem should
> be gone.

No it won't be. You will just crash the hypervisor.

Please please put yourself in the mind that the toolstack can (and will)
have bugs.
> 
> >
> >>+        d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
> >
> >What if the 'num_pcidevs' has some bogus value. You need to check for that.
> 
> This value is grabbed from that existing interface, assign_device, so I mean
> this is already checked.
> 
> >
> >
> >>+        d->arch.hvm_domain.pcidevs = NULL;
> >
> >Please first free it. It might be that the toolstack
> >is doing this a couple of times. You don't want to leak memory.
> >
> 
> Okay,
> 
> +        if ( d->arch.hvm_domain.pcidevs )
> +            xfree(d->arch.hvm_domain.pcidevs);
> 
> >
> >>+
> >>+        if ( xdsr->num_pcidevs )
> >>+        {
> >>+            pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
> >>+                                    xdsr->num_pcidevs);
> >>+            if ( pcidevs == NULL )
> >>+            {
> >>+                rcu_unlock_domain(d);
> >>+                return -ENOMEM;
> >
> >But you already have set 'num_pcidevs' to some value. This copying/check
> >should be done before you modify 'd->arch.hvm_domain'...
> 
> This makes sense so I'll move down this fragment.
> 
> >>+            }
> >>+
> >>+            if ( copy_from_guest(pcidevs, xdsr->pcidevs,
> >>+                                 xdsr->num_pcidevs*sizeof(*pcidevs)) )
> >>+            {
> >>+                xfree(pcidevs);
> >>+                rcu_unlock_domain(d);
> >
> >Ditto. You need to do these checks before you modify 'd->arch.hvm_domain'.
> >
> >>+                return -EFAULT;
> >>+            }
> >>+        }
> >>+
> >>+        d->arch.hvm_domain.pcidevs = pcidevs;
> >>+        rcu_unlock_domain(d);
> >>+    }
> >>+        break;
> >>+
> >>      case XEN_DOMCTL_assign_device:
> >>          if ( unlikely(d->is_dying) )
> >>          {
> >>diff --git a/xen/drivers/passthrough/vtd/dmar.c 
> >>b/xen/drivers/passthrough/vtd/dmar.c
> >>index 1152c3a..5e41e7a 100644
> >>--- a/xen/drivers/passthrough/vtd/dmar.c
> >>+++ b/xen/drivers/passthrough/vtd/dmar.c
> >>@@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
> >>                          "  RMRR region: base_addr %"PRIx64
> >>                          " end_address %"PRIx64"\n",
> >>                          rmrru->base_address, rmrru->end_address);
> >>+            /*
> >>+             * TODO: we may provide a precise paramter just to reserve
> >
> >s/paramter/parameter/
> 
> Fixed.
> 
> >>+             * RMRR range specific to one device.
> >>+             */
> >>+            dprintk(XENLOG_WARNING VTDPREFIX,
> >>+                    "So please set pci_rdmforce to reserve these ranges"
> >>+                    " if you need such a device in hotplug case.\n");
> 
> s/hotplug/passthrough
> 
> >
> >'Please set rdmforce to reserve ranges %lx->%lx if you plan to hotplug this 
> >device.'
> >
> >But then this is going to be a bit verbose, so perhaps:
> >
> >'Ranges %lx-%lx need rdmforce to properly work.' ?
> 
> Its unnecessary to output range again since we already have such a print
> message here.
> 
> >
> >>+
> >>              acpi_register_rmrr_unit(rmrru);
> >>          }
> >>      }
> >>diff --git a/xen/include/asm-x86/hvm/domain.h 
> >>b/xen/include/asm-x86/hvm/domain.h
> >>index 2757c7f..38530e5 100644
> >>--- a/xen/include/asm-x86/hvm/domain.h
> >>+++ b/xen/include/asm-x86/hvm/domain.h
> >>@@ -90,6 +90,10 @@ struct hvm_domain {
> >>      /* Cached CF8 for guest PCI config cycles */
> >>      uint32_t                pci_cf8;
> >>
> >
> >Maybe a comment explaining its purpose?
> 
> Okay.
> 
> /* Force to check/reserve Reserved Device Memory. */
>     bool_t                  pci_force;
> 
> >
> >>+    bool_t                  pci_force;
> >>+    uint32_t                num_pcidevs;
> >>+    struct xen_guest_pcidev_info      *pcidevs;
> >>+
> >
> >You are also missing freeing of this in the hypervisor when the guest
> >is destroyed. Please fix that.
> 
> You're right. I will go there next revision.
> 
> >
> >>      struct pl_time         pl_time;
> >>
> >>      struct hvm_io_handler *io_handler;
> >>diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> >>index 57e2ed7..ba8970d 100644
> >>--- a/xen/include/public/domctl.h
> >>+++ b/xen/include/public/domctl.h
> >>@@ -508,6 +508,25 @@ struct xen_domctl_get_device_group {
> >>  typedef struct xen_domctl_get_device_group xen_domctl_get_device_group_t;
> >>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t);
> >>
> >>+/* Currently just one bit to indicate force to check Reserved Device 
> >>Memory. */
> >
> >Not sure I understand. Did you mean:
> >
> >'Check Reserved Device Memory'.
> 
> I can change this as '...force checking Reserved Device Memory.'
> 
> >
> >What happens if you do not have this flag? What are the semantics
> >of this hypercall - as in what will it mean.
> 
> If we have no this flag, these devices owned RMRR can't work in passthrough
> case.
> 
> >
> >>+#define PCI_DEV_RDM_CHECK   0x1
> >>+struct xen_guest_pcidev_info {
> >>+    uint16_t    seg;
> >>+    uint8_t     bus;
> >>+    uint8_t     devfn;
> >>+    uint32_t    flags;
> >>+};
> >>+typedef struct xen_guest_pcidev_info xen_guest_pcidev_info_t;
> >>+DEFINE_XEN_GUEST_HANDLE(xen_guest_pcidev_info_t);
> >>+/* Control whether/how we check and reserve device memory. */
> >>+struct xen_domctl_set_rdm {
> >>+    uint32_t    flags;
> >
> >What is this 'flags' purpose compared to the 'pcidevs.flags'? Please
> >explain.
> 
> I replied something to Kevin, and we just need a global flag so we can
> remove pcidevs.flags.
> 
> >
> >>+    uint32_t    num_pcidevs;
> >>+    XEN_GUEST_HANDLE_64(xen_guest_pcidev_info_t) pcidevs;
> >>+};
> >>+typedef struct xen_domctl_set_rdm xen_domctl_set_rdm_t;
> >>+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_rdm_t);
> >>+
> >>  /* Pass-through interrupts: bind real irq -> hvm devfn. */
> >>  /* XEN_DOMCTL_bind_pt_irq */
> >>  /* XEN_DOMCTL_unbind_pt_irq */
> >>@@ -1070,6 +1089,7 @@ struct xen_domctl {
> >>  #define XEN_DOMCTL_setvnumainfo                  74
> >>  #define XEN_DOMCTL_psr_cmt_op                    75
> >>  #define XEN_DOMCTL_arm_configure_domain          76
> >>+#define XEN_DOMCTL_set_rdm                       77
> >>  #define XEN_DOMCTL_gdbsx_guestmemio            1000
> >>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
> >>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> >>@@ -1135,6 +1155,7 @@ struct xen_domctl {
> >>          struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
> >>          struct xen_domctl_vnuma             vnuma;
> >>          struct xen_domctl_psr_cmt_op        psr_cmt_op;
> >>+        struct xen_domctl_set_rdm           set_rdm;
> >>          uint8_t                             pad[128];
> >>      } u;
> >>  };
> >>diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> >>index d48463f..5a760e2 100644
> >>--- a/xen/xsm/flask/hooks.c
> >>+++ b/xen/xsm/flask/hooks.c
> >>@@ -592,6 +592,7 @@ static int flask_domctl(struct domain *d, int cmd)
> >>      case XEN_DOMCTL_test_assign_device:
> >>      case XEN_DOMCTL_assign_device:
> >>      case XEN_DOMCTL_deassign_device:
> >>+    case XEN_DOMCTL_set_rdm:
> >
> >There is more to XSM than just this file..
> 
> But I don't see more other stuff, like XEN_DOMCTL_assign_device.
> 
> >
> >Please compile with XSM enabled.
> 
> Anyway, I add XSM_ENABLE = y and FLASK_ENABLE = y in Config.mk then
> recompile, but looks good.
> 
> Anything I'm missing?

Ah, then it is fine!

> 
> >>  #endif
> >>          return 0;
> >
> >
> >Also how does this work with 32-bit dom0s? Is there a need to use the
> >compat layer?
> 
> Are you saying in xsm case? Others?
> 
> Actually this new DOMCTL is similar with XEN_DOMCTL_assign_device in some
> senses but I don't see such an issue you're pointing.

I was thinking about the compat layer and making sure it works properly.
> 
> Thanks
> Tiejun

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.