|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm
On Mon, Dec 08, 2014 at 11:16:07AM +0800, Chen, Tiejun wrote:
>
> On 2014/12/3 3:39, Konrad Rzeszutek Wilk wrote:
> >On Mon, Dec 01, 2014 at 05:24:20PM +0800, Tiejun Chen wrote:
> >>This should be based on a new parameter globally, 'pci_rdmforce'.
> >>
> >>pci_rdmforce = 1 => Of course this should be 0 by default.
> >>
> >>'1' means we should force check to reserve all ranges. If failed
> >>VM wouldn't be created successfully. This also can give user a
> >>chance to work well with later hotplug, even if not a device
> >>assignment while creating VM.
> >>
> >>But we can override that by one specific pci device:
> >>
> >>pci = ['AA:BB.CC,rdmforce=0/1]
> >>
> >>But this 'rdmforce' should be 1 by default since obviously any
> >>passthrough device always need to do this. Actually no one really
> >>want to set as '0' so it may be unnecessary but I'd like to leave
> >>this as a potential approach.
> >>
> >>So this domctl provides an approach to control how to populate
> >>reserved device memory by tools.
> >>
> >>Note we always post a message to user about this once we owns
> >>RMRR.
> >>
> >>Signed-off-by: Tiejun Chen <tiejun.chen@xxxxxxxxx>
> >>---
> >> docs/man/xl.cfg.pod.5 | 6 +++++
> >> docs/misc/vtd.txt | 15 ++++++++++++
> >> tools/libxc/include/xenctrl.h | 6 +++++
> >> tools/libxc/xc_domain.c | 28 +++++++++++++++++++++++
> >> tools/libxl/libxl_create.c | 3 +++
> >> tools/libxl/libxl_dm.c | 47
> >> ++++++++++++++++++++++++++++++++++++++
> >> tools/libxl/libxl_internal.h | 4 ++++
> >> tools/libxl/libxl_types.idl | 2 ++
> >> tools/libxl/libxlu_pci.c | 2 ++
> >> tools/libxl/xl_cmdimpl.c | 10 ++++++++
> >
> >In the past we had split the hypervisor and the
> >toolstack patches in two. So that one could focus
> >on the hypervisor ones first, and then in another
> >patch on the toolstack.
> >
>
> Yes.
>
> >But perhaps this was intended to be in one patch?
>
> This change also involve docs so its little bit harder to understand the
> whole page if we split this.
>
> >
> >> xen/drivers/passthrough/pci.c | 39 +++++++++++++++++++++++++++++++
> >> xen/drivers/passthrough/vtd/dmar.c | 8 +++++++
> >> xen/include/asm-x86/hvm/domain.h | 4 ++++
> >
> >I don't see ARM here? Should there be an ARM variant of this? If not
>
> ARM doesn't need this feature.
>
> >should the toolstack ones only run under x86?
>
> And I think this shouldn't broken current ARM path as well. I mean this
> would return simply since ARM hasn't such a hypercall handler.
>
> >
> >> xen/include/public/domctl.h | 21 +++++++++++++++++
> >> xen/xsm/flask/hooks.c | 1 +
> >> 15 files changed, 196 insertions(+)
> >>
> >>diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> >>index 622ea53..9adc41e 100644
> >>--- a/docs/man/xl.cfg.pod.5
> >>+++ b/docs/man/xl.cfg.pod.5
> >>@@ -645,6 +645,12 @@ dom0 without confirmation. Please use with care.
> >> D0-D3hot power management states for the PCI device. False (0) by
> >> default.
> >>
> >>+=item B<rdmforce=BOOLEAN>
> >>+
> >>+(HVM/x86 only) Specifies that the VM would force to check and try to
> >
> >s/force/forced/
>
> I guess you're saying 'be forced'.
>
> >>+reserve all reserved device memory, like RMRR, associated to the PCI
> >>+device. False (0) by default.
> >
> >Not sure I understand. How would the VM be forced to do this? Or is
> >it that the hvmloader would force to do this? And if it fails (as you
>
> Yes.
>
> >say 'try') ? What then?
>
> In most cases we can reserve these regions but if these RMRR regions overlap
> with some fixed range, like guest BIOS, we can't succeed in this case.
>
> >
> >>+
> >> =back
> >>
> >> =back
> >>diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
> >>index 9af0e99..23544d5 100644
> >>--- a/docs/misc/vtd.txt
> >>+++ b/docs/misc/vtd.txt
> >>@@ -111,6 +111,21 @@ in the config file:
> >> To override for a specific device:
> >> pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
> >>
> >>+RDM, 'reserved device memory', for PCI Device Passthrough
> >>+---------------------------------------------------------
> >>+
> >>+The BIOS controls some devices in terms of some reginos of memory used for
> >
> >Could you elaborate what 'some devices' are? Network cards? GPUs? What
> >are the most commons ones.
>
> Some legacy USB device to perform PS2 emulation, and GPU has a stolen memory
> as I remember.
>
> >
> >s/reginos/regions/
>
> Fixed.
>
> >
> >And by regions you mean BAR regions?
>
> No. I guess you want to know some background about RMRR :)
>
> There's a good brief description in Linux:
>
> What is RMRR?
> -------------
>
> There are some devices the BIOS controls, for e.g USB devices to perform
> PS2 emulation. The regions of memory used for these devices are marked
> reserved in the e820 map. When we turn on DMA translation, DMA to those
> regions will fail. Hence BIOS uses RMRR to specify these regions along with
> devices that need to access these regions. OS is expected to setup
> unity mappings for these regions for these devices to access these regions.
>
> >
> >>+these devices. This kind of region should be reserved before creating a VM
> >>+to make sure they are not occupied by RAM/MMIO to conflict, and also we can
> >
> >You said 'This' but here you are using the plural ' are'. IF you want it
> >plural
> >it needs to be 'These regions'
>
> Thanks for your correction.
>
> >>+create necessary IOMMU table successfully.
> >>+
> >>+To enable this globally, add "pci_rdmforce" in the config file:
> >>+
> >>+ pci_rdmforce = 1 (default is 0)
> >
> >The guest config file? Or /etc/xen/xl.conf ?
>
> The guest config file. Here I just follow something about 'pci_msitranslate'
> since they have that usage in common.
>
> >
> >>+
> >>+Or just enable for a specific device:
> >>+ pci = [ '01:00.0,rdmforce=1', '03:00.0' ]
> >>+
> >>
> >> Caveat on Conventional PCI Device Passthrough
> >> ---------------------------------------------
> >>diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>
> [snip]
>
> >>--- a/xen/drivers/passthrough/pci.c
> >>+++ b/xen/drivers/passthrough/pci.c
> >>@@ -34,6 +34,7 @@
> >> #include <xen/tasklet.h>
> >> #include <xsm/xsm.h>
> >> #include <asm/msi.h>
> >>+#include <xen/stdbool.h>
> >>
> >> struct pci_seg {
> >> struct list_head alldevs_list;
> >>@@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
> >> }
> >> break;
> >>
> >>+ case XEN_DOMCTL_set_rdm:
> >>+ {
> >>+ struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
> >>+ struct xen_guest_pcidev_info *pcidevs = NULL;
> >>+ struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);
> >>+
> >>+ if ( d == NULL )
> >>+ return -ESRCH;
> >>+
> >
> >What if this is called on an PV domain?
>
> Currently we just support this in HVM, so I'd like to add this,
>
> if ( d == NULL )
> return -ESRCH;
>
> + ASSERT( is_hvm_domain(d) );
> +
No. Please don't crash the hypervisor.
Just return -ENOSYS or such when done for PV guests.
>
> >
> >You are also missing the XSM checks.
>
> Just see this below.
>
> >
> >What if this is called multiple times. Is it OK to over-ride
> >the 'pci_force' or should it stick once?
>
> It should be fine since just xc/hvmloader need such an information while
> creating a VM.
>
> And especially, currently we just call this one time to set. So why we need
> to call this again and again? I think if anyone want to extend such a case
> you're worrying, he should know any effect before he take a action, right?
Program defensively and also think about preemption. If this call end up
being preempted you might need to call it again. Or if the third-party
toolstack use this operation and call this with wacky values?
>
> >
> >
> >>+ d->arch.hvm_domain.pci_force =
> >>+ xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;
> >
> >Won't we crash here if this is called for PV guests?
>
> After the line, 'ASSERT( is_hvm_domain(d) );', is added, this problem should
> be gone.
No it won't be. You will just crash the hypervisor.
Please please put yourself in the mind that the toolstack can (and will)
have bugs.
>
> >
> >>+ d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;
> >
> >What if the 'num_pcidevs' has some bogus value. You need to check for that.
>
> This value is grabbed from that existing interface, assign_device, so I mean
> this is already checked.
>
> >
> >
> >>+ d->arch.hvm_domain.pcidevs = NULL;
> >
> >Please first free it. It might be that the toolstack
> >is doing this a couple of times. You don't want to leak memory.
> >
>
> Okay,
>
> + if ( d->arch.hvm_domain.pcidevs )
> + xfree(d->arch.hvm_domain.pcidevs);
>
> >
> >>+
> >>+ if ( xdsr->num_pcidevs )
> >>+ {
> >>+ pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
> >>+ xdsr->num_pcidevs);
> >>+ if ( pcidevs == NULL )
> >>+ {
> >>+ rcu_unlock_domain(d);
> >>+ return -ENOMEM;
> >
> >But you already have set 'num_pcidevs' to some value. This copying/check
> >should be done before you modify 'd->arch.hvm_domain'...
>
> This makes sense so I'll move down this fragment.
>
> >>+ }
> >>+
> >>+ if ( copy_from_guest(pcidevs, xdsr->pcidevs,
> >>+ xdsr->num_pcidevs*sizeof(*pcidevs)) )
> >>+ {
> >>+ xfree(pcidevs);
> >>+ rcu_unlock_domain(d);
> >
> >Ditto. You need to do these checks before you modify 'd->arch.hvm_domain'.
> >
> >>+ return -EFAULT;
> >>+ }
> >>+ }
> >>+
> >>+ d->arch.hvm_domain.pcidevs = pcidevs;
> >>+ rcu_unlock_domain(d);
> >>+ }
> >>+ break;
> >>+
> >> case XEN_DOMCTL_assign_device:
> >> if ( unlikely(d->is_dying) )
> >> {
> >>diff --git a/xen/drivers/passthrough/vtd/dmar.c
> >>b/xen/drivers/passthrough/vtd/dmar.c
> >>index 1152c3a..5e41e7a 100644
> >>--- a/xen/drivers/passthrough/vtd/dmar.c
> >>+++ b/xen/drivers/passthrough/vtd/dmar.c
> >>@@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
> >> " RMRR region: base_addr %"PRIx64
> >> " end_address %"PRIx64"\n",
> >> rmrru->base_address, rmrru->end_address);
> >>+ /*
> >>+ * TODO: we may provide a precise paramter just to reserve
> >
> >s/paramter/parameter/
>
> Fixed.
>
> >>+ * RMRR range specific to one device.
> >>+ */
> >>+ dprintk(XENLOG_WARNING VTDPREFIX,
> >>+ "So please set pci_rdmforce to reserve these ranges"
> >>+ " if you need such a device in hotplug case.\n");
>
> s/hotplug/passthrough
>
> >
> >'Please set rdmforce to reserve ranges %lx->%lx if you plan to hotplug this
> >device.'
> >
> >But then this is going to be a bit verbose, so perhaps:
> >
> >'Ranges %lx-%lx need rdmforce to properly work.' ?
>
> Its unnecessary to output range again since we already have such a print
> message here.
>
> >
> >>+
> >> acpi_register_rmrr_unit(rmrru);
> >> }
> >> }
> >>diff --git a/xen/include/asm-x86/hvm/domain.h
> >>b/xen/include/asm-x86/hvm/domain.h
> >>index 2757c7f..38530e5 100644
> >>--- a/xen/include/asm-x86/hvm/domain.h
> >>+++ b/xen/include/asm-x86/hvm/domain.h
> >>@@ -90,6 +90,10 @@ struct hvm_domain {
> >> /* Cached CF8 for guest PCI config cycles */
> >> uint32_t pci_cf8;
> >>
> >
> >Maybe a comment explaining its purpose?
>
> Okay.
>
> /* Force to check/reserve Reserved Device Memory. */
> bool_t pci_force;
>
> >
> >>+ bool_t pci_force;
> >>+ uint32_t num_pcidevs;
> >>+ struct xen_guest_pcidev_info *pcidevs;
> >>+
> >
> >You are also missing freeing of this in the hypervisor when the guest
> >is destroyed. Please fix that.
>
> You're right. I will go there next revision.
>
> >
> >> struct pl_time pl_time;
> >>
> >> struct hvm_io_handler *io_handler;
> >>diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> >>index 57e2ed7..ba8970d 100644
> >>--- a/xen/include/public/domctl.h
> >>+++ b/xen/include/public/domctl.h
> >>@@ -508,6 +508,25 @@ struct xen_domctl_get_device_group {
> >> typedef struct xen_domctl_get_device_group xen_domctl_get_device_group_t;
> >> DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_device_group_t);
> >>
> >>+/* Currently just one bit to indicate force to check Reserved Device
> >>Memory. */
> >
> >Not sure I understand. Did you mean:
> >
> >'Check Reserved Device Memory'.
>
> I can change this as '...force checking Reserved Device Memory.'
>
> >
> >What happens if you do not have this flag? What are the semantics
> >of this hypercall - as in what will it mean.
>
> If we have no this flag, these devices owned RMRR can't work in passthrough
> case.
>
> >
> >>+#define PCI_DEV_RDM_CHECK 0x1
> >>+struct xen_guest_pcidev_info {
> >>+ uint16_t seg;
> >>+ uint8_t bus;
> >>+ uint8_t devfn;
> >>+ uint32_t flags;
> >>+};
> >>+typedef struct xen_guest_pcidev_info xen_guest_pcidev_info_t;
> >>+DEFINE_XEN_GUEST_HANDLE(xen_guest_pcidev_info_t);
> >>+/* Control whether/how we check and reserve device memory. */
> >>+struct xen_domctl_set_rdm {
> >>+ uint32_t flags;
> >
> >What is this 'flags' purpose compared to the 'pcidevs.flags'? Please
> >explain.
>
> I replied something to Kevin, and we just need a global flag so we can
> remove pcidevs.flags.
>
> >
> >>+ uint32_t num_pcidevs;
> >>+ XEN_GUEST_HANDLE_64(xen_guest_pcidev_info_t) pcidevs;
> >>+};
> >>+typedef struct xen_domctl_set_rdm xen_domctl_set_rdm_t;
> >>+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_rdm_t);
> >>+
> >> /* Pass-through interrupts: bind real irq -> hvm devfn. */
> >> /* XEN_DOMCTL_bind_pt_irq */
> >> /* XEN_DOMCTL_unbind_pt_irq */
> >>@@ -1070,6 +1089,7 @@ struct xen_domctl {
> >> #define XEN_DOMCTL_setvnumainfo 74
> >> #define XEN_DOMCTL_psr_cmt_op 75
> >> #define XEN_DOMCTL_arm_configure_domain 76
> >>+#define XEN_DOMCTL_set_rdm 77
> >> #define XEN_DOMCTL_gdbsx_guestmemio 1000
> >> #define XEN_DOMCTL_gdbsx_pausevcpu 1001
> >> #define XEN_DOMCTL_gdbsx_unpausevcpu 1002
> >>@@ -1135,6 +1155,7 @@ struct xen_domctl {
> >> struct xen_domctl_gdbsx_domstatus gdbsx_domstatus;
> >> struct xen_domctl_vnuma vnuma;
> >> struct xen_domctl_psr_cmt_op psr_cmt_op;
> >>+ struct xen_domctl_set_rdm set_rdm;
> >> uint8_t pad[128];
> >> } u;
> >> };
> >>diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> >>index d48463f..5a760e2 100644
> >>--- a/xen/xsm/flask/hooks.c
> >>+++ b/xen/xsm/flask/hooks.c
> >>@@ -592,6 +592,7 @@ static int flask_domctl(struct domain *d, int cmd)
> >> case XEN_DOMCTL_test_assign_device:
> >> case XEN_DOMCTL_assign_device:
> >> case XEN_DOMCTL_deassign_device:
> >>+ case XEN_DOMCTL_set_rdm:
> >
> >There is more to XSM than just this file..
>
> But I don't see more other stuff, like XEN_DOMCTL_assign_device.
>
> >
> >Please compile with XSM enabled.
>
> Anyway, I add XSM_ENABLE = y and FLASK_ENABLE = y in Config.mk then
> recompile, but looks good.
>
> Anything I'm missing?
Ah, then it is fine!
>
> >> #endif
> >> return 0;
> >
> >
> >Also how does this work with 32-bit dom0s? Is there a need to use the
> >compat layer?
>
> Are you saying in xsm case? Others?
>
> Actually this new DOMCTL is similar with XEN_DOMCTL_assign_device in some
> senses but I don't see such an issue you're pointing.
I was thinking about the compat layer and making sure it works properly.
>
> Thanks
> Tiejun
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |