Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
>>> On 21.03.17 at 02:53, <xuquan8@xxxxxxxxxx> wrote:
> On March 20, 2017 3:35 PM, Jan Beulich wrote:
>>>>> On 20.03.17 at 02:58, <xuquan8@xxxxxxxxxx> wrote:
>>> On March 16, 2017 11:32 PM, Jan Beulich wrote:
>>>>>>> On 16.03.17 at 15:21, <xuquan8@xxxxxxxxxx> wrote:
>>>>> On March 16, 2017 10:06 PM, Jan Beulich wrote:
>>>>>>>>> On 16.03.17 at 14:55, <xuquan8@xxxxxxxxxx> wrote:
>>>>>>> I am trying to pass through a device with an 8G large BAR, such as the
>>>>>>> nvidia M60 (note1, pci-e info as below). It takes about '__15 seconds__'
>>>>>>> to update the 8G large BAR in QEMU::xen_pt_region_update()..
>>>>>>> Specifically, it is xc_domain_memory_mapping() in
>>>>>>> xen_pt_region_update().
>>>>>>>
>>>>>>> Digging into xc_domain_memory_mapping(), I find it mainly calls
>>>>>>> "do_domctl(…case XEN_DOMCTL_memory_mapping…)"
>>>>>>> to map the mmio region.. of course, I found out that this mapping
>>>>>>> could take a while from the code comment below 'case
>>>>>>> XEN_DOMCTL_memory_mapping'.
>>>>>>>
>>>>>>> my questions:
>>>>>>> 1. could we make this mmio region mapping quicker?
>>>>>>
>>>>>
>>>>> Thanks for your quick reply.
>>>>>
>>>>>> Yes, e.g. by using large (2M or 1G) pages. This has been on my todo
>>>>>> list for quite a while...
>>>>>>
>>>>>>> 2. if we could not, does it limit by hardware performance?
>>>>>>
>>>>>> I'm afraid I don't understand the question. If you mean "Is it
>>>>>> limited by hw performance", then no, see above. If you mean "Does it
>>>>>> limit hw performance", then again no, I don't think so (other than
>>>>>> the effect of having more IOMMU translation levels than really
>>>>>> necessary for such a large region).
>>>>>>
>>>>> Sorry, my question is "Is it limited by hw performance"...
>>>>>
>>>>> I am much confused. Why does this mmio mapping take a while?
>>>>> I guessed it takes a lot of time to set up the p2m / iommu entries.
>>>>> That's why I asked "Is it limited by hw performance".
>>>>
>>>> Well, just count the number of page table entries and that of the
>>>> resulting hypercall continuations. It's the sheer amount of work
>>>> that's causing the slowness, together with the need for us to use
>>>> continuations to be on the safe side. There may well be redundant TLB
>>>> invalidations as well. Since we can do better (by using large
>>>> pages) I wouldn't call this "limited by hw performance", but of course
>>>> one may.
>>>>
>>> I agree.
>>> So far as I know, xen & qemu upstream don't support passing through
>>> large-BAR (pci-e BAR > 4G) devices, such as the nvidia M60. However,
>>> cloud providers may want to leverage this feature for machine learning,
>>> etc. Is it on your TODO list?
>>
>> Is what on my todo list?
>
> Support for passing through large-BAR (pci-e BAR > 4G) devices..
>
>> I was assuming large BAR handling to work so far
>> (Konrad had done some adjustments there quite a while ago, from all I
>> recall).
>>
>
> _iirc_ what Konrad mentioned was using qemu-trad..

Quite possible (albeit my memory says hvmloader), but the qemu side
(trad or upstream) isn't my realm anyway.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
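To put numbers on the "sheer amount of work" mentioned above: an 8 GiB BAR mapped at 4k granularity means roughly two million separate p2m/IOMMU entries, versus about four thousand with 2M pages or just eight with 1G pages. Below is a minimal, self-contained C sketch of that arithmetic; the xc_domain_memory_mapping() call in the comment is only an approximation of how QEMU's xen_pt_region_update() path is assumed to invoke the domctl, and the BAR addresses are placeholders.

/*
 * Rough illustration (not the actual Xen/QEMU code) of why mapping an
 * 8 GiB BAR through XEN_DOMCTL_memory_mapping is slow at 4k granularity:
 * every frame gets its own p2m and IOMMU entry, and the domctl restarts
 * itself via hypercall continuations every so many frames.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t bar_size = 8ULL << 30;   /* 8 GiB BAR, as on the M60 */

    /* Number of mappings the hypervisor has to establish. */
    printf("4k entries: %llu\n", (unsigned long long)(bar_size >> 12)); /* 2097152 */
    printf("2M entries: %llu\n", (unsigned long long)(bar_size >> 21)); /*    4096 */
    printf("1G entries: %llu\n", (unsigned long long)(bar_size >> 30)); /*       8 */

    /*
     * For reference, QEMU's xen_pt_region_update() is assumed to end up
     * issuing something along the lines of (addresses are placeholders):
     *
     *   xc_domain_memory_mapping(xch, domid,
     *                            guest_bar_addr >> XC_PAGE_SHIFT,  // first gfn
     *                            host_bar_addr  >> XC_PAGE_SHIFT,  // first mfn
     *                            bar_size       >> XC_PAGE_SHIFT,  // nr of 4k frames
     *                            DPCI_ADD_MAPPING);
     *
     * so at 4k granularity the hypervisor walks all ~2M frames one by one.
     */
    return 0;
}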