
Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server



> -----Original Message-----
> From: Yu, Zhang [mailto:yu.c.zhang@xxxxxxxxxxxxxxx]
> Sent: 07 July 2015 09:16
> To: George Dunlap; Paul Durrant
> Cc: Kevin Tian; Keir (Xen.org); Andrew Cooper; George Dunlap; xen-devel@xxxxxxxxxxxxx; zhiyuan.lv@xxxxxxxxx; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
> 
> Thanks a lot, George.
> 
> On 7/6/2015 10:06 PM, George Dunlap wrote:
> > On Mon, Jul 6, 2015 at 2:33 PM, Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote:
> >>> -----Original Message-----
> >>> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxxxxx]
> >>> Sent: 06 July 2015 14:28
> >>> To: Paul Durrant; George Dunlap
> >>> Cc: Yu Zhang; xen-devel@xxxxxxxxxxxxx; Keir (Xen.org); Jan Beulich; Andrew Cooper; Kevin Tian; zhiyuan.lv@xxxxxxxxx
> >>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
> >>>
> >>> On 07/06/2015 02:09 PM, Paul Durrant wrote:
> >>>>> -----Original Message-----
> >>>>> From: dunlapg@xxxxxxxxx [mailto:dunlapg@xxxxxxxxx] On Behalf Of George Dunlap
> >>>>> Sent: 06 July 2015 13:50
> >>>>> To: Paul Durrant
> >>>>> Cc: Yu Zhang; xen-devel@xxxxxxxxxxxxx; Keir (Xen.org); Jan Beulich; Andrew Cooper; Kevin Tian; zhiyuan.lv@xxxxxxxxx
> >>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
> >>>>>
> >>>>> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote:
> >>>>>>> -----Original Message-----
> >>>>>>> From: dunlapg@xxxxxxxxx [mailto:dunlapg@xxxxxxxxx] On Behalf Of George Dunlap
> >>>>>>> Sent: 06 July 2015 13:36
> >>>>>>> To: Yu Zhang
> >>>>>>> Cc: xen-devel@xxxxxxxxxxxxx; Keir (Xen.org); Jan Beulich; Andrew Cooper; Paul Durrant; Kevin Tian; zhiyuan.lv@xxxxxxxxx
> >>>>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
> >>>>>>>
> >>>>>>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
> >>>>>>>> MAX_NR_IO_RANGES is used by the ioreq server as the maximum
> >>>>>>>> number of discrete ranges to be tracked. This patch changes
> >>>>>>>> its value to 8k, so that more ranges can be tracked on the next
> >>>>>>>> generation of Intel platforms in XenGT. Future patches can
> >>>>>>>> extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
> >>>>>>>> can serve as a default limit.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx>
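(For reference, the change itself amounts to a one-line constant bump,
roughly as sketched below; the file location and the old value are from
memory rather than copied from the patch, so treat this as illustrative
only.)

    /*
     * Illustrative sketch, not the patch hunk itself: the cap on the
     * number of discrete ranges an ioreq server may track is raised so
     * that the scattered gfns of GPU page tables fit within it.
     * Assumed location: xen/include/asm-x86/hvm/domain.h
     */
    #define MAX_NR_IO_RANGES  8192    /* previously 256 */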
> >>>>>>>
> >>>>>>> I said this at the Hackathon, and I'll say it here:  I think this is
> >>>>>>> the wrong approach.
> >>>>>>>
> >>>>>>> The problem here is not that you don't have enough memory ranges.
> >>>>>>> The problem is that you are not tracking memory ranges, but individual
> >>>>>>> pages.
> >>>>>>>
> >>>>>>> You need to make a new interface that allows you to tag individual
> >>>>>>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
> >>>>>>> notifications for all such writes.
> >>>>>>>
> >>>>>>
> >>>>>> I think that is conflating things. It's quite conceivable that more than
> >>>>>> one ioreq server will handle write_dm pages. If we had enough types to
> >>>>>> have two page types per server then I'd agree with you, but we don't.
> >>>>>
> >>>>> What's conflating things is using an interface designed for *device
> >>>>> memory ranges* to instead *track writes to gfns*.
> >>>>
> >>>> What's the difference? Are you asserting that all device memory ranges
> >>>> have read side effects and therefore write_dm is not a reasonable
> >>>> optimization to use? I would not want to make that assertion.
> >>>
> >>> Using write_dm is not the problem; it's having thousands of memory
> >>> "ranges" of 4k each that I object to.
> >>>
> >>> Which is why I suggested adding an interface to request updates to gfns
> >>> (by marking them write_dm), rather than abusing the io range interface.
> >>>
> >>
> >> And it's the assertion that use of write_dm will only be relevant to gfns,
> >> and that all such notifications only need go to a single ioreq server, that I
> >> have a problem with. Whilst the use of io ranges to track gfn updates is, I
> >> agree, not ideal, I think the overloading of write_dm is not a step in the
> >> right direction.
> >
> > So there are two questions here.
> >
> > First of all, I certainly think that the *interface* should be able to
> > be transparently extended to support multiple ioreq servers being able
> > to track gfns.  My suggestion was to add a hypercall that allows an
> > ioreq server to say, "Please send modifications to gfn N to ioreq
> > server X"; and that for the time being, only allow one such X to exist
> > at a time per domain.  That is, if ioreq server Y makes such a call
> > after ioreq server X has done so, return -EBUSY.  That way we can add
> > support when we need it.
> >
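Concretely, that suggestion might look something like the sketch below.
The sub-op, the structure, the write_dm_server field and the
get_ioreq_server() helper are all made up for illustration; only
p2m_change_type_one() and p2m_mmio_write_dm are existing pieces it
would lean on.

    /* Hypothetical "track writes to this gfn" op for an ioreq server. */
    struct xen_hvm_track_gfn_writes {
        domid_t    domid;   /* IN - domain whose gfn is to be tracked */
        ioservid_t id;      /* IN - ioreq server wanting notifications */
        uint64_t   gfn;     /* IN - guest frame to mark write_dm */
    };

    static int hvmop_track_gfn_writes(struct domain *d,
                                      struct xen_hvm_track_gfn_writes *a)
    {
        /* Only one ioreq server per domain may track gfn writes for now. */
        if ( d->arch.hvm_domain.write_dm_server &&
             d->arch.hvm_domain.write_dm_server != get_ioreq_server(d, a->id) )
            return -EBUSY;

        d->arch.hvm_domain.write_dm_server = get_ioreq_server(d, a->id);

        /* Write-protect the page; faults on it get forwarded to the emulator. */
        return p2m_change_type_one(d, a->gfn, p2m_ram_rw, p2m_mmio_write_dm);
    }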
> 
> Well, I also agree the current implementation is probably not optimal.
> And yes, it seems promiscuous (hope I did not use the wrong word :) )
> to mix the device I/O ranges and the guest memory. But forwarding an
> ioreq to the backend driver just by a p2m type? Although it would be easy
> for XenGT to take this approach, I agree with Paul that this would
> weaken the functionality of the ioreq server. Besides, is it appropriate
> for a p2m type to be used this way? It seems strange to me.
> 
> > In fact, you probably already have a problem with two ioreq servers,
> > because (if I recall correctly) you don't know for sure when a page
> 
> Fortunately, we do know when that happens, and these unmapped page tables
> will be removed from the rangeset of the ioreq server. So the following
> scenario won't happen. :)
> 
> > has stopped being used as a GPU pagetable.  Consider the following
> > scenario:
> > 1. Two devices, served by ioreq servers 1 and 2.
> > 2. driver for device served by ioreq server 1 allocates a page, uses
> > it as a pagetable.  ioreq server 1 adds that pfn to the ranges it's
> > watching.
> > 3. driver frees page back to guest OS; but ioreq server 1 doesn't know
> > so it doesn't release the range
> > 4. driver for device served by ioreq server 2 allocates a page, which
> > happens to be the same one used before, and uses it as a pagetable.
> > ioreq server 2 tries to add that pfn to the ranges it's watching.
> >
> > Now you have an "overlap in the range" between the two ioreq servers.
> > What do you do?
> >
> > Regarding using write_dm for actual device memory ranges: Do you have
> > any concrete scenarios in mind where you think this will be used?
> >
> > Fundamentally, write_dm looks to me like it's about tracking gfns --
> > i.e., things backed by guest RAM -- not IO ranges.  As such, it should
> > have an interface and an implementation that reflects that.
> >
> 
> Here, I guess your major concern about the difference between tracking
> gfns and I/O ranges is that the gfns are scattered? And yes, this is why
> we need more ranges inside a rangeset. The new value of the limit, 8K,
> is a practical one for XenGT. In the future, we can either provide other
> ways to configure the maximum number of ranges inside an ioreq server,
> or provide some xenheap allocation management routines. Is this OK?
> 
> I thought we had successfully convinced you at the hackathon, but it
> seems I was wrong. Anyway, your advice is very much appreciated. :)
> 

George,

I wonder, would it be sufficient - at this stage - to add a new mapping sub-op 
to the HVM op to distinguish mapping of gfns vs. MMIO ranges? That way 
we could use the same implementation underneath for now (using the rb_rangeset, 
which I think stands on its own merits for MMIO ranges anyway) but allow them 
to diverge later... perhaps using a new P2T (page-to-type) table, which I 
believe may become necessary as Intel reclaims bits for h/w use and thus 
squeezes our existing number of supported page types.
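
As a sketch of what such a sub-op split could look like: the existing
fields and range types below are reproduced from memory of the public
header rather than copied from it, and HVMOP_IO_RANGE_WP_MEM is a
made-up name for the new sub-type, so treat this purely as an
illustration.

    /*
     * Reuse HVMOP_map_io_range_to_ioreq_server, but give tracked gfns
     * their own range type so the interface distinguishes "guest RAM
     * pages we want write notifications for" from real MMIO ranges,
     * even though both can sit on the same rb_rangeset underneath for
     * now and diverge later.
     * Assumed location: xen/include/public/hvm/hvm_op.h
     */
    struct xen_hvm_io_range {
        domid_t domid;                  /* IN - domain to be serviced */
        ioservid_t id;                  /* IN - ioreq server id */
        uint32_t type;                  /* IN - type of range */
    #define HVMOP_IO_RANGE_PORT   0     /* I/O ports */
    #define HVMOP_IO_RANGE_MEMORY 1     /* MMIO ranges */
    #define HVMOP_IO_RANGE_PCI    2     /* PCI segment/bus/dev/func */
    #define HVMOP_IO_RANGE_WP_MEM 3     /* NEW: write-protected guest RAM (gfns) */
        uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
    };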

  Paul

> >   -George
> >
> 
> B.R.
> Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

