
Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server



> -----Original Message-----
> From: Yu, Zhang [mailto:yu.c.zhang@xxxxxxxxxxxxxxx]
> Sent: 07 July 2015 09:16
> To: George Dunlap; Paul Durrant
> Cc: Kevin Tian; Keir (Xen.org); Andrew Cooper; George Dunlap; xen-devel@xxxxxxxxxxxxx; zhiyuan.lv@xxxxxxxxx; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
> 
> Thanks a lot, George.
> 
> On 7/6/2015 10:06 PM, George Dunlap wrote:
> > On Mon, Jul 6, 2015 at 2:33 PM, Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote:
> >>> -----Original Message-----
> >>> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxxxxx]
> >>> Sent: 06 July 2015 14:28
> >>> To: Paul Durrant; George Dunlap
> >>> Cc: Yu Zhang; xen-devel@xxxxxxxxxxxxx; Keir (Xen.org); Jan Beulich; Andrew Cooper; Kevin Tian; zhiyuan.lv@xxxxxxxxx
> >>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
> >>>
> >>> On 07/06/2015 02:09 PM, Paul Durrant wrote:
> >>>>> -----Original Message-----
> >>>>> From: dunlapg@xxxxxxxxx [mailto:dunlapg@xxxxxxxxx] On Behalf Of George Dunlap
> >>>>> Sent: 06 July 2015 13:50
> >>>>> To: Paul Durrant
> >>>>> Cc: Yu Zhang; xen-devel@xxxxxxxxxxxxx; Keir (Xen.org); Jan Beulich; Andrew Cooper; Kevin Tian; zhiyuan.lv@xxxxxxxxx
> >>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
> >>>>>
> >>>>> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote:
> >>>>>>> -----Original Message-----
> >>>>>>> From: dunlapg@xxxxxxxxx [mailto:dunlapg@xxxxxxxxx] On Behalf Of George Dunlap
> >>>>>>> Sent: 06 July 2015 13:36
> >>>>>>> To: Yu Zhang
> >>>>>>> Cc: xen-devel@xxxxxxxxxxxxx; Keir (Xen.org); Jan Beulich; Andrew Cooper; Paul Durrant; Kevin Tian; zhiyuan.lv@xxxxxxxxx
> >>>>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
> >>>>>>>
> >>>>>>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
> >>>>>>>> MAX_NR_IO_RANGES is used by the ioreq server as the maximum
> >>>>>>>> number of discrete ranges to be tracked. This patch changes
> >>>>>>>> its value to 8k, so that more ranges can be tracked on the next
> >>>>>>>> generation of Intel platforms in XenGT. Future patches can
> >>>>>>>> extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
> >>>>>>>> can serve as a default limit.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx>
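(For reference, the change itself amounts to a one-line constant bump,
roughly as sketched below; the file location and the old value are from
memory rather than copied from the patch, so treat this as illustrative
only.)

    /*
     * Illustrative sketch, not the patch hunk itself: the cap on the
     * number of discrete ranges an ioreq server may track is raised so
     * that the scattered gfns of GPU page tables fit within it.
     * Assumed location: xen/include/asm-x86/hvm/domain.h
     */
    #define MAX_NR_IO_RANGES  8192    /* previously 256 */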
> >>>>>>>
> >>>>>>> I said this at the Hackathon, and I'll say it here:  I think this is
> >>>>>>> the wrong approach.
> >>>>>>>
> >>>>>>> The problem here is not that you don't have enough memory ranges.
> >>>>>>> The problem is that you are not tracking memory ranges, but individual
> >>>>>>> pages.
> >>>>>>>
> >>>>>>> You need to make a new interface that allows you to tag individual
> >>>>>>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
> >>>>>>> notifications for all such writes.
> >>>>>>>
> >>>>>>
> >>>>>> I think that is conflating things. It's quite conceivable that more than
> >>>>>> one ioreq server will handle write_dm pages. If we had enough types to
> >>>>>> have two page types per server then I'd agree with you, but we don't.
> >>>>>
> >>>>> What's conflating things is using an interface designed for *device
> >>>>> memory ranges* to instead *track writes to gfns*.
> >>>>
> >>>> What's the difference? Are you asserting that all device memory ranges
> >>>> have read side effects and therefore write_dm is not a reasonable
> >>>> optimization to use? I would not want to make that assertion.
> >>>
> >>> Using write_dm is not the problem; it's having thousands of memory
> >>> "ranges" of 4k each that I object to.
> >>>
> >>> Which is why I suggested adding an interface to request updates to gfns
> >>> (by marking them write_dm), rather than abusing the io range interface.
> >>>
> >>
> >> And it's the assertion that use of write_dm will only be relevant to gfns,
> >> and that all such notifications only need go to a single ioreq server, that I
> >> have a problem with. Whilst the use of io ranges to track gfn updates is, I
> >> agree, not ideal, I think the overloading of write_dm is not a step in the
> >> right direction.
> >
> > So there are two questions here.
> >
> > First of all, I certainly think that the *interface* should be able to
> > be transparently extended to support multiple ioreq servers being able
> > to track gfns.  My suggestion was to add a hypercall that allows an
> > ioreq server to say, "Please send modifications to gfn N to ioreq
> > server X"; and that for the time being, only allow one such X to exist
> > at a time per domain.  That is, if ioreq server Y makes such a call
> > after ioreq server X has done so, return -EBUSY.  That way we can add
> > support when we need it.
> >
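Concretely, that suggestion might look something like the sketch below.
The sub-op, the structure, the write_dm_server field and the
get_ioreq_server() helper are all made up for illustration; only
p2m_change_type_one() and p2m_mmio_write_dm are existing pieces it
would lean on.

    /* Hypothetical "track writes to this gfn" op for an ioreq server. */
    struct xen_hvm_track_gfn_writes {
        domid_t    domid;   /* IN - domain whose gfn is to be tracked */
        ioservid_t id;      /* IN - ioreq server wanting notifications */
        uint64_t   gfn;     /* IN - guest frame to mark write_dm */
    };

    static int hvmop_track_gfn_writes(struct domain *d,
                                      struct xen_hvm_track_gfn_writes *a)
    {
        /* Only one ioreq server per domain may track gfn writes for now. */
        if ( d->arch.hvm_domain.write_dm_server &&
             d->arch.hvm_domain.write_dm_server != get_ioreq_server(d, a->id) )
            return -EBUSY;

        d->arch.hvm_domain.write_dm_server = get_ioreq_server(d, a->id);

        /* Write-protect the page; faults on it get forwarded to the emulator. */
        return p2m_change_type_one(d, a->gfn, p2m_ram_rw, p2m_mmio_write_dm);
    }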
> 
> Well, I also agree the current implementation is probably not optimal.
> And yes, it seems promiscuous (hope I did not use the wrong word :) )
> to mix the device I/O ranges and the guest memory. But forwarding an
> ioreq to the backend driver just by a p2m type? Although it would be easy
> for XenGT to take this approach, I agree with Paul that this would
> weaken the functionality of the ioreq server. Besides, is it appropriate
> for a p2m type to be used this way? It seems strange to me.
> 
> > In fact, you probably already have a problem with two ioreq servers,
> > because (if I recall correctly) you don't know for sure when a page
> 
> Fortunately, we do know when that happens, and these unmapped page tables
> will be removed from the rangeset of the ioreq server. So the following
> scenario won't happen. :)
> 
> > has stopped being used as a GPU pagetable.  Consider the following
> > scenario:
> > 1. Two devices, served by ioreq servers 1 and 2.
> > 2. driver for device served by ioreq server 1 allocates a page, uses
> > it as a pagetable.  ioreq server 1 adds that pfn to the ranges it's
> > watching.
> > 3. driver frees page back to guest OS; but ioreq server 1 doesn't know
> > so it doesn't release the range
> > 4. driver for device served by ioreq server 2 allocates a page, which
> > happens to be the same one used before, and uses it as a pagetable.
> > ioreq server 2 tries to add that pfn to the ranges it's watching.
> >
> > Now you have an "overlap in the range" between the two ioreq servers.
> > What do you do?
> >
> > Regarding using write_dm for actual device memory ranges: Do you have
> > any concrete scenarios in mind where you think this will be used?
> >
> > Fundamentally, write_dm looks to me like it's about tracking gfns --
> > i.e., things backed by guest RAM -- not IO ranges.  As such, it should
> > have an interface and an implementation that reflects that.
> >
> 
> Here, I guess your major concern about the difference between tracking
> gfns and I/O ranges is that the gfns are scattered? And yes, this is why
> we need more ranges inside a rangeset. The new value of the limit, 8K,
> is a practical one for XenGT. In the future, we can either provide other
> ways to configure the maximum number of ranges inside an ioreq server,
> or provide some xenheap allocation management routines. Is this OK?
> 
> I thought we had successfully convinced you at the hackathon, but it
> seems I was wrong. Anyway, your advice is very much appreciated. :)
> 

George,

I wonder, would it be sufficient - at this stage - to add a new mapping sub-op 
to the HVM op to distinguish mapping of gfns vs. MMIO ranges? That way 
we could use the same implementation underneath for now (using the rb_rangeset, 
which I think stands on its own merits for MMIO ranges anyway) but allow them 
to diverge later... perhaps using a new P2T (page-to-type) table, which I 
believe may become necessary as Intel reclaims bits for h/w use and thus 
squeezes our existing number of supported page types.
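
As a sketch of what such a sub-op split could look like: the existing
fields and range types below are reproduced from memory of the public
header rather than copied from it, and HVMOP_IO_RANGE_WP_MEM is a
made-up name for the new sub-type, so treat this purely as an
illustration.

    /*
     * Reuse HVMOP_map_io_range_to_ioreq_server, but give tracked gfns
     * their own range type so the interface distinguishes "guest RAM
     * pages we want write notifications for" from real MMIO ranges,
     * even though both can sit on the same rb_rangeset underneath for
     * now and diverge later.
     * Assumed location: xen/include/public/hvm/hvm_op.h
     */
    struct xen_hvm_io_range {
        domid_t domid;                  /* IN - domain to be serviced */
        ioservid_t id;                  /* IN - ioreq server id */
        uint32_t type;                  /* IN - type of range */
    #define HVMOP_IO_RANGE_PORT   0     /* I/O ports */
    #define HVMOP_IO_RANGE_MEMORY 1     /* MMIO ranges */
    #define HVMOP_IO_RANGE_PCI    2     /* PCI segment/bus/dev/func */
    #define HVMOP_IO_RANGE_WP_MEM 3     /* NEW: write-protected guest RAM (gfns) */
        uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
    };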

  Paul

> >   -George
> >
> 
> B.R.
> Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

