
Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server



Thanks a lot, George.

On 7/6/2015 10:06 PM, George Dunlap wrote:
On Mon, Jul 6, 2015 at 2:33 PM, Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote:
-----Original Message-----
From: George Dunlap [mailto:george.dunlap@xxxxxxxxxxxxx]
Sent: 06 July 2015 14:28
To: Paul Durrant; George Dunlap
Cc: Yu Zhang; xen-devel@xxxxxxxxxxxxx; Keir (Xen.org); Jan Beulich; Andrew Cooper; Kevin Tian; zhiyuan.lv@xxxxxxxxx
Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server

On 07/06/2015 02:09 PM, Paul Durrant wrote:
-----Original Message-----
From: dunlapg@xxxxxxxxx [mailto:dunlapg@xxxxxxxxx] On Behalf Of
George Dunlap
Sent: 06 July 2015 13:50
To: Paul Durrant
Cc: Yu Zhang; xen-devel@xxxxxxxxxxxxx; Keir (Xen.org); Jan Beulich;
Andrew
Cooper; Kevin Tian; zhiyuan.lv@xxxxxxxxx
Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES
for
ioreq server

On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@xxxxxxxxxx>
wrote:
-----Original Message-----
From: dunlapg@xxxxxxxxx [mailto:dunlapg@xxxxxxxxx] On Behalf Of
George Dunlap
Sent: 06 July 2015 13:36
To: Yu Zhang
Cc: xen-devel@xxxxxxxxxxxxx; Keir (Xen.org); Jan Beulich; Andrew Cooper; Paul Durrant; Kevin Tian; zhiyuan.lv@xxxxxxxxx
Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server

On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx>
wrote:
MAX_NR_IO_RANGES is used by ioreq server as the maximum
number of discrete ranges to be tracked. This patch changes
its value to 8k, so that more ranges can be tracked on next
generation of Intel platforms in XenGT. Future patches can
extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
can serve as a default limit.

Signed-off-by: Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx>
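
(For context, the change itself is essentially a one-line bump of the
constant; a rough sketch, assuming it still lives in
xen/include/asm-x86/hvm/domain.h:)

    /* Each ioreq server's per-type rangeset is capped at this many
     * entries, so with per-page tracking it is also the number of gfns
     * that can be watched.  (256 before this patch, if I recall
     * correctly.) */
    #define MAX_NR_IO_RANGES  8192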

I said this at the Hackathon, and I'll say it here:  I think this is
the wrong approach.

The problem here is not that you don't have enough memory ranges.
The
problem is that you are not tracking memory ranges, but individual
pages.

You need to make a new interface that allows you to tag individual
gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
notifications for all such writes.
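
Just to make sure I understand the shape of that interface, something
like the following from the emulator side? This is only a sketch:
xc_hvm_set_mem_type() does exist in libxc, but the exact toolstack
spelling of the write_dm type (HVMMEM_mmio_write_dm below) is an
assumption on my part:

    #include <xenctrl.h>

    /* Tag a single gfn so that guest writes trap to Xen and are
     * forwarded to the device model instead of being treated as plain
     * RAM writes. */
    static int mark_gfn_write_dm(xc_interface *xch, domid_t dom,
                                 uint64_t gfn)
    {
        return xc_hvm_set_mem_type(xch, dom, HVMMEM_mmio_write_dm, gfn, 1);
    }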


I think that is conflating things. It's quite conceivable that more than one
ioreq server will handle write_dm pages. If we had enough types to have
two page types per server then I'd agree with you, but we don't.

What's conflating things is using an interface designed for *device
memory ranges* to instead *track writes to gfns*.

What's the difference? Are you asserting that all device memory ranges
have read side effects and therefore write_dm is not a reasonable
optimization to use? I would not want to make that assertion.

Using write_dm is not the problem; it's having thousands of memory
"ranges" of 4k each that I object to.

Which is why I suggested adding an interface to request updates to gfns
(by marking them write_dm), rather than abusing the io range interface.


And it's the assertion that use of write_dm will only be relevant to gfns, and 
that all such notifications only need go to a single ioreq server, that I have 
a problem with. Whilst the use of io ranges to track gfn updates is, I agree, 
not ideal I think the overloading of write_dm is not a step in the right 
direction.

So there are two questions here.

First of all, I certainly think that the *interface* should be able to
be transparently extended to support multiple ioreq servers being able
to track gfns.  My suggestion was to add a hypercall that allows an
ioreq server to say, "Please send modifications to gfn N to ioreq
server X"; and that for the time being, only allow one such X to exist
at a time per domain.  That is, if ioreq server Y makes such a call
after ioreq server X has done so, return -EBUSY.  That way we can add
support when we need it.
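
If I follow the proposal, on the hypervisor side it might look roughly
like the sketch below. The function, the write_dm_server field and the
lookup helper are all made up for illustration; they are not existing
code:

    /* Illustrative only: one write_dm claimant per domain for now,
     * -EBUSY for later callers.  Locking against concurrent claims is
     * omitted for brevity. */
    static int claim_write_dm(struct domain *d, ioservid_t id)
    {
        struct hvm_domain *hd = &d->arch.hvm_domain;

        if ( hd->write_dm_server != NULL )        /* hypothetical field */
            return -EBUSY;  /* another ioreq server already claimed it */

        hd->write_dm_server = lookup_ioreq_server(d, id); /* hypothetical */
        return hd->write_dm_server ? 0 : -ENOENT;
    }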


Well, I also agree the current implementation is probably not optimal.
And yes, it does seem promiscuous (hope I did not use the wrong word :))
to mix device I/O ranges and guest memory pages. But forwarding an ioreq
to a backend driver based on nothing more than a p2m type? Although it
would be easy for XenGT to take this approach, I agree with Paul that it
would weaken the functionality of the ioreq server. Besides, is it
appropriate for a p2m type to be used this way? It seems strange to me.

In fact, you probably already have a problem with two ioreq servers,
because (if I recall correctly) you don't know for sure when a page

Fortunately, we do know, and unmapped page tables are removed from the
rangeset of the ioreq server, so the following scenario won't happen. :)

has stopped being used as a GPU pagetable.  Consider the following
scenario:
1. Two devices, served by ioreq servers 1 and 2.
2. driver for device served by ioreq server 1 allocates a page, uses
it as a pagetable.  ioreq server 1 adds that pfn to the ranges it's
watching.
3. driver frees page back to guest OS; but ioreq server 1 doesn't know
so it doesn't release the range
4. driver for device served by ioreq server 2 allocates a page, which
happens to be the same one used before, and uses it as a pagetable.
ioreq server 1 tries to add that pfn to the ranges it's watching.

Now you have an "overlap in the range" between the two ioreq servers.
What do you do?
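
Written out against the existing range interface (the libxc wrappers are
assumed to have roughly this shape), that sequence would be:

    #include <xenctrl.h>

    /* The scenario above, step by step; gfn is the page the guest
     * driver reuses between the two devices. */
    static void stale_range_race(xc_interface *xch, domid_t dom,
                                 ioservid_t ioserv1, ioservid_t ioserv2,
                                 uint64_t gfn)
    {
        /* Step 2: server 1 starts watching the page used as a pagetable. */
        xc_hvm_map_io_range_to_ioreq_server(xch, dom, ioserv1,
                                            1 /* is_mmio */, gfn, gfn);

        /* Step 3: the guest frees the page; server 1 never learns of it,
         * so it never unmaps the range. */

        /* Step 4: the page is reused as a pagetable for the second
         * device, whose server now asks for the same gfn: two
         * single-page "ranges" cover one gfn, with no defined owner. */
        xc_hvm_map_io_range_to_ioreq_server(xch, dom, ioserv2,
                                            1 /* is_mmio */, gfn, gfn);
    }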

Regarding using write_dm for actual device memory ranges: Do you have
any concrete scenarios in mind where you think this will be used?

Fundamentally, write_dm looks to me like it's about tracking gfns --
i.e., things backed by guest RAM -- not IO ranges.  As such, it should
have an interface and an implementation that reflects that.


Here, I guess your major concern about the difference between tracking
gfns and I/O ranges is that the gfns are scattered? And yes, that is
exactly why we need more ranges inside a rangeset. The new value of the
limit, 8K, is a practical one for XenGT. In the future, we can either
provide other approaches to configure the maximum number of ranges
inside an ioreq server, or provide some xenheap allocation management
routines. Is this OK?
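
For example, assuming the cap is still applied with rangeset_limit()
inside hvm_ioreq_server_alloc_rangesets(), one way the "toolstack
tunable" idea could look is sketched below; the max_ranges argument and
where it comes from are assumptions, not part of this patch:

    /* Hypothetical helper inside xen/arch/x86/hvm/hvm.c: keep
     * MAX_NR_IO_RANGES as the default, but let a per-domain value
     * (e.g. an HVM param set by the toolstack) override it. */
    static void ioreq_server_set_range_limit(struct hvm_ioreq_server *s,
                                             unsigned int i,
                                             unsigned int max_ranges)
    {
        rangeset_limit(s->range[i],
                       max_ranges ? max_ranges : MAX_NR_IO_RANGES);
    }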

I thought we had successfully convinced you at the hackathon, but it
seems I was wrong. Anyway, your advice is very much appreciated. :)

  -George


B.R.
Yu
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


