Re: [Xen-devel] [PATCH v7] x86/p2m: use large pages for MMIO mappings
>>> On 09.02.16 at 11:56, <george.dunlap@xxxxxxxxxx> wrote:
> On 09/02/16 08:42, Jan Beulich wrote:
>>>>> On 08.02.16 at 19:04, <george.dunlap@xxxxxxxxxx> wrote:
>>> First -- and this isn't a blocker, but just a question (and sorry if
>>> it's been answered before) -- why have the "0 means I did it all, <nr
>>> means I did it partially"? Why not just return the number of entries
>>> actually handled?
>>
>> Because I view zero as the conventional "success" indicator. No
>> other deeper reason.
>
> For certain classes of functionality, yes. But for other classes --
> where the caller requests N but the OS / hypervisor / whatever may
> complete some number fewer than N -- the convention is to always
> return the number of items completed. The read() and write() system
> calls are like this, as are most of the XENMEM operations (e.g.,
> XENMEM_increase_reservation).
>
> Since this domctl looks a lot like some of those XENMEM operations,
> wouldn't it make more sense to follow that convention, and return the
> number of extents actually allocated?

Well, if comparing with the XENMEM ones, perhaps. But if comparing with
other domctls, perhaps rather not? I've really been undecided here from
the very beginning, since I had considered that alternative right away.

>>>> @@ -949,14 +965,21 @@ static int set_typed_p2m_entry(struct do
>>>>  static int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>>>                                   mfn_t mfn)
>>>>  {
>>>> -    return set_typed_p2m_entry(d, gfn, mfn, p2m_map_foreign,
>>>> +    return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
>>>>                                 p2m_get_hostp2m(d)->default_access);
>>>>  }
>>>>
>>>>  int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>>>> -                       p2m_access_t access)
>>>> +                       unsigned int order, p2m_access_t access)
>>>>  {
>>>> -    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
>>>> +    if ( order &&
>>>> +         rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>>>> +                                 mfn_x(mfn) + (1UL << order) - 1) &&
>>>> +         !rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
>>>> +                                  mfn_x(mfn) + (1UL << order) - 1) )
>>>> +        return order;
>>>
>>> What is this about? Isn't set_mmio_p2m_entry() now supposed to have a
>>> calling convention like clear_mmio_p2m_entry(), where it returns
>>> recommended_order+1 if you should retry with that order, and this value
>>> (recommended_order) is guaranteed to be strictly less than what was
>>> passed in?
>>>
>>> Did you mean to return PAGE_ORDER_4K+1 here, perhaps? (Or (order-9)+1?)
>>
>> No. This is catering for callers of both {,ept_}p2m_type_to_flags()
>> not easily being in the position of requesting further page splitting.
>> Hence rather than complicate the code there, it is here where we
>> make sure that the r/o enforcement won't be missed.
>
> I wasn't asking what the check is about; I'm asking about the return
> value. Did you really mean that if someone passed in order 9, it
> would say "Please try with order 8 instead"? I'm asking because I would
> have expected that to be "Please try with order 0 instead".
>
> (Although my question wasn't clear because I thought it was returning
> the *same* value, not one less. A brief comment like "Request trying
> again with order-1" would be helpful if we end up retaining this return
> value.)
>
> I'm still becoming somewhat familiar with the p2m code, but it looks to
> me like ept_set_entry() at least doesn't tolerate orders that aren't
> multiples of EPT_TABLE_ORDER.
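[As a rough caller-side sketch of the calling convention being discussed: this is not the actual map_mmio_regions() code, it merely assumes that a positive return value N from set_mmio_p2m_entry() means "retry this chunk with order N - 1"; Xen-internal names such as mfn_t, mfn_x()/_mfn(), MAX_ORDER and p2m_access_rw are used purely for illustration.]

    /*
     * Illustrative only: walk a GFN/MFN range, trying the largest order
     * that alignment and the remaining count allow, and shrinking the
     * order whenever the callee asks for a retry.
     */
    static int map_range_sketch(struct domain *d, unsigned long gfn,
                                mfn_t mfn, unsigned long nr)
    {
        while ( nr )
        {
            unsigned int order = MAX_ORDER;
            int ret;

            /* Largest order compatible with alignment and remaining count. */
            while ( order && (((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) ||
                              (1UL << order) > nr) )
                --order;

            /* A positive return value asks us to retry with a smaller order. */
            do {
                ret = set_mmio_p2m_entry(d, gfn, mfn, order, p2m_access_rw);
                if ( ret > 0 )
                    order = ret - 1; /* suggested, strictly smaller, order */
            } while ( ret > 0 );

            if ( ret < 0 )
                return ret; /* hard error */

            gfn += 1UL << order;
            mfn  = _mfn(mfn_x(mfn) + (1UL << order));
            nr  -= 1UL << order;
        }

        return 0;
    }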
> This conditional will only actually do anything under the following
> conditions:
>
> a. the requested gfn order is 9 (since we're not doing 1G pages yet)
>
> b. at order 9, the mfn range partially overlaps mmio_ro_ranges
>
> c. the p2m entry for that gfn is order 9 (or larger)
>
> (Re c: i.e., if the order in the p2m entry is already 0, then
> set_typed_p2m_entry() will already request a 4K page, making this
> entire conditional redundant.)
>
> Let's also suppose for simplicity that:
>
> d. at order 8, the mfn range does *not* partially overlap mmio_ro_ranges
> anymore.
>
> Under those conditions, it looks to me like the following will happen:
> 1. map_mmio_regions() will call set_mmio_p2m_entry() with order 9
> 2. set_mmio_p2m_entry() will return 9 (requesting using order 8)
> 3. map_mmio_regions() will call set_mmio_p2m_entry() with order 8
> 4. set_mmio_p2m_entry() will call set_typed_p2m_entry() with order 8
> 5. set_typed_p2m_entry() will read the current p2m entry of order >= 9
>    and continue (since 8 <= 9)
> 6. set_typed_p2m_entry() will call p2m_set_entry() with an order of 8
> 7. ept_set_entry() will return -EINVAL, since order % EPT_TABLE_ORDER != 0
>
> Am I missing something?

Between step 6 and step 7 there is actually p2m_set_entry() breaking up
the request into chunks of orders the hardware supports. So to answer
your first question above - yes, the intention was to reduce orders in
small steps, even if the hardware doesn't support those orders. Indeed
this wasn't the most efficient way of doing it (as Andrew also noted),
but the intention specifically was for that code to not needlessly
depend on hardware characteristics. But this is all moot now anyway,
with the return value going to become PAGE_ORDER_4K+1 (due to the
below).

>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -136,6 +136,7 @@ static void ept_p2m_type_to_flags(struct
>>>>              entry->r = entry->x = 1;
>>>>              entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
>>>>                                                      entry->mfn);
>>>> +            ASSERT(entry->w || !is_epte_superpage(entry));
>>>
>>> What is this about? Single-page epte entries are allowed to be either
>>> read-only or read-write, but superpage entries have to have write
>>> permission?
>>
>> Yes, due to the lack of an "order" input here, and (as mentioned
>> above) the difficulties of this function's callers otherwise needing
>> to be able to deal with further page splitting directives (coming
>> out of this function).
>>
>> But now that you mention this I think you're right further up:
>> That code as it is now indeed seems to set things up for the
>> ASSERT() here to actually be triggerable (if there were a large
>> enough r/o MMIO region). So it looks like code further up should
>> indeed use PAGE_ORDER_4K+1 here, as you suggest (which would
>> then also address one of Andrew's concerns).
>
> You could either get rid of the "!rangeset_contains_range()" in
> set_mmio_p2m_entry() (not allowing >4k orders even if the entire range
> fits), or you could change these assertions to check that either the
> entire range is contained or that it doesn't overlap (essentially the
> inverse of the check in set_mmio_p2m_entry()).
>
> I don't immediately see any reason why you couldn't use superpages if
> the entire mfn range fits within mmio_ro_ranges.

There indeed is no technical reason other than - as said - the
difficulty of enforcing the page split at the time the new entry is
about to be written (when its flags get calculated).
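[As a rough illustration of the "entirely contained or entirely disjoint" alternative suggested above: this is an assumption about its shape, not code from the patch. The helper name is made up, and ept_p2m_type_to_flags() would still need the entry's order available in order to use something like it.]

    /*
     * Hypothetical helper, not part of the patch: a superpage covering
     * mfn .. mfn + 2^order - 1 is acceptable if the whole range is in
     * mmio_ro_ranges (map it r/o) or none of it is (map it r/w).
     */
    static bool_t mfn_range_ro_consistent(mfn_t mfn, unsigned int order)
    {
        unsigned long first = mfn_x(mfn);
        unsigned long last  = first + (1UL << order) - 1;

        return rangeset_contains_range(mmio_ro_ranges, first, last) ||
               !rangeset_overlaps_range(mmio_ro_ranges, first, last);
    }

[With such a check, the ASSERT() in ept_p2m_type_to_flags() could in principle tolerate read-only superpages as well, at the cost of plumbing the order through to that function.]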
> (Although if you have reason to believe that these almost never
> contain superpages, it would certainly make the code simpler to just
> always use 4k pages in case of an overlap.)

The MMCFG area(s) could easily be (a) large one(s), but we don't expose
this to anything other than Dom0. MSI-X tables, otoh, will most likely
be solitary pages. VT-d's RMRRs are less simple to predict, considering
that they're reportedly also being used for graphics devices. But I
simply haven't got enough time right now to do the surgery that would
be needed to support large page r/o MMIO mappings.

Jan