Xen project Mailing List

Re: [PATCH] x86/EPT: relax iPAT for "invalid" MFNs

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 11 Jun 2024 13:52:58 +0200

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Delivery-date: Tue, 11 Jun 2024 11:53:03 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11.06.2024 13:08, Roger Pau Monné wrote:
> On Tue, Jun 11, 2024 at 11:33:24AM +0200, Jan Beulich wrote:
>> On 11.06.2024 11:02, Roger Pau Monné wrote:
>>> On Tue, Jun 11, 2024 at 10:26:32AM +0200, Jan Beulich wrote:
>>>> On 11.06.2024 09:41, Roger Pau Monné wrote:
>>>>> On Mon, Jun 10, 2024 at 04:58:52PM +0200, Jan Beulich wrote:
>>>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>>>> @@ -503,7 +503,8 @@ int epte_get_entry_emt(struct domain *d,
>>>>>>
>>>>>>      if ( !mfn_valid(mfn) )
>>>>>>      {
>>>>>> -        *ipat = true;
>>>>>> +        *ipat = type != p2m_mmio_direct ||
>>>>>> +                (!is_iommu_enabled(d) && !cache_flush_permitted(d));
>>>>>
>>>>> Looking at this, shouldn't the !mfn_valid special case be removed, and
>>>>> mfns without a valid page be processed normally, so that the guest
>>>>> MTRR values are taken into account, and no iPAT is enforced?
>>>>
>>>> Such removal is what, in the post commit message remark, I'm referring to
>>>> as "moving to too lax". Doing so might be okay, but will imo be hard to
>>>> prove to be correct for all possible cases. Along these lines goes also
>>>> that I'm adding the IOMMU-enabled and cache-flush checks: In principle
>>>> p2m_mmio_direct should not be used when neither of these return true. Yet
>>>> a similar consideration would apply to the immediately subsequent if().
>>>>
>>>> Removing this code would, in particular, result in INVALID_MFN getting a
>>>> type of WB by way of the subsequent if(), unless the type there would
>>>> also be p2m_mmio_direct (which, as said, it ought to never be for
>>>> non-pass-through domains). That again _may_ not be a problem as long as
>>>> such EPT entries would never be marked present, yet that's again
>>>> difficult to prove.
>>>
>>> My understanding is that the !mfn_valid() check was a way to detect
>>> MMIO regions in order to exit early and set those to UC. I however
>>> don't follow why the guest MTRR settings shouldn't also be applied to
>>> those regions.
>>
>> It's unclear to me whether the original purpose of the check really was
>> (just) MMIO. It could as well also have been to cover the (then not yet
>> named that way) case of INVALID_MFN.
>>
>> As to ignoring guest MTRRs for MMIO: I think that's to be on the safe
>> side. We don't want guests to map uncachable memory with a cachable
>> memory type. Yet control isn't fine grained enough to prevent just
>> that. Hence why we force UC, allowing merely to move to WC via PAT.
>
> Would that be to cover up for guest bugs, or there's a coherency
> reason for not allowing guests to access memory using fully guest
> chosen cache attributes?

I think the main reason is that this way we don't need to bother thinking
of whether MMIO regions may need caches flushed in order for us to be
sure memory is all up-to-date. But I have no insight into what the original
reasons here may have been.

> I really wonder whether Xen has enough information to figure out
> whether a hole (MMIO region) is supposed to be accessed as UC or
> something else.

It certainly hasn't, and hence is erring on the (safe) side of forcing
UC.

> Your proposed patch already allows guest to set such attributes in
> PAT, and hence I don't see why also taking guest MTRRs into account
> would be any worse.

Whatever the guest sets in PAT, UC in EMT will win except for the
special case of WC.

>>>>> I also think this likely wants a:
>>>>>
>>>>> Fixes: 81fd0d3ca4b2 ('x86/hvm: simplify 'mmio_direct' check in
>>>>> epte_get_entry_emt()')
>>>>
>>>> Oh, indeed, I should have dug out when this broke. I didn't because I
>>>> knew this mfn_valid() check was there forever, neglecting that it wasn't
>>>> always (almost) first.
>>>>
>>>>> As AFAICT before that commit direct MMIO regions would set iPAT to WB,
>>>>> which would result in the correct attributes (albeit guest MTRR was
>>>>> still ignored).
>>>>
>>>> Two corrections here: First iPAT is a boolean; it can't be set to WB.
>>>> And then what was happening prior to that change was that for the APIC
>>>> access page iPAT was set to true, thus forcing WB there. iPAT was left
>>>> set to false for all other p2m_mmio_direct pages, yielding
>>>> (PAT-overridable) UC there.
>>>
>>> Right, that behavior was still dubious to me, as I would assume those
>>> regions would also want to fetch the type from guest MTRRs.
>>
>> Well, for the APIC access page we want to prevent it becoming UC. It's MMIO
>> from the guest's perspective, yet _we_ know it's really ordinary RAM. For
>> actual MMIO see above; the only case where we probably ought to respect
>> guest MTRRs is when they say WC (following from what I said further up).
>> Yet that's again an independent change to (possibly) make.
>
> For emulated devices we might map regular RAM into what the guest
> otherwise thinks it's MMIO.

Right, and for non-pass-through domains we force everything to WB already.

> Maybe the mfn_valid() check should be
> inverted, and return WB when the underlying mfn is RAM, and otherwise
> use the guest MTRRs to decide the cache attribute?

First: Whether WB is correct for RAM isn't known. With some peculiar device
assigned, the guest may want/need part of its RAM be e.g. WC or WT. (It's
only without any physical devices assigned that we can be quite sure that
WB is good for all of RAM.)

Therefore, second, I think respecting MTRRs for RAM is less likely to cause
problems than respecting them for MMIO.

I think at this point the main question is: Do we want to do things at
least along the lines of this v1, or do we instead feel certain enough to
switch the mfn_valid() to a comparison against INVALID_MFN (and perhaps
moving it up to almost the top of the function)?

One caveat here that I forgot to mention before: MFNs taken out of EPT
entries will never be INVALID_MFN, for the truncation that happens when
populating entries. In that case we rely on mfn_valid() to be "rejecting"
them.

Jan
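For readers following the thread, the three points above (the relaxed iPAT
condition, "UC in EMT wins except for WC", and the INVALID_MFN truncation
caveat) can be modelled in a few lines of standalone C. This is only a
sketch under stated assumptions, not Xen code: the names dom_model,
ipat_invalid_mfn_old/new, effective_type_uc_emt, ept_roundtrip_mfn and the
enum values are hypothetical stand-ins, and the 40-bit MFN field width is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the real Xen definitions. */
enum p2m_type { P2M_RAM_RW, P2M_MMIO_DIRECT };
enum memtype { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB };

struct dom_model {
    bool iommu_enabled;         /* models is_iommu_enabled(d) */
    bool cache_flush_permitted; /* models cache_flush_permitted(d) */
};

/* Before the patch: iPAT is always forced when !mfn_valid(mfn). */
static bool ipat_invalid_mfn_old(const struct dom_model *d, enum p2m_type t)
{
    (void)d;
    (void)t;
    return true;
}

/* With the quoted hunk: direct MMIO of a domain with an IOMMU (or with
 * cache flushing permitted) no longer gets iPAT forced. */
static bool ipat_invalid_mfn_new(const struct dom_model *d, enum p2m_type t)
{
    return t != P2M_MMIO_DIRECT ||
           (!d->iommu_enabled && !d->cache_flush_permitted);
}

/* Effective type when the EPT memory type (EMT) is UC: with iPAT set the
 * guest PAT is ignored; otherwise UC wins unless the guest PAT says WC. */
static enum memtype effective_type_uc_emt(bool ipat, enum memtype guest_pat)
{
    assert(guest_pat <= MT_WB);
    if ( ipat )
        return MT_UC;
    return guest_pat == MT_WC ? MT_WC : MT_UC;
}

/* Assumption: the MFN field of an EPT entry is narrower than 64 bits
 * (40 bits here), so an all-ones INVALID_MFN does not survive a store
 * and load through an entry. */
#define INVALID_MFN_MODEL (~UINT64_C(0))
#define EPT_MFN_BITS      40

static uint64_t ept_roundtrip_mfn(uint64_t mfn)
{
    return mfn & ((UINT64_C(1) << EPT_MFN_BITS) - 1);
}
```

In this model a pass-through domain (iommu_enabled set) gets iPAT cleared
for P2M_MMIO_DIRECT under the new rule, so a guest PAT of WC yields WC while
any other guest PAT value still yields UC. The round trip of
INVALID_MFN_MODEL through ept_roundtrip_mfn() returns a value different from
INVALID_MFN_MODEL, which is why a plain comparison against INVALID_MFN would
not catch MFNs read back from entries.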
