
Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces


  • To: Xin Li <xin@xxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, virtualization@xxxxxxxxxxxxxxx
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Wed, 11 Jun 2025 14:58:37 +0200
  • Cc: Ajay Kaher <ajay.kaher@xxxxxxxxxxxx>, Broadcom internal kernel review list <bcm-kernel-feedback-list@xxxxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Wed, 11 Jun 2025 12:58:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 13.05.25 09:44, Xin Li wrote:
On 5/12/2025 4:20 AM, Jürgen Groß wrote:
On 09.05.25 10:18, Xin Li wrote:
On 5/6/2025 2:20 AM, Juergen Gross wrote:
I'm trying to evaluate how to add the immediate-form MSR instructions
on top of this patch set.  And I'm close to getting it done.

There is something to consider when running as a Xen PV guest, ...

Andrew said he doesn't plan to expose WRMSRNS to PV guests, and doesn't
expect MSR_IMM to be useful in a PV guest either, with which I fully agree.

Note that in the Xen PV case the RDMSR/WRMSR patching must not happen
even as an intermediate step, as this would clobber the indirect call
information needed when patching in the direct call for the Xen case.

Good point!

... as this still needs to be true.

There are 2 different ways to deal with this:

1. When running as a Xen PV guest disable X86_FEATURE_WRMSRNS and
    ASM_WRMSRNS_IMM (e.g. in xen_init_capabilities()).

2. Buffer the original instruction before patching in apply_alternatives()
    in order to avoid the sequence limitation above (see attached patch).

Deciding whether to retain the pvops MSR API is up to the x86
maintainers, since they are the ones bearing the burden of maintaining the code.

Well, I'm the PV ops maintainer, so it is basically me who needs to deal
with this. OTOH I do understand that diagnosing problems with PV ops is
more complicated than without them.

Indeed, significant improvements continue to be implemented.



tglx said @https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/:

 > I fundamentaly hate adding this to the PV infrastructure. We don't
 > want more PV ops, quite the contrary.

That is the reason I took a different direction, i.e., removing the
pvops MSR APIs.  But if your approach is cleaner, they may prefer it.

In the end it isn't adding additional PV ops interfaces. It is modifying
existing ones.


diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a463c747c780..df10b0e4f7b8 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -175,24 +175,72 @@ static inline void __write_cr4(unsigned long x)
      PVOP_VCALL1(cpu.write_cr4, x);
  }
-static inline u64 paravirt_read_msr(u32 msr)
+static __always_inline u64 paravirt_read_msr(u32 msr)
  {
-    return PVOP_CALL1(u64, cpu.read_msr, msr);
+    EAX_EDX_DECLARE_ARGS(val, low, high);

This is under CONFIG_PARAVIRT_XXL, thus CONFIG_XEN_PV and CONFIG_X86_64,
therefore we don't need to consider 32-bit at all, no?

Right. OTOH the macros are there, so why not use them?

In the end I'm fine to open code the 64-bit case here.


Here is a patch I cooked.  I added an ALTERNATIVE() hack because the new instructions can't be more than 6 bytes long.  But with the patch you
just sent, it shouldn't be needed.

I have meanwhile dropped the patch copying the original indirect call.

Reason is that I'm seeing a potential risk with the current alternative
patching when using ALTERNATIVE_[23](): depending on the tested features,
an instruction sequence not suitable for the current runtime environment
might be patched in as an intermediate step. If an interrupt happens just
then AND the interrupt handler uses the patch site, this could result in
crashes or undefined behavior.

Meanwhile I have a set of 3 patches fixing that problem: all alternative
patching of a patch site is merged in a local buffer, and only then is the
code at the target site patched with the final result.

The same problem arises with your code below, but this time it isn't
fixed by my patches: the two ALTERNATIVE() instances in the asm() construct
would need to be patched in a single atomic operation to be consistent.
Otherwise you could end up e.g. on bare metal with paravirt_read_msr()
having replaced the indirect call with "rdmsr", but not yet having added
the code to merge %rdx into %rax.

I'm just doing a V2 of my series, this time including additional support
for the non-serializing and immediate forms. Let's see how that will turn
out. I will drop the EAX_EDX_* macros, but for the reason mentioned above
I won't switch to your variant completely.


diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index df10b0e4f7b8..82ffc11d6f1f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -177,18 +177,20 @@ static inline void __write_cr4(unsigned long x)

  static __always_inline u64 paravirt_read_msr(u32 msr)
  {
-    EAX_EDX_DECLARE_ARGS(val, low, high);
+    u64 val;

      PVOP_TEST_NULL(cpu.read_msr);
      asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
                      "rdmsr", ALT_NOT_XEN,
                      ALT_CALL_INSTR, ALT_XENPV_CALL)
+             ALTERNATIVE("", "shl $0x20, %%rdx; or %%rdx, %%rax", ALT_NOT_XEN)
               "2:\n"
               _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
-             : EAX_EDX_RET(val, low, high), ASM_CALL_CONSTRAINT
-             : paravirt_ptr(cpu.read_msr), "c" (msr));
+             : "=a" (val), ASM_CALL_CONSTRAINT
+             : paravirt_ptr(cpu.read_msr), "c" (msr)
+             : "rdx");

-    return EAX_EDX_VAL(val, low, high);
+    return val;

Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature


 

