[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/AMD: Add support for AMD's OSVW feature in guests



Responses inline.

On 01/17/12 03:26, Jan Beulich wrote:
On 16.01.12 at 22:11, Boris Ostrovsky<boris.ostrovsky@xxxxxxx>  wrote:
--- a/tools/libxc/xc_cpuid_x86.c        Mon Jan 09 16:01:44 2012 +0100
+++ b/tools/libxc/xc_cpuid_x86.c        Mon Jan 16 22:08:09 2012 +0100
@@ -108,6 +108,7 @@ static void amd_xc_cpuid_policy(
                      bitmaskof(X86_FEATURE_SSE4A) |
                      bitmaskof(X86_FEATURE_MISALIGNSSE) |
                      bitmaskof(X86_FEATURE_3DNOWPREFETCH) |
+                   bitmaskof(X86_FEATURE_OSVW) |

Indentation.

Also, is this really meaningful to PV guests?

Does amd_xc_cpuid_policy() get called for PV guests? I thought this is HVM path only.

xc_cpuid_pv_policy() is where clear_bit(X86_FEATURE_OSVW, regs[2]) was removed to handle PV.

I believe this (i.e. OSVW changes) is meaningful for PV. Taking erratum 400 as an example -- we don't need a Linux PV guest reading an MSR before going to idle (in amd_e400_idle()).

And valid for pre-Fam10?

No, it indeed shouldn't be set in those cases.

I could set the bit conditionally, based on host family but the trouble is I don't necessarily know what family the guest will be. Or is this known at this point?


                      bitmaskof(X86_FEATURE_XOP) |
                      bitmaskof(X86_FEATURE_FMA4) |
                      bitmaskof(X86_FEATURE_TBM) |
--- a/xen/arch/x86/cpu/amd.c    Mon Jan 09 16:01:44 2012 +0100
+++ b/xen/arch/x86/cpu/amd.c    Mon Jan 16 22:08:09 2012 +0100
@@ -32,6 +32,13 @@
  static char opt_famrev[14];
  string_param("cpuid_mask_cpu", opt_famrev);

+/*
+ * Set osvw_len to higher number when updated Revision Guides
+ * are published and we know what the new status bits are
+ */

This is ugly, as it requires permanent attention. The value to start
with should really be 64 (beyond which other code adjustments are
going to be necessary anyway).

I went back and forth on this. The reason I settled on 4 is because we obviously don't know what errata the higher bits will cover and because it is (at least theoretically) possible that a guest shouldn't see the same status bits as the host.

But maybe code maintenance is more burden than it's worth?


+static uint64_t osvw_length = 4, osvw_status;
+static DEFINE_SPINLOCK(amd_lock);
+
  static inline void wrmsr_amd(unsigned int index, unsigned int lo,
                unsigned int hi)
  {
@@ -182,6 +189,35 @@ static void __devinit set_cpuidmask(cons
        }
  }

+static void amd_guest_osvw_init(struct vcpu *vcpu)
+{
+    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+       return;

Shouldn't we bail here for pre-Fam10 CPUs too?

As you noted below the vendor check should be removed anyway since this is being called from under such a check already.

As for family --- I think I need to solve how X86_FEATURE_OSVW is set and that will take care of whether anything that gets initialized in this routine is ever used.


Further, as asked above already - is all this really meaningful to PV
guests? Otherwise the code would better go somewhere under
xen/arch/x86/hvm/svm/ ...

Also, throughout this file indentation needs to be changed to Linux
style (tabs instead of spaces), unless you want to contribute a patch
to convert the whole file to Xen style (in which case you'd still need
to make adjustments here).

+
+    /*
+     * Guests should see errata 400 and 415 as fixed (assuming that
+     * HLT and IO instructions are intercepted).
+     */
+    vcpu->arch.amd.osvw.length = (osvw_length>= 3) ? (osvw_length) : 3;

An expression consisting of just an identifier for sure doesn't need
parentheses.

+    vcpu->arch.amd.osvw.status = osvw_status&  ~(6ULL);
+
+    /*
+     * By increasing VCPU's osvw.length to 3 we are telling the guest that
+     * all osvw.status bits inside that length, including bit 0 (which is
+     * reserved for erratum 298), are valid. However, if host processor's
+     * osvw_len is 0 then osvw_status[0] carries no information. We need to
+     * be conservative here and therefore we tell the guest that erratum 298
+     * is present (because we really don't know).
+     */
+    if (osvw_length == 0&&  boot_cpu_data.x86 == 0x10)

Why do you check the family here? If osvw_length can only ever be
zero on Fam10, then the first check is sufficient. If osvw_length can
be zero on other than Fam10 (no matter whether we bail early older
CPUs), then the check is actually wrong.

10h is the only family affected by this erratum.


+       vcpu->arch.amd.osvw.status |= 1;
+}
+
+void amd_vcpu_initialise(struct vcpu *vcpu)
+{
+    amd_guest_osvw_init(vcpu);
+}
+
  /*
   * Check for the presence of an AMD erratum. Arguments are defined in amd.h

   * for each known erratum. Return 1 if erratum is found.
@@ -512,6 +548,30 @@ static void __devinit init_amd(struct cp
        set_cpuidmask(c);

        check_syscfg_dram_mod_en();
+
+    /*
+     * Get OSVW bits. If bits are not the same on different processors then
+     * choose the worst case (i.e. if erratum is present on one processor and
+     * not on another assume that the erratum is present everywhere).
+     */
+     if (test_bit(X86_FEATURE_OSVW,&boot_cpu_data.x86_capability)) {
+         uint64_t len, status;
+
+        if (rdmsr_safe(MSR_AMD_OSVW_ID_LENGTH, len) ||
+            rdmsr_safe(MSR_AMD_OSVW_STATUS, status))
+            len = status = 0;
+
+        spin_lock(&amd_lock);

What is the locking here supposed to protect against? The AP call tree
is smp_callin() ->  identify_cpu() ->  init_amd(), which cannot race with
anything (as the initiating processor is waiting for cpu_state to change
to CPU_STATE_CALLIN).

+
+        if (len<  osvw_length)
+            osvw_length = len;
+
+        osvw_status |= status;
+        osvw_status&= (1ULL<<  osvw_length) - 1;
+
+        spin_unlock(&amd_lock);
+    } else
+        osvw_length = osvw_status = 0;
  }

  static struct cpu_dev amd_cpu_dev __cpuinitdata = {
--- a/xen/arch/x86/domain.c     Mon Jan 09 16:01:44 2012 +0100
+++ b/xen/arch/x86/domain.c     Mon Jan 16 22:08:09 2012 +0100
@@ -422,6 +422,10 @@ int vcpu_initialise(struct vcpu *v)
      if ( (rc = vcpu_init_fpu(v)) != 0 )
          return rc;

+    /* Vendor-specific initialization */
+    if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)

If you check the vendor here, you don't need to anywhere in the
descendant functions.

+        amd_vcpu_initialise(v);
+
      if ( is_hvm_domain(d) )
      {
          rc = hvm_vcpu_initialise(v);
--- a/xen/arch/x86/hvm/svm/svm.c        Mon Jan 09 16:01:44 2012 +0100
+++ b/xen/arch/x86/hvm/svm/svm.c        Mon Jan 16 22:08:09 2012 +0100
@@ -1044,6 +1044,27 @@ static void svm_init_erratum_383(struct
      }
  }

+static int svm_handle_osvw(struct vcpu *v, uint32_t msr, uint64_t *val, uint 
read)
+{
+    uint eax, ebx, ecx, edx;
+
+    /* Guest OSVW support */
+    hvm_cpuid(0x80000001,&eax,&ebx,&ecx,&edx);
+    if (!test_bit((X86_FEATURE_OSVW&  31),&ecx))

Formatting of changes to this file should be changed to Xen style.

+        return -1;
+
+    if (read) {
+        if (msr == MSR_AMD_OSVW_ID_LENGTH)
+            *val = v->arch.amd.osvw.length;
+        else
+            *val = v->arch.amd.osvw.status;
+    }
+    /* Writes are ignored */
+
+    return 0;
+}
+
+
  static int svm_cpu_up(void)
  {
      uint64_t msr_content;
--- a/xen/arch/x86/traps.c      Mon Jan 09 16:01:44 2012 +0100
+++ b/xen/arch/x86/traps.c      Mon Jan 16 22:08:09 2012 +0100
@@ -2542,6 +2542,15 @@ static int emulate_privileged_op(struct
              if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
                  goto fail;
              break;
+        case MSR_AMD_OSVW_ID_LENGTH:
+        case MSR_AMD_OSVW_STATUS:
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) {
+                if (!boot_cpu_has(X86_FEATURE_OSVW))
+                    goto fail;
+                else

Bogus "else" after a "goto". And I wonder, provided there is some
point in doing all this for PV guests in the first place, whether this
shouldn't be kept getting handled by the default case.


If we go into default case we will emit "Domain attempted WRMSR ..." message in the log. I was trying to avoid that since writing to these registers is not really an error, it just gets ignored.

I'll fix the rest.

Thanks for review.
-boris


+                    break; /* Writes are ignored */
+            }
+            /* Fall through to default case */
          default:
              if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) )
                  break;
@@ -2573,6 +2582,7 @@ static int emulate_privileged_op(struct
          break;

      case 0x32: /* RDMSR */
+

Stray addition of a newline.

          switch ( (u32)regs->ecx )
          {
  #ifdef CONFIG_X86_64
@@ -2632,6 +2642,23 @@ static int emulate_privileged_op(struct
              regs->eax = (uint32_t)msr_content;
              regs->edx = (uint32_t)(msr_content>>  32);
              break;
+        case MSR_AMD_OSVW_ID_LENGTH:
+        case MSR_AMD_OSVW_STATUS:
+            if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+                if (!boot_cpu_has(X86_FEATURE_OSVW))
+                    goto fail;
+                else {

Bogus "else" after a "goto".

Jan

+                    if ((u32)regs->ecx == MSR_AMD_OSVW_ID_LENGTH)
+                        msr_content = v->arch.amd.osvw.length;
+                    else
+                        msr_content = v->arch.amd.osvw.status;
+
+                    regs->eax = (uint32_t)msr_content;
+                    regs->edx = (uint32_t)(msr_content>>  32);
+                }
+            } else
+                goto rdmsr_normal;
+            break;
          default:
              if ( rdmsr_hypervisor_regs(regs->ecx,&val) )
              {





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.