[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 2/2][XTF] xtf/vpmu: MSR read/write tests for VPMU



Adding personal email id for future correspondence on this thread - mgambhir@xxxxxxxxxxx

On 05/04/2017 05:13 PM, Mohit Gambhir wrote:


On 05/03/2017 04:41 PM, Andrew Cooper wrote:
On 25/04/17 22:45, Mohit Gambhir wrote:
diff --git a/tests/vpmu/Makefile b/tests/vpmu/Makefile
new file mode 100644
index 0000000..1eaf436
--- /dev/null
+++ b/tests/vpmu/Makefile
@@ -0,0 +1,9 @@
+include $(ROOT)/build/common.mk
+
+NAME      := vpmu
+CATEGORY  := utility
utilities don't get run automatically.  Is this intentional?  If it
isn't, what is the plan for making it automatically run? vpmu is still
disabled by default in all branches due to the security vulnerabilities,
so even if this vpmu test was automatically run, it would skip due to
vpmu not being visible.
The reason I wanted it to not run automatically was because I thought
it will fail when vpmu
is not enabled - which is the default case. But if XTF will skip vpmu
tests when it is disabled
then we can run the tests automatically. Should the CATEGORY be
functional in that case?
As a tangent, I wonder if it would be a useful to have a separate
category for incomplete tests, but which are still useful to have for
manual running.
Possibly. Utility category seems to do just that but I can see that it
is really not meant for
functional tests. There are situations where writing tests before or
while implementing a
feature is useful  and in that case this new category can be
beneficial. It would also benefit
to have some kind of "expected failure" return type.
Hopefully I have covered all of these points sufficiently in the 0/2
thread.  In the meantime, I'll consider how best to introduce a "not
quite fully baked yet" category.

+TEST-ENVS := $(ALL_ENVIRONMENTS)
You build for all environments, but then have PV unilaterally skip.
Again, is this intentional?
Yes, I thought I would go back and add PV/AMD tests in V2 but I can
leave those TEST-ENVS
out for now and "skip the skip" :) if that's preferable.
For HVM, why all environments?  does PMU have any interaction with the
operating or paging mode in use at the time of the samples?
Not that I know of. So should it just be hvm64?
For now, I'd stick with just hvm64.  (A complication which I have yet to
sort out is the TEST-REVISION builds, so adding new logical content to
an existing test doesn't interfere with OSSTests bisection logic, but
that isn't an immediate problem.)
Fixed in v5.
+
+    printk("-----------------\n");
+    printk("Testing version 1\n");
+    printk("-----------------\n");
+
+    /* for all general purpose counters */
+    for ( i = 0; i < ng; i++ )
+    {
+        /* test writing to IA32_PMCx */
+        idx = MSR_IA32_PMC(i);
+
+        /* test we can write to all valid bits in the counters */
+        /* don't set bit 31 since that gets sign-extended */
What is wrong with bit 31 being sign extended?
The problem there is that setting PMCx to 0xffffffff sign extends 31st
bit to
the remaining bits. So PMCx becomes 0xffffffffffff (when bit width is
48). rdmsr then
reads all 48 bits and the value mismatches the one that was written.
That behaviour sounds mad, but I see it now described in the manuals.
Oh well...

A smarter test would provide both - write value to write and read
value to check against.
But test_valid_msr_write() only takes wval and checks to see if we
read the same value
that we wrote.
Well - you did introduce that function.  If altering it would be a
smarter test, then perhaps that is the better option to take.
Fixed in v5
+        wval = ((1ull << 31) - 1) ;
+
+        if ( !test_valid_msr_write(idx, wval) )
+            return false;
+
+        /* set all valid bits in MSR_IA32_EVENTSELx */
+        idx = MSR_IA32_PERFEVTSEL(i);
+        wval = ((1ull << 32) - 1) ^
(IA32_PERFEVENTSELx_ENABLE_ANYTHREAD |
+                IA32_PERFEVENTSELx_ENABLE_PCB);
What is wrong with Pin Control in this register? Architecturally, it is
a software configurable bit.
Look at this patch on xen-devel

[PATCH] Fix hypervisor crash when writing to VPMU MSR

I found that out while writing this test.
Welcome to the very grey areas of processors which XTF has a habit of
finding.  (I really need to correlate all the suspected errata in the
doxygen comments).

Obviously, Xen shouldn't crash, but we are going to need feedback from
Intel before working out how to proceed.

+    /* test IA32_DEBUGCTL */
+    idx = MSR_IA32_DEBUGCTL;
+
+    /* Test IA32_DEBUGCTL facilities enabled by v2 */
+    wval = IA32_DEBUGCTL_Freeze_LBR_ON_PMI |
IA32_DEBUGCTL_Freeze_PerfMon_On_PMI;
+
+    /* FIXME: This should really be a valid write but it isnt
supported by the
+     * VPMU yet */
In which case the test should be correct here, and highlight that there
is a bug in Xen.
But then, should the main test return xtf_success, xtf_skip or
xtf_failure?
Failure.  Xen is provably malfunctioning.
Fixed in v5. The tests now fail.

+    /* Get perf capabilties */
+    if ( rdmsr_safe(MSR_IA32_PERF_CAPABILITIES, &caps) )
+    {
+        printk("Fault while reading MSR_IA32_PERF_CAPABILITIES\n");
The PERF_CAPABILITIES MSR is only valid to read if PDCM is advertised in
CPUID.

We currently never advertise PCDM, and is a bug that
MSR_PERF_CAPABILITIES is accessable at all in guest context.
(specifically, it is because Xen leaks almost all host MSR state into
guests).
So I suppose I can't test full-bit width writes then?
Well, the code should be architecturally correct to start with. After
that, it is reasonable to explain that Xen used to leak
PERF_CAPABILITIES into guests view, at which point probing it is a
legitimate thing to do.

However, you should take care that your logic doesn't begin to fail at
the point that Xen is fixed to behave architecturally (and actually
return #GP[0] if PDCM isn't advertised to the guest), because I will
definitely be fixing the MSR leakage problems (if noone beats me to it).
Fixed in v5 ( I think!)
+        return false;
+    }
+
+    if ( !(caps >> 13) & 1 )
This lacks sufficient brackets for what you are intending to do. Also,
it looks like you probably want a #define at the top of the file for
this constant.
! and & have the same precedence and right to left associativity. So
technically it should
do what it is intended to do. But I will make it look better.
Logical NOT and Address-of have the same precedence.  Logical NOT has a
higher precedence than Bitwise AND.

And to resolve any doubt, here is what Clang things:

andrewcoop@andrewcoop:/local/xen-test-framework.git$ make CC=clang-4.0
-j4 -s
main.c:396:10: error: logical not is only applied to the left hand side
of this bitwise operator [-Werror,-Wlogical-not-parentheses]
     if ( !(caps >> 13) & 1 )
          ^             ~
Hard to argue that. Fixed in v5.
+
+        case 3:
+                /* test version 3 facilities */
+                if ( !test_intel_pmu_ver3( ngregs, nfregs) )
+                    return xtf_failure("Fail: Failed VPMU version
3\n");
+                /* fall through */
+
+        case 2:
+                /* test version 2 facilities */
+                if ( !test_intel_pmu_ver2(ngregs, nfregs) )
+                    return xtf_failure("Fail: Failed VPMU version
2\n");
+
+                /* test version 1 facilities */
+                if ( !test_intel_pmu_ver1(ngregs) )
+                    return xtf_failure("Fail: Failed VPMU version
1\n");
+
+                /* test full width counters */
+                if ( !test_full_width_cnts(ngregs, ngbits) )
+                    return xtf_failure("Fail: Failed full width
test\n");
+
+                break;
+
+        case 1:
+                /* version 1 unsupported */
You have a v1 test function, yet it is unsupported?
Yes, from VPMUs perspective, v1 in itself is unsupported because that
would
have required disabling access to PERF_GLOBAL_STATUS and
PERF_GLOBAL_CTRL MSRs.
However, each version does support facilities introduced in the
previous version.
Thus, test_intel_pmu_ver1() is intended for v2 testing and helps
breaking the
test_intel_pmu_ver2()  function into facilities introduced by the
architecture in
two separate versions - v1 and v2
You are logically confusing Xen's reasoning for supporting different
versions of hardware PMU, with the version it decides to advertise to
guests.

It is certainly reasonable for the vPMU driver in Xen to decide that,
without PMU v2, it is unacceptably complicated to deal with context
switching the counter state.  However, this has no bearing on which
version of PMU is emulated to the guest, and it would certainly should
be possible to configure Xen to only advertise v1 to guests, while the
real hardware actually supports v2.  (After all, we already downgrade v4
to v3 in a guests view).
Yes, I think I understand what you are saying. I still do stand by the decision of having a
separate function to test v1 facilities for -
A) modularity
B) Localizing the knowledge of xen internals to test_main() function. The rest of the code
is written to test the facilities  as described in the SDM.
On another note about the MSR leakage, the v$N test case should ideally
check that none of the v$N+1 capabilities/features are leaked into the
guest, but this smells like a similar quantity of "redoing it properly
the 2nd time around" as my attempts to fix CPUID handling in Xen.
Maybe something that we can add in a future patch?
~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.