Xen project Mailing List

Re: [Xen-devel] [PATCH v3 2/2][XTF] xtf/vpmu: MSR read/write tests for VPMU

To: Mohit Gambhir <mohit.gambhir@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Wed, 3 May 2017 21:41:25 +0100

Delivery-date: Wed, 03 May 2017 20:41:47 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 25/04/17 22:45, Mohit Gambhir wrote: >>> diff --git a/tests/vpmu/Makefile b/tests/vpmu/Makefile >>> new file mode 100644 >>> index 0000000..1eaf436 >>> --- /dev/null >>> +++ b/tests/vpmu/Makefile >>> @@ -0,0 +1,9 @@ >>> +include $(ROOT)/build/common.mk >>> + >>> +NAME := vpmu >>> +CATEGORY := utility >> utilities don't get run automatically. Is this intentional? If it >> isn't, what is the plan for making it automatically run? vpmu is still >> disabled by default in all branches due to the security vulnerabilities, >> so even if this vpmu test was automatically run, it would skip due to >> vpmu not being visible. > The reason I wanted it to not run automatically was because I thought > it will fail when vpmu > is not enabled - which is the default case. But if XTF will skip vpmu > tests when it is disabled > then we can run the tests automatically. Should the CATEGORY be > functional in that case? >> As a tangent, I wonder if it would be a useful to have a separate >> category for incomplete tests, but which are still useful to have for >> manual running. > Possibly. Utility category seems to do just that but I can see that it > is really not meant for > functional tests. There are situations where writing tests before or > while implementing a > feature is useful and in that case this new category can be > beneficial. It would also benefit > to have some kind of "expected failure" return type. Hopefully I have covered all of these points sufficiently in the 0/2 thread. In the meantime, I'll consider how best to introduce a "not quite fully baked yet" category. >>> +TEST-ENVS := $(ALL_ENVIRONMENTS) >> You build for all environments, but then have PV unilaterally skip. >> Again, is this intentional? > Yes, I thought I would go back and add PV/AMD tests in V2 but I can > leave those TEST-ENVS > out for now and "skip the skip" :) if that's preferable. >> For HVM, why all environments? does PMU have any interaction with the >> operating or paging mode in use at the time of the samples? > Not that I know of. So should it just be hvm64? For now, I'd stick with just hvm64. (A complication which I have yet to sort out is the TEST-REVISION builds, so adding new logical content to an existing test doesn't interfere with OSSTests bisection logic, but that isn't an immediate problem.) > >>> + >>> + printk("-----------------\n"); >>> + printk("Testing version 1\n"); >>> + printk("-----------------\n"); >>> + >>> + /* for all general purpose counters */ >>> + for ( i = 0; i < ng; i++ ) >>> + { >>> + /* test writing to IA32_PMCx */ >>> + idx = MSR_IA32_PMC(i); >>> + >>> + /* test we can write to all valid bits in the counters */ >>> + /* don't set bit 31 since that gets sign-extended */ >> What is wrong with bit 31 being sign extended? > The problem there is that setting PMCx to 0xffffffff sign extends 31st > bit to > the remaining bits. So PMCx becomes 0xffffffffffff (when bit width is > 48). rdmsr then > reads all 48 bits and the value mismatches the one that was written. That behaviour sounds mad, but I see it now described in the manuals. Oh well... > > A smarter test would provide both - write value to write and read > value to check against. > But test_valid_msr_write() only takes wval and checks to see if we > read the same value > that we wrote. Well - you did introduce that function. If altering it would be a smarter test, then perhaps that is the better option to take. > >> >>> + wval = ((1ull << 31) - 1) ; >>> + >>> + if ( !test_valid_msr_write(idx, wval) ) >>> + return false; >>> + >>> + /* set all valid bits in MSR_IA32_EVENTSELx */ >>> + idx = MSR_IA32_PERFEVTSEL(i); >>> + wval = ((1ull << 32) - 1) ^ >>> (IA32_PERFEVENTSELx_ENABLE_ANYTHREAD | >>> + IA32_PERFEVENTSELx_ENABLE_PCB); >> What is wrong with Pin Control in this register? Architecturally, it is >> a software configurable bit. > Look at this patch on xen-devel > > [PATCH] Fix hypervisor crash when writing to VPMU MSR > > I found that out while writing this test. Welcome to the very grey areas of processors which XTF has a habit of finding. (I really need to correlate all the suspected errata in the doxygen comments). Obviously, Xen shouldn't crash, but we are going to need feedback from Intel before working out how to proceed. > >>> + /* test IA32_DEBUGCTL */ >>> + idx = MSR_IA32_DEBUGCTL; >>> + >>> + /* Test IA32_DEBUGCTL facilities enabled by v2 */ >>> + wval = IA32_DEBUGCTL_Freeze_LBR_ON_PMI | >>> IA32_DEBUGCTL_Freeze_PerfMon_On_PMI; >>> + >>> + /* FIXME: This should really be a valid write but it isnt >>> supported by the >>> + * VPMU yet */ >> In which case the test should be correct here, and highlight that there >> is a bug in Xen. > But then, should the main test return xtf_success, xtf_skip or > xtf_failure? Failure. Xen is provably malfunctioning. >>> >>> + /* Get perf capabilties */ >>> + if ( rdmsr_safe(MSR_IA32_PERF_CAPABILITIES, &caps) ) >>> + { >>> + printk("Fault while reading MSR_IA32_PERF_CAPABILITIES\n"); >> The PERF_CAPABILITIES MSR is only valid to read if PDCM is advertised in >> CPUID. >> >> We currently never advertise PCDM, and is a bug that >> MSR_PERF_CAPABILITIES is accessable at all in guest context. >> (specifically, it is because Xen leaks almost all host MSR state into >> guests). > So I suppose I can't test full-bit width writes then? Well, the code should be architecturally correct to start with. After that, it is reasonable to explain that Xen used to leak PERF_CAPABILITIES into guests view, at which point probing it is a legitimate thing to do. However, you should take care that your logic doesn't begin to fail at the point that Xen is fixed to behave architecturally (and actually return #GP[0] if PDCM isn't advertised to the guest), because I will definitely be fixing the MSR leakage problems (if noone beats me to it). >>> + return false; >>> + } >>> + >>> + if ( !(caps >> 13) & 1 ) >> This lacks sufficient brackets for what you are intending to do. Also, >> it looks like you probably want a #define at the top of the file for >> this constant. > ! and & have the same precedence and right to left associativity. So > technically it should > do what it is intended to do. But I will make it look better. Logical NOT and Address-of have the same precedence. Logical NOT has a higher precedence than Bitwise AND. And to resolve any doubt, here is what Clang things: andrewcoop@andrewcoop:/local/xen-test-framework.git$ make CC=clang-4.0 -j4 -s main.c:396:10: error: logical not is only applied to the left hand side of this bitwise operator [-Werror,-Wlogical-not-parentheses] if ( !(caps >> 13) & 1 ) ^ ~ > >>> + >>> + case 3: >>> + /* test version 3 facilities */ >>> + if ( !test_intel_pmu_ver3( ngregs, nfregs) ) >>> + return xtf_failure("Fail: Failed VPMU version >>> 3\n"); >>> + /* fall through */ >>> + >>> + case 2: >>> + /* test version 2 facilities */ >>> + if ( !test_intel_pmu_ver2(ngregs, nfregs) ) >>> + return xtf_failure("Fail: Failed VPMU version >>> 2\n"); >>> + >>> + /* test version 1 facilities */ >>> + if ( !test_intel_pmu_ver1(ngregs) ) >>> + return xtf_failure("Fail: Failed VPMU version >>> 1\n"); >>> + >>> + /* test full width counters */ >>> + if ( !test_full_width_cnts(ngregs, ngbits) ) >>> + return xtf_failure("Fail: Failed full width >>> test\n"); >>> + >>> + break; >>> + >>> + case 1: >>> + /* version 1 unsupported */ >> You have a v1 test function, yet it is unsupported? > Yes, from VPMUs perspective, v1 in itself is unsupported because that > would > have required disabling access to PERF_GLOBAL_STATUS and > PERF_GLOBAL_CTRL MSRs. > However, each version does support facilities introduced in the > previous version. > Thus, test_intel_pmu_ver1() is intended for v2 testing and helps > breaking the > test_intel_pmu_ver2() function into facilities introduced by the > architecture in > two separate versions - v1 and v2 You are logically confusing Xen's reasoning for supporting different versions of hardware PMU, with the version it decides to advertise to guests. It is certainly reasonable for the vPMU driver in Xen to decide that, without PMU v2, it is unacceptably complicated to deal with context switching the counter state. However, this has no bearing on which version of PMU is emulated to the guest, and it would certainly should be possible to configure Xen to only advertise v1 to guests, while the real hardware actually supports v2. (After all, we already downgrade v4 to v3 in a guests view). On another note about the MSR leakage, the v$N test case should ideally check that none of the v$N+1 capabilities/features are leaked into the guest, but this smells like a similar quantity of "redoing it properly the 2nd time around" as my attempts to fix CPUID handling in Xen. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.