[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 2/2][XTF] xtf/vpmu: MSR read/write tests for VPMU



On 25/04/17 22:45, Mohit Gambhir wrote:
>>> diff --git a/tests/vpmu/Makefile b/tests/vpmu/Makefile
>>> new file mode 100644
>>> index 0000000..1eaf436
>>> --- /dev/null
>>> +++ b/tests/vpmu/Makefile
>>> @@ -0,0 +1,9 @@
>>> +include $(ROOT)/build/common.mk
>>> +
>>> +NAME      := vpmu
>>> +CATEGORY  := utility
>> utilities don't get run automatically.  Is this intentional?  If it
>> isn't, what is the plan for making it automatically run?  vpmu is still
>> disabled by default in all branches due to the security vulnerabilities,
>> so even if this vpmu test was automatically run, it would skip due to
>> vpmu not being visible.
> The reason I wanted it to not run automatically was because I thought
> it will fail when vpmu
> is not enabled - which is the default case. But if XTF will skip vpmu
> tests when it is disabled
> then we can run the tests automatically. Should the CATEGORY be
> functional in that case?
>> As a tangent, I wonder if it would be a useful to have a separate
>> category for incomplete tests, but which are still useful to have for
>> manual running.
> Possibly. Utility category seems to do just that but I can see that it
> is really not meant for
> functional tests. There are situations where writing tests before or
> while implementing a
> feature is useful  and in that case this new category can be
> beneficial. It would also benefit
> to have some kind of "expected failure" return type.

Hopefully I have covered all of these points sufficiently in the 0/2
thread.  In the meantime, I'll consider how best to introduce a "not
quite fully baked yet" category.

>>> +TEST-ENVS := $(ALL_ENVIRONMENTS)
>> You build for all environments, but then have PV unilaterally skip.
>> Again, is this intentional?
> Yes, I thought I would go back and add PV/AMD tests in V2 but I can
> leave those TEST-ENVS
> out for now and "skip the skip" :) if that's preferable.
>> For HVM, why all environments?  does PMU have any interaction with the
>> operating or paging mode in use at the time of the samples?
> Not that I know of. So should it just be hvm64?

For now, I'd stick with just hvm64.  (A complication which I have yet to
sort out is the TEST-REVISION builds, so adding new logical content to
an existing test doesn't interfere with OSSTests bisection logic, but
that isn't an immediate problem.)

>
>>> +
>>> +    printk("-----------------\n");
>>> +    printk("Testing version 1\n");
>>> +    printk("-----------------\n");
>>> +
>>> +    /* for all general purpose counters */
>>> +    for ( i = 0; i < ng; i++ )
>>> +    {
>>> +        /* test writing to IA32_PMCx */
>>> +        idx = MSR_IA32_PMC(i);
>>> +
>>> +        /* test we can write to all valid bits in the counters */
>>> +        /* don't set bit 31 since that gets sign-extended */
>> What is wrong with bit 31 being sign extended?
> The problem there is that setting PMCx to 0xffffffff sign extends 31st
> bit to
> the remaining bits. So PMCx becomes 0xffffffffffff (when bit width is
> 48). rdmsr then
> reads all 48 bits and the value mismatches the one that was written.

That behaviour sounds mad, but I see it now described in the manuals. 
Oh well...

>
> A smarter test would provide both - write value to write and read
> value to check against.
> But test_valid_msr_write() only takes wval and checks to see if we
> read the same value
> that we wrote.

Well - you did introduce that function.  If altering it would be a
smarter test, then perhaps that is the better option to take.

>
>>
>>> +        wval = ((1ull << 31) - 1) ;
>>> +
>>> +        if ( !test_valid_msr_write(idx, wval) )
>>> +            return false;
>>> +
>>> +        /* set all valid bits in MSR_IA32_EVENTSELx */
>>> +        idx = MSR_IA32_PERFEVTSEL(i);
>>> +        wval = ((1ull << 32) - 1) ^
>>> (IA32_PERFEVENTSELx_ENABLE_ANYTHREAD |
>>> +                IA32_PERFEVENTSELx_ENABLE_PCB);
>> What is wrong with Pin Control in this register?  Architecturally, it is
>> a software configurable bit.
> Look at this patch on xen-devel
>
> [PATCH] Fix hypervisor crash when writing to VPMU MSR
>
> I found that out while writing this test.

Welcome to the very grey areas of processors which XTF has a habit of
finding.  (I really need to correlate all the suspected errata in the
doxygen comments).

Obviously, Xen shouldn't crash, but we are going to need feedback from
Intel before working out how to proceed.

>
>>> +    /* test IA32_DEBUGCTL */
>>> +    idx = MSR_IA32_DEBUGCTL;
>>> +
>>> +    /* Test IA32_DEBUGCTL facilities enabled by v2 */
>>> +    wval = IA32_DEBUGCTL_Freeze_LBR_ON_PMI |
>>> IA32_DEBUGCTL_Freeze_PerfMon_On_PMI;
>>> +
>>> +    /* FIXME: This should really be a valid write but it isnt
>>> supported by the
>>> +     * VPMU yet */
>> In which case the test should be correct here, and highlight that there
>> is a bug in Xen.
> But then, should the main test return xtf_success, xtf_skip or
> xtf_failure?

Failure.  Xen is provably malfunctioning.

>>>
>>> +    /* Get perf capabilties */
>>> +    if ( rdmsr_safe(MSR_IA32_PERF_CAPABILITIES, &caps) )
>>> +    {
>>> +        printk("Fault while reading MSR_IA32_PERF_CAPABILITIES\n");
>> The PERF_CAPABILITIES MSR is only valid to read if PDCM is advertised in
>> CPUID.
>>
>> We currently never advertise PCDM, and is a bug that
>> MSR_PERF_CAPABILITIES is accessable at all in guest context.
>> (specifically, it is because Xen leaks almost all host MSR state into
>> guests).
> So I suppose I can't test full-bit width writes then?

Well, the code should be architecturally correct to start with.  After
that, it is reasonable to explain that Xen used to leak
PERF_CAPABILITIES into guests view, at which point probing it is a
legitimate thing to do.

However, you should take care that your logic doesn't begin to fail at
the point that Xen is fixed to behave architecturally (and actually
return #GP[0] if PDCM isn't advertised to the guest), because I will
definitely be fixing the MSR leakage problems (if noone beats me to it).

>>> +        return false;
>>> +    }
>>> +
>>> +    if ( !(caps >> 13) & 1 )
>> This lacks sufficient brackets for what you are intending to do.  Also,
>> it looks like you probably want a #define at the top of the file for
>> this constant.
> ! and & have the same precedence and right to left associativity. So
> technically it should
> do what it is intended to do. But I will make it look better.

Logical NOT and Address-of have the same precedence.  Logical NOT has a
higher precedence than Bitwise AND.

And to resolve any doubt, here is what Clang things:

andrewcoop@andrewcoop:/local/xen-test-framework.git$ make CC=clang-4.0
-j4 -s
main.c:396:10: error: logical not is only applied to the left hand side
of this bitwise operator [-Werror,-Wlogical-not-parentheses]
    if ( !(caps >> 13) & 1 )
         ^             ~

>
>>> +
>>> +        case 3:
>>> +                /* test version 3 facilities */
>>> +                if ( !test_intel_pmu_ver3( ngregs, nfregs) )
>>> +                    return xtf_failure("Fail: Failed VPMU version
>>> 3\n");
>>> +                /* fall through */
>>> +
>>> +        case 2:
>>> +                /* test version 2 facilities */
>>> +                if ( !test_intel_pmu_ver2(ngregs, nfregs) )
>>> +                    return xtf_failure("Fail: Failed VPMU version
>>> 2\n");
>>> +
>>> +                /* test version 1 facilities */
>>> +                if ( !test_intel_pmu_ver1(ngregs) )
>>> +                    return xtf_failure("Fail: Failed VPMU version
>>> 1\n");
>>> +
>>> +                /* test full width counters */
>>> +                if ( !test_full_width_cnts(ngregs, ngbits) )
>>> +                    return xtf_failure("Fail: Failed full width
>>> test\n");
>>> +
>>> +                break;
>>> +
>>> +        case 1:
>>> +                /* version 1 unsupported */
>> You have a v1 test function, yet it is unsupported?
> Yes, from VPMUs perspective, v1 in itself is unsupported because that
> would
> have required disabling access to PERF_GLOBAL_STATUS and
> PERF_GLOBAL_CTRL MSRs.
> However, each version does support facilities introduced in the
> previous version.
> Thus, test_intel_pmu_ver1() is intended for v2 testing and helps
> breaking the
> test_intel_pmu_ver2()  function into facilities introduced by the
> architecture in
> two separate versions - v1 and v2

You are logically confusing Xen's reasoning for supporting different
versions of hardware PMU, with the version it decides to advertise to
guests.

It is certainly reasonable for the vPMU driver in Xen to decide that,
without PMU v2, it is unacceptably complicated to deal with context
switching the counter state.  However, this has no bearing on which
version of PMU is emulated to the guest, and it would certainly should
be possible to configure Xen to only advertise v1 to guests, while the
real hardware actually supports v2.  (After all, we already downgrade v4
to v3 in a guests view).

On another note about the MSR leakage, the v$N test case should ideally
check that none of the v$N+1 capabilities/features are leaked into the
guest, but this smells like a similar quantity of "redoing it properly
the 2nd time around" as my attempts to fix CPUID handling in Xen.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.