
Re: [Xen-devel] [PATCH v2 12/30] xen/x86: Generate deep dependencies of features



>>> On 15.02.16 at 16:28, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 15/02/16 14:06, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>>  
>>>  static void sanitise_featureset(uint32_t *fs)
>>>  {
>>> +    uint32_t disabled_features[FSCAPINTS];
>>>      unsigned int i;
>>>  
>>>      for ( i = 0; i < FSCAPINTS; ++i )
>>>      {
>>>          /* Clamp to known mask. */
>>>          fs[i] &= known_features[i];
>>> +
>>> +        /*
>>> +         * Identify which features with deep dependencies have been
>>> +         * disabled.
>>> +         */
>>> +        disabled_features[i] = ~fs[i] & deep_features[i];
>>> +    }
>>> +
>>> +    for_each_set_bit(i, (void *)disabled_features,
>>> +                     sizeof(disabled_features) * 8)
>>> +    {
>>> +        const uint32_t *dfs = lookup_deep_deps(i);
>>> +        unsigned int j;
>>> +
>>> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
>>> +
>>> +        for ( j = 0; j < FSCAPINTS; ++j )
>>> +        {
>>> +            fs[j] &= ~dfs[j];
>>> +            disabled_features[j] &= ~dfs[j];
>>> +        }
>>>      }
>> Am I reading the logic in the Python script correctly, i.e. that it
>> is indeed unnecessary to restart this loop even when a feature at a
>> higher-numbered bit position results in a lower-numbered bit getting
>> cleared?
> 
> Correct - the Python logic flattens the dependency tree, so a single
> lookup_deep_deps() call returns all features eventually influenced by
> feature i.  The deep features for i may include a feature numerically
> lower than i, but because dfs[] contains every feature on the eventual
> chain, we don't need to start back from 0 again.
> 
> I felt it was far more productive to do this work at build time rather
> than at runtime.

Oh, definitely.
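
For illustration, the flattening being discussed amounts to taking the
transitive closure of the direct-dependency table, so that a single
lookup returns every feature a given feature eventually influences.  A
minimal sketch in Python follows; the helper name flatten_deep_deps and
the symbolic feature names are illustrative only, not the actual
gen-cpuid.py code:

    def flatten_deep_deps(deps):
        """Expand a direct-dependency table into its transitive closure."""
        deep = {}
        for feat, direct in deps.items():
            seen = set(direct)
            work = list(direct)
            while work:
                f = work.pop()
                for child in deps.get(f, ()):
                    if child not in seen:
                        seen.add(child)
                        work.append(child)
            deep[feat] = seen
        return deep

    # Symbolic names stand in for the real feature bit numbers.
    deps = {
        "XSAVE": ("AVX", "MPX"),
        "AVX":   ("FMA", "FMA4", "F16C", "AVX2", "XOP"),
    }
    flat = flatten_deep_deps(deps)
    # flat["XSAVE"] contains not just AVX and MPX but also FMA, FMA4,
    # F16C, AVX2 and XOP, which is why clearing the deep dependencies of
    # a disabled feature never exposes new work at lower-numbered bits.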

>>> --- a/xen/tools/gen-cpuid.py
>>> +++ b/xen/tools/gen-cpuid.py
>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>  
>>> +    deps = {
>>> +        XSAVE:
>>> +        (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>> +
>>> +        AVX:
>>> +        (FMA, FMA4, F16C, AVX2, XOP),
>> I continue to question whether XOP in particular, but perhaps also the
>> others here except maybe AVX2, really depends on AVX rather than just
>> on XSAVE.
> 
> I am sure we have had this argument before.

Indeed, hence the "I continue to ...".

> All VEX-encoded SIMD instructions (including XOP, which AMD lists in the
> same category) are specified to act on 256-bit AVX state, and require
> AVX to be enabled in xcr0 to avoid #UD faults.  This includes VEX
> instructions encoding %xmm registers, which explicitly zero the upper
> 128 bits of the associated %ymm register.
> 
> This is very clearly a dependency on AVX, even if it isn't written in
> one clear, concise statement in the instruction manuals.

The question is what AVX actually means: to me it's an instruction set
extension, not one of machine state. The machine state extension is, to
me, tied to XSAVE (or more precisely to XSAVE's YMM state). (But I
intentionally say "to me", because I can certainly see why this may be
viewed differently.) Note how you yourself have recourse to XCR0 above,
which is very clearly tied to XSAVE and not AVX (and note also that
there's nothing called AVX to be enabled in XCR0; it's YMM state that
you talk about).

Jan
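
As a rough, hypothetical model of the runtime side discussed at the top
of the thread (not the actual Xen code; feature names are symbolic):
with a flattened table, the disabling pass in sanitise_featureset() can
be thought of as a single sweep over the disabled features, which is
why no restart is needed.

    def sanitise(enabled, deep):
        """Clear the deep dependencies of disabled features in one pass."""
        # Features with deep dependencies which the caller has disabled.
        disabled = set(deep) - enabled
        for f in disabled:
            # deep[f] already holds the whole dependency chain, so nothing
            # cleared here can require revisiting earlier features.
            enabled -= deep[f]
        return enabled

    deep = {
        "XSAVE": {"AVX", "MPX", "FMA", "FMA4", "F16C", "AVX2", "XOP"},
        "AVX":   {"FMA", "FMA4", "F16C", "AVX2", "XOP"},
    }
    # A featureset configured without XSAVE loses AVX and everything
    # depending on it in a single pass:
    print(sanitise({"AVX", "MPX", "FMA", "AVX2", "XOP"}, deep))  # -> set()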


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

