
Re: [Xen-devel] Is: cpuid creation of PV guests is not correct.

On 22/07/2014 20:43, Konrad Rzeszutek Wilk wrote:
>> I.e., no matter how I pin the vcpus, the guest sees the 4 vcpus as if
>> they were all SMT siblings, within the same core, sharing all cache
>> levels.
> My recollection was that the setting of these CPUID values is
> tied in how the toolstack sees it - and since the toolstack
> runs in the initial domain - that is where it picks this data up.
> This problem had been discussed by Andrew Cooper at some point
> (Hackathon? Emails? IRC?) and moved under the 'fix cpuid creation/parsing'.
> I think that this issue should not affect Elena's patchset - 
> as the vNUMA is an innocent bystander that gets affected by this.
> As such changing the title.

There is a whole set of related issues with regard to cpuid under Xen
currently.  I investigated the problems from the point of view of
heterogeneous host feature levelling.  I do plan to work on these issues
(as feature levelling is an important use case for XenServer) and will do
so when the migration v2 work is complete.

However, to summarise the issues:

Xen's notion of a domain's cpuid policy was adequate for single-vcpu VMs,
but was never updated when multi-vcpu VMs were introduced.  There is no
concept of per-vcpu information in the policy, which is why all the
cache IDs you read are identical.

The policy is theoretically controlled exclusively from the toolstack. 
The toolstack has the responsibility of setting the contents of any
leaves it believes the guest might be interested in, and Xen stores
these values wholesale.  If a cpuid query is requested of a domain which
lacks an entry for that specific leaf, the information is retrieved by
running a cpuid instruction, which is not necessarily deterministic.
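
As a rough illustration of the lookup behaviour described above, here is a toy Python model.  It is not Xen code: `DomainPolicy`, `native_cpuid` and the register values are all invented for the example.  It assumes the policy is a single per-domain table keyed by leaf, with no vcpu dimension, and that a missing leaf falls through to whatever pcpu the vcpu happens to be running on.

```python
# Toy model only: every name and value here is hypothetical.

def native_cpuid(pcpu, leaf):
    """Stand-in for executing cpuid on a given physical cpu."""
    if leaf == 0xB:
        return {"id": pcpu}        # topology-style leaf: differs per pcpu
    return {"id": 0}


class DomainPolicy:
    def __init__(self):
        self.leaves = {}           # leaf -> registers, one table per domain

    def set_leaf(self, leaf, regs):
        # Stored wholesale; there is no per-vcpu key.
        self.leaves[leaf] = dict(regs)

    def cpuid(self, vcpu, pcpu, leaf):
        if leaf in self.leaves:
            # Every vcpu gets the same stored answer.
            return self.leaves[leaf]
        # No entry: the answer depends on where the vcpu is scheduled.
        return native_cpuid(pcpu, leaf)


policy = DomainPolicy()
policy.set_leaf(0x4, {"id": 7})    # toolstack filled in leaf 4 only

same = policy.cpuid(0, 0, 0x4) == policy.cpuid(3, 2, 0x4)
differs = policy.cpuid(0, 0, 0xB) != policy.cpuid(0, 1, 0xB)
print(same, differs)
```

In this model a stored leaf is identical for all vcpus, while an unset leaf changes with the pcpu, which is the non-determinism mentioned above.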

The toolstack, under the cpuid policy of the domain it is running in,
attempts to guess the featureset to be offered to a domain, with
possible influence from user-specified domain configuration.  Xen
doesn't validate the featureset when the policy is set.  Instead, there
is veto/sanity code used on all accesses to the policy.  As a result,
the cpuid values as seen by the guest are not necessarily the same as
the values successfully set by the toolstack.
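
The gap between "what was set" and "what the guest sees" can be sketched in the same toy style.  The feature names, `HOST_FEATURES` and the `Policy` class below are made up for illustration; the only assumption taken from the text is that nothing is checked when the policy is set and that filtering happens on each read.

```python
# Toy model only: names and feature sets are hypothetical.
HOST_FEATURES = {"sse2", "avx"}


class Policy:
    def __init__(self):
        self.features = set()

    def set_features(self, requested):
        # No validation at set time: the call always succeeds.
        self.features = set(requested)

    def guest_view(self):
        # Veto/sanity filtering is applied on every access instead.
        return self.features & HOST_FEATURES


p = Policy()
p.set_features({"sse2", "avx", "avx2"})   # toolstack believes this was accepted
print(sorted(p.features))
print(sorted(p.guest_view()))
```

Here the toolstack's request is stored unchanged, yet the guest never sees the vetoed feature, and nothing tells the toolstack so.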

The various IDs which are obtained from cpuid inside a domain will
happen to be the IDs available to libxc when it was building the policy
for the domain.  For a regular PV dom0, it will be the IDs available on
the pcpu (or several, given rescheduling) on which libxc was running.

Xen can completely control the values returned by the cpuid instruction
from HVM/PVH domains.  On the other hand, results for PV guests are
strictly opt-in via use of the Xen forced-emulation prefix.  As a
result, well-behaved PV kernels will see the policy, but regular
userspace applications in PV guests will see the native cpuid results.

There are two caveats.  Intel Ivy Bridge (and later) hardware has
support for cpuid faulting which allows Xen to regain exactly the same
level of control over PV guests as it has for HVM guests.  There are
also cpuid masking (Intel)/override (AMD) MSRs (which vary in
availability between processor generations) which allow the visible
featureset of any cpuid instruction to be altered.

I have some vague plans for how to fix these issues, which I will need
to see about designing sensibly in due course.  However, a brief
overview is something like this:

* Ownership of the entire domain policy resides with Xen rather than the
toolstack, and when a domain is created, it shall inherit from the host
setup, given appropriate per-domain type restrictions.
* The toolstack may query and modify a domain's policy, with verification
of the modifications before they are accepted.
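
One possible shape for the two points above, again as a toy Python model rather than a design: the `Hypervisor` class, the feature names and the `TYPE_RESTRICTIONS` table are all hypothetical.  It assumes Xen derives each domain's policy from the host at creation time and rejects any later modification that goes beyond it.

```python
# Sketch of the proposal only: names, features and restrictions are invented.
HOST_FEATURES = {"sse2", "avx", "vmx"}
TYPE_RESTRICTIONS = {"pv": {"vmx"}, "hvm": set()}   # features hidden per type


class Hypervisor:
    def __init__(self):
        self.max_policy = {}   # domid -> upper bound derived from the host
        self.policy = {}       # domid -> current policy, owned by Xen

    def create_domain(self, domid, dom_type):
        # Inherit from the host, minus per-domain-type restrictions.
        allowed = HOST_FEATURES - TYPE_RESTRICTIONS[dom_type]
        self.max_policy[domid] = allowed
        self.policy[domid] = set(allowed)

    def get_policy(self, domid):
        return set(self.policy[domid])

    def set_policy(self, domid, requested):
        # Verify before accepting; a bad request leaves the policy untouched.
        extra = set(requested) - self.max_policy[domid]
        if extra:
            raise ValueError(f"rejected features: {sorted(extra)}")
        self.policy[domid] = set(requested)


xen = Hypervisor()
xen.create_domain(1, "pv")
xen.set_policy(1, {"sse2"})            # levelling down is accepted
try:
    xen.set_policy(1, {"sse2", "vmx"})  # beyond what this domain may see
except ValueError as err:
    print(err)
```

Because verification happens when the policy is set, the toolstack learns immediately that a request was refused, and what it reads back matches what the guest will see.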

