[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] x86/intel: Protect set_cpuidmask() against #GP faults

> From: Andrew Cooper [mailto:amc96@xxxxxxxxxxxxxxxx] On Behalf Of
> Andrew Cooper
> On 08/08/2014 21:57, Tian, Kevin wrote:
> >> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> >> Sent: Wednesday, June 18, 2014 7:59 AM
> >>
> >> Virtual environments such as Xen HVM containers and VirtualBox do not
> >> necessarily provide support for feature masking MSRs.
> > could you elaborate how above environment leads to #GP fault?
> The exact crash was a user trying XenServer.$NEXT in VirtualBox, on a
> SandyBridge system.  For very very bad reasons, XenServer has an
> unconditional cpuid_mask_xsave_eax=0 on its boot command line.  (It was
> put in as a crutch at the last moment in an old release several years
> ago, as the toolstack only knew features as a set of bits it XORs
> together to check compatibility.  Naturally, the work to do a proper fix
> got deferred in preference for newer customer-visible features.  Once I
> have gotten migrationv2 finished, I will be fixing it all from the
> ground up.)
> The code below decides by model number alone that MSR 0x134 is usable,
> and uses it.  Both Xen HVM VMs and VirtualBox hand back an unconditional
> #GP fault for attempting to use the masking registers.  The primary
> point of this patch is to use _safe() variants, as model number alone is
> not sufficient to guarentee the presence of the MSRs.

It makes sense then.

> >
> >> As their presence is detected by model numbers alone, and their use
> >> predicated
> >> on command line parameters, use the safe() variants of {wr,rd}msr() to
> avoid
> >> dying with an early #GP fault.
> >>
> >> While playing in this function, make some further improvements.
> >>
> >> * Rename the masking MSR names for consistency, and name the CPU
> models
> >> for
> >>   the benefit of humans reading the code.
> >> * Correct the CPU selection as specified in the flexmigration document.
> All
> >>   steppings of 0x17 and 0x1a are stated to have masking MSRs.
> > Thanks for catching this.
> >
> >> * Provide log messages indicating the masks applied, or lack of masking
> >>   capabilities.
> >> * Call set_cpuidmask() unconditionally so faulting-capable hardware still
> gets
> >>   a log message indicating to the user why their command line arguments
> are
> >>   not taking effect.
> > I don't think it a good idea to call set_cpuidmask() unconditionally. Though
> you may
> > argue mask/faulting are exclusively implemented, but from code p.o.v I'd
> like to
> > see faulting over mask here, otherwise it just causes confusion when reading
> the
> > code. Indication to user is easy... you can just add an option check when
> faulting
> > is present.
> Is it safe to assume that "faulting available" necessitates "masking
> unavailable"?  Neither are architecturally committed features.
> (Frankly, fautling is the correct solution and it would be nice to see
> it architecturally committed.)
> ~Andrew

no such guaranteed assumption, so others have to know this tricky fact
(I think most people won't have) to figure out why faulting and masking
are both attempted in the code.

This is the only concern I have with the patch. If you can keep original
intention, I'll ack next version.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.