[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: xen 4.15.5: msr_relaxed required for MSR 0x1a2



On Mon, Nov 20, 2023 at 08:44:41AM +0000, James Dingwall wrote:
> On Fri, Nov 17, 2023 at 11:17:46AM +0100, Roger Pau Monné wrote:
> > On Fri, Nov 17, 2023 at 09:18:39AM +0000, James Dingwall wrote:
> > > On Thu, Nov 16, 2023 at 04:32:47PM +0000, Andrew Cooper wrote:
> > > > On 16/11/2023 4:15 pm, James Dingwall wrote:
> > > > > Hi,
> > > > >
> > > > > Per the msr_relaxed documentation:
> > > > >
> > > > >    "If using this option is necessary to fix an issue, please report 
> > > > > a bug."
> > > > >
> > > > > After recently upgrading an environment from Xen 4.14.5 to Xen 4.15.5 
> > > > > we
> > > > > started experiencing a BSOD at boot with one of our Windows guests.  
> > > > > We found
> > > > > that enabling `msr_relaxed = 1` in the guest configuration has 
> > > > > resolved the
> > > > > problem.  With a debug build of Xen and `hvm_debug=2048` on the 
> > > > > command line
> > > > > the following messages were caught as the BSOD happened:
> > > > >
> > > > > (XEN) [HVM:11.0] <vmx_msr_read_intercept> ecx=0x1a2
> > > > > (XEN) vmx.c:3298:d11v0 RDMSR 0x000001a2 unimplemented
> > > > > (XEN) d11v0 VIRIDIAN CRASH: 1e ffffffffc0000096 fffff80b8de81eb5 0 0
> > > > >
> > > > > I found that MSR 0x1a2 is MSR_TEMPERATURE_TARGET and from that this 
> > > > > patch
> > > > > series from last month:
> > > > >
> > > > > https://patchwork.kernel.org/project/xen-devel/list/?series=796550
> > > > >
> > > > > Picking out just a small part of that fixes the problem for us. 
> > > > > Although the
> > > > > the patch is against 4.15.5 I think it would be relevant to more 
> > > > > recent
> > > > > releases too.
> > > > 
> > > > Which version of Windows, and what hardware?
> > > > 
> > > > The Viridian Crash isn't about the RDMSR itself - it's presumably
> > > > collateral damage shortly thereafter.
> > > > 
> > > > Does filling in 0 for that MSR also resolve the issue?  It's model
> > > > specific and we absolutely cannot pass it through from real hardware
> > > > like that.
> > > > 
> > > 
> > > Hi Andrew,
> > > 
> > > Thanks for your response.  The guest is running Windows 10 and the crash
> > > happens in a proprietary hardware driver.
> > 
> > When you say proprietary you mean a custom driver made for your
> > use-case, or is this some vendor driver widely available?
> > 
> 
> Hi Roger,
> 
> We have emulated some point of sale hardware with a custom qemu device.  It
> is reasonably common but limited to its particular sector.  As the physical
> hardware is all built to the same specification I assume the driver has made
> assumptions about the availability of MSR_TEMPERATURE_TARGET and doesn't
> handle the case it is absent which leads to the BSOD in the Windows guest.

Hello James,

We have in the past exposed MSRs in order to workaround OSes
assumptions about such MSRs being unconditionally present, so it's not
completely unacceptable that we might end up exposing this if strictly
required.

My question would be, is it possible for such driver to get fixed in
order to avoid unconditionally poking at MSR_TEMPERATURE_TARGET, or
that's impossible?

>From the Intel manual it seems like MSR_TEMPERATURE_TARGET is
unconditionally present on certain models, and hence we might have no
other option but to end up adding a dummy handler for reads.  I do
wonder whether returning all 0 is correct, as then the "thermal
throttling" would be enable when the CPU temp > 0C, which is
unrealistic.  I assume that wouldn't matter much as long as drivers
don't choke on such weird value.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.