
Re: [Xen-devel] PVH cpuid feature flags



On Mon, Jan 27, 2014 at 01:06:55PM +0100, Roger Pau Monné wrote:
> On 24/01/14 19:31, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jan 21, 2014 at 08:28:14PM +0100, Roger Pau Monné wrote:
> >> Hello,
> >>
> >> While doing some benchmarks on PV/PVH/PVHVM, I've realized that the
> >> cpuid feature flags exposed to PVH guests are kind of strange. This is
> >> the output of the feature flags as seen by an HVM domain:
> >>
> > 
> > What about a PV guest? I presume if you ran a NetBSD PV guest it would
> > give a format similar to this?
> 
> I guess so; the feature flags reported by NetBSD PV will probably be the
> same as the ones reported by FreeBSD PVH (unless NetBSD PV also does
> some kind of pre-filtering).
> 
> > 
> >> Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
> >> Features2=0x81b82201<SSE3,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,HV>
> >> AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
> >> AMD Features2=0x1<LAHF>
> >>
> >> And this is what a PVH domain sees when running on the same hardware:
> >>
> >> Features=0x1fc98b75<FPU,DE,TSC,MSR,PAE,CX8,APIC,SEP,CMOV,PAT,CLFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT>
> >> Features2=0x80982201<SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,HV>
> >> AMD Features=0x20100800<SYSCALL,NX,LM>
> >> AMD Features2=0x1<LAHF>
> >>
> >> I would expect the feature flags to be quite similar between an HVM
> >> domain and a PVH domain (since they both run inside of an HVM container).
> >> AFAIK, there's no reason to disable PSE, PGE, PSE36 and RDTSCP for PVH
> >> guests. Also, is there any reason why PVH guests have the ACPI, SS and
> >> CLFLUSH feature flags set but HVM guests don't?
> > 
> > S5?
> 
> SS - CPU cache supports self-snoop.
> 
> Not sure if that should be enabled or not for PVH.

Not even sure what that means for a guest. I thought all CPUs do snooping
except when they go into C-states?
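For the record, XOR-ing the two leaf-1 EDX words quoted above makes the
delta explicit. A quick standalone sketch (values copied from the dumps,
bit names from the SDM):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            /* Leaf-1 EDX as printed by FreeBSD above. */
            uint32_t hvm = 0x1783fbff, pvh = 0x1fc98b75;
            uint32_t diff = hvm ^ pvh;      /* 0x084a708a */

            /* HVM-only: VME(1) PSE(3) MCE(7) MTRR(12) PGE(13) MCA(14)
             * PSE36(17).  PVH-only: CLFLUSH(19) ACPI(22) SS(27). */
            printf("hvm-only=%#010x pvh-only=%#010x\n",
                   diff & hvm, diff & pvh);
            return 0;
    }

That prints hvm-only=0x0002708a pvh-only=0x08480000, i.e. exactly the
VME/PSE/MCE/MTRR/PGE/MCA/PSE36 vs. CLFLUSH/ACPI/SS split discussed here.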

> 
> > 
> > ACPI is enabled for PV I think, but Linux PV guests disable it
> > as there are no ACPI tables in PV mode:
> > 
> >         if (!xen_initial_domain())
> >                 cpuid_leaf1_edx_mask &=
> >                         ~((1 << X86_FEATURE_ACPI));  /* disable ACPI */
> > 
> > CLFLUSH - no idea why it would be disabled.
> > 
> > 
> > RDTSCP should be enabled. In the past I think it was related to the
> > 'timer=' option. We would either trap it, emulate it with a constant
> > value, or pass it through. It should be passing it through, but maybe
> > there is a bug?
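A trivial way to poke at that from inside the guest (a sketch: with
pass-through you get the real TSC and IA32_TSC_AUX, with constant-value
emulation the tsc value will not advance between runs, and if the
instruction is not handled at all this dies with #UD/SIGILL):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            uint32_t lo, hi, aux;

            /* RDTSCP returns the TSC in EDX:EAX and IA32_TSC_AUX in ECX. */
            asm volatile("rdtscp" : "=a" (lo), "=d" (hi), "=c" (aux));
            printf("tsc=%llu aux=%u\n",
                   ((unsigned long long)hi << 32) | lo, aux);
            return 0;
    }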
> > 
> > PSE, PGE, PSE36, PG1GB, etc., should all be exposed. Actually, PG1GB
> > is not exposed because of another bug:
> > http://www.gossamer-threads.com/lists/xen/devel/313596
> 
> I think so; now that we run inside an HVM container we should be able
> to make use of all those.

Agreed.
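(As an aside, for anyone wanting to reproduce the dumps above from
userspace rather than from the FreeBSD boot messages - a minimal reader
using GCC's cpuid.h:)

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
            unsigned int ax, bx, cx, dx;

            __get_cpuid(1, &ax, &bx, &cx, &dx);             /* basic leaf 1 */
            printf("Features  (EDX)=%#010x\n", dx);
            printf("Features2 (ECX)=%#010x\n", cx);

            __get_cpuid(0x80000001, &ax, &bx, &cx, &dx);    /* extended leaf */
            printf("AMD Features  (EDX)=%#010x\n", dx);
            printf("AMD Features2 (ECX)=%#010x\n", cx);
            return 0;
    }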
> 
> > 
> >>
> >> Most (if not all) of this probably comes from the fact that we are
> >> reporting the same feature flags as pure PV guests, but I see no reason
> >> to do that for PVH guests. We should decide what's supported on PVH and
> >> set the feature flags accordingly.
> > 
> > Right, and we should also have a nice policy. The problem is that we
> > set/unset the cpuid flags in the toolstack (in two places, depending on
> > the architecture) and also in the hypervisor.
> 
> Yes, all this cpuid flag stuff seems like a mess to me. There are so
> many places where we enable or blacklist certain cpu flags that it makes
> me wonder if it would be more sane to define a set of flags that an HVM
> container supports, and then maybe blacklist some of them if they are not
> actually implemented/usable on the specific kind of guest.

So, first go through the HVM list and then follow with the PV one?
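Something along those lines, i.e. start from the container's mask and
subtract per-guest-type bits. A hypothetical sketch only - the function
and macro names are made up, this is not Xen code:

    #include <stdint.h>

    /* Leaf-1 EDX bit positions, SDM numbering. */
    #define F_MTRR  (1u << 12)
    #define F_TM    (1u << 29)      /* what Linux calls X86_FEATURE_ACC */

    static uint32_t guest_leaf1_edx(uint32_t hvm_container_edx, int is_pvh)
    {
            uint32_t edx = hvm_container_edx;   /* what the container has */

            if (is_pvh)
                    edx &= ~(F_MTRR | F_TM);    /* example blacklist only */
            return edx;
    }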

> 
> > Anyhow, these I know we disable:
> > 
> >         cpuid_leaf1_edx_mask =
> >                 ~((1 << X86_FEATURE_MTRR) |  /* disable MTRR */
> >                   (1 << X86_FEATURE_ACC));   /* thermal monitoring */
> > 
> > And I think your patch to the Xen hypervisor does it too - it clears
> > the MTRR bit by default now. The ACC bit is (if my memory is correct)
> > for the Pentium 4 and such - we can disable it as well.
> > 
> >         cpuid_leaf1_ecx_mask &= ~(1 << (X86_FEATURE_X2APIC % 32));
> > 
> > And this we definitely need to disable. The x2APIC should not
> > be exposed, as we want to use Xen's version of the APIC ops. And
> > if the x2APIC bit is enabled in Linux, there is some other code
> > (the NMI handler) that will use it without going through the APIC
> > ops. And blow up :-(
> > 
> > 
> > Then there is MWAIT. Actually we can put that to the side.
> > I know it is important for dom0, but since we don't have those
> > patches in yet, we can ignore it. However, the hypervisor
> > (pv_cpuid) disables it.
> > 
> > 
> > There is also the XSAVE business:
> > 
> >         xsave_mask =
> >                 (1 << (X86_FEATURE_XSAVE % 32)) |
> >                 (1 << (X86_FEATURE_OSXSAVE % 32));
> > 
> >         /* Xen will set CR4.OSXSAVE if supported and not disabled by force */
> >         if ((cx & xsave_mask) != xsave_mask)
> >                 cpuid_leaf1_ecx_mask &= ~xsave_mask; /* disable XSAVE & OSXSAVE */
> > 
> > Which I am not clear about.
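(For what it's worth, the two bits travel together because OSXSAVE merely
mirrors CR4.OSXSAVE, i.e. whether the "OS" - here Xen - turned the feature
on. The usual guest-side probe looks roughly like this sketch;
xsave_usable() is a made-up name:)

    #include <cpuid.h>

    static int xsave_usable(void)
    {
            unsigned int ax, bx, cx, dx;

            __get_cpuid(1, &ax, &bx, &cx, &dx);
            return (cx & (1u << 26)) &&  /* XSAVE: instruction set exists */
                   (cx & (1u << 27));    /* OSXSAVE: CR4.OSXSAVE was set  */
    }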
> > 
> > 
> > It looks like, to make 'cpuid' handling uniform in the hypervisor,
> > we need to somehow glue hvm_cpuid and pv_cpuid together with some
> > form of tables/lookups.
> > 
> > And make sure that the same logic is reflected in
> > xc_cpuid_x86.c (toolstack).
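To make "tables/lookups" a bit more concrete, the shape I have in mind is
something like this (hypothetical only, not a patch):

    #include <stdint.h>

    struct cpuid_policy_entry {
            uint32_t leaf;
            uint32_t edx_mask;      /* bits this guest type may see */
            uint32_t ecx_mask;
    };

    static const struct cpuid_policy_entry pvh_policy[] = {
            { 0x00000001, 0xffffefff /* no MTRR   */,
                          0xffdfffff /* no x2APIC */ },
            { 0x80000001, 0xffffffff, 0xffffffff },
    };

Both hvm_cpuid and pv_cpuid (and xc_cpuid_x86.c) could then walk one such
table per guest type instead of open-coding the masks.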
> 
> Agree. On a slightly different topic, why do we enable the APIC flag for
> PV(H) guests?
> 
> We certainly have no APIC, which makes me wonder if we should disable it
> now and enable it once we have hardware APIC virtualization in place.
> This would allow PVH to either use the traditional PV style, or a
> hardware-virtualized APIC if we enable it for PVH guests (and make the
> guest aware of it by turning the flag on).

I thought it was off for PV guests (except dom0). That is my recollection
from when 'perf' starts and realizes it can't sample the APIC.

But when it comes to dom0, it needs that; otherwise it won't even parse
the ACPI MADT tables - and you need to parse those to get the INT_SRC_OVR
entries right. The reason you need those is that once the ACPI AML code
kicks in, it has to figure out which GSI is the ACPI SCI, and an
INT_SRC_OVR entry might have an override (like pin 9 being hard-wired
to pin 20 instead of 9).

In other words - dom0 needs this to do its work.
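For reference, the MADT entry in question, as Linux declares it in
include/acpi/actbl1.h (the field comments are from that header):

    struct acpi_madt_interrupt_override {
            struct acpi_subtable_header header;
            u8 bus;                 /* 0 - ISA */
            u8 source_irq;          /* interrupt source (IRQ) */
            u32 global_irq;         /* global system interrupt */
            u16 inti_flags;
    };

The "pin 9 hard-wired to pin 20" case shows up as source_irq = 9,
global_irq = 20, and the SCI has to follow the override.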
> 
> Roger.
> 
