Xen project Mailing List

Re: [Xen-devel] x86/AMD: Nested VM failed to boot L2 guest due to setting/clearing CR0.CD bit

To: "Suravee Suthikulanit" <suravee.suthikulpanit@xxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Tue, 06 Aug 2013 08:12:39 +0100

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Christoph Egger <chegger@xxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>

Delivery-date: Tue, 06 Aug 2013 07:12:59 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 06.08.13 at 04:27, Suravee Suthikulanit <suravee.suthikulpanit@xxxxxxx> wrote:
> Hi All,
>
> While testing nested VMs with the latest Xen on an AMD system, I am running
> into an issue where the L2 guest (Linux) seems to get stuck right after
> loading the kernel. When using "xl debug-keys d" to dump registers, the L2
> guest RIP is always at the instruction which tries to write the CR0.CD bit.
> Besides, once the L2 guest has started and got stuck, L0 Dom0 becomes very
> slow until I kill the L2 guest.
>
> After looking into the hvm code for handling CR0 (i.e.
> xen/arch/x86/hvm/hvm.c: hvm_set_cr0()), I see that the code tries to issue
> a local cache flush on all the cores when the L2 guest is setting the
> CR0.CD bit. (Please see the code snippet below.)
>
>     if ( (value & X86_CR0_CD) && !(value & X86_CR0_NW) )
>     {
>         /* Entering no fill cache mode. */
>         spin_lock(&v->domain->arch.hvm_domain.uc_lock);
>         v->arch.hvm_vcpu.cache_mode = NO_FILL_CACHE_MODE;
>
>         if ( !v->domain->arch.hvm_domain.is_in_uc_mode )
>         {
>             /* Flush physical caches. */
> ---> HERE   on_each_cpu(local_flush_cache, NULL, 1);
>             hvm_set_uc_mode(v, 1);
>         }
>         spin_unlock(&v->domain->arch.hvm_domain.uc_lock);
>     }
>
> When I comment out the line, the issue goes away. Is this line necessary?
> Why do we need to flush all the CPU cores when the CR0.CD bit only applies
> to a particular core?

Doing the flush only on the local CPU would imply that once the affected
vCPU migrates to another pCPU, flushing would _then_ need to be done there
too. Tracking this would clearly add complexity here.

Furthermore, the "UC mode" is being entered on the domain as a whole, i.e.
all the pCPU-s that the domain is actively running on would need immediate
flushing, and all pCPU-s any of the vCPU-s would migrate to subsequently
would need deferred flushing.

That said, I still can't see how the flushing here would have this dramatic
an effect: It's a one-time thing, when UC mode first gets entered by a
domain. So unless CR0.CD gets flipped back and forth by a guest, there
shouldn't be more than one flush (or there's a logic error somewhere else).

Finally, the need for that code as a whole is under question in the context
of XSA-60. I would certainly favor (at least on the SVM side) handling
CR0.CD per vCPU instead of per domain, as long as there are no requirements
that CR0.CD be set consistently across multiple CPUs (e.g. within a package;
on Intel CPUs I'm told it's a hard requirement to be consistent at least
between sibling hyperthreads, meaning that we can't rip out the current
logic altogether in favor of a CR0.CD based solution).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
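
The trade-off discussed in this message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Xen code: `NR_PCPUS`, `toy_domain`, `toy_vcpu`, the `flushes[]` counters and all function names are hypothetical, and a counter increment stands in for a real cache flush. Policy A flushes every pCPU once when the domain first enters UC mode, as in the quoted hvm_set_cr0() snippet; policy B flushes only the local pCPU and defers further flushes until the vCPU migrates, which needs the extra per-vCPU bookkeeping mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PCPUS 4   /* hypothetical number of physical CPUs */

/* Toy counters standing in for real cache flushes (assumption). */
static unsigned int flushes[NR_PCPUS];

static void flush_pcpu(unsigned int pcpu)
{
    assert(pcpu < NR_PCPUS);
    flushes[pcpu]++;
}

static unsigned int total_flushes(void)
{
    unsigned int sum = 0;

    for (unsigned int i = 0; i < NR_PCPUS; i++)
        sum += flushes[i];
    return sum;
}

/* Policy A: per-domain UC mode, flush all pCPUs on first entry only. */
struct toy_domain {
    bool is_in_uc_mode;
};

static void domain_enter_uc(struct toy_domain *d)
{
    if (!d->is_in_uc_mode) {
        for (unsigned int i = 0; i < NR_PCPUS; i++)
            flush_pcpu(i);
        d->is_in_uc_mode = true;
    }
}

static void domain_leave_uc(struct toy_domain *d)
{
    d->is_in_uc_mode = false;
}

/* Policy B: per-vCPU tracking, local flush now, deferred flush on migration. */
struct toy_vcpu {
    unsigned int pcpu;        /* pCPU the vCPU currently runs on */
    bool cd_set;              /* guest view of CR0.CD */
    bool flushed[NR_PCPUS];   /* pCPUs already flushed for this CD episode */
};

static void vcpu_set_cd(struct toy_vcpu *v)
{
    v->cd_set = true;
    if (!v->flushed[v->pcpu]) {
        flush_pcpu(v->pcpu);
        v->flushed[v->pcpu] = true;
    }
}

static void vcpu_clear_cd(struct toy_vcpu *v)
{
    v->cd_set = false;
    /* Caches may fill again, so forget which pCPUs were flushed. */
    for (unsigned int i = 0; i < NR_PCPUS; i++)
        v->flushed[i] = false;
}

static void vcpu_migrate(struct toy_vcpu *v, unsigned int new_pcpu)
{
    assert(new_pcpu < NR_PCPUS);
    v->pcpu = new_pcpu;
    if (v->cd_set && !v->flushed[new_pcpu]) {
        flush_pcpu(new_pcpu);   /* the deferred flush */
        v->flushed[new_pcpu] = true;
    }
}
```

In this model, repeated calls to `domain_enter_uc()` while the domain stays in UC mode cost nothing extra, but a guest that toggles CR0.CD (leave, then enter again) pays a full flush of all pCPUs each time. That would be consistent with the slowdown described in the report if the L2 guest keeps retrying the CR0 write, though the model does not establish that this is what happens.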
