
Re: [Xen-devel] Issue policing writes from Xen to PV domain memory

>>>>>>> On adding some debugging, I discovered that it happens after
>>>>>>> mem_access is enabled but xen-access has not started handling
>>>>>>> events. After comparing the stack trace and the gla in question,
>>>>>>> there are multiple write faults to the runstate_guest(v) area, each
>>>>>>> causing an event to be sent to xen-access. Since the listener is
>>>>>>> not handling events yet, the fault continues to occur. I am not
>>>>>>> sure why the listener does not get a chance to run. I also do not
>>>>>>> follow why there are multiple faults, as the vcpu should have been
>>>>>>> paused when the first event was sent to xen-access and only be
>>>>>>> resumed after the violation has been resolved, when the listener
>>>>>>> calls xc_access_resume(), which ends up unpausing the vcpu. Or is
>>>>>>> this occurring because runstate_guest(v).p is being accessed from
>>>>>>> Xen?
>>>>>> The runstate changes (and hence needs to get written) as a side
>>>>>> effect of pausing the guest (as can be seen from the stack trace).
>>>>>> The first question that needs clarification (for me at least, since
>>>>>> I don't know much about the access stuff for HVM) is how the same
>>>>>> situation gets handled there: Do Xen writes to HVM guest memory get
>>>>>> intercepted? Other than for PV, they're not going through the same
>>>>>> page tables, so special precautions would be needed to filter them.
>>>>>> Quite obviously (I think) if they're not being filtered for HVM,
>>>>>> then they shouldn't be for PV.
>>>>> AFAIK, they are not being filtered for HVM. I brought up a HVM
>>>>> domain and printed out the runstate_guest area value when it is
>>>>> registered. I then ran the xen-access test program to monitor for
>>>>> writes. Interestingly, I never saw a GLA that matched the
>>>>> runstate_guest area. This is not the case for a PV domain, as it is
>>>>> one of the first violations the xen-access test program sees.
>>>>> Is this an EPT vs regular page table difference, as in one case the
>>>>> page tables are shared?
>>>> Yes, as said in my earlier reply. And when hypervisor writes aren't
>>>> subject to filtering in HVM, you probably want/need to make things
>>>> behave that way for PV too (albeit it may not be trivial, as you
>>>> clearly don't want to intercept hypervisor accesses).
>>> The main issue I am seeing is the same violation being sent multiple
>>> times to the mem_event queue and the listener not getting scheduled to
>>> run. I would think that the same issue would happen with a HVM guest,
>>> but in that case I don't even see a violation of the runstate_guest
>>> area. I guess I should try figuring out why wqv->esp becomes 0 instead
>>> of trying to stop multiple events for the same violation being sent.
>>> If you have a suggestion on the area I should be looking at, it would
>>> be welcome.
>> I meant to say "why wqv->esp becomes non-zero" in the message above.
> That's pretty obvious - this is a per-vCPU field, so there can't be two
> nested attempts to wait, yet the call stack you got makes pretty clear
> that this is what is happening (and what is expected to be happening -
> the vCPU gets a wakeup, needs to write the changed runstate, faults,
> wants the mem-event to be delivered, hence needs to be put to sleep,
> which in turn requires another change to the runstate).

It looks like the nested attempts to wait() happen only when the ring is full.
The flow is:
mem_event_claim_slot() -> 
        mem_event_wait_slot() ->
                 wait_event(mem_event_wait_try_grab(med, &rc) != -EBUSY)

wait_event() macro looks like this:
do {                                                          \
    if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )        \
        break;                                                \
    for ( ; ; ) {                                             \
        prepare_to_wait(&med->wq);                            \
        if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )    \
            break;                                            \
        wait();                                               \
    }                                                         \
    finish_wait(&med->wq);                                    \
} while (0)

In the case where the ring is full, wait() gets called and the vCPU gets
scheduled away. But since it is in the middle of a page fault, when it runs
again it ends up in handle_exception_saved and the same page fault is tried
again. But since finish_wait() never ends up being called, wqv->esp never
becomes 0, and hence the assert fires on the next go-around. Am I on the
right track?


Xen-devel mailing list


