
Re: [Xen-devel] Issue policing writes from Xen to PV domain memory



>> It looks like the nested attempts to wait() happen only when the ring
>> is full. The flow is
>> mem_event_claim_slot() ->
>>     mem_event_wait_slot() ->
>>         wait_event(mem_event_wait_try_grab(med, &rc) != -EBUSY)
>>
>> The wait_event() macro looks like this:
>> do {                                                        \
>>     if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )      \
>>         break;                                              \
>>     for ( ; ; ) {                                           \
>>         prepare_to_wait(&med->wq);                          \
>>         if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )  \
>>             break;                                          \
>>         wait();                                             \
>>     }                                                       \
>>     finish_wait(&med->wq);                                  \
>> } while (0)
>>
>> In the case where the ring is full, wait() gets called and the cpu
>> gets scheduled away. But since it is in the middle of a pagefault, when
>> it runs again it ends up in handle_exception_saved and the same
>> pagefault is tried again. But since finish_wait() never ends up being
>> called, wqv->esp never becomes 0 and hence the assert fires on the next
>> go around. Am I on the right track?
>
> Looks like so, with the caveat that it's then unclear to me why the ring
> would be full in the first place - shouldn't it get drained of earlier
> requests quite quickly? But anyway, even if this didn't occur immediately,
> but only rarely after many hours of running, it would still need taking
> care of.
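
For reference, the invariant that trips boils down to something like the
condensed model below. This is hypothetical code, not the actual
xen/common/wait.c source, and save_stack() stands in for the real asm that
copies the stack into the per-vcpu buffer: prepare_to_wait() expects a clean
slot, and only finish_wait() returns the slot to the clean state, so the
retried pagefault re-enters with wqv->esp still non-zero.

#include <assert.h>

/* Condensed, hypothetical model of the invariant -- not the actual
 * xen/common/wait.c source. */
struct waitqueue_vcpu {
    unsigned long esp;          /* 0 == vcpu not parked on a waitqueue */
};

static unsigned long save_stack(void)
{
    return 1;                   /* pretend we saved a stack pointer */
}

static void prepare_to_wait(struct waitqueue_vcpu *wqv)
{
    assert(wqv->esp == 0);      /* the assert that fires: the retried #PF */
    wqv->esp = save_stack();    /* re-enters here before...               */
}

static void finish_wait(struct waitqueue_vcpu *wqv)
{
    wqv->esp = 0;               /* ...this ever runs for the first fault */
}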

This happens only when the access listener first attaches to a PV domain for 
write events. It occurs in the window between mem_access being enabled and the 
listener being ready to handle events. I have not seen it happen once the 
listener starts handling events, and I have not run into this issue with 
execute violations. 

So I tried a couple of things. One was to check whether the ring was full 
before sending a write violation emanating from Xen. If the ring was full, 
instead of calling mem_access_send_req(), which would end up calling 
schedule(), I allowed the PTE to be created with the same default permissions. 
This causes the pagefault to be retried, indirectly giving the listener more 
time to handle events, while the write is still policed. This is what the 
change looks like on top of the original patch.

diff --git a/xen/arch/x86/mm/mem_event.c b/xen/arch/x86/mm/mem_event.c
index 9b45504..ea84bd8 100644
--- a/xen/arch/x86/mm/mem_event.c
+++ b/xen/arch/x86/mm/mem_event.c
@@ -121,6 +121,15 @@ static unsigned int mem_event_ring_available(struct mem_event_domain *med)
     return avail_req;
 }
 
+unsigned int mem_event_check_ring_availability(struct mem_event_domain *med)
+{
+    unsigned int avail_req;
+
+    mem_event_ring_lock(med);
+    avail_req = mem_event_ring_available(med);
+    mem_event_ring_unlock(med);
+    return avail_req;
+}
 /*
  * mem_event_wake_blocked() will wakeup vcpus waiting for room in the
  * ring. These vCPUs were paused on their way out after placing an event,
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index aee061c..a827ca1 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3066,20 +3065,16 @@ static int sh_page_fault(struct vcpu *v,
             }
 
             /*
-             * Do not police writes to guest memory emanating from the Xen
-             * kernel. Trying to do so will cause the same pagefault to occur
-             * over and over again with an event being sent to the access
-             * listener for each fault. If the access listener's vcpu is not
-             * scheduled during this time, the violation is never resolved and
-             * will eventually end with the host crashing.
+             * Xen is the source of the write violation and if it is allowed
+             * to go through when the ring is full, it will end up calling
+             * nested wait() and the host will crash. Instead of passing the
+             * violation to the listener, we let the pagefault be retried,
+             * indirectly giving the listener more time to handle events.
              */
             if ( (violation && access_w) &&
-                 (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) )
-            {
-                violation = 0;
-                rc = p2m->set_entry(p2m, gfn_x(gfn), gmfn, PAGE_ORDER_4K,
-                                    p2m_ram_rw, p2m_access_rw);
-            }
+                 (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) &&
+                 (!mem_event_check_ring_availability(&d->mem_event->access)) )
+                    violation = 0;
 
             if ( violation )
             {

The other approach I took was to detect whether the write violation was 
emanating from Xen when the ring was full, and to pass that fact down to 
wait_event(). If the context was Xen, then instead of calling wait(), 
wait_event() would call process_pending_softirqs(). This is how wait_event() 
would look with this change:

<In sh_page_fault()>

            if ( (violation && access_w) &&
                 (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) &&
                 (!mem_event_check_ring_availability(&d->mem_event->access)) )
                xen_ctxt = 1;

<In wait.h>
#define wait_event(wq, condition, xen_ctxt)     \
do {                                            \
    if ( condition )                            \
        break;                                  \
    for ( ; ; ) {                               \
        if ( !xen_ctxt )                        \
            prepare_to_wait(&wq);               \
        if ( condition )                        \
            break;                              \
        if ( !xen_ctxt )                        \
            wait();                             \
        else                                    \
            process_pending_softirqs();         \
    }                                           \
    if ( !xen_ctxt )                            \
        finish_wait(&wq);                       \
} while (0)
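
For completeness, the xen_ctxt flag would have to be plumbed down from 
sh_page_fault() through mem_event_claim_slot(); a rough sketch of what 
mem_event_wait_slot() might then look like (the signature change here is 
hypothetical, and every caller would need the extra parameter):

static int mem_event_wait_slot(struct mem_event_domain *med, int xen_ctxt)
{
    int rc = -EBUSY;

    /* When the faulting context is Xen itself, the modified macro spins
     * via process_pending_softirqs() instead of parking on the waitqueue. */
    wait_event(med->wq, mem_event_wait_try_grab(med, &rc) != -EBUSY, xen_ctxt);
    return rc;
}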

Is either of the above solutions viable?

Thanks,
Aravindh

