[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Issue policing writes from Xen to PV domain memory



>> The above sequence of events would not cause an EPT violation (I
>> realize this can happen only in the guest context) or pagefault (The
>> page is present and marked writable in the guest). All that would
>> happen is an event to be sent to the listener from __hvm_copy() and no
>cascading faults will occur.
>
>I have to admit that I'm unable to spot where such an event gets sent.

I had preceded this by mentioning "Now take the similar scenario in the 
hypothetical HVM + EPT case where we are policing Xen writes to guest memory." 
The code would look something like this if it was implemented. Please note this 
was just a quick hacky write up :-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index da220bf..8ce99d5 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2750,6 +2750,65 @@ static enum hvm_copy_result __hvm_copy(
             return HVMCOPY_unhandleable;
         }
 
+        if ( unlikely(mem_event_check_ring(&curr->domain->mem_event->access)) )
+        {
+            struct p2m_domain *p2m = p2m_get_hostp2m(curr->domain);
+            mfn_t gmfn;
+            p2m_type_t _p2mt;
+            p2m_access_t p2ma;
+            bool_t violation = 0;
+            mem_event_request_t *req_ptr = NULL;
+
+            gmfn = p2m->get_entry(p2m, gfn, &_p2mt, &p2ma, 0, NULL);;
+
+            /* If the access is against the permissions, then send to 
mem_event */
+            switch (p2ma)
+            {
+            case p2m_access_n:
+            case p2m_access_n2rwx:
+            case p2m_access_x:
+                violation = 1;
+                break;
+
+            case p2m_access_w:
+            case p2m_access_wx:
+                if ( flags & HVMCOPY_from_guest )
+                    violation = 1;
+                break;
+
+            case p2m_access_r:
+            case p2m_access_rx:
+            case p2m_access_rx2rw:
+                if ( flags & HVMCOPY_to_guest )
+                    violation = 1;
+                break;
+
+            case p2m_access_rw:
+            case p2m_access_rwx:
+                break;
+            }
+
+            if ( violation )
+            {
+                paddr_t gpa = (gfn << PAGE_SHIFT) +
+                              (flags & HVMCOPY_virt ?
+                               (addr & ((1 << PAGE_SHIFT) - 1)) : 0);
+
+                if ( !p2m_mem_access_check(gpa, (flags & HVMCOPY_virt ? 1 : 0),
+                                           addr, 0, 1, 0, &req_ptr) )
+                {
+                    if ( unlikely(req_ptr != NULL) )
+                    {
+                        mem_access_send_req(curr->domain, req_ptr);
+                        xfree(req_ptr);
+                    }
+                    put_page(page);
+                    return HVMCOPY_unhandleable;
+                }
+            }
+        }
+
         p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
 
         if ( flags & HVMCOPY_to_guest )

>What is quite clear is that you don't want multiple nested runstate area
>updates to occur in the first place, i.e. you need to properly deal with the
>_first_ page fault occurring instead of waiting until multiple such events pile
>up. And I'm afraid doing so will require some (perhaps gross) hackery...

OK, I will take a stab at this. 

>> Looking at what the solution for the ring being full in the PV case
>> whether we are policing Xen writes or not, calling wait() will not
>> work due to the scenario I had mentioned a while back and is shown above
>in the stack trace.
>> I am repeating that flow here
>> mem_event_claim_slot() ->
>>      mem_event_wait_slot() ->
>>               wait_event(mem_event_wait_try_grab(med, &rc) != -
>EBUSY)
>>
>> wait_event() macro looks like this:
>> do {
>>     if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )
>>         break;
>>     for ( ; ; ) {
>>         prepare_to_wait(&med->wq);
>>         if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )
>>             break;
>>         wait();
>>     }
>>     finish_wait(&med->wq);
>> } while (0)
>>
>> In the case where the ring is full, wait() gets called and the cpu
>> gets scheduled away. But since it is in middle of a pagefault, when it
>> runs again it ends up in handle_exception_saved and the same pagefault is
>tried again.
>> But since finish_wait() never ends up being called wqv->esp never
>> becomes 0 and hence the assert fires on the next go around. So I think
>> we should be calling
>> process_pending_softirqs() instead of wait() for PV domains.
>
>That would effectively be a spin wait then, which is surely not the right 
>thing.
>But I don't follow your explanation above anyway - when coming back from
>wait(), the state is the same as the original one, so the page fault handling
>continues, it's not being retried.

Let me step through this. The pagefault for runstate occurs and wait_event() 
gets called and the ring is full. 

wait_event():
do { 
    if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )       ==> This will 
return EBUSY
        break; 
    for ( ; ; ) { 
        prepare_to_wait(&med->wq); 
        if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )   ==> This will 
return EBUSY
            break; 
        wait();                                                                 
                                  ==> Write to runstate area occurs...
    } 
    finish_wait(&med->wq);
} while (0)

...now this write to runstate will again cause a pagefault and we will end up 
in wait_event() again. The previous attempt would have called prepare_to_wait() 
but finish_wait() was not called. finish_wait()->__finish_wait() is where 
wqv->esp gets reset to NULL. So now:

wait_event():
do { 
    if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )       ==> This will 
return EBUSY
        break; 
    for ( ; ; ) { 
        prepare_to_wait(&med->wq);                                              
   ==> Assertion 'wqv->esp == 0' fails
        if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) 
            break; 
        wait();
    } 
    finish_wait(&med->wq);
} while (0)

Does this make sense?

Thanks,
Aravindh


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.