
Re: [Xen-devel] Issue policing writes from Xen to PV domain memory

>>> On 09.05.14 at 04:42, <aravindp@xxxxxxxxx> wrote:
> The above sequence of events would not cause an EPT violation (I realize this 
> can happen only in the guest context) or pagefault (The page is present and 
> marked writable in the guest). All that would happen is an event to be sent 
> to the listener from __hvm_copy() and no cascading faults will occur.

I have to admit that I'm unable to spot where such an event gets sent.

> The one thing I am not able to figure out is why the listener, i.e. Dom0's
> VCPU, doesn't get to run and process the events in the window between access
> enable and the process events loop. I am not familiar enough with the Xen
> scheduler to know how it would react to cascading pagefaults occurring for a
> guest area in the Xen context, as is happening above. I have even tried
> pinning Dom0 and the guest on different CPUs but this still occurs. I would
> be grateful if you could provide some insight here.

I'm in no way convinced that this is scheduler related at all, which your
pinning experiment would appear to confirm.

What is quite clear is that you don't want multiple nested runstate
area updates to occur in the first place, i.e. you need to properly
deal with the _first_ page fault occurring instead of waiting until
multiple such events pile up. And I'm afraid doing so will require some
(perhaps gross) hackery...

> Looking at the solution for the ring being full in the PV case, whether or
> not we are policing Xen writes: calling wait() will not work due to the
> scenario I had mentioned a while back, which is shown above in the stack
> trace. I am repeating that flow here:
> mem_event_claim_slot() -> 
>       mem_event_wait_slot() ->
>                wait_event(mem_event_wait_try_grab(med, &rc) != -EBUSY)
> wait_event() macro looks like this:
> do { 
>     if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) 
>         break; 
>     for ( ; ; ) { 
>         prepare_to_wait(&med->wq); 
>         if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) 
>             break; 
>         wait(); 
>     } 
>     finish_wait(&med->wq); 
> } while (0)
> In the case where the ring is full, wait() gets called and the cpu gets 
> scheduled away. But since it is in middle of a pagefault, when it runs again 
> it ends up in handle_exception_saved and the same pagefault is tried again. 
> But since finish_wait() never ends up being called, wqv->esp never becomes 0
> and hence the assert fires on the next go-around. So I think we should be
> calling process_pending_softirqs() instead of wait() for PV domains.

That would effectively be a spin wait then, which is surely not the right
thing. But I don't follow your explanation above anyway - when coming
back from wait(), the state is the same as the original one, so the page
fault handling continues, it's not being retried.
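
To make the spin-wait concern concrete, here is a minimal userspace sketch in C. It is not Xen code: toy_ring, toy_try_grab(), toy_consumer, toy_poll_hook() and toy_claim_slot_spinning() are all hypothetical names made up for illustration, and the consumer is simply assumed to free one slot every `period` polls (which must be non-zero). It only shows that a retry loop built around a "do some pending work" call keeps the CPU busy for the whole time the ring stays full:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a mem_event ring: it only tracks slot counts. */
struct toy_ring {
    unsigned int capacity;   /* total number of slots */
    unsigned int used;       /* slots currently claimed */
};

/* Claim a slot: returns 0 on success, -EBUSY if the ring is full. */
static int toy_try_grab(struct toy_ring *r)
{
    if (r->used >= r->capacity)
        return -EBUSY;
    r->used++;
    return 0;
}

/* Hypothetical consumer model: frees one slot every `period` polls. */
struct toy_consumer {
    struct toy_ring *ring;
    unsigned int period;     /* assumed non-zero */
    unsigned int polls;      /* how often the hook has been called */
};

/* Stand-in for the work done between retries (an assumption, not a Xen API). */
static void toy_poll_hook(struct toy_consumer *c)
{
    c->polls++;
    if (c->polls % c->period == 0 && c->ring->used > 0)
        c->ring->used--;
}

/*
 * Spin-style claim: retry until a slot is free.
 * Returns how many loop iterations were burned while the ring was full.
 */
static unsigned long toy_claim_slot_spinning(struct toy_ring *r,
                                             struct toy_consumer *c)
{
    unsigned long spins = 0;

    while (toy_try_grab(r) == -EBUSY) {
        toy_poll_hook(c);
        spins++;
    }
    return spins;
}
```

With a full two-slot ring and a consumer that frees a slot only on every 1000th poll, toy_claim_slot_spinning() goes round the loop 1000 times before it gets its slot. A blocking wait would instead give the CPU up for that period and continue from the same point once woken.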

