
Re: [Xen-devel] Issue policing writes from Xen to PV domain memory

>>>>>>> On adding some debugging, I discovered that it happens after
>>>>>>> mem_access is enabled but xen-access has not started handling
>>>>>>> events. After comparing the stack trace and the gla in question,
>>>>>>> there are multiple write faults to the runstate_guest(v) area, each
>>>>>>> causing an event to be sent to xen-access. Since the listener is
>>>>>>> not handling events yet, the fault continues to occur. I am not
>>>>>>> sure why the listener does not get a chance to run. I also do not
>>>>>>> follow why there are multiple faults, as the vcpu should have been
>>>>>>> paused when the first event was sent to xen-access and only be
>>>>>>> resumed after the violation has been resolved, when the listener
>>>>>>> calls xc_access_resume(), which ends up unpausing the vcpu. Or is
>>>>>>> this occurring because runstate_guest(v).p is being accessed from
>>>>>>> Xen?
>>>>>> The runstate changes (and hence needs to get written) as a side
>>>>>> effect of pausing the guest (as can be seen from the stack trace).
>>>>>> The first question that needs clarification (for me at least, since
>>>>>> I don't know much about the access stuff for HVM) is how the same
>>>>>> situation gets handled there: Do Xen writes to HVM guest memory get
>>>>>> intercepted? Other than for PV, they're not going through the same
>>>>>> page tables, so special precautions would be needed to filter them.
>>>>>> Quite obviously (I think) if they're not being filtered for HVM,
>>>>>> then they shouldn't be for PV.
>>>>> AFAIK, they are not being filtered for HVM. I brought up a HVM
>>>>> domain and printed out the runstate_guest area value when it is
>>>>> registered. I then ran the xen-access test program to monitor for
>>>>> writes. Interestingly, I never saw a GLA that matched the
>>>>> runstate_guest area. This is not the case for a PV domain, as it is
>>>>> one of the first violations the xen-access test program sees.
>>>>> Is this an EPT vs regular page table difference, as in one case the
>>>>> page tables are shared?
>>>> Yes, as said in my earlier reply. And when hypervisor writes aren't
>>>> subject to filtering in HVM, you probably want/need to make things
>>>> behave that way for PV too (albeit it may not be trivial, as you
>>>> clearly don't want to intercept hypervisor accesses).
>>> The main issue I am seeing is the same violation being sent multiple
>>> times to the mem_event queue and the listener not getting scheduled to
>>> run. I would think that the same issue would happen with a HVM guest,
>>> but in that case I don't even see a violation of the runstate_guest
>>> area. I guess I should try figuring out why wqv->esp becomes 0 instead
>>> of trying to stop multiple events for the same violation being sent.
>>> If you have a suggestion on the area I should be looking at, it would
>>> be welcome.
>> I meant to say "why wqv->esp becomes non-zero" in the message above.
> That's pretty obvious - this is a per-vCPU field, so there can't be two
> nested attempts to wait, yet the call stack you got makes pretty clear
> that this is what is happening (and what is expected to be happening -
> the vCPU gets a wakeup, needs to write the changed runstate, faults,
> wants the mem-event to be delivered, hence needs to be put to sleep,
> which in turn requires another change to the runstate).

It looks like the nested attempts to wait() happen only when the ring is full.
The flow is:
mem_event_claim_slot() -> 
        mem_event_wait_slot() ->
                 wait_event(mem_event_wait_try_grab(med, &rc) != -EBUSY)

wait_event() macro looks like this:
do {                                                          \
    if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )        \
        break;                                                \
    for ( ; ; ) {                                             \
        prepare_to_wait(&med->wq);                            \
        if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )    \
            break;                                            \
        wait();                                               \
    }                                                         \
    finish_wait(&med->wq);                                    \
} while (0)

In the case where the ring is full, wait() gets called and the vCPU gets
scheduled away. But since it is in the middle of a page fault, when it runs
again it ends up in handle_exception_saved and the same page fault is tried
again. But since finish_wait() never ends up being called, wqv->esp never
becomes 0, and hence the assert fires on the next go-around. Am I on the
right track?


Xen-devel mailing list


