On Thu, Jul 17, 2014 at 2:51 PM, Aravindh Puthiyaparambil (aravindp) <aravindp@xxxxxxxxx> wrote:
>> +void mem_event_vcpu_unpause(struct vcpu *v) {
>> + if ( test_and_clear_bool(v->paused_for_mem_event) )
>
>And now that we consider more than one mem event piling up to pause a
>vcpu, this has to become an atomic counter, which unpauses on zero, and
>takes care of underflow.
Very true. I have seen this event pile-up occur in practice in our product.
The problem becomes how to tell apart real event responses, which should decrement the pause count, from spurious crap from the toolstack. IOW, how to not unpause the vcpu when the count reaches zero due to bad responses. I think the answer is: you can't. If the toolstack is evil, behavior is undefined and there are bigger fish to fry.
You really can't, but the important bit is to ensure that Xen is sufficiently insulated from buggy toolstack components that it doesn't fall over.
From my experimenting with the pausedomain refcounting, weird stuff happens when the domain pause count turns negative. I ended up with a domain which would never be scheduled again (even after returning the count to positive and back to 0), and a domain which couldn't be killed using `xl destroy`. Rebooting was the only option.
So long as Xen doesn't fall into these problems, a buggy toolstack (especially with mem_events) already has many ways to screw over a domain, so one more is not a problem.
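For illustration, here is a minimal sketch of the kind of counter being discussed: it unpauses only on the transition to zero and refuses to go negative. It assumes plain C11 atomics rather than Xen's own atomic helpers, and `demo_vcpu`, `demo_mem_event_pause` and `demo_mem_event_unpause` are hypothetical names, not anything in the patch. It does not tell real responses from spurious ones; it only keeps a bad response from driving the count below zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-vcpu state; not Xen's struct vcpu. */
struct demo_vcpu {
    atomic_int mem_event_pause_count; /* outstanding mem_event pauses */
};

/*
 * Record one more pending mem event.
 * Returns true on the 0 -> 1 transition, i.e. when the caller should
 * actually pause the vcpu.
 */
static bool demo_mem_event_pause(struct demo_vcpu *v)
{
    return atomic_fetch_add(&v->mem_event_pause_count, 1) == 0;
}

/*
 * Consume one response.
 * Returns true only on the 1 -> 0 transition, i.e. when the caller should
 * actually unpause the vcpu. A response arriving while the count is already
 * zero is ignored, so the count can never underflow.
 */
static bool demo_mem_event_unpause(struct demo_vcpu *v)
{
    int old = atomic_load(&v->mem_event_pause_count);

    do {
        if (old <= 0)
            return false; /* spurious response: nothing to decrement */
        /* On failure the compare-exchange reloads 'old' and we retry. */
    } while (!atomic_compare_exchange_weak(&v->mem_event_pause_count,
                                           &old, old - 1));

    return old == 1;
}
```

The compare-and-swap loop is what closes the underflow window: a plain decrement followed by a check would let two racing bad responses both take the count below zero.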
I misunderstood Andre's point. Your response made it clear what the concern was.
Thanks,
Aravindh