
Re: [Xen-devel] [PATCH 1/2] vm_event: sync domctl



On 23/12/2015 21:06, Tamas K Lengyel wrote:


On Wed, Dec 23, 2015 at 9:55 PM, Tamas K Lengyel <tamas@xxxxxxxxxxxxx> wrote:



On Wed, Dec 23, 2015 at 8:14 PM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
On 23/12/2015 18:11, Tamas K Lengyel wrote:


On Wed, Dec 23, 2015 at 6:17 PM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
On 23/12/2015 15:41, Razvan Cojocaru wrote:
> On 12/23/2015 04:53 PM, Tamas K Lengyel wrote:
>> Introduce new vm_event domctl option which allows an event subscriber
>> to request all vCPUs not currently pending a vm_event request to be paused,
>> thus allowing the subscriber to sync up on the state of the domain. This
>> is especially useful when the subscriber wants to disable certain events
>> from being delivered and wants to ensure no more requests are pending on the
>> ring before doing so.
>>
>> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
>> Cc: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
>> Cc: Ian Campbell <ian.campbell@xxxxxxxxxx>
>> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
>> Cc: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
>> Signed-off-by: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
> This certainly looks very interesting. Would xc_domain_pause() not be
> enough for your use case then?

I second this query. I would have thought xc_domain_pause() does
exactly what you want in this case.

The problem is the order in which the responses are processed. I may not be correct about the logic, but here is my impression: xc_domain_unpause resumes all vCPUs even if there is still a vm_event response that has not been processed. Now, if the subscriber set response flags (altp2m switch, singlestep toggle, etc.), those actions would not be properly performed on the vCPU before it is resumed. If the subscriber processes all requests, signals via the event channel that the responses are on the ring, and then calls xc_domain_unpause, we can still have a race between processing the responses from the ring and unpausing the vCPU.

The code provided is racy, as it is liable to alter which pause references it takes/releases depending on what other pause/unpause actions are being made.

It's understood that the user would not use xc_domain_pause/unpause while using vm_event responses with response flags specified. Even then, it was already racy IMHO if the user called xc_domain_unpause before processing the requests on the vm_event ring that originally paused the vCPU, so this doesn't change that situation.
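To make the ordering I described above concrete, here is a minimal sketch of the subscriber-side flow (the function names are made up for illustration, and process_requests() is a hypothetical stand-in for the subscriber's own ring handling, not a real libxc call):

    /*
     * Minimal sketch only, not a complete client: process_requests() is a
     * hypothetical placeholder for the subscriber's own ring handling
     * (pop requests, queue responses on the ring).
     */
    #include <xenctrl.h>

    extern void process_requests(void);   /* hypothetical helper */

    static void sync_and_resume(xc_interface *xch, xc_evtchn *xce,
                                evtchn_port_t port, uint32_t domid)
    {
        process_requests();               /* responses now sit on the ring    */
        xc_evtchn_notify(xce, port);      /* tell Xen the responses are ready */

        /*
         * Nothing here orders the unpause against Xen consuming the
         * responses above, so a vCPU might be rescheduled before its
         * response flags (altp2m switch, singlestep toggle, ...) have
         * taken effect -- that was my concern.
         */
        xc_domain_unpause(xch, domid);
    }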

Pausing is strictly reference counted (or rather, it has been since c/s 3eb1c70 "properly reference count DOMCTL_{,un}pausedomain hypercalls"; before then, it definitely was buggy).

There is the domain pause count, and pause counts per vcpu. All domain pause operations take both a domain pause reference, and a vcpu pause reference on each vcpu. A vcpu is only eligible to be scheduled if its pause reference count is zero. If two independent tasks call vcpu_pause() on the same vcpu, it will remain paused until both independent tasks have called vcpu_unpause().
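In simplified terms the model is roughly the following (illustrative only, using Xen-internal primitives; not the actual source in xen/common/domain.c):

    /* Illustrative model only, not the actual Xen source. */
    void vcpu_pause(struct vcpu *v)
    {
        atomic_inc(&v->pause_count);      /* take a reference               */
        vcpu_sleep_sync(v);               /* and wait for it to deschedule  */
    }

    void vcpu_unpause(struct vcpu *v)
    {
        if ( atomic_dec_and_test(&v->pause_count) )   /* last reference?    */
            vcpu_wake(v);                 /* only now may it run again      */
    }

    void domain_pause(struct domain *d)
    {
        struct vcpu *v;

        atomic_inc(&d->pause_count);      /* one domain-wide reference...   */
        for_each_vcpu ( d, v )
            vcpu_pause(v);                /* ...plus one per vcpu           */
    }

So the reference vm_event takes when it pauses a vcpu is independent of any reference the toolstack takes via xc_domain_pause(); the vcpu stays descheduled until every holder has dropped its reference.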

Actually, I've double-checked and v->pause_count and v->vm_event_pause_count are both increased for a vm_event request. So you are right, the reference counting will make sure that v->pause_count > 0 until we process the vm_event response and call xc_domain_unpause. I was under the impression that wasn't the case. We can ignore this patch.
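For reference, that pause path looks roughly like this (simplified from the vm_event code; the real unpause side also guards against underflow of the count):

    /* Simplified -- details of the real code elided. */
    void vm_event_vcpu_pause(struct vcpu *v)
    {
        atomic_inc(&v->vm_event_pause_count);   /* vm_event's own count      */
        vcpu_pause_nosync(v);                   /* also bumps v->pause_count */
    }

    void vm_event_vcpu_unpause(struct vcpu *v)
    {
        atomic_dec(&v->vm_event_pause_count);
        vcpu_unpause(v);                        /* drops the scheduler ref   */
    }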

Thanks and sorry for the noise ;)

Not a problem at all. This is complicated stuff, and IMO it was just as likely that there was a real bug lurking.

~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

