[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 5/8] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup



>>> On 04.07.16 at 16:21, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
> On 07/04/16 17:17, Jan Beulich wrote:
>>>>> On 04.07.16 at 15:50, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> On 07/04/16 16:11, Jan Beulich wrote:
>>>>>>> On 04.07.16 at 15:03, <czuzu@xxxxxxxxxxxxxxx> wrote:
>>>>> On 7/4/2016 3:47 PM, Jan Beulich wrote:
>>>>>>>>> On 30.06.16 at 20:45, <czuzu@xxxxxxxxxxxxxxx> wrote:
>>>>>>> The arch_vm_event structure is dynamically allocated and freed @
>>>>>>> vm_event_cleanup_domain. This cleanup is triggered e.g. when the 
>>>>>>> toolstack 
>>> user
>>>>>>> disables domain monitoring (xc_monitor_disable), which in turn 
>>>>>>> effectively
>>>>>>> discards any information that was in arch_vm_event.write_data.
>>>>>> Isn't that rather a toolstack user bug, not warranting a relatively
>>>>>> extensive (even if mostly mechanical) hypervisor change like this
>>>>>> one? Sane monitor behavior, after all, is required anyway for the
>>>>>> monitored guest to survive.
>>>>>
>>>>> Sorry but could you please rephrase this, I don't quite understand what 
>>>>> you're saying.
>>>>> The write_data field in arch_vm_event should _not ever_ be invalidated 
>>>>> as a direct result of a toolstack user's action.
>>>>
>>>> The monitoring app can cause all kinds of problems to the guest it
>>>> monitors. Why would this specific one need taking care of in the
>>>> hypervisor, instead of demanding that the app not disable monitoring
>>>> at the wrong time?
>>>
>>> I'm not sure there's a right time here. The problem is that, if I
>>> understand this correctly, a race is possible between the moment the
>>> userspace application responds to the vm_event _and_ call
>>> xc_monitor_disable() and the time hvm_do_resume() gets called.
>> 
>> It's that _and_ in your reply that I put under question, but I admit
>> that questioning may be due to my limited (or should I say non-
>> existent) knowledge on the user space parts here: Why would the
>> app be _required_ to "responds to the vm_event _and_ call
>> xc_monitor_disable()", rather than delaying the latter for long
>> enough?
> 
> It's not required to do that, it's just that we don't really know what
> "long enough means". I suppose a second should do be more than enough,
> but obviously we'd prefer a fool-proof solution with better guarantees
> and no lag - I thought that the hypervisor change is trivial enough to
> not make the tradeoff into much of an issue.

It's mostly mechanical indeed, but iirc it grows struct vcpu (or was
it struct domain?), which is never really desirable (but admittedly
often unavoidable). I'm therefore simply trying to understand what
alternatives there are.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.