Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events
On Mon, Jun 8, 2020 at 3:16 PM Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>
> On 6/8/20 11:44 PM, Tamas K Lengyel wrote:
> > On Mon, Jun 8, 2020 at 2:14 PM Razvan Cojocaru
> > <rcojocaru@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On 6/8/20 10:54 PM, Tamas K Lengyel wrote:
> >>> On Mon, Jun 8, 2020 at 12:58 PM Razvan Cojocaru
> >>>> And last but not least, the proper sequence is: 1. unsubscribe from
> >>>> register write events, 2. process all events "still in the chamber"
> >>>> (keep checking the ring buffer for a while), 3. detach from the guest
> >>>> (disable the vm_event subsystem). Not ideal perhaps (in that it's not
> >>>> guaranteed that a VCPU won't resume after a longer period than our
> >>>> timeout), but if the sequence is followed there should be no guest hangs
> >>>> or crashes (at least none that we or our clients have observed so far).
> >>>
> >>> Incorrect. That's not enough. You also have to wait for all the vCPUs
> >>> to get scheduled before disabling vm_event, or otherwise the emulation
> >>> is skipped entirely. Please read the patch message. If the user
> >>> decides to disable vm_event after getting a CR3 event delivered, the
> >>> CR3 never gets updated, and the guest crashes in unpredictable ways.
> >>> The same happens with all the other registers.
> >>
> >> I did read the patch message. As I've previously stated ("it's not
> >> guaranteed that a VCPU won't resume after a longer period than our
> >> timeout"), it's not ideal, and I've made no claim that the problem does
> >> not exist or that it shouldn't be fixed - but really, if you've got a
> >> VCPU that will wait more than a couple of seconds to get scheduled, then
> >> you've got a bigger problem with the VM.
> >
> > Sorry, I missed the part where you say you knew about this issue. We
> > didn't, and it was absolutely not documented anywhere and certainly not
> > mentioned in the original patch that added the feature. It literally
> > took us years to figure out what the hell is going on. While, as you
> > say, it can be a couple of seconds before it's safe to disable, it can
> > be a lot longer depending on the system state, how many VMs are running
> > and how many vCPUs are active on the VM. There are use-cases that
> > absolutely require keeping the VM paused after a CR3 event is received
> > and exiting vm_event afterwards. This bug totally blocks those
> > use-cases. Not to mention that it's a total mess having an interface
> > where the user has to guess when it's safe to do something. If this had
> > been pointed out when the patch was made, I would have objected to it
> > being merged.
>
> No, we didn't know about the issue. It's apparently not my most eloquent
> evening. I was merely saying that I did understand what the issue was
> from your description, and offering an explanation of why we've never
> seen this in QA or production. Of course the issue would have been
> mentioned at least (but ideally would not exist to begin with) had it
> been known.
>
> We do take several passes through the ring buffer (and as a side-effect
> reduce the probability of a race occurring to almost zero), but we do
> that to make sure we've cleaned up all EPT vm_events; the fact that it
> has helped with _this_ issue is a coincidence.
>
> IIRC Windows, at least, will be upset if a VCPU is stuck for too long.
>
> As for how the vm_event system behaves:
> 1. A security application that is unable to _prevent_ things being done
> to a system is not doing a very good job, since in that case you can
> only collect stats and not veto anything. I would argue that the
> default for such a monitoring application should be the current one
> (all events should be pre-action).

Not all security applications require this, though. Malware analysis
where stealth is required would absolutely not want this side-effect to
be visible to the guest, where malware could use it to determine that
it's being monitored. So I don't buy into this argument.

> 2. This is further supported by the fact that that's exactly how the
> EPT vm_events work: you get an "I want to write to this page" event
> _before_ the write occurs. If register writes behave differently, you
> have _two_ different models: one where you get an event post-action,
> and one where you get one pre-action.

Whether you get an event before or after its effects have been applied
to the system state shouldn't matter, as long as you can revert the
action. I wouldn't mind either way as the default. But having a default
that breaks other use-cases is unacceptable.

Tamas
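
For readers following the thread, below is a minimal sketch of the
"unsubscribe, drain, detach" sequence discussed above, loosely modeled
on tools/tests/xen-access/xen-access.c. The drain_ring() and
safe_detach() helpers, the settle loop, and its delay are illustrative
assumptions rather than Xen APIs, and exact libxc signatures vary
between releases:

    /*
     * Sketch of the teardown order under discussion. drain_ring(),
     * safe_detach() and the settle loop are illustrative, not Xen APIs.
     */
    #include <string.h>
    #include <unistd.h>
    #include <xenctrl.h>
    #include <xen/vm_event.h>

    /* Pull any requests "still in the chamber" and acknowledge them. */
    static void drain_ring(vm_event_back_ring_t *ring)
    {
        vm_event_request_t req;
        vm_event_response_t rsp;

        while ( RING_HAS_UNCONSUMED_REQUESTS(ring) )
        {
            RING_COPY_REQUEST(ring, ring->req_cons, &req);
            ring->req_cons++;
            ring->sring->req_event = ring->req_cons + 1;

            memset(&rsp, 0, sizeof(rsp));
            rsp.version = VM_EVENT_INTERFACE_VERSION;
            rsp.vcpu_id = req.vcpu_id;
            rsp.reason = req.reason;
            /* Keep the pause/unpause pairing intact. */
            rsp.flags = req.flags & VM_EVENT_FLAG_VCPU_PAUSED;

            memcpy(RING_GET_RESPONSE(ring, ring->rsp_prod_pvt), &rsp,
                   sizeof(rsp));
            ring->rsp_prod_pvt++;
            RING_PUSH_RESPONSES(ring);
            /* A real client also notifies the event channel here
             * (xenevtchn_notify()) so paused vCPUs wake up. */
        }
    }

    static void safe_detach(xc_interface *xch, uint32_t domid,
                            vm_event_back_ring_t *ring)
    {
        /* 1. Unsubscribe, so no new CR3 write events are generated.
         * (The bitmask argument only exists in newer Xen releases.) */
        xc_monitor_write_ctrlreg(xch, domid, VM_EVENT_X86_CR3,
                                 false, false, 0, false);

        /* 2. Drain the ring, then give every vCPU a chance to be
         * scheduled and consume its pending response. The fixed delay
         * is exactly the weak point this thread is about: nothing
         * guarantees a vCPU runs within it. */
        for ( int i = 0; i < 10; i++ )
        {
            drain_ring(ring);
            usleep(100 * 1000); /* illustrative settle delay */
        }

        /* 3. Only now tear the vm_event subsystem down. */
        xc_monitor_disable(xch, domid);
    }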
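
And a minimal sketch of the pre-action model described in point 2, in
which a response can veto a control-register write with
VM_EVENT_FLAG_DENY before it takes effect. Field names follow
xen/vm_event.h; write_is_allowed() is a made-up policy placeholder:

    /*
     * Sketch of a pre-action handler: the write_ctrlreg event arrives
     * before the register is updated, so the response can veto it.
     */
    #include <stdbool.h>
    #include <string.h>
    #include <xenctrl.h>
    #include <xen/vm_event.h>

    /* Hypothetical policy hook -- not a Xen API. */
    static bool write_is_allowed(const vm_event_request_t *req)
    {
        /* e.g. compare req->u.write_ctrlreg.new_value against a list
         * of known-good values; always allow for this sketch. */
        return true;
    }

    static void handle_ctrlreg_write(const vm_event_request_t *req,
                                     vm_event_response_t *rsp)
    {
        memset(rsp, 0, sizeof(*rsp));
        rsp->version = VM_EVENT_INTERFACE_VERSION;
        rsp->vcpu_id = req->vcpu_id;
        rsp->reason = req->reason;
        /* Keep the pause/unpause pairing intact. */
        rsp->flags = req->flags & VM_EVENT_FLAG_VCPU_PAUSED;

        if ( req->reason == VM_EVENT_REASON_WRITE_CTRLREG &&
             !write_is_allowed(req) )
            rsp->flags |= VM_EVENT_FLAG_DENY; /* veto before it applies */
    }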