[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 for-4.14] x86/monitor: revert default behavior when monitoring register write events
On Tue, Jun 02, 2020 at 07:49:09AM -0600, Tamas K Lengyel wrote: > For the last couple years we have received numerous reports from users of > monitor vm_events of spurious guest crashes when using events. In particular, > it has observed that the problem occurs when vm_events are being disabled. The > nature of the guest crash varied widely and has only occured occasionally. > This > made debugging the issue particularly hard. We had discussions about this > issue > even here on the xen-devel mailinglist with no luck figuring it out. > > The bug has now been identified as a race-condition between register event > handling and disabling the monitor vm_event interface. > > Patch 96760e2fba100d694300a81baddb5740e0f8c0ee, "vm_event: deny register > writes > if refused by vm_event reply" is the patch that introduced the error. In this FWIW, we use the 'Fixes:' tag in order to make it easier for maintainers of stable trees to know which bugfixes to pick. This should have: Fixes: 96760e2fba10 ('vm_event: deny register writes if refused by vm_event reply') Before the SoB. > patch the default behavior regarding emulation of register write events is > changed so that they get postponed until the corresponding vm_event handler > decides whether to allow such write to take place. Unfortunately this can only > be implemented by performing the deny/allow step when the vCPU gets scheduled. > Due to that postponed emulation of the event if the user decides to pause the > VM in the vm_event handler and then disable events, the entire emulation step > is skipped the next time the vCPU is resumed. Even if the user doesn't pause > during the vm_event handling but exits immediately and disables vm_event, the > situation becomes racey as disabling vm_event may succeed before the guest's > vCPUs get scheduled with the pending emulation task. This has been > particularly > the case with VMS that have several vCPUs as after the VM is unpaused it may > actually take a long time before all vCPUs get scheduled. > > In this patch we are reverting the default behavior to always perform > emulation > of register write events when the event occurs. To postpone them can be turned > on as an option. In that case the user of the interface still has to take care > of only disabling the interface when its safe as it remains buggy. > > Signed-off-by: Tamas K Lengyel <tamas@xxxxxxxxxxxxx> Thanks for taking care of this! > --- > xen/arch/x86/hvm/hvm.c | 14 ++++++++------ > xen/arch/x86/hvm/monitor.c | 13 ++++++++----- > xen/arch/x86/monitor.c | 10 +++++++++- > xen/include/asm-x86/domain.h | 1 + > xen/include/asm-x86/hvm/monitor.h | 7 +++---- > xen/include/public/domctl.h | 1 + > 6 files changed, 30 insertions(+), 16 deletions(-) > > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c > index 74c9f84462..5bb47583b3 100644 > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -3601,13 +3601,15 @@ int hvm_msr_write_intercept(unsigned int msr, > uint64_t msr_content, > > ASSERT(v->arch.vm_event); > > - /* The actual write will occur in hvm_do_resume() (if permitted). */ > - v->arch.vm_event->write_data.do_write.msr = 1; > - v->arch.vm_event->write_data.msr = msr; > - v->arch.vm_event->write_data.value = msr_content; > + if ( hvm_monitor_msr(msr, msr_content, msr_old_content) ) > + { > + /* The actual write will occur in hvm_do_resume(), if permitted. > */ > + v->arch.vm_event->write_data.do_write.msr = 1; > + v->arch.vm_event->write_data.msr = msr; > + v->arch.vm_event->write_data.value = msr_content; > > - hvm_monitor_msr(msr, msr_content, msr_old_content); > - return X86EMUL_OKAY; > + return X86EMUL_OKAY; > + } > } > > if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE ) > diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c > index 8aa14137e2..36894b33a4 100644 > --- a/xen/arch/x86/hvm/monitor.c > +++ b/xen/arch/x86/hvm/monitor.c > @@ -53,11 +53,11 @@ bool hvm_monitor_cr(unsigned int index, unsigned long > value, unsigned long old) > .u.write_ctrlreg.old_value = old > }; > > - if ( monitor_traps(curr, sync, &req) >= 0 ) > - return 1; > + return monitor_traps(curr, sync, &req) >= 0 && > + curr->domain->arch.monitor.control_register_values; Nit (it can be fixed while committing IMO): curr should be aligned to monitor. > } > > - return 0; > + return false; > } > > bool hvm_monitor_emul_unimplemented(void) > @@ -77,7 +77,7 @@ bool hvm_monitor_emul_unimplemented(void) > monitor_traps(curr, true, &req) == 1; > } > > -void hvm_monitor_msr(unsigned int msr, uint64_t new_value, uint64_t > old_value) > +bool hvm_monitor_msr(unsigned int msr, uint64_t new_value, uint64_t > old_value) > { > struct vcpu *curr = current; > > @@ -92,8 +92,11 @@ void hvm_monitor_msr(unsigned int msr, uint64_t new_value, > uint64_t old_value) > .u.mov_to_msr.old_value = old_value > }; > > - monitor_traps(curr, 1, &req); > + return monitor_traps(curr, 1, &req) >= 0 && > + curr->domain->arch.monitor.control_register_values; Same here. > } > + > + return false; > } > > void hvm_monitor_descriptor_access(uint64_t exit_info, > diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c > index bbcb7536c7..1517a97f50 100644 > --- a/xen/arch/x86/monitor.c > +++ b/xen/arch/x86/monitor.c > @@ -144,7 +144,15 @@ int arch_monitor_domctl_event(struct domain *d, > struct xen_domctl_monitor_op *mop) > { > struct arch_domain *ad = &d->arch; > - bool requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op); > + bool requested_status; > + > + if ( XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS == mop->op ) > + { > + ad->monitor.control_register_values = true; > + return 0; I think this would be better implemented in arch_monitor_domctl_op which already handles other XEN_DOMCTL_MONITOR_OP_* options, and also skips the arch_monitor_domctl_event call? > + } > + > + requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op); > > switch ( mop->event ) > { > diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h > index e8cee3d5e5..6fd94c2e14 100644 > --- a/xen/include/asm-x86/domain.h > +++ b/xen/include/asm-x86/domain.h > @@ -418,6 +418,7 @@ struct arch_domain > * This is used to filter out pagefaults. > */ > unsigned int inguest_pagefault_disabled : > 1; > + unsigned int control_register_values : > 1; > struct monitor_msr_bitmap *msr_bitmap; > uint64_t write_ctrlreg_mask[4]; > } monitor; > diff --git a/xen/include/asm-x86/hvm/monitor.h > b/xen/include/asm-x86/hvm/monitor.h > index 66de24cb75..a75cd8545c 100644 > --- a/xen/include/asm-x86/hvm/monitor.h > +++ b/xen/include/asm-x86/hvm/monitor.h > @@ -29,15 +29,14 @@ enum hvm_monitor_debug_type > }; > > /* > - * Called for current VCPU on crX/MSR changes by guest. > - * The event might not fire if the client has subscribed to it in > onchangeonly > - * mode, hence the bool return type for control register write events. > + * Called for current VCPU on crX/MSR changes by guest. Bool return signals > + * whether emulation should be postponed. > */ > bool hvm_monitor_cr(unsigned int index, unsigned long value, > unsigned long old); > #define hvm_monitor_crX(cr, new, old) \ > hvm_monitor_cr(VM_EVENT_X86_##cr, new, old) > -void hvm_monitor_msr(unsigned int msr, uint64_t value, uint64_t old_value); > +bool hvm_monitor_msr(unsigned int msr, uint64_t value, uint64_t old_value); > void hvm_monitor_descriptor_access(uint64_t exit_info, > uint64_t vmx_exit_qualification, > uint8_t descriptor, bool is_write); > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h > index 1ad34c35eb..cbcd25f12c 100644 > --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -1025,6 +1025,7 @@ struct xen_domctl_psr_cmt_op { > #define XEN_DOMCTL_MONITOR_OP_DISABLE 1 > #define XEN_DOMCTL_MONITOR_OP_GET_CAPABILITIES 2 > #define XEN_DOMCTL_MONITOR_OP_EMULATE_EACH_REP 3 > +#define XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS 4 Could you please add a note that this is broken? Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |