[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3 for-4.14] x86/monitor: revert default behavior when monitoring register write events
On Wed, Jun 3, 2020 at 2:28 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote: > > On Tue, Jun 02, 2020 at 07:49:09AM -0600, Tamas K Lengyel wrote: > > For the last couple years we have received numerous reports from users of > > monitor vm_events of spurious guest crashes when using events. In > > particular, > > it has observed that the problem occurs when vm_events are being disabled. > > The > > nature of the guest crash varied widely and has only occured occasionally. > > This > > made debugging the issue particularly hard. We had discussions about this > > issue > > even here on the xen-devel mailinglist with no luck figuring it out. > > > > The bug has now been identified as a race-condition between register event > > handling and disabling the monitor vm_event interface. > > > > Patch 96760e2fba100d694300a81baddb5740e0f8c0ee, "vm_event: deny register > > writes > > if refused by vm_event reply" is the patch that introduced the error. In > > this > > FWIW, we use the 'Fixes:' tag in order to make it easier for > maintainers of stable trees to know which bugfixes to pick. This > should have: > > Fixes: 96760e2fba10 ('vm_event: deny register writes if refused by vm_event > reply') > > Before the SoB. > > > patch the default behavior regarding emulation of register write events is > > changed so that they get postponed until the corresponding vm_event handler > > decides whether to allow such write to take place. Unfortunately this can > > only > > be implemented by performing the deny/allow step when the vCPU gets > > scheduled. > > Due to that postponed emulation of the event if the user decides to pause > > the > > VM in the vm_event handler and then disable events, the entire emulation > > step > > is skipped the next time the vCPU is resumed. Even if the user doesn't pause > > during the vm_event handling but exits immediately and disables vm_event, > > the > > situation becomes racey as disabling vm_event may succeed before the guest's > > vCPUs get scheduled with the pending emulation task. This has been > > particularly > > the case with VMS that have several vCPUs as after the VM is unpaused it may > > actually take a long time before all vCPUs get scheduled. > > > > In this patch we are reverting the default behavior to always perform > > emulation > > of register write events when the event occurs. To postpone them can be > > turned > > on as an option. In that case the user of the interface still has to take > > care > > of only disabling the interface when its safe as it remains buggy. > > > > Signed-off-by: Tamas K Lengyel <tamas@xxxxxxxxxxxxx> > > Thanks for taking care of this! > > > --- > > xen/arch/x86/hvm/hvm.c | 14 ++++++++------ > > xen/arch/x86/hvm/monitor.c | 13 ++++++++----- > > xen/arch/x86/monitor.c | 10 +++++++++- > > xen/include/asm-x86/domain.h | 1 + > > xen/include/asm-x86/hvm/monitor.h | 7 +++---- > > xen/include/public/domctl.h | 1 + > > 6 files changed, 30 insertions(+), 16 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c > > index 74c9f84462..5bb47583b3 100644 > > --- a/xen/arch/x86/hvm/hvm.c > > +++ b/xen/arch/x86/hvm/hvm.c > > @@ -3601,13 +3601,15 @@ int hvm_msr_write_intercept(unsigned int msr, > > uint64_t msr_content, > > > > ASSERT(v->arch.vm_event); > > > > - /* The actual write will occur in hvm_do_resume() (if permitted). > > */ > > - v->arch.vm_event->write_data.do_write.msr = 1; > > - v->arch.vm_event->write_data.msr = msr; > > - v->arch.vm_event->write_data.value = msr_content; > > + if ( hvm_monitor_msr(msr, msr_content, msr_old_content) ) > > + { > > + /* The actual write will occur in hvm_do_resume(), if > > permitted. */ > > + v->arch.vm_event->write_data.do_write.msr = 1; > > + v->arch.vm_event->write_data.msr = msr; > > + v->arch.vm_event->write_data.value = msr_content; > > > > - hvm_monitor_msr(msr, msr_content, msr_old_content); > > - return X86EMUL_OKAY; > > + return X86EMUL_OKAY; > > + } > > } > > > > if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE ) > > diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c > > index 8aa14137e2..36894b33a4 100644 > > --- a/xen/arch/x86/hvm/monitor.c > > +++ b/xen/arch/x86/hvm/monitor.c > > @@ -53,11 +53,11 @@ bool hvm_monitor_cr(unsigned int index, unsigned long > > value, unsigned long old) > > .u.write_ctrlreg.old_value = old > > }; > > > > - if ( monitor_traps(curr, sync, &req) >= 0 ) > > - return 1; > > + return monitor_traps(curr, sync, &req) >= 0 && > > + curr->domain->arch.monitor.control_register_values; > > Nit (it can be fixed while committing IMO): curr should be aligned to > monitor. This is the established style already in-place, see http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/hvm/monitor.c;h=8aa14137e25a47d3826843441e244decf81dc855;hb=refs/heads/staging#l76: 76 return curr->domain->arch.monitor.emul_unimplemented_enabled && 77 monitor_traps(curr, true, &req) == 1; I don't really care either way but at least we should be consistent. > > > } > > > > - return 0; > > + return false; > > } > > > > bool hvm_monitor_emul_unimplemented(void) > > @@ -77,7 +77,7 @@ bool hvm_monitor_emul_unimplemented(void) > > monitor_traps(curr, true, &req) == 1; > > } > > > > -void hvm_monitor_msr(unsigned int msr, uint64_t new_value, uint64_t > > old_value) > > +bool hvm_monitor_msr(unsigned int msr, uint64_t new_value, uint64_t > > old_value) > > { > > struct vcpu *curr = current; > > > > @@ -92,8 +92,11 @@ void hvm_monitor_msr(unsigned int msr, uint64_t > > new_value, uint64_t old_value) > > .u.mov_to_msr.old_value = old_value > > }; > > > > - monitor_traps(curr, 1, &req); > > + return monitor_traps(curr, 1, &req) >= 0 && > > + curr->domain->arch.monitor.control_register_values; > > Same here. > > > } > > + > > + return false; > > } > > > > void hvm_monitor_descriptor_access(uint64_t exit_info, > > diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c > > index bbcb7536c7..1517a97f50 100644 > > --- a/xen/arch/x86/monitor.c > > +++ b/xen/arch/x86/monitor.c > > @@ -144,7 +144,15 @@ int arch_monitor_domctl_event(struct domain *d, > > struct xen_domctl_monitor_op *mop) > > { > > struct arch_domain *ad = &d->arch; > > - bool requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op); > > + bool requested_status; > > + > > + if ( XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS == mop->op ) > > + { > > + ad->monitor.control_register_values = true; > > + return 0; > > I think this would be better implemented in arch_monitor_domctl_op > which already handles other XEN_DOMCTL_MONITOR_OP_* options, and also > skips the arch_monitor_domctl_event call? Hm, yea, that would be better placement for it, you are right. > > > + } > > + > > + requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op); > > > > switch ( mop->event ) > > { > > diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h > > index e8cee3d5e5..6fd94c2e14 100644 > > --- a/xen/include/asm-x86/domain.h > > +++ b/xen/include/asm-x86/domain.h > > @@ -418,6 +418,7 @@ struct arch_domain > > * This is used to filter out pagefaults. > > */ > > unsigned int inguest_pagefault_disabled > > : 1; > > + unsigned int control_register_values > > : 1; > > struct monitor_msr_bitmap *msr_bitmap; > > uint64_t write_ctrlreg_mask[4]; > > } monitor; > > diff --git a/xen/include/asm-x86/hvm/monitor.h > > b/xen/include/asm-x86/hvm/monitor.h > > index 66de24cb75..a75cd8545c 100644 > > --- a/xen/include/asm-x86/hvm/monitor.h > > +++ b/xen/include/asm-x86/hvm/monitor.h > > @@ -29,15 +29,14 @@ enum hvm_monitor_debug_type > > }; > > > > /* > > - * Called for current VCPU on crX/MSR changes by guest. > > - * The event might not fire if the client has subscribed to it in > > onchangeonly > > - * mode, hence the bool return type for control register write events. > > + * Called for current VCPU on crX/MSR changes by guest. Bool return signals > > + * whether emulation should be postponed. > > */ > > bool hvm_monitor_cr(unsigned int index, unsigned long value, > > unsigned long old); > > #define hvm_monitor_crX(cr, new, old) \ > > hvm_monitor_cr(VM_EVENT_X86_##cr, new, old) > > -void hvm_monitor_msr(unsigned int msr, uint64_t value, uint64_t old_value); > > +bool hvm_monitor_msr(unsigned int msr, uint64_t value, uint64_t old_value); > > void hvm_monitor_descriptor_access(uint64_t exit_info, > > uint64_t vmx_exit_qualification, > > uint8_t descriptor, bool is_write); > > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h > > index 1ad34c35eb..cbcd25f12c 100644 > > --- a/xen/include/public/domctl.h > > +++ b/xen/include/public/domctl.h > > @@ -1025,6 +1025,7 @@ struct xen_domctl_psr_cmt_op { > > #define XEN_DOMCTL_MONITOR_OP_DISABLE 1 > > #define XEN_DOMCTL_MONITOR_OP_GET_CAPABILITIES 2 > > #define XEN_DOMCTL_MONITOR_OP_EMULATE_EACH_REP 3 > > +#define XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS 4 > > Could you please add a note that this is broken? Sure. Thanks, Tamas
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |