Re: [Xen-devel] XenGT is still regressed on master
On 08/03/2019 11:55, Jan Beulich wrote:
> If so, my first reaction is to blame the kernel module:
> Machine state (of the VM) may not change while processing
> a write, other than to carry out the _direct_ effects of the write. I
> don't think a p2m type change is supposed to be occurring as a side
> effect.
I disagree. Anything may change while an IOREQ is being processed, and
it's the hypervisor that shouldn't make any assumptions about the
machine state before and after the request was sent.
>> The bug could be mitigated by the following patch, but since it's you
>> who introduced this helper, you might have better ideas on how to
>> avoid the problem in a clean way here.
>>
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -1139,13 +1139,11 @@ static int linear_write(unsigned long addr, unsigned int bytes, void *p_data,
>>      {
>>          unsigned int offset, part1;
>>  
>> -    case HVMTRANS_okay:
>> -        return X86EMUL_OKAY;
>> -
>>      case HVMTRANS_bad_linear_to_gfn:
>>          x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>>          return X86EMUL_EXCEPTION;
>>  
>> +    case HVMTRANS_okay:
>>      case HVMTRANS_bad_gfn_to_mfn:
>>          offset = addr & ~PAGE_MASK;
>>          if ( offset + bytes <= PAGE_SIZE )
>
> This is (I'm inclined to say "of course") not an appropriate change in
> the general case: Getting back HVMTRANS_okay means the write
> was carried out, and hence it shouldn't be attempted a second time.
>
> I take it that changing the kernel driver would at best be sub-optimal
> though, so a hypervisor-only fix would be better.
Yes, changing the kernel module is very undesirable for many reasons.
> Assuming the p2m type change arrives via XEN_DMOP_set_mem_type,
> I think what we need to do is delay the actual change until no ioreq
> is pending anymore, kind of like the VM event subsystem delays
> certain CR and MSR writes until VM entry time. In this situation we'd
> then further have to restrict the number of such pending changes,
> because we need to record the request(s) somewhere. The question is
> how many such bufferable requests would be enough: Unless we
> wanted to enforce that only ordinary aligned writes (and rmw-s) can
> get us into this state, apart from writes crossing page boundaries we
> may need to be able to cope with AVX512's scatter insns.
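If I understand the buffering idea correctly, it would look something
like this (a rough sketch only: the deferred_p2m structure,
MAX_PENDING_P2M and apply_pending_p2m_changes() are names I'm inventing
for illustration; p2m_change_type_one() is the existing helper):

/*
 * Hypothetical per-vCPU buffer for p2m type changes arriving via
 * XEN_DMOP_set_mem_type while an ioreq is still in flight.
 */
#define MAX_PENDING_P2M 8 /* page-crossing writes need 2; AVX512
                             scatters would need more */

struct deferred_p2m {
    unsigned int nr;
    struct {
        gfn_t gfn;
        p2m_type_t ot, nt;
    } chg[MAX_PENDING_P2M];
};

/* Called once no ioreq is pending anymore, e.g. on the completion path. */
static void apply_pending_p2m_changes(struct domain *d,
                                      struct deferred_p2m *dp)
{
    unsigned int i;

    for ( i = 0; i < dp->nr; i++ )
        /* Error handling omitted for brevity. */
        p2m_change_type_one(d, gfn_x(dp->chg[i].gfn),
                            dp->chg[i].ot, dp->chg[i].nt);
    dp->nr = 0;
}

XEN_DMOP_set_mem_type would then append to this buffer (failing with
-EBUSY once it's full?) instead of updating the p2m immediately.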
>
> The other alternative I can think of would be to record the specific
> failure case (finding p2m_ioreq_server) in the first pass, and act upon
> it (instead of your mitigation above) during the retry. At the first
> glance it would seem more cumbersome to do the recording at that
> layer though.
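For the record, my reading of this alternative would be roughly the
following (again just a sketch: the mmio_was_ioreq_server flag and its
location are made up, and I've left out the page-crossing part1 handling
the real function has; hvm_copy_to_guest_linear() and
hvmemul_linear_mmio_write() are the existing helpers):

static int linear_write(unsigned long addr, unsigned int bytes,
                        void *p_data, uint32_t pfec,
                        struct hvm_emulate_ctxt *hvmemul_ctxt)
{
    pagefault_info_t pfinfo;
    struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io;
    int rc;

    /*
     * Retry after an ioreq completion: take the same (MMIO) path the
     * first pass took, no matter what the p2m says now.
     */
    if ( vio->mmio_was_ioreq_server )
        return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
                                         hvmemul_ctxt, 0);

    rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);

    switch ( rc )
    {
    case HVMTRANS_okay:
        return X86EMUL_OKAY;

    case HVMTRANS_bad_linear_to_gfn:
        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
        return X86EMUL_EXCEPTION;

    case HVMTRANS_bad_gfn_to_mfn:
        /* First pass: record that we're taking the MMIO route. */
        vio->mmio_was_ioreq_server = true;
        return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
                                         hvmemul_ctxt, 0);

    default:
        return X86EMUL_UNHANDLEABLE;
    }
}

The flag would of course need clearing once the ioreq completes.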
I like the latter suggestion more: it seems less invasive and less prone
to regressions, so I'd like to try implementing it. Although I think the
hypervisor check should be more general: if an IOREQ is in progress,
don't go through the fast path, and re-enter the IOREQ completion path
instead. What if we just check !hvm_ioreq_needs_completion() before
returning X86EMUL_OKAY, i.e. fall through to the bad_gfn_to_mfn case if
that check fails, as Paul suggested? A sketch of what I mean is below.
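I.e. something like this on top of the current switch (a sketch only:
I'm assuming vio is the usual &current->arch.hvm.hvm_io and that
hvm_ioreq_needs_completion() can be applied to vio->io_req at this
point):

    case HVMTRANS_okay:
        if ( !hvm_ioreq_needs_completion(&vio->io_req) )
            return X86EMUL_OKAY;
        /*
         * An ioreq sent on the first pass is still awaiting completion:
         * don't write the data a second time; fall through and re-enter
         * the MMIO/ioreq completion path instead.
         */
    case HVMTRANS_bad_gfn_to_mfn:
        offset = addr & ~PAGE_MASK;
        if ( offset + bytes <= PAGE_SIZE )
            return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
                                             hvmemul_ctxt, 0);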
Igor