Re: [Xen-devel] XenGT is still regressed on master
On 08/03/2019 11:55, Jan Beulich wrote:
> If so, my first reaction is to blame the kernel module:
> Machine state (of the VM) may not change while processing
> a write, other than to carry out the _direct_ effects of the write. I
> don't think a p2m type change is supposed to be occurring as a side
> effect.
I disagree. Anything may change while an IOREQ is being processed, and
it's the hypervisor that shouldn't make any assumptions about the
machine state before and after the request was sent.
>> The bug could be mitigated by the following patch, but since it's you
>> who introduced this helper, you might have better ideas on how to
>> avoid the problem in a clean way here.
>>
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -1139,13 +1139,11 @@ static int linear_write(unsigned long addr, unsigned int bytes, void *p_data,
>>      {
>>          unsigned int offset, part1;
>>  
>> -    case HVMTRANS_okay:
>> -        return X86EMUL_OKAY;
>> -
>>      case HVMTRANS_bad_linear_to_gfn:
>>          x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>>          return X86EMUL_EXCEPTION;
>>  
>> +    case HVMTRANS_okay:
>>      case HVMTRANS_bad_gfn_to_mfn:
>>          offset = addr & ~PAGE_MASK;
>>          if ( offset + bytes <= PAGE_SIZE )
>
> This is (I'm inclined to say "of course") not an appropriate change in
> the general case: Getting back HVMTRANS_okay means the write
> was carried out, and hence it shouldn't be attempted a second time.
>
> I take it that changing the kernel driver would at best be sub-optimal
> though, so a hypervisor-only fix would be better.
Yes, changing the kernel module is very undesirable for many reasons.
> Assuming the p2m type change arrives via XEN_DMOP_set_mem_type,
> I think what we need to do is delay the actual change until no ioreq
> is pending anymore, kind of like the VM event subsystem delays
> certain CR and MSR writes until VM entry time. In this situation we'd
> then further have to restrict the number of such pending changes,
> because we need to record the request(s) somewhere. The question is
> how many such bufferable requests would be enough: Unless we
> wanted to enforce that only ordinary aligned writes (and rmw-s) can
> get us into this state, apart from writes crossing page boundaries we
> may need to be able to cope with AVX512's scatter insns.
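If I understand the buffering idea correctly, it would look something
like this (a rough sketch only: the deferred_p2m structure,
MAX_PENDING_P2M and apply_pending_p2m_changes() are names I'm inventing
for illustration; p2m_change_type_one() is the existing helper):

/*
 * Hypothetical per-vCPU buffer for p2m type changes arriving via
 * XEN_DMOP_set_mem_type while an ioreq is still in flight.
 */
#define MAX_PENDING_P2M 8 /* page-crossing writes need 2; AVX512
                             scatters would need more */

struct deferred_p2m {
    unsigned int nr;
    struct {
        gfn_t gfn;
        p2m_type_t ot, nt;
    } chg[MAX_PENDING_P2M];
};

/* Called once no ioreq is pending anymore, e.g. on the completion path. */
static void apply_pending_p2m_changes(struct domain *d,
                                      struct deferred_p2m *dp)
{
    unsigned int i;

    for ( i = 0; i < dp->nr; i++ )
        /* Error handling omitted for brevity. */
        p2m_change_type_one(d, gfn_x(dp->chg[i].gfn),
                            dp->chg[i].ot, dp->chg[i].nt);
    dp->nr = 0;
}

XEN_DMOP_set_mem_type would then append to this buffer (failing with
-EBUSY once it's full?) instead of updating the p2m immediately.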
>
> The other alternative I can think of would be to record the specific
> failure case (finding p2m_ioreq_server) in the first pass, and act upon
> it (instead of your mitigation above) during the retry. At the first
> glance it would seem more cumbersome to do the recording at that
> layer though.
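For the record, my reading of this alternative would be roughly the
following (again just a sketch: the mmio_was_ioreq_server flag and its
location are made up, and I've left out the page-crossing part1 handling
the real function has; hvm_copy_to_guest_linear() and
hvmemul_linear_mmio_write() are the existing helpers):

static int linear_write(unsigned long addr, unsigned int bytes,
                        void *p_data, uint32_t pfec,
                        struct hvm_emulate_ctxt *hvmemul_ctxt)
{
    pagefault_info_t pfinfo;
    struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io;
    int rc;

    /*
     * Retry after an ioreq completion: take the same (MMIO) path the
     * first pass took, no matter what the p2m says now.
     */
    if ( vio->mmio_was_ioreq_server )
        return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
                                         hvmemul_ctxt, 0);

    rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);

    switch ( rc )
    {
    case HVMTRANS_okay:
        return X86EMUL_OKAY;

    case HVMTRANS_bad_linear_to_gfn:
        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
        return X86EMUL_EXCEPTION;

    case HVMTRANS_bad_gfn_to_mfn:
        /* First pass: record that we're taking the MMIO route. */
        vio->mmio_was_ioreq_server = true;
        return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
                                         hvmemul_ctxt, 0);

    default:
        return X86EMUL_UNHANDLEABLE;
    }
}

The flag would of course need clearing once the ioreq completes.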
I like the latter suggestion more: it seems less invasive and less prone
to regressions, so I'd like to try implementing it. Although I think the
hypervisor check should be more general: if an IOREQ is in progress,
don't go through the fast path, and re-enter the IOREQ completion path
instead. What if we just check !hvm_ioreq_needs_completion() before
returning X86EMUL_OKAY, i.e. fall through to the bad_gfn_to_mfn case if
that check fails, as Paul suggested? A sketch of what I mean is below.
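I.e. something like this on top of the current switch (a sketch only:
I'm assuming vio is the usual &current->arch.hvm.hvm_io and that
hvm_ioreq_needs_completion() can be applied to vio->io_req at this
point):

    case HVMTRANS_okay:
        if ( !hvm_ioreq_needs_completion(&vio->io_req) )
            return X86EMUL_OKAY;
        /*
         * An ioreq sent on the first pass is still awaiting completion:
         * don't write the data a second time; fall through and re-enter
         * the MMIO/ioreq completion path instead.
         */
    case HVMTRANS_bad_gfn_to_mfn:
        offset = addr & ~PAGE_MASK;
        if ( offset + bytes <= PAGE_SIZE )
            return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
                                             hvmemul_ctxt, 0);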
Igor