|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] xen: Improvements to domain_crash()
On 31/08/18 09:39, Jan Beulich wrote:
>>>> On 30.08.18 at 17:31, <andrew.cooper3@xxxxxxxxxx> wrote:
>> --- a/xen/arch/arm/mem_access.c
>> +++ b/xen/arch/arm/mem_access.c
>> @@ -293,12 +293,7 @@ bool p2m_mem_access_check(paddr_t gpa, vaddr_t gla,
>> const struct npfec npfec)
>> {
>> /* No listener */
>> if ( p2m->access_required )
>> - {
>> - gdprintk(XENLOG_INFO, "Memory access permissions failure, "
>> - "no vm_event listener VCPU %d, dom %d\n",
>> - v->vcpu_id, v->domain->domain_id);
>> - domain_crash(v->domain);
>> - }
>> + domain_crash(v->domain, "No vm_event listener\n");
> Which vCPU caused the issue is lost with this transformation.
It is not useful information. This error means "whatever tool you're
using in dom0 to partially turn on the mem_access subsystem didn't set
up the ring to begin with". There might even be a separate interlock to
prevent this condition happening in practice.
>
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -2078,19 +2078,15 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
>> rc = X86EMUL_OKAY;
>> break;
>>
>> - default:
>> - gdprintk(XENLOG_ERR, "invalid cr: %d\n", cr);
>> - goto exit_and_crash;
>> + default: /* VT-x/SVM intercept malfunction? */
>> + domain_crash(curr->domain, "Invalid cr %u\n", cr);
> "cr%u does not exist"? I in particular dislike the space between "cr"
> and the number.
This path is a bit odd, but its not that those CR's don't exist. Its
that their intercepts don't/haven't triggered (and in particular, AMD
has intercepts for a load of CR/DR registers which don't exist, and
don't trigger because #UD processing is handled internally first).
I'm happy to switch to "CR%u", but I don't think "does not exist" is the
right text to use.
I don't intend for this code tree to survive the plans to rework
intercepts in terms of emulation.
>
>> @@ -2315,12 +2307,17 @@ int hvm_set_cr3(unsigned long value, bool_t
>> may_defer)
>> if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) &&
>> (value != v->arch.hvm_vcpu.guest_cr[3]) )
>> {
>> + unsigned long gfn = paddr_to_pfn(value);
> Along the lines of your recent comment on one of my patches:
> Doesn't the comparison above need to ignore the upper half of
> the stored value? (Arguably not something we want to fix in this
> patch, but anyway.)
Oops - I'd forgotten about that aspect of things. That particular issue
I only noticed while reviewing your patch, which post-dates writing this
bit of code by several weeks.
>
>> @@ -2686,17 +2678,22 @@ static void *hvm_map_entry(unsigned long va, bool_t
>> *writable)
>> pfec = PFEC_page_present;
>> gfn = paging_gva_to_gfn(current, va, &pfec);
>> if ( pfec & (PFEC_page_paged | PFEC_page_shared) )
>> - goto fail;
>> + {
>> + domain_crash(current->domain, "Descriptor table is paged or
>> shared\n");
> Since a page can't be paged and shared at the same time, perhaps better
> to log just one of the two (whichever is applicable)?
I could format pfec into the message?
>
>> @@ -3719,7 +3716,7 @@ int hvm_descriptor_access_intercept(uint64_t exit_info,
>> descriptor, is_write);
>> }
>> else if ( !hvm_emulate_one_insn(is_sysdesc_access, "sysdesc access") )
>> - domain_crash(currd);
>> + domain_crash(currd, "sysdesc emulation failure\n");
> The string passed to hvm_emulate_one_insn() makes clear that
> some logging already occurs in the error case - do we really need
> the extra verbosity you add here?
I was trying to avoid using __domain_crash() anywhere, but that became
untenable.
If we are going to use __domain_crash(), I'd prefer it to be obviously
next to a printk() making it obvious that a message is logged. Now I
think about it, I've violated this expectation elsewhere, so think I'll
pull some bits out into a prerequisite patch.
Sadly in this case, half of the hvm_emulate_one_insn() want to raise an
exception and half want a crash on failure. I can't think of a nice
option here.
>
>> --- a/xen/arch/x86/hvm/intercept.c
>> +++ b/xen/arch/x86/hvm/intercept.c
>> @@ -44,7 +44,7 @@ static bool_t hvm_mmio_accept(const struct hvm_io_handler
>> *handler,
>> last = hvm_mmio_last_byte(p);
>> if ( last != first &&
>> !handler->mmio.ops->check(current, last) )
>> - domain_crash(current->domain);
>> + domain_crash(current->domain, "Fatal MMIO error\n");
> How about "%ps accepts %"PRIpaddr" but not %"PRIpaddr?
>
>> @@ -134,8 +134,10 @@ int hvm_process_io_intercept(const struct
>> hvm_io_handler *handler,
>>
>> if ( p->data_is_ptr )
>> {
>> - switch ( hvm_copy_to_guest_phys(p->data + step * i,
>> - &data, p->size, current) )
>> + enum hvm_translation_result res =
>> + hvm_copy_to_guest_phys(p->data + step * i,
>> + &data, p->size, current);
>> + switch ( res )
> Blank line above here please.
>
>> @@ -162,9 +166,12 @@ int hvm_process_io_intercept(const struct
>> hvm_io_handler *handler,
>> {
>> if ( p->data_is_ptr )
>> {
>> + enum hvm_translation_result res;
>> +
>> data = 0;
>> - switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
>> - p->size) )
>> + res = hvm_copy_from_guest_phys(&data, p->data + step * i,
>> + p->size);
>> + switch ( res )
> And here.
Ok for all.
>
>> @@ -2709,7 +2705,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>> {
>> gdprintk(XENLOG_ERR, "invalid VMCB state:\n");
>> svm_vmcb_dump(__func__, vmcb);
>> - domain_crash(v->domain);
>> + __domain_crash(v->domain);
> Perhaps the gdprintk() above then also wants to become gprintk()?
Good point, although the message is quite redundant overall. I'll try
and fix things up suitably in the prerequisite patch.
>> @@ -1381,18 +1380,21 @@ static void vmx_load_pdptrs(struct vcpu *v)
>> return;
>>
>> if ( (cr3 & 0x1fUL) && !hvm_pcid_enabled(v) )
>> - goto crash;
>> + {
>> + domain_crash(v->domain, "Bad cr3 %p\n", _p(cr3));
> I wonder about the use of _p() here, considering your earlier comment
> on my patch to this function that the upper 32 bits are to be ignored
> anyway.
Oh - force of habit, because this is by far the shortest and easiest way
to format a 64bit number. Fixing the 32bit-ness issues will alter how
the information gets printed.
>> --- a/xen/arch/x86/mm/p2m-pod.c
>> +++ b/xen/arch/x86/mm/p2m-pod.c
>> @@ -567,7 +567,7 @@ p2m_pod_decrease_reservation(struct domain *d, gfn_t
>> gfn, unsigned int order)
>> * impossible.
>> */
>> if ( order != 0 )
>> - domain_crash(d);
>> + domain_crash(d, "Fatal PoD error\n");
> I'm a little uncertain here whether it's relevant to have the multiple
> identical messages in the same functions here distinguishable, but
> perhaps simply add #1 etc to them?
TBH I hoping for suggestions on better errors. Most of these are behind
ASSERT_UNREACHABLE(), and I didn't think numbering them like this would
be terribly helpful.
We have an increasing pattern of ASSERT_UNREACHABLE(); domain_crash().
I think it might be better to wrap this up somehow and call
domain_crash() from the #UD handler so we can get a file/line reference
without impacting livepatchability, and get an offending backtrace in
release builds which hit the condition, without being fatal to the host
overall.
>
>> @@ -839,7 +839,7 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn,
>> mfn_t mfn,
>> if ( p2m_is_grant(ot) || p2m_is_foreign(ot) )
>> {
>> /* Really shouldn't be unmapping grant/foreign maps this way */
>> - domain_crash(d);
>> + domain_crash(d, "Attempting to implicitly unmap grant/foreign
>> frame\n");
> Drop the "ing"?
>
>> @@ -991,11 +991,8 @@ void p2m_change_type_range(struct domain *d,
>> if ( gfn < end )
>> rc = p2m->change_entry_type_range(p2m, ot, nt, gfn, end - 1);
>> if ( rc )
>> - {
>> - printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx] from %d
>> to %d\n",
>> - rc, d->domain_id, start, end - 1, ot, nt);
>> - domain_crash(d);
>> - }
>> + domain_crash(d, "Error %d changing d%d GFNs [%lx,%lx] from %d to
>> %d\n",
>> + rc, d->domain_id, start, end - 1, ot, nt);
> You change Dom%d to d%d here.
>
>> @@ -1011,11 +1008,8 @@ void p2m_change_type_range(struct domain *d,
>> break;
>> }
>> if ( rc )
>> - {
>> - printk(XENLOG_G_ERR "Error %d manipulating Dom%d's log-dirty
>> ranges\n",
>> - rc, d->domain_id);
>> - domain_crash(d);
>> - }
>> + domain_crash(d, "Error %d manipulating Dom%d's log-dirty ranges\n",
>> + rc, d->domain_id);
> Why not also here? Or actually isn't logging the domain ID explicitly here
> quite pointless (redundant with what domain_crash() produces)?
I'll fix, and drop the redundant d%d.
>
>> --- a/xen/arch/x86/mm/shadow/common.c
>> +++ b/xen/arch/x86/mm/shadow/common.c
>> @@ -1954,12 +1954,9 @@ int sh_remove_write_access(struct domain *d, mfn_t
>> gmfn,
>> /* If this isn't a "normal" writeable page, the domain is trying to
>> * put pagetables in special memory of some kind. We can't allow that.
>> */
>> if ( (pg->u.inuse.type_info & PGT_type_mask) != PGT_writable_page )
>> - {
>> - printk(XENLOG_G_ERR "can't remove write access to mfn %"PRI_mfn
>> - ", type_info is %"PRtype_info "\n",
>> - mfn_x(gmfn), mfn_to_page(gmfn)->u.inuse.type_info);
>> - domain_crash(d);
>> - }
>> + domain_crash(d, "can't remove write access to mfn %"PRI_mfn
>> + ", type_info is %"PRtype_info"\n",
>> + mfn_x(gmfn), mfn_to_page(gmfn)->u.inuse.type_info);
> Here and below - perhaps better for the format strings to go all on one line,
> for grep-ability?
Its split on a formatted mfn so doesn't affect grepability. Without the
split, its verging on the unreasonably long.
>
>> @@ -2583,13 +2576,12 @@ static void sh_update_paging_modes(struct vcpu *v)
>>
>> if ( v != current && vcpu_runnable(v) )
>> {
>> - printk(XENLOG_G_ERR
>> - "Some third party (%pv) is changing this HVM
>> vcpu's"
>> - " (%pv) paging mode while it is running\n",
>> - current, v);
>> /* It's not safe to do that because we can't change
>> * the host CR3 for a running domain */
>> - domain_crash(v->domain);
>> + domain_crash(v->domain,
>> + "Some third party (%pv) is changing this "
>> + "HVM vcpu's (%pv) paging mode while it is "
>> + "running\n", current, v);
> Just "%pv is changing %pv's ..."?
Will do - that is far more concise.
>
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -8686,7 +8686,7 @@ x86_emulate(
>> gprintk(XENLOG_INFO, " stub: %"__stringify(MAX_INST_LEN)"ph\n",
>> stub.func);
>> generate_exception_if(stub_exn.info.fields.trapnr == EXC_UD, EXC_UD);
>> - domain_crash(current->domain);
>> + domain_crash(current->domain, "Fatal exception during emulation\n");
> Perhaps include the exception type?
Actually, this should become __domain_crash() because all information is
in the gprintk() above, but I'll bump it from WARNING to ERROR seeing as
we are crashing the domain.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |