[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH 6/9] x86/SVM: Add AVIC vmexit handlers



On Mon, Dec 12, 2016 at 05:34:29PM +0700, Suravee Suthikulpanit wrote:
> Hi Konrad,
> 
> Thanks for review comments.
> 
> On 10/14/2016 10:20 PM, Konrad Rzeszutek Wilk wrote:
> > On Mon, Sep 19, 2016 at 12:52:45AM -0500, Suravee Suthikulpanit wrote:
> > > AVIC introduces two #vmexit handlers:
> > > 
> > > +         * IPIs when the specified Message Type is Fixed
> > > +         * (also known as fixed delivery mode) and
> > > +         * the Trigger Mode is edge-triggered. The hardware
> > > +         * also supports self and broadcast delivery modes
> > > +         * specified via the Destination Shorthand(DSH)
> > > +         * field of the ICRL. Logical and physical APIC ID
> > > +         * formats are supported. All other IPI types cause
> > > +         * a #VMEXIT, which needs to emulated.
> > > +         */
> > > +        vlapic_reg_write(v, APIC_ICR2, icrh);
> > > +        vlapic_reg_write(v, APIC_ICR, icrl);
> > > +        break;
> > > +    case AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN:
> > > +    {
> > > +        /*
> > > +         * At this point, we expect that the AVIC HW has already
> > > +         * set the appropriate IRR bits on the valid target
> > > +         * vcpus. So, we just need to kick the appropriate vcpu.
> > 
> > Is there a pattern to this? Meaning does the CPU go sequentially
> > through the logical APIC ids - which (if say the guest is using
> > logical APIC ids which gives you pretty much 1-1 to physical) - means
> > that this VMEXIT would occur in a sequence of the vCPUs that are not
> > running?
> 
> Not sure if I follow this part. Typically, the current vcpu (the one that
> handles this #vmexit) is the one initiating IPI. Here we check the
> destination and try to kick each one of the matching destination.
> 
> > Because if so, then ..
> > > +
> > > +        for_each_vcpu ( d, c )
> > 
> > .. you could optimize this a bit and start at vCPU+1 (since you know
> > that this current vCPU most certainyl got the vCPU) ?
> > 
> 
> But the current (running) vcpu is not necessary the d->vcpu[0]. Do you mean
> checking if "c == current" in this loop and skip the current vcpu?

Yes.
> 
> > > +        break;
> > > +    case AVIC_INCMP_IPI_ERR_INV_BK_PAGE:
> > > +        dprintk(XENLOG_ERR,
> > > +                "SVM: %s: Invalid bk page (icr=%#08x:%08x, idx=%u)\n",
> > > +                __func__, icrh, icrl, index);
> > 
> > That should never happen right? If it does that means we messed up the
> > allocation somehow. If so, should we disable AVIC for this guest
> > completly and log this error?
> 
> I think it depends on when this happens. It might be hard to determine the
> state of all VCPUs at that point. Shouldn't we just crash the domain
> (probably for AVIC_INCMP_IPI_ERR_INV_TARGET and default case as well)?

That would be much better
> 
> > > [... in arch/x86/hvm/svm/avic.c: avic_handle_dfr_update()]
> > > +    u32 mod = (dfr >> 28) & 0xf;
> > > +
> > > +    /*
> > > +     * We assume that all local APICs are using the same type.
> > > +     * If this changes, we need to flush the AVIC logical
> > > +     * APID id table.
> > > +     */
> > > +    if ( d->ldr_mode == mod )
> > > +        return 0;
> > > +
> > > +    clear_domain_page(_mfn(d->avic_log_ait_mfn));
> > 
> > Whoa. What if another vCPU is using this (aka handling another AVIC
> > access?)?
> 
> Generally, at least in Linux case, I can see that the kernel switch APIC
> routing from cluster to flat early in the boot process prior to setting LDR
> and bringing up the rest of the AP cores). Do you know of any cases where
> the guest OS could be switching the APIC routing mode while the AP cores are
> running?

Put yourself in the shoes of an attacker. IF there is some way we can
screw up the hypervisor the better. Not sure if clearing this page
would lead us in the hypervisor to crash or such.

> 
> > I see that that the 'V' bit would be cleared so the CPU
> > would trap .. (thought that depends right - the clear_domain_page is
> > being done in a sequence so some of the later entries could be valid
> > while the CPU is clearing it).
> 
> Which "V" bit are you referring to here? And when is it cleared?

I sadly don't remember :-(
> 
> > Perhaps you should iterate over all the 'V' bit first (to clear them)
> > and then clear the page using the clear_domain_page?
> > Or better - turn of AVIC first (for the guest), until the page has been 
> > cleared?
> 
> What about adding a check to see if DFR is changed after LDR has already
> been setup and throw some error or warning message for now?

Sure.
> 
> > > [... in arch/x86/hvm/svm/avic.c: svm_avic_vmexit_do_noaccel()]
> > > +            BUG();
> > > +        avic_unaccel_trap_write(v);
> > 
> > Say the values are all messed up (the guest provides the wrong
> > APIC ID).. Shouldn't we inject some type of exception in the guest?
> > (Like the hardware would do? Actually - would the hardware send any type
> > of interrupt if one messes up with the APIC? Or is it that it sets an
> > error bit in the APIC?)
> 
> IIUC, typically, APIC generates APIC error interrupts and set the APIC Error
> Status Register (ESR) when detect errors during interrupt handling. However,
> if the APIC registers are not programmed correctly, I think the system would
> just fail to boot and go into undefined state.


As long as we don't crash the hypervisor I suppose that is OK.
> 
> > > [...]
> > > +        if ( !rw )
> > > +            *avic_get_bk_page_entry(v, offset) = vlapic_read_aligned(
> > > +                                                        vcpu_vlapic(v), 
> > > offset);
> > > +
> > > +        hvm_mem_access_emulate_one(EMUL_KIND_NORMAL, TRAP_invalid_op,
> > > +                                       HVM_DELIVER_NO_ERROR_CODE);
> > 
> > So we update the backing page .. and then inject an invalid_op in the
> > guest? Is that how hardware does it?
> 
> Looking into the hvm_mem_access_emulate_one(), my understanding is that this
> emulates the mem access instruction (e.g. to read/write APIC register), and
> inject TRAP_invalid_op when the emulation fails (i.e. X86EMUL_UNHANDLEABLE).
> Am I missing something here?

Nope. Just me misremembering it.

> 
> Thanks,
> Suravee

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.