
Re: [Xen-devel] [PATCH v2 05/10] x86/HVM/SVM: Add AVIC initialization code



On 31/12/2016 05:45, Suravee Suthikulpanit wrote:
> diff --git a/xen/arch/x86/hvm/svm/avic.c b/xen/arch/x86/hvm/svm/avic.c
> new file mode 100644
> index 0000000..b62f982
> --- /dev/null
> +++ b/xen/arch/x86/hvm/svm/avic.c
> @@ -0,0 +1,232 @@
> +/*
> + * avic.c: implements AMD Advanced Virtual Interrupt Controller (AVIC) support
> + * Copyright (c) 2016, Advanced Micro Devices, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/domain_page.h>
> +#include <xen/sched.h>
> +#include <xen/stdbool.h>
> +#include <asm/acpi.h>
> +#include <asm/apicdef.h>
> +#include <asm/event.h>
> +#include <asm/hvm/emulate.h>
> +#include <asm/hvm/nestedhvm.h>
> +#include <asm/hvm/support.h>
> +#include <asm/hvm/svm/avic.h>
> +#include <asm/hvm/vlapic.h>
> +#include <asm/p2m.h>
> +#include <asm/page.h>
> +
> +/*
> + * Note: Current max index allowed for physical APIC ID table is 255.
> + */
> +#define AVIC_PHY_APIC_ID_MAX    0xFF
> +
> +#define AVIC_DOORBELL           0xc001011b
> +
> +#define AVIC_HPA_SHIFT  12
> +#define AVIC_HPA_MASK           (((1ULL << 40) - 1) << AVIC_HPA_SHIFT)

What does this mask represent?  I have spotted a truncation bug below,
but how to fix the issue depends on the meaning of the mask.

The only limit I can see in the spec is that the host physical
addresses must be present in system RAM, which presumably means they
should follow maxphysaddr like everything else?

> +#define AVIC_VAPIC_BAR_MASK     AVIC_HPA_MASK
> +
> +/*
> + * Note:
> + * Currently, svm-avic mode is not supported with nested virtualization.
> + * Therefore, it is not currently enabled by default. Once the support
> + * is in place, this should be enabled by default.
> + */
> +bool svm_avic = 0;
> +boolean_param("svm-avic", svm_avic);

We are trying to avoid the proliferation of top-level options.  Please
could you introduce a "svm=" option which takes avic as a sub-boolean,
similar to how iommu= currently works?
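
Something along these lines would do (untested sketch; the option naming
and exact sub-option parsing are up to you):

    bool __read_mostly svm_avic;

    /* Parse "svm=[no-]avic,..." as a comma-separated list of sub-options. */
    static void __init parse_svm_param(char *s)
    {
        char *ss;

        do {
            ss = strchr(s, ',');
            if ( ss )
                *ss = '\0';

            if ( !strcmp(s, "avic") )
                svm_avic = true;
            else if ( !strcmp(s, "no-avic") )
                svm_avic = false;

            s = ss + 1;
        } while ( ss );
    }
    custom_param("svm", parse_svm_param);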

> +
> +static struct avic_phy_apic_id_ent *
> +avic_get_phy_apic_id_ent(const struct vcpu *v, unsigned int index)

The Physical APIC table is per-domain, not per-cpu according to the
spec.  This function should take a struct domain, although I don't think
you need it at all (see below).

> +{
> +    struct avic_phy_apic_id_ent *avic_phy_apic_id_table;
> +    struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
> +
> +    if ( !d->avic_phy_apic_id_table_mfn )
> +        return NULL;
> +
> +    /*
> +    * Note: APIC ID = 0xff is used for broadcast.
> +    *       APIC ID > 0xff is reserved.
> +    */
> +    if ( index >= 0xff )
> +        return NULL;
> +
> +    avic_phy_apic_id_table = mfn_to_virt(d->avic_phy_apic_id_table_mfn);

This is unsafe and will break on hosts with more than 5TB of RAM.

As you allocate avic_phy_apic_id_table_mfn with alloc_domheap_page(),
you must use {map,unmap}_domain_page() to get temporary mappings (which
will use mfn_to_virt() in the common case), or use
{map,unmap}_domain_page_global() to get permanent mappings.

As the values in here need modifying across a schedule, I would suggest
making a global mapping at create time, and storing the result in a
properly-typed pointer.
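
i.e. something along these lines at domain creation (untested sketch; the
avic_phy_apic_id_table pointer field is illustrative, not something which
exists yet):

    struct avic_phy_apic_id_ent *tbl;
    struct page_info *pg = alloc_domheap_page(d, MEMF_no_owner);

    if ( !pg )
        return -ENOMEM;
    clear_domain_page(_mfn(page_to_mfn(pg)));

    /* Permanent mapping, so the table can be edited across a schedule. */
    tbl = __map_domain_page_global(pg);
    if ( !tbl )
    {
        free_domheap_page(pg);
        return -ENOMEM;
    }
    d->arch.hvm_domain.svm.avic_phy_apic_id_table = tbl;

with a matching unmap_domain_page_global() in svm_avic_dom_destroy().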

> +
> +    return &avic_phy_apic_id_table[index];
> +}
> +
> +int svm_avic_dom_init(struct domain *d)
> +{
> +    int ret = 0;
> +    struct page_info *pg;
> +    unsigned long mfn;

mfn_t please, which would also highlight the type error in svm_domain.

> +
> +    if ( !svm_avic )
> +        return 0;

|| !has_vlapic(d)

HVMLite domains may legitimately be configured without an LAPIC at all.

> +
> +    /*
> +     * Note:
> +     * AVIC hardware walks the nested page table to check permissions,
> +     * but does not use the SPA address specified in the leaf page
> + * table entry since it uses the address in the AVIC_BACKING_PAGE pointer
> + * field of the VMCB. Therefore, we set up a dummy page (_mfn(0)) for the APIC.
> +     */
> +    if ( !d->arch.hvm_domain.svm.avic_access_page_done )
> +    {
> +        set_mmio_p2m_entry(d, paddr_to_pfn(APIC_DEFAULT_PHYS_BASE),
> +                           _mfn(0), PAGE_ORDER_4K,
> +                           p2m_get_hostp2m(d)->default_access);
> +        d->arch.hvm_domain.svm.avic_access_page_done = true;

I don't see the purpose of this avic_access_page_done boolean, even in
later patches.

> +    }
> +
> +    /* Init AVIC logical APIC ID table */
> +    pg = alloc_domheap_page(d, MEMF_no_owner);
> +    if ( !pg )
> +    {
> +        gdprintk(XENLOG_ERR,
> +                "%d: AVIC logical APIC ID table could not be allocated.\n",
> +                d->domain_id);

Do we really need to have a printk() in the -ENOMEM case?  I'd
personally not bother.

> +        ret = -ENOMEM;
> +        goto err_out;
> +    }
> +    mfn = page_to_mfn(pg);
> +    clear_domain_page(_mfn(mfn));
> +    d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn = mfn;
> +
> +    /* Init AVIC physical APIC ID table */
> +    pg = alloc_domheap_page(d, MEMF_no_owner);
> +    if ( !pg )
> +    {
> +        gdprintk(XENLOG_ERR,
> +                "%d: AVIC physical APIC ID table could not be allocated.\n",
> +                d->domain_id);
> +        ret = -ENOMEM;
> +        goto err_out;
> +    }
> +    mfn = page_to_mfn(pg);
> +    clear_domain_page(_mfn(mfn));
> +    d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn = mfn;
> +
> +    return ret;
> + err_out:
> +    svm_avic_dom_destroy(d);
> +    return ret;
> +}
> +
> +void svm_avic_dom_destroy(struct domain *d)
> +{
> +    if ( !svm_avic )
> +        return;
> +
> +    if ( d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn )
> +    {
> +        free_domheap_page(mfn_to_page(d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn));
> +        d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn = 0;
> +    }
> +
> +    if ( d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn )
> +    {
> +        free_domheap_page(mfn_to_page(d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn));
> +        d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn = 0;
> +    }
> +}
> +
> +bool svm_avic_vcpu_enabled(const struct vcpu *v)
> +{
> +    const struct arch_svm_struct *s = &v->arch.hvm_svm;
> +    const struct vmcb_struct *vmcb = s->vmcb;
> +
> +    return svm_avic && vmcb->_vintr.fields.avic_enable;

This predicate should just be on avic_enable.  The svm_avic case should
prevent the avic_enable from being set in the first place.
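
i.e. simply:

    return vmcb->_vintr.fields.avic_enable;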

> +}
> +
> +static inline u32 *
> +avic_get_bk_page_entry(const struct vcpu *v, u32 offset)
> +{
> +    const struct vlapic *vlapic = vcpu_vlapic(v);
> +    char *tmp;
> +
> +    if ( !vlapic || !vlapic->regs_page )
> +        return NULL;
> +
> +    tmp = (char *)page_to_virt(vlapic->regs_page);
> +    return (u32 *)(tmp + offset);

This is also unsafe over 5TB.  vlapic->regs is already a global mapping
which you should use.

Also, you should leave a comment by the allocation of regs_page that
AVIC depends on it being a full page allocation.

> +}
> +
> +void svm_avic_update_vapic_bar(const struct vcpu *v, uint64_t data)
> +{
> +    const struct arch_svm_struct *s = &v->arch.hvm_svm;
> +
> +    s->vmcb->avic_vapic_bar = data & AVIC_VAPIC_BAR_MASK;
> +    s->vmcb->cleanbits.fields.avic = 0;

While I can appreciate what you are trying to do here, altering the
physical address in IA32_APICBASE has never functioned properly under
Xen, as nothing updates the P2M.  Worse, with multi-vcpu guests, the use
of a single shared p2m makes this impossible to do properly.

I know it is non-architectural behaviour, but IMO vlapic_msr_set()
should be updated to raise #GP[0] when attempting to change the frame,
to make it fail obviously rather than having interrupts go missing.

Also I see that the Intel side (via vmx_vlapic_msr_changed()) makes the
same mistake.
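
Something like this in vlapic_msr_set() would do (rough sketch only - the
error path would need plumbing so the write is actually abandoned):

    /* Refuse attempts to relocate the APIC MMIO frame; it has never worked. */
    if ( (value ^ vlapic->hw.apic_base_msr) & MSR_IA32_APICBASE_BASE )
    {
        hvm_inject_hw_exception(TRAP_gp_fault, 0);
        return;
    }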

> +}
> +
> +int svm_avic_init_vmcb(struct vcpu *v)
> +{
> +    paddr_t ma;
> +    u32 *apic_id_reg;
> +    struct arch_svm_struct *s = &v->arch.hvm_svm;
> +    struct vmcb_struct *vmcb = s->vmcb;
> +    struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
> +    const struct vlapic *vlapic = vcpu_vlapic(v);
> +    struct avic_phy_apic_id_ent entry;
> +
> +    if ( !svm_avic )
> +        return 0;
> +
> +    if ( !vlapic || !vlapic->regs_page )
> +        return -EINVAL;

Somewhere you need a bounds check against the APIC ID being within AVIC
limits.

> +
> +    apic_id_reg = avic_get_bk_page_entry(v, APIC_ID);
> +    if ( !apic_id_reg )
> +        return -EINVAL;

This should be vlapic_read()...

> +
> +    s->avic_last_phy_id = avic_get_phy_apic_id_ent(v, *apic_id_reg >> 24);

And this, GET_xAPIC_ID(),
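
i.e. roughly (untested; vlapic_get_reg() or whichever accessor is
appropriate at this level):

    uint32_t apic_id = vlapic_get_reg(vcpu_vlapic(v), APIC_ID);

    s->avic_last_phy_id = avic_get_phy_apic_id_ent(v, GET_xAPIC_ID(apic_id));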

> +    if ( !s->avic_last_phy_id )
> +        return -EINVAL;

Why does a pointer into the physical ID page need storing in
arch_svm_struct?

> +
> +    vmcb->avic_bk_pg_pa = page_to_maddr(vlapic->regs_page) & AVIC_HPA_MASK;

This use of AVIC_HPA_MASK may truncate the address, which is
definitely a problem.

If AVIC_HPA_MASK isn't just related to maxphysaddr, then you need to
pass appropriate memflags into the alloc_domheap_page() calls to get a
suitable page.
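
e.g. if the restriction really is a 40-bit frame number (52-bit host
physical address), something like:

    pg = alloc_domheap_page(d, MEMF_no_owner | MEMF_bits(40 + AVIC_HPA_SHIFT));

would guarantee a suitable frame.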

> +    ma = d->avic_log_apic_id_table_mfn;
> +    vmcb->avic_log_apic_id = (ma << PAGE_SHIFT) & AVIC_HPA_MASK;

This ma << PAGE_SHIFT also highlights the type error.  ma != mfn.

> +    ma = d->avic_phy_apic_id_table_mfn;
> +    vmcb->avic_phy_apic_id = (ma << PAGE_SHIFT) & AVIC_HPA_MASK;
> +    vmcb->avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX;

Surely this should be some calculation derived from d->max_vcpus?
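
e.g. something like (assuming the usual vcpu_id * 2 APIC ID encoding for
HVM guests - adjust if that assumption doesn't hold under AVIC):

    vmcb->avic_phy_apic_id |= min_t(unsigned int, 2 * (v->domain->max_vcpus - 1),
                                    AVIC_PHY_APIC_ID_MAX);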

> +
> +    entry = *(s->avic_last_phy_id);
> +    smp_rmb();
> +    entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT;
> +    entry.is_running = 0;
> +    entry.valid = 1;
> +    *(s->avic_last_phy_id) = entry;
> +    smp_wmb();

During domain creation, no guests are running, so you can edit this cleanly.

What values are actually being excluded here?  This, and other patches,
look like you are actually just trying to do a read_atomic(), modify,
write_atomic() update, rather than actually requiring ordering.
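
For this particular path a plain structure assignment would suffice, e.g.:

    /* Nothing else can observe the entry yet; no ordering required. */
    *s->avic_last_phy_id = (struct avic_phy_apic_id_ent) {
        .bk_pg_ptr = page_to_mfn(vlapic->regs_page),
        .valid = 1,
    };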

> +
> +    svm_avic_update_vapic_bar(v, APIC_DEFAULT_PHYS_BASE);
> +
> +    vmcb->_vintr.fields.avic_enable = 1;
> +
> +    return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> @@ -1799,6 +1802,10 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>  
>      switch ( msr )
>      {
> +    case MSR_IA32_APICBASE:
> +        if ( svm_avic_vcpu_enabled(v) )
> +            svm_avic_update_vapic_bar(v, msr_content);
> +        break;

I think this is dead code.  MSR_IA32_APICBASE is handled in
hvm_msr_write_intercept(), and won't fall into the vendor hook.  If you
were to continue down this route, I would add another pi_ops hook, and
replace the VMX layering violation in vlapic_msr_set().

>      case MSR_IA32_SYSENTER_CS:
>      case MSR_IA32_SYSENTER_ESP:
>      case MSR_IA32_SYSENTER_EIP:
> diff --git a/xen/include/asm-x86/hvm/svm/avic.h b/xen/include/asm-x86/hvm/svm/avic.h
> new file mode 100644
> index 0000000..16b433c
> --- /dev/null
> +++ b/xen/include/asm-x86/hvm/svm/avic.h
> @@ -0,0 +1,40 @@
> +#ifndef _SVM_AVIC_H_
> +#define _SVM_AVIC_H_
> +
> +enum avic_incmp_ipi_err_code {
> +    AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE,
> +    AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN,
> +    AVIC_INCMP_IPI_ERR_INV_TARGET,
> +    AVIC_INCMP_IPI_ERR_INV_BK_PAGE,
> +};
> +
> +struct __attribute__ ((__packed__))

Please include xen/compiler.h and use __packed

> +avic_log_apic_id_ent {
> +    u32 guest_phy_apic_id : 8;
> +    u32 res               : 23;
> +    u32 valid             : 1;
> +};
> +
> +struct __attribute__ ((__packed__))
> +avic_phy_apic_id_ent {
> +    u64 host_phy_apic_id  : 8;
> +    u64 res1              : 4;
> +    u64 bk_pg_ptr         : 40;

This field should have mfn in its name, as it has an implicit shift in
its definition.

> +    u64 res2              : 10;
> +    u64 is_running        : 1;
> +    u64 valid             : 1;
> +};
> <snip>
> diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
> index 43cb98e..d3d045f 100644
> --- a/xen/include/asm-x86/hvm/svm/vmcb.h
> +++ b/xen/include/asm-x86/hvm/svm/vmcb.h
> @@ -496,6 +496,24 @@ struct __packed vmcb_struct {
>  };
>  
>  struct svm_domain {
> +    /*
> +     * This per-domain table is used by the hardware to locate
> +     * the vAPIC backing page to be used to deliver interrupts
> +     * based on the guest physical APIC ID.
> +     */
> +    paddr_t avic_phy_apic_id_table_mfn;

paddrs and mfns differ by PAGE_SHIFT.  Please use mfn_t here.
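
i.e.:

    mfn_t avic_phy_apic_id_table_mfn;
    mfn_t avic_log_apic_id_table_mfn;

or better still, drop the mfns entirely and store the properly-typed
pointers from the global mappings suggested above.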

> +
> +    /*
> +     * This per-domain table is used by the hardware to map
> +     * logically addressed interrupt requests (w/ guest logical APIC id)
> +     * to the guest physical APIC ID.
> +     */
> +    paddr_t avic_log_apic_id_table_mfn;
> +
> +    u32 avic_max_vcpu_id;
> +    bool avic_access_page_done;
> +    spinlock_t avic_ldr_mode_lock;

You introduce the spinlock here, but initialise it in the subsequent
patch.  I suspect that hunk wants moving into this patch?

~Andrew

> +    u32 avic_ldr_mode;
>  };
>  
>  struct arch_svm_struct {

