
Re: [Xen-devel] RFC: AMD support for paging


  • To: "Tim Deegan" <tim@xxxxxxx>
  • From: "Andres Lagar-Cavilla" <andres@xxxxxxxxxxxxxxxx>
  • Date: Wed, 15 Feb 2012 09:09:11 -0800
  • Cc: olaf@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, Wei Wang <wei.wang2@xxxxxxx>, keir.xen@xxxxxxxxx, jbeulich@xxxxxxxx, adin@xxxxxxxxxxxxxx
  • Delivery-date: Wed, 15 Feb 2012 17:09:36 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> At 08:06 -0800 on 15 Feb (1329293205), Andres Lagar-Cavilla wrote:
>> > Where will p2m access bit be stored? Please note that bit 52 - bit 58
>> > for pte must be zero for p2m_ram_rw pages. For iommu pte, only bit 63
>> > and bit 1 - bit 8 are free to use if PR bit = 1.
>>
>> Wei,
>>
>> Why *must* bits 52-58 be zero for p2m_ram_rw pages? Or is it just the
>> current convention?
>>
>> I propose limiting p2m type to bits 52-55, and storing p2m access on
>> bits 56-58. So, p2m_ram_rw pages with a non-zero access would break the
>> "must be zero" requirement/convention.
>
> Eh, let's step back a bit from this.  The problem is that the IOMMU
> can't handle these bits being set, but the IOMMU can't handle paged-out
> or CoW memory either (because there's no way to cause the peripheral to
> retry a DMA that faulted).
>
> So can we just say that any VM that's going to use paging, sharing
> or access can't have a unified p2m and IOMMU table?  That's a bit ugly
> since the admin will have to make the choice, but it seems like the
> hardware is forcing us into it.
>
> Maybe the sensible choice is to default to _not_ sharing p2m and IOMMU
> tables, and have that be something an expert user can turn on (and which
> disables the other features)?
>
> Should we have a hypervisor-level interlock between PCI passthrough and
> paging/sharing/access anyway?  Doing both is unlikely to lead to
> happiness.

That's basically my thinking: these are mutually exclusive conditions. And
I'm curious as to whether that is what's causing the triple fault that's
killing my domains. You seem to imply that p2m+IOMMU unification is turned
on by default -- that would crash PoD domains on AMD, wouldn't it?
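For concreteness, here is a rough sketch of the bit split I proposed above:
p2m type in bits 52-55 of the l1 entry, p2m access in bits 56-58. The macro
and helper names below are made up for this mail (they are not what the
patch uses); the point is only to illustrate the layout. Any non-zero value
in these bits is precisely what a shared IOMMU table can't tolerate, per the
constraint Wei described (table 14 of the IOMMU manual), which is why Tim's
suggestion of not sharing p2m and IOMMU tables by default makes sense to me.

/* Illustrative sketch only -- hypothetical names, not the patch's code. */
#include <stdint.h>

#define P2M_TYPE_SHIFT     52                         /* bits 52-55: type   */
#define P2M_TYPE_MASK      (0xfULL << P2M_TYPE_SHIFT)
#define P2M_ACCESS_SHIFT   56                         /* bits 56-58: access */
#define P2M_ACCESS_MASK    (0x7ULL << P2M_ACCESS_SHIFT)

/* Pack a p2m type and access value into the software-available bits of an
 * l1 entry. Only legitimate when the p2m is NOT shared with the IOMMU,
 * since the AMD IOMMU wants bits 52-58 clear when PR=1. */
static inline uint64_t set_p2m_bits(uint64_t l1e, unsigned int type,
                                    unsigned int access)
{
    l1e &= ~(P2M_TYPE_MASK | P2M_ACCESS_MASK);
    l1e |= ((uint64_t)type   << P2M_TYPE_SHIFT)   & P2M_TYPE_MASK;
    l1e |= ((uint64_t)access << P2M_ACCESS_SHIFT) & P2M_ACCESS_MASK;
    return l1e;
}

static inline unsigned int get_p2m_type(uint64_t l1e)
{
    return (unsigned int)((l1e & P2M_TYPE_MASK) >> P2M_TYPE_SHIFT);
}

static inline unsigned int get_p2m_access(uint64_t l1e)
{
    return (unsigned int)((l1e & P2M_ACCESS_MASK) >> P2M_ACCESS_SHIFT);
}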

Andres

>
> Tim.
>
>> Right now, without any considerations for paging, we're storing the p2m
>> type in bits 52-58 when PR=1. That p2m type can be non zero. p2m_ram_ro,
>> p2m_mmio_dm, p2m_logdirty, p2m_populate_on_demand are all par for the
>> course. Given your statement above, and table 14 in the IOMMU manual, how
>> is this even working? Or is it not working?
>>
>> Would a violation of these rules cause a VMEXIT_SHUTDOWN?
>>
>> Thanks a lot for the info!
>> Andres
>> >
>> > Thanks,
>> > Wei
>> >
>> >>>> An alternative to enabling mem_access on AMD processors will be to
>> >>>> limit the access types to 3 bits. This would force disabling support
>> >>>> for two modes. I'm inclined to disable two out of X, W and WX. I don't
>> >>>> think they make sense without R permissions.
>> >> Thanks!
>> >> Andres
>> >>
>> >>>
>> >>> Thanks,
>> >>> Wei
>> >>>
>> >>>>
>> >>>> Finally, the triple fault. Maybe I'm missing something obvious.
>> >>>> Comments
>> >>>> welcome.
>> >>>>
>> >>>> Patch inline below, thanks!
>> >>>> Andres
>> >>>>
>> >>>> Enable AMD support for paging.
>> >>>>
>> >>>> Signed-off-by: Andres Lagar-Cavilla<andres@xxxxxxxxxxxxxxxx>
>> >>>> Signed-off-by: Adin Scannell<adin@xxxxxxxxxxx>
>> >>>>
>> >>>> diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/mem_event.c
>> >>>> --- a/xen/arch/x86/mm/mem_event.c
>> >>>> +++ b/xen/arch/x86/mm/mem_event.c
>> >>>> @@ -537,10 +537,6 @@ int mem_event_domctl(struct domain *d, x
>> >>>>                if ( !hap_enabled(d) )
>> >>>>                    break;
>> >>>>
>> >>>> -            /* Currently only EPT is supported */
>> >>>> -            if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
>> >>>> -                break;
>> >>>> -
>> >>>>                rc = -EXDEV;
>> >>>>                /* Disallow paging in a PoD guest */
>> >>>>                if ( p2m->pod.entry_count )
>> >>>> diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/p2m-pt.c
>> >>>> --- a/xen/arch/x86/mm/p2m-pt.c
>> >>>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> >>>> @@ -53,6 +53,20 @@
>> >>>>    #define P2M_BASE_FLAGS \
>> >>>>            (_PAGE_PRESENT | _PAGE_USER | _PAGE_DIRTY | _PAGE_ACCESSED)
>> >>>>
>> >>>> +#ifdef __x86_64__
>> >>>> +/* l1e_from_pfn is not designed to have INVALID_MFN stored. The 0xff..ff
>> >>>> + * value tramples over the higher-order bits used for flags (NX, p2mt,
>> >>>> + * etc.) This happens for paging entries. Thus we do this clip/unclip
>> >>>> + * juggle for l1 entries only (no paging superpages!) */
>> >>>> +#define EFF_MFN_WIDTH       (PADDR_BITS-PAGE_SHIFT) /* 40 bits */
>> >>>> +#define clipped_mfn(mfn)    ((mfn) & ((1UL << EFF_MFN_WIDTH) - 1))
>> >>>> +#define unclip_mfn(mfn)     ((clipped_mfn((mfn)) == INVALID_MFN) ? \
>> >>>> +                                INVALID_MFN : (mfn))
>> >>>> +#else
>> >>>> +#define clipped_mfn(mfn)    (mfn)
>> >>>> +#define unclip_mfn(mfn)     (mfn)
>> >>>> +#endif /* __x86_64__ */
>> >>>> +
>> >>>>    static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
>> >>>>    {
>> >>>>        unsigned long flags;
>> >>>> @@ -77,6 +91,9 @@ static unsigned long p2m_type_to_flags(p
>> >>>>        case p2m_invalid:
>> >>>>        case p2m_mmio_dm:
>> >>>>        case p2m_populate_on_demand:
>> >>>> +    case p2m_ram_paging_out:
>> >>>> +    case p2m_ram_paged:
>> >>>> +    case p2m_ram_paging_in:
>> >>>>        default:
>> >>>>            return flags;
>> >>>>        case p2m_ram_ro:
>> >>>> @@ -168,7 +185,7 @@ p2m_next_level(struct p2m_domain *p2m, m
>> >>>>                                          shift, max)) )
>> >>>>            return 0;
>> >>>>
>> >>>> -    /* PoD: Not present doesn't imply empty. */
>> >>>> +    /* PoD/paging: Not present doesn't imply empty. */
>> >>>>        if ( !l1e_get_flags(*p2m_entry) )
>> >>>>        {
>> >>>>            struct page_info *pg;
>> >>>> @@ -384,8 +401,9 @@ p2m_set_entry(struct p2m_domain *p2m, un
>> >>>>                                       0, L1_PAGETABLE_ENTRIES);
>> >>>>            ASSERT(p2m_entry);
>> >>>>
>> >>>> -        if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) )
>> >>>> -            entry_content = l1e_from_pfn(mfn_x(mfn),
>> >>>> +        if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) ||
>> >>>> +             (p2mt == p2m_ram_paged) || (p2mt == p2m_ram_paging_in) )
>> >>>> +            entry_content = l1e_from_pfn(clipped_mfn(mfn_x(mfn)),
>> >>>>                                             p2m_type_to_flags(p2mt, mfn));
>> >>>>            else
>> >>>>                entry_content = l1e_empty();
>> >>>> @@ -393,7 +411,7 @@ p2m_set_entry(struct p2m_domain *p2m, un
>> >>>>            if ( entry_content.l1 != 0 )
>> >>>>            {
>> >>>>                p2m_add_iommu_flags(&entry_content, 0,
>> >>>> iommu_pte_flags);
>> >>>> -            old_mfn = l1e_get_pfn(*p2m_entry);
>> >>>> +            old_mfn = unclip_mfn(l1e_get_pfn(*p2m_entry));
>> >>>>            }
>> >>>>            /* level 1 entry */
>> >>>>            p2m->write_p2m_entry(p2m, gfn, p2m_entry, table_mfn,
>> >>>> entry_content, 1);
>> >>>> @@ -615,11 +633,12 @@ pod_retry_l1:
>> >>>>                               sizeof(l1e));
>> >>>>
>> >>>>        if ( ret == 0 ) {
>> >>>> +        unsigned long l1e_mfn = unclip_mfn(l1e_get_pfn(l1e));
>> >>>>            p2mt = p2m_flags_to_type(l1e_get_flags(l1e));
>> >>>> -        ASSERT(l1e_get_pfn(l1e) != INVALID_MFN || !p2m_is_ram(p2mt));
>> >>>> +        ASSERT( (l1e_mfn != INVALID_MFN || !p2m_is_ram(p2mt)) ||
>> >>>> +                (l1e_mfn == INVALID_MFN && p2m_is_paging(p2mt)) );
>> >>>>
>> >>>> -        if ( p2m_flags_to_type(l1e_get_flags(l1e))
>> >>>> -             == p2m_populate_on_demand )
>> >>>> +        if ( p2mt == p2m_populate_on_demand )
>> >>>>            {
>> >>>>                /* The read has succeeded, so we know that the mapping
>> >>>>                 * exits at this point.  */
>> >>>> @@ -641,7 +660,7 @@ pod_retry_l1:
>> >>>>            }
>> >>>>
>> >>>>            if ( p2m_is_valid(p2mt) || p2m_is_grant(p2mt) )
>> >>>> -            mfn = _mfn(l1e_get_pfn(l1e));
>> >>>> +            mfn = _mfn(l1e_mfn);
>> >>>>            else
>> >>>>                /* XXX see above */
>> >>>>                p2mt = p2m_mmio_dm;
>> >>>> @@ -783,18 +802,26 @@ pod_retry_l2:
>> >>>>    pod_retry_l1:
>> >>>>        if ( (l1e_get_flags(*l1e) & _PAGE_PRESENT) == 0 )
>> >>>>        {
>> >>>> +        p2m_type_t l1t = p2m_flags_to_type(l1e_get_flags(*l1e));
>> >>>>            /* PoD: Try to populate */
>> >>>> -        if ( p2m_flags_to_type(l1e_get_flags(*l1e)) == p2m_populate_on_demand )
>> >>>> +        if ( l1t == p2m_populate_on_demand )
>> >>>>            {
>> >>>>                if ( q != p2m_query ) {
>> >>>>                    if ( !p2m_pod_demand_populate(p2m, gfn,
>> >>>> PAGE_ORDER_4K,
>> >>>> q) )
>> >>>>                        goto pod_retry_l1;
>> >>>>                } else
>> >>>>                    *t = p2m_populate_on_demand;
>> >>>> +        } else {
>> >>>> +            if ( p2m_is_paging(l1t) )
>> >>>> +            {
>> >>>> +                *t = l1t;
>> >>>> +                /* No need to unclip due to check below */
>> >>>> +                mfn = _mfn(l1e_get_pfn(*l1e));
>> >>>> +            }
>> >>>>            }
>> >>>>
>> >>>>            unmap_domain_page(l1e);
>> >>>> -        return _mfn(INVALID_MFN);
>> >>>> +        return (l1t == p2m_ram_paging_out) ? mfn : _mfn(INVALID_MFN);
>> >>>>        }
>> >>>>        mfn = _mfn(l1e_get_pfn(*l1e));
>> >>>>        *t = p2m_flags_to_type(l1e_get_flags(*l1e));
>> >>>> @@ -914,7 +941,7 @@ static void p2m_change_type_global(struc
>> >>>>                        flags = l1e_get_flags(l1e[i1]);
>> >>>>                        if ( p2m_flags_to_type(flags) != ot )
>> >>>>                            continue;
>> >>>> -                    mfn = l1e_get_pfn(l1e[i1]);
>> >>>> +                    mfn = unclip_mfn(l1e_get_pfn(l1e[i1]));
>> >>>>                        gfn = i1 + (i2 + (i3
>> >>>>    #if CONFIG_PAGING_LEVELS >= 4
>> >>>>                                         + (i4 * L3_PAGETABLE_ENTRIES)
>> >>>> @@ -923,7 +950,7 @@ static void p2m_change_type_global(struc
>> >>>>                               * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES;
>> >>>>                        /* create a new 1le entry with the new type */
>> >>>>                        flags = p2m_type_to_flags(nt, _mfn(mfn));
>> >>>> -                    l1e_content = l1e_from_pfn(mfn, flags);
>> >>>> -                    l1e_content = l1e_from_pfn(mfn, flags);
>> >>>> +                    l1e_content = l1e_from_pfn(clipped_mfn(mfn), flags);
>> >>>>                        p2m->write_p2m_entry(p2m, gfn, &l1e[i1],
>> >>>>                                             l1mfn, l1e_content, 1);
>> >>>>                    }
>> >>>> @@ -1073,7 +1100,7 @@ long p2m_pt_audit_p2m(struct p2m_domain
>> >>>>                                    entry_count++;
>> >>>>                                continue;
>> >>>>                            }
>> >>>> -                        mfn = l1e_get_pfn(l1e[i1]);
>> >>>> +                        mfn = unclip_mfn(l1e_get_pfn(l1e[i1]));
>> >>>>                            ASSERT(mfn_valid(_mfn(mfn)));
>> >>>>                            m2pfn = get_gpfn_from_mfn(mfn);
>> >>>>                            if ( m2pfn != gfn &&
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> Xen-devel mailing list
>> >>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> >>>> http://lists.xensource.com/xen-devel
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>>
>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

