Re: [PATCH V4 03/15] x86/pv: Rewrite how building PV dom0 handles domheap mappings

Hi,

I've been trying to run this series for a while, but it crashes very
frequently starting from the patch that generalizes the mapcache. I think I've
tracked it down to this patch.

On Mon Nov 11, 2024 at 1:11 PM GMT, Elias El Yandouzi wrote:
> From: Hongyan Xia <hongyxia@xxxxxxxxxx>
>
> Building a PV dom0 is allocating from the domheap but uses it like the
> xenheap. Use the pages as they should be.
>
> Signed-off-by: Hongyan Xia <hongyxia@xxxxxxxxxx>
> Signed-off-by: Julien Grall <jgrall@xxxxxxxxxx>
> Signed-off-by: Elias El Yandouzi <eliasely@xxxxxxxxxx>
>
> ----
>     Changes in V4:
>         * Reduce the scope of l{1,2,4}start_mfn variables
>         * Make the macro `UNMAP_MAP_AND_ADVANCE` return the new virtual address
>
>     Changes in V3:
>         * Fold following patch 'x86/pv: Map L4 page table for shim domain'
>
>     Changes in V2:
>         * Clarify the commit message
>         * Break the patch in two parts
>
>     Changes since Hongyan's version:
>         * Rebase
>         * Remove spurious newline
>
> diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
> index 18b7a3e4e025..b03df609cadb 100644
> --- a/xen/arch/x86/pv/dom0_build.c
> +++ b/xen/arch/x86/pv/dom0_build.c
> @@ -382,6 +382,7 @@ static int __init dom0_construct(struct domain *d,
>      l3_pgentry_t *l3tab = NULL, *l3start = NULL;
>      l2_pgentry_t *l2tab = NULL, *l2start = NULL;
>      l1_pgentry_t *l1tab = NULL, *l1start = NULL;
> +    mfn_t l3start_mfn = INVALID_MFN;
>  
>      /*
>       * This fully describes the memory layout of the initial domain. All
> @@ -719,22 +720,34 @@ static int __init dom0_construct(struct domain *d,
>          v->arch.pv.event_callback_cs    = FLAT_COMPAT_KERNEL_CS;
>      }
>  
> +#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) ({  \
> +    do {                                                    \
> +        unmap_domain_page(virt_var);                        \
> +        mfn_var = maddr_to_mfn(maddr);                      \
> +        maddr += PAGE_SIZE;                                 \
> +        virt_var = map_domain_page(mfn_var);                \
> +    } while ( false );                                      \
> +    virt_var;                                               \
> +})
> +
>      if ( !compat )
>      {
> +        mfn_t l4start_mfn;
>          maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
> -        l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> +        l4tab = UNMAP_MAP_AND_ADVANCE(l4start_mfn, l4start, mpt_alloc);

Here l4start is mapped via the idle domain's mapcache in its perdomain area,
but...

>          clear_page(l4tab);
> -        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
> -                          d, INVALID_MFN, true);
> -        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
> +        init_xen_l4_slots(l4tab, l4start_mfn, d, INVALID_MFN, true);
> +        v->arch.guest_table = pagetable_from_mfn(l4start_mfn);
>      }
>      else
>      {
>          /* Monitor table already created by switch_compat(). */
> -        l4start = l4tab = __va(pagetable_get_paddr(v->arch.guest_table));
> +        mfn_t l4start_mfn = pagetable_get_mfn(v->arch.guest_table);
> +        l4start = l4tab = map_domain_page(l4start_mfn);
>          /* See public/xen.h on why the following is needed. */
>          maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
> -        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> +        UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
>      }
>  
>      l4tab += l4_table_offset(v_start);
> @@ -743,15 +756,17 @@ static int __init dom0_construct(struct domain *d,
>      {
>          if ( !((unsigned long)l1tab & (PAGE_SIZE-1)) )
>          {
> +            mfn_t l1start_mfn;
>              maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l1_page_table;
> -            l1start = l1tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> +            l1tab = UNMAP_MAP_AND_ADVANCE(l1start_mfn, l1start, mpt_alloc);
>              clear_page(l1tab);
>              if ( count == 0 )
>                  l1tab += l1_table_offset(v_start);
>              if ( !((unsigned long)l2tab & (PAGE_SIZE-1)) )
>              {
> +                mfn_t l2start_mfn;
>                  maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
> -                l2start = l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> +                l2tab = UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
>                  clear_page(l2tab);
>                  if ( count == 0 )
>                      l2tab += l2_table_offset(v_start);
> @@ -761,19 +776,19 @@ static int __init dom0_construct(struct domain *d,
>                      {
>                          maddr_to_page(mpt_alloc)->u.inuse.type_info =
>                              PGT_l3_page_table;
> -                        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> +                        UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
>                      }
>                      l3tab = l3start;
>                      clear_page(l3tab);
>                      if ( count == 0 )
>                          l3tab += l3_table_offset(v_start);
> -                    *l4tab = l4e_from_paddr(__pa(l3start), L4_PROT);
> +                    *l4tab = l4e_from_mfn(l3start_mfn, L4_PROT);
>                      l4tab++;
>                  }
> -                *l3tab = l3e_from_paddr(__pa(l2start), L3_PROT);
> +                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
>                  l3tab++;
>              }
> -            *l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
> +            *l2tab = l2e_from_mfn(l1start_mfn, L2_PROT);
>              l2tab++;
>          }
>          if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
> @@ -792,27 +807,32 @@ static int __init dom0_construct(struct domain *d,
>  
>      if ( compat )
>      {
> -        l2_pgentry_t *l2t;
> -
>          /* Ensure the first four L3 entries are all populated. */
>          for ( i = 0, l3tab = l3start; i < 4; ++i, ++l3tab )
>          {
>              if ( !l3e_get_intpte(*l3tab) )
>              {
> +                mfn_t l2start_mfn;
>                  maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
> -                l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> -                clear_page(l2tab);
> -                *l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
> +                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
> +                clear_page(l2start);
> +                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
>              }
>              if ( i == 3 )
>                  l3e_get_page(*l3tab)->u.inuse.type_info |= PGT_pae_xen_l2;
>          }
>  
> -        l2t = map_l2t_from_l3e(l3start[3]);
> -        init_xen_pae_l2_slots(l2t, d);
> -        unmap_domain_page(l2t);
> +        UNMAP_DOMAIN_PAGE(l2start);
> +        l2start = map_l2t_from_l3e(l3start[3]);
> +        init_xen_pae_l2_slots(l2start, d);
>      }
>  
> +#undef UNMAP_MAP_AND_ADVANCE
> +
> +    UNMAP_DOMAIN_PAGE(l1start);
> +    UNMAP_DOMAIN_PAGE(l2start);
> +    UNMAP_DOMAIN_PAGE(l3start);

... l4start is not unmapped here. That's a problem, because we're about to
switch onto dom0's page tables and start using its mapcache.

IMO, we should unmap here and remap in dom0's context. Otherwise l4start
becomes transiently stale: any pointer obtained via map_domain_page() before
the mapcache+pagetable switch is dangling once the switch has happened.
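
In other words, the order of operations ends up being roughly this (a sketch
of the flow, not the literal code):

    l4start = map_domain_page(l4start_mfn); /* slot in the idle mapcache   */
    ...
    /* page tables + mapcache switched to dom0's */
    ...
    pv_shim_setup_dom(d, l4start, ...);      /* dereferences a stale VA     */
    UNMAP_DOMAIN_PAGE(l4start);              /* pokes dom0's mapcache at a
                                              * VA it never handed out      */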

> +
>      /* Pages that are part of page tables must be read only. */
>      mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages, &flush_flags);
>  
> @@ -987,6 +1007,8 @@ static int __init dom0_construct(struct domain *d,
>          pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
>                            vphysmap_start, si);
>  
> +    UNMAP_DOMAIN_PAGE(l4start);

As it is, this unmap operates on the wrong mapcache, I think: the mapping was
created through the idle domain's mapcache, but by this point we're on
dom0's. I don't quite understand why the boot crashes are intermittent rather
than constant, but this looks like a bug.

What we want, I think, is:

  1. Increase the scope of l4start_mfn to be function-level.
  2. Do UNMAP_DOMAIN_PAGE(l4start) along with l1start, l2start and l3start.
  3. Include a pair of map_domain_page() and UNMAP_DOMAIN_PAGE() within the
     conditional, surrounding the pv_shim_setup_dom() call (see the sketch
     below).
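
Something along these lines (a hand-written sketch against this patch, not
compile-tested; I'm assuming the usual if ( pv_shim ) guard around
pv_shim_setup_dom(), and since mark_pv_pt_pages_rdonly() still takes l4start,
the final pre-switch unmap has to land after that call):

    /* 1) at the top of dom0_construct(), dropping the per-branch locals */
    mfn_t l3start_mfn = INVALID_MFN;
    mfn_t l4start_mfn = INVALID_MFN;

    ...

    UNMAP_DOMAIN_PAGE(l1start);
    UNMAP_DOMAIN_PAGE(l2start);
    UNMAP_DOMAIN_PAGE(l3start);

    /* Pages that are part of page tables must be read only. */
    mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages, &flush_flags);

    /* 2) last user of the idle-mapcache mapping; drop it before we move
     * onto dom0's page tables */
    UNMAP_DOMAIN_PAGE(l4start);

    ...

    /* 3) remap from dom0's context, only for as long as the shim needs it */
    if ( pv_shim )
    {
        l4start = map_domain_page(l4start_mfn);
        pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
                          vphysmap_start, si);
        UNMAP_DOMAIN_PAGE(l4start);
    }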

> +
>  #ifdef CONFIG_COMPAT
>      if ( compat )
>          xlat_start_info(si, pv_shim ? XLAT_start_info_console_domU

I'll keep testing in case I missed something, but with these changes in place
it seems to work.

Cheers,
Alejandro