
RE: [Xen-devel] Walking an HVM's shadow page tables and other memory management questions.



I took your suggestion and passed the PFN in (indirectly) from the Windows code, to be associated with the MFN found in the referenced grant table entry.

 

I modified create_grant_host_mapping() in mm.c to add an IF statement based on a new flag I defined:

 

    // RRC: new code.
    if ( flags & GNTMAP_contains_physaddr )
        return create_pfn_to_mfn_mapping(addr, frame, current);

 

The new routine does something similar to the XENMEM_add_to_physmap code.  This code appears to do what I wanted, which is to replace the mapping from GPFN to MFN.  When I run it, it does appear to change the mapping correctly, as you can see from my debug statements:

 

// RRC: new routine to use with grant table map ref.
static int create_pfn_to_mfn_mapping(
    uint64_t addr, mfn_t mfn, struct vcpu *v)
{
    unsigned long prev_mfn, gpfn;
    struct domain *d = v->domain;

    // This call is only valid for translated domains.
    // The MFN specified must not be 0.
    if ( !shadow_mode_translate(d) || (mfn == 0) )
        return -EINVAL;

    // Get the guest frame number from the pseudo-physical address.
    gpfn = (unsigned long)(addr >> PAGE_SHIFT);

    LOCK_BIGLOCK(d);

    /* Remove previously mapped page if it was present. */
    prev_mfn = gmfn_to_mfn(d, gpfn);

    // RRC: debug
    gdprintk(XENLOG_WARNING,
             "create_pfn_to_mfn_mapping: pfn:%"PRIx64" prev MFN:0x%x\n",
             (u64)gpfn, (u32)prev_mfn);

    if ( mfn_valid(prev_mfn) )
    {
        if ( IS_XEN_HEAP_FRAME(mfn_to_page(prev_mfn)) )
        {
            gdprintk(XENLOG_WARNING, "removing xen heap frame\n");
            /* Xen heap frames are simply unhooked from this phys slot. */
            guest_physmap_remove_page(d, gpfn, prev_mfn);
        }
        else
        {
            gdprintk(XENLOG_WARNING, "removing normal domain frame\n");
            /* Normal domain memory is freed, to avoid leaking memory. */
            guest_remove_page(d, gpfn);
        }
    }

    /* Map at new location. */
    guest_physmap_add_page(d, gpfn, mfn);

    UNLOCK_BIGLOCK(d);
    return 0;
}

 

 

(XEN) mm.c:2643:d3 grant host mapping: pa:1697000 frame:0x14f0d4
(XEN) mm.c:2608:d3 create_pfn_to_mfn_mapping: pfn:1697 prev MFN:0x7e5f0
(XEN) mm.c:2621:d3 removing normal domain frame
(XEN) memory.c:164:d3 guest_remove_page GMFN:0x1697, MFN:0x7e5f0
(XEN) memory.c:181:d3 guest_remove_page type_info:0xe8000001, count_info:0x80000004
(XEN) memory.c:192:d3 guest_remove_page page_is_removable:0x0
(XEN) common.c:2194:d3 sh_remove_all_mappings gmfn:0x7e5f0
(XEN) common.c:3098:d3 sh_p2m_remove_page  removing gfn=0x1697 mfn=0x7e5f0
(XEN) common.c:2194:d3 sh_remove_all_mappings gmfn:0x7e5f0
(XEN) common.c:3137:d3 shadow_guest_physmap_add_page: gfn 0x1697, mfn 0x14f0d4
(XEN) common.c:3147:d3 shadow_guest_physmap_add_page: gfn 0x1697, omfn 0xffffffff
(XEN) common.c:3172:d3 shadow_guest_physmap_add_page: sh_mfn_to_gfn ogfn 0x1697, mfn 0x14f0d4
(XEN) common.c:3204:d3 shadow_guest_physmap_add_page: shadow_set_p2m_entry gfn=0x1697 -> mfn 0x14f0d4
(XEN) common.c:3209:d3 shadow_guest_physmap_add_page: set_gpfn_from_mfn gfn=0x1697 -> mfn 0x14f0d4

 

 

I then attempt to read the newly re-mapped page from Windows, and I get an error message from get_page_from_l1e() in mm.c:

 

    /* Foreign mappings into guests in shadow external mode don't
     * contribute to writeable mapping refcounts.  (This allows the
     * qemu-dm helper process in dom0 to map the domain's memory without
     * messing up the count of "real" writable mappings.) */
    okay = (((l1e_get_flags(l1e) & _PAGE_RW) &&
             !(unlikely(shadow_mode_external(d) && (d != current->domain))))
            ? get_page_and_type(page, d, PGT_writable_page)
            : get_page(page, d));
    if ( !okay )
    {
        MEM_LOG("Error getting mfn %lx (pfn %lx) from L1 entry %" PRIpte
                " for dom%d",
                mfn, get_gpfn_from_mfn(mfn),
                l1e_get_intpte(l1e), d->domain_id);
    }

 

It looks like get_page_and_type() is returning 0:

 

(XEN) mm.c:633:d3 l1e_get_flags(l1e) =0x63, shadow_mode_external(d)= 0x4000, current->domain=0x3,get_page_and_type=0x0, get_page(page, d)=0x0
(XEN) mm.c:639:d3 Error getting mfn 14f0d4 (pfn 1697) from L1 entry 000000014f0d4063 for dom3

 

Domain 2 is the grantor and Domain 3 is the grantee in this example.  It appears to me that the call is failing because dom3 is not the owner of the shared page; get_page() performs this ownership check:

 

        if ( unlikely((x & PGC_count_mask) == 0) ||  /* Not allocated? */
             unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */
             unlikely(d != _domain) )                /* Wrong owner? */

 

Any suggestions?

 

 


From: Keir Fraser [mailto:keir@xxxxxxxxxxxxx]
Sent: Tuesday, July 03, 2007 5:04 AM
To: Roger Cruz; xen-devel@xxxxxxxxxxxxxxxxxxx
Cc: Tim Deegan
Subject: Re: [Xen-devel] Walking an HVM's shadow page tables and other memory management questions.

 

You are barking up the wrong tree by attempting to poke the mapping into a guest pte. After all, what would you poke? Guest PTEs address the guest-pseudo-physical space, in which the foreign page is not present.

You actually want to follow ia64’s lead here. When running in ‘auto-translate’ mode (i.e., on shadow page tables) then the guest address for a host mapping should not be interpreted as a virtual address but instead as a pseudo-physical address.

So you will be mapping a grant reference into the pseudo-physical space and then a guest PTE can map the appropriate pseudo-physical frame number in the usual way. The slightly tricky bit is working out how to encode a grant-mapping in the p2m table. My advice would be to use a page-not-present encoding (p2m table entries are the same format as page-table entries) as this then lets you define special encodings of your choice with most of the remaining bits.
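 

For illustration only (the tag bit and layout below are arbitrary choices of mine, not anything the p2m code defines today), such an encoding could look like:

    /* Purely illustrative: with _PAGE_PRESENT clear the hardware ignores
     * the entry, so the remaining bits can carry a software-defined tag
     * plus the grant reference itself. */
    #define P2M_GRANT_TAG  (1UL << 9)   /* bits 9-11 are software-available */

    static inline l1_pgentry_t p2m_grant_entry(grant_ref_t ref)
    {
        /* Present bit clear; grant ref stashed in the frame-number bits. */
        return l1e_from_pfn(ref, P2M_GRANT_TAG);
    }

Bits 9-11 are reserved for software even in present entries, which makes them a natural home for the tag.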

Tim Deegan may be able to give more advice.

 -- Keir

On 2/7/07 21:25, "Roger Cruz" <rcruz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hello,
 
I’m new to Xen and especially to the hypervisor code.  I’m working off a 3.0.4.1 base and have the following questions regarding the memory management code for an x86, 32-bit platform (capable of supporting PAE).  I’m doing some research into providing grant table hypercall support from a Windows 2003 HVM.  I have made all the necessary changes to allow the hypercall to make it into the hypervisor and execute the correct grant table ops.
 
I’m now testing the GNTTABOP_map_grant_ref with the GNTMAP_host_map flag, and it correctly obtains the MFN from the grantor domain.  It then attempts to take the HVM host VA address (a Windows kernel VA from the non-paged pool) and walk the guest’s page table to obtain the PFN.  I am building the hypervisor by simply typing “make xen” without any other configuration changes from a default source installation.
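 
For reference, the call I issue from the Windows side looks roughly like this (field names are from xen/include/public/grant_table.h; the HYPERVISOR_grant_table_op wrapper is my own port, so treat it as a sketch):
 
    struct gnttab_map_grant_ref op;
    int rc;

    op.host_addr = (uint64_t)(ULONG_PTR)va;  /* non-paged-pool kernel VA */
    op.flags     = GNTMAP_host_map;
    op.ref       = gref;                     /* grant ref from the grantor */
    op.dom       = grantor_domid;

    rc = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1);
    /* On success rc == 0, op.status == GNTST_okay, and op.handle
     * holds the handle needed later for the unmap op. */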
 
The first problem I encountered is that it appears the code assumes the guest to be in PAE mode.  In particular, guest_walk_tables() in xen/arch/x86/mm/shadow/multi.c, line 252 has this code snippet:
 
#else /* PAE only... */
    /* Get l3e from the cache of the guest's top level table */
    gw->l3e = (guest_l3e_t *)&v->arch.shadow.gl3e[guest_l3_table_offset(va)];
#endif /* PAE or 64... */
 
This accesses the L3 entries from the shadow page tables.  When I instrument this code, I get l3e equal to 0, as shown below (the line #s won’t match because of the instrumentation).
 
(XEN) multi.c:236:d1 guest_walk_tables: va: 0x81699000.
(XEN) multi.c:257:d1 guest_walk_tables: get l3e from cache: 0xff1a6ed0.
(XEN) multi.c:263:d1 guest_walk_tables: l3e not present: 0x0.
(XEN) multi.c:574:d1 sh_guest_map_l1e: va:81699000
 
If I add the /PAE switch to the boot.ini file, then I can get past this problem.  Hence my statement that the hypervisor appears to assume guests are running with at least PAE mode enabled, which may not be the case.  Could someone please guide me here?
 
The second problem I encountered also has to do with walking the shadow page tables to obtain the MFN of the underlying Windows VA address.  sh_guest_map_l1e(), line 520 in the same file, executes this code after it walks the guest page tables to fill in the walk_t gw variable:
 
    if ( gw.l2e &&
         (guest_l2e_get_flags(*gw.l2e) & _PAGE_PRESENT) &&
         !(guest_supports_superpages(v) &&
           (guest_l2e_get_flags(*gw.l2e) & _PAGE_PSE)) )
 
 
(XEN) mm.c:2573:d1 grant host mapping: va:81696000 frame:0x15f140
(XEN) mm.c:2507:d1 grant va mapping: va:81696000
(XEN) multi.c:236:d1 guest_walk_tables: va: 0x81696000.
(XEN) multi.c:257:d1 guest_walk_tables: get l3e from cache: 0xff1a6ed0.
(XEN) multi.c:270:d1 guest_walk_tables: l3e flags: 0x1, pfn:0xe9a, mfn:0x9e13d
(XEN) multi.c:285:d1 hypervisor l2e mapped address 0xfec8b058
(XEN) multi.c:315:d1 large pages. 0x1e3
(XEN) multi.c:574:d1 sh_guest_map_l1e: va:81696000
(XEN) multi.c:579:d1 sh_guest_map_l1e: gw.l2e flags:0x1e3, supports large 1
(XEN) multi.c:596:d1 pl1e :0x0,
(XEN) mm.c:2512:d1 Could not find L1 PTE for address 81696000
 
 
It looks like it specifically avoids mapping a superpage found in a Windows PDE into the hypervisor’s virtual space (which I assume uses 4KB mappings).  What puzzles me is that, for a hypercall to read its arguments from the caller’s guest space, it uses __hvm_copy, which calls shadow_gva_to_gfn() to walk the guest’s shadow page tables to get to the underlying MFN.  Couldn’t this code here also do the same?  In other words, something along these lines (a sketch only; I am assuming the usual INVALID_GFN failure convention and have not thought through locking):
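 
    /* Sketch: translate the guest VA the way __hvm_copy does, rather
     * than mapping the guest L1e directly. */
    unsigned long gfn = shadow_gva_to_gfn(v, va); /* guest/shadow walk */
    if ( gfn == INVALID_GFN )
        return -EFAULT;                           /* VA not mapped */
    mfn = gmfn_to_mfn(v->domain, gfn);            /* p2m lookup */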
 
Thanks in advance for any insight into this area.
 
 
Roger Cruz
Principal SW Engineer
Marathon Technologies Corp.
978-489-1153



 

