
RE: [Xen-devel] [PATCH][VT] Patch to allow VMX domains to be destroyed or shut down cleanly



Jiang, Yunhong wrote:
> I think Xin Xiaohui's patch resolves most of the problems with the
> leftover-page issue.
> 
> Another small issue we found is in vmx_set_cr0(). On Windows, the
> guest enters and leaves protected mode many times, and the following
> code causes a problem:
>     if ((value & X86_CR0_PE) && (value & X86_CR0_PG) &&
>          !paging_enabled) {
>         /*
>          * The guest CR3 must be pointing to the guest physical.
>          */
>         if ( !VALID_MFN(mfn = get_mfn_from_pfn(
>                  d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) ||
>              !get_page(pfn_to_page(mfn), d->domain) )
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         {
>             printk("Invalid CR3 value = %lx", d->arch.arch_vmx.cpu_cr3);
>             domain_crash_synchronous(); /* need to take a clean path */
>         }
> 
> We should do the get_page when the guest sets CR3 while paging is not
> yet enabled, i.e.:
> 
>    case 3:
>     {
>         unsigned long old_base_mfn, mfn;
> 
>         /*
>          * If paging is not enabled yet, simply copy the value to CR3.
>          */
>         if (!vmx_paging_enabled(d)) {
>               ..... get page here....
> 

More precisely, it should be:
        if (!vmx_paging_enabled(d)) {
                get_page() on the new mfn;
                put_page() on the old mfn;
        }
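
Roughly, a sketch of what I have in mind for the mov-to-CR3 path
(untested; assuming the new guest CR3 arrives in 'value', and reusing
the helpers from the snippet quoted above):

    case 3:
    {
        unsigned long old_base_mfn, mfn;

        /*
         * If paging is not enabled yet, simply copy the value to CR3,
         * but take/drop references so the target page cannot go away.
         * Sketch only -- the rest of the case stays as it is today.
         */
        if (!vmx_paging_enabled(d)) {
            /* Reference the new CR3 target page now, not in vmx_set_cr0. */
            if ( !VALID_MFN(mfn = get_mfn_from_pfn(value >> PAGE_SHIFT)) ||
                 !get_page(pfn_to_page(mfn), d->domain) )
            {
                printk("Invalid CR3 value = %lx", value);
                domain_crash_synchronous();
            }

            /* Drop the reference taken for the previous not-yet-paging
             * CR3, if we had taken one. */
            if ( d->arch.arch_vmx.cpu_cr3 &&
                 VALID_MFN(old_base_mfn = get_mfn_from_pfn(
                     d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) )
                put_page(pfn_to_page(old_base_mfn));

            d->arch.arch_vmx.cpu_cr3 = value;
            break;
        }
        ...
    }

With the reference taken here, the get_page() in vmx_set_cr0 should no
longer be needed when the guest later turns paging on.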

Thanks
Yunhong Jiang

> After the above two changes, Windows was destroyed successfully after
> create/login/open IE/destroy.
> 
> Thanks
> Yunhong Jiang
> 
> Khoa Huynh wrote:
>> Keir Fraser wrote:
>>> I mean forcibly decrement them to zero and free them right there and
>>> then. Of course, as you point out, the problem is that some of the
>>> pages are mapped in domain0. I'm not sure how we can distinguish
>>> tainted refcnts from genuine external references. Perhaps there's a
>>> proper way we should be destructing the full shadow pagetables such
>>> that the refcnts end up at zero.
>> 
>> Thanks for your comment.  I have done extensive tracing through
>> the domain destruction code in the hypervisor in the last few days.
>> 
>> The bottom line: after the domain destruction code in the hypervisor
>> is done, all shadow pages were indeed freed up - even though
>> the shadow_tainted_refcnts flag was set.  I now believe the
>> remaining pages are genuinely externally referenced (possibly
>> by the qemu device model still running in domain0).
>> 
>> Here are more details on what I have found:
>> 
>> Ideally, when we destroy or shut down a VMX domain, the general
>> page reference counts should end up at 0 in shadow mode, so that the
>> pages can be released properly from the domain.
>> 
>> I have traced quite a bit of code for different scenarios
>> involving Windows XP running in a VMX domain.  I only
>> did simple operations in Windows XP, but I tried to destroy
>> the VMX domain at different times (e.g. during Windows XP boot,
>> during simple operations, after Windows XP has been shut down, etc.).
>> 
>> For non-VMX (Linux) domains, after we relinquish memory in
>> domain_relinquish_resources(), all pages in the domain's page
>> list indeed had a reference count of 0 and were properly freed from
>> the xen heap - just as we expected.
>> 
>> For VMX (e.g., Windows XP) domains, after we relinquish memory in
>> domain_relinquish_resources(), depending on how many activities
>> were done in Windows XP, there were anywhere from 2 to 100 pages
>> remaining just before the domain's structures were freed up
>> by the hypervisor.  Most of these pages still had a page
>> reference count of 1 and therefore could not be freed
>> from the heap by the hypervisor.  This prevents the rest
>> of the domain's resources from being released, and therefore,
>> 'xm list' still shows the VMX domains after they were destroyed.
>> 
>> In shadow mode, the following things could be reflected
>> in the page (general) reference counts:
>> 
>> (a) General stuff:
>>     - page is allocated (PGC_allocated)
>>     - page is pinned
>>     - page is pointed to by CR3's
>> (b) Shadow page tables (l1, l2, hl2, etc.)
>> (c) Out-of-sync entries
>> (d) Grant table mappings
>> (e) External references (not through grant table)
>> (f) Monitor page table references (external shadow mode)
>> (g) Writable PTE predictions
>> (h) GDTs/LDTs
>> 
>> So I put in a lot of instrumentation and tracing code,
>> and made sure that the above things were taken into
>> account and removed from the page reference counts
>> during the domain destruction code sequence in the
>> hypervisor.  During this code sequence, we disable
>> shadow mode (shadow_mode_disable()) and the
>> shadow_tainted_refcnts flag was set.  However,
>> much to my surprise, the page reference counts
>> were properly taken care of in shadow mode, and
>> all shadow pages (including those in l1, l2, hl2
>> tables and snapshots) were all freed up.
>> 
>> In particular, here's where each of the things
>> in the above list was taken into account during
>> the domain destruction code sequence in the
>> hypervisor:
>> 
>> (a) General stuff:
>>     - None of the remaining pages have the PGC_allocated
>>       flag set
>>     - None of the remaining pages are still pinned
>>     - The monitor shadow ref was 0, and all
>>       pages pointed to by CR3's were taken care
>>       of in free_shadow_pages()
>> (b) All shadow pages (including those pages in
>>     l1, l2, hl2, snapshots) were freed properly.
>>     I implemented counters to track all shadow
>>     page promotions/allocations and demotions/
>>     deallocations throughout the hypervisor code,
>>     and at the end after we relinquished all domain
>>     memory pages, these counters did indeed
>>     return to 0 - as we expected.
>> (c) out-of-sync entries -> in free_out_of_sync_state()
>>     called by free_shadow_pages().
>> (d) grant table mappings -> the count of active
>>     grant table mappings is 0 after the domain
>>     destruction sequence in the hypervisor is
>>     executed.
>> (e) external references not mapped via grant table
>>     -> I believe that these include the qemu-dm
>>     pages which still remain after we relinquish
>>     all domain memory pages - as the qemu-dm may
>>     still be active after a VMX domain has been
>>     destroyed.
>> (f) external monitor page references -> all references
>>     from the monitor page table are dropped in
>>     vmx_relinquish_resources(), and the monitor table
>>     itself is freed in domain_destruct().  In fact,
>>     in my code traces, the monitor shadow reference
>>     count was 0 after the domain destruction code in
>>     the hypervisor.
>> (g) writable PTE predictions -> I didn't see any pages in
>>     this category in my code traces, but if there
>>     are, they would be freed up in free_shadow_pages().
>> (h) GDTs/LDTs -> these were destroyed in destroy_gdt() and
>>     invalidate_shadow_ldt() called from
>>     domain_relinquish_resources().
>> 
>> Based on the code instrumentation and tracing above, I am
>> pretty confident that the shadow page reference counts
>> were handled properly during the domain destruction code
>> sequence in the hypervisor.  There is a problem in keeping
>> track of shadow page counts (domain->arch.shadow_page_count),
>> and I will submit a patch to fix this shortly.  However, this
>> does not really impact how shadow pages are handled.
>> 
>> Consequently, the pages that still remain after the domain
>> destruction code sequence in the hypervisor are externally
>> referenced and may belong to the qemu device model running
>> in domain0.  The fact that qemu-dm is still active for some
>> time after a VMX domain has been torn down in the hypervisor
>> is evident from examining the tools code (python).  In fact,
>> if I forcibly free these remaining pages from the xen heap,
>> the system/dom0 crashes.
>> 
>> Am I missing anything?  Your comments, suggestions, etc.,
>> are welcome!  Thanks for reading this rather long email :-)
>> 
>> Khoa H.
>> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

