
[Xen-devel] Possible bug/question in xen-hptool?



Hi,

I was looking at using xen-hptool (tools/misc/xen-hptool.c) to take one page of a guest domain offline.

I created a guest domain on Xen unstable and dumped its p2m:
# xen-mfndump dump-p2m 1
This gives the mfn backing dom1's pfn 0x1d:
pfn=0x1d ==> mfn=0x14ee17 (type 0x0)

Run `lookup-pte` to find the page-table mfn and the PTE that map mfn 0x14ee17:
# xen-mfndump lookup-pte 1 0x14ee17
 --- Lookig for PTEs mapping mfn 0x14ee17 for domain 1 ---
 Guest Width: 8, PT Levels: 4 P2M size: = 262144
 0x14ee17 <-- [0xd948e][29]: 0x1000014ee17027
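
As a cross-check of the line above, here is a minimal C sketch that decodes it. It assumes the output reads as `mfn <-- [page-table mfn][index]: PTE value` and that the guest uses ordinary x86-64 4-level paging with 4 KiB pages and 8-byte entries. The helper names are made up for illustration and are not part of xen-mfndump or libxc.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PTE_ADDR_MASK 0x000ffffffffff000ULL /* bits 12..51 hold the frame number */
#define PTE_LOW_FLAGS 0xfffULL              /* bits 0..11 hold the low flag bits */

/* Hypothetical helper: frame number stored in a 64-bit PTE. */
static uint64_t pte_to_mfn(uint64_t pte)
{
    return (pte & PTE_ADDR_MASK) >> PAGE_SHIFT;
}

/* Hypothetical helper: low flag bits (0x027 = Present | R/W | U/S | Accessed). */
static uint64_t pte_low_flags(uint64_t pte)
{
    return pte & PTE_LOW_FLAGS;
}

/* Hypothetical helper: machine address of entry `index` in the table at pt_mfn. */
static uint64_t pte_machine_addr(uint64_t pt_mfn, unsigned int index)
{
    assert(index < 512); /* 512 eight-byte entries per 4 KiB table */
    return (pt_mfn << PAGE_SHIFT) + (uint64_t)index * sizeof(uint64_t);
}

/* Self-check against the values printed by lookup-pte above. */
static void check_lookup_pte_line(void)
{
    assert(pte_to_mfn(0x1000014ee17027ULL) == 0x14ee17ULL);
    assert(pte_low_flags(0x1000014ee17027ULL) == 0x027ULL);
    assert(pte_machine_addr(0xd948eULL, 29) == 0xd948e0e8ULL);
}
```

Under these assumptions the entry sits in the page-table page at mfn 0xd948e, which is the same frame number that appears in the "Error pfn d948e" message further down.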

Now I use xen-hptool to take mfn 0x14ee17 offline:
# xen-hptool mem-offline 0x14ee17
Prepare to offline MEMORY mfn 14ee17
DOM1: No suspend port, try live migration
Failed to suspend guest 1 for mfn 14ee17
(Comment: I modified the code to bypass the suspension of dom1. I should either use libxl to suspend dom1 or use the event channel to notify dom1 to suspend, as the original code does. That is not the issue I am asking about here, and I don't think it affects the following discussion or conclusion.)
xc: error: Failure when submitting mmu updates: Internal error
xc: error: clear pte failed: Internal error
Memory mfn 14ee17 offlined successfully , this page is DOM1 page yet failed to be exchanged. current state is [PG_OFFLINE_PENDING, PG_OFFLINE_OWNED]
(XEN) mm.c:2004:d0v0 Error pfn d948e: rd=ffff83015d446000, od=ffff83017d8d0000, caf=8000000000000004, taf=1400000000000002
(XEN) mm.c:3544:d0v0 Could not get page for normal update

I looked into do_mmu_update() in xen/arch/x86/mm.c. The mmu_update fails because, inside do_mmu_update(), the owner of the page table that maps mfn 0x14ee17 (selected via pt_dom) is taken to be domain 0, while the page at mfn 0x14ee17 itself is owned by domain 1.

After digging into it, I found the following code confusing/suspicious:

Inside do_mmu_update() in xen/arch/x86/mm.c, pt_dom is assigned by this line:
if ( (pt_dom = foreigndom >> 16) != 0 )
However, in flush_mmu_updates() in tools/libxc/xc_private.c, foreigndom is assigned by the following line:
hypercall.arg[3] = mmu->subject;
where mmu->subject is the domain id of the guest that owns the page table.

My first question is:
Why should we use "foreigndom >> 16" instead of "foreigndom" to get pt_dom?
(When a page is marked offline, we can get the domid of the page from status, using status >> PG_OFFLINE_OWNER_SHIFT. But why should we shift right by 16 bits again in do_mmu_update()?)
(I think this explains why pt_owner is treated as domain 0: pt_owner simply keeps its default value, which is the domain of the current vcpu that issues the hypercall.)

pt_owner is retrieved by the following line:
if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
My second question is:
Why should we use "pt_dom - 1" instead of "pt_dom" here?
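
To make the two quoted expressions concrete, here is a minimal C sketch of what they compute when read at face value. It is not the hypervisor code: `decode_pt_domid()` and `USE_CALLER_DOMAIN` are hypothetical names, and the fallback branch only models the "default value" behaviour described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker: "keep the default, i.e. the calling vcpu's domain". */
#define USE_CALLER_DOMAIN (-1)

/*
 * Mirrors the two quoted lines:
 *   pt_dom   = foreigndom >> 16;
 *   pt_owner = rcu_lock_domain_by_id(pt_dom - 1);   (only when pt_dom != 0)
 * Returns the domid that would be looked up, or USE_CALLER_DOMAIN.
 */
static int decode_pt_domid(uint32_t foreigndom)
{
    uint32_t pt_dom = foreigndom >> 16;

    if (pt_dom != 0)
        return (int)(pt_dom - 1);
    return USE_CALLER_DOMAIN;
}

/* Self-check with the values discussed in this mail. */
static void check_decode(void)
{
    /* libxc passes mmu->subject == 1: the upper half is 0, so the default is kept. */
    assert(decode_pt_domid(1u) == USE_CALLER_DOMAIN);
    /* An upper half of 1 selects domid 0 because of the "- 1". */
    assert(decode_pt_domid((1u << 16) | 1u) == 0);
    /* An upper half of 2 selects domid 1. */
    assert(decode_pt_domid((2u << 16) | 1u) == 1);
}
```

Read this way, a plain domid of 1 in foreigndom never reaches the upper 16 bits, which matches the observation that the page-table owner falls back to the domain issuing the hypercall.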

If I take the old foreigndom (1), set the new foreigndom to (foreigndom << 16 | foreigndom), pass it as the last parameter of do_mmu_update(), and change "pt_dom - 1" to "pt_dom", then xen-hptool successfully takes the mfn offline. Here is the output after issuing the command:
Memory mfn 0x14ee17 offlined successfully, this page is DOM1 page and being swapped successfully, current state is [PG_OFFLINE_OFFLINED, PG_OFFLINE_OWNED]
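
For comparison, here is a small C sketch of the arithmetic behind that change. It is only an illustration under the assumption that the low 16 bits keep carrying the original mmu->subject value; both helper names are hypothetical, and neither exists in libxc or Xen.

```c
#include <assert.h>
#include <stdint.h>

/* Value used in the experiment above, paired with the modified "pt_dom" lookup. */
static uint32_t encode_modified(uint16_t domid)
{
    return ((uint32_t)domid << 16) | domid;
}

/*
 * Value that would select the same page-table owner if the hypervisor lines
 * "foreigndom >> 16" and "pt_dom - 1" were left unmodified: the upper half
 * then has to hold domid + 1, so that 0 can still mean "caller's domain".
 */
static uint32_t encode_unmodified(uint16_t pt_domid, uint16_t low_half)
{
    assert(pt_domid < 0xffff); /* pt_domid + 1 must still fit in 16 bits */
    return (((uint32_t)pt_domid + 1u) << 16) | low_half;
}

/* Self-check for dom1, the guest used in this mail. */
static void check_encode(void)
{
    assert(encode_modified(1) == 0x00010001u);
    assert(encode_unmodified(1, 1) == 0x00020001u);
    /* Decoding the second value with the unmodified lines yields domid 1. */
    assert(((encode_unmodified(1, 1) >> 16) - 1u) == 1u);
}
```

Whether the caller is actually expected to build the value like this is exactly what I am asking; the sketch only shows what the quoted lines imply arithmetically.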

I'm wondering whether this is a bug in do_mmu_update(), or at least an inconsistency in the do_mmu_update() code.
Of course, this could also be because I misunderstood something. If so, could you please let me know what I misunderstood and how I should correct it?

Thank you very much for your time!

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

