
RE: [Xen-ia64-devel] RE: [PATCH] Patch to make latest hg multi-domain back to work



Hi, Dan,
        Could you elaborate on how your latest patch works differently and 
how it fixes the potential issue?

-               *pteval = vcpu->arch.dtlb_pte;
+               if (vcpu->domain==dom0 && !in_tpa) *pteval = trp->page_flags;
+               else *pteval = vcpu->arch.dtlb_pte;
+               printf("DTLB MATCH... NEW, DOM%s, %s\n", vcpu->domain==dom0?
+                       "0":"U", in_tpa?"vcpu_tpa":"ia64_do_page_fault");

        The new restriction seems to apply only to dom0, yet dom0's guest physical 
addresses are exactly the same as its machine addresses. Under that assumption, 
trp->page_flags should equal the guest pte (vcpu->arch.dtlb_pte), shouldn't it? 
So I'm not sure what the trick is here.

Thanks,
Kevin

>-----Original Message-----
>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Tian, Kevin
>Sent: September 8, 2005 17:16
>To: Magenheimer, Dan (HP Labs Fort Collins); Byrne, John (HP Labs)
>Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>Subject: [Xen-ia64-devel] RE: [PATCH] Patch to make latest hg multi-domain back to work
>
>Still works for me.
>
>Thanks,
>Kevin
>
>>-----Original Message-----
>>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>>Sent: September 8, 2005 4:57
>>To: Tian, Kevin; Byrne, John (HP Labs)
>>Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>Subject: RE: [PATCH] Patch to make latest hg multi-domain back to work
>>
>>It appears that the patch below has created some instability
>>in domain0.  I regularly see a crash now in domain0 when
>>compiling linux.  I changed back to the old code and the
>>crash seems to go away.  Since it is unpredictable, I
>>changed back to the new code AND added printfs around
>>the new code in vcpu_translate and domain0 fails immediately after
>>the printf (but ONLY when it is called from ia64_do_page_fault...
>>it's OK when called from vcpu_tpa).
>>
>>The attached patch returns stability to the system.  It
>>is definitely not a final patch (for example it's not
>>SMP-safe), but I thought I would
>>post it if anybody is trying to get some work done and
>>domain0 keeps crashing intermittently.
>>
>>Kevin, John, I still haven't successfully reproduced your
>>multi-domain success, so please try this patch with
>>the second domain.
>>
>>Thanks,
>>Dan
>>
>>> -----Original Message-----
>>> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
>>> Sent: Friday, September 02, 2005 8:18 AM
>>> To: Magenheimer, Dan (HP Labs Fort Collins); Byrne, John (HP Labs)
>>> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>> Subject: [PATCH] Patch to make latest hg multi-domain back to work
>>>
>>> I saw some intermittent/weird behavior on the latest xen-ia64-unstable.hg
>>> (Rev 6461): sometimes I can log in to the xenU shell, sometimes it
>>> hangs after "Mounting root fs...", and sometimes the whole system
>>> breaks as follows:
>>>
>>> (XEN) ia64_fault: General Exception: IA-64 Reserved Register/Field fault (data access): reflecting
>>> (XEN) $$$$$ PANIC in domain 1 (k6=f000000007fd8000): psr.ic off, delivering fault=5300,ipsr=0000121208026010,iip=a00000010000cd00,ifa=f000000007fdfd60,isr=00000a0c00000004,PSCB.iip*** ADD REGISTER DUMP HERE FOR DEBUGGING
>>> (XEN) BUG at domain.c:311
>>> (XEN) priv_emulate: priv_handle_op fails, isr=0000000000000000
>>> (XEN)
>>>
>>> I finally found the root cause: match_dtlb should return the guest
>>> pte instead of the machine pte, because translate_machine_pte is
>>> always invoked after vcpu_translate. Translate_machine_pte expects
>>> a guest pte and walks the 3-level tables to get the machine frame
>>> number. Why does it happen so rarely?
>>>     - For xen0, guest pfn == machine pfn, so nothing happens.
>>>     - For xenU, there is currently only one vtlb entry, which caches
>>> the most recently inserted TC entry. Say the current vtlb entry for VA1
>>> has been inserted into the machine TLB. Normally many itc instructions
>>> are issued before the machine TC for VA1 is purged, and each insertion
>>> overwrites the single vtlb entry. So in 99.99% of cases, once the guest
>>> va is purged from the machine TLB/vhpt and triggers a TLB miss again,
>>> match_dtlb will fail.
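To make the guest-pte versus machine-pte point concrete, here is a minimal sketch of the double translation. It is not Xen code: the `sketch_` names, the 4-page domain, the flat p2m array and the pte layout (pfn above a 14-bit flag field) are all assumptions chosen only for illustration. It models a translate step that expects a guest pte, and shows what happens if it is handed an already-translated machine pte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: pfn in the high bits, flags in the low 14 bits. */
#define SKETCH_PAGE_SHIFT  14
#define SKETCH_FLAG_MASK   ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)
#define SKETCH_GUEST_PAGES 4

/* Hypothetical stand-in for a domain's guest-pfn -> machine-pfn table. */
struct sketch_domain {
    uint64_t p2m[SKETCH_GUEST_PAGES];
};

/*
 * Translate a guest pte into a machine pte.
 * Returns false when the pfn in the input lies outside the domain's
 * guest physical range, which is what happens when a machine pte is
 * passed in by mistake for a domain whose p2m is not the identity.
 */
static bool sketch_translate_pte(const struct sketch_domain *d,
                                 uint64_t guest_pte, uint64_t *machine_pte)
{
    uint64_t gpfn = guest_pte >> SKETCH_PAGE_SHIFT;

    if (gpfn >= SKETCH_GUEST_PAGES)
        return false; /* "bad mpa": input was not a valid guest pte */

    *machine_pte = (d->p2m[gpfn] << SKETCH_PAGE_SHIFT)
                 | (guest_pte & SKETCH_FLAG_MASK);
    return true;
}
```

With an identity p2m (the xen0 case) translating twice is harmless because the second pass returns its input unchanged. With a non-identity p2m (the xenU case) the second pass sees a machine pfn that is out of range for the guest and fails.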
>>>
>>> But there is also a corner case where the vtlb entry has not been
>>> updated but the machine TC entry for VA1 has been purged. In this case,
>>> if VA1 is accessed again immediately, match_dtlb returns true and the
>>> problematic code becomes the culprit.
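The single-entry behavior described above can be sketched as follows. Again this is only an illustrative model, not the Xen data structure: the `sketch_` names and fields are assumptions, and the machine TLB is deliberately not modeled because the corner case depends only on whether another insert happened in between.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-entry virtual TLB: every insert overwrites the slot. */
struct sketch_one_entry_vtlb {
    bool     valid;
    uint64_t vpn;       /* virtual page number of the cached mapping */
    uint64_t guest_pte; /* pte exactly as the guest inserted it */
};

/* Record the most recent insertion, discarding whatever was cached. */
static void sketch_vtlb_insert(struct sketch_one_entry_vtlb *t,
                               uint64_t vpn, uint64_t guest_pte)
{
    t->valid = true;
    t->vpn = vpn;
    t->guest_pte = guest_pte;
}

/* Hit only if the single cached entry is for the requested page. */
static bool sketch_vtlb_match(const struct sketch_one_entry_vtlb *t,
                              uint64_t vpn, uint64_t *guest_pte)
{
    if (!t->valid || t->vpn != vpn)
        return false;
    *guest_pte = t->guest_pte;
    return true;
}
```

In the common case an insert for some other page lands between the insert for VA1 and the next miss on VA1, so the match fails and the buggy path is never reached. The match succeeds only when no other insert intervened, which is the rare window described above.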
>>>
>>> For example, sometimes I saw:
>>> (XEN) translate_domain_pte: bad mpa=000000007f170080 (> 0000000010004000),vadr=5fffff0000000080,pteval=000000007f170561,itir=0000000000000038
>>> (XEN) lookup_domain_mpa: bad mpa 000000007f170080 (> 0000000010004000
>>> The access above happens when vcpu_translate tries to access the guest
>>> SVHPT. You can see that 0x7f170080 is actually a machine address. When
>>> 0x7f170080 is passed into translate_machine_pte, the warning is shown
>>> and the address is finally mapped to machine pfn 0. (Maybe we should
>>> change this error condition to a panic instead of returning an
>>> incorrect pfn.)
>>>
>>> Then things all went weird:
>>> (XEN) translate_domain_pte: bad mpa=0000eef3f000e738 (> 0000000010004000),vadr=4000000000042738,pteval=f000eef3f000eef3,itir=0000000000026238
>>> (XEN) lookup_domain_mpa: bad mpa 0000eef3f000e738 (> 0000000010004000
>>>
>>> And finally the GP fault happens. This bug has actually been lurking
>>> for a long time, but was seldom triggered.
>>>
>>> John, please test on your side with all the patches I sent out
>>> today (including the max_page one). I believe we can call it
>>> done now. ;-)
>>>
>>> BTW, Dan, there are two heads on the current xen-ia64-unstable.hg.
>>> Please do a merge.
>>>
>>> Signed-off-by: Kevin Tian <Kevin.tian@xxxxxxxxx>
>>>
>>> diff -r 68d8a0a1aeb7 xen/arch/ia64/xen/vcpu.c
>>> --- a/xen/arch/ia64/xen/vcpu.c      Thu Sep  1 21:51:57 2005
>>> +++ b/xen/arch/ia64/xen/vcpu.c      Fri Sep  2 21:30:01 2005
>>> @@ -1315,7 +1315,8 @@
>>>     /* check 1-entry TLB */
>>>     if ((trp = match_dtlb(vcpu,address))) {
>>>             dtlb_translate_count++;
>>> -           *pteval = trp->page_flags;
>>> +           //*pteval = trp->page_flags;
>>> +           *pteval = vcpu->arch.dtlb_pte;
>>>             *itir = trp->itir;
>>>             return IA64_NO_FAULT;
>>>     }
>>>
>>> Thanks,
>>> Kevin
>>>
>
>_______________________________________________
>Xen-ia64-devel mailing list
>Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>http://lists.xensource.com/xen-ia64-devel



 

