
[Xen-ia64-devel] RE: ar.unat[patch] fixed this ar.uant issue.[patch] fixed ar.unat save/restore issue


  • To: "Magenheimer, Dan (HP Labs Fort Collins)" <dan.magenheimer@xxxxxx>
  • From: "Xu, Anthony" <anthony.xu@xxxxxxxxx>
  • Date: Mon, 14 Nov 2005 18:37:09 +0800
  • Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 14 Nov 2005 10:37:06 +0000
  • List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
  • Thread-index: AcXf7/qlxt/UBLYXQj6idPbhpb4eQQAKxkpQAAOxG7AACK2+8AANzkbgABfse2AAt6FwgAAPttsgAL0t6+AAgrfsAA==
  • Thread-topic: ar.unat[patch] fixed this ar.uant issue.[patch] fixed ar.unat save/restore issue

Yes, this patch may let dom0 get through the ltp test.
Your logic for handling a NaT consumption fault is:

If (register NaT bit fault)
        Inject NaT consumption fault to guest;
Else (this is a NaT page fault)
        Attempt to handle as privop;
        If (it is a privop)
                Return;
        Else
                Inject NaT consumption fault to guest;

A NaT page fault is usually caused by an instruction that accesses a page
whose page attribute is NaTPage, so it must be a ld or st instruction; it is
definitely not a privop instruction. So there is no need to attempt to handle
a NaT page fault as a privop; we should inject it to the guest directly.
There should be no register NaT bit fault when running ltp,
so the logic I have in mind is:

If (register NaT bit fault)
        Panic();
Else
        Inject NaT consumption fault to guest;

If it panics, ar.unat is probably being handled incorrectly somewhere nearby.
We should take this chance to fix all ar.unat-related bugs.
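
To make the comparison concrete, here is a rough C sketch of the two decision paths. It is not the actual ia64_handle_reflection code: the enum and function names are made up for illustration, and it assumes that bits 7:4 of the ISR hold 1 for a register NaT consumption fault and 2 for a NaT page consumption fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical action names, for illustration only (not Xen identifiers). */
enum nat_action {
    NAT_REFLECT_TO_GUEST,   /* inject NaT consumption fault into the guest */
    NAT_HANDLED_AS_PRIVOP,  /* privop emulation succeeded, nothing injected */
    NAT_PANIC               /* stop so the ar.unat handling can be debugged */
};

/* Assumption: ISR bits 7:4 are 1 for register NaT, 2 for NaT page. */
bool is_register_nat_fault(uint64_t isr)
{
    return ((isr >> 4) & 0xf) == 1;
}

/* The logic as described for the committed fix. privop_emulated stands in
 * for the result of trying to emulate the faulting instruction. */
enum nat_action current_logic(uint64_t isr, bool privop_emulated)
{
    if (is_register_nat_fault(isr))
        return NAT_REFLECT_TO_GUEST;
    if (privop_emulated)
        return NAT_HANDLED_AS_PRIVOP;
    return NAT_REFLECT_TO_GUEST;
}

/* The logic proposed in this mail: a register NaT fault is treated as a
 * bug to be debugged, a NaT page fault always goes to the guest. */
enum nat_action proposed_logic(uint64_t isr)
{
    if (is_register_nat_fault(isr))
        return NAT_PANIC;
    return NAT_REFLECT_TO_GUEST;
}
```

The only behavioural differences are the panic on a register NaT fault and the removal of the privop attempt on a NaT page fault.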

>I am still not sure about the use of eml_unat.  I commented
>out your code (in ia64_handle_reflection) that sets it to zero

Yes, you can comment out this code; it was used for debugging the ar.unat fault.
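
For anyone following the ar.unat part of this thread, below is a small C model of why ar.unat has to be saved after a spill and restored before a fill. It is only a toy: the struct, the function names and the save-area address are invented for illustration and are not Xen code. It assumes the architected behaviour that a spill records the register's NaT bit in the ar.unat bit selected by address bits 8:3, and that a fill reads it back from the same bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the CPU state we care about: just ar.unat. */
struct cpu_model {
    uint64_t unat;
};

/* Model of st8.spill: store the NaT bit in unat at bit addr{8:3}. */
void spill(struct cpu_model *cpu, uint64_t addr, bool nat)
{
    unsigned bit = (unsigned)((addr >> 3) & 63);
    if (nat)
        cpu->unat |= (uint64_t)1 << bit;
    else
        cpu->unat &= ~((uint64_t)1 << bit);
}

/* Model of ld8.fill: the NaT bit comes back from unat bit addr{8:3}. */
bool fill(const struct cpu_model *cpu, uint64_t addr)
{
    return ((cpu->unat >> ((addr >> 3) & 63)) & 1) != 0;
}

/* Spill one register, let other code clobber ar.unat, then fill it again.
 * Returns the NaT bit seen after the fill. */
bool roundtrip(bool nat, bool restore_unat)
{
    struct cpu_model cpu = { 0 };
    uint64_t slot = 0x1000 + 16 * 8;   /* hypothetical save slot for r16 */

    spill(&cpu, slot, nat);
    uint64_t saved_unat = cpu.unat;    /* save ar.unat after the spill */

    cpu.unat = 0;                      /* ar.unat is reused in between */

    if (restore_unat)
        cpu.unat = saved_unat;         /* restore ar.unat before the fill */
    return fill(&cpu, slot);
}
```

With restore_unat set, the NaT bit survives the round trip; without it, a register that was NaT comes back looking valid, which is the kind of silent corruption the regcheck tool reports.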



Thanks
-Anthony






>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>Sent: November 12, 2005 3:30
>To: Xu, Anthony
>Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: ar.unat[patch] fixed this ar.uant issue.[patch] fixed ar.unat
>save/restore issue
>
>Anthony --
>
>I just committed a fix to allow nat consumption faults to
>be delivered again.  I think this is now necessary after
>the region0 virtual address fixes needed for ltp-mmap09.
>Without these nat fixes, ltp-getpeername01 reproducibly
>goes into an infinite loop reporting NaT errors (because
>the "return" in the reflection code doesn't result in
>the NaT getting reflected to the guest).
>
>I have left the printfs so any code that results in
>an inst/data page nat consumption fault (e.g. certain
>situations where the zero page is accessed) will be
>very chatty, but I think that's OK for now until we
>are sure we have fixed all NaT problems.
>
>I am still not sure about the use of eml_unat.  I commented
>out your code (in ia64_handle_reflection) that sets it to zero
>and Tony's checker program and getpeername01 still work.
>If this (setting eml_unat to zero) is handling some
>special case that I am not testing for, please let me
>know.
>
>Thanks,
>Dan
>
>> -----Original Message-----
>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> Sent: Monday, November 07, 2005 6:30 PM
>> To: Magenheimer, Dan (HP Labs Fort Collins)
>> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: RE: ar.unat[patch] fixed this ar.uant issue.[patch]
>> fixed ar.unat save/restore issue
>>
>> See my comments,
>>
>> >-----Original Message-----
>> >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> >Sent: November 8, 2005 2:07
>> >To: Xu, Anthony
>> >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >Subject: RE: ar.unat[patch] fixed this ar.uant issue.[patch] fixed ar.unat
>> >save/restore issue
>> >
>> >Another NaT question...
>> >
>> >>I recall that some time ago (around the time of the merge)
>> >>you submitted some patches related to fixing ar.unat saving
>> >>and restoring.
>> >
>> >Another part of your earlier patch was a change in
>> >ia64_handle_reflection.  I still periodically get the
>> >message:
>> >
>> >   NaT fault... attempting to handle as privop
>> >
>> >Since your latest fix, Tony's regcheck tool no longer
>> >reports ar.unat as being saved/restored incorrectly.
>> >I was hoping that the above message would go away also,
>> >but it has not.  I see it a couple times at boot and
>> >a couple times for every linux compile (at the end so
>> >it is probably the linker or some other link-related
>> >tool).  I have also seen programs segfault after printing
>> >this message.  So I went to look at the Xen/ia64 code where
>> >this is printed.
>> >
>>
>> I have not seen NaT consumption faults or segmentation faults for
>> a long time in your build test or the ltp test. If I had, I would
>> definitely have tried to fix them.
>>
>> >It doesn't look right to me.  There are two issues:
>> >
>> >1) Your patch added a "return"... I think this means that
>> >   NaT faults will never get reflected to a guest (even
>> >   Register NaT Consumption faults).
>>
>> Yes, you are right, we should inject NaT consumption faults
>> into the guest, but as far as I know there should be no NaT
>> consumption faults in Linux, so I simply added a "return". I think
>> the best way is to add a "panic" at this place; this will force
>> us to debug the issue rather than temporarily work around it.
>>
>>
>> >2) Since a Instruction NaTPage Consumption fault has higher
>> >   priority than a Privileged Operation fault, I think the
>> >   original printf/priv_emulate code was intended to catch
>> >   this case and properly emulate a privileged instruction
>> >   on a NaTPage.  I think it may also be necessary if a Data
>> >   NaTPage Consumption fault is incurred when the privop
>> >   emulation code fetches the instruction.  (The code in
>> >   ia64_handle_reflection should probably check the ISR to
>> >   avoid calling priv_emulate for other kinds of NaT
>> >   Consumption though.)
>>
>> I had been curious why the emulate function is used to handle
>> NaT consumption.
>> Now I understand; thank you for your detailed explanation. Maybe
>> we need to put more comments in confusing places like this.
>>
>>
>>
>> >You know more about NaT's than I do... could you recheck
>> >this code in ia64_handle_reflection please?  Do you have
>> >any test code that provokes any of these NaT faults?
>> >
>>
>> It is very kind of you to say that; unfortunately, I have not
>> seen those issues. What I suspect is that dom0 does a bank switch
>> on the shared page without taking ar.unat into account.
>>
>> Anyway, I'll try to provoke this fault. If I find it, I'll
>> definitely fix it.
>>
>> >Thanks.
>> >Dan
>> >
>> >> -----Original Message-----
>> >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> >> Sent: Friday, November 04, 2005 12:10 AM
>> >> To: Magenheimer, Dan (HP Labs Fort Collins)
>> >> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> Subject: RE: ar.unat[patch] fixed this ar.uant issue.[patch]
>> >> fixed ar.unat save/restore issue
>> >>
>> >> >I am curious about the use of B1NATS in the code
>> >> >around this patch.  Under what circumstances does
>> >> >this get set/used?
>> >>
>> >> 1. emulate  bsw1, bsw0
>> >> 2. emulate rfi.
>> >> 3. inject fault to guest.
>> >>
>> >> >There is similar unat code in
>> >> >fast_tick (default off) and fast_reflect (default on)
>> >> >and I am wondering if similar unat changes are needed
>> >> >and whether it is now OK to turn on HANDLE_AR_UNAT
>> >> >(which is now default off).
>> >> You are right; in the above two cases you should also save
>> >> ar.unat to XSI_B1NATS_OFS after spilling the guest bank1 registers
>> >> to the shared page. I handled all of this in the C code. I didn't
>> >> look into the fast hypercall code; it is hard for me to read
>> >> because I am not good at assembly. The principle of handling
>> >> ar.unat is straightforward: every time you spill banked registers
>> >> you must save the corresponding ar.unat afterwards, and every time
>> >> you fill banked registers you must restore the corresponding
>> >> ar.unat first.
>> >>
>> >> We don't need to clear all guest b0 registers and their NaT
>> >> bits. Because r16~r23 are preserved regs and r24~r31 are
>> >> scratch regs, we only need to restore r16~r23 rather than
>> >> clear r16~r23 to 0.
>> >>
>> >> Next time you enable some functions like hyper_ssm_i, when
>> >> you save bank1 regs you should also save bank1 unat.
>> >>
>> >> Below patch enables HANDLE_AR_UNAT.
>> >>
>> >>
>> >>
>> >> Signed-off-by Anthony Xu <Anthony.xu@xxxxxxxxx>
>> >>
>> >> Thanks,
>> >> Anthony.
>> >>
>> >>
>> >>
>> >>
>> >> >-----Original Message-----
>> >> >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> >> >Sent: November 3, 2005 22:42
>> >> >To: Xu, Anthony
>> >> >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >Subject: RE: ar.unat[patch] fixed this ar.uant issue.
>> >> >
>> >> >Hi Anthony --
>> >> >
>> >> >I am curious about the use of B1NATS in the code
>> >> >around this patch.  Under what circumstances does
>> >> >this get set/used?  There is similar unat code in
>> >> >fast_tick (default off) and fast_reflect (default on)
>> >> >and I am wondering if similar unat changes are needed
>> >> >and whether it is now OK to turn on HANDLE_AR_UNAT
>> >> >(which is now default off).
>> >> >
>> >> >Thanks,
>> >> >Dan
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> >> >> Sent: Thursday, November 03, 2005 1:08 AM
>> >> >> To: Magenheimer, Dan (HP Labs Fort Collins)
>> >> >> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >> Subject: RE: ar.unat[patch] fixed this ar.uant issue.
>> >> >>
>> >> >> Dan,
>> >> >> Last time, I used the ar.unat register to restore the guest
>> >> >> general registers' NaT bits in the hyper_rfi function to
>> >> >> eliminate the NaT bit consumption fault, but did not restore
>> >> >> ar.unat afterwards.
>> >> >>
>> >> >> Signed-off-by Anthony Xu <Anthony.xu@xxxxxxxxx>
>> >> >>
>> >> >> Thanks,
>> >> >> Anthony.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> >-----Original Message-----
>> >> >> >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> >> >> >Sent: November 3, 2005 11:54
>> >> >> >To: Xu, Anthony
>> >> >> >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >> >Subject: RE: ar.unat
>> >> >> >
>> >> >> >> I can take a look at this; please send me the regcheck utility.
>> >> >> >>
>> >> >> >>
>> >> >> >> Thanks
>> >> >> >> Anthony
>> >> >> >
>> >> >> >Great, thanks!  Here's where I got Tony's regcheck tool.  If
>> >> >> >it's not still there, perhaps Tony can post it.
>> >> >> >
>> >> >> >By the way, if anyone tries this on a domU, Matt Chapman
>> >> >> >has a pending fix that resolves a FP save/restore issue.
>> >> >> >
>> >> >> >Thanks,
>> >> >> >Dan
>> >> >> >
>> >> >> >> -----Original Message-----
>> >> >> >> From: linux-ia64-owner@xxxxxxxxxxxxxxx
>> >> >> >> [mailto:linux-ia64-owner@xxxxxxxxxxxxxxx] On Behalf Of Luck, Tony
>> >> >> >> Sent: Tuesday, March 01, 2005 4:33 PM
>> >> >> >> To: linux-ia64@xxxxxxxxxxxxxxx
>> >> >> >> Subject: RE: [patch 2.6.11-rc3-bk4] Correctly dereference
>> >> >> >> ia64_mca_data
>> >> >> >>
>> >> >> >> Back on February 9th, I wrote:
>> >> >> >> >I wrote a test program that loads up random values into registers
>> >> >> >> >(just r1-r31, a bunch of stacked registers, and f2-f127 for now)
>> >> >> >> >and then checks that all the registers haven't changed value a
>> >> >> >> >few thousand times, before reloading with a new set of random
>> >> >> >> >values.
>> >> >> >>
>> >> >> >> A few people asked whether I could post the program ... it took
>> >> >> >> a while to get sign-off ... but that gave me time to add "branch",
>> >> >> >> "predicate" and half a dozen "application" registers to the mix,
>> >> >> >> plus make it print the name of the register that was nuked (instead
>> >> >> >> of a number that required manual translation).
>> >> >> >>
>> >> >> >> I've tested it by using a debugger to zap one of each class
>> >> >> >> of register that is being monitored to check that it works.
>> >> >> >>
>> >> >> >>
>> >> >> >> http://www.kernel.org/pub/linux/kernel/people/aegl/ia64regcheck.tgz
>> >> >> >>
>> >> >> >> Usage ... compile, and run a few copies.  If they all "exit(0)" (which
>> >> >> >> may take a couple of days) the test passed.  Otherwise you should see
>> >> >> >> the name of the register printed to stderr, and exit code 1.
>> >> >> >>
>> >> >> >> Apart from the MCA case, I haven't seen it report a problem
>> >> >> >> yet ... but I've only run a few hours.
>> >> >> >>
>> >> >> >> -Tony
>> >> >>
>> >>
>>

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel


 

