
[Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system



Hi,

I finally tracked down a dom0 boot hang on a 1TB box running xen 3.4.0
that I'd been working on. The hang comes from:

calibrate_delay_direct():
....
        for (i = 0; i < MAX_DIRECT_CALIBRATION_RETRIES; i++) {
                pre_start = 0;
                start_jiffies = jiffies;
                while (jiffies <= (start_jiffies + tick_divider)) {
                        pre_start = start;
                        read_current_timer(&start);
                }
                read_current_timer(&post_start);
...


start_jiffies is set to INITIAL_JIFFIES == 0xfffedb08.

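Now a timer interrupt comes in, finds the delta to be huge (thanks to
xen scrubbing 1TB of pages), and advances jiffies far enough to wrap
around. After the wrap, jiffies stays below start_jiffies + tick_divider,
so the loop keeps spinning until jiffies climbs all the way back up,
which would take, say, several days.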

delta: 940b7d68a4, jiffies:00009f8b
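
A minimal sketch of why the loop condition stays true after the wrap. It evaluates the comparison from calibrate_delay_direct() on plain values. The 32-bit width of jiffies and a tick_divider of 1 are assumptions for illustration, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: jiffies is 32 bits wide, as the wrapped value above suggests. */
typedef uint32_t jiffies32_t;

/* The loop condition from calibrate_delay_direct(), evaluated on plain values. */
static int naive_still_waiting(jiffies32_t now, jiffies32_t start,
                               jiffies32_t tick_divider)
{
        return now <= (jiffies32_t)(start + tick_divider);
}

/* Ticks that must still elapse before the naive condition turns false. */
static jiffies32_t ticks_until_naive_exit(jiffies32_t now, jiffies32_t start,
                                          jiffies32_t tick_divider)
{
        /* Only meaningful while the loop would still be spinning. */
        assert(naive_still_waiting(now, start, tick_divider));
        return (jiffies32_t)(start + tick_divider) - now + 1u;
}
```

With start = 0xfffedb08 and now = 0x00009f8b, the remaining count comes out close to 2^32 ticks, so the loop cannot finish in any reasonable boot time.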


I came up with this fix (is there a reason the code doesn't use 64-bit values?):

             while (jiffies <= (start_jiffies + tick_divider)) {
                   pre_start = start;
                   read_current_timer(&start);
+                  if (jiffies < start_jiffies)  /* jiffies wrapped */
+                          start_jiffies = jiffies;
             }
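
For comparison, here is a sketch of a different approach from the patch above: compare elapsed ticks instead of absolute values, so that unsigned subtraction absorbs the wrap. The Linux time_after() family of macros handles wraparound with a similar difference-based idiom. The helper name and the 32-bit width are assumptions for illustration, not code from the kernel or from this patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: nonzero while at most tick_divider ticks have
 * elapsed since start, computed modulo 2^32.
 */
static int wrapsafe_still_waiting(uint32_t now, uint32_t start,
                                  uint32_t tick_divider)
{
        uint32_t elapsed = now - start; /* unsigned subtraction wraps cleanly */

        return elapsed <= tick_divider;
}
```

Unlike the patch, which restarts the wait from the wrapped jiffies value, this check simply ends the wait once jiffies has jumped past the window. Which behavior is better for the calibration is a separate question.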


The other fix I thought of was to change INITIAL_JIFFIES to something 
sooner.


I would appreciate any help; I don't understand xen time management well.

thanks,
Mukesh


PS: I'm attaching output of 'xm debug-key t'.

Attachment: skew.out
Description: Binary data

