[Xen-devel] time still going backwards
I've been running the Pallas MPI benchmarks with several configurations, and I just ran a test that errored out. I've run the benchmark successfully on Xen0 only (four nodes) and on XenU only (four nodes), with no Xen-related errors and no benchmark errors. This time I ran it with two XenUs on each of the four nodes, each participating in one of two separate, simultaneous benchmark runs (two groups of four XenUs), all bridged to the cluster LAN.

Only one physical node had a problem (they are identical builds of Xen and XenLinux 2.4.27, last cset 1.1362, 2004-10-04 15:55:47+01:00). There was a group of messages in late August reporting the same "time went backwards" errors, but this is a recent build.

One more thing: on this node Xen chose to host both guests on CPU 1. I also know that at the exact moment of failure Xen1 was interacting with the only other node that did not spread out its guests (that one actually had all three, Xen0, Xen1 and Xen2, on CPU 0). I have no clue if any of this information is helpful :-). (I am attempting another run with the same configuration right now.)

xm dmesg:

  (XEN) APIC error on CPU0: 00(02)
  (XEN) APIC error on CPU1: 00(02)
  (XEN) APIC error on CPU1: 02(02)
  (XEN) APIC error on CPU0: 02(02)
  (XEN) APIC error on CPU1: 02(01)
  (XEN) APIC error on CPU0: 02(02)

Xen0 dmesg, just two error messages:

  Timer ISR: Time went backwards: -59799000
  Timer ISR: Time went backwards: -48699000

(these filled the whole kernel ring buffer:)

Xen1 dmesg, attached, time went backwards many times
Xen2 dmesg, attached, time went backwards many times

benchmark error, Xen1, presumably at the same time as Xen2 (though on a different benchmark; the two groups of four actually lost sync after a while. I'm using the default CPU scheduler. I chalk that up to the weird CPU pinning that Xen/Xend chose for two of the physical nodes; I am going to pin those myself in the future):

  p3_827: p4_error: net_recv read: probable EOF on socket: 1
  p1_777: p4_error: net_recv read: probable EOF on socket: 1

benchmark error, Xen2:

  p2_821: (347.806618) net_recv failed for fd = 4
  p2_821: p4_error: net_recv read, errno = : 104
  p3_769: p4_error: net_recv read: probable EOF on socket: 1
  p1_766: (402.558327) net_recv failed for fd = 8
  p1_766: p4_error: net_recv read, errno = : 104

Attachment: error.dmesg.xen1
Attachment: error.dmesg.xen2
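
For anyone who wants to compare the attached dmesg files, here is a rough sketch of how the "Time went backwards" messages could be counted and their reported values summarized. It assumes the lines look like the two Xen0 examples above; the function name summarize_backward_steps is made up for illustration, and the numbers are treated as raw values since the message itself does not state a unit.

```python
import re

# Matches lines such as "Timer ISR: Time went backwards: -59799000".
# The format is assumed from the two Xen0 messages quoted in this post.
PATTERN = re.compile(r"Time went backwards: (-?\d+)")


def summarize_backward_steps(dmesg_text):
    """Return (count, smallest, largest) of the reported values.

    Returns (0, None, None) when no matching message is found.
    """
    deltas = [int(m.group(1)) for m in PATTERN.finditer(dmesg_text)]
    if not deltas:
        return 0, None, None
    return len(deltas), min(deltas), max(deltas)


if __name__ == "__main__":
    sample = (
        "Timer ISR: Time went backwards: -59799000\n"
        "Timer ISR: Time went backwards: -48699000\n"
    )
    count, smallest, largest = summarize_backward_steps(sample)
    print(count, smallest, largest)
```

To run it against one of the attachments, read the file contents into a string and pass that to summarize_backward_steps instead of the sample.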