Xen project Mailing List

RE: [Xen-devel] disk io errors possibly caused by high network load?

To: "Ian Pratt" <Ian.Pratt@xxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>

From: Moritz Möller <m.moeller@xxxxxxxxxxxx>

Date: Fri, 19 Sep 2008 14:59:30 +0200

Cc:

Delivery-date: Fri, 19 Sep 2008 06:00:11 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AckaTGDzATGLlbxxRXaBZvbEdrbgFgACImAQAAB97RA=

Thread-topic: [Xen-devel] disk io errors possibly caused by high network load?

We rebooted the machines very quickly because it was a production system, so I didn't have time to copy the logs. The logfiles on the disks show nothing about this, probably because I/O was already down.

The machines are Supermicro, Intel Xeon quad-core or dual quad-core, with 8 to 32 GB RAM. Some have an mdraid setup with two SATA drives on the onboard SATA controller (Intel ICH); others have a dedicated 3ware / AMCC 9660 or similar.

The machines that crashed were on different power lines and connected to different switches, although on the same network segment. There was also no physical interference.

The error was reported by both domU and dom0: each said the disk returned an I/O error, but gave no specific information.

The network card is an Intel e1000.

Lsmod:

-----Original Message-----
From: Ian Pratt [mailto:Ian.Pratt@xxxxxxxxxxxxx]
Sent: Friday, September 19, 2008 2:44 PM
To: Moritz Möller; xen-devel@xxxxxxxxxxxxxxxxxxx
Cc: Ian Pratt
Subject: RE: [Xen-devel] disk io errors possibly caused by high network load?

> we had a very strange situation yesterday. In one second, 13 of 25 xen
> boxes died with disk errors (domU and dom0, something like end_request:
> I/O error dev hda sector ...), but worked well again after a reboot.
>
> Some minutes before a technician plugged in a wrong cable, creating a
> network loop - so the error could be caused by a high network io load.
> The disks are okay, and the error occurred with both scsi raid
> controllers and plain sata disks.

This is quite remarkable -- I don't think anyone has reported anything similar before, despite there being many large xen deployments.

Are you saying that IO errors were reported from both dom0 and the domUs? Did you actually track down the specific device major/minor that was reporting the error?

Is there any network storage (e.g. iSCSI, AOE) in your setup?

Ian

> Here is some info of a host that crashed:
>
> root/mmoeller@srv002050:/root$ xm info
> host               : srv002050
> release            : 2.6.21-2950.fc8xen
> version            : #1 SMP Tue Oct 23 12:23:33 EDT 2007
> machine            : x86_64
> nr_cpus            : 8
> nr_nodes           : 1
> cores_per_socket   : 4
> threads_per_core   : 1
> cpu_mhz            : 1866
> hw_caps            : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
> total_memory       : 8190
> free_memory        : 12
> node_to_cpu        : node0:0-7
> xen_major          : 3
> xen_minor          : 2
> xen_extra          : .0
> xen_caps           : xen-3.0-x86_64 xen-3.0-x86_32p
> xen_scheduler      : credit
> xen_pagesize       : 4096
> platform_params    : virt_start=0xffff800000000000
> xen_changeset      : unavailable
> cc_compiler        : gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> cc_compile_by      : root
> cc_compile_domain  : office.bigpoint.net
> cc_compile_date    : Tue Mar 11 13:57:28 CET 2008
> xend_config_format : 4
> root/mmoeller@srv002050:/root$ uname -r
> 2.6.21-2950.fc8xen
>
> And here of a host that did not crash:
>
> root/mmoeller@srv006215:/root$ xm info
> host               : srv006215
> release            : 2.6.21-2950.fc8xen
> version            : #1 SMP Tue Oct 23 12:23:33 EDT 2007
> machine            : x86_64
> nr_cpus            : 4
> nr_nodes           : 1
> cores_per_socket   : 4
> threads_per_core   : 1
> cpu_mhz            : 2394
> hw_caps            : bfebfbff:20100800:00000000:00000140:0000e3bd:00000000:00000001
> total_memory       : 8190
> free_memory        : 10
> node_to_cpu        : node0:0-3
> xen_major          : 3
> xen_minor          : 2
> xen_extra          : .0
> xen_caps           : xen-3.0-x86_64 xen-3.0-x86_32p
> xen_scheduler      : credit
> xen_pagesize       : 4096
> platform_params    : virt_start=0xffff800000000000
> xen_changeset      : unavailable
> cc_compiler        : gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> cc_compile_by      : root
> cc_compile_domain  : office.bigpoint.net
> cc_compile_date    : Tue Mar 11 13:57:28 CET 2008
> xend_config_format : 4
> root/mmoeller@srv006215:/root$ uname -r
> 2.6.21-2950.fc8xen
>
> Does someone have an idea how this could happen?
>
> Thanks,
>
> Moritz
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
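For anyone who still has logs after a similar incident, the question about which device major/minor reported the error could be approached along these lines. This is only a rough sketch, not something from the thread: the function names `count_io_errors` and `device_numbers` are made up for illustration, the pattern assumes the usual kernel message form "end_request: I/O error, dev hda, sector N" (commas optional, as in the wording quoted above), and the device numbers are assumed to come from a saved copy of /proc/partitions.

```python
import re
from collections import Counter

# Assumed message form: "end_request: I/O error, dev hda, sector 12345".
# The commas are optional so the looser wording quoted in the thread also matches.
IO_ERROR = re.compile(r"end_request: I/O error,? dev (\S+?),? sector (\d+)")


def count_io_errors(log_lines):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in log_lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def device_numbers(partitions_text):
    """Parse /proc/partitions-style text into {name: (major, minor)}."""
    numbers = {}
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows have four columns: major, minor, #blocks, name.
        # The header row and blank lines are skipped by the isdigit() check.
        if len(fields) == 4 and fields[0].isdigit():
            numbers[fields[3]] = (int(fields[0]), int(fields[1]))
    return numbers
```

Running `count_io_errors` over the dom0 and domU kernel logs separately, then looking up each reported name in `device_numbers`, would show whether the errors came from one physical device or from several unrelated ones.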
