Re: [Xen-users] blktap and file-backed qcow: crashes and bad performance?
No, I think that we have a few weeks left in our league. Debbie plays 2-3
nights a week, and is in 2 different leagues.

On Fri, 2006-08-11 at 16:59 +0200, Christoph Dwertmann wrote:
> Hi!
>
> I'm running the latest Xen unstable x86_64 on a Dell PowerEdge 1950
> (dual-CPU, dual-core Xeon) with 16GB RAM. I'm using file-backed sparse
> qcow images as root filesystems for the Xen guests. All qcow images
> are backed by the same image file (a 32-bit Debian sid installation).
> The Xen disk config looks like this:
>
> disk = [ 'tap:qcow:/home/images/%s.%d.qcow,xvda1,w' % (vmname, vmid)]
>
> Before that, I create those qcow files with the qcow-create tool.
>
> I use grub to boot Xen like this:
>
> root (hd0,0)
> kernel /boot/xen-3.0-unstable.gz com2=57600,8n1 console=com2 dom0_mem=4097152 noreboot xenheap_megabytes=32
> module /boot/xen0-linux root=/dev/sda1 ro noapic console=tty0 xencons=ttyS1 console=ttyS1
> module /boot/xen0-linux-initrd
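For reference, the per-guest setup described above can be sketched end to
end. This is a minimal sketch, not the poster's exact commands: the
4096 MB size, the base-image path, the domU kernel path, and the
qcow-create argument order (size in MB, target file, optional backing
file) are all assumptions to check against your tree's qcow-create usage
output.

    # Minimal sketch: create one sparse qcow overlay on the shared base
    # image, then write a matching domU config file. Paths, sizes and the
    # qcow-create argument order are assumptions, not the poster's exact
    # commands.
    import subprocess

    vmname, vmid = "guest", 1
    qcow = "/home/images/%s.%d.qcow" % (vmname, vmid)

    # Assumed usage: qcow-create <SIZE(MB)> <FILENAME> [<BACKING_FILENAME>]
    subprocess.check_call(["qcow-create", "4096", qcow,
                           "/home/images/sid-base.img"])

    # Disk line as in the config above; the kernel path and memory size
    # are placeholders for whatever the guests actually use.
    config = "\n".join([
        'kernel = "/boot/xenU-linux"',
        'memory = 128',
        'name = "%s.%d"' % (vmname, vmid),
        "disk = [ 'tap:qcow:%s,xvda1,w' ]" % qcow,
        'root = "/dev/xvda1 ro"',
    ])
    with open("/etc/xen/%s.%d" % (vmname, vmid), "w") as f:
        f.write(config + "\n")

The resulting guest would then be started with "xm create guest.1".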
> My goal is to run 100+ Xen guests, but this seems impossible. I
> observe several things:
>
> - After creating a few Xen guests (and even after shutting them down),
>   my process list is cluttered with "tapdisk" processes that put full
>   load on all 8 virtual CPUs in dom0. The system becomes unusable.
>   Killing the tapdisk processes also apparently destroys the qcow
>   images.
>
> - I (randomly?) get the messages "Error: (28, 'No space left on
>   device')", "Error: Device 0 (vif) could not be connected. Hotplug
>   scripts not working." or even "Error: (12, 'Cannot allocate memory')"
>   on domU creation. There is plenty of disk space and RAM available at
>   that time. This mostly happens when creating more than 80 guests.
>
> - The dom0 will sooner or later crash with a message like this:
>
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at fs/aio.c:511
> invalid opcode: 0000 [1] SMP
> CPU 0
> Modules linked in: ipt_MASQUERADE iptable_nat ip_nat ip_conntrack
>   nfnetlink ip_tables x_tables bridge dm_snapshot dm_mirror dm_mod
>   usbhid ide_cd sers
> Pid: 46, comm: kblockd/0 Not tainted 2.6.16.13-xen-kasuari-dom0 #1
> RIP: e030:[<ffffffff8018f8ee>] <ffffffff8018f8ee>{__aio_put_req+39}
> RSP: e02b:ffffffff803a89c8 EFLAGS: 00010086
> RAX: 00000000ffffffff RBX: ffff8800f43d7a80 RCX: 00000000f3bdc000
> RDX: 0000000000001458 RSI: ffff8800f43d7a80 RDI: ffff8800f62d1c80
> RBP: ffff8800f62d1c80 R08: 6db6db6db6db6db7 R09: ffff88000193d000
> R10: 0000000000000000 R11: ffffffff80153e48 R12: ffff8800f62d1ce8
> R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000000
> FS: 00002b9bf01bccb0(0000) GS:ffffffff80472000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000
> Process kblockd/0 (pid: 46, threadinfo ffff8800005e4000, task ffff8800005c57e0)
> Stack: ffff8800f43d7a80 ffff8800f62d1c80 ffff8800f62d1ce8 ffffffff80190082
>   ffff880004e83d10 ffff8800f4db7400 0000000000000200 ffff8800f4db7714
>   ffff8800f4db7400 0000000000000001
> Call Trace: <IRQ>
>   <ffffffff80190082>{aio_complete+297}
>   <ffffffff80195b0b>{finished_one_bio+159}
>   <ffffffff80195be8>{dio_bio_complete+150}
>   <ffffffff80195d24>{dio_bio_end_aio+32}
>   <ffffffff801cf1b7>{__end_that_request_first+328}
>   <ffffffff801d00ca>{blk_run_queue+50}
>   <ffffffff8800524d>{:scsi_mod:scsi_end_request+40}
>   <ffffffff880054fe>{:scsi_mod:scsi_io_completion+525}
>   <ffffffff880741ce>{:sd_mod:sd_rw_intr+598}
>   <ffffffff88005792>{:scsi_mod:scsi_device_unbusy+85}
>   <ffffffff801d1534>{blk_done_softirq+175}
>   <ffffffff80132544>{__do_softirq+122}
>   <ffffffff8010bada>{call_softirq+30}
>   <ffffffff8010d231>{do_softirq+73}
>   <ffffffff8010d626>{do_IRQ+65}
>   <ffffffff8023bf5a>{evtchn_do_upcall+134}
>   <ffffffff801d8a66>{cfq_kick_queue+0}
>   <ffffffff8010b60a>{do_hypervisor_callback+30} <EOI>
>   <ffffffff801d8a66>{cfq_kick_queue+0}
>   <ffffffff8010722a>{hypercall_page+554}
>   <ffffffff8010722a>{hypercall_page+554}
>   <ffffffff801dac97>{kobject_get+18}
>   <ffffffff8023b7aa>{force_evtchn_callback+10}
>   <ffffffff8800641d>{:scsi_mod:scsi_request_fn+935}
>   <ffffffff801d8adc>{cfq_kick_queue+118}
>   <ffffffff8013d3e6>{run_workqueue+148}
>   <ffffffff8013db18>{worker_thread+0}
>   <ffffffff80140abd>{keventd_create_kthread+0}
>   <ffffffff8013dc08>{worker_thread+240}
>   <ffffffff80125cdb>{default_wake_function+0}
>   <ffffffff80140abd>{keventd_create_kthread+0}
>   <ffffffff80140abd>{keventd_create_kthread+0}
>   <ffffffff80140d61>{kthread+212}
>   <ffffffff8010b85e>{child_rip+8}
>   <ffffffff80140abd>{keventd_create_kthread+0}
>   <ffffffff80140c8d>{kthread+0}
>   <ffffffff8010b856>{child_rip+0}
>
> Code: 0f 0b 68 c3 9b 2f 80 c2 ff 01 85 c0 74 07 31 c0 e9 09 01 00
> RIP <ffffffff8018f8ee>{__aio_put_req+39} RSP <ffffffff803a89c8>
> <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>
> Is it just my setup, or
> - does Xen not scale at all to 100+ machines?
> - does blktap not scale at all?
> - is blktap with qcow very unstable right now?
>
> Thank you for any pointers,
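The leftover-tapdisk symptom above is easy to quantify before and after
guest creation. A small diagnostic sketch, assuming only the standard
Linux /proc layout (the command-name field in /proc/<pid>/stat) and
nothing Xen-specific; with blktap, roughly one tapdisk process per
attached tap: disk would be expected.

    # Count running tapdisk processes by scanning /proc. A count that
    # keeps growing after guests are shut down matches the leak
    # described in the first observation above.
    import os

    tapdisks = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % pid) as f:
                # field 2 of /proc/<pid>/stat is the command name in parentheses
                comm = f.read().split()[1].strip("()")
        except IOError:  # process exited while we were scanning
            continue
        if comm == "tapdisk":
            tapdisks.append(pid)

    print("%d tapdisk process(es): %s" % (len(tapdisks), " ".join(tapdisks)))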
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users