[Xen-devel] [hvm] lost hd interrupts running native SMP guests
[I changed the subject line to reflect the current topic of
conversation. Maybe someone else is seeing this as well?]

I'm using the 64-bit SMP hypervisor, running on dual-CPU VT machines
(Dell 380, Dell SC430). Then I boot a dual-VCPU HVM guest (with VCPUs
bound to CPUs 0 and 1, respectively), running Red Hat Enterprise Linux 4
U2 (64-bit, smp kernel). Red Hat calls this a 2.6.9 kernel, but it
includes a bunch of cherry-picked patches from later versions (through
roughly 2.6.12, as I remember). I'm enabling both APIC and ACPI in the
hvm domain builder, and using the 2-processor BIOS.

If I tell the Linux kernel "noapic" so that it avoids using the IOAPIC,
I boot and run just fine.

Without "noapic", I get into userspace and am able to access the
(QEMU-emulated) hd. But typically, while running my rc3.d scripts, I get:

    "hda: dma_timer_expiry dma_status == 0x64"

which stops any further progress. I've tried disabling dma for hda in
the guest ("ide=nodma"), and it still hangs, this time with no
"dma_timer_expiry" message (and sometimes a "hda: lost interrupt" msg,
though I don't see that right now).

I tried the patch you just sent, but that doesn't seem to help (even
when combined with my vioapic locking).

FWIW, I've attached my vioapic locking patch. I haven't been able to
verify this code yet, nor have I even given it a good look-over since I
first wrote it ... (This is *not* intended to be checked in yet.)

Dave

On 5/18/06, Jiang, Yunhong <yunhong.jiang@xxxxxxxxx> wrote:
>As I mentioned, I have a very similar patch to make the IOAPIC code
>SMP safe. But since (even with these changes) I still see a huge
>number of lost hda interrupts when using the IOAPIC on SMP guests, I
>haven't been able to test it yet. I assume others see the same
>problems with the IOAPIC?? (I'll be diving into this soon --
>probably tonight or tomorrow. At this point I have no clue what's
>going wrong.)

In which situation does the IOAPIC lose a lot of hd interrupts?
What guest kernel version are you using? I remember that some old
kernel versions have problems.

Also, there is a bug in the round-robin code: the current code always
delivers the interrupt to vcpu 0. The fix for it follows. But this fix
causes a problem for the timer interrupt. I'm not sure of the cause, but
I suspect it is because the timer is injected in a flood.

The fix below is based on another of my APIC patches, so I'm not sure
you can apply it directly, but I think you can figure out the changes
easily.

Thanks
Yunhong Jiang

diff -r 86d8246c6aff xen/arch/x86/hvm/vlapic.c
--- a/xen/arch/x86/hvm/vlapic.c Wed May 17 23:15:36 2006 +0100
+++ b/xen/arch/x86/hvm/vlapic.c Thu May 18 22:30:06 2006 +0800
@@ -308,8 +308,15 @@ struct vlapic* apic_round_robin(struct d
 
     old = next = d->arch.hvm_domain.round_info[vector];
 
-    do {
-        /* the vcpu array is arranged according to vcpu_id */
+    /* the vcpu array is arranged according to vcpu_id */
+    do
+    {
+        next ++;
+        if ( !d->vcpu[next] ||
+             !test_bit(_VCPUF_initialised, &d->vcpu[next]->vcpu_flags) ||
+             next == MAX_VIRT_CPUS )
+            next = 0;
+
         if ( test_bit(next, &bitmap) )
         {
             target = d->vcpu[next]->arch.hvm_vcpu.vlapic;
@@ -321,12 +328,6 @@ struct vlapic* apic_round_robin(struct d
             }
             break;
         }
-
-        next ++;
-        if ( !d->vcpu[next] ||
-             !test_bit(_VCPUF_initialised, &d->vcpu[next]->vcpu_flags) ||
-             next == MAX_VIRT_CPUS )
-            next = 0;
     } while ( next != old );
 
     d->arch.hvm_domain.round_info[vector] = next;

>
>Dave
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@xxxxxxxxxxxxxxxxxxx
>http://lists.xensource.com/xen-devel
>

Attachment:
vioapic-smp-safety.patch
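
For readers following the round-robin discussion in the quoted message, here is a minimal standalone sketch of why testing the candidate before advancing keeps picking vcpu 0, and why advancing first (as the quoted diff does) rotates the target. It is not the Xen code: `SKETCH_MAX_VCPUS`, `pick_test_then_advance`, `pick_advance_then_test` and `simulate` are hypothetical names, the destination bitmap is a plain `uint32_t`, all VCPUs are assumed initialised, and wrap-around uses a modulo instead of the checks in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit for this sketch; stands in for MAX_VIRT_CPUS. */
#define SKETCH_MAX_VCPUS 32

typedef int (*pick_fn)(uint32_t bitmap, int nr_vcpus, int last);

/*
 * Order used before the quoted fix: test the current candidate first,
 * then advance. Because the search starts at `last`, a VCPU that is
 * still in the destination bitmap is selected again every time.
 * Returns the chosen vcpu id, or -1 if no bit in `bitmap` matches.
 */
static int pick_test_then_advance(uint32_t bitmap, int nr_vcpus, int last)
{
    int next = last;
    do {
        if (bitmap & (1u << next))
            return next;
        next = (next + 1) % nr_vcpus;
    } while (next != last);
    return -1;
}

/*
 * Order used by the quoted fix: advance first, then test. The search
 * starts one past `last` and considers `last` itself only at the end.
 */
static int pick_advance_then_test(uint32_t bitmap, int nr_vcpus, int last)
{
    int next = last;
    do {
        next = (next + 1) % nr_vcpus;
        if (bitmap & (1u << next))
            return next;
    } while (next != last);
    return -1;
}

/* Deliver `rounds` interrupts and count how many each VCPU receives. */
static void simulate(pick_fn pick, uint32_t bitmap, int nr_vcpus,
                     int rounds, int hits[])
{
    int last = 0; /* plays the role of round_info[vector] */

    assert(nr_vcpus > 0 && nr_vcpus <= SKETCH_MAX_VCPUS);
    for (int i = 0; i < nr_vcpus; i++)
        hits[i] = 0;

    for (int r = 0; r < rounds; r++) {
        int target = pick(bitmap, nr_vcpus, last);
        if (target < 0)
            continue; /* no eligible VCPU this round */
        hits[target]++;
        last = target;
    }
}
```

Calling `simulate` with a two-VCPU bitmap of `0x3` shows the difference: the first picker never leaves vcpu 0, while the second alternates between vcpu 0 and vcpu 1. Whether that rotation is related to the timer problem mentioned in the quoted message is not something this sketch can show.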