Re: [Xen-devel] Xen4.2 S3 regression?
Attached is a new run for

new boot (pre-s3)
first suspend / resume cycle (s3-first)
second (failing) suspend / resume cycle (s3-second)

To go into greater detail on the kernel used -
It is a 3.2.23 kernel based off of the Ubuntu 12.04 git tree here
http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=summary

To that, I also have some of Konrad's branches - specifically
/devel/ioperm
/devel/acpi-s3.v7
/stable/misc (mostly for the microcode fixes)
/stable/for-linus-fixes-3.3
/stable/for-linus-3.3
/devel/ttm.dma_pool.v2.9
/stable/for-x86

On top of that, are some more patches specific to our operations, not
terribly interesting here, but I can provide them, if necessary.

The 3.5 tree I tested with has a similar makeup - with some fewer
branches from Konrad.

On Wed, Aug 8, 2012 at 6:39 AM, Ben Guthro <ben@xxxxxxxxxx> wrote:
> Thanks for taking the time to reply.
>
> I'm out of the office today, so don't have direct access to the
> machine in question until tomorrow... but I'll do my best to answer
> (inline below) and I'll follow up tomorrow with concrete answers.
>
> On Wed, Aug 8, 2012 at 4:35 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>> On 07.08.12 at 22:14, Ben Guthro <ben@xxxxxxxxxx> wrote:
>>> Any suggestions on how best to chase this down?
>>>
>>> The first S3 suspend/resume cycle works, but the second does not.
>>>
>>> On the second try, I never get any interrupts delivered to ahci.
>>> (at least according to /proc/interrupts)
>>>
>>>
>>> syslog traces from the first (good) and the second (bad) are attached,
>>> as well as the output from the "*" debug Ctrl+a handler in both cases.
>>
>> You should have provided this also for the state before the
>> first suspend. The state after the first resume already looks
>> corrupted (presumably just not as badly):
>
> I'll be able to send this tomorrow.
>
>>
>> (XEN) PCI-MSI interrupt information:
>> (XEN) MSI 26 vec=71 lowest edge assert log lowest dest=00000001 mask=0/1/-1
>> (XEN) MSI 27 vec=00 fixed edge deassert phys lowest dest=00000001 mask=0/1/-1
>>                  ^^
>> (XEN) MSI 28 vec=29 lowest edge assert log lowest dest=00000001 mask=0/1/-1
>> (XEN) MSI 29 vec=79 lowest edge assert log lowest dest=00000001 mask=0/1/-1
>> (XEN) MSI 30 vec=81 lowest edge assert log lowest dest=00000001 mask=0/1/-1
>> (XEN) MSI 31 vec=99 lowest edge assert log lowest dest=00000001 mask=0/1/-1
>>
>> so this is likely the reason for thing falling apart on the second
>> iteration:
>>
>> (XEN) Interrupt Remapping: supported and enabled.
>> (XEN) Interrupt remapping table (nr_entry=0x10000. Only dump P=1 entries here):
>> (XEN)        SVT  SQ   SID      DST  V  AVL DLM TM RH DM FPD P
>> (XEN)   0000:  1   0  f0f8 00000001 38    0   1  0  1  1   0 1
>> ...
>> (XEN)   0014:  1   0  00d8 00000001 a1    0   1  0  1  1   0 1
>> (XEN)   0015:  1   0  00fa 00000001 00    0   0  0  0  0   0 1
>>                                               ^     ^  ^
>> (XEN)   0016:  1   0  f0f8 00000001 31    0   1  1  1  1   0 1
>> (XEN)   0017:  1   0  00a0 00000001 a9    0   1  0  1  1   0 1
>> (XEN)   0018:  1   0  0200 00000001 b1    0   1  0  1  1   0 1
>> (XEN)   0019:  1   0  00c8 00000001 c9    0   1  0  1  1   0 1
>>
>> Surprisingly in both cases we get (with the other vector fields varying
>> accordingly)
>>
>> (XEN) IRQ: 26 affinity:0001 vec:71 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:279(-S--),
>> (XEN) IRQ: 27 affinity:0001 vec:21 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:278(-S--),
>>                                 ^^
>> (XEN) IRQ: 28 affinity:0001 vec:29 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:277(-S--),
>> (XEN) IRQ: 29 affinity:0001 vec:79 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:276(-S--),
>> (XEN) IRQ: 30 affinity:0001 vec:81 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:275(PS--),
>> (XEN) IRQ: 31 affinity:0001 vec:99 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:274(PS--),
>>
>> The interrupt in question belongs to 0000:00:1f.2, i.e. the
>> AHCI contoller.
>
> This would be consistent with what I've observed.
>
>>
>> Unfortunately I can't make sense of the kernel side config space
>> restore messages - an offset of 1 gets reported for the device in
>> question (and various other odd offsets exist), yet 3.5's
>> drivers/pci/pci.c:pci_restore_config_space_range() calls
>> pci_restore_config_dword() with an offset that's always divisible
>> by 4. Could you clarify which kernel version you were using here?
>> We first need to determine whether the kernel corrupts something
>> (after all, config space isn't protected from Dom0 modifications) -
>> if that's the case, we may need to understand why older Xen was
>> immune against that. If that's not the case, adding some extra
>> logging to Xen's pci_restore_msi_state() would seem the best
>> first step, plus (maybe) logging of Dom0 post-resume config space
>> accesses to the device in question.
>
> This particular failure is using linux-3.2.23 + some of Konrad's
> branches that haven't been merged into mainline (s3 branches, are
> probably the most appropriate here)
>
>>
>> The most likely thing happening (though unclear where) is that
>> the corresponding struct msi_msg instance gets cleared in the
>> course of the first resume (but after the corresponding interrupt
>> remapping entry already got restored).
>>
>> Jan
>>

Attachment: xen-dump-s3-second.txt
Attachment: xen-dump-s3-first.txt
Attachment: xen-dump-pre-s3.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
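
The comparison done by eye in the quoted analysis (the vector in the "PCI-MSI interrupt information" block versus the vector on the matching "IRQ:" line) can be scripted to run over all three attached dumps. Below is a rough Python sketch of that check, not a tested tool. The regular expressions are assumptions derived only from the excerpt quoted in this thread, the function name find_vector_mismatches is made up for illustration, and the number after "MSI" is assumed to be the IRQ number, as the quoted comparison implies. The full dump files may be laid out differently.

```python
import re

# Both patterns are assumptions based only on the dump excerpt quoted above.
MSI_RE = re.compile(r"MSI\s+(\d+)\s+vec=([0-9a-fA-F]{2})")
IRQ_RE = re.compile(
    r"IRQ:\s*(\d+)\s+affinity:\S+\s+vec:([0-9a-fA-F]{2})\s+type=PCI-MSI"
)


def find_vector_mismatches(dump_text):
    """Return sorted (irq, msi_vec, irq_vec) tuples where the two vectors differ."""
    msi = {int(n): int(v, 16) for n, v in MSI_RE.findall(dump_text)}
    irq = {int(n): int(v, 16) for n, v in IRQ_RE.findall(dump_text)}
    # Only IRQs that appear in both sections can be compared.
    return sorted(
        (n, msi[n], irq[n]) for n in msi.keys() & irq.keys() if msi[n] != irq[n]
    )


# Two entries copied from the excerpt quoted in this thread.
SAMPLE = """\
(XEN) MSI 26 vec=71 lowest edge assert log lowest dest=00000001 mask=0/1/-1
(XEN) MSI 27 vec=00 fixed edge deassert phys lowest dest=00000001 mask=0/1/-1
(XEN) IRQ: 26 affinity:0001 vec:71 type=PCI-MSI status=00000010 in-flight=0
(XEN) IRQ: 27 affinity:0001 vec:21 type=PCI-MSI status=00000010 in-flight=0
"""

if __name__ == "__main__":
    for n, msi_vec, irq_vec in find_vector_mismatches(SAMPLE):
        print(f"IRQ {n}: MSI message vec={msi_vec:02x}, IRQ descriptor vec={irq_vec:02x}")
```

On the two sample entries this flags only IRQ 27 (MSI message vector 00, IRQ descriptor vector 21). Feeding it the contents of xen-dump-pre-s3.txt, xen-dump-s3-first.txt and xen-dump-s3-second.txt in turn would show at which stage the mismatch first appears, provided those files match the quoted format.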