Xen project Mailing List

Re: [Xen-devel] Xen4.2 S3 regression?

From: Ben Guthro <ben@xxxxxxxxxx>

Date: Wed, 8 Aug 2012 06:39:48 -0400

Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Thomas Goetz <thomas.goetz@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Wed, 08 Aug 2012 10:40:12 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thanks for taking the time to reply.

I'm out of the office today, so I don't have direct access to the
machine in question until tomorrow... but I'll do my best to answer
(inline below) and I'll follow up tomorrow with concrete answers.

On Wed, Aug 8, 2012 at 4:35 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 07.08.12 at 22:14, Ben Guthro <ben@xxxxxxxxxx> wrote:
>> Any suggestions on how best to chase this down?
>>
>> The first S3 suspend/resume cycle works, but the second does not.
>>
>> On the second try, I never get any interrupts delivered to ahci.
>> (at least according to /proc/interrupts)
>>
>>
>> syslog traces from the first (good) and the second (bad) are attached,
>> as well as the output from the "*" debug Ctrl+a handler in both cases.
>
> You should have provided this also for the state before the
> first suspend. The state after the first resume already looks
> corrupted (presumably just not as badly):

I'll be able to send this tomorrow.

>
> (XEN) PCI-MSI interrupt information:
> (XEN) MSI 26 vec=71 lowest edge assert log lowest dest=00000001 mask=0/1/-1
> (XEN) MSI 27 vec=00 fixed edge deassert phys lowest dest=00000001 mask=0/1/-1
>                  ^^
> (XEN) MSI 28 vec=29 lowest edge assert log lowest dest=00000001 mask=0/1/-1
> (XEN) MSI 29 vec=79 lowest edge assert log lowest dest=00000001 mask=0/1/-1
> (XEN) MSI 30 vec=81 lowest edge assert log lowest dest=00000001 mask=0/1/-1
> (XEN) MSI 31 vec=99 lowest edge assert log lowest dest=00000001 mask=0/1/-1
>
> so this is likely the reason for things falling apart on the second
> iteration:
>
> (XEN) Interrupt Remapping: supported and enabled.
> (XEN) Interrupt remapping table (nr_entry=0x10000. Only dump P=1 entries here):
> (XEN) SVT SQ SID DST V AVL DLM TM RH DM FPD P
> (XEN) 0000: 1 0 f0f8 00000001 38 0 1 0 1 1 0 1
> ...
> (XEN) 0014: 1 0 00d8 00000001 a1 0 1 0 1 1 0 1
> (XEN) 0015: 1 0 00fa 00000001 00 0 0 0 0 0 0 1
> ^ ^ ^
> (XEN) 0016: 1 0 f0f8 00000001 31 0 1 1 1 1 0 1
> (XEN) 0017: 1 0 00a0 00000001 a9 0 1 0 1 1 0 1
> (XEN) 0018: 1 0 0200 00000001 b1 0 1 0 1 1 0 1
> (XEN) 0019: 1 0 00c8 00000001 c9 0 1 0 1 1 0 1
>
> Surprisingly in both cases we get (with the other vector fields varying
> accordingly)
>
> (XEN) IRQ: 26 affinity:0001 vec:71 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:279(-S--),
> (XEN) IRQ: 27 affinity:0001 vec:21 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:278(-S--),
>                                 ^^
> (XEN) IRQ: 28 affinity:0001 vec:29 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:277(-S--),
> (XEN) IRQ: 29 affinity:0001 vec:79 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:276(-S--),
> (XEN) IRQ: 30 affinity:0001 vec:81 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:275(PS--),
> (XEN) IRQ: 31 affinity:0001 vec:99 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:274(PS--),
>
> The interrupt in question belongs to 0000:00:1f.2, i.e. the
> AHCI controller.

This would be consistent with what I've observed.

>
> Unfortunately I can't make sense of the kernel side config space
> restore messages - an offset of 1 gets reported for the device in
> question (and various other odd offsets exist), yet 3.5's
> drivers/pci/pci.c:pci_restore_config_space_range() calls
> pci_restore_config_dword() with an offset that's always divisible
> by 4. Could you clarify which kernel version you were using here?
> We first need to determine whether the kernel corrupts something
> (after all, config space isn't protected from Dom0 modifications) -
> if that's the case, we may need to understand why older Xen was
> immune against that. If that's not the case, adding some extra
> logging to Xen's pci_restore_msi_state() would seem the best
> first step, plus (maybe) logging of Dom0 post-resume config space
> accesses to the device in question.

This particular failure is using linux-3.2.23 + some of Konrad's
branches that haven't been merged into mainline (the s3 branches are
probably the most relevant here).

>
> The most likely thing happening (though unclear where) is that
> the corresponding struct msi_msg instance gets cleared in the
> course of the first resume (but after the corresponding interrupt
> remapping entry already got restored).
>
> Jan
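
For comparing the dumps taken before suspend and after each resume, the manual check Jan did above (the vector in the "PCI-MSI interrupt information" section against the vector in the per-IRQ line) could be scripted. What follows is only a rough sketch and not part of any Xen tooling: the function name `vector_mismatches`, the regular expressions, and the `SAMPLE` text are all made up for illustration, and the patterns assume the dump lines look exactly like the ones quoted in this thread.

```python
import re
from typing import Dict, Tuple

# Assumed line shapes, taken from the dump quoted in this thread:
#   (XEN) MSI 27 vec=00 fixed edge deassert ...
#   (XEN) IRQ: 27 affinity:0001 vec:21 type=PCI-MSI ...
MSI_RE = re.compile(r"MSI\s+(\d+)\s+vec=([0-9a-fA-F]{2})")
IRQ_RE = re.compile(r"IRQ:\s*(\d+)\s+affinity:\S+\s+vec:([0-9a-fA-F]{2})")


def vector_mismatches(dump: str) -> Dict[int, Tuple[int, int]]:
    """Return {irq: (msi_dump_vector, irq_dump_vector)} where the two differ."""
    msi_vec: Dict[int, int] = {}
    irq_vec: Dict[int, int] = {}
    for line in dump.splitlines():
        m = MSI_RE.search(line)
        if m:
            msi_vec[int(m.group(1))] = int(m.group(2), 16)
            continue
        m = IRQ_RE.search(line)
        if m:
            irq_vec[int(m.group(1))] = int(m.group(2), 16)
    # Only IRQs that appear in both sections can be compared.
    common = sorted(msi_vec.keys() & irq_vec.keys())
    return {
        irq: (msi_vec[irq], irq_vec[irq])
        for irq in common
        if msi_vec[irq] != irq_vec[irq]
    }


# Illustrative input: three IRQs copied from the dump quoted above.
SAMPLE = """\
(XEN) MSI 26 vec=71 lowest edge assert log lowest dest=00000001 mask=0/1/-1
(XEN) MSI 27 vec=00 fixed edge deassert phys lowest dest=00000001 mask=0/1/-1
(XEN) MSI 28 vec=29 lowest edge assert log lowest dest=00000001 mask=0/1/-1
(XEN) IRQ: 26 affinity:0001 vec:71 type=PCI-MSI status=00000010 in-flight=0
(XEN) IRQ: 27 affinity:0001 vec:21 type=PCI-MSI status=00000010 in-flight=0
(XEN) IRQ: 28 affinity:0001 vec:29 type=PCI-MSI status=00000010 in-flight=0
"""

if __name__ == "__main__":
    for irq, (msi, desc) in vector_mismatches(SAMPLE).items():
        print(f"IRQ {irq}: MSI dump vec={msi:02x}, IRQ dump vec={desc:02x}")
```

Running the same check on the pre-suspend dump, the post-first-resume dump, and the post-second-resume dump should show at which point IRQ 27's MSI state stops matching its IRQ descriptor.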
