[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [Xen-users] substantial shutdown delay for PV guests with PCI -passthrough



Adding xen-devel. Full thread starts at
http://lists.xen.org/archives/html/xen-users/2014-03/msg00102.html

On Mon, 2014-03-17 at 19:13 +0100, Atom2 wrote:

> > Any chance you could try 4.3.2, or even 4.4.0?
> Unfortunately neither of these versions are currently available as 
> stable ebuilds for my distribution, but I assume it shouldn't be long 
> before there's some movement.

Looking at the diff to tools/libxl/libxl_pci.c I don't see any pertinent
looking fixes so it seems probably this issue still exists.

> >
> >> The system is capable of vt-d and uses a Xeon E3-1260L processor.
> >>
> >> Do these observations ring a bell with anybody or is this even expected
> >> behaviour. If this is not normal - which I would expect as I have not
> >> been able to find any information relating to substantial delays during
> >> shutdown - how would I go about getting to the grounds of this?
> >
> > My guess would be that xl process which is managing the domain destroy
> > is waiting for something (perhaps pciback) to confirm shutdown for each
> > device and this is timing out in series, leading to the delays. You
> > might find something in the logs /var/log/xen pointing to something like
> > this.
> >
> > If not then if you start the guest with "xl -vvv create -F <cfg>" then
> > the xl process which is monitoring the domain will stay in the
> > foreground and be logging to stdout (I think). If you then issue the
> > shutdown from another shell perhaps there will be some obvious gaps in
> > the logs as things shutdown which might help.
> That worked and there also was some output - please find the log from 
> start to finnish attached to this mail. I have marked various points in 
> the log: First the point where the startup was done and the domU was 
> live and secondly those 4 points in time (or rather output) where the 
> 10s delay occured.

Quoting the relevant bit for -devel, full log is at
http://lists.xen.org/archives/html/xen-users/2014-03/txtl6VscE4NMf.txt:

        Domain 3 has shut down, reason code 0 0x0
        Action for shutdown reason code 0 is destroy
        Domain 3 needs to be cleaned up: destroying the domain
        libxl: debug: libxl.c:1252:libxl_domain_destroy: ao 0x7f1dddf2b850: 
create: how=(nil) callback=(nil) poller=0x7f1dddf2cd70
        libxl: error: libxl_pci.c:1248:do_pci_remove: xc_domain_irq_permission 
irq=17
        
        <NOTE: at this point a 10s pause happens>
        
        libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend 
/local/domain/0/backend/pci/3/0 not ready
        libxl: debug: libxl_pci.c:173:libxl__device_pci_remove_xenstore: pci 
backend at /local/domain/0/backend/pci/3/0 is not ready
        libxl: error: libxl_pci.c:1248:do_pci_remove: xc_domain_irq_permission 
irq=16
        
        <NOTE: at this point a 10s pause happens>
        
        libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend 
/local/domain/0/backend/pci/3/0 not ready
        libxl: debug: libxl_pci.c:173:libxl__device_pci_remove_xenstore: pci 
backend at /local/domain/0/backend/pci/3/0 is not ready
        [repeat for more devices]

Do you get anything in "xl dmesg" or dom0's "dmesg" corresponding to
these events?

Looking at do_pci_remove after the call to xc_domain_irq_permission
(which fails, but I don't think that relates to the delay) we then call
(conditionally) libxl__device_pci_reset, xc_deassign_device,
libxl__device_pci_remove_common and libxl__device_pci_remove_xenstore,
with no logging to indicate which we are calling (not helpful!).

The "is not ready" message comes from libxl__device_pci_remove_xenstore
which calls libxl__wait_for_backend. The latter has been rewritten a bit
since 4.3.1 but not in a way which I think would affect this case.
libxl__wait_for_backend does have a usleep(10000) in it -- which is
certainly the source of the delay, but I'd like to explain how we got to
waiting like that anyway (IanJ: do you have PCI on your hitlist for
asyncing up?)

This thing about pciback not being ready rings a bell. I've cc'd a few
folks who I think might remember more.

While the domain is happily running can you provide the output of
"xenstore-ls -fp" -- I'm curious what state pciback is in. It should be
4, if not then that would be the problem.

> BTW: I don't know whether it makes any difference, but I am only using 
> xen-pciback.hide=(bb:dd.f)(...) on the grub command line for a number of 
> devices including those that I pass through to this domU - there's 
> nothing else happening in the dom0 with those devices priot to starting 
> the domU and there are also no driver modules available for any of the 
> hidden hardware (except for one of the hidden USB Controllers of the 
> motherboard which is also passed through) in dom0.

I don't think that should matter here.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.