
Re: [Xen-users] Strange issue: DomU not saving when using direct HW access, plus non-working after restart



Hi Mark,

thanks a lot for your answer!
Since it is somewhat of a production box (all email and web services run on it), I'll need to restart it during one of the coming nights.

I'll definitely keep you updated.

Thx again.

Regards
Falko


Mark Williamson wrote:
I do have a real strange problem here:

My environment: Xen 3.02 on SuSE 10.1

In dom0 I disable eth0 with the following lines in /etc/init.d/boot.local:
/sbin/modprobe pciback
/bin/echo -n 0000:01:00.0 > /sys/bus/pci/drivers/e1000/unbind
/bin/echo -n 0000:01:00.0 > /sys/bus/pci/drivers/pciback/new_slot
/bin/echo -n 0000:01:00.0 > /sys/bus/pci/drivers/pciback/bind


Then I start my domU with the following parameters:
[...]
pci = [ '01:00.0' ]
dhcp = 'dhcp'
[...]

Basically everything's fine so far. domU boots and accesses the net
via DHCP over its HW-assigned eth0.

Cool.

But when I reboot dom0 it tries to save domU (which seems to be OK).
After rebooting dom0 starts to restart domU which fails and results in a
"cold boot" of domU (incl. file check etc on its boot).

Now if I try to save and restore the domU manually, it fails and I get
this message:
Error: pci: Invalid config setting bus: none

Even stranger:
If I then try to start domU manually with xm create domU -c, dhcp is
just not working!
domU finds the assigned HW (eth0) but is not able to set up the network
at all! And I can't get domU back to work until I reboot the whole
system (dom0) completely!

Suspend / resume isn't supported for domains that have direct access to PCI devices - I'm surprised the tools even allow it (they probably shouldn't!).

It's strange that subsequently starting the domain manually also fails - are you sure that the domain you attempted to restore wasn't still hanging around somewhere? If it really is failing when there are no other domains fighting for that card, it could be that the state of the ethernet card (or, I guess, maybe that of the Xen PCI pciback driver) has been messed up by the failed operations and that's why you need a whole machine reboot.

The simple fix is to disable the automatic suspend/resume of that domain on reboot; have it shut down and rebooted by dom0 instead. Other domains that don't have direct hardware access may still be safely suspended and resumed.
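
A minimal sketch of that change, assuming the stock xendomains init script is what saves and restores guests on dom0 shutdown/boot, and that its settings live in /etc/sysconfig/xendomains (both are assumptions; neither is confirmed in this thread). Note that these settings switch off save/restore for every domain the script handles, not only the one with the PCI device:

```shell
# /etc/sysconfig/xendomains  (path assumed for a SuSE dom0)

# Empty save directory: guests are shut down instead of saved
# when dom0 goes down.
XENDOMAINS_SAVE=""

# Do not try to restore previously saved guest images at boot.
XENDOMAINS_RESTORE="false"
```

Guests that should come back up after a dom0 reboot would then need to be started freshly rather than restored.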

Something I'd be interested in: once you've reached the wedged state that requires a dom0 reboot, can you bring up that ethernet device in dom0 (by rebinding it to the e1000 driver)? This would tell us whether the device is wedged or pciback is. Please note that trying this (or starting new driver domains once you're in the wedged state, or resuming a saved driver domain, either explicitly or at dom0 reboot) will quite possibly send weird commands to your NIC; I wouldn't expect this to actually harm modern hardware, but it's not impossible that you'd get some instability / corruption on the host system (not just the domU).
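
The rebinding could look roughly like the sketch below, which simply reverses the boot.local steps quoted above. The function name rebind_to_native and its optional sysfs-root argument are made up for illustration, and it assumes pciback exposes a remove_slot file next to new_slot; treat it as an untested sketch rather than a known-good procedure:

```shell
#!/bin/sh
# Hand a PCI device back from pciback to its native dom0 driver.
# Usage: rebind_to_native <pci-address> <driver> [sysfs-root]
# The third argument defaults to /sys; it exists only so the function
# can be tried against a scratch directory first.
rebind_to_native() {
    dev="$1"
    drv="$2"
    drivers="${3:-/sys}/bus/pci/drivers"

    # Release the device from pciback and drop its reserved slot.
    printf '%s' "$dev" > "$drivers/pciback/unbind" || return 1
    printf '%s' "$dev" > "$drivers/pciback/remove_slot" || return 1

    # Ask the native driver to claim the device again.
    printf '%s' "$dev" > "$drivers/$drv/bind" || return 1
}

# Example (as root in dom0, with the domU shut down):
#   rebind_to_native 0000:01:00.0 e1000 && ifconfig eth0 up
```

If eth0 then comes up normally in dom0, the card itself is probably fine and pciback is the more likely culprit.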

So, if it's *not* an important / production box containing any useful data, I'd be interested if you could experiment a bit more - otherwise just disable the automatic suspend/resume on dom0 reboot for that domain and your problem will be solved.

Does that answer your question? It's great to have users / testers of the driver domains functionality, so please let us know how you get on!

Cheers,
Mark


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

