Re: [Xen-devel] Re: PoD issue
On Fri, Feb 19, 2010 at 1:53 AM, Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx> wrote:
>> On Thu, Feb 4, 2010 at 2:12 PM, George Dunlap
>> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>> > Yeah, the OSS tree doesn't get the kind of regression testing it
>> > really needs at the moment. I was using the OSS balloon drivers when
>> > I implemented and submitted the PoD code last year. I didn't have any
>> > trouble then, and I was definitely using up all of the memory. But I
>> > haven't done any testing on OSS since then, basically.
>> >
>>
>> Is it expected that booting HVM guests with maxmem > memory is
>> unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
>> crash the guest and occasionally the entire server.
>
> Obviously the platform should never crash, and that's very concerning.
>
> Are you running a balloon driver in the guest? It's essential that you do,
> because it needs to get in fairly early in the guest boot and allocate the
> difference between maxmem and target memory. The populate-on-demand code
> exists just to cope with things like the memory scrubber running ahead of the
> balloon driver. If you're not running a balloon driver the guest is doomed to
> crash as soon as it tries using more than target memory.
>
> All of this requires coordination between the tool stack, PoD code, and PV
> drivers so that sufficient memory gets ballooned out. I expect the
> combination that has had most testing is the XCP toolstack and Citrix PV
> windows drivers.
>

Initially I was using the XCP 0.1.1 WinPV drivers (win server 2003 sp2)
and the guest crashed when I tried to install software via emulated
cdrom. Nothing about the crash was reported in the qemu log file and
xend.log wasn't very helpful either but here's the relevant portion:

[2010-02-17 20:42:49 4253] DEBUG (DevController:139) Waiting for devices vtpm.
[2010-02-17 20:42:49 4253] INFO (XendDomain:1182) Domain win2 (30) unpaused.
[2010-02-17 20:48:05 4253] WARNING (XendDomainInfo:1888) Domain has crashed: name=win2 id=30.
[2010-02-17 20:48:06 4253] DEBUG (XendDomainInfo:2734) XendDomainInfo.destroy: domid=30
[2010-02-17 20:48:06 4253] DEBUG (XendDomainInfo:2209) Destroying device model

I unsuccessfully attempted the install several more times, then tried
copying files from the emulated CD, which also crashed the guest each
time. I wasn't even thinking about the fact that I had set maxmem/PoD,
so I blamed the XCP WinPV drivers and switched to GPLPV (0.10.0.138).
Same crashes with GPLPV. At this point I hadn't checked 'xm dmesg',
which was the only place that the PoD/p2m error is reported, so I
changed to pure HVM mode and tried to copy the files from the emulated
CD.

That's when the real trouble started. The RDP and VNC connections to
the guest froze, as did the ssh session to the dom0. This server was
also hosting 7 Linux PV guests. I could ping the guests and partially
load some of their websites but couldn't log in via ssh. I suspected
that the HDDs were overloaded, causing disk I/O to block the guests. I
was on site, so I went to check the server and was shocked to find no
disk activity. The monitor output was blank and I couldn't wake it up.
Maybe the USB keyboard could not be enumerated, because I couldn't even
toggle the numlock, etc. after several reconnections.

I power cycled the host and checked the logs, but there was no evidence
of a crash other than one of the software RAID devices being unclean on
startup. Perhaps there was interesting data logged to 'xm dmesg' or
waiting to be written to disk at the time of the crash. I'm afraid this
server/motherboard is incapable of logging data to the serial port.
I've attempted to do so several times, both before and after this
crash.

Of course the simple fix is to remove maxmem from the domU config file
for the time being. Eventually people will use PoD on production
systems. Relying on the guest to have a solid balloon driver is
unacceptable.
A guest could accidentally (or otherwise) remove the pv drivers to
bring down an entire host. When I can free up a server with serial
logging for testing I will try to reproduce this crash.

Keith Coleman

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
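
As a rough illustration of the arithmetic Ian describes in the quoted text (the balloon driver has to allocate the difference between maxmem and target memory early in guest boot), here is a minimal Python sketch. The function name balloon_deficit_mib and the example sizes are hypothetical; they are not taken from the Xen tree or from the config used in this report. It only assumes that `memory` is the target and `maxmem` the boot-time maximum, both in MiB.

```python
def balloon_deficit_mib(memory_mib: int, maxmem_mib: int) -> int:
    """Return how many MiB the guest's balloon driver must balloon out.

    memory_mib: target allocation (assumed to be the `memory` config key).
    maxmem_mib: size the guest sees at boot (assumed to be `maxmem`).
    A result of 0 means nothing needs to be ballooned out.
    """
    if memory_mib <= 0 or maxmem_mib <= 0:
        raise ValueError("sizes must be positive")
    if maxmem_mib < memory_mib:
        raise ValueError("maxmem must not be smaller than memory")
    return maxmem_mib - memory_mib


# Hypothetical sizes, for illustration only.
print(balloon_deficit_mib(1024, 2048))  # guest must balloon out this many MiB
print(balloon_deficit_mib(1024, 1024))  # maxmem == memory: nothing to balloon
```

The second call corresponds to the workaround mentioned in the message, on the assumption that removing maxmem leaves it equal to memory, so the guest never depends on a balloon driver to stay within its target.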