Re: [Xen-devel] HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2
On 11/15/15 7:05 PM, Atom2 wrote:
> On 15.11.15 at 21:12, Doug Goldstein wrote:
>> On 11/14/15 6:14 PM, Atom2 wrote:
>>> On 14.11.15 at 21:32, Andrew Cooper wrote:
>>>> On 14/11/2015 00:16, Atom2 wrote:
>>>>> On 13.11.15 at 11:09, Andrew Cooper wrote:
>>>>>> On 13/11/15 07:25, Jan Beulich wrote:
>>>>>>>>>> On 13.11.15 at 00:00, <ariel.atom2@xxxxxxxxxx> wrote:
>>>>>>>> On 12.11.15 at 17:43, Andrew Cooper wrote:
>>>>>>>>> On 12/11/15 14:29, Atom2 wrote:
>>>>>>>>>> Hi Andrew,
>>>>>>>>>> thanks for your reply. Answers are inline further down.
>>>>>>>>>>
>>>>>>>>>> On 12.11.15 at 14:01, Andrew Cooper wrote:
>>>>>>>>>>> On 12/11/15 12:52, Jan Beulich wrote:
>>>>>>>>>>>>>>> On 12.11.15 at 02:08, <ariel.atom2@xxxxxxxxxx> wrote:
>>>>>>>>>>>>> After the upgrade HVM domUs appear to no longer work - regardless of the dom0 kernel (tested with both 3.18.9 and 4.1.7 as the dom0 kernel); PV domUs, however, work just fine as before on both dom0 kernels.
>>>>>>>>>>>>>
>>>>>>>>>>>>> xl dmesg shows the following information after the first crashed HVM domU, which is started as part of the machine booting up:
>>>>>>>>>>>>> [...]
>>>>>>>>>>>>> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
>>>>>>>>>>>>> (XEN) ************* VMCS Area **************
>>>>>>>>>>>>> (XEN) *** Guest State ***
>>>>>>>>>>>>> (XEN) CR0: actual=0x0000000000000039, shadow=0x0000000000000011, gh_mask=ffffffffffffffff
>>>>>>>>>>>>> (XEN) CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffffff
>>>>>>>>>>>>> (XEN) CR3: actual=0x0000000000800000, target_count=0
>>>>>>>>>>>>> (XEN)     target0=0000000000000000, target1=0000000000000000
>>>>>>>>>>>>> (XEN)     target2=0000000000000000, target3=0000000000000000
>>>>>>>>>>>>> (XEN) RSP = 0x0000000000006fdc (0x0000000000006fdc)  RIP = 0x0000000100000000 (0x0000000100000000)
>>>>>>>>>>>> Other than RIP looking odd for a guest still in non-paged protected mode I can't seem to spot anything wrong with guest state.
>>>>>>>>>>> odd? That will be the source of the failure.
>>>>>>>>>>>
>>>>>>>>>>> Out of long mode, the upper 32 bits of %rip should all be zero, and it should not be possible to set any of them.
>>>>>>>>>>>
>>>>>>>>>>> I suspect that the guest has exited for emulation, and there has been a bad update to %rip. The alternative (which I hope is not the case) is that there is a hardware erratum which allows the guest to accidentally get itself into this condition.
>>>>>>>>>>>
>>>>>>>>>>> Are you able to rerun with a debug build of the hypervisor?
>>> [big snip]
>>>>>>>>>> Now _without_ the debug USE flag, but with debug information in the binary (I used splitdebug), all is back to where the problem started off (i.e. the system boots without issues until such time as it starts an HVM domU, which then crashes; PV domUs are working). I have attached the latest "xl dmesg" output with the timing information included.
>>>>>>>>>>
>>>>> I hope any of this makes sense to you.
>>>>>
>>>>> Again many thanks and best regards
>>>>>
>>>> Right - it would appear that the USE flag is definitely not what you wanted, and causes bad compilation for Xen. The do_IRQ disassembly you sent is the result of disassembling a whole block of zeroes. Sorry for leading you on a goose chase - the double faults will be the product of bad compilation, rather than anything to do with your specific problem.
>>> Hi Andrew,
>>> there's absolutely no need to apologize as it is me who asked for help and you who generously stepped in and provided it. I really do appreciate your help and it is for me, as the one seeking help, to provide all the information you deem necessary and ask for.
>>>> However, the final log you sent (dmesg) is using a debug Xen, which is what I was attempting to get you to do originally.
>>> Next time I know better how to arrive at a debug Xen. It's all about learning.
>>>> We still observe that the VM ends up in 32-bit non-paged mode but with an RIP with bit 32 set, which is an invalid state to be in. However, there was nothing particularly interesting in the extra log information.
>>>>
>>>> Please can you rerun with "hvm_debug=0xc3f", which will cause far more logging to occur to the console while the HVM guest is running. That might show some hints.
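(For reference: hvm_debug is a hypervisor boot parameter, so it goes on Xen's own command line rather than into a guest config. On a GRUB2-based dom0 that could look roughly like the following sketch - the file path and variable name assume GRUB2's stock Xen handling and may differ on other setups:

    # /etc/default/grub - add hvm_debug=0xc3f to any Xen options already listed here
    GRUB_CMDLINE_XEN_DEFAULT="hvm_debug=0xc3f"

    # regenerate grub.cfg, reboot, then collect the extra output with:
    grub-mkconfig -o /boot/grub/grub.cfg
    xl dmesg

The resulting logging is verbose, so capturing "xl dmesg" right after the HVM guest crashes keeps the interesting part near the end of the buffer.)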
>>> I haven't done that yet - but please see my next paragraph. If you are still interested in this, for whatever reason, I am clearly more than happy to rerun with your suggested option and provide that information as well.
>>>> Also, the fact that this occurs just after starting SeaBIOS is interesting. As you have switched versions of Xen, you have also switched hvmloader, which contains the SeaBIOS binary embedded in it. Would you be able to compile both 4.5.1 and 4.5.2 and switch the hvmloader binaries in use? It would be very interesting to see whether the failure is caused by the hvmloader binary or the hypervisor. (With `xl`, you can use firmware_override="/full/path/to/firmware" to override the default hvmloader.)
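(A sketch of that A/B test: keep a copy of each hvmloader build under a path of your choosing and point the guest at it from its xl config. The paths and the config file name below are made up; builder and firmware_override are standard xl.cfg settings:

    # in the HVM guest's config file, e.g. /etc/xen/hvm-guest.cfg
    builder = "hvm"
    firmware_override = "/usr/local/lib/xen/boot/hvmloader-4.5.1"

Starting the guest with "xl create /etc/xen/hvm-guest.cfg" then lets the hypervisor and the firmware be swapped independently of each other.)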
>>> Your analysis was absolutely spot on. After re-thinking this for a moment, I thought going down that route first would make a lot of sense as PV guests still do work and one of the differences to HVM domUs is that the former do _not_ require SeaBIOS. Looking at my log files of installed packages confirmed an upgrade from SeaBIOS 1.7.5 to 1.8.2 in the relevant timeframe, which obviously had not made it into the hvmloader of xen-4.5.1 as I did not re-compile Xen after the upgrade of SeaBIOS.
>>>
>>> So I re-compiled xen-4.5.1 (obviously now using the installed SeaBIOS 1.8.2) and the same error as with xen-4.5.2 popped up - and that seemed to strongly indicate that there indeed might be an issue with SeaBIOS, as this probably was the only variable that had changed from the original install of xen-4.5.1.
>>>
>>> My next step was to downgrade SeaBIOS to 1.7.5 and to re-compile xen-4.5.1. Voila, the system was again up and running. While still having SeaBIOS 1.7.5 installed, I also re-compiled xen-4.5.2 and ... you probably guessed it ... the problem was gone: the system boots up with no issues and everything is fine again.
>>>
>>> So in a nutshell: there seems to be a problem with SeaBIOS 1.8.2 preventing HVM domains from successfully starting up. I don't know what this is triggered from, whether this is specific to my hardware or whether something else in my environment is to blame.
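(On Gentoo that round trip boils down to roughly the following - a sketch, assuming the 1.7.5 ebuild is still available in the tree or can be unmasked:

    # downgrade SeaBIOS, then rebuild xen-tools so hvmloader picks it up
    emerge --ask --oneshot =sys-firmware/seabios-1.7.5
    emerge --ask --oneshot app-emulation/xen-tools

The second step matters because hvmloader embeds the SeaBIOS image at build time, so swapping sys-firmware/seabios on its own does not change what HVM guests actually boot.)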
>>>
>>> In any case, I am again more than happy to provide data / run a few tests should you wish to get to the bottom of this.
>>>
>>> I do owe you a beer (or any other drink) should you ever be at my location (i.e. Vienna, Austria).
>>>
>>> Many thanks again for your analysis and your first class support. Xen and its people absolutely rock!
>>>
>>> Atom2
>> I'm a little late to the thread but can you send me (you can do it off-list if you'd like) the USE flags you used for xen, xen-tools and seabios? Also emerge --info. You can kill two birds with one stone by using emerge --info xen.
> Hi Doug,
> here you go:

Thanks. I'll use your configuration as a test point to update a few things with regard to the Gentoo ebuilds. I'm not the maintainer of Xen and SeaBIOS but I don't think the maintainers will have much issue with the changes.

> USE flags:
> app-emulation/xen-4.5.2-r1::gentoo USE="-custom-cflags -debug -efi -flask -xsm"
> app-emulation/xen-tools-4.5.2::gentoo USE="hvm pam pygrub python qemu screen system-seabios -api -custom-cflags -debug -doc -flask (-ocaml) -ovmf -static-libs -system-qemu" PYTHON_TARGETS="python2_7"
> sys-firmware/seabios-1.7.5::gentoo USE="binary"

So looking at how SeaBIOS and friends are built, I think we have an issue that needs to be addressed. That being said, you wouldn't have this issue if you did USE="-system-seabios -system-qemu". I believe you would also be ok if you had done USE="system-seabios system-qemu". But after a quick look at everything, USE="system-seabios -system-qemu" will definitely do the wrong thing.
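(Expressed as Portage configuration, those combinations would look roughly like this sketch - package.use may be a single file or a directory depending on the setup, and only the relevant flags are shown:

    # /etc/portage/package.use - build the bundled SeaBIOS and qemu ...
    app-emulation/xen-tools -system-seabios -system-qemu
    # ... or use the system-provided packages for both:
    # app-emulation/xen-tools system-seabios system-qemu

followed by re-emerging xen-tools so the change actually takes effect. The combination to avoid, per the above, is system-seabios together with -system-qemu.)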
> emerge --info: Please see the attached file
>> I'm not too familiar with the xen ebuilds but I was pretty sure that xen-tools is what builds hvmloader and it downloads a copy of SeaBIOS and builds it so that it remains consistent. But obviously your experience shows otherwise.
> You are right, it's xen-tools that builds hvmloader. If I remember correctly, the "system-seabios" USE flag (for xen-tools) specifies whether sys-firmware/seabios is used, and the latter downloads SeaBIOS in its binary form provided its "binary" USE flag is set. At least that's my understanding.
>> I'm looking at some ideas to improve SeaBIOS packaging on Gentoo and your info would be helpful.
> Great. Whatever makes Gentoo and Xen stronger will be awesome. What immediately springs to mind is to create a separate hvmloader package and slot that (that's just an idea and probably not fully thought through, but as far as I understood Andrew, it would then be possible to specify the specific firmware version [i.e. hvmloader] to use on xl's command line by using firmware_override="full/path/to/firmware").
>
> I also found out that an upgrade to sys-firmware/seabios obviously does not trigger an automatic re-emerge of xen-tools and thus hvmloader. Shouldn't this also happen automatically, as xen-tools depends on seabios?
>
> Thanks and best regards Atom2
>
> P.S. If you prefer to take this off-list, just reply to my mail address.

--
Doug Goldstein

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel