Re: [Xen-devel] [RFC] KEXEC: allocate crash note buffers at boot time
On 30/11/11 09:20, Jan Beulich wrote:
>>>> On 29.11.11 at 19:56, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> As I have little to no knowledge of this stage of the boot process, is
>> this a sensible way to be setting up the per_cpu areas?  I have a
>> sneaking suspicion that it will fall over if a CPU is onlined after
>> boot, and may also fall over if a CPU is offlined and re-onlined later.
>> There appears to be no infrastructure currently in place for this type
>> of initialization, which is quite possibly why the code exists in its
>> current form.
>
> I actually wonder how you came to those 4 statements you make in
> the description - none of these seems to me like it is really an issue
> (those would instead be plain bugs in Dom0).  Did you actually look
> at the existing Dom0 implementation(s)?
>
> Further, while not being a huge waste of memory, it still is one in case
> kexec never gets enabled, especially when considering a Dom0 kernel
> that's being built without CONFIG_KEXEC (or an incapable one, like any
> pv-ops kernel to date).  So I also conceptually question that change.
>
> Jan

We (XenServer) have had many cases of the kexec path failing on customer
boxes under weird and seemingly inexplicable circumstances, which is why I
am working on bullet-proofing the entire path.

We have one ticket where the contents of a crash note are clearly bogus (a
PRSTATUS is not 2GB long).  We have a ticket where the kdump kernel appears
to have failed to reassemble /proc/vmcore from elfcorehdr because a few
PCPUs' worth of crash notes are missing.  I seem to remember a ticket from
before my time involving a crash while writing the crash notes in Xen
itself.  We even have a ticket stating that you get different crash notes
depending on whether you crash using the Xen debug keys (crash notes appear
completely bogus) or /proc/sysrq-trigger in dom0 (which seems to be fine).

All of these are of uncertain reproducibility (except the final one, which
was shown to reproduce on Xen-3.x but not on Xen-4.x, so was not
investigated further) and have a habit of being unreproducible on any of
our local hardware, which makes fixing the problems tricky.  So yes - the 4
points I have made are certainly not regular or common behavior, but given
some of the tickets we have, I am fairly sure it is not a bug-free path.

I have checked the 2.6.32 implementation of dom0's side of this and agree
that it looks OK.  However, it is my opinion that relying on a particular
hypercalling pattern from dom0 is a perilous route for Xen, whether or not
that pattern is likely to change in the future.

Having said all of this, I agree with your second paragraph.  As already
noted in my other email in this thread, I need to change the implementation
of this, so I will key the initial allocation of memory on whether
crashkernel= has been passed (a rough sketch of this follows below).  This
should be sufficient indication as to whether the user minds having the
space allocated or not.

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
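
As a rough illustration of the plan above (not the actual patch), the
per-CPU crash note buffers could be allocated at boot, or when a CPU is
onlined, only if crashkernel= was seen on the Xen command line.  The names
crashkernel_requested(), sizeof_cpu_notes() and crash_notes[] below are
placeholders rather than the real Xen interfaces, and calloc() stands in
for whatever allocator the hypervisor would use:

/*
 * Sketch only: key the allocation of per-CPU crash note buffers on
 * whether the user asked for a crash kernel at all.
 */
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 256

/* One ELF note buffer per possible CPU, allocated at boot/online time. */
static void *crash_notes[NR_CPUS];

/* Placeholder: true iff crashkernel=<size>[@<addr>] was parsed at boot. */
static int crashkernel_requested(void)
{
    return 1; /* stand-in; the real check would look at the parsed area */
}

/* Placeholder: size of the note data (PRSTATUS etc.) for one CPU. */
static size_t sizeof_cpu_notes(unsigned int cpu)
{
    (void)cpu;
    return 1024; /* stand-in value */
}

/*
 * Called for each CPU at boot, and again if a CPU is re-onlined, so the
 * notes already exist by the time any crash path needs them.
 */
static int kexec_init_cpu_notes(unsigned int cpu)
{
    void *notes;

    if ( !crashkernel_requested() )
        return 0;   /* user never asked for kdump: allocate nothing */

    if ( crash_notes[cpu] )
        return 0;   /* already set up, e.g. CPU offlined and re-onlined */

    notes = calloc(1, sizeof_cpu_notes(cpu));
    if ( !notes )
        return -1;  /* would be -ENOMEM in the hypervisor proper */

    crash_notes[cpu] = notes;
    return 0;
}

int main(void)
{
    /* Example: set up notes for CPU 0 as the boot path would. */
    return kexec_init_cpu_notes(0);
}

Allocating up front this way avoids doing any work in the crash path
itself, while costing nothing on systems whose administrator never asked
for a crash kernel.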