RE: [Xen-devel] [HVM] Corruption of buffered_io_page
> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Trolle Selander
> Sent: 07 December 2006 09:51
> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [HVM] Corruption of buffered_io_page
>
> I thought I had replied to the list, but apparently gmail's default
> reply action goes to the last poster, not the mailing list. Ian got
> these answers already, but I'll cut & paste to make sure it gets to
> the general list as well:
>
> Distro is FC6, compiler is gcc-4.1.1, guest OS is OS/2, workload is
> the boot process. :)
> No drivers are loaded yet at this stage - it happens fairly early in
> the boot. However, after I added the corruption-catching "padding"
> struct member, the boot does in fact progress to the driver loading
> stage, although with severely corrupted boot-logo graphics.
> Since it currently happens reproducibly after a specific "no op"
> vmexit (read from an unused port), Mats's suggestion of marking the
> iopage read-only sounds doable if I insert code to set the page
> read-only when this specific vmexit occurs. From what I saw when
> running qemu in the debugger, there's no "proper" use of the page
> about to occur, so the only thing that will write to it should be
> whatever is doing it erroneously. I'll try that tomorrow.
>
> One correction: I managed to confuse myself a bit here. The very last
> vmexit_ioio at which the guest stalls is a read from 0x1f7, but when
> that io happens, the iopage is already corrupted, and that's why it
> stalls - qemu-dm is "stuck" and never performs the io. The port 0x23
> is the io preceding that one - the last one that "gets through",
> which is why that was the one I've used to trace things.
>
> I don't know if it's any clue to anyone, but the bad value that gets
> written into read_pointer is 0x1df1000.

One thing is for sure, it's not a page-table entry. But it could be the
value of a physical page-address. Is this value in any of the registers
around the time of the crash?

> Now to what you said - I thought Keir's patch fixed up all the
> segment base = 0 assumptions in x86_emulate? At least we're past the
> problem that that was causing, which I posted about before. I must
> confess I never actually looked at the code, because Keir said the
> patch would fix all the segment base = 0 assumptions, and once the
> patch showed up in mercurial and I built from that changeset, I
> didn't hit the seg.base != 0 problem anymore.

There was a patch (or patches) from Jan Beulich to fix the HVM side of
seg.base != 0, but as far as I've seen, Keir hasn't posted his "big
patch" yet - it's possible that Keir could send you a "private patch".
That patch fixes x86_emulate.c, which isn't in the HVM section; it's the
part that fixes up page-table writes (which you may not see many of at
the early part of boot, so it may not be an issue, of course).

Whilst marking the page read-only and trapping on the page-fault is
indeed doable, I'd also add a check on every vmexit and just at the time
of doing vmrun (in the C code just before calling svm_asm_do_resume() or
some such).

--
Mats

> On 12/7/06, Petersson, Mats <Mats.Petersson@xxxxxxx> wrote:
>
> > -----Original Message-----
> > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Ian Pratt
> > Sent: 06 December 2006 22:05
> > To: Trolle Selander; xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: RE: [Xen-devel] [HVM] Corruption of buffered_io_page
> >
> > > read_pointer is the first member of buffered_ioreq_t, so on the
> > > hunch that the corruption was occurring by something other than a
> > > wrong value actually being written into the structure member,
> > > either overflowing a previous structure in memory or a pointer var
> > > mistake, I thus added a 64bit dummy member to "pad" the
> > > buffered_ioreq_t structure at the start, and as I had suspected,
> > > the bad value does get written into this dummy member rather than
> > > the read_pointer. I haven't (yet) been able to track down what it
> > > is that actually writes the bad value, and any help finding it
> > > would be welcome.
> >
> > What compiler are you using? What guest OS? Are you using PV or
> > emulated drivers? Any idea if there are particular workloads that
> > provoke the problem?
>
> I'll answer for Trolle as best as I can:
> Compiler: gcc 4.1 I believe.
> Guest OS: OS/2
> Drivers would be emulated ones.
> I think it's failing during initial boot, as Trolle hasn't told me
> "It works" yet... ;-)
>
> By the way, I'm still a bit worried that this is caused by segment
> base != 0 in x86_emulate.c - this can cause all sorts of
> "interesting" interaction between the page-table updates and actual
> memory being affected.
>
> --
> Mats
>
> > Best,
> > Ian

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
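
For anyone following along, here is a rough sketch in C of the two ideas from this thread combined: a dummy 64-bit member ahead of read_pointer, plus a check that can be called on every vmexit and again just before vmrun. This is not the Xen code. The struct layout, the magic value and every name starting with demo_ or IOPAGE_ are made up for illustration, and the real structure has more members than shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical guard pattern: any value unlikely to be written by accident. */
#define IOPAGE_GUARD_MAGIC 0x5AFEC0DE5AFEC0DEULL

/* Hypothetical stand-in for the real structure. Only the leading guard
 * and the two pointers matter for this sketch. */
struct demo_buffered_iopage {
    uint64_t guard;          /* dummy member placed ahead of read_pointer */
    uint32_t read_pointer;
    uint32_t write_pointer;
};

/* Arm the guard and reset both pointers. */
static inline void demo_iopage_init(struct demo_buffered_iopage *pg)
{
    pg->guard = IOPAGE_GUARD_MAGIC;
    pg->read_pointer = 0;
    pg->write_pointer = 0;
}

/* Returns true if the guard is intact. `where` names the call site, so
 * the log shows which check was the first to see the damage. */
static inline bool demo_iopage_check(const struct demo_buffered_iopage *pg,
                                     const char *where)
{
    if (pg->guard == IOPAGE_GUARD_MAGIC)
        return true;
    fprintf(stderr, "iopage guard clobbered at %s: 0x%llx\n",
            where, (unsigned long long)pg->guard);
    return false;
}
```

The idea would be to call demo_iopage_check(pg, "vmexit") at the top of the exit handler and demo_iopage_check(pg, "pre-vmrun") just before resuming the guest. If the first failing check is a "vmexit" one, the stray write happened while the guest (or another CPU or qemu-dm) was running; if it is a "pre-vmrun" one, it happened somewhere in the hypervisor's own exit handling.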