
Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns



>>> On 07.09.15 at 11:36, <wei.liu2@xxxxxxxxxx> wrote:
> On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
>> >>> On 06.09.15 at 22:05, <wei.liu2@xxxxxxxxxx> wrote:
>> > The original implementation of populate_pfns didn't consider that the
>> > same pfn can be present multiple times in the array. The mechanism to
>> > prevent populating the same pfn multiple times only worked if the
>> > recurring pfn appeared in different batches.
>> > 
>> > This bug was discovered by a save / restore test of a 32-bit Linux 4.1
>> > kernel, which has several PTEs pointing to the same pfn, resulting in
>> > an array containing a recurring pfn.
>> 
>> Since you must have debugged this, and since the bisector appears
>> to have fingered a patch of mine on the Linux side which triggered
>> this, would you mind explaining this a little more? In particular I'm
>> worried that this may point out some other bug in Linux, as in the
>> context of the change there - dealing with the 1:1 mapping - I can't
>> see a legitimate reason for multiple PTEs to reference the same PFN.
>> 
> 
> Sure. I can try to explain this as clearly as possible. Note that I didn't
> even look at Linux side changes because at that point I was sure there
> was a bug in migration v2.
> 
> So there is a step called normalise_page in migration v2. It's a nop for
> HVM guests. For PV guests, it only cares about page table frames. To
> normalise a page table frame, the core idea is to replace all MFNs in
> the page tables with PFNs inside the guest.
> 
> When restoring, there is a step called localise_page, which again is a
> nop for HVM guests. For PV guests, it does the reverse of normalise_page.
> It goes through all page table frames, extracts all PFNs pointed to by
> PTEs in such frames, populates them, then reconstructs the page tables.
> 
> What I discovered is that PTEs inside one page table frame contained the
> same PFN (something like fd42). The original implementation of the
> toolstack's populate_pfns didn't consider such a scenario. As for what
> that PFN referred to, I wasn't sure and I didn't really care about that.

That's unfortunate, as that's precisely the information I was after,
since - as said - taking the repetition of the same PFN together with
what the triggering Linux change is about, it smells like there's
something wrong on the Linux side too. Do you at least recall how
many times that same PFN got repeated?
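
For concreteness, the within-batch case discussed above can be sketched
roughly as below. This is an illustration only, not the libxc code:
toy_ctx, TOY_MAX_PFN and collect_unpopulated are made-up names, and the
tracking of populated pfns is reduced to a plain bool array. The point it
shows is that a pfn is marked as soon as it is queued, so a repeat later
in the same array is skipped rather than queued a second time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy guest size; large enough to hold a pfn like 0xfd42. */
#define TOY_MAX_PFN 0x10000

/* Hypothetical stand-in for the restore context's record of populated pfns. */
struct toy_ctx {
    bool populated[TOY_MAX_PFN];
};

/*
 * Walk pfns[0..count) and copy into out[] every pfn that still needs
 * populating. Each pfn is marked populated at the moment it is queued,
 * so a second occurrence in the same array is skipped. If the marking
 * were deferred until after the loop, a pfn repeated within one array
 * would be queued more than once.
 *
 * out[] must have room for count entries. Returns the number queued.
 */
size_t collect_unpopulated(struct toy_ctx *ctx, const uint64_t *pfns,
                           size_t count, uint64_t *out)
{
    size_t queued = 0;

    for (size_t i = 0; i < count; i++) {
        uint64_t pfn = pfns[i];

        /* Ignore out-of-range pfns and ones already dealt with. */
        if (pfn >= TOY_MAX_PFN || ctx->populated[pfn])
            continue;

        ctx->populated[pfn] = true;   /* mark immediately, not after the loop */
        out[queued++] = pfn;
    }

    return queued;
}
```

With an input array of { 0xfd42, 0x10, 0xfd42, 0xfd42 } and a zeroed
toy_ctx, only 0xfd42 and 0x10 end up in out[], and a second call with the
same array queues nothing.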

Jan

