
Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns



On 07/09/15 10:36, Wei Liu wrote:
> On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
>>>>> On 06.09.15 at 22:05, <wei.liu2@xxxxxxxxxx> wrote:
>>> The original implementation of populate_pfns didn't consider that the
>>> same pfn can be present multiple times in the array. The mechanism to prevent
>>> populating the same pfn multiple times only worked if the recurring pfn
>>> appeared in different batches.
>>>
>>> This bug was discovered by a Linux 4.1 32-bit kernel save / restore
>>> test, in which several PTEs point to the same pfn, resulting in an
>>> array containing a recurring pfn.
>>
>> Since you must have debugged this, and since the bisector appears
>> to have fingered a patch of mine on the Linux side which triggered
>> this, would you mind explaining this a little more? In particular I'm
>> worried that this may point out some other bug in Linux, as in the
>> context of the change there - dealing with the 1:1 mapping - I can't
>> see a legitimate reason for multiple PTEs to reference the same PFN.
>>
> 
> Sure. I can try to explain this as clearly as possible. Note that I didn't
> even look at Linux side changes because at that point I was sure there
> was a bug in migration v2.
> 
> So there is a step called normalise_page in migration v2. It's a nop for
> HVM guests. For PV guests, it only cares about page table frames. To
> normalise a page table frame, the core idea is to replace all MFNs in
> the page tables with PFNs inside the guest.
> 
> When restoring, there is a step called localise_page, which again is a
> nop for HVM guests. For PV guests, it does the reverse of normalise_page.
> It goes through all page table frames, extracts all PFNs pointed to by
> PTEs in such frames, populates them, then reconstructs the page tables.
> 
> What I discovered is that several PTEs inside one page table frame
> contained the same PFN (something like fd42). The original toolstack
> implementation of populate_pfns didn't consider such a scenario. As for
> what that PFN referred to, I wasn't sure and I didn't really care about that.
> 
> Let me know if you need more information.

I am somewhat amazed that this worked at all, since finding multiple PTEs
with the same PFN in a batch of 1024 pages would be really common.

But this bug would be avoided if the page table pages were more often
located towards the end of RAM such that the PFNs were already populated
as part of the normal page transfer.
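
To make the failure mode concrete, here is a rough sketch in C. It is not
the libxc code: struct restore_ctx, MAX_PFN, BATCH_SIZE and both functions
are hypothetical names made up for illustration, and "allocation" is just
a counter. The first variant marks PFNs as populated only after the whole
batch has been scanned, so a PFN that recurs within one batch is queued
more than once. The second marks each PFN as soon as it is queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PFN    0x10000u /* hypothetical guest size, illustration only */
#define BATCH_SIZE 1024u    /* batch size mentioned in this thread */

/* Hypothetical stand-in for the restore context's "populated" tracking. */
struct restore_ctx {
    bool populated[MAX_PFN];
};

/*
 * Buggy variant: the populated map is only updated after the whole batch
 * has been scanned, so duplicates inside one batch are not caught.
 * Returns the number of frames that would be allocated.
 */
static size_t populate_batch_buggy(struct restore_ctx *ctx,
                                   const unsigned long *pfns, size_t count)
{
    unsigned long todo[BATCH_SIZE];
    size_t i, nr_todo = 0;

    assert(count <= BATCH_SIZE);

    for (i = 0; i < count; i++) {
        assert(pfns[i] < MAX_PFN);
        if (!ctx->populated[pfns[i]])
            todo[nr_todo++] = pfns[i]; /* the same PFN can be queued twice */
    }

    for (i = 0; i < nr_todo; i++)
        ctx->populated[todo[i]] = true;

    return nr_todo;
}

/*
 * Fixed variant: mark each PFN as populated at the moment it is queued,
 * so a later occurrence in the same batch is skipped.
 */
static size_t populate_batch_fixed(struct restore_ctx *ctx,
                                   const unsigned long *pfns, size_t count)
{
    size_t i, nr_todo = 0;

    assert(count <= BATCH_SIZE);

    for (i = 0; i < count; i++) {
        assert(pfns[i] < MAX_PFN);
        if (!ctx->populated[pfns[i]]) {
            ctx->populated[pfns[i]] = true;
            nr_todo++;
        }
    }

    return nr_todo;
}
```

With a batch such as { 0xfd42, 0x100, 0xfd42 }, the first variant queues
three frames and the second queues two. Both queue nothing if every PFN in
the batch was already populated by an earlier batch, which is the case
described above where the page table pages arrive late in the stream.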

I suppose it is possible that the Linux commit found by the bisector
results in changes to where page table pages are allocated.

David
