[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS



On 16/08/12 12:07, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, David Vrabel wrote:
>> On 15/08/12 14:55, Stefano Stabellini wrote:
>>> On Wed, 15 Aug 2012, David Vrabel wrote:
>>>> On 14/08/12 15:12, Attilio Rao wrote:
>>>>> On 14/08/12 14:57, David Vrabel wrote:
>>>>>> On 14/08/12 13:24, Attilio Rao wrote:
>>>> After looking at it some more, I think this pv-ops is unnecessary. How
>>>> about the following patch to just remove it completely?
>>>>
>>>> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
>>>> is sound.
>>>
>>> Do you have more then 4G to dom0 on those boxes?
>>
>> I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.
>>
>>> It certainly fixed a serious crash at the time it was introduced, see
>>> http://marc.info/?l=linux-kernel&m=129901609503574 and
>>> http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
>>> changed in kernel_physical_mapping_init, I think we still need it.
>>> Depending on the e820 of your test box, the kernel could crash (or not),
>>> possibly in different places.

FYI, it looks like pgt_buf is now located low down, which is why these
changes worked for me.  Possibly this changed as part of a memblock
refactor.

>>>>>> Having said that, I couldn't immediately see where pages in (end, 
>>>>>> pgt_buf_top] was getting set RO.  Can you point me to where it's 
>>>>>> done?
>>>>>>
>>>>>
>>>>> As mentioned in the comment, please look at xen_set_pte_init().
>>>>
>>>> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
>>>> is already present and read-only.
>>>
>>> look at mask_rw_pte and read the threads linked above, unfortunately it
>>> is not that simple.
>>
>> Yes, I was remembering what 32-bit did here.
>>
>> The 64-bit version is a bit confused and it often ends up /not/ clearing
>> RW for the direct mapping of the pages in the pgt_buf because any
>> existing RW mappings will be used as-is.  See phys_pte_init() which
>> checks for an existing mapping and only sets the PTE if it is not
>> already set.
> 
> not all the pagetable pages might be already mapped, even if they are
> already hooked into the pagetable

Yes, I think this is easy to handle though.

    /*
     * Make sure that active page table pages are not mapped RW.
     */
    if (is_early_ioremap_ptep(ptep)) {
        /*
         * If we're updating an early ioremap PTE, then this PFN may
         * already be in the linear mapping.  If it is, use the
         * existing RW bit.
         */
        unsigned int level;
        pte_t *linear_pte;

        linear_pte = lookup_address(__va(PFN_PHYS(pfn)), &level);
        if (linear_pte && !pte_write(*linear_pt))
            pte = pte_wrprotect(pte);
    } else if (pfn >= pgt_buf_start && pfn < pgt_buf_end) {
        /*
         * The PFN may not be mapped but may be hooked into the page
         * tables.  Make sure this new mapping is read-only.
         */
        pte = pte_wrprotect(pte);
    }

However, the real subtlety are page tables that are mapped as they
themselves are hooked in.

As as example, let's say pgt_buf_start (s) is on a 1 GiB boundary and
pgt_buf_top (t) is below the next 2 MiB boundary.  We're mapping memory
with 4 KiB pages from s to s + 4 MiB.  This requires a new PUD and two
new PMDs.

To map this region:

1. Allocate a new PUD. (@ e = pgt_buf_end)

2. Allocate a new PMD. (@ e + 1)

3. Fill in this PMD's PTEs.  This covers pgt_buf so sets (s, e + 1) as RO.

4. Call pmd_populate() to hook-in this PMD.  This does not set e+1 as RO
(but this is ok as it already is).

5. Allocate a new PMD  (@ e + 2)

6. Fill in this PMD's PTEs.

7. Call pmd_populate() to hook in this PMD.  This does not set e+2 as RO
as the region isn't mapped yet.

8. Call pud_populate() to hook in this PUD.  This sets e as RO but e + 2
is still RW.

9. Boom!

It may be possible to fixup the permissions as the pages are hooked in.
 i.e., if this page's entries cover (pgt_buf_start, pgt_buf_end], walk
the entries and any child tables and fixup the permissions of the leaf
entries.

This would walk the tables a few times unless we were careful to only
walk them when hooking a page into an active table.

It was fun trying to understand this, but I think I'll give up now...

David

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.