
Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing

On Mon, 2013-08-05 at 12:11 +0100, Stefano Stabellini wrote:
> On Mon, 5 Aug 2013, Jaeyong Yoo wrote:
> > > -----Original Message-----
> > > From: Stefano Stabellini [mailto:stefano.stabellini@xxxxxxxxxxxxx]
> > > Sent: Monday, August 05, 2013 1:28 AM
> > > To: Jaeyong Yoo
> > > Cc: xen-devel@xxxxxxxxxxxxx
> > > Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write
> > > fault for dirty-page tracing
> > > 
> > > On Thu, 1 Aug 2013, Jaeyong Yoo wrote:
> > > > Add handling of write faults in do_trap_data_abort_guest for dirty-page
> > > tracing.
> > > > Rather than maintaining a bitmap for dirty pages, we use the avail bit
> > > in the p2m entry.
> > > > For locating the write-fault PTE in the guest p2m, we use a virtual-linear
> > > > page table that slots the guest p2m into Xen's virtual memory.
> > > >
> > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@xxxxxxxxxxx>
> > > 
> > > Looks good to me.
> > > I would appreciate some more comments in the code to explain the inner
> > > workings of the vlp2m.
> > I got it.
> > 
> > One question: If you see patch #6, it implements the allocation and free of
> > vlp2m memory (xen/arch/arm/vlpt.c), which is almost the same as the vmap
> > allocation (xen/arch/arm/vmap.c). To be honest, I copied vmap.c and changed
> > the virtual address start/end points and the name. While I was doing that,
> > I thought it would be better if we made a common interface, something like
> > a virtual address allocator. That is, if we create a virtual address allocator
> > giving the VA range from A to B, the allocator allocates VAs in between
> > A and B. And we would initialize the virtual allocator instance at boot stage.
> Good question. I think it might be best to improve the current vmap
> (it's actually xen/common/vmap.c) so that we can have multiple vmap
> instances for different virtual address ranges at the same time.

Before we go off and do that:

I don't think this patch implements a linear p2m mapping in the sense in
which I intended it when I suggested it. The patch implements a manual
lookup with a kind of cache of the resulting mapping, I think.

A linear mapping means inserting the current p2m base pointer into Xen's
own pagetables in such a way that you can access a leaf node of the p2m
by dereferencing a virtual address. Given this setup there should be no
need for on-demand mapping as part of the log-dirty stuff, all the
smarts happen at context switch time.
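To make "dereferencing a virtual address" concrete, here is a minimal sketch of the address arithmetic such a linear map gives you. The base address and names are illustrative (not Xen's actual symbols); the point is that once the p2m root is stitched into Xen's tables, finding a leaf PTE is a single array-index computation with no software walk:

```c
#include <stdint.h>

/* Assumed layout for illustration: the p2m linear map starts at
 * LINEAR_BASE in Xen's virtual address space, and LPAE leaf (third
 * level) entries are 8 bytes each. The hardware walk resolves the
 * upper levels; software just indexes by guest frame number. */
#define LINEAR_BASE 0x08000000UL
#define PTE_SIZE    8

static inline uintptr_t p2m_leaf_pte_va(uint64_t gfn)
{
    return LINEAR_BASE + (uintptr_t)(gfn * PTE_SIZE);
}
```

With 512 entries per 4K leaf table, consecutive leaf tables appear back-to-back in the linear window, so the whole p2m third level reads like one flat array of PTEs.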

Normally a linear memory map is done by creating a loop in the page
tables, i.e. HTTBR[N] would contain an entry which referenced HTTBR
again. In this case we actually have a separate p2m table which we want
to stitch into the normal tables, which makes it a bit different to the
classical case.
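The effect of the loop is easiest to see in a toy model: if a root slot points back at the root table itself, a fixed-depth walk through that slot stops one level short and hands back *table entries* as if they were data. This sketch models tables as small arrays of indices (nothing Xen-specific, purely illustrative):

```c
#include <stdint.h>

/* Toy two-level page tables: a pool of tables, each with 4 entries.
 * An entry holds the pool index of the next-level table; a full walk
 * returns the second-level entry as "data". */
enum { ENTRIES = 4, TABLES = 8 };
static uint64_t pool[TABLES][ENTRIES];

static uint64_t walk(unsigned root, unsigned i1, unsigned i2)
{
    unsigned next = (unsigned)pool[root][i1]; /* level-1 lookup */
    return pool[next][i2];                    /* level-2 lookup */
}
```

If root slot 3 is made to point back at the root, then walk(root, 3, n) returns root entry n itself, which is exactly how a loop lets software read PTEs through ordinary loads.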

Let's assume both Xen's page tables and the p2m are two-level, to
simplify the ASCII art.

So for the P2M you have:
`-------> P2M FIRST
          `----------> P2M SECOND
                       `-------------GUEST RAM

Now if we arrange that Xen's page tables contains the VTTBR in a top
level page table slot:
`-------> VTTBR
          `----------> P2M FIRST
                       `-------------P2M SECOND, ACCESSED AS XEN RAM

So now Xen can access the leaf PTEs of the P2M directly, just by using
the correct virtual address.

This can be slightly tricky if P2M FIRST can contain super page
mappings, since you then need to stop the walk a level sooner to get
the correct PT entry. This means we need a second virtual address
region which maps to that, created by a loop in the page tables, e.g.

`-------> HTTBR
          `----------> VTTBR
                       `-------------P2M FIRST, ACCESSED AS XEN RAM

Under Xen, which uses LPAE and 3-level tables, I think the P2M SECOND
would require 16 slots in the xen_second tables, which need to be
context switched; the regions needed to hit the super page mappings
would need slots too. If we use the gap between 128M and 256M in the
Xen memory map then that means we are using
xen_second[64..80]=p2m[0..16] for the linear map of the p2m leaf nodes.
We can then use xen_second[80..144] to point back to xen_second,
allowing xen_second[64..80] to be dereferenced and creating the loop
needed for mapping the superpage PTEs in the P2M.
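The slot numbers above fall out of simple arithmetic: each second-level (2MB) slot covers 2MB of virtual address space, so the 128M..256M gap starts at slot 128M/2M = 64 and 16 slots cover 32M. A tiny sketch of that conversion (the function name is illustrative, not Xen's):

```c
#include <stdint.h>

/* Each LPAE second-level entry maps a 2MB region, so the slot index
 * for a VA is just the VA shifted down by log2(2MB) = 21. */
#define SECOND_SHIFT 21

static inline unsigned va_to_second_slot(uintptr_t va)
{
    return (unsigned)(va >> SECOND_SHIFT);
}
```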

So, given this, we have in the Xen mappings xen_second[64..80] for the
linear map of the p2m leaf tables and xen_second[80..144] looping back
onto xen_second itself (*).

(*) here we only care about xen_second[64..80], but the loop maps
xen_second[0..512], a larger region which we can safely ignore.

So if my maths is correct this means Xen can access P2M THIRD entries at
virtual addresses 0x8000000..0xa000000 and P2M SECOND entries at
0x12000000..0x14000000, which means that the fault handler just needs to
look up the P2M SECOND entry to check it isn't a super page mapping and
then look up the P2M THIRD entry to mark it dirty etc.

If for some reason we also need to access P2M FIRST efficiently we could
add a third region, but I don't think we will be doing 1GB P2M mappings
for the time being.

It occurs to me now that with 16 slots changing on context switch, and a
further 16 aliasing them (and hence requiring maintenance too) for the
super pages, the TLB maintenance at context switch might get
prohibitively expensive. We could address this firstly by only doing it
when switching to/from domains which have log dirty mode enabled, and
secondly by seeing if we can make use of global or locked-down mappings
for the static Xen .text/.data/.xenheap mappings, which would allow us
to use a bigger global flush.

In hindsight, it might be that the cost of doing the domain_map_page
walk on each lookup is offset by avoiding all that TLB maintenance on
context switch. It may be that this is something we can only resolve by
measuring.

BTW, eventually we will have a direct map of all RAM for 64-bit only, so
we would likely end up with different schemes for p2m lookups for the
two sub-arches, since in the 64-bit direct-map case domain_map_page is
very cheap.

I hope my description of a linear map makes sense; it's hard to do
without a whiteboard ;-)

