
Re: [Xen-devel] [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing



On Fri, 2013-11-15 at 11:26 +0900, Jaeyong Yoo wrote:
> > -----Original Message-----
> > From: Ian Campbell [mailto:Ian.Campbell@xxxxxxxxxx]
> > Sent: Wednesday, November 13, 2013 1:56 AM
> > To: Jaeyong Yoo
> > Cc: xen-devel@xxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] [PATCH RESEND v5 5/6] xen/arm: Implement
> > hypercall for dirty page tracing
> > 
> > On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote:
> > > Add hypercall (shadow op: enable/disable and clean/peek dirtied page bitmap).
> > > It consists of two parts: dirty page detection and saving.
> > > For detection, we set up the guest p2m's leaf PTEs read-only, and
> > > whenever the guest tries to write something, a permission fault happens
> > > and traps into xen.
> > > The permission-faulted GPA should be saved for the toolstack (when it
> > > wants to see which pages are dirtied). For this purpose, we temporarily
> > > save the GPAs into a bitmap.
> > >
> > > Changes from v4:
> > > 1. For temporarily saving dirty pages, use a bitmap rather than a linked list.
> > > 2. Set up the p2m's second level page as read-write in the view of xen's
> > >    memory access.
> > >    It happens in the p2m_create_table function.
> > >
> > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@xxxxxxxxxxx>
> > > ---
> > >  xen/arch/arm/domain.c           |  14 +++
> > >  xen/arch/arm/domctl.c           |   9 ++
> > >  xen/arch/arm/mm.c               | 103 +++++++++++++++++++-
> > >  xen/arch/arm/p2m.c              | 206 ++++++++++++++++++++++++++++++++++++++++
> > >  xen/arch/arm/traps.c            |   9 ++
> > >  xen/include/asm-arm/domain.h    |   7 ++
> > >  xen/include/asm-arm/mm.h        |   7 ++
> > >  xen/include/asm-arm/p2m.h       |   4 +
> > >  xen/include/asm-arm/processor.h |   2 +
> > >  9 files changed, 360 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > > index c0b5dd8..0a32301 100644
> > > --- a/xen/arch/arm/domain.c
> > > +++ b/xen/arch/arm/domain.c
> > > @@ -215,6 +215,12 @@ static void ctxt_switch_to(struct vcpu *n)
> > >      WRITE_SYSREG(hcr, HCR_EL2);
> > >      isb();
> > >
> > > +    /* for dirty-page tracing
> > > +     * XXX: how do we consider SMP case?
> > > +     */
> > > +    if ( n->domain->arch.dirty.mode )
> > > +        restore_vlpt(n->domain);
> > 
> > This is an interesting question. xen_second is shared between all pcpus,
> > which means that the vlpt is currently only usable from a single physical
> > CPU at a time.
> > 
> > Currently the only per-cpu area is the domheap region from 2G-4G. We could
> > steal some address space from the top or bottom of there?
> 
> Oh right, hmm. Then, how about placing the vlpt area starting from 2G
> (between the dom heap and the xen heap), and letting map_domain_page map
> the domain pages starting from the VLPT-end address?
> 
> For this, I think just changing DOMHEAP_VIRT_START to some place (maybe
> 0x88000000) would suffice.

You might find that stealing from the start needs a bit more thought in
the PT setup code wrt offsets, alignments and the assumption that it
starts on a 1GB boundary, etc. In that case stealing the space from the
end, by reducing DOMHEAP_ENTRIES, might turn out to be simpler. It would
be good to keep DOMHEAP_ENTRIES a power of two, I suspect.
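
Something along these lines is what I have in mind (completely untested
sketch; the existing constants and the 2MB slot size are from memory, and
VLPT_VIRT_START/END are made-up names; halving is almost certainly more
generous than the VLPT needs, but it keeps the arithmetic simple):

    /* Halve the number of 2MB domheap mapping slots and hand the freed
     * space at the top of the per-pcpu 2G-4G region to the VLPT.  This
     * keeps DOMHEAP_ENTRIES a power of two and leaves DOMHEAP_VIRT_START
     * (and its alignment assumptions) alone.
     */
    #define DOMHEAP_ENTRIES     512      /* was 1024 */
    #define VLPT_VIRT_START     (DOMHEAP_VIRT_START + \
                                 DOMHEAP_ENTRIES * 0x200000UL /* 2MB/slot */)
    #define VLPT_VIRT_END       DOMHEAP_VIRT_END

That way the domheap mapping code only needs its bounds adjusting rather
than its base address.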

> > > +static inline void mark_dirty_bitmap(struct domain *d, paddr_t addr)
> > > +{
> > > +    paddr_t ram_base = (paddr_t) GUEST_RAM_BASE;
> > > +    int bit_index = PFN_DOWN(addr - ram_base);
> > > +    int page_index = bit_index >> (PAGE_SHIFT + 3);
> > 
> > +3?
> 
> +3 is for changing the bit-index to a byte-index. I think it would be better
> to use a BYTE_SHIFT kind of constant.

Or a comment, yes.
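
For reference, something like this is all I mean (BITS_PER_BYTE_SHIFT is a
made-up name, pick whatever fits the existing headers):

    #define BITS_PER_BYTE_SHIFT 3   /* log2(8): bit index -> byte index */

    /* One bit per guest pfn, counted from the start of guest RAM. */
    int bit_index  = PFN_DOWN(addr - ram_base);
    /* Which page of the bitmap holds this bit: divide by bits per page. */
    int page_index = bit_index >> (PAGE_SHIFT + BITS_PER_BYTE_SHIFT);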

[...]
> > > +    vlp2m_pte = get_vlpt_3lvl_pte(addr);
> > > +    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 &&
> > > +         vlp2m_pte->p2m.avail == 0 /* reuse avail bit as read-only */ )
> > > +    {
> > > +        lpae_t pte = *vlp2m_pte;
> > > +        pte.p2m.write = 1;
> > > +        write_pte(vlp2m_pte, pte);
> > 
> > Do we not need the p2m lock around here somewhere?
> 
> Oh, I think this brings up another question. In the dirty-page tracing phase,
> would it be OK to permit changes in the p2m page table (such as adding new
> pages)? In that case, the VLPT and dirty-bitmap should be re-prepared to take
> this change into account.

The p2m can certainly change during migration -- e.g. grant table ops
resulting from IO.

Those changes should all call back into the p2m. Hopefully this is in
the common grant table code etc but you should check, especially that
the ARM implementation of the callback isn't a NULL!

Or maybe this is done in arch code -- best to check when and where x86
does this I think.
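
To make the locking question concrete, I had imagined something along these
lines (untested sketch; whether d->arch.p2m.lock is the right lock to
serialise against those callbacks is exactly the thing to check):

    struct p2m_domain *p2m = &d->arch.p2m;

    spin_lock(&p2m->lock);
    vlp2m_pte = get_vlpt_3lvl_pte(addr);
    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 &&
         vlp2m_pte->p2m.avail == 0 /* reuse avail bit as read-only */ )
    {
        lpae_t pte = *vlp2m_pte;
        pte.p2m.write = 1;
        write_pte(vlp2m_pte, pte);
        flush_tlb_local();      /* but see the flush comment below */
    }
    spin_unlock(&p2m->lock);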

> 
> > 
> > > +        flush_tlb_local();
> > 
> > If SMP was working this would need to be inner shareable to cope with
> > vcpus running on other pcpus.
> 
> You meant the p2m page table should be inner shareable?

I mean the TLB flush would need to be to all cpus in the inner shareable
domain, not just the local processor.
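
Something like this is what I mean, mirroring flush_tlb_local() but with the
inner shareable variant of the TLBI operation (macro and register names from
memory, so double-check cpregs.h/flushtlb.h; if a suitable helper already
exists then just use it):

    /* Invalidate guest TLB entries for the current VMID on every CPU in
     * the inner shareable domain, not just the local one.
     */
    static inline void flush_guest_tlb_is(void)
    {
        dsb();                               /* ensure the PTE write is visible */
        WRITE_CP32((uint32_t)0, TLBIALLIS);  /* broadcast invalidate, current VMID */
        dsb();
        isb();
    }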

> > 
> > > +    }
> > > +    else
> > > +    {
> > > +        spin_lock(&d->arch.dirty.lock);
> > > +        for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i )
> > > +        {
> > > +            int j = 0;
> > > +            uint8_t *bitmap;
> > > +            copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
> > > +                                 d->arch.dirty.bitmap[i],
> > > +                                 bitmap_size < PAGE_SIZE ? bitmap_size :
> > > +                                                           PAGE_SIZE);
> > 
> > The bitmap is always a multiple of page_size, isn't it?
> 
> But the actually used bits inside the bitmap are not a multiple of page_size,
> I think.

Oh right. I think it is up to us to declare in the interface whether the
supplied buffer should be rounded up to a page boundary or not. There's
a pretty good chance that the userspace allocations will all be entire
pages anyhow.
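
If you do decide the interface allows a buffer that isn't a whole number of
pages, the copy loop just needs to clamp against the bytes remaining rather
than testing bitmap_size against PAGE_SIZE on every iteration, roughly
(untested):

    uint32_t left = bitmap_size;    /* bytes the caller's buffer can hold */

    for ( i = 0; i < d->arch.dirty.bitmap_pages && left > 0; ++i )
    {
        uint32_t size = left < PAGE_SIZE ? left : PAGE_SIZE;

        copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE,
                             d->arch.dirty.bitmap[i], size);
        left -= size;
    }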

Ian.

