
Re: [Xen-devel] PML (Page Modification Logging) design for Xen

On Fri, Feb 13, 2015 at 10:50 AM, Kai Huang <kai.huang@xxxxxxxxxxxxxxx> wrote:
> On 02/12/2015 08:34 PM, Tim Deegan wrote:
>> Hi,
>> Thanks for posting this design!
>> At 16:28 +0800 on 11 Feb (1423668493), Kai Huang wrote:
>>> Design
>>> ======
>>> - PML feature is used globally
>>> A new Xen boot parameter, say 'opt_enable_pml', will be introduced to
>>> control PML feature detection; PML will only be detected if
>>> opt_enable_pml = 1. Once the PML feature is detected, it will be used
>>> for dirty logging for all domains globally. We currently don't support
>>> using PML on a per-domain basis, as that would require additional
>>> control from the xl tool.
>> Sounds good.  I agree that there's no point in making this a per-VM
>> feature.
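As a rough illustration of the intended gating, the feature is usable only when the admin opts in *and* the CPU reports it. A minimal sketch, assuming hypothetical names (`opt_enable_pml`, `parse_pml_param`, `pml_supported` are made up here; real Xen would wire this through `boolean_param()` and VMX capability MSRs):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical global toggle; real Xen would use boolean_param(). */
static bool opt_enable_pml = false;

/* Parse a "pml=1" / "pml=0" style boot option (sketch). */
static void parse_pml_param(const char *val)
{
    opt_enable_pml = (strcmp(val, "1") == 0);
}

/* PML is used only if the admin enabled it AND the CPU supports it. */
static bool pml_supported(bool cpu_has_pml)
{
    return opt_enable_pml && cpu_has_pml;
}
```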
>>> - PML enable/disable for a particular domain
>>> PML needs to be enabled (allocate the PML buffer, initialize the PML
>>> index and PML base address, turn PML on in the VMCS, etc.) for all
>>> vcpus of the domain, as the PML buffer and PML index are per-vcpu,
>>> while the EPT table may be shared by vcpus. Enabling PML on only some
>>> of the domain's vcpus won't work. PML will only be enabled for the
>>> domain when it is switched to dirty logging mode, and it will be
>>> disabled when the domain is switched back to normal mode. As the number
>>> of vcpus doesn't appear to change dynamically while the guest is
>>> running (correct me if I am wrong here), we don't have to consider
>>> enabling PML for a newly created vcpu while the guest is in dirty
>>> logging mode.
>> No - you really ought to handle enabling this for new VCPUs.  There
>> have been cases in the past where VMs are put into log-dirty mode
>> before their VCPUs are assigned, and there might be again.
> "Assigned" here means created?
>> It ought to be easy to handle, though - just one more check and
>> function call on the vcpu setup path.
> I think "check and function call" means a check plus a function call to
> enable PML on this vcpu? Then what if enabling PML for the vcpu fails
> (possible, as it needs to allocate a 4K PML buffer)? It would be better
> to roll back to write protection instead of failing the vcpu creation.
> But in that case there is a problem if the domain is already in
> log-dirty mode: we might already have the EPT table set up with the
> D-bit clear for the logdirty ranges, which means we'd need to re-check
> the logdirty ranges and re-set the EPT table to be read-only. Does this
> sound reasonable?
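For concreteness, the fallible per-vcpu enable/disable being discussed could look roughly like this (all names are hypothetical, not Xen's actual code; the PML index starting at 511 and counting down follows Intel's documented PML buffer layout of 512 entries):

```c
#include <stdbool.h>
#include <stdlib.h>

#define PML_BUFFER_SIZE 4096  /* one 4K PML buffer per vcpu */

struct vcpu_pml {
    void *buffer;   /* PML log buffer, NULL until enabled */
    int   index;    /* PML index: starts at 511, hardware decrements */
};

/* Try to enable PML for one vcpu; returns false on allocation
 * failure, in which case the caller must either fail vcpu creation
 * or fall back to write protection for the whole domain. */
static bool vcpu_enable_pml(struct vcpu_pml *v)
{
    v->buffer = malloc(PML_BUFFER_SIZE);
    if (!v->buffer)
        return false;
    v->index = 511;  /* empty buffer: index points at the last entry */
    return true;
}

static void vcpu_disable_pml(struct vcpu_pml *v)
{
    free(v->buffer);
    v->buffer = NULL;
}
```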

Hi Tim, all,

Do you have comments on this?

If my understanding above is right, enabling PML for a domain on demand
when it switches to log-dirty mode is somewhat complicated. Another
approach is to enable PML for each vcpu unconditionally (if the PML
feature is detected, of course) when the vcpu is created; if enabling
PML fails, the vcpu simply isn't created. This approach simplifies the
error handling, as there is no need to roll back the other vcpus to
write protection when enabling PML fails. The disadvantage is that PML
will be enabled for the guest's entire run-time, and an additional 4K
buffer will be allocated for each vcpu even when the guest is not in
log-dirty mode. We also need to manually set the D-bit to 1 for guest
memory not in log-dirty mode, to avoid unnecessary GPA logging (e.g.
when guest memory is just populated). Btw, this approach is the approach
we already did for

Do you have any suggestion here?


>>> After PML is enabled for the domain, we only need to clear the EPT
>>> entry's D-bit for guest memory in dirty logging mode. We achieve this
>>> by checking whether PML is enabled for the domain when p2m_ram_rw is
>>> changed to p2m_ram_logdirty, and updating the EPT entry accordingly.
>>> However, we still write-protect superpages even with PML, as we still
>>> need to split a superpage into 4K pages in dirty logging mode.
>> IIUC, you are suggesting leaving superpages handled as they are now,
>> with read-only EPTEs, and only using PML for single-page mappings.
>> That seems good. :)
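The D-bit-versus-write-protect split above can be sketched as a single leaf-EPTE transformation (bit positions follow the EPT entry format; the function name is made up for illustration, and real code would also flush the relevant TLB/EPT mappings):

```c
#include <stdbool.h>
#include <stdint.h>

#define EPT_R (1ULL << 0)
#define EPT_W (1ULL << 1)
#define EPT_X (1ULL << 2)
#define EPT_D (1ULL << 9)   /* dirty bit, when EPT A/D bits are enabled */

/* Switch one leaf EPTE to log-dirty mode: with PML, 4K mappings keep
 * write access and just get D cleared (hardware logs the GPA on the
 * D-bit 0->1 transition); superpages are still made read-only so the
 * first write faults and the superpage is split into 4K pages. */
static uint64_t epte_to_logdirty(uint64_t epte, bool is_superpage,
                                 bool pml_enabled)
{
    if (pml_enabled && !is_superpage)
        return epte & ~EPT_D;        /* rely on PML to log writes */
    return epte & ~(uint64_t)EPT_W;  /* classic write protection */
}
```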
>>> - PML buffer flush
>>> There are two places where we need to flush the PML buffer. The first
>>> is the PML-buffer-full VMEXIT handler (obviously), and the second is
>>> paging_log_dirty_op (either peek or clean): vcpus run asynchronously
>>> while paging_log_dirty_op is called from userspace via hypercall, so
>>> there may be dirty GPAs logged in vcpus' PML buffers that are not yet
>>> full. Therefore we should flush all vcpus' PML buffers before reporting
>>> dirty GPAs to userspace.
>>> We handle both cases by flushing the PML buffer at the beginning of
>>> every VMEXIT. This covers the first case directly, and it also covers
>>> the second case: before paging_log_dirty_op runs, domain_pause is
>>> called, which kicks vcpus (that are in guest mode) out of guest mode by
>>> sending an IPI, which causes a VMEXIT.
>> I would prefer to flush only on buffer-full VMEXITs and handle the
>> peek/clear path by explicitly reading all VCPUs' buffers.  That avoids
>> putting more code on the fast paths for other VMEXIT types.
> OK. But it looks like this requires a new interface, something like
> paging_flush_log_dirty, called at the beginning of paging_log_dirty_op?
> This is actually what I wanted to avoid originally.
>>> This also keeps the log-dirty radix tree more up to date, as the PML
>>> buffer is flushed on every VMEXIT, not only on PML-buffer-full VMEXITs.
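Whichever trigger is chosen (every VMEXIT, or buffer-full plus an explicit flush of all vcpus), the drain itself is the same: walk the entries the hardware wrote and feed each logged GPA into the log-dirty bitmap. A purely illustrative sketch (`pml_flush_vcpu`, `record_dirty`, and the struct are hypothetical stand-ins, not Xen's actual interface):

```c
#include <stdint.h>

#define PML_ENTRIES 512

struct pml_vcpu {
    uint64_t buffer[PML_ENTRIES]; /* GPAs logged by hardware */
    int index;                    /* counts down from 511; -1 means full */
};

static uint64_t last_gpa;  /* stand-in sink for paging_mark_dirty() */
static void record_dirty(uint64_t gpa) { last_gpa = gpa; }

/* Drain one vcpu's PML buffer: hardware has filled entries
 * [index+1 .. 511]; hand each logged GPA to the dirty bitmap and
 * reset the index so the buffer is empty again. */
static int pml_flush_vcpu(struct pml_vcpu *v,
                          void (*mark_dirty)(uint64_t gpa))
{
    int flushed = 0;
    for (int i = v->index + 1; i < PML_ENTRIES; i++, flushed++)
        mark_dirty(v->buffer[i] & ~0xfffULL); /* page-aligned GPAs */
    v->index = PML_ENTRIES - 1;
    return flushed;
}
```

For the peek/clean path Tim suggests, a `paging_flush_log_dirty`-style hook would just call something like this for every vcpu of the (paused) domain.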
>>> - Video RAM tracking (and partial dirty logging for guest memory ranges)
>>> Video RAM is in dirty logging mode unconditionally during the guest's
>>> run-time, and it is only a partial memory range of the guest. However,
>>> PML operates on the whole guest memory (the whole valid EPT table, more
>>> precisely), so we need to decide whether to use PML when only partial
>>> guest memory ranges are in dirty logging mode.
>>> Currently, PML will be used as long as any guest memory is in dirty
>>> logging mode, whether globally or partially. In the case of partial
>>> dirty logging, we need to check whether a logged GPA in the PML buffer
>>> is in a dirty logging range.
>> I think, as other people have said, that you can just use PML for this
>> case without any other restrictions.  After all, mappings for non-VRAM
>> areas ought not to have their D-bits clear anyway.
> Agreed.
> Thanks,
> -Kai
>> Cheers,
>> Tim.
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel

