
Re: [Xen-devel] [PATCH v3 COLOPre 16/26] tools/libx{l, c}: add back channel to libxc



On 01/07/15 11:42, Ian Campbell wrote:
> On Wed, 2015-07-01 at 10:38 +0800, Yang Hongyang wrote:
>> On 06/30/2015 06:10 PM, Ian Campbell wrote:
>>> On Thu, 2015-06-25 at 14:25 +0800, Yang Hongyang wrote:
>>>> We need to send secondary's dirty page pfns back to primary.
>>> In v2 Ian asked (<21888.2988.774072.32946@xxxxxxxxxxxxxxxxxxxxxxxx>):
>>>
>>>          In the pdf
>>>             http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>>>          linked from the wiki page
>>>             http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>>          it says that the secondary keeps a copy of the original contents of
>>>          its dirty pages.  So I don't understand why you need to send the
>>>          dirty bitmap to the primary.
>>>
>>> Which I don't see an answer for in my archive. Have I missed (or
>>> misplaced) the answer?
>> Sorry, seems that I misplaced the answer to:
>> [PATCH v2 COLOPre 09/13] tools/libxl: Update libxl_save_msgs_gen.pl to
>> support return data from xl to xc
>>
>>    > Thanks for this.  I would have some comments on the details, but first
>>    > I want to properly understand your use case.  So while I'm the author
>>    > and maintainer of this save helper, I won't review this in detail just
>>    > yet.  I'm following the thread about what this is for...
>>
>>      We need to send secondary's dirty page pfn back to primary. Primary will
>>      then send pages that are both dirtied on primary/secondary to secondary.
>>      in this way the secondary's memory will be consistent with primary.
>>
>>      As we discussed in [PATCH v2 COLOPre 04/13] tools/libxc: export
>>      xc_bitops.h
>>      If we move this operation to libxc layer, this patch could be dropped.
> This doesn't seem to be a response to Ian's question which I quoted
> above.
>
> The crux of the question is that the design contained in those links
> does not appear to require a back channel, because it does not require a
> dirty bitmap to go from secondary to primary. Asserting a need to do so
> does not answer the question.

It very definitely does require a dirty bitmap moving from the secondary
to the primary.

Let's see whether I can explain it in a different way.

In COLO mode, both VMs are running, and are considered in sync if the
visible network traffic is identical.  After some time, they fall out of
sync.

At this point, the two VMs have definitely diverged.  Let's call the
primary's dirty bitmap set A, and the secondary's dirty bitmap set B.

Sets A and B are different.

Under normal migration, the page data for set A will be sent from the
primary to the secondary.

However, the set difference B - A (let's call this C) is out-of-date on
the secondary (with respect to the primary) and will not be sent by the
primary, as it is not memory that the primary dirtied.  The secondary
needs the page data for C to reconstruct an exact copy of the primary at
the checkpoint.

The secondary cannot calculate C as it doesn't know A.  Instead, the
secondary must send B to the primary, at which point the primary
calculates the union of A and B (let's call this D), which is all the
pages dirtied by either the primary or the secondary, and sends all page
data covered by D.
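
As a rough illustration of the set arithmetic, here is a minimal Python
sketch.  The pfn values and variable names are made up for the example,
and the dirty bitmaps are modelled as plain sets; this is not how libxc
represents them.

```python
# Hypothetical dirty pfn sets at the point the two VMs fall out of sync.
A = {1, 2, 5}      # pages dirtied by the primary
B = {2, 3, 7}      # pages dirtied by the secondary

# Pages that are stale on the secondary but were never dirtied on the
# primary.  Only a party that knows both A and B can compute this.
C = B - A

# What the primary has to send once it has received B.
D = A | B

print(sorted(C))
print(sorted(D))
```

Note that D is the same as A plus C, so sending D covers everything a
normal migration would send as well as the pages it would miss.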

D is always a superset of both A and B, and in the general case a strict
one.  Without the back channel dirty bitmap, a COLO checkpoint can't
reconstruct a valid copy of the primary.
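
To make the failure mode concrete, the following is a toy simulation,
assuming memory can be modelled as a dict from pfn to page contents.
The checkpoint() helper and all of the values are hypothetical and only
exist for this example; nothing here reflects the real migration code.

```python
def checkpoint(primary, secondary, pfns):
    """Return the secondary's memory after the primary sends `pfns`."""
    result = dict(secondary)
    for pfn in pfns:
        result[pfn] = primary[pfn]
    return result

# Both VMs start from the same memory image at the previous checkpoint.
base = {pfn: "old" for pfn in range(8)}
primary = dict(base)
secondary = dict(base)

A = {1, 2, 5}      # hypothetical pages dirtied by the primary
B = {2, 3, 7}      # hypothetical pages dirtied by the secondary
for pfn in A:
    primary[pfn] = "primary-write"
for pfn in B:
    secondary[pfn] = "secondary-write"

# Normal migration: only A is sent, so pages in B - A stay stale.
only_a = checkpoint(primary, secondary, A)

# With the back channel: the primary knows B and sends the union.
with_backchannel = checkpoint(primary, secondary, A | B)

print(only_a == primary)            # False
print(with_backchannel == primary)  # True
```

The pages that still differ in the first case are exactly B - A, which
is the set C described above.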

~Andrew

P.S. I have suggested investigating the CoW support in Xen as a
potential optimisation, as this could be used to let the secondary keep
the original contents of the pages in C, but this is very definitely
future work and not appropriate at this point in COLO.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
