
Re: [Xen-devel] [PATCH V4] X86/vMCE: handle broken page with regard to migration


From: dunlapg@xxxxxxxxx [mailto:dunlapg@xxxxxxxxx] On Behalf Of George Dunlap
Sent: Saturday, November 24, 2012 12:26 AM
To: Liu, Jinsong
Cc: Ian Campbell; Ian Jackson; Jan Beulich; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] [PATCH V4] X86/vMCE: handle broken page with regard to migration

On Fri, Nov 23, 2012 at 12:25 AM, Liu Jinsong <jinsong.liu@xxxxxxxxx> wrote:
This patch handles broken pages with regard to migration. Generally there are
the following cases:
1. a broken page occurs before migration
2. a broken page occurs during migration
  2.1 a broken page occurs during migration but not at the last iteration
  2.2 a broken page occurs at the last iteration of migration

For case 1, at the sender the broken page will be mapped but not copied
  to the target (otherwise it may trigger a more serious error, say, an SRAR
  error). Its pfn_type and pfn number will still be transferred to the target
  so that the target can take appropriate action.

For case 2.1, at the sender the MCE handler marks the broken page in the dirty
  bitmap, so that at the next iteration its pfn_type and pfn number will be
  transferred to the target, which then takes appropriate action.

For case 2.2, the sender adds a check to see whether a vMCE occurred during the
  last iteration. If so, it will do more iteration(s) so that the broken
  page's pfn_type and pfn number will be transferred to the target.
  Another point: if a vMCE occurs while the guest is being saved to disk,
  more iteration(s) are needed as well.
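
The sender-side behaviour for cases 1, 2.1 and 2.2 can be pictured with a small
toy model. This is only a sketch of the idea, not code from the patch: every
name below (struct guest, mark_broken, send_iteration, need_extra_iteration,
the PAGE_* values) is hypothetical, and copying page contents is modelled by a
counter.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8  /* toy guest size, chosen for illustration */

/* Hypothetical page types; the real save code has its own pfn_type encoding. */
enum page_type { PAGE_NORMAL, PAGE_BROKEN };

struct guest {
    enum page_type type[NPAGES];
    bool dirty[NPAGES];
    bool vmce_seen;  /* set when a vMCE arrives, cleared at iteration start */
};

/* What a hypothetical sender-side MCE handler does: mark the page broken
 * and dirty, so the next iteration transfers its type again. */
static void mark_broken(struct guest *g, size_t pfn)
{
    g->type[pfn] = PAGE_BROKEN;
    g->dirty[pfn] = true;
    g->vmce_seen = true;
}

/* One iteration: for every dirty page, transfer pfn and type.
 * Contents are "copied" only for healthy pages; a broken page is never read.
 * Returns the number of pages whose contents were copied. */
static size_t send_iteration(struct guest *g, enum page_type *target_type)
{
    size_t copied = 0;

    g->vmce_seen = false;
    for (size_t pfn = 0; pfn < NPAGES; pfn++) {
        if (!g->dirty[pfn])
            continue;
        g->dirty[pfn] = false;
        target_type[pfn] = g->type[pfn];  /* the type always travels */
        if (g->type[pfn] != PAGE_BROKEN)
            copied++;                     /* stands in for copying the data */
    }
    return copied;
}

/* Case 2.2: the iteration meant to be the last one must be followed by
 * another if a vMCE arrived while it ran. */
static bool need_extra_iteration(const struct guest *g)
{
    return g->vmce_seen;
}
```

In this model a page that breaks after the supposedly final pass leaves
need_extra_iteration() true, and the extra pass transfers only the page's type.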

For all cases at the target (if migration is not aborted by a vMCE):
  The target will populate pages for the guest. In the case of a broken page,
  we prefer to keep the type of the page for the sake of seamless migration.
  The target will set the p2m type to p2m_ram_broken for the broken page. If
  the guest accesses the broken page again, it will kill itself as expected.
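
The target-side handling can be sketched in the same spirit. Again this is an
illustrative model under stated assumptions, not the patch itself:
restore_page, guest_access_ok, struct target_page and TOY_PAGE_SIZE are
hypothetical names, and the enum values merely stand in for p2m types such as
p2m_ram_broken.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096  /* illustrative page size */

/* Stand-ins for p2m types; only p2m_ram_broken is named in the description. */
enum p2m_type { P2M_RAM_RW, P2M_RAM_BROKEN };

struct target_page {
    enum p2m_type p2m;
    unsigned char data[TOY_PAGE_SIZE];
};

/* Apply one incoming record. 'data' may be NULL for a broken page, because
 * the sender never copies a broken page's contents. */
static void restore_page(struct target_page *pg, bool broken,
                         const unsigned char *data)
{
    if (broken) {
        pg->p2m = P2M_RAM_BROKEN;  /* keep the type, skip the contents */
        return;
    }
    pg->p2m = P2M_RAM_RW;
    memcpy(pg->data, data, TOY_PAGE_SIZE);
}

/* Returns true if the access may proceed; false models the guest being
 * killed when it touches the broken page again. */
static bool guest_access_ok(const struct target_page *pg)
{
    return pg->p2m != P2M_RAM_BROKEN;
}
```

The point of the model is that the page's broken state survives migration, so
a later access fails on the target the same way it would have on the sender.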

All of the above assumes that migration succeeds.
However, for case 2 there are scenarios that may result in a guest or hypervisor
crash. If the pfn_type check fails to see p2m_ram_broken (the vMCE occurs after
the check) and the broken page is then read, the guest/hypervisor may survive or
crash, depending on the nature of the error and how the guest/hypervisor handles
it. If the guest/hypervisor survives, migration is OK, since pfn_type will be
transferred to the target at the next iteration, preventing further harm at the
target. If the guest/hypervisor crashes, migration no longer matters.
Unfortunately we have no way to predict this, so all we can do is handle it as
best we can; after all, we cannot forbid migration for fear that it may crash
the guest/hypervisor.

Patch version history:
  - adjust variables and patch description based on feedback
  - handle pages broken at the last iteration
  - continue migration when a broken page occurs during migration,
    by marking the broken page in the dirty bitmap
  - abort migration when a broken page occurs during migration
  - transfer pfn_type to the target for broken pages that occur before migration

Suggested-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
Signed-off-by: Liu Jinsong <jinsong.liu@xxxxxxxxx>
Acked-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>

Technically, since you changed part of the code I acked, you should have removed this ack.  But now I've read the patch:

Acked-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>

Thanks Jinsong!
