Re: [Xen-devel] [PATCH 4/5] x86/alternative: Implement NMI/#MC-safe patching
>>> On 30.01.18 at 20:26, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 30/01/18 08:39, Jan Beulich wrote:
>>>>> On 29.01.18 at 16:38, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> + /*
>>> + * We are the CPU performing the patching, and might have ended up here
>>> + * by hitting a breakpoint.
>>> + *
>>> + * Either way, we need to complete this particular patch to make forwards
>>> + * progress. This logic is safe even if executed recursively in the
>>> + * breakpoint handler; the worst that will happen when normal execution
>>> + * resumes is that we will rewrite the same bytes a second time.
>>> + */
>>> +
>>> + /* First, insert a breakpoint to prevent execution of the patch site. */
>>> + i->addr[0] = 0xcc;
>>> + smp_wmb();
>> This is necessary, but not sufficient when replacing more than a
>> single insn: The other CPU may be executing instructions _after_
>> the initial one that is being replaced, and ...
>>
>>> + /* Second, copy the remaining instructions into place. */
>>> + memcpy(i->addr + 1, i->opcode + 1, i->len - 1);
>> ... this may be altering things underneath its feet.
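For reference, the classical int3 sequence (cf. Linux's text_poke_bp(),
which additionally sends serialising sync_core() IPIs between the steps,
elided here) publishes the first byte last - a sketch only, and step 2 is
exactly where the multi-insn caveat above bites:

    static void int3_poke(uint8_t *addr, const uint8_t *opcode, size_t len)
    {
        addr[0] = 0xcc;                          /* 1: trap anyone entering */
        smp_wmb();
        memcpy(addr + 1, opcode + 1, len - 1);   /* 2: install the tail */
        smp_wmb();
        addr[0] = opcode[0];                     /* 3: publish first byte last */
    }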
>
> Hmm.
>
> It is completely impossible to recover if another CPU hits the middle of
> this memcpy(). It is impossible to synchronise a specific write against
> the remote CPU pipelines.
It is not completely impossible, but the solution would be awkward
to use: if the code doing the patching knew the instruction boundaries,
it could put breakpoints onto all of them. Or maybe it isn't all that
awkward after all - we do have an insn decoder.
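Something along these lines, with insn_len() standing in for whatever
interface the decoder would provide (a sketch only, untested):

    /* Trap any CPU which may be executing inside the patch site. */
    static void breakpoint_insn_boundaries(uint8_t *addr, unsigned int len)
    {
        unsigned int off = 0;

        while ( off < len )
        {
            /* Decode the old insn at this boundary before clobbering it. */
            unsigned int ilen = insn_len(addr + off);

            addr[off] = 0xcc;       /* breakpoint at this boundary */
            off += ilen;
        }
        smp_wmb();                  /* breakpoints visible before patching */
    }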
> The only way to be safe is to guarantee that CPUs can't hit the code to
> begin with.
>
> On AMD, we can use STGI/CLGI to do this sensibly, and really really
> inhibit all interrupts.
Not really for #MC - running with GIF clear may also result in a
shutdown when a #MC would need delivering.
> For Intel, we can fix the NMI problem by
> rendezvousing and running the patching loop in NMI context, at which
> point the hardware latch will prevent further NMIs.
>
> However, there is literally nothing we can do to prevent #MC from
> arriving. We can stop servicing #MC by disabling CR4.MCE, but then the
> processor will shut down.
Not a good idea indeed.
> We can't put a big barrier at the head of the #MC handler which delays
> processing, because we have no way to ensure that processors aren't
> already later in the #MC handler. Furthermore, attempting to do so
> heightens the risk of hitting a shutdown condition from a second queued #MC.
>
>
> The chance of an NMI or #MC colliding with patching is already
> minuscule. Alternatives and livepatching have been used (by XenServer
> alone) in this form for nearly 3 years, across millions of servers,
> without a single report of such a collision. An #MC collision is far
> less likely still than an NMI collision.
>
> While we can arrange and test full NMI safety, we cannot test #MC safety
> at all. Any code to try and implement #MC safety is going to be
> complicated, and make things worse if an #MC does hit.
>
> Therefore, I agree with the Linux view that trying to do anything for
> #MC safety is going to be worse than doing nothing at all.
I agree. But as said before - the immediate goal ought to be to
make alternatives patching safe, and that doesn't require any
of these considerations.
>>> @@ -153,7 +231,31 @@ void init_or_livepatch add_nops(void *insns, unsigned int len)
>>> void init_or_livepatch noinline
>>> text_poke(void *addr, const void *opcode, size_t len, bool live)
>>> {
>>> - memcpy(addr, opcode, len);
>>> + if ( !live || len == 1 )
>>> + {
>>> + /*
>>> + * If we know *addr can't be executed, or we are patching a single
>>> + * byte, it is safe to use a straight memcpy().
>>> + */
>>> + memcpy(addr, opcode, len);
>> Is it really worth special casing this? Whether to actually ack
>> patches 2 and 3 depends on that.
>
> Yes, and even more so if we are going to use an NMI rendezvous. We
> definitely don't want to have an NMI rendezvous while applying
> alternatives as part of livepatch preparation.
Well, again - live patching should be the second step here.
>>> + };
>>> + smp_wmb();
>>> + active_text_patching = true;
>>> + smp_wmb();
>>> + text_poke_live(NULL);
>>> + smp_wmb();
>>> + active_text_patching = false;
>> Perhaps better to zap live_poke_info.cpu again here? That could
>> in fact replace the separate active_text_patching flag.
>
> No - it can't, because...
>
>>> --- a/xen/arch/x86/traps.c
>>> +++ b/xen/arch/x86/traps.c
>>> @@ -119,6 +119,8 @@ boolean_param("ler", opt_ler);
>>> #define stack_words_per_line 4
>>> #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)regs->rsp)
>>>
>>> +bool active_text_patching;
>> Why here rather than in alternative.c?
>
> ... in the !LIVEPATCH case, alternative.c is an .init.o and the build
> fails because of a nonzero bss.
>
> We cannot have a possibly-init variable referenced in the head of an
> exception handler.
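(For context, the reference in question is presumably an early check of
roughly this shape - the exact hook point, and text_poke_live() rewinding
%rip to the start of the patch site, are assumed:)

    /* At the head of the #BP handler, before anything which might itself
     * live inside a patch site: */
    if ( unlikely(active_text_patching) )
    {
        /* Complete the in-flight patch so execution can make progress. */
        text_poke_live(regs);
        return;     /* resume at the (now fully patched) site */
    }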
I'd like to revisit this once the live patching aspect has been split
off from the more immediate alternatives patching part.
Jan