[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 25/30] context_tracking,x86: Defer kernel text patching IPIs



On 14/01/25 13:13, Sean Christopherson wrote:
> On Tue, Jan 14, 2025, Valentin Schneider wrote:
>> text_poke_bp_batch() sends IPIs to all online CPUs to synchronize
>> them vs the newly patched instruction. CPUs that are executing in userspace
>> do not need this synchronization to happen immediately, and this is
>> actually harmful interference for NOHZ_FULL CPUs.
>
> ...
>
>> This leaves us with static keys and static calls.
>
> ...
>
>> @@ -2317,11 +2334,20 @@ static void text_poke_bp_batch(struct text_poke_loc 
>> *tp, unsigned int nr_entries
>>       * First step: add a int3 trap to the address that will be patched.
>>       */
>>      for (i = 0; i < nr_entries; i++) {
>> -            tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
>> -            text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
>> +            void *addr = text_poke_addr(&tp[i]);
>> +
>> +            /*
>> +             * There's no safe way to defer IPIs for patching text in
>> +             * .noinstr, record whether there is at least one such poke.
>> +             */
>> +            if (is_kernel_noinstr_text((unsigned long)addr))
>> +                    cond = NULL;
>
> Maybe pre-check "cond", especially if multiple ranges need to be checked?  
> I.e.
>
>               if (cond && is_kernel_noinstr_text(...))
>> +
>> +            tp[i].old = *((u8 *)addr);
>> +            text_poke(addr, &int3, INT3_INSN_SIZE);
>>      }
>>
>> -    text_poke_sync();
>> +    __text_poke_sync(cond);
>>
>>      /*
>>       * Second step: update all but the first byte of the patched range.
>
> ...
>
>> +/**
>> + * is_kernel_noinstr_text - checks if the pointer address is located in the
>> + *                    .noinstr section
>> + *
>> + * @addr: address to check
>> + *
>> + * Returns: true if the address is located in .noinstr, false otherwise.
>> + */
>> +static inline bool is_kernel_noinstr_text(unsigned long addr)
>> +{
>> +    return addr >= (unsigned long)__noinstr_text_start &&
>> +           addr < (unsigned long)__noinstr_text_end;
>> +}
>
> This doesn't do the right thing for modules, which matters because KVM can be
> built as a module on x86, and because context tracking understands transitions
> to GUEST mode, i.e. CPUs that are running in a KVM guest will be treated as 
> not
> being in the kernel, and thus will have IPIs deferred.  If KVM uses a static 
> key
> or branch between guest_state_enter_irqoff() and guest_state_exit_irqoff(), 
> the
> patching code won't wait for CPUs to exit guest mode, i.e. KVM could 
> theoretically
> use the wrong static path.
>

AFAICT guest_state_{enter,exit}_irqoff() are only used in noinstr functions
and thus such a static key usage should at the very least be caught and
warned about by objtool - when this isn't built as a module.

I never really thought about noinstr sections for modules; I can get
objtool to warn about a non-noinstr allowed key being used in
e.g. vmx_vcpu_enter_exit() just by feeding it the vmx.o:

arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_enter_exit.isra.0+0x0: 
dummykey: non-RO static key usage in noinstr

...but that requires removing a lot of code first because objtool stops
earlier in its noinstr checks as it hits functions it doesn't have full
information on, e.g.

arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_enter_exit+0x21c: call to 
__ct_user_enter() leaves .noinstr.text section

__ct_user_enter() *is* noinstr, but you don't get that from just the header 
prototype.

> I don't expect this to ever cause problems in practice, because patching code 
> in
> KVM's VM-Enter/VM-Exit path that has *functional* implications, while CPUs are
> actively running guest code, would be all kinds of crazy.  But I do think we
> should plug the hole.
>
> If this issue is unique to KVM, i.e. is not a generic problem for all modules 
> (I
> assume module code generally isn't allowed in the entry path, even via NMI?), 
> one
> idea would be to let KVM register its noinstr section for text poking.




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.