
Re: kernel BUG around vmap/vfree - xen_enter_lazy_mmu()/xen_leave_lazy_mmu() - Linux 7.0-rc1



On 08/05/2026 10:53, Juergen Gross wrote:
> [...]
>
> But now I think I have found the real culprit in lazy_mmu_mode_enable():
>
> static inline void lazy_mmu_mode_enable(void)
> {
>         struct lazy_mmu_state *state = &current->lazy_mmu_state;
>
>         if (in_interrupt() || state->pause_count > 0)
>                 return;
>
>         VM_WARN_ON_ONCE(state->enable_count == U8_MAX);
>
>         if (state->enable_count++ == 0)
>                 arch_enter_lazy_mmu_mode();
> }
>
> Consider a preemption just before calling arch_enter_lazy_mmu_mode(). The
> enable_count will be 1 now, but there was no switch to lazy mode yet.
>
> When the task becomes active again, context switch handling will see lazy
> mode enabled (enable_count > 0), so it will call arch_enter_lazy_mmu_mode().
> And then the task resumes and is calling arch_enter_lazy_mmu_mode() another
> time.

Agreed, this must be the problem. I did wonder whether the lack of
atomicity would cause trouble...

arm64 isn't impacted because it tracks related state in task_struct
only. powerpc and sparc do use percpu variables but that shouldn't
matter as they disable preemption in the entire lazy MMU section.

>
> The only chance I'm seeing to avoid that would be to disable preemption
> around all instances of testing a condition and then enabling or disabling
> lazy mmu mode.

I don't immediately see why we would need such a big hammer. If we
revert commit 291b3abed657 ("x86/xen: use lazy_mmu_state when
context-switching"), then arch_{start,end}_context_switch() should once
again do the right thing for Xen since the TIF_LAZY_MMU_UPDATES flag is
separate from lazy_mmu_state. I think it looks like this:

lazy_mmu_mode_enable()
    state->enable_count++
    <PREEMPT>
        arch_start_context_switch() 
            xen_lazy_mode == XEN_LAZY_NONE -> do nothing
        
        <other task runs; this task is scheduled again>

        arch_end_context_switch() 
            TIF_LAZY_MMU_UPDATES not set -> do nothing

        <exception return>
    enter_lazy(XEN_LAZY_MMU)

Nothing else should be checking lazy MMU state during the context switch.

Does that make sense?

- Kevin



 

