[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 2/3] xen/domain: fix UBSAN null pointer dereference in vcpu_info_reset()





On 5/20/26 4:21 PM, Jan Beulich wrote:
On 20.05.2026 15:40, Oleksii Kurochko wrote:


On 5/20/26 2:03 PM, Jan Beulich wrote:
On 20.05.2026 13:33, Oleksii Kurochko wrote:


On 5/19/26 1:53 PM, Jan Beulich wrote:
On 19.05.2026 13:22, Oleksii Kurochko wrote:
On 5/19/26 12:55 PM, Oleksii Kurochko wrote:
On 5/19/26 11:37 AM, Jan Beulich wrote:
On 19.05.2026 10:39, Oleksii Kurochko wrote:
vcpu_info_reset() maps v->vcpu_info_area.map to the per-vcpu slot inside
the domain's shared_info page for vcpus with id < XEN_LEGACY_MAX_VCPUS,
and falls back to dummy_vcpu_info for vcpus beyond that limit.

However, it does not guard against d->shared_info being NULL.  The
shared_info() macro expands to a member access through d->shared_info,
so when an architecture does not allocate a shared_info page the
dereference triggers UBSAN:
      UBSAN: Undefined behaviour in common/domain.c:325:10
      member access within null pointer of type 'struct shared_info_t'

Extend the existing fallback condition to also cover the case where no
shared_info page has been allocated, mapping the vcpu to dummy_vcpu_info
instead. This is the correct behaviour: dummy_vcpu_info already serves
as the safe stand-in for vcpus that have no usable shared_info slot.

Fixes: 295514ff75506 ("common: convert vCPU info area registration")

I question this, largely (but not only) because I also ...

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
Reviewed-by: Baptiste Le Duc <baptiste.le-duc@xxxxxxxxxx>
---
RISC-V does not allocate a shared_info page at the momemnt because its
guests run in dom0less mode and do not use the Xen PV ABI, so
d->shared_info remains NULL throughout domain lifetime.

... question this mode of operation. Yes, you may (for now) be able to
get
away without, but e.g. event channels will want supporting at some point.
Which will require a shared info page. Better put that in place right
away,
even if the guests you test with don't use it (yet). Certain other common
code also assumes d->shared_info to never be NULL for an alive domain.


Would it be fine than to allocate it in arch_domain_create() ... :

        if ( (d->shared_info = alloc_xenheap_pages(0, 0)) == NULL )
            goto fail;

        clear_page(d->shared_info);

... but without calling share_xen_page_with_guest() after that
allocation as share_xen_page_with_guest() isn't implemented at the moment?

I would have said "yes" here, but ...

Or could it be an option for all arch-s move allocation of
d->shared_info to domain_create() in common just after arch_domain_create()?

... Andrew's reply pretty much rules out not only this option, but the
shared-info-page concept as a whole (for RISC-V). See my reply there. In
the meantime, the change as suggested may then indeed be what we want to
go with, albeit (a) with a better description and (b) perhaps covering
all d->shared_info uses.

Looking at guest kernel code (Linux), FIFO is tried first, so if RISC-V
is going to support only FIFO, d->shared_info could legally be NULL.

Looking at the Xen side, if an architecture decides to support only
FIFO, d->shared_info is touched only in vcpu_info_reset(), which is
called from vcpu_create().

All other places where d->shared_info is accessed should not be
reachable except for one case in event_fifo.c: when a guest issues the
EVTCHNOP_init_control hypercall, setup_ports() reads from shared_info(d,
evtchn_pending):
     static void setup_ports(struct domain *d, unsigned int prev_evtchns)
     {
     ...
             if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending))
                 evtchn->pending = true;
     ...
         }
     }

This looks like it handles the transition from the 2L ABI to the FIFO
ABI: if a guest started with 2L and then switched to FIFO, any events
already pending in shared_info(d, evtchn_pending) need to be migrated to
FIFO's per-channel evtchn->pending flag. But it looks like I am missing
something here as I mentioned at the start that Linux uses or FIFO or 2L.

Am I missing something?

Quite likely you aren't, but I didn't check. My earlier "covering all" may
well resolve to merely stating things accordingly in the patch description.

If either FIFO or 2L can be used, shouldn't guest_test_bit(d, port,
&shared_info(d, evtchn_pending)) in setup_ports() be dropped? If FIFO
was chosen by Linux, there won't be any events in &shared_info(d,
evtchn_pending), so it is essentially dead code that could just be
dropped.

Why would it be dead code? Who said that a guest couldn't to 2L for a
while, then switch to FIFO? Think of boot loaders, for example.

I jsut based my assumption on Linux use case, if generally such switch is okay then I will add to my original patch what I suggested here ...


Jan

Or would it be better to leave it and skip only if
d->shared_info is allocated: if ( d->shared_info && guest_test_bit(...)
)  to cover the case when a guest wants to switch from 2L to FIFO (if
that is even a possible case at all, since as I mentioned above, the
guest (Linux) chooses the event ABI once and it stays for its lifetime)?

... + refactor commit message:
```
xen/domain: fix UBSAN null pointer dereference of d->shared_info

It is legal to have d->shared_info equal to NULL for architectures which
support only the FIFO ABI for event channel management.

Having d->shared_info == NULL leads to a UBSAN issue on such architectures:
  UBSAN: Undefined behaviour in common/domain.c:325:10
         member access within null pointer of type 'struct shared_info_t'

vcpu_info_reset() maps v->vcpu_info_area.map to the per-vcpu slot inside
the domain's shared_info page for vcpus with id < XEN_LEGACY_MAX_VCPUS,
and falls back to dummy_vcpu_info for vcpus beyond that limit.
Extend the existing fallback condition to also cover the case where no
shared_info page has been allocated, mapping the vcpu to dummy_vcpu_info
instead. This is the correct behaviour: dummy_vcpu_info already serves
as the safe stand-in for vcpus that have no usable shared_info slot.

Additionally, if an architecture supports only the FIFO ABI, setup_ports()
should be updated to avoid a NULL pointer dereference of d->shared_info,
since in that case there will be no pending events in
shared_info->evtchn_pending and the pending flag of the FIFO event channel does not need to be set to true.
```

~ Oleksii




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.