Re: [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes
On 17.12.2024 16:55, Jürgen Groß wrote:
> On 17.12.24 16:19, Jan Beulich wrote:
>> On 17.12.2024 15:22, Juergen Gross wrote:
>>> Add a bitmap with one bit per possible domid indicating the respective
>>> domain has changed its state (created, deleted, dying, crashed,
>>> shutdown).
>>>
>>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>>> all existing domains and resetting all other bits.
>>>
>>> As the usage of this bitmap is tightly coupled with the VIRQ_DOM_EXC
>>> event, it is meant to be used only by a single consumer in the system,
>>> just like the VIRQ_DOM_EXC event.
>>
>> I'm sorry, but I need to come back to this. I thought I had got convinced
>> that only a single entity in the system can bind this vIRQ. Yet upon
>> checking I can't seem to find what would guarantee this. In particular
>> binding a vIRQ doesn't involve any XSM check. Hence an unprivileged entity
>> could, on the assumption that the interested privileged entity (xenstore)
>> is already up and running, bind and unbind this vIRQ, just to have the
>> global map freed. What am I overlooking (which would likely want stating
>> here)?
>
> I think you are not overlooking anything.
>
> I guess this can easily be handled by checking that the VIRQ_DOM_EXC
> handling domain is the calling one in domain_[de]init_states(). Note that
> global virqs are only ever sent to vcpu[0] of the handling domain, so
> rebinding the event to another vcpu is possible, but doesn't make sense.

No, that's precluded by

    if ( virq_is_global(virq) && (vcpu != 0) )
        return -EINVAL;

afaict. That doesn't, however, preclude multiple vCPU-s from trying to
bind the vIRQ to vCPU 0.

>>> V5:
>>> - domain_init_states() may be called only if evtchn_bind_virq() has
>>>   been called validly (Jan Beulich)
>>
>> I now recall why I had first suggested the placement later in the handling:
>> You're now doing the allocation with yet another lock held. It's likely not
>> the end of the world, but ...
>>
>>> @@ -138,6 +139,60 @@ bool __read_mostly vmtrace_available;
>>>
>>>  bool __read_mostly vpmu_is_available;
>>>
>>> +static DEFINE_SPINLOCK(dom_state_changed_lock);
>>> +static unsigned long *__read_mostly dom_state_changed;
>>> +
>>> +int domain_init_states(void)
>>> +{
>>> +    const struct domain *d;
>>> +    int rc = -ENOMEM;
>>> +
>>> +    spin_lock(&dom_state_changed_lock);
>>> +
>>> +    if ( dom_state_changed )
>>> +        bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
>>> +    else
>>> +    {
>>> +        dom_state_changed = xvzalloc_array(unsigned long,
>>> +                                           BITS_TO_LONGS(DOMID_FIRST_RESERVED));
>>
>> ... already this alone wasn't nice, and could be avoided (by doing the
>> allocation prior to acquiring the lock, which of course complicates the
>> logic some).
>>
>> What's perhaps less desirable is that ...
>>
>>> @@ -494,6 +495,15 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>          goto out;
>>>      }
>>>
>>> +    if ( virq == VIRQ_DOM_EXC )
>>> +    {
>>> +        rc = domain_init_states();
>>> +        if ( rc )
>>> +            goto out;
>>> +
>>> +        deinit_if_err = true;
>>> +    }
>>> +
>>>      port = rc = evtchn_get_port(d, port);
>>>      if ( rc < 0 )
>>>      {
>>
>> ... the placement here additionally introduces lock nesting when really
>> the two locks shouldn't have any relationship.
>>
>> How about giving domain_init_states() a boolean parameter, such that it
>> can be called twice, with the first invocation moved back up where it
>> was, and the second one put ...
>>
>>> @@ -527,6 +537,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>>   out:
>>>      write_unlock(&d->event_lock);
>>>
>>> +    if ( rc && deinit_if_err )
>>> +        domain_deinit_states();
>>> +
>>>      return rc;
>>>  }
>>
>> ... down here, not doing any allocation at all (only the clearing), and
>> hence eliminating the need to deal with its failure? (Alternatively
>> there could of course be a split into an init and a reset function.)
>>
>> There of course is the chance of races with such an approach. I'd like
>> to note though that with the placement of the call in the hunk above
>> there's a minor race, too (against ...
>>
>>> @@ -730,6 +743,9 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
>>>              struct vcpu *v;
>>>              unsigned long flags;
>>>
>>> +            if ( chn1->u.virq == VIRQ_DOM_EXC )
>>> +                domain_deinit_states();
>>
>> ... this and the same remote vCPU then immediately binding the vIRQ
>> again). Hence yet another alternative would appear to be to drop the
>> new global lock and use d->event_lock for synchronization instead
>> (provided - see above - that only a single entity can actually set up
>> all of this). That would pretty much want to have the allocation kept
>> with the lock already held (which isn't nice, but as said is perhaps
>> tolerable), but would at least eliminate the undesirable lock nesting.
>>
>> Re-use of the domain's event lock is at least somewhat justified by
>> the bit array being tied to VIRQ_DOM_EXC.
>>
>> Thoughts?
>
> With my suggestion above I think there is no race, as only the domain
> handling VIRQ_DOM_EXC could alloc/dealloc dom_state_changed.

Yet it could still be multiple vCPU-s therein trying to do so in parallel.

> Using d->event_lock for synchronization is not a nice option IMO, as it
> would require taking the event_lock of the domain handling VIRQ_DOM_EXC
> when trying to set a bit for another domain changing state.

Well, yes, it's that domain's data that's to be modified, after all.

Jan
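For illustration, a minimal sketch of the guard Jürgen suggests above, folded
into the domain_init_states() body quoted earlier in the thread. The helper
name get_global_virq_handler() and the -EPERM error value are assumptions of
this sketch, not confirmed by the thread or the patch; likewise the loop
setting the per-domain bits is inferred from the commit message. Per Jan's
remark, the guard alone does not serialize multiple vCPU-s of the one handling
domain, so the spinlock still has to stay.

    int domain_init_states(void)
    {
        const struct domain *d;
        int rc = -ENOMEM;

        /*
         * Sketch only: reject any caller other than the domain registered
         * to receive VIRQ_DOM_EXC.  get_global_virq_handler() is an assumed
         * helper name for looking up that domain.
         */
        if ( get_global_virq_handler(VIRQ_DOM_EXC) != current->domain )
            return -EPERM;

        /*
         * The guard above doesn't stop several vCPU-s of that one domain
         * from racing here; the lock still provides that serialization.
         */
        spin_lock(&dom_state_changed_lock);

        if ( dom_state_changed )
            bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
        else
        {
            dom_state_changed = xvzalloc_array(unsigned long,
                                               BITS_TO_LONGS(DOMID_FIRST_RESERVED));
            if ( !dom_state_changed )
                goto out;
        }

        /*
         * Set the bit of every existing domain, as the commit message
         * describes; the exact loop body is an assumption of this sketch.
         */
        rcu_read_lock(&domlist_read_lock);
        for_each_domain ( d )
            __set_bit(d->domain_id, dom_state_changed);
        rcu_read_unlock(&domlist_read_lock);

        rc = 0;

     out:
        spin_unlock(&dom_state_changed_lock);

        return rc;
    }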
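Similarly, a rough sketch of the two-invocation shape Jan proposes: the
allocating call sits early in evtchn_bind_virq(), before d->event_lock is
taken, and the non-failing reset call runs after write_unlock(), so the two
locks never nest. The parameter name and the exact split are illustrative
only, not taken from any actual revision of the patch.

    int domain_init_states(bool alloc)
    {
        int rc = 0;

        spin_lock(&dom_state_changed_lock);

        if ( alloc )
        {
            /* First call: may fail, made before d->event_lock is taken. */
            if ( !dom_state_changed )
            {
                dom_state_changed =
                    xvzalloc_array(unsigned long,
                                   BITS_TO_LONGS(DOMID_FIRST_RESERVED));
                if ( !dom_state_changed )
                    rc = -ENOMEM;
            }
        }
        else if ( dom_state_changed )
        {
            /* Second call: only clears/refills, hence cannot fail. */
            const struct domain *d;

            bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);

            rcu_read_lock(&domlist_read_lock);
            for_each_domain ( d )
                __set_bit(d->domain_id, dom_state_changed);
            rcu_read_unlock(&domlist_read_lock);
        }

        spin_unlock(&dom_state_changed_lock);

        return rc;
    }

As Jan himself notes, an approach of this shape still leaves a window between
the bind completing and the reset call, akin to the race he describes against
evtchn_close() immediately followed by a re-bind.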