| 
    
 [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v5 2/5] xen: add bitmap to indicate per-domain state changes
 On 17.12.2024 16:55, Jürgen Groß wrote:
> On 17.12.24 16:19, Jan Beulich wrote:
>> On 17.12.2024 15:22, Juergen Gross wrote:
>>> Add a bitmap with one bit per possible domid indicating the respective
>>> domain has changed its state (created, deleted, dying, crashed,
>>> shutdown).
>>>
>>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>>> all existing domains and resetting all other bits.
>>>
>>> As the usage of this bitmap is tightly coupled with the VIRQ_DOM_EXC
>>> event, it is meant to be used only by a single consumer in the system,
>>> just like the VIRQ_DOM_EXC event.
>>
>> I'm sorry, but I need to come back to this. I thought I had got convinced
>> that only a single entity in the system can bind this vIRQ. Yet upon
>> checking I can't seem to find what would guarantee this. In particular
>> binding a vIRQ doesn't involve any XSM check. Hence an unprivileged entity
>> could, on the assumption that the interested privileged entity (xenstore)
>> is already up and running, bind and unbind this vIRQ, just to have the
>> global map freed. What am I overlooking (which would likely want stating
>> here)?
> 
> I think you are not overlooking anything.
> 
> I guess this can easily be handled by checking that the VIRQ_DOM_EXC handling
> domain is the calling one in domain_[de]init_states(). Note that global virqs
> are only ever sent to vcpu[0] of the handling domain, so rebinding the event
> to another vcpu is possible, but doesn't make sense.
No, that's precluded by
    if ( virq_is_global(virq) && (vcpu != 0) )
        return -EINVAL;
afaict. That doesn't, however, preclude multiple vCPU-s from trying to bind
the vIRQ to vCPU 0.
>>> V5:
>>> - domain_init_states() may be called only if evtchn_bind_virq() has been
>>>    called validly (Jan Beulich)
>>
>> I now recall why I had first suggested the placement later in the handling:
>> You're now doing the allocation with yet another lock held. It's likely not
>> the end of the world, but ...
>>
>>> @@ -138,6 +139,60 @@ bool __read_mostly vmtrace_available;
>>>   
>>>   bool __read_mostly vpmu_is_available;
>>>   
>>> +static DEFINE_SPINLOCK(dom_state_changed_lock);
>>> +static unsigned long *__read_mostly dom_state_changed;
>>> +
>>> +int domain_init_states(void)
>>> +{
>>> +    const struct domain *d;
>>> +    int rc = -ENOMEM;
>>> +
>>> +    spin_lock(&dom_state_changed_lock);
>>> +
>>> +    if ( dom_state_changed )
>>> +        bitmap_zero(dom_state_changed, DOMID_FIRST_RESERVED);
>>> +    else
>>> +    {
>>> +        dom_state_changed = xvzalloc_array(unsigned long,
>>> +                                           
>>> BITS_TO_LONGS(DOMID_FIRST_RESERVED));
>>
>> ... already this alone wasn't nice, and could be avoided (by doing the
>> allocation prior to acquiring the lock, which of course complicates the
>> logic some).
>>
>> What's perhaps less desirable is that ...
>>
>>> @@ -494,6 +495,15 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, 
>>> evtchn_port_t port)
>>>           goto out;
>>>       }
>>>   
>>> +    if ( virq == VIRQ_DOM_EXC )
>>> +    {
>>> +        rc = domain_init_states();
>>> +        if ( rc )
>>> +            goto out;
>>> +
>>> +        deinit_if_err = true;
>>> +    }
>>> +
>>>       port = rc = evtchn_get_port(d, port);
>>>       if ( rc < 0 )
>>>       {
>>
>> ... the placement here additionally introduces lock nesting when really
>> the two locks shouldn't have any relationship.
>>
>> How about giving domain_init_states() a boolean parameter, such that it
>> can be called twice, with the first invocation moved back up where it
>> was, and the second one put ...
>>
>>> @@ -527,6 +537,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, 
>>> evtchn_port_t port)
>>>    out:
>>>       write_unlock(&d->event_lock);
>>>   
>>> +    if ( rc && deinit_if_err )
>>> +        domain_deinit_states();
>>> +
>>>       return rc;
>>>   }
>>
>> ... down here, not doing any allocation at all (only the clearing), and
>> hence eliminating the need to deal with its failure? (Alternatively
>> there could of course be a split into an init and a reset function.)
>>
>> There of course is the chance of races with such an approach. I'd like
>> to note though that with the placement of the call in the hunk above
>> there's a minor race, too (against ...
>>
>>> @@ -730,6 +743,9 @@ int evtchn_close(struct domain *d1, int port1, bool 
>>> guest)
>>>           struct vcpu *v;
>>>           unsigned long flags;
>>>   
>>> +        if ( chn1->u.virq == VIRQ_DOM_EXC )
>>> +            domain_deinit_states();
>>
>> ... this and the same remote vCPU then immediately binding the vIRQ
>> again). Hence yet another alternative would appear to be to drop the
>> new global lock and use d->event_lock for synchronization instead
>> (provided - see above - that only a single entity can actually set up
>> all of this). That would pretty much want to have the allocation kept
>> with the lock already held (which isn't nice, but as said is perhaps
>> tolerable), but would at least eliminate the undesirable lock nesting.
>>
>> Re-use of the domain's event lock is at least somewhat justified by
>> the bit array being tied to VIRQ_DOM_EXEC.
>>
>> Thoughts?
> 
> With my suggestion above I think there is no race, as only the domain handling
> VIRQ_DOM_EXC could alloc/dealloc dom_state_changed.
Yet still it could be multiple vCPU-s therein to try to in parallel.
> Using d->event_lock for synchronization is not a nice option IMO, as it would
> require to take the event_lock of the domain handling VIRQ_DOM_EXEC when 
> trying
> to set a bit for another domain changing state.
Well, yes, it's that domain's data that's to be modified, after all.
Jan
 
 
  | 
  
![]()  | 
            
         Lists.xenproject.org is hosted with RackSpace, monitoring our  |