
Re: [Xen-devel] [PATCH 16/16] xen/events: use the FIFO-based ABI if available



On 14/10/13 20:30, Boris Ostrovsky wrote:
> On 10/08/2013 08:49 AM, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@xxxxxxxxxx>
>>
>> Implement all the event channel port ops for the FIFO-based ABI.
>>
>> If the hypervisor supports the FIFO-based ABI, enable it by
>> initializing the control block for the boot VCPU and subsequent VCPUs
>> as they are brought up and on resume.  The event array is expanded as
>> required when event ports are setup.
[...]
>> --- a/drivers/xen/events/events.c
>> +++ b/drivers/xen/events/events.c
[...]
>> @@ -1636,7 +1637,13 @@ void xen_callback_vector(void) {}
>>     void __init xen_init_IRQ(void)
>>   {
>> -    xen_evtchn_2l_init();
>> +    int ret;
>> +
>> +    ret = xen_evtchn_fifo_init();
>> +    if (ret < 0) {
>> +        printk(KERN_INFO "xen: falling back to n-level event channels\n");
>> +        xen_evtchn_2l_init();
>> +    }
> 
> Should we provide users with ability to choose which mechanism to use?
> Is there any advantage in staying with 2-level? Stability, I guess,
> would be one.

If someone can demonstrate a use case where 2-level is better, then we
could consider adding an option.  I don't think we want options for new
software features just because they might be buggy.

>> --- /dev/null
>> +++ b/drivers/xen/events/events_fifo.c
[...]
>> +#define BM(w) ((unsigned long *)(w))
> 
> This could go into a header file (events_internal.h?) since 2-level uses
> it as well.

Although they look the same, they convert between different types:
xen_ulong_t in the 2-level case and event_word_t in the FIFO-based case,
so I would prefer to keep this local to each file.

>> +
>> +    if (i >= MAX_EVENT_ARRAY_PAGES)
>> +        return -EINVAL;
>> +
>> +    while (i >= event_array_pages) {
>> +        void *array_page;
>> +        struct evtchn_expand_array expand_array;
>> +
>> +        /* Might already have a page if we've resumed. */
>> +        array_page = event_array[event_array_pages];
>> +        if (!array_page) {
>> +            array_page = (void *)__get_free_page(GFP_KERNEL);
>> +            if (array_page == NULL)
>> +                goto error;
>> +            event_array[event_array_pages] = array_page;
>> +        }
>> +
>> +        /* Mask all events in this page before adding it. */
>> +        init_array_page(array_page);
>> +
>> +        expand_array.array_gfn = virt_to_mfn(array_page);
>> +
>> +        ret = HYPERVISOR_event_channel_op(EVTCHNOP_expand_array, &expand_array);
>> +        if (ret < 0)
>> +            goto error;
>> +
>> +        event_array_pages++;
> 
> Should this increment happen in the 'if(!array_page)' clause?

No. event_array_pages is the number of pages Xen is aware of.  Note how
we zero it when resuming on a new domain with the FIFO-based ABI
initially disabled.
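To illustrate why the increment sits outside the `if`, here is a minimal userspace model of the expand loop. It is a sketch, not the patch's code: page allocation is `calloc()`, the `EVTCHNOP_expand_array` hypercall is replaced by a hypothetical counting stub, and `model_resume()` is a hypothetical helper standing in for the resume path zeroing `event_array_pages`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_EVENT_ARRAY_PAGES 8

static void *event_array[MAX_EVENT_ARRAY_PAGES]; /* pages the guest owns */
static unsigned event_array_pages;               /* pages Xen knows about */
static unsigned pages_allocated;                 /* model-only counter */
static unsigned expand_calls;                    /* model-only counter */

/* Hypothetical stand-in for
 * HYPERVISOR_event_channel_op(EVTCHNOP_expand_array, ...). */
static int stub_expand_array(void *page)
{
    (void)page;
    expand_calls++;
    return 0;
}

/* Ensure Xen knows about array page i and every page below it. */
static int expand_event_array(unsigned i)
{
    if (i >= MAX_EVENT_ARRAY_PAGES)
        return -EINVAL;

    while (i >= event_array_pages) {
        void *page = event_array[event_array_pages];

        /* Reuse a page kept from before a resume, if there is one. */
        if (!page) {
            page = calloc(1, 4096);
            if (!page)
                return -ENOMEM;
            pages_allocated++;
            event_array[event_array_pages] = page;
        }

        if (stub_expand_array(page) < 0)
            return -EIO;

        /* Counts pages registered with Xen, so it advances whether or
         * not the page was freshly allocated. */
        event_array_pages++;
    }
    return 0;
}

/* Resuming in a new domain: Xen knows no pages, the guest keeps them. */
static void model_resume(void)
{
    event_array_pages = 0;
}
```

After a resume the pages are reused, yet each one is registered with Xen again, which is what keeping `event_array_pages++` unconditional achieves.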

>> +    }
>> +    return 0;
>> +
>> +  error:
>> +    if (event_array_pages == 0)
>> +        panic("xen: unable to expand event array with initial page (%d)\n", ret);
>> +    else
>> +        printk(KERN_ERR "xen: unable to expand event array (%d)\n", ret);
>> +    free_unused_array_pages();
> 
> Do you need to clean up in the hypervisor as well?

There's nothing to clean up in the hypervisor here.
free_unused_array_pages() only frees array pages that Xen is not aware of.
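A sketch of what freeing the unused pages amounts to, assuming pages are allocated in index order with no gaps: only entries at or above `event_array_pages` are touched, so nothing Xen has been told about is ever released. It illustrates the idea rather than reproducing the patch's body; `free()` stands in for `free_page()`.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_EVENT_ARRAY_PAGES 8

static void *event_array[MAX_EVENT_ARRAY_PAGES];
static unsigned event_array_pages;   /* pages registered with Xen */

/* Release pages the guest allocated but never registered with Xen.
 * Registered pages (indices below event_array_pages) are left alone. */
static void free_unused_array_pages(void)
{
    unsigned i;

    for (i = event_array_pages; i < MAX_EVENT_ARRAY_PAGES; i++) {
        /* Pages are allocated in order, so the first hole ends the run. */
        if (!event_array[i])
            break;
        free(event_array[i]);
        event_array[i] = NULL;
    }
}
```

In the error path, the page whose `EVTCHNOP_expand_array` call failed sits exactly at index `event_array_pages`, so it is the first one released.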

>> +static void evtchn_fifo_mask(unsigned port)
>> +{
>> +    event_word_t *word = event_word_from_port(port);
>> +    if (word)
>> +        sync_set_bit(EVTCHN_FIFO_MASKED, BM(word));
> 
> You are testing 'word' here but not in the routines above (or below).

I think the test can be removed.  The common code used to try to mask
all events even if there were no array pages yet, but it no longer does
this.
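For reference, the port-to-word mapping can be sketched as below. This is a userspace model assuming 4 KiB pages and a 32-bit `event_word_t`; the `NULL` return for ports beyond the registered pages is an assumption made explicit here so the masking question is visible. Once the common code only masks ports that already have a page, that branch is never reached from `evtchn_fifo_mask()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t event_word_t;   /* assumed 32-bit event word */

#define PAGE_SIZE 4096u          /* assumption: 4 KiB pages */
#define EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
#define MAX_EVENT_ARRAY_PAGES 8

static event_word_t *event_array[MAX_EVENT_ARRAY_PAGES];
static unsigned event_array_pages;   /* pages registered with Xen */

/* Map a port to its event word: first the page index, then the offset
 * inside that page. Returns NULL (an assumption of this sketch) for a
 * port whose page has not been registered yet. */
static event_word_t *event_word_from_port(unsigned port)
{
    unsigned i = port / EVENT_WORDS_PER_PAGE;

    if (i >= event_array_pages)
        return NULL;
    return event_array[i] + port % EVENT_WORDS_PER_PAGE;
}
```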

>> +}
>> +
>> +static void evtchn_fifo_unmask(unsigned port)
>> +{
>> +    event_word_t *word = event_word_from_port(port);
>> +
>> +    BUG_ON(!irqs_disabled());
>> +
>> +    sync_clear_bit(EVTCHN_FIFO_MASKED, BM(word));
>> +    if (sync_test_bit(EVTCHN_FIFO_PENDING, BM(word))) {
>> +        struct evtchn_unmask unmask = { .port = port };
>> +        (void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask);
>> +    }
>> +}
> 
> 2-level unmasking is somewhat more elaborate, with it trying to avoid
> races on pending events. Is this not a concern here?

The 2-level unmask tries to avoid a hypercall as an optimization.
That optimization is not possible here, so the code is much simpler.
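The FIFO unmask path reduces to the two steps sketched below. This is a userspace model using C11 atomics in place of `sync_clear_bit()`/`sync_test_bit()`, with a hypothetical counting stub for the `EVTCHNOP_unmask` hypercall; the bit numbers are assumed from Xen's public `event_channel.h`. As I read the reply, the hypercall cannot be skipped because an event that became pending while masked is not on any queue, and linking it is Xen's job.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed bit numbers from Xen's public event_channel.h (FIFO ABI). */
#define EVTCHN_FIFO_PENDING 31
#define EVTCHN_FIFO_MASKED  30

typedef _Atomic uint32_t event_word_t;

static unsigned unmask_hypercalls;   /* model-only counter */

/* Hypothetical stand-in for
 * HYPERVISOR_event_channel_op(EVTCHNOP_unmask, ...). */
static void stub_evtchnop_unmask(unsigned port)
{
    (void)port;
    unmask_hypercalls++;
}

static void fifo_unmask(event_word_t *word, unsigned port)
{
    /* Step 1: clear MASKED so events raised from now on are queued
     * by Xen directly. */
    atomic_fetch_and(word, ~(UINT32_C(1) << EVTCHN_FIFO_MASKED));

    /* Step 2: an event that went pending while masked was never
     * linked onto a queue; ask Xen to do that. */
    if (atomic_load(word) & (UINT32_C(1) << EVTCHN_FIFO_PENDING))
        stub_evtchnop_unmask(port);
}
```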

>> +    if (head == 0) {
>> +        rmb(); /* Ensure word is up-to-date before reading head. */
>> +        head = control_block->head[priority];
>> +    }
>> +
>> +    port = head;
>> +    word = event_word_from_port(port);
> 
> Do you need to check for 'word!=NULL'? You don't check it in
> clear_linked() (which is maybe where this should be done).

I don't think so.  The kernel trusts Xen to only set valid LINK fields.
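Consuming events just follows the LINK fields Xen has written, as in this userspace sketch. The field layout (LINK in the low 17 bits, LINKED at bit 29) is assumed from Xen's public `event_channel.h`, `consume_one()` is a hypothetical name, and a LINK of 0 is taken to end the queue. No bounds check is made on the next port, reflecting the trust in Xen described above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout from Xen's public event_channel.h (FIFO ABI). */
#define EVTCHN_FIFO_LINKED     29
#define EVTCHN_FIFO_LINK_BITS  17
#define EVTCHN_FIFO_LINK_MASK  ((UINT32_C(1) << EVTCHN_FIFO_LINK_BITS) - 1)

#define NR_PORTS 16   /* tiny model event array */

static uint32_t event_words[NR_PORTS];

/* Pop one port from a queue whose local head is *head. The popped
 * word's LINK field names the next port (0 ends the queue). It is
 * used unchecked: only Xen writes LINK fields, so a bogus value
 * would index past event_words just as the kernel would trust it. */
static unsigned consume_one(unsigned *head)
{
    unsigned port = *head;
    uint32_t w = event_words[port];

    *head = w & EVTCHN_FIFO_LINK_MASK;
    event_words[port] = w & ~(UINT32_C(1) << EVTCHN_FIFO_LINKED);
    return port;
}
```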

>> +static void evtchn_fifo_resume(void)
>> +{
>> +    unsigned cpu;
>> +
>> +    for_each_possible_cpu(cpu) {
>> +        void *control_block = per_cpu(cpu_control_block, cpu);
>> +        struct evtchn_init_control init_control;
>> +        int ret;
>> +
>> +        if (!control_block)
>> +            continue;
>> +
>> +        /*
>> +         * If this CPU is offline, take the opportunity to
>> +         * free the control block while it is not being
>> +         * used.
>> +         */
>> +        if (!cpu_online(cpu)) {
>> +            free_page((unsigned long)control_block);
>> +            per_cpu(cpu_control_block, cpu) = NULL;
>> +            continue;
>> +        }
> 
> Have you tested offlining/onlining CPUs (lots of them)? I am asking
> because I see EVTCHNOP_init_control both here
> and in init_control_block() but I don't see anything that would "deinit"
> control block for which you are freeing the page above.

It's not possible to "deinit" a control block.  The hypervisor
deliberately doesn't provide an operation for this.

Note that evtchn_fifo_resume() is called when the guest is resumed in a
new domain which does not have any control blocks initialized yet. So,
in the case above, we're freeing a control block that Xen isn't aware of
yet.
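The resume logic described above can be modelled as follows. It is a sketch in which `malloc()`/`free()` replace page allocation, plain arrays replace the per-CPU variable and the online mask, and `stub_init_control()` is a hypothetical stand-in for `EVTCHNOP_init_control`. Because the new domain starts with no registered control blocks, an offline CPU's block can be freed outright with no hypervisor-side cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 4

static void *cpu_control_block[NR_CPUS];
static bool cpu_is_online[NR_CPUS];
static unsigned init_control_calls;   /* model-only counter */

/* Hypothetical stand-in for EVTCHNOP_init_control. */
static int stub_init_control(unsigned cpu, void *block)
{
    (void)cpu;
    (void)block;
    init_control_calls++;
    return 0;
}

/* Resume into a new domain: Xen knows no control blocks yet, so each
 * kept block is either re-registered (online CPU) or simply freed
 * (offline CPU). */
static void fifo_resume(void)
{
    for (unsigned cpu = 0; cpu < NR_CPUS; cpu++) {
        void *block = cpu_control_block[cpu];

        if (!block)
            continue;

        if (!cpu_is_online[cpu]) {
            free(block);
            cpu_control_block[cpu] = NULL;
            continue;
        }

        if (stub_init_control(cpu, block) < 0)
            abort();   /* plays the role of BUG() in this model */
    }
}
```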

>> +    int ret = 0;
>> +
>> +    switch (action) {
>> +    case CPU_UP_PREPARE:
>> +        if (!per_cpu(cpu_control_block, cpu))
>> +            ret = evtchn_fifo_init_control_block(cpu);
>> +        break;
>> +    default:
>> +        break;
>> +    }
> 
> What happens when you offline a CPU?

All the control blocks remain initialized, available for use when the
CPU is onlined again.  This is no different to the per-VCPU shared info.
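The hotplug behaviour amounts to the notifier sketched below in userspace. The `MODEL_CPU_*` action names and `init_control_block()` are hypothetical stand-ins for the kernel's `CPU_UP_PREPARE` notification and `evtchn_fifo_init_control_block()`; the point is that an offline/online cycle registers a control block only once.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Hypothetical action names for this model. */
enum cpu_action { MODEL_CPU_UP_PREPARE, MODEL_CPU_DEAD };

static void *cpu_control_block[NR_CPUS];
static unsigned init_control_calls;   /* model-only counter */

/* Allocate a block and (in the real code) register it with Xen via
 * EVTCHNOP_init_control; here the registration is only counted. */
static int init_control_block(unsigned cpu)
{
    void *block = calloc(1, 4096);

    if (!block)
        return -1;
    init_control_calls++;
    cpu_control_block[cpu] = block;
    return 0;
}

static int fifo_cpu_notification(enum cpu_action action, unsigned cpu)
{
    int ret = 0;

    switch (action) {
    case MODEL_CPU_UP_PREPARE:
        /* Only the first bring-up registers a block; after an
         * offline/online cycle the old one is still live in Xen. */
        if (!cpu_control_block[cpu])
            ret = init_control_block(cpu);
        break;
    default:
        /* Offlining keeps the block: Xen has no way to unregister it. */
        break;
    }
    return ret;
}
```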

This does all work fine[1].

David

[1] Once I fixed a recent bug I introduced into patch 10 which would
accidentally trash the IPIs/VIRQs for VCPU 0 instead of the offlined
VCPU.  Oops.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

