Xen project Mailing List

Re: [Xen-devel] [PATCH 16/16] xen/events: use the FIFO-based ABI if available

To: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>

From: David Vrabel <david.vrabel@xxxxxxxxxx>

Date: Tue, 15 Oct 2013 19:58:52 +0100

Cc: Jan Beulich <jbeulich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Delivery-date: Tue, 15 Oct 2013 18:59:17 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 14/10/13 20:30, Boris Ostrovsky wrote:
> On 10/08/2013 08:49 AM, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@xxxxxxxxxx>
>>
>> Implement all the event channel port ops for the FIFO-based ABI.
>>
>> If the hypervisor supports the FIFO-based ABI, enable it by
>> initializing the control block for the boot VCPU and subsequent VCPUs
>> as they are brought up and on resume. The event array is expanded as
>> required when event ports are setup.
[...]
>> --- a/drivers/xen/events/events.c
>> +++ b/drivers/xen/events/events.c
[...]
>> @@ -1636,7 +1637,13 @@ void xen_callback_vector(void) {}
>>  void __init xen_init_IRQ(void)
>>  {
>> -	xen_evtchn_2l_init();
>> +	int ret;
>> +
>> +	ret = xen_evtchn_fifo_init();
>> +	if (ret < 0) {
>> +		printk(KERN_INFO "xen: falling back to n-level event channels");
>> +		xen_evtchn_2l_init();
>> +	}
>
> Should we provide users with ability to choose which mechanism to use?
> Is there any advantage in staying with 2-level? Stability, I guess,
> would be one.

If someone can demonstrate a use case where 2-level is better then we
could consider an option. I don't think we want options for new
software features just because they might be buggy.

>> --- /dev/null
>> +++ b/drivers/xen/events/events_fifo.c
[...]
>> +#define BM(w) ((unsigned long *)(w))
>
> This could go into a header file (events_internal.h?) since 2-level uses
> it as well.

Although they look the same, they're converting between different types:
xen_ulong_t in the 2-level case and event_word_t in the FIFO-based case.
So I would prefer this to be local to both files.

>> +
>> +	if (i >= MAX_EVENT_ARRAY_PAGES)
>> +		return -EINVAL;
>> +
>> +	while (i >= event_array_pages) {
>> +		void *array_page;
>> +		struct evtchn_expand_array expand_array;
>> +
>> +		/* Might already have a page if we've resumed. */
>> +		array_page = event_array[event_array_pages];
>> +		if (!array_page) {
>> +			array_page = (void *)__get_free_page(GFP_KERNEL);
>> +			if (array_page == NULL)
>> +				goto error;
>> +			event_array[event_array_pages] = array_page;
>> +		}
>> +
>> +		/* Mask all events in this page before adding it. */
>> +		init_array_page(array_page);
>> +
>> +		expand_array.array_gfn = virt_to_mfn(array_page);
>> +
>> +		ret = HYPERVISOR_event_channel_op(EVTCHNOP_expand_array, &expand_array);
>> +		if (ret < 0)
>> +			goto error;
>> +
>> +		event_array_pages++;
>
> Should this increment happen in the 'if(!array_page)' clause?

No. event_array_pages is the number of pages Xen is aware of. Note how
we zero it when resuming on a new domain with the FIFO-based ABI
initially disabled.

>> +	}
>> +	return 0;
>> +
>> + error:
>> +	if (event_array_pages == 0)
>> +		panic("xen: unable to expand event array with initial page (%d)\n", ret);
>> +	else
>> +		printk(KERN_ERR "xen: unable to expand event array (%d)\n", ret);
>> +	free_unused_array_pages();
>
> Do you need to clean up in the hypervisor as well?

There's nothing to clean up in the hypervisor here.
free_unused_array_pages() is freeing array pages that Xen is not aware
of.

>> +static void evtchn_fifo_mask(unsigned port)
>> +{
>> +	event_word_t *word = event_word_from_port(port);
>> +	if (word)
>> +		sync_set_bit(EVTCHN_FIFO_MASKED, BM(word));
>
> You are testing 'word' here but not in the routines above (or below).

I think the test can be removed. The common code used to try and mask
all events even if there were no array pages yet, but it doesn't do this
any more.

>> +}
>> +
>> +static void evtchn_fifo_unmask(unsigned port)
>> +{
>> +	event_word_t *word = event_word_from_port(port);
>> +
>> +	BUG_ON(!irqs_disabled());
>> +
>> +	sync_clear_bit(EVTCHN_FIFO_MASKED, BM(word));
>> +	if (sync_test_bit(EVTCHN_FIFO_PENDING, BM(word))) {
>> +		struct evtchn_unmask unmask = { .port = port };
>> +		(void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask);
>> +	}
>> +}
>
> 2-level unmasking is somewhat more elaborate, with it trying to avoid
> races on pending events. Is this not a concern here?

The 2-level unmask is trying to avoid doing a hypercall as an
optimization. This optimization is not possible so the code here is
much simpler.

>> +	if (head == 0) {
>> +		rmb(); /* Ensure word is up-to-date before reading head. */
>> +		head = control_block->head[priority];
>> +	}
>> +
>> +	port = head;
>> +	word = event_word_from_port(port);
>
> Do you need to check for 'word!=NULL'? You don't check it in
> clear_linked() (which is maybe where this should be done).

I don't think so. The kernel trusts Xen to only set valid LINK fields.

>> +static void evtchn_fifo_resume(void)
>> +{
>> +	unsigned cpu;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		void *control_block = per_cpu(cpu_control_block, cpu);
>> +		struct evtchn_init_control init_control;
>> +		int ret;
>> +
>> +		if (!control_block)
>> +			continue;
>> +
>> +		/*
>> +		 * If this CPU is offline, take the opportunity to
>> +		 * free the control block while it is not being
>> +		 * used.
>> +		 */
>> +		if (!cpu_online(cpu)) {
>> +			free_page((unsigned long)control_block);
>> +			per_cpu(cpu_control_block, cpu) = NULL;
>> +			continue;
>> +		}
>
> Have you tested offlining/onlining CPUs (lots of them)? I am asking
> because I see EVTCHNOP_init_control both here
> and in init_control_block() but I don't see anything that would "deinit"
> control block for which you are freeing the page above.

It's not possible to "deinit" a control block. The hypervisor
deliberately doesn't provide an operation for this.

Note that evtchn_fifo_resume() is called when the guest is resumed in a
new domain which does not have any control blocks initialized yet. So,
in the case above, we're freeing a control block that Xen isn't aware of
yet.

>> +	int ret = 0;
>> +
>> +	switch (action) {
>> +	case CPU_UP_PREPARE:
>> +		if (!per_cpu(cpu_control_block, cpu))
>> +			ret = evtchn_fifo_init_control_block(cpu);
>> +		break;
>> +	default:
>> +		break;
>> +	}
>
> What happens when you offline a CPU?

All the control blocks remain initialized, available for use when the
CPU is onlined again. This is no different to the per-VCPU shared info.

This does all work fine[1].

David

[1] Once I fixed a recent bug I introduced into patch 10 which would
accidentally trash the IPIs/VIRQs for VCPU 0 instead of the offlined
VCPU. Oops.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
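
For readers following the event_array_pages exchange in this thread, here is a minimal userspace sketch of that bookkeeping. It is not the kernel code: mock_expand_array(), mock_fail_at, expand_to_page() and simulate_resume() are hypothetical names introduced only for illustration, calloc() stands in for __get_free_page(), and the limit of 4 pages is arbitrary. The panic on failing to add the very first page is left out. The point it tries to show is that the counter tracks pages the hypervisor has accepted, not pages the guest has allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_EVENT_ARRAY_PAGES 4  /* arbitrary illustrative limit */

static void *event_array[MAX_EVENT_ARRAY_PAGES]; /* pages the guest allocated */
static unsigned event_array_pages;               /* pages the hypervisor knows */
static unsigned mock_fail_at = MAX_EVENT_ARRAY_PAGES; /* hypothetical fault injection */

/* Hypothetical stand-in for the EVTCHNOP_expand_array hypercall. */
static int mock_expand_array(unsigned page_index)
{
    return page_index == mock_fail_at ? -ENOMEM : 0;
}

/* Free pages that were allocated but never registered with the hypervisor. */
static void free_unused_array_pages(void)
{
    assert(event_array_pages <= MAX_EVENT_ARRAY_PAGES);
    for (unsigned i = event_array_pages; i < MAX_EVENT_ARRAY_PAGES; i++) {
        free(event_array[i]);
        event_array[i] = NULL;
    }
}

/* Ensure page index i is registered, adding one page at a time. */
static int expand_to_page(unsigned i)
{
    int ret = -ENOMEM;

    if (i >= MAX_EVENT_ARRAY_PAGES)
        return -EINVAL;

    while (i >= event_array_pages) {
        /* After a resume the page may already exist, so reuse it. */
        void *page = event_array[event_array_pages];
        if (!page) {
            page = calloc(1, 4096);
            if (!page)
                goto error;
            event_array[event_array_pages] = page;
        }

        ret = mock_expand_array(event_array_pages);
        if (ret < 0)
            goto error;

        /* Count the page only once the hypervisor has accepted it. */
        event_array_pages++;
    }
    return 0;

error:
    free_unused_array_pages();
    return ret;
}

/* Model a resume in a new domain: pages stay allocated, none are registered. */
static void simulate_resume(void)
{
    event_array_pages = 0;
}
```

In this model, calling simulate_resume() and then expand_to_page() re-registers the existing pages without allocating new ones, and a failed registration frees only the pages at or beyond event_array_pages, which is why no hypervisor-side cleanup is modelled.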
