
Re: [Xen-devel] [PATCH 13/13] xen/events: use the FIFO-based ABI if available



On Tue, Sep 24, 2013 at 05:48:58PM +0100, David Vrabel wrote:
> On 24/09/2013 16:50, Konrad Rzeszutek Wilk wrote:
> > On Fri, Sep 13, 2013 at 06:00:01PM +0100, David Vrabel wrote:
> >> +
> >> +  error:
> >> +  if (event_array_pages == 0)
> >> +          panic("xen: unable to expand event array with initial page (%d)\n", ret);
> > 
> > Shouldn't we just printk and return a negative value so we can use
> > the old style event channels?
> 
> No, once you've called EVTCHNOP_init_control for at least one VCPU the
> hypervisor has switched ABIs and there is (deliberately[*]) no mechanism
> to switch back.
> 
> [*] it greatly simplifies the hypervisor ABI and implementation if we
> don't have to unwind a half-initialized switch-over.  This failure case
> will never happen in practice as the hypervisor will run out of domheap
> pages for the guest memory before it runs out of global mapping space.

OK. And that is nicely mentioned in the design doc which I failed to spot
until now.
> 
> >> +  else
> >> +          printk(KERN_ERR "xen: unable to expand event array (%d)\n", ret);
> >> +  free_unused_array_pages();
> >> +  return ret;
> >> +}
> >> +
> >> +static void fifo_bind_to_cpu(struct irq_info *info, int cpu)
> >> +{
> >> +  /* no-op */
> > 
> > Perhaps you can also say: "The event array is shared between all of the
> > guest's VCPUs." (from design-E).
> 
> That's not relevant here.  This is a no-op because there is no
> guest-side state necessary for tracking which VCPU an event channel is
> bound to -- this is all done by the per-VCPU queues.

Ah, gotcha.
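
So the actual rebind is just the hypercall the common event channel
code already makes; roughly (a sketch only -- EVTCHNOP_bind_vcpu is
the existing Xen interface, the surrounding names are from the generic
2-level path and are illustrative here):

	/*
	 * The generic layer asks Xen to retarget the port; with the
	 * FIFO ABI there is no guest-side state to update afterwards,
	 * hence the no-op fifo_bind_to_cpu() above.
	 */
	struct evtchn_bind_vcpu bind_vcpu = {
		.port = port,
		.vcpu = cpu,
	};

	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_vcpu,
					&bind_vcpu) == 0)
		bind_evtchn_to_cpu(port, cpu);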
> 
> >> +static void fifo_unmask(int port)
> >> +{
> >> +  unsigned int cpu = get_cpu();
> >> +  bool do_hypercall = false;
> >> +  bool evtchn_pending = false;
> >> +
> >> +  BUG_ON(!irqs_disabled());
> >> +
> >> +  if (unlikely((cpu != cpu_from_evtchn(port))))
> >> +          do_hypercall = true;
> > 
> > Since the event array is shared across all of the vCPUs, is this
> > still needed?
> 
> The shared event array isn't relevant (the 2-level ABI also has shared
> pending and mask arrays).
> 
> But, since a hypercall is always required if an event is pending, we
> can omit the above check -- unlike in the 2-level case, it's not
> possible to optimize out the hypercall when we're already on the
> correct VCPU.
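
Right, so the unmask path collapses to something like this (a sketch
only; event_word_from_port() and clear_masked() are helpers from this
patch, EVTCHNOP_unmask is the existing hypercall):

	static void fifo_unmask(int port)
	{
		event_word_t *word = event_word_from_port(port);

		BUG_ON(!irqs_disabled());

		clear_masked(word);
		if (sync_test_bit(EVTCHN_FIFO_PENDING, BM(word))) {
			struct evtchn_unmask unmask = { .port = port };

			(void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask,
							  &unmask);
		}
	}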
> 
> >> +static uint32_t clear_linked(volatile event_word_t *word)
> >> +{
> >> +    event_word_t n, o, w;
> > 
> > How about just 'new', 'old', 'temp'?
> 
> The letters match the design doc.

Since you have to modify the design document anyway, would it make
sense to alter it as well to use less cryptic variable names?
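
Something along these lines would be clearer in both places (a sketch
using the bit definitions from this series; 'new'/'old' follow the
usual kernel cmpxchg idiom):

	static uint32_t clear_linked(volatile event_word_t *word)
	{
		event_word_t new, old, w;

		w = *word;

		/*
		 * Atomically clear the LINKED bit and the LINK field,
		 * returning the link value that was there.
		 */
		do {
			old = w;
			new = (w & ~((1 << EVTCHN_FIFO_LINKED)
				     | EVTCHN_FIFO_LINK_MASK));
		} while ((w = sync_cmpxchg(word, old, new)) != old);

		return w & EVTCHN_FIFO_LINK_MASK;
	}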

> 
> >> +
> >> +static void fifo_handle_events(int cpu)
> >> +{
> >> +  struct evtchn_fifo_control_block *control_block;
> >> +  uint32_t ready;
> >> +  int q;
> >> +
> >> +  control_block = per_cpu(cpu_control_block, cpu);
> >> +
> >> +  ready = xchg(&control_block->ready, 0);
> >> +
> >> +  while (ready & EVTCHN_FIFO_READY_MASK) {
> >> +          q = find_first_bit(BM(&ready), EVTCHN_FIFO_MAX_QUEUES);
> >> +          consume_one_event(cpu, control_block, q, &ready);
> >> +          ready |= xchg(&control_block->ready, 0);
> > 
> > Say VIRQ_TIMER is bound at priority 0 and the timer is set to fire
> > at an incredibly short interval.
> > 
> > Couldn't this cause a scenario where the guest clears the ready bit
> > in consume_one_event (clear_bit(0, BM(ready))) while the hypervisor
> > sets it again at the same time?  We then process the IRQ (VIRQ_TIMER).
> > 
> > Then we get to the xchg here and end up with ready having the
> > priority 0 bit set again.
> > 
> > And then we loop again and again...
> 
> Don't do that then.  This is no different to the behaviour of hardware
> interrupts.

I was thinking there might be some mitigation technique -- and there is.

The Linux kernel would disable the IRQ, which hopefully means that this
defective port ends up being masked.
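
For reference, the loop under discussion with the re-read called out
(the code is as in the patch, the comments are mine):

	ready = xchg(&control_block->ready, 0);

	while (ready & EVTCHN_FIFO_READY_MASK) {
		/* Service the highest-priority non-empty queue first
		 * (queue 0 is the highest priority). */
		q = find_first_bit(BM(&ready), EVTCHN_FIFO_MAX_QUEUES);
		consume_one_event(cpu, control_block, q, &ready);
		/*
		 * Re-read READY: the hypervisor may have marked more
		 * queues ready while this event was being handled.  A
		 * port that fires continuously keeps us in this loop,
		 * exactly like a screaming hardware interrupt line.
		 */
		ready |= xchg(&control_block->ready, 0);
	}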


> 
> >> +          ret = HYPERVISOR_event_channel_op(EVTCHNOP_init_control,
> >> +                                            &init_control);
> >> +          if (ret < 0)
> >> +                  BUG();
> > 
> > So if this is run after migration to an older hypervisor won't we
> > blow up here? Shouldn't we just exit with a negative return value
> > and use the old-style classic events?
> 
> Migrating to older hypervisors has never been supported.  You can only
> go forwards.

OK.
> 
> >> +static int __cpuinit fifo_cpu_notification(struct notifier_block *self,
> >> +                                     unsigned long action, void *hcpu)
> >> +{
> >> +  int cpu = (long)hcpu;
> >> +  int ret = 0;
> >> +
> >> +  switch (action) {
> >> +  case CPU_UP_PREPARE:
> >> +          if (!per_cpu(cpu_control_block, cpu))
> >> +                  ret = fifo_init_control_block(cpu);
> >> +          break;
> >> +  default:
> >> +          break;
> >> +  }
> >> +  return ret < 0 ? NOTIFY_BAD : NOTIFY_OK;
> > 
> > I think you should return NOTIFY_OK.
> > 
> > Say you do this:
> > 
> > 1) Launch guest on a new hypervisor (which has FIFO ABI)
> > 2) Bring down all CPUs but one.
> > 3) Migrate to an older hypervisor (no FIFO ABI)
> > 4) Bring up all of the CPUs.
> > 
> > We shouldn't return NOTIFY_BAD b/c the hypervisor we are resuming
> > on is too old. We should just continue on without the FIFO mechanism
> > in place.
> 
> Again, migrating to an older hypervisor has never been supported.  Also,
> once we have switched to the FIFO-based ABI for one VCPU we cannot go
> back, and it's better to fail to bring up the VCPU than to have one that
> cannot receive events.

OK.
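
Right -- so the only place a fallback can happen is at boot, before
any VCPU has switched; something like (a sketch; the ops-table name is
hypothetical):

	/*
	 * Boot-time selection: probe the FIFO ABI once on the boot
	 * CPU.  If the hypervisor rejects EVTCHNOP_init_control we
	 * stay on the 2-level ABI; once it succeeds, the switch is
	 * permanent for the lifetime of the domain.
	 */
	ret = fifo_init_control_block(cpu);
	if (ret < 0)
		return ret;		/* keep using the 2-level ABI */

	evtchn_ops = &fifo_evtchn_ops;	/* hypothetical ops table */
	return 0;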
> 
> David
