Re: [Xen-devel] xen-netback: make feature-rx-notify mandatory -- Breaks stubdoms
On 10/12/14 16:20, Ian Campbell wrote:
> On Wed, 2014-12-10 at 15:29 +0000, David Vrabel wrote:
>> On 10/12/14 15:07, Ian Campbell wrote:
>>> On Wed, 2014-12-10 at 14:12 +0000, David Vrabel wrote:
>>>> On 10/12/14 13:42, John wrote:
>>>>> David,
>>>>>
>>>>> This patch you put into 3.18.0 appears to break the latest version
>>>>> of stubdomains. I found this out today when I tried to update a
>>>>> machine to 3.18.0 and all of the domUs crashed on start with dmesg
>>>>> output like this:
>>>>
>>>> Cc'ing the lists and relevant netback maintainers.
>>>>
>>>> I guess the stubdoms are using minios's netfront? This is something
>>>> I forgot about when deciding if it was ok to make this feature
>>>> mandatory.
>>>
>>> Oh bum, me too :/
>>>
>>>> The patch cannot be reverted as it's a prerequisite for a critical
>>>> (security) bug fix. I am also unconvinced that the
>>>> no-feature-rx-notify support worked correctly anyway.
>>>>
>>>> This can be resolved by:
>>>>
>>>> - Fixing minios's netfront to support feature-rx-notify. This
>>>>   should be easy but wouldn't help existing Xen deployments.
>>>
>>> I think this is worth doing in its own right, but as you say it
>>> doesn't help existing users.
>>>
>>>> - Reimplementing feature-rx-notify support. I think the easiest way
>>>>   is to queue packets on the guest Rx internal queue with a short
>>>>   expiry time.
>>>
>>> Right, I don't think we especially need to make this case good (so
>>> long as it doesn't reintroduce a security hole!).
>>>
>>> In principle we aren't really obliged to queue at all, but since all
>>> the infrastructure for queuing and timing out exists, I suppose it
>>> would be simple enough to implement and a bit less harsh.
>>>
>>> Given we now have XENVIF_RX_QUEUE_BYTES and rx_drain_timeout_jiffies
>>> we don't have the infinite queue any more. So does the expiry in
>>> this case actually need to be shorter than the norm? Does it cause
>>> any extra issues to keep them around for tx_drain_timeout_jiffies
>>> rather than some shorter time?
>>
>> If the internal guest rx queue fills and the (host) tx queue is
>> stopped, it will take tx_drain_timeout for the thread to wake up and
>> notice whether the frontend placed any rx requests on the ring. This
>> could potentially end up where you shovel 512k through, stall for
>> 10 s, put another 512k through, stall for 10 s again, and so on.
>
> Ah, true, that's not so great.
>
> What about if we don't queue at all(*) if rx-notify isn't supported,
> i.e. just drop the packet on the floor in start_xmit if the ring is
> full? Would that be so bad? It would surely be simple...

There needs to be a queue between start_xmit and the rx thread, so
checking the ring state in start_xmit doesn't help here: the internal
queue can fill before the thread wakes and begins to drain it.

netback could complete the request directly in start_xmit, avoiding
the internal queue but not allowing for any batching, but I don't
think it is a good idea to add a different data path for this mode.

> (*) Not counting the "queue" which is the ring itself.
>
>> The rx stall detection will also need to be disabled since there
>> would be no way for the frontend to signal rx ready.
>
> Agreed.
>
> Could be trivially argued to be safe if we were just dropping packets
> on ring overflow...

David
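For illustration only, a minimal sketch of the approach being discussed:
give packets on the internal guest-rx queue a shorter expiry (or drop
them outright once the byte cap is hit) when the frontend has not
advertised feature-rx-notify, and skip rx stall detection for such
frontends. This is not xen-netback code; every struct, field and
function name below (my_vif, my_pkt, sketch_rx_enqueue, short_expiry,
...) is hypothetical, and the byte cap and timeouts merely stand in for
XENVIF_RX_QUEUE_BYTES and rx_drain_timeout_jiffies.

/*
 * Illustrative sketch only -- not actual xen-netback code.  All names
 * here are hypothetical stand-ins for the driver's internal queue,
 * its XENVIF_RX_QUEUE_BYTES cap and its drain timeouts.
 */
#include <stdbool.h>
#include <stddef.h>

#define RX_QUEUE_MAX_BYTES   (512 * 1024)   /* stand-in for XENVIF_RX_QUEUE_BYTES */

struct my_pkt {
    size_t len;
    unsigned long expires;      /* jiffies-style expiry for this packet */
    struct my_pkt *next;
};

struct my_vif {
    bool fe_has_rx_notify;          /* did the frontend advertise feature-rx-notify? */
    unsigned long drain_timeout;    /* normal expiry, cf. rx_drain_timeout_jiffies */
    unsigned long short_expiry;     /* shorter expiry for no-rx-notify frontends */
    size_t rx_queue_len_bytes;      /* bytes currently on the internal guest-rx queue */
    struct my_pkt *rx_queue_head, *rx_queue_tail;
};

/*
 * Queue a packet on the internal guest-rx queue in the start_xmit path.
 * If the frontend cannot notify us about new rx requests, give the
 * packet a shorter lifetime, and drop outright once the byte cap is
 * hit, so a full queue cannot stall traffic for a whole drain timeout.
 */
static bool sketch_rx_enqueue(struct my_vif *vif, struct my_pkt *pkt,
                              unsigned long now)
{
    if (vif->rx_queue_len_bytes + pkt->len > RX_QUEUE_MAX_BYTES)
        return false;                       /* queue full: drop on the floor */

    pkt->expires = now + (vif->fe_has_rx_notify ? vif->drain_timeout
                                                : vif->short_expiry);
    pkt->next = NULL;
    if (vif->rx_queue_tail)
        vif->rx_queue_tail->next = pkt;
    else
        vif->rx_queue_head = pkt;
    vif->rx_queue_tail = pkt;
    vif->rx_queue_len_bytes += pkt->len;
    return true;
}

/* Rx stall detection only makes sense if the frontend can signal rx ready. */
static bool sketch_stall_check_enabled(const struct my_vif *vif)
{
    return vif->fe_has_rx_notify;
}

The point of keeping the decision per vif is that the normal data path
is unchanged for frontends that do support rx-notify; only the expiry
chosen at enqueue time (and whether stall detection runs) differs.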