[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] xen-netback: make feature-rx-notify mandatory -- Breaks stubdoms

On 10/12/14 15:07, Ian Campbell wrote:
> On Wed, 2014-12-10 at 14:12 +0000, David Vrabel wrote:
>> On 10/12/14 13:42, John wrote:
>>> David,
>>> This patch you put into 3.18.0 appears to break the latest version of
>>> stubdomains. I found this out today when I tried to update a machine to
>>> 3.18.0 and all of the domUs crashed on start with the dmesg output like
>>> this:
>> Cc'ing the lists and relevant netback maintainers.
>> I guess the stubdoms are using minios's netfront?  This is something I
>> forgot about when deciding if it was ok to make this feature mandatory.
> Oh bum, me too :/
>> The patch cannot be reverted as it's a prerequisite for a critical
>> (security) bug fix.  I am also unconvinced that the no-feature-rx-notify
>> support worked correctly anyway.
>> This can be resolved by:
>> - Fixing minios's netfront to support feature-rx-notify. This should be
>> easy but wouldn't help existing Xen deployments.
> I think this is worth doing in its own right, but as you say it doesn't
> help existing users.
>> - Reimplement feature-rx-notify support.  I think the easiest way is to
>> queue packets on the guest Rx internal queue with a short expiry time.
> Right, I don't think we especially need to make this case good (so long
> as it doesn't reintroduce a security hole!).
> In principal we aren't really obliged to queue at all, but since all the
> infrastructure for queuing and timing out all exists I suppose it would
> be simple enough to implement and a bit less harsh.
> Given we now have XENVIF_RX_QUEUE_BYTES and rx_drain_timeout_jiffies we
> don't have the infinite queue any more. So does the expiry in this case
> actually need to be shorter than the norm? Does it cause any extra
> issues to keep them around for tx_drain_timeout_jiffies rather than some
> shorter time?

If the internal guest rx queue fills and the (host) tx queue is stopped,
it will take tx_drain_timeout for the thread to wake up and notice if
the frontend placed any rx requests on the ring.  This could potentially
end up where you shovel 512k through stall for 10 s, put another 512k
through, stall for 10 s again and so on.

The rx stall detection will also need to be disabled since there would
be no way for the frontend to signal rx ready.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.