Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit
On Tue, Oct 15, 2013 at 05:34:57PM +0800, jianhai luan wrote:
>
> On 2013-10-15 16:43, Ian Campbell wrote:
> >On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:
> >>On 2013-10-14 19:19, Wei Liu wrote:
> >>>On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
> >>>>Hi Ian,
> >>>>  I ran into a DomU network interface hang recently and have
> >>>>been working on it since. I find that a DomU network interface
> >>>>that sends few packets will hang if Dom0 is running 32-bit and
> >>>>the DomU's uptime is very long. I think there is a jiffies
> >>>>overflow bug in the function tx_credit_exceeded().
> >>>>  I know the inline function time_after_eq(a,b) handles jiffies
> >>>>overflow, but it has one limit: a should be less than (b +
> >>>>MAX_SIGNAL_LONG). If a is larger than that value, time_after_eq
> >>>>will return false. MAX_SIGNAL_LONG should be 0x7fffffff on a
> >>>>32-bit machine.
> >>>>  If the DomU's network interface sends few packets (<0.5k/s if
> >>>>HZ=250 and credit_bytes=ULONG_MAX), jiffies will go beyond
> >>>>(credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
> >>>>next_credit) will fail (it should be true). So a timer is set
> >>>>that will not trigger for a long time, and later processing is
> >>>>aborted while timer_pending(&vif->credit_timeout) is true. The
> >>>>result is that the DomU's network interface hangs for a long
> >>>>time (> 40 days).
> >>>>  Please consider the scenario below:
> >>>>  Condition:
> >>>>    Dom0 running 32-bit and HZ = 1000
> >>>>    vif->credit_timeout.expires = 0xffffffff,
> >>>>    vif->remaining_credit = 0xffffffff, vif->credit_usec = 0,
> >>>>    jiffies = 0
> >>>>    The vif receives few packets (DomU sends few packets). If the
> >>>>rate is less than 2K/s, consuming 4G (0xffffffff) will take 582.55
> >>>>hours. jiffies will be larger than 0x7fffffff. Suppose jiffies =
> >>>>0x800000ff: time_after_eq(0x800000ff, 0xffffffff) will fail, and
> >>>>a timer whose expiry is 0xffffffff will be pending in the system.
> >>>>So the interface will hang until jiffies counts up to 0xffffffff
> >>>>again (which will take a very long time).
> >>>If I'm not mistaken you meant time_after_eq(now, next_credit) in
> >>>netback. How does next_credit become 0xffffffff?
> >>I only assume the value is 0xffffffff; the exact value of
> >>next_credit isn't the point. If the delta between now and
> >>next_credit is larger than ULONG_MAX, time_after_eq will make the
> >>wrong judgment.
> >So it sounds like we need a timer which is independent of the traffic
> >being sent to keep credit_timeout.expires rolling over.
> >
> >Can you propose a patch?
>
> Because credit_timeout.expires is always after jiffies, I detect a
> value outside the range of time_after_eq() with time_before(now,
> vif->credit_timeout.expires). Please check the patch.

I don't think this really fixes the issue for you. You still have a
chance that now wraps around and falls between expires and next_credit.
In that case it's stalled again.

Wei.
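
For illustration only, the comparison discussed in this thread can be
modelled in user space. This is a sketch, not the netback code and not
the proposed patch. It assumes that the kernel's time_after_eq() and
time_before() reduce to a signed test on the unsigned difference of
their arguments, and it narrows that test to 32 bits to mimic a 32-bit
Dom0. The names time_after_eq32, time_before32 and check_scenarios are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 32-bit user-space models of the jiffies comparisons (assumption:
 * the kernel macros are a signed test on the unsigned difference).
 * The conversion to int32_t relies on two's-complement wraparound,
 * as the kernel macros do for long.
 */
int time_after_eq32(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(a - b) >= 0;
}

int time_before32(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(a - b) < 0;
}

void check_scenarios(void)
{
    uint32_t next_credit = 0xffffffffu;

    /* A small wrap is handled: 0x20 ticks have elapsed. */
    assert(time_after_eq32(0x00000010u, 0xfffffff0u));

    /* Values from the thread: 0x80000100 ticks have elapsed, which
     * is more than 0x7fffffff, so the deadline looks like it is
     * still in the future. */
    assert(!time_after_eq32(0x800000ffu, next_credit));
    assert(time_before32(0x800000ffu, next_credit));

    /* The result flips exactly at half the 32-bit range. */
    assert(time_after_eq32(next_credit + 0x7fffffffu, next_credit));
    assert(!time_after_eq32(next_credit + 0x80000000u, next_credit));
}
```

Calling check_scenarios() from a small main() should pass without
tripping any assertion. The last two checks show the window: the
comparison is only reliable while fewer than 0x80000000 ticks separate
now from the deadline.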