Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0
- To: G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx>
- From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
- Date: Mon, 10 Jan 2022 15:53:32 +0100
- Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
- Delivery-date: Mon, 10 Jan 2022 14:53:53 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
On Sat, Jan 08, 2022 at 01:14:26AM +0800, G.R. wrote:
> On Wed, Jan 5, 2022 at 10:33 PM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >
> > On Wed, Jan 05, 2022 at 12:05:39AM +0800, G.R. wrote:
> > > > > > > But it seems like this patch is not stable enough yet and has its
> > > > > > > own issue -- memory is not properly released?
> > > > > >
> > > > > > I know. I've been working on improving it this morning and I'm
> > > > > > attaching an updated version below.
> > > > > >
> > > > > Good news.
> > > > > With this new patch, the NAS domU can serve the iSCSI disk without an
> > > > > OOM panic, at least for a little while.
> > > > > I'm going to keep it up and running for a while to see if it's stable
> > > > > over time.
> > > >
> > > > Thanks again for all the testing. Do you see any difference
> > > > performance-wise?
> > > I'm still on a *debug* kernel build to capture any potential panic --
> > > none so far -- no performance testing yet.
> > > Since I'm a home user with a relatively lightweight workload, I haven't
> > > observed any difference in daily usage so far.
> > >
> > > I did some quick iperf3 testing just now.
> >
> > Thanks for doing this.
> >
> > > 1. Between the NAS domU <=> Linux dom0, running on an old i7-3770 based box.
> > > The peak is roughly 12 Gbits/s when the domU is the server.
> > > But I do see a regression down to ~8.5 Gbits/s when I repeat the test
> > > in short bursts.
> > > The throughput recovers when I leave the system idle for a while.
> > >
> > > When dom0 is the iperf3 server, the transfer rate is much lower, down
> > > all the way to 1.x Gbits/s.
> > > Sometimes I can see the following kernel log repeat during the
> > > testing, likely contributing to the slowdown:
> > > interrupt storm detected on "irq2328:"; throttling interrupt
> > > source
> >
> > I assume the message is in the domU, not the dom0?
> Yes, in the TrueNAS domU.
> BTW, I rebooted back to the stock kernel and the message is no longer
> observed.
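FWIW, if the interrupt storm detection itself turns out to be part of
the bottleneck on the debug kernel, FreeBSD allows tuning it via the
hw.intr_storm_threshold sysctl. A minimal sketch (the default threshold
might differ between releases):

    # show the current threshold
    sysctl hw.intr_storm_threshold
    # 0 disables the storm detection altogether
    sysctl hw.intr_storm_threshold=0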
>
> With the stock kernel, the transfer rate from dom0 to the NAS domU can
> be as high as 30 Gbps.
> Some variation is still observed, sometimes down to ~19 Gbps. There are
> no retransmissions in this direction.
>
> In the reverse direction, the low transfer rate is still there.
> It's still in the 1.x Gbps range, though somewhat better than in the
> previous test.
> The huge number of re-transmissions is still observed.
> The same behavior can be observed on a stock FreeBSD 12.2 image, so
> this is not specific to TrueNAS.
So that's domU sending the data, and dom0 receiving it.
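Just to make sure we are talking about the same setup, I assume the
tests are run with something like the following (<dom0-ip> and the
duration are placeholders):

    # on dom0 (receiver/server)
    iperf3 -s
    # on the domU (sender/client), default port 5201
    iperf3 -c <dom0-ip> -t 30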
>
> According to the packet capture, the re-transmissions appear to be
> caused by packet reordering.
> Here is one example incident:
> 1. dom0 sees a sequence jump in the incoming stream and begins to send
> out SACKs.
> 2. When the SACKs show up at the domU, it begins to re-transmit the lost
> frames (the re-transmission looks weird since it shows up as a mixed
> stream of 1448-byte and 12-byte packets, instead of always 1448 bytes).
> 3. Suddenly the packets that were believed to have been lost show up,
> and dom0 accepts them as if they were re-transmissions.
Hm, so there seems to be some kind of ordering issue, I would say.
> 4. The actual re-transmissions finally show up in dom0...
> Should we expect packet reordering on a direct virtual link? Sounds
> fishy to me.
> Any chance we can get this re-transmission fixed?
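If you can, it would be interesting to capture on both ends of the link
and compare, to see where the reordering is introduced. A sketch of what
I have in mind (xn0 is a guess at the netfront interface name in the
domU, and the dom0 vif name will differ on your system; 5201 is the
default iperf3 port):

    # on the FreeBSD domU
    tcpdump -ni xn0 -s 128 -w domU.pcap tcp port 5201
    # on the Linux dom0 (against the matching vif backend interface)
    tcpdump -ni vif1.0 -s 128 -w dom0.pcap tcp port 5201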
Does this still happen with all the extra features disabled? (-rxcsum
-txcsum -lro -tso)
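In case it's useful, on the domU that would be something like the
following (assuming xn0 is the netfront interface; re-enable by passing
the same flags without the leading dash):

    ifconfig xn0 -rxcsum -txcsum -lro -tso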
> So it looks like at least the imbalance between the two directions is
> not related to your patch.
> The debug build is likely a bigger contributor to the perf difference
> in both directions.
>
> I also tried your patch on a release build, and didn't observe any
> major difference in iperf3 numbers.
> They roughly match the 30 Gbps and 1.x Gbps numbers from the stock
> release kernel.
Thanks a lot, I'll try to get this upstream then.
Roger.