
Re: Likely bug in the PV driver v9.0



Hi Paul,

The Dom0 is sitting on brand-new hardware with a freshly installed Ubuntu 20.04, and the NIC in use is the onboard one:

lspci -vv:
09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: ASRock Incorporation Motherboard (one of many)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 35
        Region 0: I/O ports at b000 [size=256]
        Region 2: Memory at fc304000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at fc300000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 26.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
                         10BitTagComp-, 10BitTagReq-, OBFF Via message/WAKE#, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, TPHComp-, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 7a-96-1a-59-a1-a8-00-00
        Capabilities: [170 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [178 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel driver in use: r8169
        Kernel modules: r8169

All settings are at their defaults.
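
In case it is useful for cross-checking: the dom0-side TX checksum offload state can also be queried programmatically through the ETHTOOL_GTXCSUM ioctl. This is only a minimal sketch for illustration; the interface name is a placeholder (here it would be the physical NIC or the guest's vif):

/* Build with: cc -o gtxcsum gtxcsum.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char **argv)
{
    const char *ifname = argc > 1 ? argv[1] : "eth0"; /* placeholder name */
    struct ethtool_value ev = { .cmd = ETHTOOL_GTXCSUM };
    struct ifreq ifr;
    int fd;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (char *)&ev;               /* ethtool command buffer */

    fd = socket(AF_INET, SOCK_DGRAM, 0);      /* any socket works for the ioctl */
    if (fd < 0 || ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("SIOCETHTOOL");
        return 1;
    }
    printf("%s: tx-checksumming %s\n", ifname, ev.data ? "on" : "off");
    close(fd);
    return 0;
}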

Best, Oliver


On 07.05.20 at 14:03, Paul Durrant wrote:

Oliver,

 

  That’s interesting. It suggests a bug in the guest TX-side checksum calculation. This is normally done by netback setting up metadata in the skb and having either the kernel or the h/w driver do the calculation. Disabling the option in the guest means the calculation will be done in-guest by XENVIF before the segment is passed to netback. Hence, it sounds like your problem may actually be in your dom0 or NIC (possibly failing to handle some quirk of the RDP packets).
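
  For reference, the sum that has to be produced is just the RFC 1071 one's complement checksum over the pseudo-header and TCP segment. A minimal userspace sketch of that calculation (illustrative only, not the actual XENVIF code):

/* RFC 1071 one's complement checksum over a byte buffer.
 * Build with: cc -o csum csum.c */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static uint16_t csum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)          /* sum 16-bit big-endian words */
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)                              /* pad an odd trailing byte */
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)                         /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;                    /* one's complement of the sum */
}

int main(void)
{
    /* Toy input; a real TX checksum is computed over the pseudo-header
     * plus the TCP header (checksum field zeroed) and the payload. */
    uint8_t data[] = { 0x00, 0x01, 0xf2, 0x03 };
    printf("checksum = 0x%04x\n", csum16(data, sizeof(data)));
    return 0;
}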

 

  Cheers,

 

  Paul

 

 

From: Oliver Linden <oliver_linden@xxxxxxxxxxx>
Sent: 07 May 2020 12:34
To: paul@xxxxxxx; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: Likely bug in the PV driver v9.0

 

Hi Paul,

that was a perfect hint!

Disabling all features in the advanced properties pane allowed me to reconnect via RDP.

I subsequently re-enabled the features one by one and was able to nail it down to the two "TCP Checksum Offload (IPv[46])" entries. Those two must be set to either "Disabled" or "RX Enabled"; "TX Enabled" and "RX & TX Enabled" break RDP connectivity.

Many thanks for your support, it's highly appreciated.

Best, Oliver

On 06.05.20 at 09:26, Paul Durrant wrote:

Hi Oliver,

 

  Xen 4.9 and Ubuntu 18.04 are clearly both a little old. I guess it is possible that changes in netback have caused problems. I think the next step is probably to disable all offloads (checksum and LSO) in the advanced properties pane for the PV network frontend and see if the problem still exists. If that doesn’t have any effect then the next step would be to collect wireshark traces and look for oddities around RDP login.
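
  If it helps, the dom0-side capture can also be taken programmatically with libpcap; a minimal sketch (the vif name below is a placeholder for whatever interface backs the guest, and 3389 is the standard RDP port):

/* Build with: cc -o rdpcap rdpcap.c -lpcap   (run as root) */
#include <pcap/pcap.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "vif1.0"; /* placeholder name */
    char errbuf[PCAP_ERRBUF_SIZE];
    struct bpf_program fp;

    pcap_t *p = pcap_open_live(dev, 65535, 1, 1000, errbuf);
    if (!p) { fprintf(stderr, "%s\n", errbuf); return 1; }

    /* Only keep RDP traffic (default port 3389). */
    if (pcap_compile(p, &fp, "tcp port 3389", 1, PCAP_NETMASK_UNKNOWN) < 0 ||
        pcap_setfilter(p, &fp) < 0) {
        pcap_perror(p, "filter");
        return 1;
    }

    pcap_dumper_t *d = pcap_dump_open(p, "rdp.pcap");
    if (!d) { pcap_perror(p, "dump"); return 1; }

    /* Capture 1000 packets, then stop; open rdp.pcap in wireshark
     * with TCP checksum validation enabled. */
    pcap_loop(p, 1000, pcap_dump, (u_char *)d);
    pcap_dump_close(d);
    pcap_close(p);
    return 0;
}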

 

  Cheers,

 

    Paul

 

 

From: Oliver Linden <oliver_linden@xxxxxxxxxxx>
Sent: 05 May 2020 18:38
To: paul@xxxxxxx; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: Likely bug in the PV driver v9.0

 

Hello Paul,

Thanks a lot for your reply. I had already reached out to the Xen IRC channels but didn't get any response apart from being pointed to the email address I used for my initial mail.

The Windows firewall settings have no influence on the behavior; I already tested that. As far as I can recall, I had the drivers installed on a Windows 10 1909 DomU on my old server, which ran an upgraded Ubuntu 18.04 with Xen 4.9 and xend still active (the server was originally installed in Dec 2012 with Ubuntu 12.04 and its Xen version).

With this setup the v9 drivers were working with RDP. Does this give you any hint/idea?

Best, Oliver

On 05.05.20 at 15:56, Paul Durrant wrote:

Hi Oliver,

 

  I can only think this is a checksumming issue. I can’t see how else a network driver operating no higher than L4 (for checksum/LSO) would be able to affect a very specific part of a higher-level protocol. Although, one thing to watch out for… in the past I have seen Windows do things like re-enabling firewall rules when there is a change in the network stack, so you might want to check that.
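
  For context, the L4 checksum covers a pseudo-header in addition to the TCP header and payload, so a driver that computes it wrongly (or never fills it in) silently invalidates every segment it sends; the receiver just drops them, and at the application layer that looks exactly like a login that is never accepted. Sketched here for IPv4 (illustrative only):

#include <stdint.h>

/* TCP/IPv4 pseudo-header (RFC 793): prepended to the TCP header and
 * payload when the one's complement checksum is computed. It is never
 * transmitted on the wire; all fields are in network byte order. */
struct tcp_pseudo_hdr {
    uint32_t src_addr;   /* IPv4 source address */
    uint32_t dst_addr;   /* IPv4 destination address */
    uint8_t  zero;       /* always 0 */
    uint8_t  protocol;   /* IPPROTO_TCP == 6 */
    uint16_t tcp_length; /* TCP header + payload length in bytes */
};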

 

  Cheers,

 

    Paul

 

From: win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Oliver Linden
Sent: 04 May 2020 18:06
To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Likely bug in the PV driver v9.0

 

Dear all,

I'm observing an annoying bug with the v9 Windows PV drivers. It's easily reproducible here on my side:

Dom0:
Ubuntu 20.04, freshly installed around Easter, with their version of Xen (4.11) and only the xl toolstack in use.

DomU:
With every fresh installation of Windows 10 1909 (German) the following can be reproduced:

  • Everything works, but is slow, as expected
  • Installation of the v9 drivers completes without any issues and results in significantly improved speed

but

RDP connections to such a machine are no longer possible. The RDP service keeps asking for a username/password but never accepts them. To be precise, RDP connections do work

  • with the plain vanilla installation
  • with the v9 drivers installed, except for the network class & driver

Since I use my Windows machines predominantly via RDP clients, it would be great if this could be solved. Please let me know if you need any kind of details from my side.

Best, Oliver

 



 

