Re: Xen 4.18/ARM64 on Raspberry Pi 4B: VLAN traffic crashing Dom0


  • To: xen-users@xxxxxxxxxxxxxxxxxxxx
  • From: Paul Leiber <paul@xxxxxxxxxxxxxxxx>
  • Date: Thu, 14 Sep 2023 23:04:25 +0200
  • Delivery-date: Thu, 14 Sep 2023 21:05:18 +0000
  • List-id: Xen user discussion <xen-users.lists.xenproject.org>


Thanks for your time, zithro.

Am 08.09.2023 um 17:08 schrieb zithro:
> First, I need to mention I've never used bridges+VLANs this way, so I may miss the obvious !
> I -think- it's a network problem, not a Xen one, but what do I know 😄

In the meantime, I also suspect that this is a general network problem (Debian-specific, perhaps even Arm64-specific?). But I am not sure whether Xen can be ruled out yet.

> I've often read that bridges on dom0 should have some additional params.
> They would be in the iface config, around "bridge_ports", like :
>      # don't use STP (spanning tree protocol)
>      bridge_stp off
>      # don't wait for the port to become available
>      bridge_waitport 0
>      # no forward delay
>      bridge_fd 0

Tried that, no change.

> You may also try to enable STP (iirc it's disabled by default on Linux bridges).
> But TBH, I'm not sure those params will help in this case.

Tried that, no change.

> I've also read that VLAN 1 is a bit "special", better to avoid it.
> IIUC, untagged traffic would be auto tagged 1. Use ids 2/3, or 10/11, 10/20, etc.

I changed the VLAN IDs, first to 101, 102, 103, etc. That was when I noticed something new and strange: VLANs with IDs >99 simply don't work on my Raspberry Pi under Debian. VLAN 99 works, but VLAN 100 (and every other ID >99 I tried) doesn't. If I choose an ID >99, the VLAN interface is not configured and "ip a" doesn't list it. Other Debian systems on x64 don't show this behavior; there, setting up VLANs >99 was no problem. So that's another data point suggesting that something is fishy about the network on my Raspberry Pi system.

Therefore, I changed the VLAN IDs to 10, 20, 30, etc., which worked, but it didn't solve the original problem of Dom0 and the DomUs crashing.
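A possible explanation for the >99 limit, which I have not verified on this system: Linux interface names are limited to 15 characters (IFNAMSIZ is 16 bytes, including the terminating NUL). With the `<nic>.<vid>` naming used here, `enabcm6e4ei0.99` is exactly 15 characters, while `enabcm6e4ei0.100` is 16, so the kernel would refuse to create it. On x64 machines with short NIC names such as `eth0`, three- or four-digit IDs still fit. The Python sketch below only does that arithmetic; `vlan_ifname` and `fits_ifnamsiz` are made-up helper names, not part of any tool.

```python
# Linux interface names may be at most IFNAMSIZ - 1 = 15 characters:
# IFNAMSIZ is 16 bytes and includes the terminating NUL byte.
IFNAMSIZ = 16
MAX_IFNAME_LEN = IFNAMSIZ - 1


def vlan_ifname(parent: str, vid: int) -> str:
    """Build the '<parent>.<vid>' name (hypothetical helper)."""
    if not 1 <= vid <= 4094:
        raise ValueError(f"VLAN ID out of range: {vid}")
    return f"{parent}.{vid}"


def fits_ifnamsiz(name: str) -> bool:
    """True if the name is short enough for the kernel (length check only)."""
    return len(name.encode()) <= MAX_IFNAME_LEN


if __name__ == "__main__":
    for parent in ("enabcm6e4ei0", "eth0"):
        for vid in (10, 99, 100, 1000):
            name = vlan_ifname(parent, vid)
            print(f"{name:<18} {len(name):2} chars  fits={fits_ifnamsiz(name)}")
```

If this is the cause, a shorter interface name such as `vlan100` combined with `vlan-raw-device enabcm6e4ei0` in /etc/network/interfaces (supported by Debian's `vlan` package) should sidestep the limit.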

> Other stuff to test :
> - check MAC addresses

What specifically should I check? I assume you are aiming at duplicate MAC addresses, but if there were duplicates, why would it work when using the same VLAN bridge?
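One way to answer the MAC question on Dom0 is to group the output of `ip -j link show` (iproute2's JSON mode) by address. Some sharing is expected and should be ignored: the VLAN subinterfaces carry the parent NIC's MAC, and every Xen backend vif shows fe:ff:ff:ff:ff:ff (both visible in the "ip a" output further down). The Python sketch below is a heuristic, not a complete check: `link_macs` and `suspicious_duplicates` are hypothetical helpers, and the VLAN grouping assumes the `<parent>.<vid>` naming. It cannot see MACs inside the DomUs or on other hosts, so those still need to be compared separately (e.g. the mac= values in the xl.cfg files).

```python
import json
import sys
from collections import defaultdict

# Linux xen-netback gives every backend vif this MAC, so sharing it is normal.
XEN_VIF_MAC = "fe:ff:ff:ff:ff:ff"


def link_macs(ip_json: str) -> dict:
    """Map each Ethernet MAC to the interface names that carry it."""
    by_mac = defaultdict(list)
    for link in json.loads(ip_json):
        mac = (link.get("address") or "").lower()
        if link.get("link_type") == "ether" and mac:
            by_mac[mac].append(link["ifname"])
    return dict(by_mac)


def suspicious_duplicates(by_mac: dict) -> dict:
    """MACs on more than one interface, minus the expected cases.

    Heuristic: skips the Xen vif MAC, and groups whose names all share one
    '<parent>' prefix before the first dot (a NIC plus its VLAN subinterfaces).
    """
    result = {}
    for mac, names in by_mac.items():
        if len(names) < 2 or mac == XEN_VIF_MAC:
            continue
        parents = {name.split(".", 1)[0] for name in names}
        if len(parents) == 1:
            continue  # e.g. enabcm6e4ei0, enabcm6e4ei0.10, enabcm6e4ei0.20
        result[mac] = names
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: ip -j link show > links.json && python3 check_macs.py links.json
    with open(sys.argv[1]) as f:
        found = suspicious_duplicates(link_macs(f.read()))
    for mac, names in found.items():
        print(f"{mac}: {', '.join(names)}")
    if not found:
        print("no unexpected duplicate MACs")
```

The script name `check_macs.py` is only an example; reading from a saved file keeps the capture of `ip` output separate from the analysis.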

> - use tcpdump/wireshark remote logging on the real NIC (enabcm6e4ei0) *and* the bridges, to see what really happens, maybe a network/broadcast storm, filling dom0 cpu/memory ?

Now, here it becomes really strange. I started tcpdump captures on Dom0, and depending on which interface/bridge I captured on, the problem went away: the DomU ran smoothly for hours, even when I accessed the zabbix web interface! As soon as I stop the capture, accessing the zabbix web interface reproducibly crashes the system.

Logging enabcm6e4ei0 (NIC): no crashes
Logging enabcm6e4ei0.10 (VLAN 10): instant crash
Logging enabcm6e4ei0.20 (VLAN 20): no crashes
Logging xenbr0 (on VLAN 10): instant crash
Logging xenbr1 (on VLAN 20): no crashes

I can't think of a rational explanation for why capturing traffic on certain interfaces/bridges should prevent the whole system from crashing, while capturing on others doesn't. Any ideas?

I checked the dumps of enabcm6e4ei0.10 and xenbr0 (where the system crashes) with wireshark, and nothing stands out to me (but I am really no expert in analyzing network traffic). I could send the dumps to you directly, if you want to spend the time.
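To check the captures for a possible broadcast storm without eyeballing them in Wireshark, a rough stdlib-only Python sketch can count frames, the broadcast/multicast share, and the peak frames per second. It assumes a classic pcap file with Ethernet link type, which is what `tcpdump -w` writes by default; pcapng files are rejected. `summarize_pcap` is a hypothetical helper, and it reads the whole file into memory, so very large captures may need to be split first.

```python
import struct
import sys
from collections import Counter

PCAP_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}  # classic pcap, usec / nsec timestamps
LINKTYPE_ETHERNET = 1


def summarize_pcap(data: bytes) -> dict:
    """Count frames, broadcast/multicast destinations and peak frames/second."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # Detect byte order from the magic number at the start of the file.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "I", data, 0)[0] in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    # The low 16 bits of the last global-header field hold the link type.
    linktype = struct.unpack_from(endian + "I", data, 20)[0] & 0xFFFF
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    per_second = Counter()
    total = broadcast = multicast = 0
    offset = 24
    while offset + 16 <= len(data):
        ts_sec, _frac, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        frame = data[offset + 16 : offset + 16 + incl_len]
        if len(frame) < incl_len:
            break  # capture was cut off mid-record
        offset += 16 + incl_len
        total += 1
        per_second[ts_sec] += 1
        dst = frame[:6]  # destination MAC is the first 6 bytes of the frame
        if dst == b"\xff" * 6:
            broadcast += 1
        elif dst and dst[0] & 0x01:  # group bit set: multicast
            multicast += 1
    return {
        "total": total,
        "broadcast": broadcast,
        "multicast": multicast,
        "peak_pps": max(per_second.values(), default=0),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(summarize_pcap(f.read()))
```

A peak of many thousands of frames per second, mostly broadcast or multicast, would point toward a storm; modest numbers would make that explanation less likely.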

> - set "loglvl=all" to Xen cmdline to maybe get more info

Done; I still need to check the results (the serial interface is not connected right now).

> - how are the interfaces configured in the domUs and in the cfg files ?

/etc/network/interfaces on the DomU on which zabbix is running:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto enX0
iface enX0 inet static
        address xx.xx.xx.xx/24
        gateway xx.xx.xx.xx

iface enX0 inet6 static
        address xxxx::xxxx:xxxx:xxxx:xxxx/64
        gateway xxxx::xxxx:xxxx:xxxx:xxxx
        # use SLAAC to get global IPv6 address from the router
        # we must not enable ipv6 forwarding, otherwise SLAAC gets disabled
        autoconf 1
        accept_ra 2
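On the comment about forwarding: as I understand the kernel's ip-sysctl documentation, accept_ra=1 makes the kernel ignore Router Advertisements when forwarding is enabled on that interface, while accept_ra=2 accepts them regardless; autoconf must also be 1 for SLAAC addresses to be created. So with `accept_ra 2` set, enabling forwarding should not disable SLAAC. A minimal Python sketch of that rule, with hypothetical helper names, which can also read the live values from /proc/sys/net/ipv6/conf/<iface>/ inside the DomU:

```python
import sys
from pathlib import Path

SYSCTL_ROOT = Path("/proc/sys/net/ipv6/conf")


def ra_accepted(accept_ra: int, forwarding: int) -> bool:
    """Whether the kernel processes Router Advertisements on an interface.

    Follows the documented net.ipv6.conf.<iface>.accept_ra semantics:
    0 = never, 1 = only if forwarding is off, 2 = even if forwarding is on.
    """
    if accept_ra <= 0:
        return False
    if accept_ra == 1:
        return forwarding == 0
    return True


def slaac_possible(accept_ra: int, forwarding: int, autoconf: int) -> bool:
    """Kernel-side preconditions only; the router must still advertise
    a prefix with the autonomous (A) flag set."""
    return ra_accepted(accept_ra, forwarding) and autoconf == 1


def read_conf(iface: str, key: str) -> int:
    return int((SYSCTL_ROOT / iface / key).read_text().strip())


if __name__ == "__main__" and len(sys.argv) > 1:
    iface = sys.argv[1]  # e.g. enX0 inside the DomU
    values = {k: read_conf(iface, k) for k in ("accept_ra", "forwarding", "autoconf")}
    print(values, "SLAAC possible:", slaac_possible(**values))
```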

vif line in the xl.cfg of the same DomU:

vif         = [ 'mac=02:93:0B:61:A5:82,bridge=xenbr1,ip=xx.xx.xx.xx' ]
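For comparing MACs across DomUs, the xl vif option is a comma-separated list of key=value settings, so the configured MAC is easy to extract. A manually chosen MAC should be unicast (lowest bit of the first octet clear) and ideally locally administered (second-lowest bit set), which 02:93:0B:61:A5:82 is; Xen's auto-generated MACs use the 00:16:3e prefix instead. A small Python sketch with hypothetical helpers, assuming no value in the spec contains a comma:

```python
def parse_vif_spec(spec: str) -> dict[str, str]:
    """Split an xl vif spec ('key=value,key=value') into a dict.

    Assumes no value itself contains a comma.
    """
    result = {}
    for item in spec.split(","):
        key, sep, value = item.strip().partition("=")
        if not sep:
            raise ValueError(f"malformed vif item: {item!r}")
        result[key] = value
    return result


def mac_flags(mac: str) -> tuple[bool, bool]:
    """Return (is_unicast, is_locally_administered) for a MAC address."""
    first_octet = int(mac.split(":")[0], 16)
    return (first_octet & 0x01) == 0, (first_octet & 0x02) != 0


if __name__ == "__main__":
    vif = parse_vif_spec("mac=02:93:0B:61:A5:82,bridge=xenbr1,ip=xx.xx.xx.xx")
    print(vif["bridge"], mac_flags(vif["mac"]))
```

Running `parse_vif_spec` over every DomU's vif line and looking for repeated mac values covers the guest side of the duplicate-MAC check.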


> - test w/o IPv6

Tried that, no difference.

> You could also show us the outputs of "ip a", "ip link show type bridge" (brctl show), etc.

root@xxx:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enabcm6e4ei0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether d8:3a:dd:28:39:4f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::da3a:ddff:fe28:394f/64 scope link
       valid_lft forever preferred_lft forever
3: enabcm6e4ei0.10@enabcm6e4ei0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master xenbr0 state UP group default qlen 1000
    link/ether d8:3a:dd:28:39:4f brd ff:ff:ff:ff:ff:ff
4: enabcm6e4ei0.20@enabcm6e4ei0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master xenbr1 state UP group default qlen 1000
    link/ether d8:3a:dd:28:39:4f brd ff:ff:ff:ff:ff:ff
    inet6 xxxx::xxxx:xxxx:xxxx:xxxx/64 scope global dynamic mngtmpaddr
       valid_lft 86134sec preferred_lft 14134sec
    inet6 xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx/64 scope global dynamic mngtmpaddr
       valid_lft 86134sec preferred_lft 14134sec
    inet6 fe80::da3a:ddff:fe28:394f/64 scope link
       valid_lft forever preferred_lft forever
5: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:28:b7:1f:ee:6d brd ff:ff:ff:ff:ff:ff
    inet xx.xx.xx.xx/24 brd xx.xx.xx.255 scope global xenbr0
       valid_lft forever preferred_lft forever
    inet6 xxxx::xxxx:xxxx:xxxx:xxxx/64 scope global dynamic mngtmpaddr
       valid_lft 86135sec preferred_lft 14135sec
    inet6 xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx/64 scope global dynamic mngtmpaddr
       valid_lft 86135sec preferred_lft 14135sec
    inet6 xxxx::xxxx:xxxx:xxxx:xxxx/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::28:b7ff:fe1f:ee6d/64 scope link
       valid_lft forever preferred_lft forever
6: xenbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c6:11:98:cb:32:bd brd ff:ff:ff:ff:ff:ff
    inet6 xxxx::xxxx:xxxx:xxxx:xxxx/64 scope global dynamic mngtmpaddr
       valid_lft 86280sec preferred_lft 14280sec
    inet6 xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx/64 scope global dynamic mngtmpaddr
       valid_lft 86280sec preferred_lft 14280sec
    inet6 fe80::c411:98ff:fecb:32bd/64 scope link
       valid_lft forever preferred_lft forever
7: vif1.0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master xenbr0 state UP group default qlen 32
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
8: vif2.0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master xenbr1 state UP group default qlen 32
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever

root@xxx:~# ip link show type bridge
5: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 02:28:b7:1f:ee:6d brd ff:ff:ff:ff:ff:ff
6: xenbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether c6:11:98:cb:32:bd brd ff:ff:ff:ff:ff:ff

root@xxx:~# brctl show
bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.0228b71fee6d       no              enabcm6e4ei0.10
                                                        vif1.0
xenbr1          8000.c61198cb32bd       no              enabcm6e4ei0.20
                                                        vif2.0

> PS: I guess it's only in the mail, and should be harmless, but you have two /eni stanzas "VLAN LAN" and "VLAN DMZ_LAN" that should be comments.
>
Sorry, that was a copy/paste error; fixed. No difference.

