
Re: Memory corruption bug with Xen PV Dom0 and BOSS-S1 RAID card


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Paweł Srokosz <pawel.srokosz@xxxxxxx>
  • From: Jürgen Groß <jgross@xxxxxxxx>
  • Date: Thu, 20 Feb 2025 10:31:02 +0100
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, andrew cooper3 <andrew.cooper3@xxxxxxxxxx>, JBeulich@xxxxxxxx
  • Delivery-date: Thu, 20 Feb 2025 09:31:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20.02.25 10:16, Roger Pau Monné wrote:
> On Wed, Feb 19, 2025 at 07:37:47PM +0100, Paweł Srokosz wrote:
>> Hello,

>>> So the issue doesn't happen on debug=y builds? That's unexpected. I would
>>> expect the opposite, that some code in Linux assumes that pfn + 1 maps to
>>> mfn + 1, and hence breaks when the relation is reversed.

>> It was also surprising for me, but I think the key thing is that debug=y
>> causes the whole mapping to be reversed, so each PFN lands on a completely
>> different MFN, e.g. MFN=0x1300000 is mapped to PFN=0x20e50c in the ndebug
>> build, but in the debug build it's mapped to PFN=0x5FFFFF. I guess that's
>> why I can't reproduce the problem with the debug build.

>>> Can you see if you can reproduce with dom0-iommu=strict in the Xen command
>>> line?

>> Unfortunately, it doesn't help. But I have a few more observations.

>> Firstly, I checked the "xen-mfndump dump-m2p" output and found that the
>> misread blocks are mapped to suspiciously round MFNs. I have different
>> versions of Xen and the Linux kernel on each machine, and I see a similar
>> pattern on each.

>> I'm writing a few huge files while running without Xen, to ensure that they
>> have been written correctly (because under Xen both reads and writeback are
>> affected). Then I boot into Xen, memory-map the files and read each page. I
>> see that when a block is corrupted, it is mapped to a round MFN, e.g.
>> pfn=0x5095d9/mfn=0x1600000, another at pfn=0x4095d9/mfn=0x1500000, etc.
>>
>> On another machine with a different Linux/Xen version these faults appear at
>> pfn=0x20e50c/mfn=0x1300000, pfn=0x30e50c/mfn=0x1400000, etc.
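[The thread doesn't include the test program itself; a minimal sketch of this kind of per-page check (hypothetical, not the reporter's actual tool) could look like the following. Its output can be diffed between a native boot and a Xen boot to locate the misread pages; running `echo 3 > /proc/sys/vm/drop_caches` between runs forces the pages to be fetched from the device again instead of being served from the page cache.]

```
/* Hypothetical sketch: mmap a file written on a known-good boot, fault in
 * every page and print a per-page checksum.  Diffing the output between a
 * native boot and a Xen boot shows which pages were misread; the offending
 * pfns can then be matched against "xen-mfndump dump-m2p". */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    long pagesz = sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    for (off_t off = 0; off < st.st_size; off += pagesz) {
        unsigned int sum = 0;
        off_t end = off + pagesz < st.st_size ? off + pagesz : st.st_size;

        for (off_t i = off; i < end; i++)
            sum = sum * 31 + buf[i];        /* simple rolling checksum */

        printf("page %#llx sum %#x\n",
               (unsigned long long)(off / pagesz), sum);
    }

    munmap(buf, st.st_size);
    close(fd);
    return 0;
}
```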

>> I also noticed that while reading a page that is mapped to
>> pfn=0x20e50c/mfn=0x1300000, I'm getting these faults from DMAR:

>> ```
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200000000
>> (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200001000
>> (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200006000
>> (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200008000
>> (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200009000
>> (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 120000a000
>> (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 120000c000
>> (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
>> ```

> That's interesting; it seems to me that Linux is assuming that pages at
> certain boundaries are superpages, and thus that it can just increment the
> mfn to get the next physical page.

I'm not sure this is true. See below.

>> and every time I drop the cache and read this region, I get DMAR faults at a
>> few random addresses in the 1200000000-120000f000 range (I guess MFNs
>> 0x1200000-0x120000f). MFNs 0x1200000-0x12000ff are not mapped to any PFN in
>> Dom0 (based on the xen-mfndump output).

> It would be very interesting to figure out where those requests
> originate, iow: which entity in Linux creates the bios with the
> faulting address(es).

I _think_ this is related to the kernel trying to get some contiguous areas
for the buffers used by the I/Os. As those areas are being given back after
the I/O, they don't appear in the mfndump.

> It's a wild guess, but could you try to boot Linux with swiotlb=force
> on the command line and attempt to trigger the issue?  I wonder
> whether forcing the usage of the swiotlb will surface the issues as
> CPU accesses, rather than IOMMU faults, and that could get us a trace
> inside Linux of how those requests are generated.
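[For reference, on a GRUB-based Dom0 this usually means adding the option to the dom0 kernel line and regenerating the GRUB configuration, roughly like the following hypothetical example; file locations and the regeneration command vary by distro.]

```
# /etc/default/grub
GRUB_CMDLINE_LINUX="... swiotlb=force"
# regenerate the config, e.g. "update-grub" or
# "grub2-mkconfig -o /boot/grub2/grub.cfg", then reboot the host
```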

>> On the other hand, I'm not getting these DMAR faults while reading other
>> regions. Also, I can't trigger the bug with the reversed Dom0 mapping, even
>> if I fill the page cache with reads.

> There's possibly some condition we are missing that causes a component
> in Linux to assume the next address is mfn + 1, instead of doing the
> full address translation from the linear or pfn space.
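[To illustrate the class of bug being described here — a sketch, not a pointer to the actual offending code: in a PV dom0 consecutive pfns generally map to non-consecutive mfns, so the machine address has to be looked up per page, e.g. with the x86 Xen pfn_to_mfn() helper, rather than derived by adding an offset to the first machine address.]

```
/*
 * Illustrative sketch only -- not taken from any specific driver.  In a PV
 * dom0, pfn -> mfn is an arbitrary permutation, so the machine address for
 * page (pfn + i) must be looked up per page rather than derived by adding
 * i * PAGE_SIZE to the first machine address.
 */
#include <linux/mm.h>
#include <asm/xen/page.h>       /* pfn_to_mfn() on x86 Xen guests */

static void fill_machine_addrs(unsigned long first_pfn, unsigned int npages,
                               phys_addr_t *maddr)
{
    unsigned int i;

    for (i = 0; i < npages; i++) {
        /*
         * Suspected broken pattern:
         *     maddr[i] = maddr[0] + i * PAGE_SIZE;
         * which only holds if the buffer is machine-contiguous, e.g. behind
         * a superpage or a swiotlb-xen contiguous region.
         */
        maddr[i] = (phys_addr_t)pfn_to_mfn(first_pfn + i) << PAGE_SHIFT;
    }
}
```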

My theory is:

The kernel sees the buffer as a physically contiguous area, so it is _not_
using a scatter-gather list (it does in the debug Xen case, which is why no
errors show up there). Unfortunately the buffer is not aligned to its size,
so swiotlb-xen will remap the buffer to a suitably aligned one. The driver
will then use the returned machine address for I/Os to both devices of the
RAID configuration. When the first I/O is done, the driver probably calls
the DMA unmap or device sync function already, causing the intermediate
contiguous region to be destroyed again (this is the point at which the DMAR
errors should show up for the 2nd I/O still running).

So the main issue IMHO is that a DMA buffer mapped for one device is used by
two devices instead.
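[A rough sketch of that suspected pattern — hypothetical driver code with made-up helpers, not the actual RAID/AHCI driver: one streaming mapping is created against the first device, its bus address is also handed to the second device, and the first completion unmaps it while the second device is still writing. Under swiotlb-xen the unmap tears down the bounce/contiguous region, which would explain both the VT-d write faults and the corrupted reads.]

```
/*
 * Hypothetical illustration of the suspected bug pattern -- not taken from
 * the real driver.  A single streaming DMA mapping made for dev_a is reused
 * for dev_b, and is torn down as soon as dev_a finishes.
 */
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>

/* Made-up helpers standing in for the driver's real submit/complete paths. */
extern void start_io(struct device *dev, dma_addr_t addr, size_t len);
extern void wait_for_io(struct device *dev);

static int read_via_both_legs(struct device *dev_a, struct device *dev_b,
                              void *buf, size_t len)
{
    dma_addr_t dma = dma_map_single(dev_a, buf, len, DMA_FROM_DEVICE);

    if (dma_mapping_error(dev_a, dma))
        return -ENOMEM;

    start_io(dev_a, dma, len);
    start_io(dev_b, dma, len);   /* BUG: dev_b never had this buffer mapped */

    wait_for_io(dev_a);

    /*
     * Unmapping here destroys the swiotlb-xen contiguous/bounce region while
     * dev_b may still be writing into it -- matching the observed
     * "[DMA Write] ... PTE Write access is not set" faults -- and whatever
     * dev_b writes afterwards never reaches the caller's buffer.
     */
    dma_unmap_single(dev_a, dma, len, DMA_FROM_DEVICE);

    wait_for_io(dev_b);
    return 0;
}
```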


Juergen



 

