[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Proposal: use disk sequence numbers to avoid races in blkback
On Thu, May 05, 2022 at 08:30:17PM -0400, Demi Marie Obenour wrote: > Proposal: Check disk sequence numbers in blkback > ================================================ > > Currently, adding block devices to a domain is racy. libxl writes the > major and minor number of the device to XenStore, but it does not keep > the block device open until blkback has opened it. This creates a race > condition, as it is possible for the device to be destroyed and another > device allocated with the same major and minor numbers. Loop devices > are the most obvious example, since /dev/loop0 can be reused again and > again, but the same problem can also happen with device-mapper devices. > If the major and minor numbers are reused before blkback has attached to > the device, blkback will pass the wrong device to the domain, with > obvious security consequences. > > Other programs on Linux have the same problem, and a solution was > committed upstream in the form of disk sequence numbers. A disk > sequence number, or diskseq, is a 64-bit unsigned monotonically > increasing counter. The combination of a major and minor number and a > disk sequence number uniquely identifies a block device for the entire > uptime of the system. Seems fine to me, this is just an extra check to make sure the block device opened by blkback is the one that user space intended. I would see diskseq as a kind of checksum. > I propose that blkback check for an unsigned 64-bit hexadecimal XenStore > entry named “diskseq”. If the entry exists, blkback checks that the > number stored there matches the disk sequence number of the device. If > it does not exist, the check is skipped. If reading the entry fails for > any other reason, the entry is malformed, or if the sequence number is > wrong, blkback refuses to export the device. > > The toolstack changes are more involved for two reasons: > > 1. To ensure that loop devices are not leaked if the toolstack crashes, > they must be created with the delete-on-close flag set. This > requires that the toolstack hold the device open until blkback has > acquired a handle to it. Does this work with loop devices? I would expect that you need to issue a losetup call to detach the device. Even more, the loop device is created by the block script, but there's also a window between the block script execution and the toolstack knowing about the device, which could also allow for a leak? > > 2. For block devices that are opened by path, the toolstack needs to > ensure that the device it has opened is actually the device it > intended to open. This requires device-specific verification of the > open file descriptor. This is not needed for regular files, as the > LOOP_CONFIGURE ioctl is called on an existing loop device and sets > its backing file. > > The first is fairly easy in C. It can be accomplished by means of a > XenStore watch on the “status” entry. Once that watch fires, blkback > has opened the device, so the toolstack can safely close its file > descriptor. Does the toolstack really need to close the device? What harm does it do to keep the handle open until the domain is destroyed? What about disk hotplug? Which entity will keep the device opened in that case? Is xl block-attach going to block until the device switches to the connected state? > The second is significantly more difficult. It requires the block > script to be aware of at least device-mapper devices and LVM2 logical > volumes. The general technique is common to all block devices: obtain > the sequence number (via the BLKGETDISKSEQ() ioctl) and its major and > minor numbers (via fstat()). Then open /sys/dev/block/MAJOR:MINOR to > get a directory file descriptor, and use openat(2) and read(2) to get > various sysfs attributes. Finally, read the diskseq sysfs attribute and > check that it matches the sequence number from BLKGETDISKSEQ(). > Alternatively, one can use device-specific methods, such as > device-mapper ioctls. > > Device-mapper devices can be detected via the ‘dm/name’ sysfs attribute, > which must match the name under ‘/dev/mapper/’. If the name is of the > form ‘/dev/X/Y’, and the ‘dm/uuid’ attribute starts with the literal > string “LVM-”, then the expected ‘dm/name’ attribute should be found by > doubling all ‘-’ characters in X and Y, and then joining X and Y with > another ‘-’. This accounts for LVM2 logical volumes. Alternatively, > one can use device-mapper ioctls to both check if a device is a > device-mapper device, and to obtain its name and UUID. I plan on going > with the latter route. Likely a stupid remark, but needs obviously needs to be kept to Linux only. Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |