Xen project Mailing List

Re: Proposal: use disk sequence numbers to avoid races in blkback

To: Demi Marie Obenour <demi@xxxxxxxxxxxxxxxxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 10 May 2022 12:57:48 +0200

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=S/kWedNJU5vEz0bEvj9CLmWre4hLCaCi5+i1k0Q1Tow=; b=by+mD2peuMGgGsbDtlH0TdFsusGLvGen7gRFWMuwGKObzzCO4zMH0rVlzih37Hd5p5ZUCPebFsiithV7H/whWgHR6pFNHFihu9D68Z7Yz6BCPeV5BvOPAvYcGICObLlyBi9HuSXqNmN4SrBiACbsXERwybLSljTS+BzsemdgZwXZB3vXRmTrukIyxP3/hVL4LBW9xSRo90U5V/UMzhNNIg3gq9FHX8zeoGX8w+FpdrZ+CFjfNEic8ov1BYSK9uj0mqu+sbQmtqEHpW/8AGhuzrxTw+5MnbHRIeFqxkuD65oL38Uw0s961Dzjjlf4FHjlpTyAkPTTwd2YORECfN5KxA==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=b3oynJkTBePB8R/vIma7LbpntyGRwHx7eGAzYh5MOKbqbzXDdyl5wBG4rtS41FeC/AjQ1TNydt/6CUQPws2V8BADgggnT2ZstKcbEMG6SfU1DTgcyb+AMxQJ5ib77VSrTHYy2NPe3XhsqV2JVFDl/YbVozOBUh1Jaj9sloAdH2Hza1gf46wsUWg2Glx0Hhj59yLMGucjEUwW3ImQ0RTo0utxtUQGbels43o+mGBH7vMadFFCGOES7+USb5djYn6gru3WtNlxjFsL3avrVbZkb87Mam/C5eoUlYdVbI0ttVwmKYCrZ2LpEl8NgsJ5AKXap2BNYAlWmTyXGZApl3P87Q==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;

Cc: Xen developer discussion <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 10 May 2022 10:58:03 +0000

Ironport-data: A9a23:jEBF1azyUOrs/EnWenF6t+cixyrEfRIJ4+MujC+fZmUNrF6WrkUGy 2ZJXW6Ba/reYmemKohyYd6wpxsA65aHydMwSwo+rSAxQypGp/SeCIXCJC8cHc8zwu4v7q5Dx 59DAjUVBJlsFhcwnj/0bv656yMUOZigHtIQMsadUsxKbVIiGX5JZS5LwbZj2NY12IDhWWthh PupyyHhEA79s9JLGjp8B5Kr8HuDa9yr5Vv0FnRnDRx6lAe2e0s9VfrzFonoR5fMeaFGH/bSe gr25OrRElU1XfsaIojNfr7TKiXmS1NJVOSEoiI+t6OK2nCuqsGuu0qS2TV1hUp/0l20c95NJ Npl5bWVVCsUOfzwyc84VjxHST55GbMW0eqSSZS/mZT7I0zuVVLJmq0rJmdpeIoS96BwHH1E8 uEeJHYVdBefiumqwbW9DO5xmsAkK8qtN4Qa0p1i5WiBUbB6HtacG+OTvY8wMDQY36iiGd7EY MUUc3x3ZQnoaBxTIFYHTpk5mY9Eg1GgK2wF9g3M/sLb5UD5yhV2zJHwHOCMRd6maoJWjmSyg WfvqjGR7hYycYb3JSC+2n6hg+7nnCXlWZkTHrm16v5rhlKIwmUZThYRUDOTsfS/z0KzRd9bA 0gV4TY167g/8lSxSdvwVAH+p2SL1jYZUsBVGvc36ymMzLTV+AeTAmUYTj9HZ8civcVwTjsvv mJlhPvsDD1r9beTFnSU8+7MqSvoYHBFa2gfeSUDUA0JpcH5p50+hQ7OSdAlF7OpitryGnf7x DXiQDUCuoj/RPUjj82TlW0rSRr1znQVZmbZPjnqY18=

Ironport-hdrordr: A9a23:ykcoZaEkvpco7fcypLqFepHXdLJyesId70hD6qkvc3Fom52j/f xGws5x6faVslkssb8b6LW90Y27MAvhHPlOkPIs1NaZLXDbUQ6TQL2KgrGD/9SNIVycygcZ79 YbT0EcMqyOMbEZt7ec3ODQKb9Jrri6GeKT9IHjJh9WPH1XgspbnmNE42igYy9LrF4sP+tFKH PQ3LsPmxOQPVAsKuirDHgMWObO4/XNiZLdeBYDQzoq8hOHgz+E4KPzV0Hw5GZUbxp/hZMZtU TVmQ3w4auu99m91x/nzmfWq7BbgsHoxNdvDNGFzuIVNjLvoAC1Y5kJYczLgBkF5MWUrHo6mt jFpBkte+x19nPqZ2mw5SDg3gHxuQxen0PK+Bu9uz/OsMb5TDU1B45qnoRCaCbU7EImoZVVzL 9L93jxjesZMTrw2ADGo/TYXRBjkUS55VA4l/QIsnBZWYwCLJdMsI0k+l9PGptoJlO31GkeKp guMCjg3ocXTbvDBEqp/VWHgebcE0jbJy32DHTr4aeuonprdHMQ9Tps+CVQpAZEyHsHceg02w 31CNUXqFhwdL5nUUtcPpZ3fSLlMB26ffrzWFjiUmjPJeUgB0/njaLRzfEc2NyKEaZ4vqfa3q 6xGm9liQ==

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, May 05, 2022 at 08:30:17PM -0400, Demi Marie Obenour wrote: > Proposal: Check disk sequence numbers in blkback > ================================================ > > Currently, adding block devices to a domain is racy. libxl writes the > major and minor number of the device to XenStore, but it does not keep > the block device open until blkback has opened it. This creates a race > condition, as it is possible for the device to be destroyed and another > device allocated with the same major and minor numbers. Loop devices > are the most obvious example, since /dev/loop0 can be reused again and > again, but the same problem can also happen with device-mapper devices. > If the major and minor numbers are reused before blkback has attached to > the device, blkback will pass the wrong device to the domain, with > obvious security consequences. > > Other programs on Linux have the same problem, and a solution was > committed upstream in the form of disk sequence numbers. A disk > sequence number, or diskseq, is a 64-bit unsigned monotonically > increasing counter. The combination of a major and minor number and a > disk sequence number uniquely identifies a block device for the entire > uptime of the system. Seems fine to me, this is just an extra check to make sure the block device opened by blkback is the one that user space intended. I would see diskseq as a kind of checksum. > I propose that blkback check for an unsigned 64-bit hexadecimal XenStore > entry named “diskseq”. If the entry exists, blkback checks that the > number stored there matches the disk sequence number of the device. If > it does not exist, the check is skipped. If reading the entry fails for > any other reason, the entry is malformed, or if the sequence number is > wrong, blkback refuses to export the device. > > The toolstack changes are more involved for two reasons: > > 1. To ensure that loop devices are not leaked if the toolstack crashes, > they must be created with the delete-on-close flag set. This > requires that the toolstack hold the device open until blkback has > acquired a handle to it. Does this work with loop devices? I would expect that you need to issue a losetup call to detach the device. Even more, the loop device is created by the block script, but there's also a window between the block script execution and the toolstack knowing about the device, which could also allow for a leak? > > 2. For block devices that are opened by path, the toolstack needs to > ensure that the device it has opened is actually the device it > intended to open. This requires device-specific verification of the > open file descriptor. This is not needed for regular files, as the > LOOP_CONFIGURE ioctl is called on an existing loop device and sets > its backing file. > > The first is fairly easy in C. It can be accomplished by means of a > XenStore watch on the “status” entry. Once that watch fires, blkback > has opened the device, so the toolstack can safely close its file > descriptor. Does the toolstack really need to close the device? What harm does it do to keep the handle open until the domain is destroyed? What about disk hotplug? Which entity will keep the device opened in that case? Is xl block-attach going to block until the device switches to the connected state? > The second is significantly more difficult. It requires the block > script to be aware of at least device-mapper devices and LVM2 logical > volumes. The general technique is common to all block devices: obtain > the sequence number (via the BLKGETDISKSEQ() ioctl) and its major and > minor numbers (via fstat()). Then open /sys/dev/block/MAJOR:MINOR to > get a directory file descriptor, and use openat(2) and read(2) to get > various sysfs attributes. Finally, read the diskseq sysfs attribute and > check that it matches the sequence number from BLKGETDISKSEQ(). > Alternatively, one can use device-specific methods, such as > device-mapper ioctls. > > Device-mapper devices can be detected via the ‘dm/name’ sysfs attribute, > which must match the name under ‘/dev/mapper/’. If the name is of the > form ‘/dev/X/Y’, and the ‘dm/uuid’ attribute starts with the literal > string “LVM-”, then the expected ‘dm/name’ attribute should be found by > doubling all ‘-’ characters in X and Y, and then joining X and Y with > another ‘-’. This accounts for LVM2 logical volumes. Alternatively, > one can use device-mapper ioctls to both check if a device is a > device-mapper device, and to obtain its name and UUID. I plan on going > with the latter route. Likely a stupid remark, but needs obviously needs to be kept to Linux only. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.