[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure
Hi Roger, Roger Pau Monné <roger.pau@xxxxxxxxxx> writes: > On Wed, Jun 21, 2023 at 10:07:20PM +0000, Volodymyr Babchuk wrote: >> >> Hi Roger, >> >> Roger Pau Monné <roger.pau@xxxxxxxxxx> writes: >> >> > On Tue, Jun 13, 2023 at 10:32:26AM +0000, Volodymyr Babchuk wrote: >> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> >> >> >> Introduce a per-domain read/write lock to check whether vpci is present, >> >> so we are sure there are no accesses to the contents of the vpci struct >> >> if not. This lock can be used (and in a few cases is used right away) >> >> so that vpci removal can be performed while holding the lock in write >> >> mode. Previously such removal could race with vpci_read for example. >> >> >> >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure >> >> from being removed. >> >> >> >> 2. Writing the command register and ROM BAR register may trigger >> >> modify_bars to run, which in turn may access multiple pdevs while >> >> checking for the existing BAR's overlap. The overlapping check, if done >> >> under the read lock, requires vpci->lock to be acquired on both devices >> >> being compared, which may produce a deadlock. It is not possible to >> >> upgrade read lock to write lock in such a case. So, in order to prevent >> >> the deadlock, check which registers are going to be written and acquire >> >> the lock in the appropriate mode from the beginning. >> >> >> >> All other code, which doesn't lead to pdev->vpci destruction and does not >> >> access multiple pdevs at the same time, can still use a combination of the >> >> read lock and pdev->vpci->lock. >> >> >> >> 3. Optimize if ROM BAR write lock required detection by caching offset >> >> of the ROM BAR register in vpci->header->rom_reg which depends on >> >> header's type. >> >> >> >> 4. Reduce locked region in vpci_remove_device as it is now possible >> >> to set pdev->vpci to NULL early right after the write lock is acquired. >> >> >> >> 5. Reduce locked region in vpci_add_handlers as it is possible to >> >> initialize many more fields of the struct vpci before assigning it to >> >> pdev->vpci. >> >> >> >> 6. vpci_{add|remove}_register are required to be called with the write >> >> lock >> >> held, but it is not feasible to add an assert there as it requires >> >> struct domain to be passed for that. So, add a comment about this >> >> requirement >> >> to these and other functions with the equivalent constraints. >> >> >> >> 7. Drop const qualifier where the new rwlock is used and this is >> >> appropriate. >> >> >> >> 8. Do not call process_pending_softirqs with any locks held. For that >> >> unlock >> >> prior the call and re-acquire the locks after. After re-acquiring the >> >> lock there is no need to check if pdev->vpci exists: >> >> - in apply_map because of the context it is called (no race condition >> >> possible) >> >> - for MSI/MSI-X debug code because it is called at the end of >> >> pdev->vpci access and no further access to pdev->vpci is made >> >> >> >> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock >> >> and if so, allow reading or writing the hardware register directly. This >> >> is >> >> acceptable as we only deal with Dom0 as of now. Once DomU support is >> >> added the write will need to be ignored and read return all 0's for the >> >> guests, while Dom0 can still access the registers directly. >> >> >> >> 10. Introduce pcidevs_trylock, so there is a possibility to try locking >> >> the pcidev's lock. >> >> >> >> 11. Use pcidev's lock around for_each_pdev and pci_get_pdev_by_domain >> >> while accessing pdevs in vpci code. >> >> >> >> 12. This is based on the discussion at [1]. >> >> >> >> [1] >> >> https://urldefense.com/v3/__https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/__;!!GF_29dbcQIUBPA!zPy31CWFWlyC0xhEHiSj6rOPe7RDSjLranI9KZqhG4ssmChJMWvsPLJPQGTcVsnnowZpP8-LaKJkIWIzb8ue0DoYhg$ >> >> [lore[.]kernel[.]org] >> >> >> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> >> >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx> >> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> > >> > Thanks. >> > >> > I haven't looked in full detail, but I'm afraid there's an ABBA >> > deadlock with the per-domain vpci lock and the pcidevs lock in >> > modify_bars() vs vpci_add_handlers() and vpci_remove_device(). >> > >> > I've made some comments below. >> >> Thank you for the review. I believe that it is a good idea to have a >> per-domain pdev_list lock. See my answers below. > > I think it's important that the lock that protects domain->pdev_list > must be the same that also protects pdev->vpci, or else you might run > into similar ABBA deadlock situations. > > The problem then could be that in vpci_{read,write} you will take the > per-domain pdev lock in read mode in order to get the pdev, and for > writes to the command register or the ROM BAR you would have to > upgrade such lock to a write lock without dropping it, and we don't > have such functionality for rw locks ATM. > > Maybe just re-starting the function knowing that the lock must be > taken in write mode would be a good solution: writes to the command > register will already be slow since they are likely to involve > modifications to the p2m. Looks like modify_bars() is the only cause for this extended lock. I know that this was discussed earlier, but can we rework modify_bars to not iterate over all the pdevs? We can store copy of all enabled BARs in a domain structure, protected by domain->vpci_lock. Something akin to struct vpci_bar { list_head list; struct vpci *vpci; unsigned long start; unsigned long end; bool is_rom; }; >> >> @@ -213,6 +227,7 @@ static void defer_map(struct domain *d, struct >> >> pci_dev *pdev, >> >> raise_softirq(SCHEDULE_SOFTIRQ); >> >> } >> >> >> >> +/* This must hold domain's vpci_rwlock in write mode. */ >> > >> > Why it must be in write mode? >> > >> > Is this to avoid ABBA deadlocks as not taking the per-domain lock in >> > write mode would then force us to take each pdev->vpci->lock in order >> > to prevent concurrent modifications of remote devices? >> >> Yes, exactly this. This is mentioned in the commit message: >> >> 2. Writing the command register and ROM BAR register may trigger >> modify_bars to run, which in turn may access multiple pdevs while >> checking for the existing BAR's overlap. The overlapping check, if done >> under the read lock, requires vpci->lock to be acquired on both devices >> being compared, which may produce a deadlock. It is not possible to >> upgrade read lock to write lock in such a case. So, in order to prevent >> the deadlock, check which registers are going to be written and acquire >> the lock in the appropriate mode from the beginning. > > Might be good to put part of the commit message in the code comment, > just saying that the lock must be taken in write mode is not very > informative. > > /* > * The <lock-name> per domain lock must be taken in write mode in > * order to prevent changes to the vPCI data of devices assigned to > * the domain, since the function parses such data. > */ > Agree. Also, I'll add a corresponding ASSERT() -- WBR, Volodymyr
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |