[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 07.02.22 18:44, Oleksandr Andrushchenko wrote: > > On 07.02.22 18:37, Jan Beulich wrote: >> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote: >>> On 07.02.22 18:15, Jan Beulich wrote: >>>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote: >>>>> On 07.02.22 17:26, Jan Beulich wrote: >>>>>> 1b. Make vpci_write use write lock for writes to command register and >>>>>> BARs >>>>>> only; keep using the read lock for all other writes. >>>>> I am not quite sure how to do that. Do you mean something like: >>>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size, >>>>> uint32_t data) >>>>> [snip] >>>>> list_for_each_entry ( r, &pdev->vpci->handlers, node ) >>>>> { >>>>> [snip] >>>>> if ( r->needs_write_lock) >>>>> write_lock(d->vpci_lock) >>>>> else >>>>> read_lock(d->vpci_lock) >>>>> .... >>>>> >>>>> And provide rw as an argument to: >>>>> >>>>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler, >>>>> vpci_write_t *write_handler, unsigned int >>>>> offset, >>>>> unsigned int size, void *data, --->>> bool >>>>> write_path <<<-----) >>>>> >>>>> Is this what you mean? >>>> This sounds overly complicated. You can derive locally in vpci_write(), >>>> from just its "reg" and "size" parameters, whether the lock needs taking >>>> in write mode. >>> Yes, I started writing a reply with that. So, the summary (ROM >>> position depends on header type): >>> if ( (reg == PCI_COMMAND) || (reg == ROM) ) >>> { >>> read PCI_COMMAND and see if memory or IO decoding are enabled. >>> if ( enabled ) >>> write_lock(d->vpci_lock) >>> else >>> read_lock(d->vpci_lock) >>> } >> Hmm, yes, you can actually get away without using "size", since both >> command register and ROM BAR are 32-bit aligned registers, and 64-bit >> accesses get split in vpci_ecam_write(). > But, OS may want reading a single byte of ROM BAR, so I think > I'll need to check if reg+size fall into PCI_COMAND and ROM BAR > ranges >> For the command register the memory- / IO-decoding-enabled check may >> end up a little more complicated, as the value to be written also >> matters. Maybe read the command register only for the ROM BAR write, >> using the write lock uniformly for all command register writes? > Sounds good for the start. > Another concern is that if we go with a read_lock and then in the > underlying code we disable memory decoding and try doing > something and calling cmd_write handler for any reason then.... > > I mean that the check in the vpci_write is somewhat we can tolerate, > but then it is must be considered that no code in the read path > is allowed to perform write path functions. Which brings a pretty > valid use-case: say in read mode we detect an unrecoverable error > and need to remove the device: > vpci_process_pending -> ERROR -> vpci_remove_device or similar. > > What do we do then? It is all going to be fragile... I have tried to summarize the options we have wrt locking and would love to hear from @Roger and @Jan. In every variant there is a task of dealing with the overlap detection in modify_bars, so this is the only place as of now which needs special treatment. Existing limitations: there is no way to upgrade a read lock to a write lock, so paths which may require write lock protection need to use write lock from the very beginning. Workarounds can be applied. 1. Per-domain rw lock, aka d->vpci_lock ============================================================== Note: with per-domain rw lock it is possible to do without introducing per-device locks, so pdev->vpci->lock can be removed and no pdev->vpci_lock should be required. This is only going to work in case if vpci_write always takes the write lock and vpci_read takes a read lock and no path in vpci_read is allowed to perform write path operations. vpci_process_pending uses write lock as it have vpci_remove_device in its error path. Pros: - no per-device vpci lock is needed? - solves overlap code ABBA in modify_bars Cons: - all writes are serialized - need to carefully select read paths, so they are guaranteed not to lead to lock upgrade use-cases 1.1. Semi read lock upgrade in modify bars -------------------------------------------------------------- In this case both vpci_read and vpci_write take a read lock and when it comes to modify_bars: 1. read_unlock(d->vpci_lock) 2. write_lock(d->vpci_lock) 3. Check that pdev->vpci is still available and is the same object: if (pdev->vpci && (pdev->vpci == old_vpci) ) { /* vpci structure is valid and can be used. */ } else { /* vpci has gone, return an error. */ } Pros: - no per-device vpci lock is needed? - solves overlap code ABBA in modify_bars - readers and writers are NOT serialized - NO need to carefully select read paths, so they are guaranteed not to lead to lock upgrade use-cases Cons: - ??? 2. per-device lock (pdev->vpci_lock) + d->overlap_chk_lock ============================================================== In order to solve overlap ABBA, we introduce a per-domain helper lock to protect the overlapping code in modify_bars: old_vpci = pdev->vpci; spin_unlock(pdev->vpci_lock); spin_lock(pdev->domain->overlap_chk_lock); spin_lock(pdev->vpci_lock); if ( pdev->vpci && (pdev->vpci == old_vpci) ) for_each_pdev ( pdev->domain, tmp ) { if ( tmp != pdev ) { spin_lock(tmp->vpci_lock); if ( tmp->vpci ) ... } } Pros: - all accesses are independent, only the same device access is serialized - no need to care about readers and writers wrt read lock upgrade issues Cons: - helper spin lock 3. Move overlap detection into process pending ============================================================== There is a Roger's patch [1] which adds a possibility for vpci_process_pending to perform different tasks rather than just map/unmap. With this patch extended in a way that it can hold a request queue it is possible to delay execution of the overlap code until no pdev->vpci_lock is held, but before returning to a guest after vpci_{read|write} or similar. Pros: - no need to emulate read lock upgrade - fully parallel read/write - queue in the vpci_process_pending will later on be used by SR-IOV, so this is going to help the future code Cons: - ??? 4. Re-write overlap detection code ============================================================== It is possible to re-write overlap detection code, so the information about the mapped/unmapped regions is not read from vpci->header->bars[i] of each device, but instead there is a per-domain structure which holds the regions and implements reference counting. Pros: - solves ABBA Cons: - very complex code is expected 5. You name it ============================================================== From all the above I would recommend we go with option 2 which seems to reliably solve ABBA and does not bring cons of the other approaches. Thank you in advance, Oleksandr [1] https://lore.kernel.org/all/5BABA6EF02000078001EC452@xxxxxxxxxxxxxxxxxxxxxxxx/T/#m231fb0586007725bfd8538bb97ff1777a36842cf
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |