[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()
17.12.2020 13:58, Kevin Wolf wrote: Am 17.12.2020 um 10:37 hat Sergio Lopez geschrieben:On Wed, Dec 16, 2020 at 07:31:02PM +0100, Kevin Wolf wrote:Am 16.12.2020 um 15:55 hat Sergio Lopez geschrieben:On Wed, Dec 16, 2020 at 01:35:14PM +0100, Kevin Wolf wrote:Am 15.12.2020 um 18:23 hat Sergio Lopez geschrieben:On Tue, Dec 15, 2020 at 04:01:19PM +0100, Kevin Wolf wrote:Am 15.12.2020 um 14:15 hat Sergio Lopez geschrieben:On Tue, Dec 15, 2020 at 01:12:33PM +0100, Kevin Wolf wrote:Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben:While processing the parents of a BDS, one of the parents may process the child that's doing the tail recursion, which leads to a BDS being processed twice. This is especially problematic for the aio_notifiers, as they might attempt to work on both the old and the new AIO contexts. To avoid this, add the BDS pointer to the ignore list, and check the child BDS pointer while iterating over the children. Signed-off-by: Sergio Lopez <slp@xxxxxxxxxx>Ugh, so we get a mixed list of BdrvChild and BlockDriverState? :-/I know, it's effective but quite ugly...What is the specific scenario where you saw this breaking? Did you have multiple BdrvChild connections between two nodes so that we would go to the parent node through one and then come back to the child node through the other?I don't think this is a corner case. If the graph is walked top->down, there's no problem since children are added to the ignore list before getting processed, and siblings don't process each other. But, if the graph is walked bottom->up, a BDS will start processing its parents without adding itself to the ignore list, so there's nothing preventing them from processing it again.I don't understand. child is added to ignore before calling the parent callback on it, so how can we come back through the same BdrvChild? QLIST_FOREACH(child, &bs->parents, next_parent) { if (g_slist_find(*ignore, child)) { continue; } assert(child->klass->set_aio_ctx); *ignore = g_slist_prepend(*ignore, child); child->klass->set_aio_ctx(child, new_context, ignore); }Perhaps I'm missing something, but the way I understand it, that loop is adding the BdrvChild pointer of each of its parents, but not the BdrvChild pointer of the BDS that was passed as an argument to b_s_a_c_i.Generally, the caller has already done that. In the theoretical case that it was the outermost call in the recursion and it hasn't (I couldn't find any such case), I think we should still call the callback for the passed BdrvChild like we currently do.You didn't dump the BdrvChild here. I think that would add some information on why we re-entered 0x555ee2fbf660. Maybe you can also add bs->drv->format_name for each node to make the scenario less abstract?I've generated another trace with more data: bs=0x565505e48030 (backup-top) enter bs=0x565505e48030 (backup-top) processing children bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e42090 (child->bs=0x565505e5d420) bs=0x565505e5d420 (qcow2) enter bs=0x565505e5d420 (qcow2) processing children bs=0x565505e5d420 (qcow2) calling bsaci child=0x565505e41ea0 (child->bs=0x565505e52060) bs=0x565505e52060 (file) enter bs=0x565505e52060 (file) processing children bs=0x565505e52060 (file) processing parents bs=0x565505e52060 (file) processing itself bs=0x565505e5d420 (qcow2) processing parents bs=0x565505e5d420 (qcow2) calling set_aio_ctx child=0x5655066a34d0 bs=0x565505fbf660 (qcow2) enter bs=0x565505fbf660 (qcow2) processing children bs=0x565505fbf660 (qcow2) calling bsaci child=0x565505e41d20 (child->bs=0x565506bc0c00) bs=0x565506bc0c00 (file) enter bs=0x565506bc0c00 (file) processing children bs=0x565506bc0c00 (file) processing parents bs=0x565506bc0c00 (file) processing itself bs=0x565505fbf660 (qcow2) processing parents bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x565505fc7aa0 bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x5655068b8510 bs=0x565505e48030 (backup-top) enter bs=0x565505e48030 (backup-top) processing children bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e3c450 (child->bs=0x565505fbf660) bs=0x565505fbf660 (qcow2) enter bs=0x565505fbf660 (qcow2) processing children bs=0x565505fbf660 (qcow2) processing parents bs=0x565505fbf660 (qcow2) processing itself bs=0x565505e48030 (backup-top) processing parents bs=0x565505e48030 (backup-top) calling set_aio_ctx child=0x565505e402d0 bs=0x565505e48030 (backup-top) processing itself bs=0x565505fbf660 (qcow2) processing itselfHm, is this complete? Is see no "processing itself" for bs=0x565505e5d420. Or is this because it crashed before getting there?Yes, it crashes there. I forgot to mention that, sorry.Anyway, trying to reconstruct the block graph with BdrvChild pointers annotated at the edges: BlockBackend | v backup-top ------------------------+ | | | | +-----------------------+ | | 0x5655068b8510 | | 0x565505e3c450 | | | | 0x565505e42090 | | v | | qcow2 ---------------------+ | | | | | | | 0x565505e52060 | | | ??? [1] | | | | | v 0x5655066a34d0 | | | | 0x565505fc7aa0 file v v v v qcow2 (backing) | | 0x565505e41d20 v file [1] This seems to be a BdrvChild with a non-BDS parent. Probably a BdrvChild directly owned by the backup job.So it seems this is happening: backup-top (5e48030) <---------| (5) | | | | | (6) ------------> qcow2 (5fbf660) | ^ | | (3) | | (4) |-> (1) qcow2 (5e5d420) ----- |-> file (6bc0c00) | |-> (2) file (5e52060) backup-top (5e48030), the BDS that was passed as argument in the first bdrv_set_aio_context_ignore() call, is re-entered when qcow2 (5fbf660) is processing its parents, and the latter is also re-entered when the first one starts processing its children again.Yes, but look at the BdrvChild pointers, it is through different edges that we come back to the same node. No BdrvChild is used twice. If backup-top had added all of its children to the ignore list before calling into the overlay qcow2, the backing qcow2 wouldn't eventually have called back into backup-top.I've tested a patch that first adds every child to the ignore list, and then processes those that weren't there before, as you suggested on a previous email. With that, the offending qcow2 is not re-entered, so we avoid the crash, but backup-top is still entered twice:I think we also need to every parent to the ignore list before calling callbacks, though it doesn't look like this is the problem you're currently seeing.I agree.bs=0x560db0e3b030 (backup-top) enter bs=0x560db0e3b030 (backup-top) processing children bs=0x560db0e3b030 (backup-top) calling bsaci child=0x560db0e2f450 (child->bs=0x560db0fb2660) bs=0x560db0fb2660 (qcow2) enter bs=0x560db0fb2660 (qcow2) processing children bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db0e34d20 (child->bs=0x560db1bb3c00) bs=0x560db1bb3c00 (file) enter bs=0x560db1bb3c00 (file) processing children bs=0x560db1bb3c00 (file) processing parents bs=0x560db1bb3c00 (file) processing itself bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db16964d0 (child->bs=0x560db0e50420) bs=0x560db0e50420 (qcow2) enter bs=0x560db0e50420 (qcow2) processing children bs=0x560db0e50420 (qcow2) calling bsaci child=0x560db0e34ea0 (child->bs=0x560db0e45060) bs=0x560db0e45060 (file) enter bs=0x560db0e45060 (file) processing children bs=0x560db0e45060 (file) processing parents bs=0x560db0e45060 (file) processing itself bs=0x560db0e50420 (qcow2) processing parents bs=0x560db0e50420 (qcow2) processing itself bs=0x560db0fb2660 (qcow2) processing parents bs=0x560db0fb2660 (qcow2) calling set_aio_ctx child=0x560db1672860 bs=0x560db0fb2660 (qcow2) calling set_aio_ctx child=0x560db1b14a20 bs=0x560db0e3b030 (backup-top) enter bs=0x560db0e3b030 (backup-top) processing children bs=0x560db0e3b030 (backup-top) processing parents bs=0x560db0e3b030 (backup-top) calling set_aio_ctx child=0x560db0e332d0 bs=0x560db0e3b030 (backup-top) processing itself bs=0x560db0fb2660 (qcow2) processing itself bs=0x560db0e3b030 (backup-top) calling bsaci child=0x560db0e35090 (child->bs=0x560db0e50420) bs=0x560db0e50420 (qcow2) enter bs=0x560db0e3b030 (backup-top) processing parents bs=0x560db0e3b030 (backup-top) processing itself I see that "blk_do_set_aio_context()" passes "blk->root" to "bdrv_child_try_set_aio_context()" so it's already in the ignore list, so I'm not sure what's happening here. Is backup-top is referenced from two different BdrvChild or is "blk->root" not pointing to backup-top's BDS?The second time that backup-top is entered, it is not as the BDS of blk->root, but as the parent node of the overlay qcow2. Which is interesting, because last time it was still the backing qcow2, so the change did have _some_ effect. The part that I don't understand is why you still get the line with child=0x560db1b14a20, because when you add all children to the ignore list first, that should have been put into the ignore list as one of the first things in the whole process (when backup-top was first entered). Is 0x560db1b14a20 a BdrvChild that has backup-top as its opaque value, but isn't actually present in backup-top's bs->children?Exactly, that line corresponds to this chunk of code: <---- begin ----> QLIST_FOREACH(child, &bs->parents, next_parent) { if (g_slist_find(*ignore, child)) { continue; } assert(child->klass->set_aio_ctx); *ignore = g_slist_prepend(*ignore, child); fprintf(stderr, "bs=%p (%s) calling set_aio_ctx child=%p\n", bs, bs->drv->format_name, child); child->klass->set_aio_ctx(child, new_context, ignore); } <---- end ----> Do you think it's safe to re-enter backup-top, or should we look for a way to avoid this?I think it should be avoided, but I don't understand why putting all children of backup-top into the ignore list doesn't already avoid it. If backup-top is in the parents list of qcow2, then qcow2 should be in the children list of backup-top and therefore the BdrvChild should already be in the ignore list. The only way I can explain this is that backup-top and qcow2 have different ideas about which BdrvChild objects exist that connect them. Or that the graph changes between both places, but I don't see how that could happen in bdrv_set_aio_context_ignore(). bdrv_set_aio_context_ignore() do bdrv_drained_begin().. As I reported recently, nothing prevents some job finish and do graph modification during some another drained section. It may be the case. If backup-top involved, I can suppose that graph modification is in backup_clean, when we remove the filter.. Who is making set_aio_context in the issue? I mean, what is the backtrace of bdrv_set_aio_context_ignore()? -- Best regards, Vladimir
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |