Re: [Xen-devel] race condition in xen-gntdev
On 06/22/2015 02:37 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Jun 22, 2015 at 08:13:35PM +0200, Marek Marczykowski-Górecki wrote:
>> On Mon, Jun 22, 2015 at 01:46:27PM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Jun 17, 2015 at 09:42:11PM +0200, Marek Marczykowski-Górecki wrote:
>>>> On Thu, May 28, 2015 at 01:45:08AM +0200, Marek Marczykowski-Górecki wrote:
>>>>> On Thu, Apr 30, 2015 at 04:47:44PM +0200, Marek Marczykowski-Górecki wrote:
>>>>>> Hi,
>>>>>>
>>>>>> What is the proper way to handle shared pages (on either side, using
>>>>>> gntdev or gntalloc) with regard to fork and a possible exec later? The
>>>>>> child process does not need to access those pages in any way, but will
>>>>>> map different one(s), using a newly opened FD to the gntdev/gntalloc
>>>>>> device. Should it unmap them and close the FD to the device manually
>>>>>> just after the fork? Or should a process using gntdev or gntalloc
>>>>>> avoid using fork at all?
>>>>>>
>>>>>> I'm asking because I get a kernel oops[1] in the context of such a
>>>>>> process. This process uses both gntdev and gntalloc. The PID reported
>>>>>> there is a child, which maps additional pages (using a newly opened FD
>>>>>> to /dev/xen/gnt*), but I'm not sure whether the crash happens before,
>>>>>> after or at this second mapping (actually a vchan connection), or
>>>>>> maybe even at cleanup of this second mapping. The parent process keeps
>>>>>> its mappings for the whole lifetime of its child.
>>>>>>
>>>>>> I don't have a 100% reliable way to reproduce this problem, but it
>>>>>> happens quite often when I run such operations in a loop.
>>>>>
>>>>> Any ideas?
>>>>
>>>> I've done some further debugging, below is what I get.
>>>
>>> Woot! Thank you!
>>>
>>>> The kernel is vanilla 3.19.3, running on Xen 4.4.2. The kernel message:
>>>>
>>>> [74376.073464] general protection fault: 0000 [#1] SMP
>>>> [74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
>>>> [74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G O 3.19.3-4.pvops.qubes.x86_64 #1
>>>> [74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
>>>> [74376.073532] RIP: e030:[<ffffffffa00952c5>] [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
>>>>
>>>> static void unmap_if_in_range(struct grant_map *map,
>>>>                               unsigned long start, unsigned long end)
>>>> {
>>>>         unsigned long mstart, mend;
>>>>         int err;
>>>>
>>>>         if (!map->vma)
>>>>                 return;
>>>>
>>>> The above crash is at the first access to "map"...
>>>>
>>>> [74376.073543] RSP: e02b:ffff88000032fc08 EFLAGS: 00010292
>>>> [74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
>>>> [74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100
>>>> which is 0xdead000000100100 (LIST_POISON1).
>>>>
>>>> [74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
>>>> [74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
>>>> [74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
>>>> [74376.073569] FS:  00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
>>>> [74376.073574] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
>>>> [74376.073582] Stack:
>>>> [74376.073584]  ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
>>>> [74376.073590]  0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
>>>> [74376.073596]  ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
>>>> [74376.073603] Call Trace:
>>>> [74376.073610]  [<ffffffffa00953c6>] mn_invl_range_start+0x46/0x90 [xen_gntdev]
>>>> [74376.073620]  [<ffffffff811e88fb>] __mmu_notifier_invalidate_range_start+0x5b/0x90
>>>> [74376.073627]  [<ffffffff811c2a59>] do_wp_page+0x769/0x820
>>>> [74376.074031]  [<ffffffff811c4f5c>] handle_mm_fault+0x7fc/0x10c0
>>>> [74376.074031]  [<ffffffff813864cd>] ? radix_tree_lookup+0xd/0x10
>>>> [74376.074031]  [<ffffffff81061e1c>] __do_page_fault+0x1dc/0x5a0
>>>> [74376.074031]  [<ffffffff817560a6>] ? mutex_lock+0x16/0x37
>>>> [74376.074031]  [<ffffffffa0008928>] ? evtchn_ioctl+0x118/0x3c0 [xen_evtchn]
>>>> [74376.074031]  [<ffffffff812209d8>] ? do_vfs_ioctl+0x2f8/0x4f0
>>>> [74376.074031]  [<ffffffff811cafdf>] ? do_munmap+0x29f/0x3b0
>>>> [74376.074031]  [<ffffffff81062211>] do_page_fault+0x31/0x70
>>>> [74376.074031]  [<ffffffff81759e28>] page_fault+0x28/0x30
>>>>
>>>> static void mn_invl_range_start(struct mmu_notifier *mn,
>>>>                                 struct mm_struct *mm,
>>>>                                 unsigned long start, unsigned long end)
>>>> {
>>>>         struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
>>>>         struct grant_map *map;
>>>>
>>>>         spin_lock(&priv->lock);
>>>>         list_for_each_entry(map, &priv->maps, next) {
>>>>                 unmap_if_in_range(map, start, end);
>>>>         }
>>>>
>>>> mn_invl_range_start+0x46 is the first call to unmap_if_in_range, so
>>>> something is wrong with the priv->maps list. So I've searched for all
>>>> the list_del calls on this list and found one not guarded by the
>>>> spinlock:
>>>>
>>>> static int gntdev_release(struct inode *inode, struct file *flip)
>>>> {
>>>>         struct gntdev_priv *priv = flip->private_data;
>>>>         struct grant_map *map;
>>>>
>>>>         pr_debug("priv %p\n", priv);
>>>>
>>>>         while (!list_empty(&priv->maps)) {
>>>>                 map = list_entry(priv->maps.next, struct grant_map, next);
>>>>                 list_del(&map->next);
>>>>                 gntdev_put_map(NULL /* already removed */, map);
>>>>         }
>>>>         WARN_ON(!list_empty(&priv->freeable_maps));
>>>>
>>>>         if (use_ptemod)
>>>>                 mmu_notifier_unregister(&priv->mn, priv->mm);
>>>>
>>>>         kfree(priv);
>>>>         return 0;
>>>> }
>>>>
>>>> So I see this as (two processes involved, P1 = parent and P2 = child):
>>>>
>>>> 1. gntdev_release called
>>>> 2. list destroyed (above loop)
>>>> 3. page fault occurs, gntdev mmu notifier called
>>>> 4. priv->lock taken
>>>> 5. iterate over priv->maps
>>>
>>> Could we check the map->users?
>>
>> I don't have an easy way to reproduce the problem. It happens randomly,
>> once every day or two, in some random domain. I've tried to fiddle with
>> gnttab+fork+page faults, but failed to reliably reproduce the problem.
>>
>>>> 6. crashed since map is already destroyed
>>>
>>> Oh, indeed. gntdev_release calls gntdev_put_map which frees the whole
>>> thing! Ouch.
>>>
>>>> 7. mmu_notifier_unregister calls:
>>>> 8. mn_release, which tries to take priv->lock
>>>> 9. this process hangs
>>>>
>>>> So I'd guess the fix would be to move mmu_notifier_unregister before
>>>> releasing the priv->maps list. Could someone more familiar with this
>>>> code confirm this?
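
For reference, a minimal sketch (untested, not an actual patch) of what that
reordering could look like, based only on the gntdev_release quoted above:
the MMU notifier is unregistered before the priv->maps list is emptied, so
mn_invl_range_start can no longer walk the list while it is being torn down.

static int gntdev_release(struct inode *inode, struct file *flip)
{
        struct gntdev_priv *priv = flip->private_data;
        struct grant_map *map;

        pr_debug("priv %p\n", priv);

        /* Unregister the notifier first: once this returns, no further
         * mn_invl_range_start/mn_release callbacks can run for this priv,
         * so the maps list below can be emptied without racing with them. */
        if (use_ptemod)
                mmu_notifier_unregister(&priv->mn, priv->mm);

        while (!list_empty(&priv->maps)) {
                map = list_entry(priv->maps.next, struct grant_map, next);
                list_del(&map->next);
                gntdev_put_map(NULL /* already removed */, map);
        }
        WARN_ON(!list_empty(&priv->freeable_maps));

        kfree(priv);
        return 0;
}

Whether the notifier can safely be unregistered this early, while mappings
may still exist, is exactly the detail that would need confirming; the
alternative discussed below is to keep the ordering and add locking instead
(a sketch of that is at the end of this mail).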
>>>
>>> Hey Marek,
>>>
>>> Could we just add a lock around the 'gntdev_release' loop?
>>
>> I don't fully understand what is going on there, especially when
>> gntdev_release is called (in the case of a forked process, with some
>> memory still mapped, etc.), and when the mmu notifier is called. But
>> just looking at this crash, IMO adding a lock should be sufficient.
>>
>> What will happen when the mmu notifier does not find the mapping? I
>> guess nothing, because the mapping is being removed for a reason, right?
>
> That was my thought. But then why didn't we add this lock in the first
> place? Hopefully Daniel can remember the ~2yr-old patch :-)

The reason that gntdev_release didn't have a lock is because there are not
supposed to be any references to the areas pointed to by priv->maps when it
is called. However, since the MMU notifier has not yet been unregistered, it
is apparently possible to race here; the comment on mmu_notifier_unregister
seems to confirm this as a possibility (as do the backtraces). I think
adding the lock will be sufficient.
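
For illustration only, a rough sketch (untested) of one way the lock could
be added, using the priv->lock spinlock and the gntdev_release code quoted
above. Since gntdev_put_map may do more than list manipulation, this sketch
only holds the lock while the entries are unlinked from priv->maps, moving
them onto a local list (to_put, a name made up here) first; the real patch
may well look different.

static int gntdev_release(struct inode *inode, struct file *flip)
{
        struct gntdev_priv *priv = flip->private_data;
        struct grant_map *map;
        LIST_HEAD(to_put);      /* temporary list for unlinked maps */

        pr_debug("priv %p\n", priv);

        /* Hold priv->lock while emptying priv->maps, so that a concurrent
         * mn_invl_range_start (which takes the same lock) sees either the
         * full list or an empty one, never a half-destroyed entry. */
        spin_lock(&priv->lock);
        list_splice_init(&priv->maps, &to_put);
        spin_unlock(&priv->lock);

        while (!list_empty(&to_put)) {
                map = list_entry(to_put.next, struct grant_map, next);
                list_del(&map->next);
                gntdev_put_map(NULL /* already removed */, map);
        }
        WARN_ON(!list_empty(&priv->freeable_maps));

        if (use_ptemod)
                mmu_notifier_unregister(&priv->mn, priv->mm);

        kfree(priv);
        return 0;
}

With this, if the notifier fires between the splice and
mmu_notifier_unregister, it simply finds an empty priv->maps and does
nothing, which matches the "mapping is being removed anyway" expectation
above.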
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel