path: root/drivers/infiniband/core
Age  Commit message  (Author, files changed, lines -/+)
2022-03-02  RDMA/cma: Do not change route.addr.src_addr outside state checks  (Jason Gunthorpe, 1 file, -16/+24)
commit 22e9f71072fa605cbf033158db58e0790101928d upstream. If the state is not idle then resolve_prepare_src() should immediately fail and no change to global state should happen. However, it unconditionally overwrites the src_addr trying to build a temporary any address. For instance if the state is already RDMA_CM_LISTEN then this will corrupt the src_addr and would cause the test in cma_cancel_operation(): if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev) Which would manifest as this trace from syzkaller: BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26 Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204 CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 __list_add_valid+0x93/0xa0 lib/list_debug.c:26 __list_add include/linux/list.h:67 [inline] list_add_tail include/linux/list.h:100 [inline] cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline] rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751 ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102 ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732 vfs_write+0x28e/0xa30 fs/read_write.c:603 ksys_write+0x1ee/0x250 fs/read_write.c:658 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae This is indicating that an rdma_id_private was destroyed without doing cma_cancel_listens(). Instead of trying to re-use the src_addr memory to indirectly create an any address derived from the dst build one explicitly on the stack and bind to that as any other normal flow would do. rdma_bind_addr() will copy it over the src_addr once it knows the state is valid. This is similar to commit bc0bdc5afaa7 ("RDMA/cma: Do not change route.addr.src_addr.ss_family") Link: https://lore.kernel.org/r/0-v2-e975c8fd9ef2+11e-syz_cma_srcaddr_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: 732d41c545bb ("RDMA/cma: Make the locking for automatic state transition more clear") Reported-by: syzbot+c94a3675a626f6333d74@syzkaller.appspotmail.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
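The shape of the fix, as a hedged sketch (variable names follow the description above, but this is illustrative rather than the actual cma.c diff): the temporary wildcard address is built on the stack, and only rdma_bind_addr() copies it into the id once the state check has passed.

    struct sockaddr_storage zero_sock = {};        /* stack temporary, not global state */

    zero_sock.ss_family = dst_addr->sa_family;     /* match the destination family */
    /* rdma_bind_addr() validates the CM state first and only then copies
     * zero_sock over id_priv->id.route.addr.src_addr, so a listening id
     * can no longer be corrupted by a failed resolve. */
    ret = rdma_bind_addr(&id_priv->id, (struct sockaddr *)&zero_sock);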
2022-02-08  RDMA/ucma: Protect mc during concurrent multicast leaves  (Leon Romanovsky, 1 file, -11/+23)
commit 36e8169ec973359f671f9ec7213547059cae972e upstream. Partially revert the commit mentioned in the Fixes line to make sure that allocation and erasing multicast struct are locked. BUG: KASAN: use-after-free in ucma_cleanup_multicast drivers/infiniband/core/ucma.c:491 [inline] BUG: KASAN: use-after-free in ucma_destroy_private_ctx+0x914/0xb70 drivers/infiniband/core/ucma.c:579 Read of size 8 at addr ffff88801bb74b00 by task syz-executor.1/25529 CPU: 0 PID: 25529 Comm: syz-executor.1 Not tainted 5.16.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:450 ucma_cleanup_multicast drivers/infiniband/core/ucma.c:491 [inline] ucma_destroy_private_ctx+0x914/0xb70 drivers/infiniband/core/ucma.c:579 ucma_destroy_id+0x1e6/0x280 drivers/infiniband/core/ucma.c:614 ucma_write+0x25c/0x350 drivers/infiniband/core/ucma.c:1732 vfs_write+0x28e/0xae0 fs/read_write.c:588 ksys_write+0x1ee/0x250 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Currently the xarray search can touch a concurrently freeing mc as the xa_for_each() is not surrounded by any lock. Rather than hold the lock for a full scan hold it only for the effected items, which is usually an empty list. Fixes: 95fe51096b7a ("RDMA/ucma: Remove mc_list and rely on xarray") Link: https://lore.kernel.org/r/1cda5fabb1081e8d16e39a48d3a4f8160cea88b8.1642491047.git.leonro@nvidia.com Reported-by: syzbot+e3f96c43d19782dd14a7@syzkaller.appspotmail.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08  RDMA/cma: Use correct address when leaving multicast group  (Maor Gottlieb, 1 file, -10/+12)
commit d9e410ebbed9d091b97bdf45b8a3792e2878dc48 upstream. In RoCE we should use cma_iboe_set_mgid() and not cma_set_mgid to generate the mgid, otherwise we will generate an IGMP for an incorrect address. Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join") Link: https://lore.kernel.org/r/913bc6783fd7a95fe71ad9454e01653ee6fb4a9a.1642491047.git.leonro@nvidia.com Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27  RDMA/cma: Let cma_resolve_ib_dev() continue search even after empty entry  (Avihai Horon, 1 file, -3/+9)
[ Upstream commit 20679094a0161c94faf77e373fa3f7428a8e14bd ] Currently, when cma_resolve_ib_dev() searches for a matching GID it will stop searching after encountering the first empty GID table entry. This behavior is wrong since neither IB nor RoCE spec enforce tightly packed GID tables. For example, when the matching valid GID entry exists at index N, and if a GID entry is empty at index N-1, cma_resolve_ib_dev() will fail to find the matching valid entry. Fix it by making cma_resolve_ib_dev() continue searching even after encountering missing entries. Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()") Link: https://lore.kernel.org/r/b7346307e3bb396c43d67d924348c6c496493991.1639055490.git.leonro@nvidia.com Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27  RDMA/core: Let ib_find_gid() continue search even after empty entry  (Avihai Horon, 1 file, -1/+2)
[ Upstream commit 483d805191a23191f8294bbf9b4e94836f5d92e4 ] Currently, ib_find_gid() will stop searching after encountering the first empty GID table entry. This behavior is wrong since neither IB nor RoCE spec enforce tightly packed GID tables. For example, when a valid GID entry exists at index N, and if a GID entry is empty at index N-1, ib_find_gid() will fail to find the valid entry. Fix it by making ib_find_gid() continue searching even after encountering missing entries. Fixes: 5eb620c81ce3 ("IB/core: Add helpers for uncached GID and P_Key searches") Link: https://lore.kernel.org/r/e55d331b96cecfc2cf19803d16e7109ea966882d.1639055490.git.leonro@nvidia.com Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
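A minimal sketch of the search behaviour the fix calls for (loop bound and variable names are illustrative): an empty or unreadable GID slot is skipped instead of being treated as the end of the table.

    union ib_gid tmp_gid;
    u32 i;

    for (i = 0; i < gid_tbl_len; i++) {
        if (rdma_query_gid(device, port, i, &tmp_gid))
            continue;                      /* empty/invalid slot: keep searching */
        if (!memcmp(&tmp_gid, gid, sizeof(tmp_gid)))
            return i;                      /* first valid matching entry */
    }
    return -ENOENT;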
2022-01-11  RDMA/uverbs: Check for null return of kmalloc_array  (Jiasheng Jiang, 1 file, -0/+3)
commit 7694a7de22c53a312ea98960fcafc6ec62046531 upstream. Because of the possible failure of the allocation, data might be NULL pointer and will cause the dereference of the NULL pointer later. Therefore, it might be better to check it and return -ENOMEM. Fixes: 6884c6c4bd09 ("RDMA/verbs: Store the write/write_ex uapi entry points in the uverbs_api") Link: https://lore.kernel.org/r/20211231093315.1917667-1-jiasheng@iscas.ac.cn Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
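The added check, sketched with illustrative names: kmalloc_array() returns NULL on allocation failure, so the error must be reported before the buffer is touched.

    data = kmalloc_array(num_entries, sizeof(*data), GFP_KERNEL);
    if (!data)
        return -ENOMEM;        /* bail out before the first dereference */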
2022-01-11  RDMA/core: Don't infoleak GRH fields  (Leon Romanovsky, 1 file, -1/+1)
commit b35a0f4dd544eaa6162b6d2f13a2557a121ae5fd upstream. If dst->is_global field is not set, the GRH fields are not cleared and the following infoleak is reported. ===================================================== BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 instrument_copy_to_user include/linux/instrumented.h:121 [inline] _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:209 [inline] ucma_init_qp_attr+0x8c7/0xb10 drivers/infiniband/core/ucma.c:1242 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732 vfs_write+0x8ce/0x2030 fs/read_write.c:588 ksys_write+0x28b/0x510 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __ia32_sys_write+0xdb/0x120 fs/read_write.c:652 do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline] __do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180 do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205 do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Local variable resp created at: ucma_init_qp_attr+0xa4/0xb10 drivers/infiniband/core/ucma.c:1214 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732 Bytes 40-59 of 144 are uninitialized Memory access of size 144 starts at ffff888167523b00 Data copied to user address 0000000020000100 CPU: 1 PID: 25910 Comm: syz-executor.1 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ===================================================== Fixes: 4ba66093bdc6 ("IB/core: Check for global flag when using ah_attr") Link: https://lore.kernel.org/r/0e9dd51f93410b7b2f4f5562f52befc878b71afa.1641298868.git.leonro@nvidia.com Reported-by: syzbot+6d532fa8f9463da290bc@syzkaller.appspotmail.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
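A generic sketch of the rule the fix enforces (struct and buffer names are illustrative): anything copied to user space is fully initialized, so fields that are populated only on some paths, here the GRH fields guarded by the global flag, can never carry stale kernel memory.

    memset(&resp, 0, sizeof(resp));                         /* clear everything first */
    if (rdma_ah_get_ah_flags(&qp_attr.ah_attr) & IB_AH_GRH) {
        /* only now fill the GRH-related fields of resp */
    }
    if (copy_to_user(ubuf, &resp, sizeof(resp)))
        return -EFAULT;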
2021-11-02  RDMA/sa_query: Use strscpy_pad instead of memcpy to copy a string  (Mark Zhang, 1 file, -2/+3)
commit 64733956ebba7cc629856f4a6ee35a52bc9c023f upstream. When copying the device name, the length of the data memcpy copied exceeds the length of the source buffer, which cause the KASAN issue below. Use strscpy_pad() instead. BUG: KASAN: slab-out-of-bounds in ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] Read of size 64 at addr ffff88811a10f5e0 by task rping/140263 CPU: 3 PID: 140263 Comm: rping Not tainted 5.15.0-rc1+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x57/0x7d print_address_description.constprop.0+0x1d/0xa0 kasan_report+0xcb/0x110 kasan_check_range+0x13d/0x180 memcpy+0x20/0x60 ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] ib_nl_make_request+0x1c6/0x380 [ib_core] send_mad+0x20a/0x220 [ib_core] ib_sa_path_rec_get+0x3e3/0x800 [ib_core] cma_query_ib_route+0x29b/0x390 [rdma_cm] rdma_resolve_route+0x308/0x3e0 [rdma_cm] ucma_resolve_route+0xe1/0x150 [rdma_ucm] ucma_write+0x17b/0x1f0 [rdma_ucm] vfs_write+0x142/0x4d0 ksys_write+0x133/0x160 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f26499aa90f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48 RSP: 002b:00007f26495f2dc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000000007d0 RCX: 00007f26499aa90f RDX: 0000000000000010 RSI: 00007f26495f2e00 RDI: 0000000000000003 RBP: 00005632a8315440 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26495f2e00 R13: 00005632a83154e0 R14: 00005632a8315440 R15: 00005632a830a810 Allocated by task 131419: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 proc_self_get_link+0x8b/0x100 pick_link+0x4f1/0x5c0 step_into+0x2eb/0x3d0 walk_component+0xc8/0x2c0 link_path_walk+0x3b8/0x580 path_openat+0x101/0x230 do_filp_open+0x12e/0x240 do_sys_openat2+0x115/0x280 __x64_sys_openat+0xce/0x140 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink") Link: https://lore.kernel.org/r/72ede0f6dab61f7f23df9ac7a70666e07ef314b0.1635055496.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
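The difference between the two calls, sketched with an illustrative destination field: memcpy() with a fixed 64-byte length reads past the end of a shorter source string, while strscpy_pad() is bounded by the destination, stops at the source NUL and zero-fills the rest.

    /* before: always reads 64 bytes, regardless of the real name length */
    memcpy(rec->dev_name, dev_name(&device->dev), 64);

    /* after: bounded, NUL-terminated, zero-padded */
    strscpy_pad(rec->dev_name, dev_name(&device->dev), sizeof(rec->dev_name));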
2021-10-06  RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failure  (Tao Liu, 1 file, -3/+10)
[ Upstream commit ca465e1f1f9b38fe916a36f7d80c5d25f2337c81 ] If cma_listen_on_all() fails it leaves the per-device ID still on the listen_list but the state is not set to RDMA_CM_ADDR_BOUND. When the cmid is eventually destroyed cma_cancel_listens() is not called due to the wrong state, however the per-device IDs are still holding the refcount preventing the ID from being destroyed, thus deadlocking: task:rping state:D stack: 0 pid:19605 ppid: 47036 flags:0x00000084 Call Trace: __schedule+0x29a/0x780 ? free_unref_page_commit+0x9b/0x110 schedule+0x3c/0xa0 schedule_timeout+0x215/0x2b0 ? __flush_work+0x19e/0x1e0 wait_for_completion+0x8d/0xf0 _destroy_id+0x144/0x210 [rdma_cm] ucma_close_id+0x2b/0x40 [rdma_ucm] __destroy_id+0x93/0x2c0 [rdma_ucm] ? __xa_erase+0x4a/0xa0 ucma_destroy_id+0x9a/0x120 [rdma_ucm] ucma_write+0xb8/0x130 [rdma_ucm] vfs_write+0xb4/0x250 ksys_write+0xb5/0xd0 ? syscall_trace_enter.isra.19+0x123/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Ensure that cma_listen_on_all() atomically unwinds its action under the lock during error. Fixes: c80a0c52d85c ("RDMA/cma: Add missing error handling of listen_id") Link: https://lore.kernel.org/r/20210913093344.17230-1-thomas.liu@ucloud.cn Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06  IB/cma: Do not send IGMP leaves for sendonly Multicast groups  (Christoph Lameter, 1 file, -1/+6)
[ Upstream commit 2cc74e1ee31d00393b6698ec80b322fd26523da4 ] ROCE uses IGMP for Multicast instead of the native Infiniband system where joins are required in order to post messages on the Multicast group. On Ethernet one can send Multicast messages to arbitrary addresses without the need to subscribe to a group. So ROCE correctly does not send IGMP joins during rdma_join_multicast(). F.e. in cma_iboe_join_multicast() we see: if (addr->sa_family == AF_INET) { if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) { ib.rec.hop_limit = IPV6_DEFAULT_HOPLIMIT; if (!send_only) { err = cma_igmp_send(ndev, &ib.rec.mgid, true); } } } else { So the IGMP join is suppressed as it is unnecessary. However no such check is done in destroy_mc(). And therefore leaving a sendonly multicast group will send an IGMP leave. This means that the following scenario can lead to a multicast receiver unexpectedly being unsubscribed from a MC group: 1. Sender thread does a sendonly join on MC group X. No IGMP join is sent. 2. Receiver thread does a regular join on the same MC Group x. IGMP join is sent and the receiver begins to get messages. 3. Sender thread terminates and destroys MC group X. IGMP leave is sent and the receiver no longer receives data. This patch adds the same logic for sendonly joins to destroy_mc() that is also used in cma_iboe_join_multicast(). Fixes: ab15c95a17b3 ("IB/core: Support for CMA multicast join flags") Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2109081340540.668072@gentwo.de Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
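The symmetric check added on the leave side, sketched with the same helper the quoted join-side code uses (illustrative, not the verbatim destroy_mc() diff):

    /* an IGMP leave is only sent if a join was sent in the first place */
    if (!send_only)
        cma_igmp_send(ndev, &mgid, false);   /* false == leave; mgid is the group MGID */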
2021-10-06  RDMA/cma: Do not change route.addr.src_addr.ss_family  (Jason Gunthorpe, 1 file, -2/+6)
commit bc0bdc5afaa740d782fbf936aaeebd65e5c2921d upstream. If the state is not idle then rdma_bind_addr() will immediately fail and no change to global state should happen. For instance if the state is already RDMA_CM_LISTEN then this will corrupt the src_addr and would cause the test in cma_cancel_operation(): if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev) To view a mangled src_addr, eg with a IPv6 loopback address but an IPv4 family, failing the test. This would manifest as this trace from syzkaller: BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26 Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204 CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 __list_add_valid+0x93/0xa0 lib/list_debug.c:26 __list_add include/linux/list.h:67 [inline] list_add_tail include/linux/list.h:100 [inline] cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline] rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751 ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102 ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732 vfs_write+0x28e/0xa30 fs/read_write.c:603 ksys_write+0x1ee/0x250 fs/read_write.c:658 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Which is indicating that an rdma_id_private was destroyed without doing cma_cancel_listens(). Instead of trying to re-use the src_addr memory to indirectly create an any address build one explicitly on the stack and bind to that as any other normal flow would do. Link: https://lore.kernel.org/r/0-v1-9fbb33f5e201+2a-cma_listen_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: 732d41c545bb ("RDMA/cma: Make the locking for automatic state transition more clear") Reported-by: syzbot+6bb0528b13611047209c@syzkaller.appspotmail.com Tested-by: Hao Sun <sunhao.th@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-18  RDMA/iwcm: Release resources if iw_cm module initialization fails  (Leon Romanovsky, 1 file, -7/+12)
[ Upstream commit e677b72a0647249370f2635862bf0241c86f66ad ] The failure during iw_cm module initialization partially left the system with unreleased memory and other resources. Rewrite the module init/exit routines in such way that netlink commands will be opened only after successful initialization. Fixes: b493d91d333e ("iwcm: common code for port mapper") Link: https://lore.kernel.org/r/b01239f99cb1a3e6d2b0694c242d89e6410bcd93.1627048781.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-19  RDMA/cma: Fix rdma_resolve_route() memory leak  (Gerd Rausch, 1 file, -1/+2)
[ Upstream commit 74f160ead74bfe5f2b38afb4fcf86189f9ff40c9 ] Fix a memory leak when rdma_resolve_route() is called more than once on the same "rdma_cm_id". This is possible if cma_query_handler() triggers the RDMA_CM_EVENT_ROUTE_ERROR flow which puts the state machine back and allows rdma_resolve_route() to be called again. Link: https://lore.kernel.org/r/f6662b7b-bdb7-2706-1e12-47c61d3474b6@oracle.com Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14  RDMA/core: Always release restrack object  (Leon Romanovsky, 1 file, -1/+1)
[ Upstream commit 3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc ] Change location of rdma_restrack_del() to fix the bug where task_struct was acquired but not released, causing to resource leak. ucma_create_id() { ucma_alloc_ctx(); rdma_create_user_id() { rdma_restrack_new(); rdma_restrack_set_name() { rdma_restrack_attach_task.part.0(); <--- task_struct was gotten } } ucma_destroy_private_ctx() { ucma_put_ctx(); rdma_destroy_id() { _destroy_id() <--- id_priv was freed } } } Fixes: 889d916b6f8a ("RDMA/core: Don't access cm_id after its destruction") Link: https://lore.kernel.org/r/073ec27acb943ca8b6961663c47c5abe78a5c8cc.1624948948.git.leonro@nvidia.com Reported-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14  RDMA/cma: Fix incorrect Packet Lifetime calculation  (Håkon Bugge, 1 file, -2/+4)
[ Upstream commit e84045eab69c625bc0b0bf24d8e05bc65da1eed1 ] An approximation for the PacketLifeTime is half the local ACK timeout. The encoding for both timers are logarithmic. If the local ACK timeout is set, but zero, it means the timer is disabled. In this case, we choose the CMA_IBOE_PACKET_LIFETIME value, since 50% of infinite makes no sense. Before this commit, the PacketLifeTime became 255 if local ACK timeout was zero (not running). Fixed by explicitly testing for timeout being zero. Fixes: e1ee1e62bec4 ("RDMA/cma: Use ACK timeout for RoCE packetLifeTime") Link: https://lore.kernel.org/r/1624371207-26710-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
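The arithmetic described above, as a sketch rather than the literal cma.c diff: both values are exponents of a logarithmic timer, so "half the ACK timeout" is expressed by subtracting one from the exponent, and a zero timeout means "disabled" rather than a value to halve.

    u8 lifetime;

    if (ack_timeout)                          /* non-zero: the timer is running */
        lifetime = ack_timeout - 1;           /* log2 scale: -1 halves the time */
    else
        lifetime = CMA_IBOE_PACKET_LIFETIME;  /* disabled: default, not 0 - 1 = 255 */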
2021-07-14  RDMA/cma: Protect RMW with qp_mutex  (Håkon Bugge, 1 file, -1/+17)
[ Upstream commit ca0c448d2b9f43e3175835d536853854ef544e22 ] The struct rdma_id_private contains three bit-fields, tos_set, timeout_set, and min_rnr_timer_set. These are set by accessor functions without any synchronization. If two or all accessor functions are invoked in close proximity in time, there will be Read-Modify-Write from several contexts to the same variable, and the result will be intermittent. Fixed by protecting the bit-fields by the qp_mutex in the accessor functions. The consumer of timeout_set and min_rnr_timer_set is in rdma_init_qp_attr(), which is called with qp_mutex held for connected QPs. Explicit locking is added for the consumers of tos and tos_set. This commit depends on ("RDMA/cma: Remove unnecessary INIT->INIT transition"), since the call to rdma_init_qp_attr() from cma_init_conn_qp() does not hold the qp_mutex. Fixes: 2c1619edef61 ("IB/cma: Define option to set ack timeout and pack tos_set") Fixes: 3aeffc46afde ("IB/cma: Introduce rdma_set_min_rnr_timer()") Link: https://lore.kernel.org/r/1624369197-24578-3-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
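What "protecting the bit-fields by the qp_mutex in the accessor functions" looks like for one accessor; the body is a sketch, not the verbatim diff.

    void rdma_set_service_type(struct rdma_cm_id *id, int tos)
    {
        struct rdma_id_private *id_priv =
            container_of(id, struct rdma_id_private, id);

        mutex_lock(&id_priv->qp_mutex);   /* serializes RMW on the shared bit-field word */
        id_priv->tos = (u8)tos;
        id_priv->tos_set = true;
        mutex_unlock(&id_priv->qp_mutex);
    }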
2021-07-14  RDMA/core: Sanitize WQ state received from the userspace  (Leon Romanovsky, 1 file, -2/+19)
[ Upstream commit f97442887275d11c88c2899e720fe945c1f61488 ] The mlx4 and mlx5 implemented differently the WQ input checks. Instead of duplicating mlx4 logic in the mlx5, let's prepare the input in the central place. The mlx5 implementation didn't check for validity of state input. It is not real bug because our FW checked that, but still worth to fix. Fixes: f213c0527210 ("IB/uverbs: Add WQ support") Link: https://lore.kernel.org/r/ac41ad6a81b095b1a8ad453dcf62cf8d3c5da779.1621413310.git.leonro@nvidia.com Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26  RDMA/uverbs: Fix a NULL vs IS_ERR() bug  (Dan Carpenter, 1 file, -2/+2)
[ Upstream commit 463a3f66473b58d71428a1c3ce69ea52c05440e5 ] The uapi_get_object() function returns error pointers, it never returns NULL. Fixes: 149d3845f4a5 ("RDMA/uverbs: Add a method to introspect handles in a context") Link: https://lore.kernel.org/r/YJ6Got+U7lz+3n9a@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
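The corrected check, sketched (the surrounding call site is illustrative): uapi_get_object() reports failure with ERR_PTR(), never NULL, so the test must use IS_ERR().

    obj = uapi_get_object(uapi, obj_id);
    if (IS_ERR(obj))
        return PTR_ERR(obj);    /* a NULL check here would never fire */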
2021-05-26  RDMA/core: Don't access cm_id after its destruction  (Shay Drory, 1 file, -2/+3)
[ Upstream commit 889d916b6f8a48b8c9489fffcad3b78eedd01a51 ] restrack should only be attached to a cm_id while the ID has a valid device pointer. It is set up when the device is first loaded, but not cleared when the device is removed. There is also two copies of the device pointer, one private and one in the public API, and these were left out of sync. Make everything go to NULL together and manipulate restrack right around the device assignments. Found by syzcaller: BUG: KASAN: wild-memory-access in __list_del include/linux/list.h:112 [inline] BUG: KASAN: wild-memory-access in __list_del_entry include/linux/list.h:135 [inline] BUG: KASAN: wild-memory-access in list_del include/linux/list.h:146 [inline] BUG: KASAN: wild-memory-access in cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline] BUG: KASAN: wild-memory-access in cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline] BUG: KASAN: wild-memory-access in cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783 Write of size 8 at addr dead000000000108 by task syz-executor716/334 CPU: 0 PID: 334 Comm: syz-executor716 Not tainted 5.11.0+ #271 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0xbe/0xf9 lib/dump_stack.c:120 __kasan_report mm/kasan/report.c:400 [inline] kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413 __list_del include/linux/list.h:112 [inline] __list_del_entry include/linux/list.h:135 [inline] list_del include/linux/list.h:146 [inline] cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline] cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline] cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783 _destroy_id+0x29/0x460 drivers/infiniband/core/cma.c:1862 ucma_close_id+0x36/0x50 drivers/infiniband/core/ucma.c:185 ucma_destroy_private_ctx+0x58d/0x5b0 drivers/infiniband/core/ucma.c:576 ucma_close+0x91/0xd0 drivers/infiniband/core/ucma.c:1797 __fput+0x169/0x540 fs/file_table.c:280 task_work_run+0xb7/0x100 kernel/task_work.c:140 exit_task_work include/linux/task_work.h:30 [inline] do_exit+0x7da/0x17f0 kernel/exit.c:825 do_group_exit+0x9e/0x190 kernel/exit.c:922 __do_sys_exit_group kernel/exit.c:933 [inline] __se_sys_exit_group kernel/exit.c:931 [inline] __x64_sys_exit_group+0x2d/0x30 kernel/exit.c:931 do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 255d0c14b375 ("RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count") Link: https://lore.kernel.org/r/3352ee288fe34f2b44220457a29bfc0548686363.1620711734.git.leonro@nvidia.com Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26  RDMA/core: Prevent divide-by-zero error triggered by the user  (Leon Romanovsky, 1 file, -0/+3)
[ Upstream commit 54d87913f147a983589923c7f651f97de9af5be1 ] The user_entry_size is supplied by the user and later used as a denominator to calculate number of entries. The zero supplied by the user will trigger the following divide-by-zero error: divide error: 0000 [#1] SMP KASAN PTI CPU: 4 PID: 497 Comm: c_repro Not tainted 5.13.0-rc1+ #281 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:ib_uverbs_handler_UVERBS_METHOD_QUERY_GID_TABLE+0x1b1/0x510 Code: 87 59 03 00 00 e8 9f ab 1e ff 48 8d bd a8 00 00 00 e8 d3 70 41 ff 44 0f b7 b5 a8 00 00 00 e8 86 ab 1e ff 31 d2 4c 89 f0 31 ff <49> f7 f5 48 89 d6 48 89 54 24 10 48 89 04 24 e8 1b ad 1e ff 48 8b RSP: 0018:ffff88810416f828 EFLAGS: 00010246 RAX: 0000000000000008 RBX: 1ffff1102082df09 RCX: ffffffff82183f3d RDX: 0000000000000000 RSI: ffff888105f2da00 RDI: 0000000000000000 RBP: ffff88810416fa98 R08: 0000000000000001 R09: ffffed102082df5f R10: ffff88810416faf7 R11: ffffed102082df5e R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000008 R15: ffff88810416faf0 FS: 00007f5715efa740(0000) GS:ffff88811a700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000840 CR3: 000000010c2e0001 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? ib_uverbs_handler_UVERBS_METHOD_INFO_HANDLES+0x4b0/0x4b0 ib_uverbs_cmd_verbs+0x1546/0x1940 ib_uverbs_ioctl+0x186/0x240 __x64_sys_ioctl+0x38a/0x1220 do_syscall_64+0x3f/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 9f85cbe50aa0 ("RDMA/uverbs: Expose the new GID query API to user space") Link: https://lore.kernel.org/r/b971cc70a8b240a8b5eda33c99fa0558a0071be2.1620657876.git.leonro@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
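The guard described above, sketched with the variable named in the report plus an illustrative caller-side division:

    if (!user_entry_size)
        return -EINVAL;                         /* never divide by a user-supplied zero */

    num_entries = uattr_len / user_entry_size;  /* safe now */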
2021-05-14  RDMA/core: Add CM to restrack after successful attachment to a device  (Shay Drory, 1 file, -2/+10)
[ Upstream commit cb5cd0ea4eb3ce338a593a5331ddb4986ae20faa ] The device attach triggers addition of CM_ID to the restrack DB. However, when error occurs, we releasing this device, but defer CM_ID release. This causes to the situation where restrack sees CM_ID that is not valid anymore. As a solution, add the CM_ID to the resource tracking DB only after the attachment is finished. Found by syzcaller: infiniband syz0: added syz_tun rdma_rxe: ignoring netdev event = 10 for syz_tun infiniband syz0: set down infiniband syz0: ib_query_port failed (-19) restrack: ------------[ cut here ]------------ infiniband syz0: BUG: RESTRACK detected leak of resources restrack: User CM_ID object allocated by syz-executor716 is not freed restrack: ------------[ cut here ]------------ Fixes: b09c4d701220 ("RDMA/restrack: Improve readability in task name management") Link: https://lore.kernel.org/r/ab93e56ba831eac65c322b3256796fa1589ec0bb.1618753862.git.leonro@nvidia.com Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14  RDMA/core: Fix corrupted SL on passive side  (Håkon Bugge, 1 file, -1/+2)
[ Upstream commit 194f64a3cad3ab9e381e996a13089de3215d1887 ] On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary Subnet Local is zero. In cm_req_handler(), the cm_process_routed_req() function is called. Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed: IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl); This corrupts SL in req_msg if it was different from zero. In other words, a request to setup a connection using an SL != zero, will not be honored, and a connection using SL zero will be created instead. Fixed by not calling cm_process_routed_req() on RoCE systems, the cm_process_route_req() is only for IB anyhow. Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths") Link: https://lore.kernel.org/r/1616420132-31005-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
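A sketch of the skip described above (the condition and call site are illustrative, not the verbatim cm.c diff): the routed-path handling is bypassed whenever the address handle is RoCE, so wc->sl can no longer overwrite the SL carried in the REQ.

    if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
        cm_process_routed_req(req_msg, work->mad_recv_wc->wc);   /* IB-only path */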
2021-04-14  RDMA/addr: Be strict with gid size  (Leon Romanovsky, 1 file, -1/+3)
[ Upstream commit d1c803a9ccd7bd3aff5e989ccfb39ed3b799b975 ] The nla_len() is less than or equal to 16. If it's less than 16 then end of the "gid" buffer is uninitialized. Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload") Link: https://lore.kernel.org/r/20210405074434.264221-1-leon@kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17  RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size  (Christoph Hellwig, 1 file, -4/+4)
commit b116c702791a9834e6485f67ca6267d9fdf59b87 upstream. RDMA ULPs must not call DMA mapping APIs directly but instead use the ib_dma_* wrappers. Fixes: 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from pages") Link: https://lore.kernel.org/r/20201106181941.1878556-3-hch@lst.de Reported-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Cc: "Marciniszyn, Mike" <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-09  RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep  (Saeed Mahameed, 1 file, -2/+3)
[ Upstream commit 221384df6123747d2a75517dd06cc01752f81518 ] ib_send_cm_sidr_rep() { spin_lock_irqsave() cm_send_sidr_rep_locked() { ... spin_lock_irq() .... spin_unlock_irq() <--- this will enable interrupts } spin_unlock_irqrestore() } spin_unlock_irqrestore() expects interrupts to be disabled but the internal spin_unlock_irq() will always enable hard interrupts. Fix this by replacing the internal spin_{lock,unlock}_irq() with irqsave/restore variants. It fixes the following kernel trace: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 2 PID: 20001 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20 Call Trace: _raw_spin_unlock_irqrestore+0x4e/0x50 ib_send_cm_sidr_rep+0x3a/0x50 [ib_cm] cma_send_sidr_rep+0xa1/0x160 [rdma_cm] rdma_accept+0x25e/0x350 [rdma_cm] ucma_accept+0x132/0x1cc [rdma_ucm] ucma_write+0xbf/0x140 [rdma_ucm] vfs_write+0xc1/0x340 ksys_write+0xb3/0xe0 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 87c4c774cbef ("RDMA/cm: Protect access to remote_sidr_table") Link: https://lore.kernel.org/r/20210301081844.445823-1-leon@kernel.org Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
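The shape of the problem and the fix, sketched with illustrative lock names: a lock taken inside an irqsave section must itself use the irqsave/irqrestore variants, because spin_unlock_irq() unconditionally re-enables interrupts.

    spin_lock_irqsave(&outer_lock, flags);        /* ib_send_cm_sidr_rep()           */

    spin_lock_irqsave(&inner_lock, flags2);       /* was spin_lock_irq()             */
    /* ... update the SIDR table ... */
    spin_unlock_irqrestore(&inner_lock, flags2);  /* was spin_unlock_irq()           */

    spin_unlock_irqrestore(&outer_lock, flags);   /* IRQ state matches caller's view */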
2021-03-04  RDMA/ucma: Fix use-after-free bug in ucma_create_uevent  (Avihai Horon, 1 file, -29/+41)
[ Upstream commit fe454dc31e84f8c14cb8942fcb61666c9f40745b ] ucma_process_join() allocates struct ucma_multicast mc and frees it if an error occurs during its run. Specifically, if an error occurs in copy_to_user(), a use-after-free might happen in the following scenario: 1. mc struct is allocated. 2. rdma_join_multicast() is called and succeeds. During its run, cma_iboe_join_multicast() enqueues a work that will later use the aforementioned mc struct. 3. copy_to_user() is called and fails. 4. mc struct is deallocated. 5. The work that was enqueued by cma_iboe_join_multicast() is run and calls ucma_create_uevent() which tries to access mc struct (which is freed by now). Fix this bug by cancelling the work enqueued by cma_iboe_join_multicast(). Since cma_work_handler() frees struct cma_work, we don't use it in cma_iboe_join_multicast() so we can safely cancel the work later. The following syzkaller report revealed it: BUG: KASAN: use-after-free in ucma_create_uevent+0x2dd/0x3f0 drivers/infiniband/core/ucma.c:272 Read of size 8 at addr ffff88810b3ad110 by task kworker/u8:1/108 CPU: 1 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: rdma_cm cma_work_handler Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xbe/0xf9 lib/dump_stack.c:118 print_address_description.constprop.0+0x3e/0x60 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562 ucma_create_uevent+0x2dd/0x3f0 drivers/infiniband/core/ucma.c:272 ucma_event_handler+0xb7/0x3c0 drivers/infiniband/core/ucma.c:349 cma_cm_event_handler+0x5d/0x1c0 drivers/infiniband/core/cma.c:1977 cma_work_handler+0xfa/0x190 drivers/infiniband/core/cma.c:2718 process_one_work+0x54c/0x930 kernel/workqueue.c:2272 worker_thread+0x82/0x830 kernel/workqueue.c:2418 kthread+0x1ca/0x220 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Allocated by task 359: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:461 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:434 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:664 [inline] ucma_process_join+0x16e/0x3f0 drivers/infiniband/core/ucma.c:1453 ucma_join_multicast+0xda/0x140 drivers/infiniband/core/ucma.c:1538 ucma_write+0x1f7/0x280 drivers/infiniband/core/ucma.c:1724 vfs_write fs/read_write.c:603 [inline] vfs_write+0x191/0x4c0 fs/read_write.c:585 ksys_write+0x1a1/0x1e0 fs/read_write.c:658 do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 359: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0x112/0x160 mm/kasan/common.c:422 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0xb3/0x3e0 mm/slub.c:4124 ucma_process_join+0x22d/0x3f0 drivers/infiniband/core/ucma.c:1497 ucma_join_multicast+0xda/0x140 drivers/infiniband/core/ucma.c:1538 ucma_write+0x1f7/0x280 drivers/infiniband/core/ucma.c:1724 vfs_write fs/read_write.c:603 [inline] vfs_write+0x191/0x4c0 fs/read_write.c:585 ksys_write+0x1a1/0x1e0 fs/read_write.c:658 do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88810b3ad100 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 16 bytes inside of 192-byte region [ffff88810b3ad100, ffff88810b3ad1c0) Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join") Link: https://lore.kernel.org/r/20210211090517.1278415-1-leon@kernel.org Reported-by: Amit Matityahu <mitm@nvidia.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
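The cancellation step described above, sketched with a hypothetical work-field name (the real layout in ucma.c/cma.c differs): if copying the response to user space fails, the queued join work is cancelled before mc is freed, so the handler can never run against freed memory.

    if (copy_to_user(u_resp, &resp, sizeof(resp))) {
        cancel_work_sync(&mc->join_work);   /* hypothetical field; waits for the handler */
        ret = -EFAULT;
        goto err_free_mc;                   /* only now is kfree(mc) safe */
    }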
2021-03-04  IB/cm: Avoid a loop when device has 255 ports  (Parav Pandit, 1 file, -4/+4)
[ Upstream commit 131be26750379592f0dd6244b2a90bbb504a10bb ] When RDMA device has 255 ports, loop iterator i overflows. Due to which cm_add_one() port iterator loops infinitely. Use core provided port iterator to avoid the infinite loop. Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/20210127150010.1876121-9-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
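The overflow is easy to see: with 255 ports a u8 counter satisfies "i <= 255" forever, because 255 + 1 wraps to 0. A sketch of the replacement using the core-provided iterator (the loop body is a placeholder):

    u32 p;

    rdma_for_each_port(ib_device, p) {
        /* per-port cm_add_one() setup for port p */
    }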
2021-03-04  IB/umad: Return EPOLLERR in case of when device disassociated  (Shay Drory, 1 file, -0/+10)
[ Upstream commit def4cd43f522253645b72c97181399c241b54536 ] Currently, polling a umad device will always works, even if the device was disassociated. A disassociated device should immediately return EPOLLERR from poll(). Otherwise userspace is endlessly hung on poll() with no idea that the device has been removed from the system. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/20210125121339.837518-3-leon@kernel.org Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04  IB/umad: Return EIO in case of when device disassociated  (Shay Drory, 1 file, -1/+6)
[ Upstream commit 4fc5461823c9cad547a9bdfbf17d13f0da0d6bb5 ] MAD message received by the user has EINVAL error in all flows including when the device is disassociated. That makes it impossible for the applications to treat such flow differently. Change it to return EIO, so the applications will be able to perform disassociation recovery. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/20210125121339.837518-2-leon@kernel.org Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-27  RDMA/cma: Fix error flow in default_roce_mode_store  (Neta Ostrovsky, 1 file, -1/+3)
[ Upstream commit 7c7b3e5d9aeed31d35c5dab0bf9c0fd4c8923206 ] In default_roce_mode_store(), we took a reference to cma_dev, but didn't return it with cma_dev_put in the error flow. Fixes: 1c15b4f2a42f ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type") Link: https://lore.kernel.org/r/20210113130214.562108-1-leon@kernel.org Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-27  RDMA/umem: Avoid undefined behavior of rounddown_pow_of_two()  (Aharon Landau, 1 file, -1/+1)
[ Upstream commit b79f2dc5ffe17b03ec8c55f0d63f65e87bcac676 ] rounddown_pow_of_two() is undefined when the input is 0. Therefore we need to avoid it in ib_umem_find_best_pgsz and return 0. Otherwise, it could result in not rejecting an invalid page size which eventually causes a kernel oops due to the logical inconsistency. Fixes: 3361c29e9279 ("RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()") Link: https://lore.kernel.org/r/20210113121703.559778-2-leon@kernel.org Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
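A hedged sketch of the guard (names are illustrative, not the exact umem.c diff): the helper is only reached with a non-zero value, and zero is reported back as "no valid page size".

    if (!pgsz_bitmap)
        return 0;                               /* caller rejects the registration */
    pgsz = rounddown_pow_of_two(pgsz_bitmap);   /* well-defined: input is non-zero */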
2021-01-27  RDMA/ucma: Do not miss ctx destruction steps in some cases  (Jason Gunthorpe, 1 file, -63/+72)
[ Upstream commit 8ae291cc95e49011b736b641b0cfad502b7a1526 ] The destruction flow is very complicated here because the cm_id can be destroyed from the event handler at any time if the device is hot-removed. This leaves behind a partial ctx with no cm_id in the xarray, and will let user space leak memory. Make everything consistent in this flow in all places: - Return the xarray back to XA_ZERO_ENTRY before beginning any destruction. The thread that reaches this first is responsible to kfree, everyone else does nothing. - Test the xarray during the special hot-removal case to block the queue_work, this has much simpler locking and doesn't require a 'destroying' - Fix the ref initialization so that it is only positive if cm_id != NULL, then rely on that to guide the destruction process in all cases. Now the new ucma_destroy_private_ctx() can be called in all places that want to free the ctx, including all the error unwinds, and none of the details are missed. Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery") Link: https://lore.kernel.org/r/20210105111327.230270-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19  RDMA/restrack: Don't treat as an error allocation ID wrapping  (Leon Romanovsky, 1 file, -0/+1)
commit 3c638cdb8ecc0442552156e0fed8708dd2c7f35b upstream. xa_alloc_cyclic() call returns positive number if ID allocation succeeded but wrapped. It is not an error, so normalize the "ret" variable to zero as marker of not-an-error. drivers/infiniband/core/restrack.c:261 rdma_restrack_add() warn: 'ret' can be either negative or positive Fixes: fd47c2f99f04 ("RDMA/restrack: Convert internal DB from hash to XArray") Link: https://lore.kernel.org/r/20201216100753.1127638-1-leon@kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
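The normalization described above, sketched (xarray and field names follow the restrack description but are illustrative):

    ret = xa_alloc_cyclic(&rt->xa, &res->id, res, xa_limit_32b,
                          &rt->next_id, GFP_KERNEL);
    if (ret < 0)
        return ret;    /* real error */
    ret = 0;           /* ret == 1 only means the cyclic ID wrapped */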
2021-01-09  RDMA/core: remove use of dma_virt_ops  (Christoph Hellwig, 2 files, -21/+27)
[ Upstream commit 5a7a9e038b032137ae9c45d5429f18a2ffdf7d42 ] Use the ib_dma_* helpers to skip the DMA translation instead. This removes the last user if dma_virt_ops and keeps the weird layering violation inside the RDMA core instead of burderning the DMA mapping subsystems with it. This also means the software RDMA drivers now don't have to mess with DMA parameters that are not relevant to them at all, and that in the future we can use PCI P2P transfers even for software RDMA, as there is no first fake layer of DMA mapping that the P2P DMA support. Link: https://lore.kernel.org/r/20201106181941.1878556-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mike Marciniszyn <mike.marcini