Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 22e9f71072fa605cbf033158db58e0790101928d upstream.
If the state is not idle then resolve_prepare_src() should immediately
fail and no change to global state should happen. However, it
unconditionally overwrites the src_addr trying to build a temporary any
address.
For instance if the state is already RDMA_CM_LISTEN then this will corrupt
the src_addr and would cause the test in cma_cancel_operation():
if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev)
Which would manifest as this trace from syzkaller:
BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26
Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204
CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
__list_add_valid+0x93/0xa0 lib/list_debug.c:26
__list_add include/linux/list.h:67 [inline]
list_add_tail include/linux/list.h:100 [inline]
cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline]
rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751
ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102
ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732
vfs_write+0x28e/0xa30 fs/read_write.c:603
ksys_write+0x1ee/0x250 fs/read_write.c:658
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
This is indicating that an rdma_id_private was destroyed without doing
cma_cancel_listens().
Instead of trying to re-use the src_addr memory to indirectly create an
any address derived from the dst build one explicitly on the stack and
bind to that as any other normal flow would do. rdma_bind_addr() will copy
it over the src_addr once it knows the state is valid.
This is similar to commit bc0bdc5afaa7 ("RDMA/cma: Do not change
route.addr.src_addr.ss_family")
Link: https://lore.kernel.org/r/0-v2-e975c8fd9ef2+11e-syz_cma_srcaddr_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 732d41c545bb ("RDMA/cma: Make the locking for automatic state transition more clear")
Reported-by: syzbot+c94a3675a626f6333d74@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 081bdc9fe05bb23248f5effb6f811da3da4b8252 ]
Remove the flush_workqueue(system_long_wq) call since flushing
system_long_wq is deadlock-prone and since that call is redundant with a
preceding cancel_work_sync()
Link: https://lore.kernel.org/r/20220215210511.28303-3-bvanassche@acm.org
Fixes: ef6c49d87c34 ("IB/srp: Eliminate state SRP_TARGET_DEAD")
Reported-by: syzbot+831661966588c802aae9@syzkaller.appspotmail.com
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c46fa8911b17e3f808679061a8af8bee219f4602 ]
Error path of rtrs_clt_open() calls free_clt(), where free_permit is
called. This is wrong since error path of rtrs_clt_open() does not need
to call free_permit().
Also, moving free_permits() call to rtrs_clt_close(), makes it more
aligned with the call to alloc_permit() in rtrs_clt_open().
Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20220217030929.323849-2-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 25a033f5a75873cfdd36eca3c702363b682afb42 ]
Let's wait the inflight permits before free it.
Link: https://lore.kernel.org/r/20201217141915.56989-10-jinpu.wang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8700af2cc18c919b2a83e74e0479038fd113c15d ]
Callback function rtrs_clt_dev_release() for put_device() calls kfree(clt)
to free memory. We shouldn't call kfree(clt) again, and we can't use the
clt after kfree too.
Replace device_register() with device_initialize() and device_add() so that
dev_set_name can() be used appropriately.
Move mutex_destroy() to the release function so it can be called in
the alloc_clt err path.
Fixes: eab098246625 ("RDMA/rtrs-clt: Refactor the failure cases in alloc_clt")
Link: https://lore.kernel.org/r/20220217030929.323849-1-haris.iqbal@ionos.com
Reported-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f3136c4ce7acf64bee43135971ca52a880572e32 upstream.
The failure to allocate memory during MLX4_DEV_EVENT_PORT_MGMT_CHANGE
event handler will cause skip the assignment logic, but
ib_dispatch_event() will be called anyway.
Fix it by calling to return instead of break after memory allocation
failure.
Fixes: 00f5ce99dc6e ("mlx4: Use port management change event instead of smp_snoop")
Link: https://lore.kernel.org/r/12a0e83f18cfad4b5f62654f141e240d04915e10.1643622264.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b43a76f423aa304037603fd6165c4a534d2c09a7 upstream.
Code unconditionally resumed fenced SQ processing after next RDMA Read
completion, even if other RDMA Read responses are still outstanding, or
ORQ is full. Also adds comments for better readability of fence
processing, and removes orq_get_tail() helper, which is not needed
anymore.
Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Fixes: a531975279f3 ("rdma/siw: main include file")
Link: https://lore.kernel.org/r/20220130170815.1940-1-bmt@zurich.ibm.com
Reported-by: Jared Holzman <jared.holzman@excelero.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4028bccb003cf67e46632dee7f97ddc5d7b6e685 upstream.
The rdma-core test suite sends an unaligned remote address and expects a
failure.
ERROR: test_atomic_non_aligned_addr (tests.test_atomic.AtomicTest)
The qib/hfi1 rc handling validates properly, but the test has the client
and server on the same system.
The loopback of these operations is a distinct code path.
Fix by syntaxing the proposed remote address in the loopback code path.
Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt")
Link: https://lore.kernel.org/r/1642584489-141005-1-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 36e8169ec973359f671f9ec7213547059cae972e upstream.
Partially revert the commit mentioned in the Fixes line to make sure that
allocation and erasing multicast struct are locked.
BUG: KASAN: use-after-free in ucma_cleanup_multicast drivers/infiniband/core/ucma.c:491 [inline]
BUG: KASAN: use-after-free in ucma_destroy_private_ctx+0x914/0xb70 drivers/infiniband/core/ucma.c:579
Read of size 8 at addr ffff88801bb74b00 by task syz-executor.1/25529
CPU: 0 PID: 25529 Comm: syz-executor.1 Not tainted 5.16.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
__kasan_report mm/kasan/report.c:433 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
ucma_cleanup_multicast drivers/infiniband/core/ucma.c:491 [inline]
ucma_destroy_private_ctx+0x914/0xb70 drivers/infiniband/core/ucma.c:579
ucma_destroy_id+0x1e6/0x280 drivers/infiniband/core/ucma.c:614
ucma_write+0x25c/0x350 drivers/infiniband/core/ucma.c:1732
vfs_write+0x28e/0xae0 fs/read_write.c:588
ksys_write+0x1ee/0x250 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Currently the xarray search can touch a concurrently freeing mc as the
xa_for_each() is not surrounded by any lock. Rather than hold the lock for
a full scan hold it only for the effected items, which is usually an empty
list.
Fixes: 95fe51096b7a ("RDMA/ucma: Remove mc_list and rely on xarray")
Link: https://lore.kernel.org/r/1cda5fabb1081e8d16e39a48d3a4f8160cea88b8.1642491047.git.leonro@nvidia.com
Reported-by: syzbot+e3f96c43d19782dd14a7@syzkaller.appspotmail.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d9e410ebbed9d091b97bdf45b8a3792e2878dc48 upstream.
In RoCE we should use cma_iboe_set_mgid() and not cma_set_mgid to generate
the mgid, otherwise we will generate an IGMP for an incorrect address.
Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join")
Link: https://lore.kernel.org/r/913bc6783fd7a95fe71ad9454e01653ee6fb4a9a.1642491047.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5f8f55b92edd621f056bdf09e572092849fabd83 upstream.
An early failure in hfi1_ipoib_setup_rn() can lead to the following panic:
BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
Workqueue: events work_for_cpu_fn
RIP: 0010:try_to_grab_pending+0x2b/0x140
Code: 1f 44 00 00 41 55 41 54 55 48 89 d5 53 48 89 fb 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 48 89 55 00 40 84 f6 75 77 <f0> 48 0f ba 2b 00 72 09 31 c0 5b 5d 41 5c 41 5d c3 48 89 df e8 6c
RSP: 0018:ffffb6b3cf7cfa48 EFLAGS: 00010046
RAX: 0000000000000246 RBX: 00000000000001b0 RCX: 0000000000000000
RDX: 0000000000000246 RSI: 0000000000000000 RDI: 00000000000001b0
RBP: ffffb6b3cf7cfa70 R08: 0000000000000f09 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: ffffb6b3cf7cfa90 R14: ffffffff9b2fbfc0 R15: ffff8a4fdf244690
FS: 0000000000000000(0000) GS:ffff8a527f400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000001b0 CR3: 00000017e2410003 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
__cancel_work_timer+0x42/0x190
? dev_printk_emit+0x4e/0x70
iowait_cancel_work+0x15/0x30 [hfi1]
hfi1_ipoib_txreq_deinit+0x5a/0x220 [hfi1]
? dev_err+0x6c/0x90
hfi1_ipoib_netdev_dtor+0x15/0x30 [hfi1]
hfi1_ipoib_setup_rn+0x10e/0x150 [hfi1]
rdma_init_netdev+0x5a/0x80 [ib_core]
? hfi1_ipoib_free_rdma_netdev+0x20/0x20 [hfi1]
ipoib_intf_init+0x6c/0x350 [ib_ipoib]
ipoib_intf_alloc+0x5c/0xc0 [ib_ipoib]
ipoib_add_one+0xbe/0x300 [ib_ipoib]
add_client_context+0x12c/0x1a0 [ib_core]
enable_device_and_get+0xdc/0x1d0 [ib_core]
ib_register_device+0x572/0x6b0 [ib_core]
rvt_register_device+0x11b/0x220 [rdmavt]
hfi1_register_ib_device+0x6b4/0x770 [hfi1]
do_init_one.isra.20+0x3e3/0x680 [hfi1]
local_pci_probe+0x41/0x90
work_for_cpu_fn+0x16/0x20
process_one_work+0x1a7/0x360
? create_worker+0x1a0/0x1a0
worker_thread+0x1cf/0x390
? create_worker+0x1a0/0x1a0
kthread+0x116/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x1f/0x40
The panic happens in hfi1_ipoib_txreq_deinit() because there is a NULL
deref when hfi1_ipoib_netdev_dtor() is called in this error case.
hfi1_ipoib_txreq_init() and hfi1_ipoib_rxq_init() are self unwinding so
fix by adjusting the error paths accordingly.
Other changes:
- hfi1_ipoib_free_rdma_netdev() is deleted including the free_netdev()
since the netdev core code deletes calls free_netdev()
- The switch to the accelerated entrances is moved to the success path.
Cc: stable@vger.kernel.org
Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/1642287756-182313-4-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8d1cfb884e881efd69a3be4ef10772c71cb22216 upstream.
There is a redundant ']' in the name of opcode IB_OPCODE_RC_SEND_MIDDLE,
so just fix it.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20211218112320.3558770-1-cgxu519@mykernel.net
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 39d5534b1302189c809e90641ffae8cbdc42a8fc upstream.
It is more general for ARM device drivers to use the device attribute to
map PCI BAR spaces.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20211206133652.27476-1-liangwenpeng@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e375b9c92985e409c4bb95dd43d34915ea7f5e28 ]
The API for ib_query_qp requires the driver to set cur_qp_state on return,
add the missing set.
Fixes: 67bbc05512d8 ("RDMA/cxgb4: Add query_qp support")
Link: https://lore.kernel.org/r/20211220152530.60399-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 20679094a0161c94faf77e373fa3f7428a8e14bd ]
Currently, when cma_resolve_ib_dev() searches for a matching GID it will
stop searching after encountering the first empty GID table entry. This
behavior is wrong since neither IB nor RoCE spec enforce tightly packed
GID tables.
For example, when the matching valid GID entry exists at index N, and if a
GID entry is empty at index N-1, cma_resolve_ib_dev() will fail to find
the matching valid entry.
Fix it by making cma_resolve_ib_dev() continue searching even after
encountering missing entries.
Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()")
Link: https://lore.kernel.org/r/b7346307e3bb396c43d67d924348c6c496493991.1639055490.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 483d805191a23191f8294bbf9b4e94836f5d92e4 ]
Currently, ib_find_gid() will stop searching after encountering the first
empty GID table entry. This behavior is wrong since neither IB nor RoCE
spec enforce tightly packed GID tables.
For example, when a valid GID entry exists at index N, and if a GID entry
is empty at index N-1, ib_find_gid() will fail to find the valid entry.
Fix it by making ib_find_gid() continue searching even after encountering
missing entries.
Fixes: 5eb620c81ce3 ("IB/core: Add helpers for uncached GID and P_Key searches")
Link: https://lore.kernel.org/r/e55d331b96cecfc2cf19803d16e7109ea966882d.1639055490.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b1a4da64bfc189510e08df1ccb1c589e667dc7a3 ]
Fix the wrongly reported max_send_wr and max_recv_wr attributes for user
QP by making sure to save their valuse on QP creation, so when query QP is
called the attributes will be reported correctly.
Fixes: cecbcddf6461 ("qedr: Add support for QP verbs")
Link: https://lore.kernel.org/r/20211206201314.124947-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2a67fcfa0db6b4075515bd23497750849b88850f ]
Before query pkey, make sure that the queried index is valid.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20211117145954.123893-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
pending cmd-bit"
[ Upstream commit a917dfb66c0a1fa1caacf3d71edcafcab48e6ff0 ]
The 'cmdq->cmdq_bitmap' bitmap is 'rcfw->cmdq_depth' bits long. The size
stored in 'cmdq->bmap_size' is the size of the bitmap in bytes.
Remove this erroneous 'bmap_size' and use 'rcfw->cmdq_depth' directly in
'bnxt_qplib_disable_rcfw_channel()'. Otherwise some error messages may be
missing.
Other uses of 'cmdq_bitmap' already take into account 'rcfw->cmdq_depth'
directly.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/47ed717c3070a1d0f53e7b4c768a4fd11caf365d.1636707421.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 7694a7de22c53a312ea98960fcafc6ec62046531 upstream.
Because of the possible failure of the allocation, data might be NULL
pointer and will cause the dereference of the NULL pointer later.
Therefore, it might be better to check it and return -ENOMEM.
Fixes: 6884c6c4bd09 ("RDMA/verbs: Store the write/write_ex uapi entry points in the uverbs_api")
Link: https://lore.kernel.org/r/20211231093315.1917667-1-jiasheng@iscas.ac.cn
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b35a0f4dd544eaa6162b6d2f13a2557a121ae5fd upstream.
If dst->is_global field is not set, the GRH fields are not cleared
and the following infoleak is reported.
=====================================================
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33
instrument_copy_to_user include/linux/instrumented.h:121 [inline]
_copy_to_user+0x1c9/0x270 lib/usercopy.c:33
copy_to_user include/linux/uaccess.h:209 [inline]
ucma_init_qp_attr+0x8c7/0xb10 drivers/infiniband/core/ucma.c:1242
ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732
vfs_write+0x8ce/0x2030 fs/read_write.c:588
ksys_write+0x28b/0x510 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__ia32_sys_write+0xdb/0x120 fs/read_write.c:652
do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline]
__do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180
do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205
do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
Local variable resp created at:
ucma_init_qp_attr+0xa4/0xb10 drivers/infiniband/core/ucma.c:1214
ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732
Bytes 40-59 of 144 are uninitialized
Memory access of size 144 starts at ffff888167523b00
Data copied to user address 0000000020000100
CPU: 1 PID: 25910 Comm: syz-executor.1 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
=====================================================
Fixes: 4ba66093bdc6 ("IB/core: Check for global flag when using ah_attr")
Link: https://lore.kernel.org/r/0e9dd51f93410b7b2f4f5562f52befc878b71afa.1641298868.git.leonro@nvidia.com
Reported-by: syzbot+6d532fa8f9463da290bc@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 12d3bbdd6bd2780b71cc466f3fbc6eb7d43bbc2a ]
Variables allocated by kvmalloc_array() should not be freed by kfree.
Because they may be allocated by vmalloc. So we replace kfree() with
kvfree() here.
Fixes: 6fd610c5733d ("RDMA/hns: Support 0 hop addressing for SRQ buffer")
Link: https://lore.kernel.org/r/20211210094234.5829-1-billsjc@sjtu.edu.cn
Signed-off-by: Jiacheng Shi <billsjc@sjtu.edu.cn>
Acked-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bee90911e0138c76ee67458ac0d58b38a3190f65 ]
The wrong goto label was used for the error case and missed cleanup of the
pkt allocation.
Fixes: d39bf40e55e6 ("IB/qib: Protect from buffer overflow in struct qib_user_sdma_pkt fields")
Link: https://lore.kernel.org/r/20211208175238.29983-1-jose.exposito89@gmail.com
Addresses-Coverity-ID: 1493352 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b0969f83890bf8b47f5c8bd42539599b2b52fdeb upstream.
When hns_roce_v2_destroy_qp() is called, the brief calling process of the
driver is as follows:
......
hns_roce_v2_destroy_qp
hns_roce_v2_qp_modify
hns_roce_cmd_mbox
hns_roce_qp_destroy
If hns_roce_cmd_mbox() detects that the hardware is being reset during the
execution of the hns_roce_cmd_mbox(), the driver will not be able to get
the return value from the hardware (the firmware cannot respond to the
driver's mailbox during the hardware reset phase).
The driver needs to wait for the hardware reset to complete before
continuing to execute hns_roce_qp_destroy(), otherwise it may happen that
the driver releases the resources but the hardware is still accessing. In
order to fix this problem, HNS RoCE needs to add a piece of code to wait
for the hardware reset to complete.
The original interface get_hw_reset_stat() is the instantaneous state of
the hardware reset, which cannot accurately reflect whether the hardware
reset is completed, so it needs to be replaced with the ae_dev_reset_cnt
interface.
The sign that the hardware reset is complete is that the return value of
the ae_dev_reset_cnt interface is greater than the original value
reset_cnt recorded by the driver.
Fixes: 6a04aed6afae ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset")
Link: https://lore.kernel.org/r/20211123142402.26936-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 52414e27d6b568120b087d1fbafbb4482b0ccaab upstream.
is_reset is used to indicate whether the hardware starts to reset. When
hns_roce_hw_v2_reset_notify_down() is called, the hardware has not yet
started to reset. If is_reset is set at this time, all mailbox operations
of resource destroy actions will be intercepted by driver. When the driver
cleans up resources, but the hardware is still accessed, the following
errors will appear:
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000003f
arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50e0800
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000043e
arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50a0800
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000020880000436
arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50a0880
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000043a
arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50e0840
hns3 0000:35:00.0: INT status: CMDQ(0x0) HW errors(0x0) other(0x0)
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
hns3 0000:35:00.0: received unknown or unhandled event of vector0
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010
{34}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 7
is_reset will be set correctly in check_aedev_reset_status(), so the
setting in hns_roce_hw_v2_reset_notify_down() should be deleted.
Fixes: 726be12f5ca0 ("RDMA/hns: Set reset flag when hw resetting")
Link: https://lore.kernel.org/r/20211123084809.37318-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9292f8f9a2ac42eb320bced7153aa2e63d8cc13a upstream.
The code tests the dma address which legitimately can be 0.
The code should test the kernel logical address to avoid leaking eager
buffer allocations that happen to map to a dma address of 0.
Fixes: 60368186fd85 ("IB/hfi1: Fix user-space buffers mapping with IOMMU enabled")
Link: https://lore.kernel.org/r/20211129191952.101968.17137.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 60a8b5a1611b4a26de4839ab9c1fc2a9cf3e17c1 upstream.
This buffer is currently allocated in hfi1_init():
if (reinit)
ret = init_after_reset(dd);
else
ret = loadtime_init(dd);
if (ret)
goto done;
/* allocate dummy tail memory for all receive contexts */
dd->rcvhdrtail_dummy_kvaddr = dma_alloc_coherent(&dd->pcidev->dev,
sizeof(u64),
&dd->rcvhdrtail_dummy_dma,
GFP_KERNEL);
if (!dd->rcvhdrtail_dummy_kvaddr) {
dd_dev_err(dd, "cannot allocate dummy tail memory\n");
ret = -ENOMEM;
goto done;
}
The reinit triggered path will overwrite the old allocation and leak it.
Fix by moving the allocation to hfi1_alloc_devdata() and the deallocation
to hfi1_free_devdata().
Link: https://lore.kernel.org/r/20211129192008.101968.91302.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: 46b010d3eeb8 ("staging/rdma/hfi1: Workaround to prevent corruption during packet delivery")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f6a3cfec3c01f9983e961c3327cef0db129a3c43 upstream.
The following trace can be observed with an init failure such as firmware
load failures:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 0 P4D 0
Oops: 0010 [#1] SMP PTI
CPU: 0 PID: 537 Comm: kworker/0:3 Tainted: G OE --------- - - 4.18.0-240.el8.x86_64 #1
Workqueue: events work_for_cpu_fn
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0000:ffffae5f878a3c98 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff95e48e025c00 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff95e48e025c00
RBP: ffff95e4bf3660a4 R08: 0000000000000000 R09: ffffffff86d5e100
R10: ffff95e49e1de600 R11: 0000000000000001 R12: ffff95e4bf366180
R13: ffff95e48e025c00 R14: ffff95e4bf366028 R15: ffff95e4bf366000
FS: 0000000000000000(0000) GS:ffff95e4df200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000f86a0a003 CR4: 00000000001606f0
Call Trace:
receive_context_interrupt+0x1f/0x40 [hfi1]
__free_irq+0x201/0x300
free_irq+0x2e/0x60
pci_free_irq+0x18/0x30
msix_free_irq.part.2+0x46/0x80 [hfi1]
msix_clean_up_interrupts+0x2b/0x70 [hfi1]
hfi1_init_dd+0x640/0x1a90 [hfi1]
do_init_one.isra.19+0x34d/0x680 [hfi1]
local_pci_probe+0x41/0x90
work_for_cpu_fn+0x16/0x20
process_one_work+0x1a7/0x360
worker_thread+0x1cf/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x35/0x40
The free_irq() results in a callback to the registered interrupt handler,
and rcd->do_interrupt is NULL because the receive context data structures
are not fully initialized.
Fix by ensuring that the do_interrupt is always assigned and adding a
guards in the slow path handler to detect and handle a partially
initialized receive context and noop the receive.
Link: https://lore.kernel.org/r/20211129192003.101968.33612.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: b0ba3c18d6bf ("IB/hfi1: Move normal functions from hfi1_devdata to const array")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b6d57e24ce6cc3df8a8845e1b193e88a65d501b1 upstream.
The following BUG has just surfaced with our 5.16 testing:
BUG: using smp_processor_id() in preemptible [00000000] code: mpicheck/1581081
caller is sdma_select_user_engine+0x72/0x210 [hfi1]
CPU: 0 PID: 1581081 Comm: mpicheck Tainted: G S 5.16.0-rc1+ #1
Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
Call Trace:
<TASK>
dump_stack_lvl+0x33/0x42
check_preemption_disabled+0xbf/0xe0
sdma_select_user_engine+0x72/0x210 [hfi1]
? _raw_spin_unlock_irqrestore+0x1f/0x31
? hfi1_mmu_rb_insert+0x6b/0x200 [hfi1]
hfi1_user_sdma_process_request+0xa02/0x1120 [hfi1]
? hfi1_write_iter+0xb8/0x200 [hfi1]
hfi1_write_iter+0xb8/0x200 [hfi1]
do_iter_readv_writev+0x163/0x1c0
do_iter_write+0x80/0x1c0
vfs_writev+0x88/0x1a0
? recalibrate_cpu_khz+0x10/0x10
? ktime_get+0x3e/0xa0
? __fget_files+0x66/0xa0
do_writev+0x65/0x100
do_syscall_64+0x3a/0x80
Fix this long standing bug by moving the smp_processor_id() to after the
rcu_read_lock().
The rcu_read_lock() implicitly disables preemption.
Link: https://lore.kernel.org/r/20211129191958.101968.87329.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: 0cb2aa690c7e ("IB/hfi1: Add sysfs interface for affinity setup")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6bda39149d4b8920fdb8744090653aca3daa792d ]
When VF is configured with default vlan, HW strips the vlan from the
packet and driver receives it in Rx completion. VLAN needs to be reported
for UD work completion only if the vlan is configured on the host. Add a
check for valid vlan in the UD receive path.
Link: https://lore.kernel.org/r/1631709163-2287-12-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f4e56ec4452f48b8292dcf0e1c4bdac83506fb8b ]
The error flow fixed in this patch is not possible because all kernel
users of create QP interface check that device supports steering before
set IB_QP_CREATE_NETIF_QP flag.
Fixes: c1c98501121e ("IB/mlx4: Add support for steerable IB UD QPs")
Link: https://lore.kernel.org/r/91c61f6e60eb0240f8bbc321fda7a1d2986dd03c.1634023677.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 598d16fa1bf93431ad35bbab3ed1affe4fb7b562 ]
Fill the missing parameters for the FW command while querying SRQ.
Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters")
Link: https://lore.kernel.org/r/1631709163-2287-8-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dcd3f985b20ffcc375f82ca0ca9f241c7025eb5e ]
The port->attr.port_cap_flags should be set to enum
ib_port_capability_mask_bits in ib_mad.h, not
RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210831083223.65797-1-weijunji@bytedance.com
Signed-off-by: Junji Wei <weijunji@bytedance.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 4f960393a0ee9a39469ceb7c8077ae8db665cc12 upstream.
This patch fixes a crash caused by querying the QP via netlink, and
corrects the state of GSI qp. GSI qp's have a NULL qed_qp.
The call trace is generated by:
$ rdma res show
BUG: kernel NULL pointer dereference, address: 0000000000000034
Hardware name: Dell Inc. PowerEdge R720/0M1GCR, BIOS 1.2.6 05/10/2012
RIP: 0010:qed_rdma_query_qp+0x33/0x1a0 [qed]
RSP: 0018:ffffba560a08f580 EFLAGS: 00010206
RAX: 0000000200000000 RBX: ffffba560a08f5b8 RCX: 0000000000000000
RDX: ffffba560a08f5b8 RSI: 0000000000000000 RDI: ffff9807ee458090
RBP: ffffba560a08f5a0 R08: 0000000000000000 R09: ffff9807890e7048
R10: ffffba560a08f658 R11: 0000000000000000 R12: 0000000000000000
R13: ffff9807ee458090 R14: ffff9807f0afb000 R15: ffffba560a08f7ec
FS: 00007fbbf8bfe740(0000) GS:ffff980aafa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 00000001720ba001 CR4: 00000000000606f0
Call Trace:
qedr_query_qp+0x82/0x360 [qedr]
ib_query_qp+0x34/0x40 [ib_core]
? ib_query_qp+0x34/0x40 [ib_core]
fill_res_qp_entry_query.isra.26+0x47/0x1d0 [ib_core]
? __nla_put+0x20/0x30
? nla_put+0x33/0x40
fill_res_qp_entry+0xe3/0x120 [ib_core]
res_get_common_dumpit+0x3f8/0x5d0 [ib_core]
? fill_res_cm_id_entry+0x1f0/0x1f0 [ib_core]
nldev_res_get_qp_dumpit+0x1a/0x20 [ib_core]
netlink_dump+0x156/0x2f0
__netlink_dump_start+0x1ab/0x260
rdma_nl_rcv+0x1de/0x330 [ib_core]
? nldev_res_get_cm_id_dumpit+0x20/0x20 [ib_core]
netlink_unicast+0x1b8/0x270
netlink_sendmsg+0x33e/0x470
sock_sendmsg+0x63/0x70
__sys_sendto+0x13f/0x180
? setup_sgl.isra.12+0x70/0xc0
__x64_sys_sendto+0x28/0x30
do_syscall_64+0x3a/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Cc: stable@vger.kernel.org
Fixes: cecbcddf6461 ("qedr: Add support for QP verbs")
Link: https://lore.kernel.org/r/20211027184329.18454-1-palok@marvell.com
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 64733956ebba7cc629856f4a6ee35a52bc9c023f upstream.
When copying the device name, the length of the data memcpy copied exceeds
the length of the source buffer, which cause the KASAN issue below. Use
strscpy_pad() instead.
BUG: KASAN: slab-out-of-bounds in ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core]
Read of size 64 at addr ffff88811a10f5e0 by task rping/140263
CPU: 3 PID: 140263 Comm: rping Not tainted 5.15.0-rc1+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack_lvl+0x57/0x7d
print_address_description.constprop.0+0x1d/0xa0
kasan_report+0xcb/0x110
kasan_check_range+0x13d/0x180
memcpy+0x20/0x60
ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core]
ib_nl_make_request+0x1c6/0x380 [ib_core]
send_mad+0x20a/0x220 [ib_core]
ib_sa_path_rec_get+0x3e3/0x800 [ib_core]
cma_query_ib_route+0x29b/0x390 [rdma_cm]
rdma_resolve_route+0x308/0x3e0 [rdma_cm]
ucma_resolve_route+0xe1/0x150 [rdma_ucm]
ucma_write+0x17b/0x1f0 [rdma_ucm]
vfs_write+0x142/0x4d0
ksys_write+0x133/0x160
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f26499aa90f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48
RSP: 002b:00007f26495f2dc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000000007d0 RCX: 00007f26499aa90f
RDX: 0000000000000010 RSI: 00007f26495f2e00 RDI: 0000000000000003
RBP: 00005632a8315440 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26495f2e00
R13: 00005632a83154e0 R14: 00005632a8315440 R15: 00005632a830a810
Allocated by task 131419:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
proc_self_get_link+0x8b/0x100
pick_link+0x4f1/0x5c0
step_into+0x2eb/0x3d0
walk_component+0xc8/0x2c0
link_path_walk+0x3b8/0x580
path_openat+0x101/0x230
do_filp_open+0x12e/0x240
do_sys_openat2+0x115/0x280
__x64_sys_openat+0xce/0x140
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink")
Link: https://lore.kernel.org/r/72ede0f6dab61f7f23df9ac7a70666e07ef314b0.1635055496.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1ab52ac1e9bc9391f592c9fa8340a6e3e9c36286 upstream.
Currently, the driver doesn't set the PCP-based priority for DCT, hence
DCT response packets are transmitted without user priority.
Fix it by setting user provided priority in the eth_prio field in the DCT
context, which in turn sets the value in the transmitted packet.
Fixes: 776a3906b692 ("IB/mlx5: Add support for DC target QP")
Link: https://lore.kernel.org/r/5fd2d94a13f5742d8803c218927322257d53205c.1633512672.git.leonro@nvidia.com
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 13bac861952a78664907a0f927d3e874e9a59034 upstream.
sc_disable() after having disabled the send context wakes up any waiters
by calling hfi1_qp_wakeup() while holding the waitlock for the sc.
This is contrary to the model for all other calls to hfi1_qp_wakeup()
where the waitlock is dropped and a local is used to drive calls to
hfi1_qp_wakeup().
Fix by moving the sc->piowait into a local list and driving the wakeup
calls from the list.
Fixes: 099a884ba4c0 ("IB/hfi1: Handle wakeup of orphaned QPs for pio")
Link: https://lore.kernel.org/r/20211013141852.128104.2682.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d39bf40e55e666b5905fdbd46a0dced030ce87be upstream.
Overflowing either addrlimit or bytes_togo can allow userspace to trigger
a buffer overflow of kernel memory. Check for overflows in all the places
doing math on user controlled buffers.
Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Link: https://lore.kernel.org/r/20211012175519.7298.77738.stgit@awfm-01.cornelisnetworks.com
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|