summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/irdma
AgeCommit message (Collapse)AuthorFilesLines
2023-03-19RDMA/irdma: Add ipv4 check to irdma_find_listener()Tatyana Nikolova1-6/+10
Add ipv4 check to irdma_find_listener(). Otherwise the function incorrectly finds and returns a listener with a different addr family for the zero IP addr, if a listener with a zero IP addr and the same port as the one searched for has already been created. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230315145231.931-5-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19RDMA/irdma: Increase iWARP CM default rexmit countMustafa Ismail1-1/+1
When running perftest with large number of connections in iWARP mode, the passive side could be slow to respond. Increase the rexmit counter default to allow scaling connections. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230315145231.931-4-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19RDMA/irdma: Fix memory leak of PBLE objectsMustafa Ismail1-0/+3
On rmmod of irdma, the PBLE object memory is not being freed. PBLE object memory are not statically pre-allocated at function initialization time unlike other HMC objects. PBLEs objects and the Segment Descriptors (SD) for it can be dynamically allocated during scale up and SD's remain allocated till function deinitialization. Fix this leak by adding IRDMA_HMC_IW_PBLE to the iw_hmc_obj_types[] table and skip pbles in irdma_create_hmc_obj but not in irdma_del_hmc_objects(). Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230315145231.931-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19RDMA/irdma: Do not generate SW completions for NOPsMustafa Ismail1-1/+4
Currently, artificial SW completions are generated for NOP wqes which can generate unexpected completions with wr_id = 0. Skip the generation of artificial completions for NOPs. Fixes: 81091d7696ae ("RDMA/irdma: Add SW mechanism to generate completions on error") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230315145231.931-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19RDMA/irdma: Refactor PBLE functionsSindhu Devale3-32/+31
Refactor PBLE functions using a bit mask to represent the PBLE level desired versus 2 parameters use_pble and lvl_one_only which makes the code confusing. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Link: https://lore.kernel.org/r/20230315145305.955-5-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19RDMA/irdma: Change name of interruptsMichal Swiatkowski2-3/+13
Add more information in interrupt names. Before this patch it was: irdma CEQ CEQ ... Now: irdma-0000:18:00.0-AEQ irdma-0000:18:00.0-CEQ-0 irdma-0000:18:00.0-CEQ-1 ... Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Suggested-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230315145305.955-4-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19RDMA/irdma: Remove a redundant irdma_arp_table() callTatyana Nikolova1-4/+0
Remove a redundant function call in irdma_modify_qp_roce, since irdma_arp_table() with IRDMA_ARP_RESOLVE action is called after the if/else ipv check as part of irdma_add_arp(). Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230315145305.955-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19RDMA/irdma: Refactor HW statisticsKrzysztof Czurylo9-580/+362
Refactor HW statistics which, - Unifies HW statistics support for all HW generations. - Unifies support of 32- and 64-bit counters. - Removes duplicated code and simplifies implementation. - Fixes roll-over handling. - Removes unneeded last_hw_stats. With new implementation, there is no separate handling and no separate arrays for 32- and 64-bit counters (offsets, regs, values). Instead, there is a HW stats map array for each HW revision, which defines HW-specific width and location of each counter in the statistics buffer. Once the statistics are gathered (either via CQP op, or by reading HW registers), counter values are extracted from the statistics buffer using the stats map and the delta between the last and new values is computed. Finally, the counter values in rdma_hw_stats are incremented by those deltas. From the OS perspective, all the counters are 64-bit and their order in rdma_hw_stats->value[] array, as well as in irdma_hw_stat_names[], is the same for all HW gens. New statistics should always be added at the end. Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Youvaraj Sagar <youvaraj.sagar@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230315145305.955-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2-103/+212
Pull rdma updates from Jason Gunthorpe: "Quite a small cycle this time, even with the rc8. I suppose everyone went to sleep over xmas. - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana - inline CQE support for hns - Have mlx5 display device error codes - Pinned DMABUF support for irdma - Continued rxe cleanups, particularly converting the MRs to use xarray - Improvements to what can be cached in the mlx5 mkey cache" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits) IB/mlx5: Extend debug control for CC parameters IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors IB/hfi1: Fix math bugs in hfi1_can_pin_pages() RDMA/irdma: Add support for dmabuf pin memory regions RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys RDMA/rxe: Fix missing memory barriers in rxe_queue.h RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet RDMA/rxe: Remove rxe_alloc() RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size Subject: RDMA/rxe: Handle zero length rdma iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry() RDMA/mlx5: Use rdma_umem_for_each_dma_block() RDMA/umem: Remove unused 'work' member from struct ib_umem RDMA/irdma: Cap MSIX used to online CPUs + 1 RDMA/mlx5: Check reg_create() create for errors RDMA/restrack: Correct spelling RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish() ...
2023-02-17RDMA/irdma: Add support for dmabuf pin memory regionsZhu Yanjun1-0/+42
This is a followup to the EFA dmabuf[1]. Irdma driver currently does not support on-demand-paging(ODP). So it uses habanalabs as the dmabuf exporter, and irdma as the importer to allow for peer2peer access through libibverbs. In this commit, the function ib_umem_dmabuf_get_pinned() is used. This function is introduced in EFA dmabuf[1] which allows the driver to get a dmabuf umem which is pinned and does not require move_notify callback implementation. The returned umem is pinned and DMA mapped like standard cpu umems, and is released through ib_umem_release(). [1]https://lore.kernel.org/lkml/20211007114018.GD2688930@ziepe.ca/t/ Link: https://lore.kernel.org/r/20230217011425.498847-1-yanjun.zhu@intel.com Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-08RDMA/irdma: Cap MSIX used to online CPUs + 1Mustafa Ismail1-0/+2
The irdma driver can use a maximum number of msix vectors equal to num_online_cpus() + 1 and the kernel warning stack below is shown if that number is exceeded. The kernel throws a warning as the driver tries to update the affinity hint with a CPU mask greater than the max CPU IDs. Fix this by capping the MSIX vectors to num_online_cpus() + 1. WARNING: CPU: 7 PID: 23655 at include/linux/cpumask.h:106 irdma_cfg_ceq_vector+0x34c/0x3f0 [irdma] RIP: 0010:irdma_cfg_ceq_vector+0x34c/0x3f0 [irdma] Call Trace: irdma_rt_init_hw+0xa62/0x1290 [irdma] ? irdma_alloc_local_mac_entry+0x1a0/0x1a0 [irdma] ? __is_kernel_percpu_address+0x63/0x310 ? rcu_read_lock_held_common+0xe/0xb0 ? irdma_lan_unregister_qset+0x280/0x280 [irdma] ? irdma_request_reset+0x80/0x80 [irdma] ? ice_get_qos_params+0x84/0x390 [ice] irdma_probe+0xa40/0xfc0 [irdma] ? rcu_read_lock_bh_held+0xd0/0xd0 ? irdma_remove+0x140/0x140 [irdma] ? rcu_read_lock_sched_held+0x62/0xe0 ? down_write+0x187/0x3d0 ? auxiliary_match_id+0xf0/0x1a0 ? irdma_remove+0x140/0x140 [irdma] auxiliary_bus_probe+0xa6/0x100 __driver_probe_device+0x4a4/0xd50 ? __device_attach_driver+0x2c0/0x2c0 driver_probe_device+0x4a/0x110 __driver_attach+0x1aa/0x350 bus_for_each_dev+0x11d/0x1b0 ? subsys_dev_iter_init+0xe0/0xe0 bus_add_driver+0x3b1/0x610 driver_register+0x18e/0x410 ? 0xffffffffc0b88000 irdma_init_module+0x50/0xaa [irdma] do_one_initcall+0x103/0x5f0 ? perf_trace_initcall_level+0x420/0x420 ? do_init_module+0x4e/0x700 ? __kasan_kmalloc+0x7d/0xa0 ? kmem_cache_alloc_trace+0x188/0x2b0 ? kasan_unpoison+0x21/0x50 do_init_module+0x1d1/0x700 load_module+0x3867/0x5260 ? layout_and_allocate+0x3990/0x3990 ? rcu_read_lock_held_common+0xe/0xb0 ? rcu_read_lock_sched_held+0x62/0xe0 ? rcu_read_lock_bh_held+0xd0/0xd0 ? __vmalloc_node_range+0x46b/0x890 ? lock_release+0x5c8/0xba0 ? alloc_vm_area+0x120/0x120 ? selinux_kernel_module_from_file+0x2a5/0x300 ? __inode_security_revalidate+0xf0/0xf0 ? __do_sys_init_module+0x1db/0x260 __do_sys_init_module+0x1db/0x260 ? load_module+0x5260/0x5260 ? do_syscall_64+0x22/0x450 do_syscall_64+0xa5/0x450 entry_SYSCALL_64_after_hwframe+0x66/0xdb Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Link: https://lore.kernel.org/r/20230207201938.1329-1-sindhu.devale@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-29RDMA/irdma: Fix potential NULL-ptr-dereferenceNikita Zhandarovich1-0/+3
in_dev_get() can return NULL which will cause a failure once idev is dereferenced in in_dev_for_each_ifa_rtnl(). This patch adds a check for NULL value in idev beforehand. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20230126185230.62464-1-n.zhandarovich@fintech.ru Reviewed-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-26RDMA/irdma: Split CQ handler into irdma_reg_user_mr_type_cqZhu Yanjun1-29/+40
Split the source codes related with CQ handling into a new function. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230116193502.66540-5-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-26RDMA/irdma: Split QP handler into irdma_reg_user_mr_type_qpZhu Yanjun1-14/+33
Split the source codes related with QP handling into a new function. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230116193502.66540-4-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-26RDMA/irdma: Split mr alloc and free into new functionsZhu Yanjun1-28/+46
In the function irdma_reg_user_mr, the mr allocation and free will be used by other functions. As such, the source codes related with mr allocation and free are split into the new functions. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230116193502.66540-3-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-26RDMA/irdma: Split MEM handler into irdma_reg_user_mr_type_memZhu Yanjun1-32/+50
The source codes related with IRDMA_MEMREG_TYPE_MEM are split into a new function irdma_reg_user_mr_type_mem. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230116193502.66540-2-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-04RDMA/irdma: Remove extra ret variable in favor of existing errZhu Yanjun1-3/+2
In the function irdma_reg_user_mr, err and ret exist. Actually, one variable err is enough. Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230104064333.660344-1-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-22RDMA/irdma: Initialize net_type before checking itMustafa Ismail1-0/+1
The av->net_type is not initialized before it is checked in irdma_modify_qp_roce. This leads to an incorrect update to the ARP cache and QP context. RoCEv2 connections might fail as result. Set the net_type using rdma_gid_attr_network_type. Fixes: 80005c43d4c8 ("RDMA/irdma: Use net_type to check network type") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221122004410.1471-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/irdma: Do not request 2-level PBLEs for CQ allocMustafa Ismail1-10/+5
When allocating PBLE's for a large CQ, it is possible that a 2-level PBLE is returned which would cause the CQ allocation to fail since 1-level is assumed and checked for. Fix this by requesting a level one PBLE only. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221115011701.1379-4-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/irdma: Fix RQ completion opcodeMustafa Ismail5-45/+71
The opcode written by HW, in the RQ CQE, is the RoCEv2/iWARP protocol opcode from the received packet and not the SW opcode as currently assumed. Fix this by returning the raw operation type and queue type in the CQE to irdma_process_cqe and add 2 helpers set_ib_wc_op_sq set_ib_wc_op_rq to map IRDMA HW op types to IB op types. Note that for iWARP, only Write with Immediate is supported so the opcode can only be IB_WC_RECV_RDMA_WITH_IMM when there is immediate data present. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221115011701.1379-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/irdma: Fix inline for multiple SGE'sMustafa Ismail3-104/+119
Currently, inline send and inline write assume a single SGE and only copy data from the first one. Add support for multiple SGE's. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221115011701.1379-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-07RDMA/irdma: Report the correct link speedShiraz Saleem1-32/+3
The active link speed is currently hard-coded in irdma_query_port due to which the port rate in ibstatus does reflect the active link speed. Call ib_get_eth_speed in irdma_query_port to get the active link speed. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Reported-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221104234957.1135-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-06Merge tag 'v6.0' into rdma.git for-nextJason Gunthorpe3-11/+24
Trvial merge conflicts against rdma.git for-rc resolved matching linux-next: drivers/infiniband/hw/hns/hns_roce_hw_v2.c drivers/infiniband/hw/hns/hns_roce_main.c https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-20RDMA/irdma: Validate udata inlen and outlenShiraz Saleem1-7/+60
Currently ib_copy_from_udata and ib_copy_to_udata could underfill the request and response buffer if the user-space passes an undersized value for udata->inlen or udata->outlen respectively [1] This could lead to undesirable behavior. Zero initing the buffer only goes as far as preventing using the buffer uninitialized. Validate udata->inlen and udata->outlen passed from user-space to ensure they are at least the required minimum size. [1] https://lore.kernel.org/linux-rdma/MWHPR11MB0029F37D40D9D4A993F8F549E9D79@MWHPR11MB0029.namprd11.prod.outlook.com/ Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20220907191324.1173-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20RDMA/irdma: Align AE id codes to correct flush code and eventSindhu-Devale6-21/+38
A number of asynchronous event (AE) ids were not aligned to the correct flush_code and event_type. Fix these up so that the correct IBV error and event codes are returned to application. Also, add handling for new AE ids like IRDMA_AE_INVALID_REQUEST to return the correct WC error code. Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Sindhu-Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20220907191324.1173-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-07RDMA/irdma: Report RNR NAK generation in device capsSindhu-Devale1-1/+4
Report RNR NAK generation when device capabilities are queried Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Sindhu-Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20220906223244.1119-6-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-07RDMA/irdma: Use s/g array in post send only when its validSindhu-Devale1-1/+2
Send with invalidate verb call can pass in an uninitialized s/g array with 0 sge's which is filled into irdma WQE and causes a HW asynchronous event. Fix this by using the s/g array in irdma post send only when its valid. Fixes: 551c46e ("RDMA/irdma: Add user/kernel shared libraries") Signed-off-by: Sindhu-Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20220906223244.1119-5-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-07RDMA/irdma: Return correct WC error for bind operation failureSindhu-Devale1-1/+3
When a QP and a MR on a local host are in different PDs, the HW generates an asynchronous event (AE). The same AE is generated when a QP and a MW are in different PDs during a bind operation. Return the more appropriate IBV_WC_MW_BIND_ERR for the latter case by checking the OP type from the CQE in error. Fixes: 551c46edc769 ("RDMA/irdma: Add user/kernel shared libraries") Signed-off-by: Sindhu-Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20220906223244.1119-4-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-07RDMA/irdma: Return error on MR deregister CQP failureSindhu-Devale2-6/+13
The MR deregister CQP can fail if an MW is bound to it. Return an appropriate error for this case. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Sindhu-Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20220906223244.1119-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-07RDMA/irdma: Report the correct max cqes from query deviceSindhu-Devale1-1/+1
Report the correct max cqes available to an application taking into account a reserved entry to detect overflow. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Sindhu-Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20220906223244.1119-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-28RDMA/irdma: Fix drain SQ hang with no completionShiraz Saleem1-1/+1
SW generated completions for outstanding WRs posted on SQ after QP is in error target the wrong CQ. This causes the ib_drain_sq to hang with no completion. Fix this to generate completions on the right CQ. [ 863.969340] INFO: task kworker/u52:2:671 blocked for more than 122 seconds. [ 863.979224] Not tainted 5.14.0-130.el9.x86_64 #1 [ 863.986588] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 863.996997] task:kworker/u52:2 state:D stack: 0 pid: 671 ppid: 2 flags:0x00004000 [ 864.007272] Workqueue: xprtiod xprt_autoclose [sunrpc] [ 864.014056] Call Trace: [ 864.017575] __schedule+0x206/0x580 [ 864.022296] schedule+0x43/0xa0 [ 864.026736] schedule_timeout+0x115/0x150 [ 864.032185] __wait_for_common+0x93/0x1d0 [ 864.037717] ? usleep_range_state+0x90/0x90 [ 864.043368] __ib_drain_sq+0xf6/0x170 [ib_core] [ 864.049371] ? __rdma_block_iter_next+0x80/0x80 [ib_core] [ 864.056240] ib_drain_sq+0x66/0x70 [ib_core] [ 864.062003] rpcrdma_xprt_disconnect+0x82/0x3b0 [rpcrdma] [ 864.069365] ? xprt_prepare_transmit+0x5d/0xc0 [sunrpc] [ 864.076386] xprt_rdma_close+0xe/0x30 [rpcrdma] [ 864.082593] xprt_autoclose+0x52/0x100 [sunrpc] [ 864.088718] process_one_work+0x1e8/0x3c0 [ 864.094170] worker_thread+0x50/0x3b0 [ 864.099109] ? rescuer_thread+0x370/0x370 [ 864.104473] kthread+0x149/0x170 [ 864.109022] ? set_kthread_struct+0x40/0x40 [ 864.114713] ret_from_fork+0x22/0x30 Fixes: 81091d7696ae ("RDMA/irdma: Add SW mechanism to generate completions on error") Link: https://lore.kernel.org/r/20220824154358.117-1-shiraz.saleem@intel.com Reported-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds6-35/+36
Pull rdma updates from Jason Gunthorpe: "This cycle we got a new RDMA driver "ERDMA" for the Alibaba cloud environment. Otherwise the changes are dominated by rxe fixes. There is another RDMA driver on the list that might get merged next cycle, 'MANA' for the Azure cloud environment. Summary: - Bug fixes and small features for irdma, hns, siw, qedr, hfi1, mlx5 - General spelling/grammer fixes - rdma cm can follow changes in neighbours for control packets - Significant amounts of rxe fixes and spec compliance changes - Use the modern NAPI API - Use the bitmap API instead of open coding - Performance improvements for rtrs - Add the ERDMA driver for Alibaba cloud - Fix a use after free bug in SRP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA/ib_srpt: Unify checking rdma_cm_id condition in srpt_cm_req_recv() RDMA/rxe: Fix error unwind in rxe_create_qp() RDMA/mlx5: Add missing check for return value in get namespace flow RDMA/rxe: Split qp state for requester and completer RDMA/rxe: Generate error completion for error requester QP state RDMA/rxe: Update wqe_index for each wqe error completion RDMA/srpt: Fix a use-after-free RDMA/srpt: Introduce a reference count in struct srpt_device RDMA/srpt: Duplicate port name members IB/qib: Fix repeated "in" within comments RDMA/erdma: Add driver to kernel build environment RDMA/erdma: Add the ABI definitions RDMA/erdma: Add the erdma module RDMA/erdma: Add connection management (CM) support RDMA/erdma: Add verbs implementation RDMA/erdma: Add verbs header file RDMA/erdma: Add event queue implementation RDMA/erdma: Add cmdq implementation RDMA/erdma: Add main include file RDMA/erdma: Add the hardware related definitions ...
2022-07-18RDMA/irdma: Use the bitmap API to allocate bitmapsChristophe JAILLET1-5/+4
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Link: https://lore.kernel.org/r/1f671b1af5881723ee265a0a12809c92950e58aa.1657567269.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Fix setting of QP context err_rq_idx_valid fieldMustafa Ismail1-11/+4
Setting err_rq_idx_valid field in QP context when the AE source of the AEQE is not associated with an RQ causes the firmware flush to fail. Set err_rq_idx_valid field in QP context only if it is associated with an RQ. Additionally, cleanup the redundant setting of this field in irdma_process_aeq. Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20220705230815.265-8-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Fix VLAN connection with wildcard addressMustafa Ismail1-5/+6
When an application listens on a wildcard address, and there are VLAN and non-VLAN IP addresses, iWARP connection establishemnt can fail if the listen node VLAN ID does not match. Fix this by checking the vlan_id only if not a wildcard listen node. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://lore.kernel.org/r/20220705230815.265-7-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Fix a window for use-after-freeMustafa Ismail1-1/+1
During a destroy CQ an interrupt may cause processing of a CQE after CQ resources are freed by irdma_cq_free_rsrc(). Fix this by moving the call to irdma_cq_free_rsrc() after the irdma_sc_cleanup_ceqes(), which is called under the cq_lock. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20220705230815.265-6-shiraz.saleem@intel.com Signed-off-by: Bartosz Sobczak <bartosz.sobczak@intel.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Make resource distribution algorithm more QP orientedNayan Kumar2-7/+6
Adapt the resource distribution algorithm in irdma_cfg_fpm_val to be more QP oriented. If the configuration is too big for the available memory, trim the MR and PBLE's first before trimming the QPs. This also avoids having to double QPs requested as input to algorithm for GEN1 devices. Link: https://lore.kernel.org/r/20220705230815.265-5-shiraz.saleem@intel.com Signed-off-by: Nayan Kumar <nayan.kumar@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Make CQP invalid state error non-criticalMustafa Ismail1-0/+1
The invalid state error returned by the Control Queue-Pair (CQP) is not a critical error. Add it to the irdma_noncrit_err_list and drop reporting it as device error message. Link: https://lore.kernel.org/r/20220705230815.265-4-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Add AE source to error logMustafa Ismail1-2/+2
To assist with debugging add the Asynchronous Event (AE) source when logging the abnormal AE error log message. Link: https://lore.kernel.org/r/20220705230815.265-3-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Add 2 level PBLE support for FMRMustafa Ismail2-4/+12
Level 2 Physical Buffer List Entry (PBLE) is currently not supported for Fast MRs which limits memory registrations to 256K pages. Adapt irdma_set_page and irdma_alloc_mr to allow for 2 level PBLEs. Link: https://lore.kernel.org/r/20220705230815.265-2-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-11RDMA/irdma: Fix sleep from invalid context BUGMustafa Ismail1-50/+0
Taking the qos_mutex to process RoCEv2 QP's on netdev events causes a kernel splat. Fix this by removing the handling for RoCEv2 in irdma_cm_teardown_connections that uses the mutex. This handling is only needed for iWARP to avoid having connections established while the link is down or having connections remain functional after the IP address is removed. BUG: sleeping function called from invalid context at kernel/locking/mutex. Call Trace: kernel: dump_stack+0x66/0x90 kernel: ___might_sleep.cold.92+0x8d/0x9a kernel: mutex_lock+0x1c/0x40 kernel: irdma_cm_teardown_connections+0x28e/0x4d0 [irdma] kernel: ? check_preempt_curr+0x7a/0x90 kernel: ? select_idle_sibling+0x22/0x3c0 kernel: ? select_task_rq_fair+0x94c/0xc90 kernel: ? irdma_exec_cqp_cmd+0xc27/0x17c0 [irdma] kernel: ? __wake_up_common+0x7a/0x190 kernel: irdma_if_notify+0x3cc/0x450 [irdma] kernel: ? sched_clock_cpu+0xc/0xb0 kernel: irdma_inet6addr_event+0xc6/0x150 [irdma] Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-11RDMA/irdma: Do not advertise 1GB page size for x722Mustafa Ismail4-2/+5
x722 does not support 1GB page size but the irdma driver incorrectly advertises 1GB page size support for x722 device to ib_core to compute the best page size to use on this MR. This could lead to incorrect start offsets computed by hardware on the MR. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24Merge tag 'v5.18' into rdma.git for-nextJason Gunthorpe3-37/+21
Following patches have dependencies. Resolve the merge conflict in drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names for the fs functions following linux-next: https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-11RDMA/irdma: Add SW mechanism to generate completions on errorMustafa Ismail4-37/+210
HW flushes after QP in error state is not reliable. This can lead to application hang waiting on a completion for outstanding WRs. Implement a SW mechanism to generate completions for any outstanding WR's after the QP is modified to error. This is accomplished by starting a delayed worker after the QP is modified to error and the HW flush is performed. The worker will generate completions that will be returned to the application when it polls the CQ. This mechanism only applies to Kernel applications. Link: https://lore.kernel.org/r/20220425181624.1617-1-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-02RDMA/irdma: Fix possible crash due to NULL netdev in notifierMustafa Ismail1-12/+9
For some net events in irdma_net_event notifier, the netdev can be NULL which will cause a crash in rdma_vlan_dev_real_dev. Fix this by moving all processing to the NETEVENT_NEIGH_UPDATE case where the netdev is guaranteed to not be NULL. Fixes: 6702bc147448 ("RDMA/irdma: Fix netdev notifications for vlan's") Link: https://lore.kernel.org/r/20220425181703.1634-4-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-02RDMA/irdma: Reduce iWARP QP destroy timeShiraz Saleem1-6/+4
QP destroy is synchronous and waits for its refcnt to be decremented in irdma_cm_node_free_cb (for iWARP) which fires after the RCU grace period elapses. Applications running a large number of connections are exposed to high wait times on destroy QP for events like SIGABORT. The long pole for this wait time is the firing of the call_rcu callback during a CM node destroy which can be slow. It holds the QP reference count and blocks the destroy QP from completing. call_rcu only needs to make sure that list walkers have a reference to the cm_node object before freeing it and thus need to wait for grace period elapse. The rest of the connection teardown in irdma_cm_node_free_cb is moved out of the grace period wait in irdma_destroy_connection. Also, replace call_rcu with a simple kfree_rcu as it just needs to do a kfree on the cm_node Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://lore.kernel.org/r/20220425181703.1634-3-shiraz.saleem@intel.com Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-02RDMA/irdma: Flush iWARP QP if modified to ERR from RTR stateTatyana Nikolova2-13/+7
When connection establishment fails in iWARP mode, an app can drain the QPs and hang because flush isn't issued when the QP is modified from RTR state to error. Issue a flush in this case using function irdma_cm_disconn(). Update irdma_cm_disconn() to do flush when cm_id is NULL, which is the case when the QP is in RTR state and there is an error in the connection establishment. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20220425181703.1634-2-shiraz.saleem@intel.com Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-19RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core()Duoming Zhou1-6/+1
There is a deadlock in irdma_cleanup_cm_core(), which is shown below: (Thread 1) | (Thread 2) | irdma_schedule_cm_timer() irdma_cleanup_cm_core() | add_timer() spin_lock_irqsave() //(1) | (wait a time) ... | irdma_cm_timer_tick() del_timer_sync() | spin_lock_irqsave() //(2) (wait timer to stop) | ... We hold cm_core->ht_lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need cm_core->ht_lock in position (2) of thread 2. As a result, irdma_cleanup_cm_core() will block forever. This patch removes the check of timer_pending() in irdma_cleanup_cm_core(), because the del_timer_sync() function will just return directly if there isn't a pending timer. As a result, the lock is redundant, because there is no resource it could protect. Link: https://lore.kernel.org/r/20220418153322.42524-1-duoming@zju.edu.cn Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-06RDMA: Split kernel-only global device caps from uverbs device capsJason Gunthorpe3-6/+3
Split out flags from ib_device::device_cap_flags that are only used internally to the kernel into kernel_cap_flags that is not part of the uapi. This limits the device_cap_flags to being the same bitmap that will be copied to userspace. This cleanly splits out the uverbs flags from the kernel flags to avoid confusion in the flags bitmap. Add some short comments describing which each of the kernel flags is connected to. Remove unused kernel flags. Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-04RDMA/irdma: Remove the redundant variableZhu Yanjun1-5/+2
In the function irdma_puda_get_next_send_wqe, the variable wqe is not necessary. So remove it. Link: https://lore.kernel.org/r/20220323230135.291813-1-yanjun.zhu@intel.com Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>