Age | Commit message (Collapse) | Author | Files | Lines |
|
Rename existing HCA capability relaxed_ordering_read to
relaxed_ordering_read_pci_enabled. This is in accordance with recent PRM
change to better describe the capability, as it's set only if both the
device supports relaxed ordering (RO) read and RO is enabled in PCI
config space.
In addition, add new HCA capability relaxed_ordering_read which is set
if the device supports RO read, regardless of RO in PCI config space.
This will be used in the following patch to allow RO in VFs and VMs.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/caa0002fd8135086357dfcc368e2f5cc73b08480.1681131553.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
relaxed_ordering_read HCA capability is set if both the device supports
relaxed ordering (RO) read and RO is set in PCI config space.
RO in PCI config space can change during runtime. This will change the
value of relaxed_ordering_read HCA capability in FW, but the driver will
not see it since it queries the capabilities only once.
This can lead to the following scenario:
1. RO in PCI config space is enabled.
2. User creates MKey without RO.
3. RO in PCI config space is disabled.
As a result, relaxed_ordering_read HCA capability is turned off in FW
but remains on in driver copy of the capabilities.
4. User requests to reconfig the MKey with RO via UMR.
5. Driver will try to reconfig the MKey with RO read although it
shouldn't (as relaxed_ordering_read HCA capability is really off).
To fix this, check pcie_relaxed_ordering_enabled() before setting RO
read in UMR.
Fixes: 896ec9735336 ("RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/8d39eb8317e7bed1a354311a20ae707788fd94ed.1681131553.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
pcie_relaxed_ordering_enabled() check was added to avoid a syndrome when
creating a MKey with relaxed ordering (RO) enabled when the driver's
relaxed_ordering_{read,write} HCA capabilities are out of sync with FW.
While this can happen with relaxed_ordering_read, it can't happen with
relaxed_ordering_write as it's set if the device supports RO write,
regardless of RO in PCI config space, and thus can't change during
runtime.
Therefore, drop the pcie_relaxed_ordering_enabled() check for
relaxed_ordering_write while keeping it for relaxed_ordering_read.
Doing so will also allow the usage of RO write in VFs and VMs (where RO
in PCI config space is not reported/emulated properly).
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/7e8f55e31572c1702d69cae015a395d3a824a38a.1681131553.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Make it clearer what is going on by adding a function to go back from the
"virtual" dma_addr to a kva and another to a struct page. This is used in the
ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib).
Call them instead of a naked casting and virt_to_page() when working with dma_addr
values encoded by the various ib_map functions.
This also fixes the virt_to_page() casting problem Linus Walleij has been
chasing.
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In the function rxe_create_qp(), rxe_qp_from_init() is called to
initialize qp, internally things like rxe_init_task are not setup until
rxe_qp_init_req().
If an error occurred before this point then the unwind will call
rxe_cleanup() and eventually to rxe_qp_do_cleanup()/rxe_cleanup_task()
which will oops when trying to access the uninitialized spinlock.
If rxe_init_task is not executed, rxe_cleanup_task will not be called.
Reported-by: syzbot+cfcc1a3c85be15a40cba@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Fixes: 2d4b21e0a291 ("IB/rxe: Prevent from completer to operate on non valid QP")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
There is no need to zero 'pktsize' bytes of 'buf', only the header needs
to be cleared, to be safe.
All the other bytes are already written with some memcpy() at the end of
the function.
Doing so also gives the opportunity to the compiler to avoid the memset()
call. It can be inlined now that the length is known as compile time.
Link: https://lore.kernel.org/r/098e3c397be0436f1867899245ecfe656c472110.1675369386.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
In a previous patch TASKLET_STATE_SCHED was used as a mask but it is a bit
position instead. Add the missing shift.
Link: https://lore.kernel.org/r/20230329193308.7489-1-rpearsonhpe@gmail.com
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/linux-rdma/8a054b78-6d50-4bc6-8d8a-83f85fbdb82f@kili.mountain/
Fixes: d94671632572 ("RDMA/rxe: Rewrite rxe_task.c")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Place struct mmu_rb_handler on cache line start like so:
struct mmu_rb_handler *h;
void *free_ptr;
int ret;
free_ptr = kzalloc(sizeof(*h) + cache_line_size() - 1, GFP_KERNEL);
if (!free_ptr)
return -ENOMEM;
h = PTR_ALIGN(free_ptr, cache_line_size());
Additionally, move struct mmu_rb_handler fields "root" and "ops_args" to
start after the next cacheline using the "____cacheline_aligned_in_smp"
annotation.
Allocating an additional cache_line_size() - 1 bytes to place
struct mmu_rb_handler on a cache line start does increase memory
consumption.
However, few struct mmu_rb_handler are created when hfi1 is in use.
As mmu_rb_handler->root and mmu_rb_handler->ops_args are accessed
frequently, the advantage of having them both within a cache line is
expected to outweigh the disadvantage of the additional memory
consumption per struct mmu_rb_handler.
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088636963.3027109.16959757980497822530.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
hfi1 user SDMA request processing has two bugs that can cause data
corruption for user SDMA requests that have multiple payload iovecs
where an iovec other than the tail iovec does not run up to the page
boundary for the buffer pointed to by that iovec.a
Here are the specific bugs:
1. user_sdma_txadd() does not use struct user_sdma_iovec->iov.iov_len.
Rather, user_sdma_txadd() will add up to PAGE_SIZE bytes from iovec
to the packet, even if some of those bytes are past
iovec->iov.iov_len and are thus not intended to be in the packet.
2. user_sdma_txadd() and user_sdma_send_pkts() fail to advance to the
next iovec in user_sdma_request->iovs when the current iovec
is not PAGE_SIZE and does not contain enough data to complete the
packet. The transmitted packet will contain the wrong data from the
iovec pages.
This has not been an issue with SDMA packets from hfi1 Verbs or PSM2
because they only produce iovecs that end short of PAGE_SIZE as the tail
iovec of an SDMA request.
Fixing these bugs exposes other bugs with the SDMA pin cache
(struct mmu_rb_handler) that get in way of supporting user SDMA requests
with multiple payload iovecs whose buffers do not end at PAGE_SIZE. So
this commit fixes those issues as well.
Here are the mmu_rb_handler bugs that non-PAGE_SIZE-end multi-iovec
payload user SDMA requests can hit:
1. Overlapping memory ranges in mmu_rb_handler will result in duplicate
pinnings.
2. When extending an existing mmu_rb_handler entry (struct mmu_rb_node),
the mmu_rb code (1) removes the existing entry under a lock, (2)
releases that lock, pins the new pages, (3) then reacquires the lock
to insert the extended mmu_rb_node.
If someone else comes in and inserts an overlapping entry between (2)
and (3), insert in (3) will fail.
The failure path code in this case unpins _all_ pages in either the
original mmu_rb_node or the new mmu_rb_node that was inserted between
(2) and (3).
3. In hfi1_mmu_rb_remove_unless_exact(), mmu_rb_node->refcount is
incremented outside of mmu_rb_handler->lock. As a result, mmu_rb_node
could be evicted by another thread that gets mmu_rb_handler->lock and
checks mmu_rb_node->refcount before mmu_rb_node->refcount is
incremented.
4. Related to #2 above, SDMA request submission failure path does not
check mmu_rb_node->refcount before freeing mmu_rb_node object.
If there are other SDMA requests in progress whose iovecs have
pointers to the now-freed mmu_rb_node(s), those pointers to the
now-freed mmu_rb nodes will be dereferenced when those SDMA requests
complete.
Fixes: 7be85676f1d1 ("IB/hfi1: Don't remove RB entry when not needed.")
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088636445.3027109.10054635277810177889.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
hfi1_mmu_rb_remove_unless_exact() did not move mmu_rb_node objects in
mmu_rb_handler->lru_list after getting a cache hit on an mmu_rb_node.
As a result, hfi1_mmu_rb_evict() was not guaranteed to evict truly
least-recently used nodes.
This could be a performance issue for an application when that
application:
- Uses some long-lived buffers frequently.
- Uses a large number of buffers once.
- Hits the mmu_rb_handler cache size or pinned-page limits, forcing
mmu_rb_handler cache entries to be evicted.
In this case, the one-time use buffers cause the long-lived buffer
entries to eventually filter to the end of the LRU list where
hfi1_mmu_rb_evict() will consider evicting a frequently-used long-lived
entry instead of evicting one of the one-time use entries.
Fix this by inserting new mmu_rb_node at the tail of
mmu_rb_handler->lru_list and move mmu_rb_ndoe to the tail of
mmu_rb_handler->lru_list when the mmu_rb_node is a hit in
hfi1_mmu_rb_remove_unless_exact(). Change hfi1_mmu_rb_evict() to evict
from the head of mmu_rb_handler->lru_list instead of the tail.
Fixes: 0636e9ab8355 ("IB/hfi1: Add cache evict LRU list")
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088635931.3027109.10423156330761536044.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
These warnings can cause build failure:
In file included from ./include/trace/define_trace.h:102,
from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
from drivers/infiniband/hw/hfi1/trace.h:15,
from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_get_offsets_hfi1_trace_template’:
./include/trace/trace_events.h:261:9: warning: function ‘trace_event_get_offsets_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
struct trace_event_raw_##call __maybe_unused *entry; \
^~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
DECLARE_EVENT_CLASS(hfi1_trace_template,
^~~~~~~~~~~~~~~~~~~
In file included from ./include/trace/define_trace.h:102,
from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
from drivers/infiniband/hw/hfi1/trace.h:15,
from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_raw_event_hfi1_trace_template’:
./include/trace/trace_events.h:386:9: warning: function ‘trace_event_raw_event_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
struct trace_event_raw_##call *entry; \
^~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
DECLARE_EVENT_CLASS(hfi1_trace_template,
^~~~~~~~~~~~~~~~~~~
In file included from ./include/trace/define_trace.h:103,
from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
from drivers/infiniband/hw/hfi1/trace.h:15,
from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘perf_trace_hfi1_trace_template’:
./include/trace/perf.h:70:9: warning: function ‘perf_trace_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
struct hlist_head *head; \
^~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
DECLARE_EVENT_CLASS(hfi1_trace_template,
^~~~~~~~~~~~~~~~~~~
Solution adapted here is similar to the one in fbbc95a49d5b0
Signed-off-by: Ehab Ababneh <ehab.ababneh@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088635415.3027109.5711716700328939402.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The hfi1_cdbg trace mechanism appends a newline. Remove trailing
newlines from all format strings.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088634897.3027109.10401662436950683555.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When unregistering MAD agent, srpt module has a non-null check
for 'mad_agent' pointer before invoking ib_unregister_mad_agent().
This check can pass if 'mad_agent' variable holds an error value.
The 'mad_agent' can have an error value for a short window when
srpt_add_one() and srpt_remove_one() is executed simultaneously.
In srpt module, added a valid pointer check for 'sport->mad_agent'
before unregistering MAD agent.
This issue can hit when RoCE driver unregisters ib_device
Stack Trace:
------------
BUG: kernel NULL pointer dereference, address: 000000000000004d
PGD 145003067 P4D 145003067 PUD 2324fe067 PMD 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 10 PID: 4459 Comm: kworker/u80:0 Kdump: loaded Tainted: P
Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.5.4 01/13/2020
Workqueue: bnxt_re bnxt_re_task [bnxt_re]
RIP: 0010:_raw_spin_lock_irqsave+0x19/0x40
Call Trace:
ib_unregister_mad_agent+0x46/0x2f0 [ib_core]
IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
? __schedule+0x20b/0x560
srpt_unregister_mad_agent+0x93/0xd0 [ib_srpt]
srpt_remove_one+0x20/0x150 [ib_srpt]
remove_client_context+0x88/0xd0 [ib_core]
bond0: (slave p2p1): link status definitely up, 100000 Mbps full duplex
disable_device+0x8a/0x160 [ib_core]
bond0: active interface up!
? kernfs_name_hash+0x12/0x80
(NULL device *): Bonding Info Received: rdev: 000000006c0b8247
__ib_unregister_device+0x42/0xb0 [ib_core]
(NULL device *): Master: mode: 4 num_slaves:2
ib_unregister_device+0x22/0x30 [ib_core]
(NULL device *): Slave: id: 105069936 name:p2p1 link:0 state:0
bnxt_re_stopqps_and_ib_uninit+0x83/0x90 [bnxt_re]
bnxt_re_alloc_lag+0x12e/0x4e0 [bnxt_re]
Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Link: https://lore.kernel.org/r/20230406042549.507328-1-saravanan.vajravel@broadcom.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Trace icm_send_rej event before the cm state is reset to idle, so that
correct cm state will be logged. For example when an incoming request is
rejected, the old trace log was:
icm_send_rej: local_id=961102742 remote_id=3829151631 state=IDLE reason=REJ_CONSUMER_DEFINED
With this patch:
icm_send_rej: local_id=312971016 remote_id=3778819983 state=MRA_REQ_SENT reason=REJ_CONSUMER_DEFINED
Fixes: 8dc105befe16 ("RDMA/cm: Add tracepoints to track MAD send operations")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/20230330072351.481200-1-markzhang@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Enable Congesion control by default. Issue FW command
enable the CC during driver load and disable it during
unload.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-8-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Use the new TLV APIs for existing slow path commands. The TLV
APIs will be used to populate extended headers for some of the
Firmware commands, which will be introduced in the patches that
follow.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-7-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Header file to support TLV encapsulated commands. These
functions will be used by the driver in the follow up patches.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-6-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Reducing the number of arguments to bnxt_qplib_rcfw_send_message
by enclosing all its arguments into a command message structure.
Use the same struct while passing the command information to
send_message.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Convert RCFW_CMD_PREP macro to static inline function.
Also, remove the cmd_flags passed as none of the functions
are using it.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
bnxt_en driver does the queue mapping for RoCE traffic. Removing the
queue mapping from RoCE driver.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Updating the HW structures to the latest version.
This is copied from the code maintained internally. No functionality
changes in this patch. Code is re-organized to match the file maintained
in the internal tree. Also, New HW interface structures are added, which
will be used by the drivers in future.
CC: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
syzbot is reporting that siw_netdev_event(NETDEV_UNREGISTER) cannot destroy
siw_device created after unshare(CLONE_NEWNET) due to net namespace check.
It seems that this check was by error there and should be removed.
Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Fixes: bdcf26bf9b3a ("rdma/siw: network and RDMA core interface")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/a44e9ac5-44e2-d575-9e30-02483cc7ffd1@I-love.SAKURA.ne.jp
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The call netdev_{put, hold} of dev_{put, hold} will check NULL,
so there is no need to check before using dev_{put, hold},
remove it to silence the warnings:
./drivers/infiniband/core/cma.c:713:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.
./drivers/infiniband/core/cma.c:2433:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4668
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230331010633.63261-1-yang.lee@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
clang with W=1 reports
drivers/infiniband/hw/qib/qib_file_ops.c:487:20: error: variable
'cnt' set but not used [-Werror,-Wunused-but-set-variable]
u32 tid, ctxttid, cnt, limit, tidcnt;
^
drivers/infiniband/hw/qib/qib_file_ops.c:1771:9: error: variable
'cnt' set but not used [-Werror,-Wunused-but-set-variable]
int i, cnt = 0, maxtid = ctxt_tidbase + dd->rcvtidcnt;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230330235800.1845815-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
clang with W=1 reports
drivers/infiniband/hw/mlx5/devx.c:1996:6: error: variable
'num_alloc_xa_entries' set but not used [-Werror,-Wunused-but-set-variable]
int num_alloc_xa_entries = 0;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230330153607.1838750-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
This commit doesn't change any logic.
Reviewed-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20230330131333.37900-3-mgurtovoy@nvidia.com
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Move this common logic into iser_create_send_desc instead of duplicating
the code.
Reviewed-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20230330131333.37900-2-mgurtovoy@nvidia.com
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The removed macros are old leftovers.
Reviewed-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20230330131333.37900-1-mgurtovoy@nvidia.com
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
There is no need to print any debug messages after failure to
allocate memory, because kernel will print OOM dumps anyway.
Together with removal of these messages, remove useless goto jumps.
Fixes: 5bf944f24129 ("RDMA/rxe: Add error messages")
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/ea43486f-43dd-4054-b1d5-3a0d202be621@kili.mountain
Link: https://lore.kernel.org/r/d3cedf723b84e73e8062a67b7489d33802bafba2.1680113597.git.leon@kernel.org
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Remove the tasklet call in rxe_cq.c and also the is_dying in the
cq struct. There is no reason for the rxe driver to defer the call
to the cq completion handler by scheduling a tasklet. rxe_cq_post()
is not called in a hard irq context.
The rxe driver currently is incorrect because the tasklet call is
made without protecting the cq pointer with a reference from having
the underlying memory freed before the deferred routine is called.
Executing the comp_handler inline fixes this problem.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Link: https://lore.kernel.org/r/20230327215643.10410-1-rpearsonhpe@gmail.com
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
clang with W=1 reports
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1592:6: error: variable
'discard_cnt' set but not used [-Werror,-Wunused-but-set-variable]
int discard_cnt = 0;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230326120959.1351948-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
clang with W=1 reports
drivers/infiniband/hw/bnxt_re/qplib_fp.c:303:6: error: variable
'num_srqne_processed' set but not used [-Werror,-Wunused-but-set-variable]
int num_srqne_processed = 0;
^
drivers/infiniband/hw/bnxt_re/qplib_fp.c:304:6: error: variable
'num_cqne_processed' set but not used [-Werror,-Wunused-but-set-variable]
int num_cqne_processed = 0;
^
These variables are not used so remove them.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230325140559.1336056-1-trix@redhat.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}
pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Link: https://lore.kernel.org/r/20230323115742.13836-1-cai.huoqing@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Previously for switchdev only per device counters were supported.
Currently we allocate counters for switchdev per port, which also
includes the ports that belong to VF representors in order to expose
them to users through the rdma tool, allowing the host to track the VFs
statistics through their representors counters.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/ea31e1103c125cd27931ba213f307cde30d2eaed.1679566038.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add resize_cq verb support for user space CQs. Resize operation for
kernel CQs are not supported now.
Driver should free the current CQ only after user library polls
for all the completions and switch to new CQ. So after the resize_cq
is returned from the driver, user library polls for existing completions
and store it as temporary data. Once library reaps all completions in the
current CQ, it invokes the ibv_cmd_poll_cq to inform the driver about
the resize_cq completion. Adding a check for user CQs in driver's
poll_cq and complete the resize operation for user CQs.
Updating uverbs_cmd_mask with poll_cq to support this.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1678868215-23626-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Hardware's page size is 4096, but the kernel's page size may vary. Driver
should use hardware's page size when communicating with hardware.
Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation")
Link: https://lore.kernel.org/r/20230307102924.70577-2-chengyou@linux.alibaba.com
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This patch is a major rewrite of the tasklet routines in rxe_task.c. The
main motivation for this is the realization that the code violates the
safety of the qp pointer by correct reference counting. When a tasklet is
scheduled from a verbs API the calling thread has a valid reference to the
qp and schedules the tasklet to run at a later time carrying a pointer to
the qp. Once the calling code returns however the qp can be destroyed at
any time. In order to correct this a reference to the qp must be taken
when the task is scheduled and held until it finishes running. This is
complicated by the tasklet library not alwys running a task that is
scheduled depending on whether someone else has scheduled it.
This patch moves the logic for deciding whether to run or schedule a task
outside of do_task() and guarantees that there is only one copy of the
task scheduled or running at a time.
Secondly the separate flags controlling teardown and draining of the task
are included in the task state machine and all references to the state are
protected by spinlocks to avoid consistency and memory barrier issues.
Link: https://lore.kernel.org/r/20230304174533.11296-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Replace rxe_run_task() by rxe_sched_task() when tasks call each other.
These are not performance critical and mainly involve error paths but they
run the risk of causing deadlocks.
Link: https://lore.kernel.org/r/20230304174533.11296-8-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The subroutine __rxe_do_task is not thread safe and it has no way to
guarantee that the tasks, which are designed with the assumption that they
are non-reentrant, are not reentered. All of its uses are non-performance
critical.
This patch replaces calls to __rxe_do_task with calls to
rxe_sched_task. It also removes irrelevant or unneeded if tests.
Instead of calling the task machinery a single call to the tasklet
function (rxe_requester, etc.) is sufficient to draing the queues if task
execution has been disabled or stopped.
Together these changes allow the removal of __rxe_do_task.
Link: https://lore.kernel.org/r/20230304174533.11296-7-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Currently each of the three tasklets requester, completer and responder in
the rxe driver take and release a reference to the qp argument at the
beginning and end of the subroutines. The caller passing in the qp
argument should be responsible for holding a reference to qp so these are
not required. Further doing so breaks the qp cleanup code in
rxe_qp_do_cleanup which calls these routines after all the references have
been dropped so they cannot drain the packet and work request queues as
intended.
In fact if these routines are deferred by calling tasklet_schedule there
is no guarantee that the calling code does have a qp reference. That is a
bug in rxe_task.c which will be fixed later in this series.
Link: https://lore.kernel.org/r/20230304174533.11296-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Cleanup the handling of qp in the error state, reset state and during
rxe_qp_do_cleanup. Make the same as rxe_resp.c
Link: https://lore.kernel.org/r/20230304174533.11296-5-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Cleanup the handling of qp in the error state, reset state and during
rxe_qp_do_cleanup. The error state does about the same thing as the others
but has code spread all over.
This patch combines them in a cleaner way.
Link: https://lore.kernel.org/r/20230304174533.11296-4-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Originally is was thought that the tasklet machinery in rxe_task.c would
be used in other applications but that has not happened for years. This
patch replaces the 'void *arg' by struct 'rxe_qp *qp' in the parameters to
the tasklet calls. This change will have no affect on performance but may
make the code a little clearer.
Link: https://lore.kernel.org/r/20230304174533.11296-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This patch adds error and debug messages so that every interaction
with rdma-core through a verbs API call or a completion error return
will generate at least one error message backed up by debug messages
with more detail.
With dynamic debugging one can follow up after seeing an error message
by turning on the appropriate debug messages.
Link: https://lore.kernel.org/r/20230303221623.8053-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Extend the dbg log messages (e.g. rxe_dbg_xxx) to include
err and info types. rxe.c is modified to use these new log
messages as examples.
Link: https://lore.kernel.org/r/20230303221623.8053-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Replace the name rxe_dbg with rxe_dbg_dev which better matches
the remaining rxe_dbg_xxx macros for debug messages with a
rxe device parameter. Reuse the name rxe_dbg for debug messages
which do not have a rxe device parameter.
Link: https://lore.kernel.org/r/20230303221623.8053-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
'exists' looks like a boolean. This patch replaces it by the
normal name used for the rxe device, 'rxe', which should be a
little less confusing. The second rxe_dbg() message is
incorrect since rxe is known to be NULL and this will cause a
seg fault if this message were ever sent. Replace it by pr_debug
for the moment.
Fixes: c6aba5ea0055 ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c")
Link: https://lore.kernel.org/r/20230303221623.8053-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
GCC-13 (and Clang)[1] does not like to access a partially allocated
object, since it cannot reason about it for bounds checking.
In this case 140 bytes are allocated for an object of type struct
ib_umad_packet:
packet = kzalloc(sizeof(*packet) + IB_MGMT_RMPP_HDR, GFP_KERNEL);
However, notice that sizeof(*packet) is only 104 bytes:
struct ib_umad_packet {
struct ib_mad_send_buf * msg; /* 0 8 */
struct ib_mad_recv_wc * recv_wc; /* 8 8 */
struct list_head list; /* 16 16 */
int length; /* 32 4 */
/* XXX 4 bytes hole, try to pack */
struct ib_user_mad mad __attribute__((__aligned__(8))); /* 40 64 */
/* size: 104, cachelines: 2, members: 5 */
/* sum members: 100, holes: 1, sum holes: 4 */
/* forced alignments: 1, forced holes: 1, sum forced holes: 4 */
/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
and 36 bytes extra bytes are allocated for a flexible-array member in
struct ib_user_mad:
include/rdma/ib_mad.h:
120 enum {
...
123 IB_MGMT_RMPP_HDR = 36,
... }
struct ib_user_mad {
struct ib_user_mad_hdr hdr; /* 0 64 */
/* --- cacheline 1 boundary (64 bytes) --- */
__u64 data[] __attribute__((__aligned__(8))); /* 64 0 */
/* size: 64, cachelines: 1, members: 2 */
/* forced alignments: 1 */
} __attribute__((__aligned__(8)));
So we have sizeof(*packet) + IB_MGMT_RMPP_HDR == 140 bytes
Then the address of the flex-array member (for which only 36 bytes were
allocated) is casted and copied into a pointer to struct ib_rmpp_mad,
which, in turn, is of size 256 bytes:
rmpp_mad = (struct ib_rmpp_mad *) packet->mad.data;
struct ib_rmpp_mad {
struct ib_mad_hdr mad_hdr; /* 0 24 */
struct ib_rmpp_hdr rmpp_hdr; /* 24 12 */
u8 data[220]; /* 36 220 */
/* size: 256, cachelines: 4, members: 3 */
};
The thing is that those 36 bytes allocated for flex-array member data
in struct ib_user_mad onlly account for the size of both struct ib_mad_hdr
and struct ib_rmpp_hdr, but nothing is left for array u8 data[220].
So, the compiler is legitimately complaining about accessing an object
for which not enough memory was allocated.
Apparently, the only members of struct ib_rmpp_mad that are relevant
(that are actually being used) in function ib_umad_write() are mad_hdr
and rmpp_hdr. So, instead of casting packet->mad.data to
(struct ib_rmpp_mad *) create a new structure
struct ib_rmpp_mad_hdr {
struct ib_mad_hdr mad_hdr;
struct ib_rmpp_hdr rmpp_hdr;
} __packed;
and cast packet->mad.data to (struct ib_rmpp_mad_hdr *).
Notice that
IB_MGMT_RMPP_HDR == sizeof(struct ib_rmpp_mad_hdr) == 36 bytes
Refactor the rest of the code, accordingly.
Fix the following warnings seen under GCC-13 and -Warray-bounds:
drivers/infiniband/core/user_mad.c:564:50: warning: array subscript ‘struct ib_rmpp_mad[0]’ is partly outside array bounds of ‘unsigned char[140]’ [-Warray-bounds=]
drivers/infiniband/core/user_mad.c:566:42: warning: array subscript ‘struct ib_rmpp_mad[0]’ is partly outside array bounds of ‘unsigned char[140]’ [-Warray-bounds=]
drivers/infiniband/core/user_mad.c:618:25: warning: array subscript ‘struct ib_rmpp_mad[0]’ is partly outside array bounds of ‘unsigned char[140]’ [-Warray-bounds=]
drivers/infiniband/core/user_mad.c:622:44: warning: array subscript ‘struct ib_rmpp_mad[0]’ is partly outside array bounds of ‘unsigned char[140]’ [-Warray-bounds=]
Link: https://github.com/KSPP/linux/issues/273
Link: https://godbolt.org/z/oYWaGM4Yb [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZBpB91qQcB10m3Fw@work
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|