| Age | Commit message (Collapse) | Author | Files | Lines |
|
phylink
The function allows for the configuration of a fixed link state for a given
phylink instance. This addition is particularly useful for network devices that
operate with a fixed link configuration, where the link parameters do not change
dynamically. By using `phylink_set_fixed_link()`, drivers can easily set up
the fixed link state during initialization or configuration changes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2024-08-29
HW-Managed Flow Steering in mlx5 driver
Yevgeny Kliteynik says:
=======================
1. Overview
-----------
ConnectX devices support packet matching, modification, and redirection.
This functionality is referred as Flow Steering.
To configure a steering rule, the rule is written to the device-owned
memory. This memory is accessed and cached by the device when processing
a packet.
The first implementation of Flow Steering was done in FW, and it is
referred in the mlx5 driver as Device-Managed Flow Steering (DMFS).
Later we introduced SW-managed Flow Steering (SWS or SMFS), where the
driver is writing directly to the device's configuration memory (ICM)
through RC QP using RDMA operations (RDMA-read and RDAM-write), thus
achieving higher rates of rule insertion/deletion.
Now we introduce a new flow steering implementation: HW-Managed Flow
Steering (HWS or HMFS).
In this new approach, the driver is configuring steering rules directly
to the HW using the WQs with a special new type of WQE. This way we can
reach higher rule insertion/deletion rate with much lower CPU utilization
compared to SWS.
The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
- HW calculates CRC for each entry
- HW handles tree hash collisions
- HW & FW manage objects refcount
+ HW keeps cache coherency:
- HW provides tree access locking and synchronization
- HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
- Dedicated HW components that handle insertion
2. Performance
--------------
Measuring Connection Tracking with simple IPv4 flows w/o NAT, we
are able to get ~5 times more flows offloaded per second using HWS.
3. Configuration
----------------
The enablement of HWS mode in eswitch manager is done using the same
devlink param that is already used for switching between FW-managed
steering and SW-managed steering modes:
# devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs
4. Upstream Submission
----------------------
HWS support consists of 3 main components:
+ Steering:
- The lower layer that exposes HWS API to upper layers and implements
all the management of flow steering building blocks
+ FS-Core
- Implementation of fs_hws layer to enable fs_core to use HWS instead
of FW or SW steering
- Create HW steering action pools to utilize the ability of HWS to
share steering actions among different rules
- Add support for configuring HWS mode through devlink command,
similar to configuring SWS mode
+ Connection Tracking
- Implementation of CT support for HW steering
- Hooks up the CT ops for the new steering mode and uses the HWS API
to implement connection tracking.
Because of the large number of patches, we need to perform the submission
in several separate patch series. This series is the first submission that
lays the ground work for the next submissions, where an actual user of HWS
will be added.
5. Patches in this series
-------------------------
This patch series contains implementation of the first bullet from above.
=======================
* tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: HWS, added API and enabled HWS support
net/mlx5: HWS, added send engine and context handling
net/mlx5: HWS, added debug dump and internal headers
net/mlx5: HWS, added backward-compatible API handling
net/mlx5: HWS, added memory management handling
net/mlx5: HWS, added vport handling
net/mlx5: HWS, added modify header pattern and args handling
net/mlx5: HWS, added FW commands handling
net/mlx5: HWS, added matchers functionality
net/mlx5: HWS, added definers handling
net/mlx5: HWS, added rules handling
net/mlx5: HWS, added tables handling
net/mlx5: HWS, added actions handling
net/mlx5: Added missing definitions in preparation for HW Steering
net/mlx5: Added missing mlx5_ifc definition for HW Steering
====================
Link: https://patch.msgid.link/20240909181250.41596-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-09-10
1) Remove an unneeded WARN_ON on packet offload.
From Patrisious Haddad.
2) Add a copy from skb_seq_state to buffer function.
This is needed for the upcomming IPTFS patchset.
From Christian Hopps.
3) Spelling fix in xfrm.h.
From Simon Horman.
4) Speed up xfrm policy insertions.
From Florian Westphal.
5) Add and revert a patch to support xfrm interfaces
for packet offload. This patch was just half cooked.
6) Extend usage of the new xfrm_policy_is_dead_or_sk helper.
From Florian Westphal.
7) Update comments on sdb and xfrm_policy.
From Florian Westphal.
8) Fix a null pointer dereference in the new policy insertion
code From Florian Westphal.
9) Fix an uninitialized variable in the new policy insertion
code. From Nathan Chancellor.
* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
xfrm: policy: fix null dereference
Revert "xfrm: add SA information to the offloaded packet"
xfrm: minor update to sdb and xfrm_policy comments
xfrm: policy: use recently added helper in more places
xfrm: add SA information to the offloaded packet
xfrm: policy: remove remaining use of inexact list
xfrm: switch migrate to xfrm_policy_lookup_bytype
xfrm: policy: don't iterate inexact policies twice at insert time
selftests: add xfrm policy insertion speed test script
xfrm: Correct spelling in xfrm.h
net: add copy from skb_seq_state to buffer function
xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================
Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive
path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter
out rx software timestamp report, especially after a process turns on
netstamp_needed_key which can time stamp every incoming skb.
Previously, we found out if an application starts first which turns on
netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE
could also get rx timestamp. Now we handle this case by introducing this
new flag without breaking users.
Quoting Willem to explain why we need the flag:
"why a process would want to request software timestamp reporting, but
not receive software timestamp generation. The only use I see is when
the application does request
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE."
Similarly, this new flag could also be used for hardware case where we
can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive
hardware receive timestamp.
Another thing about errqueue in this patch I have a few words to say:
In this case, we need to handle the egress path carefully, or else
reporting the tx timestamp will fail. Egress path and ingress path will
finally call sock_recv_timestamp(). We have to distinguish them.
Errqueue is a good indicator to reflect the flow direction.
Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
By moving the fpe_cfg field to the stmmac_priv data, stmmac_fpe_cfg
becomes platform-data eventually, instead of a run-time config.
Suggested-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://patch.msgid.link/d9b3d7ecb308c5e39778a4c8ae9df288a2754379.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dev_pick_tx_cpu_id() has been introduced with two users by
commit a4ea8a3dacc3 ("net: Add generic ndo_select_queue functions").
The use in AF_PACKET has been removed in 2019 by
commit b71b5837f871 ("packet: rework packet_pick_tx_queue() to use common code selection")
The other user was a Netlogic XLP driver, removed in 2021 by
commit 47ac6f567c28 ("staging: Remove Netlogic XLP network driver").
It's relatively unlikely that any modern driver will need an
.ndo_select_queue implementation which picks purely based on CPU ID
and skips XPS, delete dev_pick_tx_cpu_id()
Found by code inspection.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240906161059.715546-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As part of preparation for HWS, added missing definitions
in qp.h and fs_core.h:
- FS_FT_FDB_RX/TX table types that are used by HWS in addition
to an existing FS_FT_FDB
- MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE that is used by HWS to
require fence in WQE
Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add mlx5_ifc definitions that are required for HWS support.
Note that due to change in the mlx5_ifc_flow_table_context_bits
structure that now includes both SWS and HWS bits in a union,
this patch also includes small change in one of SWS files that
was required for compilation.
Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The ability to read the PHC (Physical Hardware Clock) alongside
multiple system clocks is currently dependent on the specific
hardware architecture. This limitation restricts the use of
PTP_SYS_OFFSET_PRECISE to certain hardware configurations.
The generic soultion which would work across all architectures
is to read the PHC along with the latency to perform PHC-read as
offered by PTP_SYS_OFFSET_EXTENDED which provides pre and post
timestamps. However, these timestamps are currently limited
to the CLOCK_REALTIME timebase. Since CLOCK_REALTIME is affected
by NTP (or similar time synchronization services), it can
experience significant jumps forward or backward. This hinders
the precise latency measurements that PTP_SYS_OFFSET_EXTENDED
is designed to provide.
This problem could be addressed by supporting MONOTONIC_RAW
timestamps within PTP_SYS_OFFSET_EXTENDED. Unlike CLOCK_REALTIME
or CLOCK_MONOTONIC, the MONOTONIC_RAW timebase is unaffected
by NTP adjustments.
This enhancement can be implemented by utilizing one of the three
reserved words within the PTP_SYS_OFFSET_EXTENDED struct to pass
the clock-id for timestamps. The current behavior aligns with
clock-id for CLOCK_REALTIME timebase (value of 0), ensuring
backward compatibility of the UAPI.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
Patch #1 adds ctnetlink support for kernel side filtering for
deletions, from Changliang Wu.
Patch #2 updates nft_counter support to Use u64_stats_t,
from Sebastian Andrzej Siewior.
Patch #3 uses kmemdup_array() in all xtables frontends,
from Yan Zhen.
Patch #4 is a oneliner to use ERR_CAST() in nf_conntrack instead
opencoded casting, from Shen Lichuan.
Patch #5 removes unused argument in nftables .validate interface,
from Florian Westphal.
Patch #6 is a oneliner to correct a typo in nftables kdoc,
from Simon Horman.
Patch #7 fixes missing kdoc in nftables, also from Simon.
Patch #8 updates nftables to handle timeout less than CONFIG_HZ.
Patch #9 rejects element expiration if timeout is zero,
otherwise it is silently ignored.
Patch #10 disallows element expiration larger than timeout.
Patch #11 removes unnecessary READ_ONCE annotation while mutex is held.
Patch #12 adds missing READ_ONCE/WRITE_ONCE annotation in dynset.
Patch #13 annotates data-races around element expiration.
Patch #14 allocates timeout and expiration in one single set element
extension, they are tighly couple, no reason to keep them
separated anymore.
Patch #15 updates nftables to interpret zero timeout element as never
times out. Note that it is already possible to declare sets
with elements that never time out but this generalizes to all
kind of set with timeouts.
Patch #16 supports for element timeout and expiration updates.
* tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: set element timeout update support
netfilter: nf_tables: zero timeout means element never times out
netfilter: nf_tables: consolidate timeout extension for elements
netfilter: nf_tables: annotate data-races around element expiration
netfilter: nft_dynset: annotate data-races around set timeout
netfilter: nf_tables: remove annotation to access set timeout while holding lock
netfilter: nf_tables: reject expiration higher than timeout
netfilter: nf_tables: reject element expiration with no timeout
netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire
netfilter: nf_tables: Add missing Kernel doc
netfilter: nf_tables: Correct spelling in nf_tables.h
netfilter: nf_tables: drop unused 3rd argument from validate callback ops
netfilter: conntrack: Convert to use ERR_CAST()
netfilter: Use kmemdup_array instead of kmemdup for multiple allocation
netfilter: nft_counter: Use u64_stats_t for statistic.
netfilter: ctnetlink: support CTA_FILTER for flush
====================
Link: https://patch.msgid.link/20240905232920.5481-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace ksz8830 with ksz88x3 for CHIP_ID definition and other
strings. This due to KSZ8830 not being an actual switch but the Chip
ID shared among KSZ8863/8873 switches, impossible to differentiate
from their Chip ID or Revision ID registers.
Now all KSZ*_CHIP_ID macros refer to actual, existing switches which
removes confusion.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/phy/phy_device.c
2560db6ede1a ("net: phy: Fix missing of_node_put() for leds")
1dce520abd46 ("net: phy: Use for_each_available_child_of_node_scoped()")
https://lore.kernel.org/20240904115823.74333648@canb.auug.org.au
Adjacent changes:
drivers/net/ethernet/xilinx/xilinx_axienet.h
drivers/net/ethernet/xilinx/xilinx_axienet_main.c
858430db28a5 ("net: xilinx: axienet: Fix race in axienet_stop")
76abb5d675c4 ("net: xilinx: axienet: Add statistics support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature")
napi_defer_irqs was added to net_device and napi_defer_irqs_count was
added to napi_struct, both as type int.
This value never goes below zero, so there is not reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.
The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.
Before this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
$ cat /sys/class/net/eth4/napi_defer_hard_irqs
-2147483647
After this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
bash: line 0: echo: write error: Numerical result out of range
Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned:
include/linux/netdevice.h: unsigned int tx_queue_len;
And has an overflow check:
dev_change_tx_queue_len(..., unsigned long new_len):
if (new_len != (unsigned int)new_len)
return -ERANGE;
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, bluetooth and wireless.
No known regressions at this point. Another calm week, but chances are
that has more to do with vacation season than the quality of our work.
Current release - new code bugs:
- smc: prevent NULL pointer dereference in txopt_get
- eth: ti: am65-cpsw: number of XDP-related fixes
Previous releases - regressions:
- Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
BREDR/LE", it breaks existing user space
- Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
later problems with suspend
- can: mcp251x: fix deadlock if an interrupt occurs during
mcp251x_open
- eth: r8152: fix the firmware communication error due to use of bulk
write
- ptp: ocp: fix serial port information export
- eth: igb: fix not clearing TimeSync interrupts for 82580
- Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo
Previous releases - always broken:
- eth: intel: fix crashes and bugs when reconfiguration and resets
happening in parallel
- wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()
Misc:
- docs: netdev: document guidance on cleanup.h"
* tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
ila: call nf_unregister_net_hooks() sooner
tools/net/ynl: fix cli.py --subscribe feature
MAINTAINERS: fix ptp ocp driver maintainers address
selftests: net: enable bind tests
net: dsa: vsc73xx: fix possible subblocks range of CAPT block
sched: sch_cake: fix bulk flow accounting logic for host fairness
docs: netdev: document guidance on cleanup.h
net: xilinx: axienet: Fix race in axienet_stop
net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
r8152: fix the firmware doesn't work
fou: Fix null-ptr-deref in GRO.
bareudp: Fix device stats updates.
net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
bpf, net: Fix a potential race in do_sock_getsockopt()
net: dqs: Do not use extern for unused dql_group
sch/netem: fix use after free in netem_dequeue
usbnet: modern method to get random MAC
MAINTAINERS: wifi: cw1200: add net-cw1200.h
ice: do not bring the VSI up, if it was down before the XDP setup
ice: remove ICE_CFG_BUSY locking from AF_XDP code
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A fix from Doug Anderson for a missing stub, required to fix the build
for some newly added users of devm_regulator_bulk_get_const() in
!REGULATOR configurations"
* tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATOR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fix fromShuah Khan:
"One single fix to a use-after-free bug resulting from
kunit_driver_create() failing to copy the driver name leaving it on
the stack or freeing it"
* tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Device wrappers should also manage driver name
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
pull-request: wireless-next-2024-09-04
here's a pull request to net-next tree, more info below. Please let me know if
there are any problems.
====================
Conflicts:
drivers/net/wireless/ath/ath12k/hw.c
38055789d151 ("wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850")
8be12629b428 ("wifi: ath12k: restore ASPM for supported hardwares only")
https://lore.kernel.org/87msldyj97.fsf@kernel.org
Link: https://patch.msgid.link/20240904153205.64C11C4CEC2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RT_TOS() from include/uapi/linux/in_route.h is defined using
IPTOS_TOS_MASK from include/uapi/linux/ip.h. This is problematic for
files such as include/net/ip_fib.h that want to use RT_TOS() as without
including both header files kernel compilation fails:
In file included from ./include/net/ip_fib.h:25,
from ./include/net/route.h:27,
from ./include/net/lwtunnel.h:9,
from net/core/dst.c:24:
./include/net/ip_fib.h: In function ‘fib_dscp_masked_match’:
./include/uapi/linux/in_route.h:31:32: error: ‘IPTOS_TOS_MASK’ undeclared (first use in this function)
31 | #define RT_TOS(tos) ((tos)&IPTOS_TOS_MASK)
| ^~~~~~~~~~~~~~
./include/net/ip_fib.h:440:45: note: in expansion of macro ‘RT_TOS’
440 | return dscp == inet_dsfield_to_dscp(RT_TOS(fl4->flowi4_tos));
Therefore, cited commit changed linux/in_route.h to include linux/ip.h.
However, as reported by David, this breaks iproute2 compilation due
overlapping definitions between linux/ip.h and
/usr/include/netinet/ip.h:
In file included from ../include/uapi/linux/in_route.h:5,
from iproute.c:19:
../include/uapi/linux/ip.h:25:9: warning: "IPTOS_TOS" redefined
25 | #define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK)
| ^~~~~~~~~
In file included from iproute.c:17:
/usr/include/netinet/ip.h:222:9: note: this is the location of the previous definition
222 | #define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK)
Fix by changing include/net/ip_fib.h to include linux/ip.h. Note that
usage of RT_TOS() should not spread further in the kernel due to recent
work in this area.
Fixes: 1fa3314c14c6 ("ipv4: Centralize TOS matching")
Reported-by: David Ahern <dsahern@kernel.org>
Closes: https://lore.kernel.org/netdev/2f5146ff-507d-4cab-a195-b28c0c9e654e@kernel.org/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20240903133554.2807343-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mana_set_channels() function requires detaching the mana
driver and reattaching it with changed channel values.
During this operation if the system is low on memory, the reattach
might fail, causing the network device being down.
To avoid this we pre-allocate buffers at the beginning of set operation,
to prevent complete network loss
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1725248734-21760-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently napi_disable() gets called during rxq and txq cleanup,
even before napi is enabled and hrtimer is initialized. It causes
kernel panic.
? page_fault_oops+0x136/0x2b0
? page_counter_cancel+0x2e/0x80
? do_user_addr_fault+0x2f2/0x640
? refill_obj_stock+0xc4/0x110
? exc_page_fault+0x71/0x160
? asm_exc_page_fault+0x27/0x30
? __mmdrop+0x10/0x180
? __mmdrop+0xec/0x180
? hrtimer_active+0xd/0x50
hrtimer_try_to_cancel+0x2c/0xf0
hrtimer_cancel+0x15/0x30
napi_disable+0x65/0x90
mana_destroy_rxq+0x4c/0x2f0
mana_create_rxq.isra.0+0x56c/0x6d0
? mana_uncfg_vport+0x50/0x50
mana_alloc_queues+0x21b/0x320
? skb_dequeue+0x5f/0x80
Cc: stable@vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There's a potential race when `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` is
false during the execution of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`, but
becomes true when `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is called.
This inconsistency can lead to `BPF_CGROUP_RUN_PROG_GETSOCKOPT` receiving
an "-EFAULT" from `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)`.
Scenario shown as below:
`process A` `process B`
----------- ------------
BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN
enable CGROUP_GETSOCKOPT
BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT)
To resolve this, remove the `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` macro and
directly uses `copy_from_sockptr` to ensure that `max_optlen` is always
set before `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is invoked.
Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Co-developed-by: Yanghui Li <yanghui.li@mediatek.com>
Signed-off-by: Yanghui Li <yanghui.li@mediatek.com>
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://patch.msgid.link/20240830082518.23243-1-Tze-nan.Wu@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2024-09-01
Simon Horman catched two typos in our headers. No functional change.
* tag 'ieee802154-for-net-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
ieee802154: Correct spelling in nl802154.h
mac802154: Correct spelling in mac802154.h
====================
Link: https://patch.msgid.link/20240901184213.2303047-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Store new timeout and expiration in transaction object, use them to
update elements from .commit path. Otherwise, discard update if .abort
path is exercised.
Use update_flags in the transaction to note whether the timeout,
expiration, or both need to be updated.
Annotate access to timeout extension now that it can be updated while
lockless read access is possible.
Reject timeout updates on elements with no timeout extension.
Element transaction remains in the 96 bytes kmalloc slab on x86_64 after
this update.
This patch requires ("netfilter: nf_tables: use timestamp to check for
set element timeout") to make sure an element does not expire while
transaction is ongoing.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch uses zero as timeout marker for those elements that never expire
when the element is created.
If userspace provides no timeout for an element, then the default set
timeout applies. However, if no default set timeout is specified and
timeout flag is set on, then timeout extension is allocated and timeout
is set to zero to allow for future updates.
Use of zero a never timeout marker has been suggested by Phil Sutter.
Note that, in older kernels, it is already possible to define elements
that never expire by declaring a set with the set timeout flag set on
and no global set timeout, in this case, new element with no explicit
timeout never expire do not allocate the timeout extension, hence, they
never expire. This approach makes it complicated to accomodate element
timeout update, because element extensions do not support reallocations.
Therefore, allocate the timeout extension and use the new marker for
this case, but do not expose it to userspace to retain backward
compatibility in the set listing.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Expiration and timeout are stored in separated set element extensions,
but they are tightly coupled. Consolidate them in a single extension to
simplify and prepare for set element updates.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
element expiration can be read-write locklessly, it can be written by
dynset and read from netlink dump, add annotation.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Correct spelling in iw_handler.h.
As reported by codespell.
Also, while the "few shortcomings" line is being updated,
correct its grammar.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240903-wifi-spell-v2-1-bfcf7062face@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
NETIF_F_ALL_FCOE is used only in vlan_dev.c, 2 times. Now that it's only
2 bits, open-code it and remove the definition from netdev_features.h.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Ability to handle maximum FCoE frames of 2158 bytes can never be changed
and thus more of an attribute, not a toggleable feature.
Move it from netdev_features_t to "cold" priv flags (bitfield bool) and
free yet another feature bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
"Interface can't change network namespaces" is rather an attribute,
not a feature, and it can't be changed via Ethtool.
Make it a "cold" private flag instead of a netdev_feature and free
one more bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
rather an attribute, very similar to IFF_NO_QUEUE (and hot).
Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Make dev->priv_flags `u32` back and define bits higher than 31 as
bitfield booleans as per Jakub's suggestion. This simplifies code
which accesses these bits with no optimization loss (testb both
before/after), allows to not extend &netdev_priv_flags each time,
but also scales better as bits > 63 in the future would only add
a new u64 to the structure with no complications, comparing to
that extending ::priv_flags would require converting it to a bitmap.
Note that I picked `unsigned long :1` to not lose any potential
optimizations comparing to `bool :1` etc.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
- Add missing documentation of struct field and enum items.
- Add missing documentation of function parameter.
Flagged by ./scripts/kernel-doc -none.
No functional change intended.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Correct spelling in nf_tables.h.
As reported by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since commit a654de8fdc18 ("netfilter: nf_tables: fix chain dependency validation")
the validate() callback no longer needs the return pointer argument.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- x2apic_disable() clears x2apic_state and x2apic_mode unconditionally,
even when the state is X2APIC_ON_LOCKED, which prevents the kernel to
disable it thereby creating inconsistent state.
Reorder the logic so it actually works correctly
- The XSTATE logic for handling LBR is incorrect as it assumes that
XSAVES supports LBR when the CPU supports LBR. In fact both
conditions need to be true. Otherwise the enablement of LBR in the
IA32_XSS MSR fails and subsequently the machine crashes on the next
XRSTORS operation because IA32_XSS is not initialized.
Cache the XSTATE support bit during init and make the related
functions use this cached information and the LBR CPU feature bit to
cure this.
- Cure a long standing bug in KASLR
KASLR uses the full address space between PAGE_OFFSET and vaddr_end
to randomize the starting points of the direct map, vmalloc and
vmemmap regions. It thereby limits the size of the direct map by
using the installed memory size plus an extra configurable margin for
hot-plug memory. This limitation is done to gain more randomization
space because otherwise only the holes between the direct map,
vmalloc, vmemmap and vaddr_end would be usable for randomizing.
The limited direct map size is not exposed to the rest of the kernel,
so the memory hot-plug and resource management related code paths
still operate under the assumption that the available address space
can be determined with MAX_PHYSMEM_BITS.
request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of
the direct map and if unlucky this address is in the vmalloc space,
which causes high_memory to become greater than VMALLOC_START and
consequently causes iounmap() to fail for valid ioremap addresses.
Cure this by exposing the end of the direct map via PHYSMEM_END and
use that for the memory hot-plug and resource management related
places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case
PHYSMEM_END maps to a variable which is initialized by the KASLR
initialization and otherwise it is based on MAX_PHYSMEM_BITS as
before.
- Prevent a data leak in mmio_read(). The TDVMCALL exposes the value of
an initialized variabled on the stack to the VMM. The variable is
only required as output value, so it does not have to exposed to the
VMM in the first place.
- Prevent an array overrun in the resource control code on systems with
Sub-NUMA Clustering enabled because the code failed to adjust the
index by the number of SNC nodes per L3 cache.
* tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix arch_mbm_* array overrun on SNC
x86/tdx: Fix data leak in mmio_read()
x86/kaslr: Expose and use the end of the physical memory address space
x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported
x86/apic: Make x2apic_disable() work correctly
|
|
Pull smb client fixes from Steve French:
- copy_file_range fix
- two read fixes including read past end of file rc fix and read retry
crediting fix
- falloc zero range fix
* tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
cifs: Fix copy offload to flush destination region
netfs, cifs: Fix handling of short DIO read
cifs: Fix lack of credit renegotiation on read retry
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There is a fairly large number of bug fixes for Qualcomm platforms,
most of them addressing issues with the devicetree files for the newly
added Snapdragon X1 based laptops to make them more reliable.
The Qualcomm driver changes address a few build-time issues as well as
runtime problems in the tzmem and scm firmware, the USB Type-C driver,
and the cmd-db and pmic_glink soc drivers.
The NXP i.MX usually gets a bunch of devicetree fixes that is
proportional to the number of supported machines. This includes both
warning fixes and correctness for the 64-bit i.MX9, i.MX8 and
layerscape platforms, as well as a single fix for a 32-bit i.MX6 based
board.
The other changes are the usual minor changes, including an update to
the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs
(risc-v)"
* tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits)
firmware: microchip: fix incorrect error report of programming:timeout on success
soc: qcom: pd-mapper: Fix singleton refcount
firmware: qcom: tzmem: disable sdm670 platform
soc: qcom: pmic_glink: Actually communicate when remote goes down
usb: typec: ucsi: Move unregister out of atomic section
soc: qcom: pmic_glink: Fix race during initialization
firmware: qcom: qseecom: remove unused functions
firmware: qcom: tzmem: fix virtual-to-physical address conversion
firmware: qcom: scm: Mark get_wq_ctx() as atomic call
arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt
arm64: dts: qcom: disable GPU on x1e80100 by default
arm64: dts: imx8mm-phygate: fix typo pinctrcl-0
arm64: dts: imx95: correct L3Cache cache-sets
arm64: dts: imx95: correct a55 power-domains
arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo
arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges
ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design
arm64: dts: imx93: update default value for snps,clk-csr
arm64: dts: freescale: tqma9352: Fix watchdog reset
arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962
...
|
|
The function returns a value that is used to initialize 'flowi4_tos'
before being passed to the FIB lookup API in the following call chain:
xfrm_bundle_create()
tos = xfrm_get_tos(fl, family)
xfrm_dst_lookup(..., tos, ...)
__xfrm_dst_lookup(..., tos, ...)
xfrm4_dst_lookup(..., tos, ...)
__xfrm4_dst_lookup(..., tos, ...)
fl4->flowi4_tos = tos
__ip_route_output_key(net, fl4)
Unmask the upper DSCP bits so that in the future the output route lookup
could be performed according to the full DSCP value.
Remove IPTOS_RT_MASK since it is no longer used.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function is used by a few socket types to retrieve the TOS value
with which to perform the FIB lookup for packets sent through the socket
(flowi4_tos). If a DS field was passed using the IP_TOS control message,
then it is used. Otherwise the one specified via the IP_TOS socket
option.
Unmask the upper DSCP bits so that in the future the lookup could be
performed according to the full DSCP value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function is used to read the DS field that was stored in IPv4
sockets via the IP_TOS socket option so that it could be used to
initialize the flowi4_tos field before resolving an output route.
Unmask the upper DSCP bits so that in the future the output route lookup
could be performed according to the full DSCP value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 59b047bc98084f8af2c41483e4d68a5adf2fa7f7 which
breaks compatibility with commands like:
bluetoothd[46328]: @ MGMT Command: Load.. (0x0013) plen 74 {0x0001} [hci0]
Keys: 2
BR/EDR Address: C0:DC:DA:A5:E5:47 (Samsung Electronics Co.,Ltd)
Key type: Authenticated key from P-256 (0x03)
Central: 0x00
Encryption size: 16
Diversifier[2]: 0000
Randomizer[8]: 0000000000000000
Key[16]: 6ed96089bd9765be2f2c971b0b95f624
LE Address: D7:2A:DE:1E:73:A2 (Static)
Key type: Unauthenticated key from P-256 (0x02)
Central: 0x00
Encryption size: 16
Diversifier[2]: 0000
Randomizer[8]: 0000000000000000
Key[16]: 87dd2546ededda380ffcdc0a8faa4597
@ MGMT Event: Command Status (0x0002) plen 3 {0x0001} [hci0]
Load Long Term Keys (0x0013)
Status: Invalid Parameters (0x0d)
Cc: stable@vger.kernel.org
Link: https://github.com/bluez/bluez/issues/875
Fixes: 59b047bc9808 ("Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|