summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)AuthorFilesLines
2023-09-07Merge tag 'net-6.6-rc1' of ↵Linus Torvalds33-92/+229
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking updates from Jakub Kicinski: "Including fixes from netfilter and bpf. Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream" * tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs() Revert "net: team: do not use dynamic lockdep key" net: hns3: remove GSO partial feature bit net: hns3: fix the port information display when sfp is absent net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue net: hns3: fix debugfs concurrency issue between kfree buffer and read net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read() net: hns3: Support query tx timeout threshold by debugfs net: hns3: fix tx timeout issue net: phy: Provide Module 4 KSZ9477 errata (DS80000754C) netfilter: nf_tables: Unbreak audit log reset netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID netfilter: nfnetlink_osf: avoid OOB read netfilter: nftables: exthdr: fix 4-byte stack OOB write selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc bpf: bpf_sk_storage: Fix invalid wait context lockdep report s390/bpf: Pass through tail call counter in trampolines ...
2023-09-07net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()Vladimir Oltean1-1/+1
enetc_psi_create() returns an ERR_PTR() or a valid station interface pointer, but checking for the non-NULL quality of the return code blurs that difference away. So if enetc_psi_create() fails, we call enetc_psi_destroy() when we shouldn't. This will likely result in crashes, since enetc_psi_create() cleans up everything after itself when it returns an ERR_PTR(). Fixes: f0168042a212 ("net: enetc: reimplement RFS/RSS memory clearing as PCI quirk") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/netdev/582183ef-e03b-402b-8e2d-6d9bb3c83bd9@moroto.mountain/ Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230906141609.247579-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-07Revert "net: team: do not use dynamic lockdep key"Jakub Kicinski2-56/+59
This reverts commit 39285e124edbc752331e98ace37cc141a6a3747a. Looks like the change has unintended consequences in exposing objects before they are initialized. Let's drop this patch and try again in net-next. Reported-by: syzbot+44ae022028805f4600fc@syzkaller.appspotmail.com Fixes: 39285e124edb ("net: team: do not use dynamic lockdep key") Link: https://lore.kernel.org/all/20230907103124.6adb7256@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-07net: hns3: remove GSO partial feature bitJie Wang1-2/+0
HNS3 NIC does not support GSO partial packets segmentation. Actually tunnel packets for example NvGRE packets segment offload and checksum offload is already supported. There is no need to keep gso partial feature bit. So this patch removes it. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-07net: hns3: fix the port information display when sfp is absentYisen Zhuang1-1/+3
When sfp is absent or unidentified, the port type should be displayed as PORT_OTHERS, rather than PORT_FIBRE. Fixes: 88d10bd6f730 ("net: hns3: add support for multiple media type") Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-07net: hns3: fix invalid mutex between tc qdisc and dcb ets command issueJijie Shao4-19/+9
We hope that tc qdisc and dcb ets commands can not be used crosswise. If we want to use any of the commands to configure tc, We must use the other command to clear the existing configuration. However, when we configure a single tc with tc qdisc, we can still configure it with dcb ets. Because we use mqprio_active as the tag of tc qdisc configuration, but with dcb ets, we do not check mqprio_active. This patch fix this issue by check mqprio_active before executing the dcb ets command. and add dcb_ets_active to replace HCLGE_FLAG_DCB_ENABLE and HCLGE_FLAG_MQPRIO_ENABLE at the hclge layer, Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-07net: hns3: fix debugfs concurrency issue between kfree buffer and readHao Chen1-3/+4
Now in hns3_dbg_uninit(), there may be concurrency between kfree buffer and read, it may result in memory error. Moving debugfs_remove_recursive() in front of kfree buffer to ensure they don't happen at the same time. Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process") Signed-off-by: Hao Chen <chenhao418@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-07net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()Hao Chen1-7/+7
req1->tcam_data is defined as "u8 tcam_data[8]", and we convert it as (u32 *) without considerring byte order conversion, it may result in printing wrong data for tcam_data. Convert tcam_data to (__le32 *) first to fix it. Fixes: b5a0b70d77b9 ("net: hns3: refactor dump fd tcam of debugfs") Signed-off-by: Hao Chen <chenhao418@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-07net: hns3: Support query tx timeout threshold by debugfsJijie Shao1-0/+4
support query tx timeout threshold by debugfs Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-07net: hns3: fix tx timeout issueJian Shen1-5/+12
Currently, the driver knocks the ring doorbell before updating the ring->last_to_use in tx flow. if the hardware transmiting packet and napi poll scheduling are fast enough, it may get the old ring->last_to_use in drivers' napi poll. In this case, the driver will think the tx is not completed, and return directly without clear the flag __QUEUE_STATE_STACK_XOFF, which may cause tx timeout. Fixes: 20d06ca2679c ("net: hns3: optimize the tx clean process") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-06net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)Lukasz Majewski2-4/+21
The KSZ9477 errata points out (in 'Module 4') the link up/down problems when EEE (Energy Efficient Ethernet) is enabled in the device to which the KSZ9477 tries to auto negotiate. The suggested workaround is to clear advertisement of EEE for PHYs in this chip driver. To avoid regressions with other switch ICs the new MICREL_NO_EEE flag has been introduced. Moreover, the in-register disablement of MMD_DEVICE_ID_EEE_ADV.MMD_EEE_ADV MMD register is removed, as this code is both; now executed too late (after previous rework of the PHY and DSA for KSZ switches) and not required as setting all members of eee_broken_modes bit field prevents the KSZ9477 from advertising EEE. Fixes: 69d3b36ca045 ("net: dsa: microchip: enable EEE support") # for KSZ9477 Signed-off-by: Lukasz Majewski <lukma@denx.de> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # Confirmed disabled EEE with oscilloscope. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20230905093315.784052-1-lukma@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-06net: dsa: sja1105: complete tc-cbs offload support on SJA1110Vladimir Oltean3-0/+19
The blamed commit left this delta behind: struct sja1105_cbs_entry { - u64 port; - u64 prio; + u64 port; /* Not used for SJA1110 */ + u64 prio; /* Not used for SJA1110 */ u64 credit_hi; u64 credit_lo; u64 send_slope; u64 idle_slope; }; but did not actually implement tc-cbs offload fully for the new switch. The offload is accepted, but it doesn't work. The difference compared to earlier switch generations is that now, the table of CBS shapers is sparse, because there are many more shapers, so the mapping between a {port, prio} and a table index is static, rather than requiring us to store the port and prio into the sja1105_cbs_entry. So, the problem is that the code programs the CBS shaper parameters at a dynamic table index which is incorrect. All that needs to be done for SJA1110 CBS shapers to work is to bypass the logic which allocates shapers in a dense manner, as for SJA1105, and use the fixed mapping instead. Fixes: 3e77e59bf8cf ("net: dsa: sja1105: add support for the SJA1110 switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net: dsa: sja1105: fix -ENOSPC when replacing the same tc-cbs too many timesVladimir Oltean1-3/+20
After running command [2] too many times in a row: [1] $ tc qdisc add dev sw2p0 root handle 1: mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0 [2] $ tc qdisc replace dev sw2p0 parent 1:1 cbs offload 1 \ idleslope 120000 sendslope -880000 locredit -1320 hicredit 180 (aka more than priv->info->num_cbs_shapers times) we start seeing the following error message: Error: Specified device failed to setup cbs hardware offload. This comes from the fact that ndo_setup_tc(TC_SETUP_QDISC_CBS) presents the same API for the qdisc create and replace cases, and the sja1105 driver fails to distinguish between the 2. Thus, it always thinks that it must allocate the same shaper for a {port, queue} pair, when it may instead have to replace an existing one. Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net: dsa: sja1105: fix bandwidth discrepancy between tc-cbs software and offloadVladimir Oltean1-3/+12
More careful measurement of the tc-cbs bandwidth shows that the stream bandwidth (effectively idleslope) increases, there is a larger and larger discrepancy between the rate limit obtained by the software Qdisc, and the rate limit obtained by its offloaded counterpart. The discrepancy becomes so large, that e.g. at an idleslope of 40000 (40Mbps), the offloaded cbs does not actually rate limit anything, and traffic will pass at line rate through a 100 Mbps port. The reason for the discrepancy is that the hardware documentation I've been following is incorrect. UM11040.pdf (for SJA1105P/Q/R/S) states about IDLE_SLOPE that it is "the rate (in unit of bytes/sec) at which the credit counter is increased". Cross-checking with UM10944.pdf (for SJA1105E/T) and UM11107.pdf (for SJA1110), the wording is different: "This field specifies the value, in bytes per second times link speed, by which the credit counter is increased". So there's an extra scaling for link speed that the driver is currently not accounting for, and apparently (empirically), that link speed is expressed in Kbps. I've pondered whether to pollute the sja1105_mac_link_up() implementation with CBS shaper reprogramming, but I don't think it is worth it. IMO, the UAPI exposed by tc-cbs requires user space to recalculate the sendslope anyway, since the formula for that depends on port_transmit_rate (see man tc-cbs), which is not an invariant from tc's perspective. So we use the offload->sendslope and offload->idleslope to deduce the original port_transmit_rate from the CBS formula, and use that value to scale the offload->sendslope and offload->idleslope to values that the hardware understands. Some numerical data points: 40Mbps stream, max interfering frame size 1500, port speed 100M --------------------------------------------------------------- tc-cbs parameters: idleslope 40000 sendslope -60000 locredit -900 hicredit 600 which result in hardware values: Before (doesn't work) After (works) credit_hi 600 600 credit_lo 900 900 send_slope 7500000 75 idle_slope 5000000 50 40Mbps stream, max interfering frame size 1500, port speed 1G ------------------------------------------------------------- tc-cbs parameters: idleslope 40000 sendslope -960000 locredit -1440 hicredit 60 which result in hardware values: Before (doesn't work) After (works) credit_hi 60 60 credit_lo 1440 1440 send_slope 120000000 120 idle_slope 5000000 5 5.12Mbps stream, max interfering frame size 1522, port speed 100M ----------------------------------------------------------------- tc-cbs parameters: idleslope 5120 sendslope -94880 locredit -1444 hicredit 77 which result in hardware values: Before (doesn't work) After (works) credit_hi 77 77 credit_lo 1444 1444 send_slope 11860000 118 idle_slope 640000 6 Tested on SJA1105T, SJA1105S and SJA1110A, at 1Gbps and 100Mbps. Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc") Reported-by: Yanan Yang <yanan.yang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06Merge branch '1GbE' of ↵David S. Miller3-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Change MIN_TXD and MIN_RXD to allow set rx/tx value between 64 and 80 Olga Zaborska says: Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igb, igbvf and igc devices can use as low as 64 descriptors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev modeBodong Wang2-19/+51
ACL flow table is required in switchdev mode when metadata is enabled, driver creates such table when loading each vport. However, not every vport is loaded in switchdev mode. Such as ECPF if it's the eswitch manager. In this case, ACL flow table is still needed. To make it modularized, create ACL flow table for eswitch manager as default and skip such operations when loading manager vport. Also, there is no need to load the eswitch manager vport in switchdev mode. This means there is no need to load it on regular connect-x HCAs where the PF is the eswitch manager. This will avoid creating duplicate ACL flow table for host PF vport. Fixes: 29bcb6e4fe70 ("net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules") Fixes: eb8e9fae0a22 ("mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager") Fixes: 5019833d661f ("net/mlx5: E-switch, Introduce helper function to enable/disable vports") Signed-off-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net/mlx5e: Clear mirred devices array if the rule is splitJianbo Liu7-3/+13
In the cited commit, the mirred devices are recorded and checked while parsing the actions. In order to avoid system crash, the duplicate action in a single rule is not allowed. But the rule is actually break down into several FTEs in different tables, for either mirroring, or the specified types of actions which use post action infrastructure. It will reject certain action list by mistake, for example: actions:enp8s0f0_1,set(ipv4(ttl=63)),enp8s0f0_0,enp8s0f0_1. Here the rule is split to two FTEs because of pedit action. To fix this issue, when parsing the rule actions, reset if_count to clear the mirred devices array if the rule is split to multiple FTEs, and then the duplicate checking is restarted. Fixes: 554fe75c1b3f ("net/mlx5e: Avoid duplicating rule destinations") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net: team: do not use dynamic lockdep keyTaehee Yoo2-59/+56
team interface has used a dynamic lockdep key to avoid false-positive lockdep deadlock detection. Virtual interfaces such as team usually have their own lock for protecting private data. These interfaces can be nested. team0 | team1 Each interface's lock is actually different(team0->lock and team1->lock). So, mutex_lock(&team0->lock); mutex_lock(&team1->lock); mutex_unlock(&team1->lock); mutex_unlock(&team0->lock); The above case is absolutely safe. But lockdep warns about deadlock. Because the lockdep understands these two locks are same. This is a false-positive lockdep warning. So, in order to avoid this problem, the team interfaces started to use dynamic lockdep key. The false-positive problem was fixed, but it introduced a new problem. When the new team virtual interface is created, it registers a dynamic lockdep key(creates dynamic lockdep key) and uses it. But there is the limitation of the number of lockdep keys. So, If so many team interfaces are created, it consumes all lockdep keys. Then, the lockdep stops to work and warns about it. In order to fix this problem, team interfaces use the subclass instead of the dynamic key. So, when a new team interface is created, it doesn't register(create) a new lockdep, but uses existed subclass key instead. It is already used by the bonding interface for a similar case. As the bonding interface does, the subclass variable is the same as the 'dev->nested_level'. This variable indicates the depth in the stacked interface graph. The 'dev->nested_level' is protected by RTNL and RCU. So, 'mutex_lock_nested()' for 'team->lock' requires RTNL or RCU. In the current code, 'team->lock' is usually acquired under RTNL, there is no problem with using 'dev->nested_level'. The 'team_nl_team_get()' and The 'lb_stats_refresh()' functions acquire 'team->lock' without RTNL. But these don't iterate their own ports nested so they don't need nested lock. Reproducer: for i in {0..1000} do ip link add team$i type team ip link add dummy$i master team$i type dummy ip link set dummy$i up ip link set team$i up done Splat looks like: BUG: MAX_LOCKDEP_ENTRIES too low! turning off the locking correctness validator. Please attach the output of /proc/lock_stat to the bug report CPU: 0 PID: 4104 Comm: ip Not tainted 6.5.0-rc7+ #45 Call Trace: <TASK> dump_stack_lvl+0x64/0xb0 add_lock_to_list+0x30d/0x5e0 check_prev_add+0x73a/0x23a0 ... sock_def_readable+0xfe/0x4f0 netlink_broadcast+0x76b/0xac0 nlmsg_notify+0x69/0x1d0 dev_open+0xed/0x130 ... Reported-by: syzbot+9bbbacfbf1e04d5221f7@syzkaller.appspotmail.com Fixes: 369f61bee0f5 ("team: fix nested locking lockdep warning") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-05igb: Change IGB_MIN to allow set rx/tx value between 64 and 80Olga Zaborska1-2/+2
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igb devices can use as low as 64 descriptors. This change will unify igb with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64") Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-05igbvf: Change IGBVF_MIN to allow set rx/tx value between 64 and 80Olga Zaborska1-2/+2
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igbvf devices can use as low as 64 descriptors. This change will unify igbvf with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64") Fixes: d4e0fe01a38a ("igbvf: add new driver to support 82576 virtual functions") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-05igc: Change IGC_MIN to allow set rx/tx value between 64 and 80Olga Zaborska1-2/+2
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igc devices can use as low as 64 descriptors. This change will unify igc with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64") Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-05octeontx2-af: Fix truncation of smq in CN10K NIX AQ enqueue mbox handlerGeetha sowjanya1-2/+19
The smq value used in the CN10K NIX AQ instruction enqueue mailbox handler was truncated to 9-bit value from 10-bit value because of typecasting the CN10K mbox request structure to the CN9K structure. Though this hasn't caused any problems when programming the NIX SQ context to the HW because the context structure is the same size. However, this causes a problem when accessing the structure parameters. This patch reads the right smq value for each platform. Fixes: 30077d210c83 ("octeontx2-af: cn10k: Update NIX/NPA context structure") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-05Revert "net: macsec: preserve ingress frame ordering"Sabrina Dubroca1-2/+1
This reverts commit ab046a5d4be4c90a3952a0eae75617b49c0cb01b. It was trying to work around an issue at the crypto layer by excluding ASYNC implementations of gcm(aes), because a bug in the AESNI version caused reordering when some requests bypassed the cryptd queue while older requests were still pending on the queue. This was fixed by commit 38b2f68b4264 ("crypto: aesni - Fix cryptd reordering problem on gcm"), which pre-dates ab046a5d4be4. Herbert Xu confirmed that all ASYNC implementations are expected to maintain the ordering of completions wrt requests, so we can use them in MACsec. On my test machine, this restores the performance of a single netperf instance, from 1.4Gbps to 4.4Gbps. Link: https://lore.kernel.org/netdev/9328d206c5d9f9239cae27e62e74de40b258471d.1692279161.git.sd@queasysnail.net/T/ Link: https://lore.kernel.org/netdev/1b0cec71-d084-8153-2ba4-72ce71abeb65@byu.edu/ Link: https://lore.kernel.org/netdev/d335ddaa-18dc-f9f0-17ee-9783d3b2ca29@mailbox.tu-dresden.de/ Fixes: ab046a5d4be4 ("net: macsec: preserve ingress frame ordering") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/11c952469d114db6fb29242e1d9545e61f52f512.1693757159.git.sd@queasysnail.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-04Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-26/+202
Pull virtio updates from Michael Tsirkin: "A small pull request this time around, mostly because the vduse network got postponed to next relase so we can be sure we got the security store right" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_ring: fix avail_wrap_counter in virtqueue_add_packed virtio_vdpa: build affinity masks conditionally virtio_net: merge dma operations when filling mergeable buffers virtio_ring: introduce dma sync api for virtqueue virtio_ring: introduce dma map api for virtqueue virtio_ring: introduce virtqueue_reset() virtio_ring: separate the logic of reset/enable from virtqueue_resize virtio_ring: correct the expression of the description of virtqueue_resize() virtio_ring: skip unmap for premapped virtio_ring: introduce virtqueue_dma_dev() virtio_ring: support add premapped buf virtio_ring: introduce virtqueue_set_dma_premapped() virtio_ring: put mapping error check in vring_map_one_sg virtio_ring: check use_dma_api before unmap desc for indirect vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK vdpa: add get_backend_features vdpa operation vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag vdpa/mlx5: Remove unused function declarations
2023-09-04veth: Fixing transmit return status for dropped packetsLiang Chen1-1/+3
The veth_xmit function returns NETDEV_TX_OK even when packets are dropped. This behavior leads to incorrect calculations of statistics counts, as well as things like txq->trans_start updates. Fixes: e314dbdc1c0d ("[NET]: Virtual ethernet device driver.") Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04gve: fix frag_list chainingEric Dumazet1-1/+4
gve_rx_append_frags() is able to build skbs chained with frag_list, like GRO engine. Problem is that shinfo->frag_list should only be used for the head of the chain. All other links should use skb->next pointer. Otherwise, built skbs are not valid and can cause crashes. Equivalent code in GRO (skb_gro_receive()) is: if (NAPI_GRO_CB(p)->last == p) skb_shinfo(p)->frag_list = skb; else NAPI_GRO_CB(p)->last->next = skb; NAPI_GRO_CB(p)->last = skb; Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Bailey Forrest <bcf@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Catherine Sullivan <csully@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-03virtio_net: merge dma operations when filling mergeable buffersXuan Zhuo1-26/+202
Currently, the virtio core will perform a dma operation for each buffer. Although, the same page may be operated multiple times. This patch, the driver does the dma operation and manages the dma address based the feature premapped of virtio core. This way, we can perform only one dma operation for the pages of the alloc frag. This is beneficial for the iommu device. kernel command line: intel_iommu=on iommu.passthrough=0 | strict=0 | strict=1 Before | 775496pps | 428614pps After | 1109316pps | 742853pps Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03igb: disable virtualization features on 82580Corinna Vinschen1-2/+3
Disable virtualization features on 82580 just as on i210/i211. This avoids that virt functions are acidentally called on 82850. Fixes: 55cac248caa4 ("igb: Add full support for 82580 devices") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds3-1/+56
Pull rdma updates from Jason Gunthorpe: "Many small changes across the subystem, some highlights: - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca, hns, and bnxt_re - siw now works over tunnel and other netdevs with a MAC address by removing assumptions about a MAC/GID from the connection manager - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow userspace to slow down the doorbell rings if the HW gets full - irdma egress VLAN priority, better QP/WQ sizing - rxe bug fixes in queue draining and srq resizing - Support more ethernet speed options in the core layer - DMABUF support for bnxt_re - Multi-stage MTT support for erdma to allow much bigger MR registrations - A irdma fix with a CVE that came in too late to go to -rc, missing bounds checking for 0 length MRs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (87 commits) IB/hfi1: Reduce printing of errors during driver shut down RDMA/hfi1: Move user SDMA system memory pinning code to its own file RDMA/hfi1: Use list_for_each_entry() helper RDMA/mlx5: Fix trailing */ formatting in block comment RDMA/rxe: Fix redundant break statement in switch-case. RDMA/efa: Fix wrong resources deallocation order RDMA/siw: Call llist_reverse_order in siw_run_sq RDMA/siw: Correct wrong debug message RDMA/siw: Balance the reference of cep->kref in the error path Revert "IB/isert: Fix incorrect release of isert connection" RDMA/bnxt_re: Fix kernel doc errors RDMA/irdma: Prevent zero-length STAG registration RDMA/erdma: Implement hierarchical MTT RDMA/erdma: Refactor the storage structure of MTT entries RDMA/erdma: Renaming variable names and field names of struct erdma_mem RDMA/hns: Support hns HW stats RDMA/hns: Dump whole QP/CQ/MR resource in raw RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp() RDMA/mlx4: Copy union directly RDMA/irdma: Drop unused kernel push code ...
2023-09-01Merge tag 'tty-6.6-rc1' of ↵Linus Torvalds11-99/+59
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here is the big set of tty and serial driver changes for 6.6-rc1. Lots of cleanups in here this cycle, and some driver updates. Short summary is: - Jiri's continued work to make the tty code and apis be a bit more sane with regards to modern kernel coding style and types - cpm_uart driver updates - n_gsm updates and fixes - meson driver updates - sc16is7xx driver updates - 8250 driver updates for different hardware types - qcom-geni driver fixes - tegra serial driver change - stm32 driver updates - synclink_gt driver cleanups - tty structure size reduction All of these have been in linux-next this week with no reported issues. The last bit of cleanups from Jiri and the tty structure size reduction came in last week, a bit late but as they were just style changes and size reductions, I figured they should get into this merge cycle so that others can work on top of them with no merge conflicts" * tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits) tty: shrink the size of struct tty_struct by 40 bytes tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw() tty: n_tty: extract ECHO_OP processing to a separate function tty: n_tty: unify counts to size_t tty: n_tty: use u8 for chars and flags tty: n_tty: simplify chars_in_buffer() tty: n_tty: remove unsigned char casts from character constants tty: n_tty: move newline handling to a separate function tty: n_tty: move canon handling to a separate function tty: n_tty: use MASK() for masking out size bits tty: n_tty: make n_tty_data::num_overrun unsigned tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun() tty: n_tty: use 'num' for writes' counts tty: n_tty: use output character directly tty: n_tty: make flow of n_tty_receive_buf_common() a bool Revert "tty: serial: meson: Add a earlycon for the T7 SoC" Documentation: devices.txt: Fix minors for ttyCPM* Documentation: devices.txt: Remove ttySIOC* Documentation: devices.txt: Remove ttyIOC* serial: 8250_bcm7271: improve bcm7271 8250 port ...
2023-09-01Merge tag 'usb-6.6-rc1' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt / PHY driver updates from Greg KH: "Here is the big set of USB, Thunderbolt, and PHY driver updates for 6.6-rc1. Included in here are: - PHY driver additions and cleanups - Thunderbolt minor additions and fixes - USB MIDI 2 gadget support added - dwc3 driver updates and additions - Removal of some old USB wireless code that was missed when that codebase was originally removed a few years ago, cleaning up some core USB code paths - USB core potential use-after-free fixes that syzbot from different people/groups keeps tripping over - typec updates and additions - gadget fixes and cleanups - loads of smaller USB core and driver cleanups all over the place Full details are in the shortlog. All of these have been in linux-next for a while with no reported problems" * tag 'usb-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (154 commits) platform/chrome: cros_ec_typec: Configure Retimer cable type tcpm: Avoid soft reset when partner does not support get_status usb: typec: tcpm: reset counter when enter into unattached state after try role usb: typec: tcpm: set initial svdm version based on pd revision USB: serial: option: add FOXCONN T99W368/T99W373 product USB: serial: option: add Quectel EM05G variant (0x030e) usb: dwc2: add pci_device_id driver_data parse support usb: gadget: remove max support speed info in bind operation usb: gadget: composite: cleanup function config_ep_by_speed_and_alt() usb: gadget: config: remove max speed check in usb_assign_descriptors() usb: gadget: unconditionally allocate hs/ss descriptor in bind operation usb: gadget: f_uvc: change endpoint allocation in uvc_function_bind() usb: gadget: add a inline function gether_bitrate() usb: gadget: use working speed to calcaulate network bitrate and qlen dt-bindings: usb: samsung,exynos-dwc3: Add Exynos850 support usb: dwc3: exynos: Add support for Exynos850 variant usb: gadget: udc-xilinx: fix incorrect type in assignment warning usb: gadget: udc-xilinx: fix cast from restricted __le16 warning usb: gadget: udc-xilinx: fix restricted __le16 degrades to integer warning USB: dwc2: hande irq on dead controller correctly ...
2023-09-01sfc: check for zero length in EF10 RX prefixEdward Cree1-5/+15
When EF10 RXDP firmware is operating in cut-through mode, packet length is not known at the time the RX prefix is generated, so it is left as zero and RX event merging is inhibited to ensure that the length is available in the RX event. However, it has been found that in certain circumstances the RX events for these packets still get merged, meaning the driver cannot read the length from the RX event, and tries to use the length from the prefix. The resulting zero-length SKBs cause crashes in GRO since commit 1d11fa696733 ("net-gro: remove GRO_DROP"), so add a check to the driver to detect these zero-length RX events and discard the packet. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-31Merge tag 'powerpc-6.6-1' of ↵Linus Torvalds2-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the configured SMT state when hotplugging CPUs into the system - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix MMU to avoid a broadcast TLBIE flush on exit - Drop the exclusion between ptrace/perf watchpoints, and drop the now unused associated arch hooks - Add support for the "nohlt" command line option to disable CPU idle - Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1 - Rework memory block size determination, and support 256MB size on systems with GPUs that have hotpluggable memory - Various other small features and fixes Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai. * tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits) macintosh/ams: linux/platform_device.h is needed powerpc/xmon: Reapply "Relax frame size for clang" powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled powerpc/iommu: Fix notifiers being shared by PCI and VIO buses powerpc/mpc5xxx: Add missing fwnode_handle_put() powerpc/config: Disable SLAB_DEBUG_ON in skiroot powerpc/pseries: Remove unused hcall tracing instruction powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n powerpc: dts: add missing space before { powerpc/eeh: Use pci_dev_id() to simplify the code powerpc/64s: Move CPU -mtune options into Kconfig powerpc/powermac: Fix unused function warning powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT powerpc: Don't include lppaca.h in paca.h powerpc/pseries: Move hcall_vphn() prototype into vphn.h powerpc/pseries: Move VPHN constants into vphn.h cxl: Drop unused detach_spa() powerpc: Drop zalloc_maybe_bootmem() powerpc/powernv: Use struct opal_prd_msg in more places ...
2023-08-30Merge tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds1-17/+7
Pull VFIO updates from Alex Williamson: - VFIO direct character device (cdev) interface support. This extracts the vfio device fd from the container and group model, and is intended to be the native uAPI for use with IOMMUFD (Yi Liu) - Enhancements to the PCI hot reset interface in support of cdev usage (Yi Liu) - Fix a potential race between registering and unregistering vfio files in the kvm-vfio interface and extend use of a lock to avoid extra drop and acquires (Dmitry Torokhov) - A new vfio-pci variant driver for the AMD/Pensando Distributed Services Card (PDS) Ethernet device, supporting live migration (Brett Creeley) - Cleanups to remove redundant owner setup in cdx and fsl bus drivers, and simplify driver init/exit in fsl code (Li Zetao) - Fix uninitialized hole in data structure and pad capability structures for alignment (Stefan Hajnoczi) * tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio: (53 commits) vfio/pds: Send type for SUSPEND_STATUS command vfio/pds: fix return value in pds_vfio_get_lm_file() pds_core: Fix function header descriptions vfio: align capability structures vfio/type1: fix cap_migration information leak vfio/fsl-mc: Use module_fsl_mc_driver macro to simplify the code vfio/cdx: Remove redundant initialization owner in vfio_cdx_driver vfio/pds: Add Kconfig and documentation vfio/pds: Add support for firmware recovery vfio/pds: Add support for dirty page tracking vfio/pds: Add VFIO live migration support vfio/pds: register with the pds_core PF pds_core: Require callers of register/unregister to pass PF drvdata vfio/pds: Initial support for pds VFIO driver vfio: Commonize combine_ranges for use in other VFIO drivers kvm/vfio: avoid bouncing the mutex when adding and deleting groups kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add() docs: vfio: Add vfio device cdev description vfio: Compile vfio_group infrastructure optionally vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in __vfio_register_dev() ...
2023-08-30Merge tag 'pci-v6.6-changes' of ↵Linus Torvalds4-25/+25
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Add locking to read/modify/write PCIe Capability Register accessors for Link Control and Root Control - Use pci_dev_id() when possible instead of manually composing ID from dev->bus->number and dev->devfn Resource management: - Move prototypes for __weak sysfs resource files to linux/pci.h to fix 'no previous prototype' warnings - Make more I/O port accesses depend on HAS_IOPORT - Use devm_platform_get_and_ioremap_resource() instead of open-coding platform_get_resource() followed by devm_ioremap_resource() Power management: - Ensure devices are powered up while accessing VPD - If device is powered-up, keep it that way while polling for PME - Only read PCI_PM_CTRL register when available, to avoid reading the wrong register and corrupting dev->current_state Virtualization: - Avoid Secondary Bus Reset on NVIDIA T4 GPUs Error handling: - Remove unused pci_disable_pcie_error_reporting() - Unexport pci_enable_pcie_error_reporting(), used only by aer.c - Unexport pcie_port_bus_type, used only by PCI core VGA: - Simplify and clean up typos in VGA arbiter Apple PCIe controller driver: - Initialize pcie->nvecs (number of available MSIs) before use Broadcom iProc PCIe controller driver: - Use of_property_read_bool() instead of low-level accessors for boolean properties Broadcom STB PCIe controller driver: - Assert PERST# when probing BCM2711 because some bootloaders don't do it Freescale i.MX6 PCIe controller driver: - Add .host_deinit() callback so we can clean up things like regulators on probe failure or driver unload Freescale Layerscape PCIe controller driver: - Add support for link-down notification so the endpoint driver can process LINK_DOWN events - Add suspend/resume support, including manual PME_Turn_off/PME_TO_Ack handshake - Save Link Capabilities during probe so they can be restored when handling a link-up event, since the controller loses the Link Width and Link Speed values during reset Intel VMD host bridge driver: - Fix disable of bridge windows during domain reset; previously we cleared the base/limit registers, which actually left the windows enabled Marvell MVEBU PCIe controller driver: - Remove unused busn member Microchip PolarFlare PCIe controller driver: - Fix interrupt bit definitions so the SEC and DED interrupt handlers work correctly - Make driver buildable as a module - Read FPGA MSI configuration parameters from hardware instead of hard-coding them Microsoft Hyper-V host bridge driver: - To avoid a NULL pointer dereference, skip MSI restore after hibernate if MSI/MSI-X hasn't been enabled NVIDIA Tegra194 PCIe controller driver: - Revert 'PCI: tegra194: Enable support for 256 Byte payload' because Linux doesn't know how to reduce MPS from to 256 to 128 bytes for endpoints below a switch (because other devices below the switch might already be operating), which leads to 'Malformed TLP' errors Qualcomm PCIe controller driver: - Add DT and driver support for interconnect bandwidth voting for 'pcie-mem' and 'cpu-pcie' interconnects - Fix broken SDX65 'compatible' DT property - Configure controller so MHI bus master clock will be switched off while in ASPM L1.x states - Use alignment restriction from EPF core in EPF MHI driver - Add Endpoint eDMA support - Add MHI eDMA support - Add Snapdragon SM8450 support to the EPF MHI driversupport - Add MHI eDMA support - Add Snapdragon SM8450 support to the EPF MHI driversupport - Add MHI eDMA support - Add Snapdragon SM8450 support to the EPF MHI driversupport - Add MHI eDMA support - Add Snapdragon SM8450 support to the EPF MHI driver - Use iATU for EPF MHI transfers smaller than 4K to avoid eDMA setup latency - Add sa8775p DT binding and driver support Rockchip PCIe controller driver: - Use 64-bit mask on MSI 64-bit PCI address to avoid zeroing out the upper 32 bits SiFive FU740 PCIe controller driver: - Set the supported number of MSI vectors so we can use all available MSI interrupts Synopsys DesignWare PCIe controller driver: - Add generic dwc suspend/resume APIs (dw_pcie_suspend_noirq() and dw_pcie_resume_noirq()) to be called by controller driver suspend/resume ops, and a controller callback to send PME_Turn_Off MicroSemi Switchtec management driver: - Add support for PCIe Gen5 devices Miscellaneous: - Reorder and compress to reduce size of struct pci_dev - Fix race in DOE destroy_work_on_stack() - Add stubs to avoid casts between incompatible function types - Explicitly include correct DT includes to untangle headers" * tag 'pci-v6.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (96 commits) PCI: qcom-ep: Add ICC bandwidth voting support dt-bindings: PCI: qcom: ep: Add interconnects path PCI: qcom-ep: Treat unknown IRQ events as an error dt-bindings: PCI: qcom: Fix SDX65 compatible PCI: endpoint: Add kernel-doc for pci_epc_mem_init() API PCI: epf-mhi: Use iATU for small transfers PCI: epf-mhi: Add support for SM8450 PCI: epf-mhi: Add eDMA support PCI: qcom-ep: Add eDMA support PCI: epf-mhi: Make use of the alignment restriction from EPF core PCI/PM: Only read PCI_PM_CTRL register when available