summaryrefslogtreecommitdiff
path: root/include/linux/dsa
AgeCommit message (Collapse)AuthorFilesLines
2024-08-16net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and ↵Vladimir Oltean1-0/+47
register injection Problem description ------------------- On an NXP LS1028A (felix DSA driver) with the following configuration: - ocelot-8021q tagging protocol - VLAN-aware bridge (with STP) spanning at least swp0 and swp1 - 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700 - ptp4l on swp0.700 and swp1.700 we see that the ptp4l instances do not see each other's traffic, and they all go to the grand master state due to the ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition. Jumping to the conclusion for the impatient ------------------------------------------- There is a zero-day bug in the ocelot switchdev driver in the way it handles VLAN-tagged packet injection. The correct logic already exists in the source code, in function ocelot_xmit_get_vlan_info() added by commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit"). But it is used only for normal NPI-based injection with the DSA "ocelot" tagging protocol. The other injection code paths (register-based and FDMA-based) roll their own wrong logic. This affects and was noticed on the DSA "ocelot-8021q" protocol because it uses register-based injection. By moving ocelot_xmit_get_vlan_info() to a place that's common for both the DSA tagger and the ocelot switch library, it can also be called from ocelot_port_inject_frame() in ocelot.c. We need to touch the lines with ocelot_ifh_port_set()'s prototype anyway, so let's rename it to something clearer regarding what it does, and add a kernel-doc. ocelot_ifh_set_basic() should do. Investigation notes ------------------- Debugging reveals that PTP event (aka those carrying timestamps, like Sync) frames injected into swp0.700 (but also swp1.700) hit the wire with two VLAN tags: 00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc ~~~~~~~~~~~ 00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00 ~~~~~~~~~~~ 00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03 00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00 00000040: 00 00 The second (unexpected) VLAN tag makes felix_check_xtr_pkt() -> ptp_classify_raw() fail to see these as PTP packets at the link partner's receiving end, and return PTP_CLASS_NONE (because the BPF classifier is not written to expect 2 VLAN tags). The reason why packets have 2 VLAN tags is because the transmission code treats VLAN incorrectly. Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX feature. Therefore, at xmit time, all VLANs should be in the skb head, and none should be in the hwaccel area. This is done by: static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb, netdev_features_t features) { if (skb_vlan_tag_present(skb) && !vlan_hw_offload_capable(features, skb->vlan_proto)) skb = __vlan_hwaccel_push_inside(skb); return skb; } But ocelot_port_inject_frame() handles things incorrectly: ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb)); void ocelot_ifh_port_set(struct sk_buff *skb, void *ifh, int port, u32 rew_op) { (...) if (vlan_tag) ocelot_ifh_set_vlan_tci(ifh, vlan_tag); (...) } The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head is by calling: static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb) { skb->vlan_present = 0; } which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get(). This means that ocelot, when it calls skb_vlan_tag_get(), sees (and uses) a residual skb->vlan_tci, while the same VLAN tag is _already_ in the skb head. The trivial fix for double VLAN headers is to replace the content of ocelot_ifh_port_set() with: if (skb_vlan_tag_present(skb)) ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb)); but this would not be correct either, because, as mentioned, vlan_hw_offload_capable() is false for us, so we'd be inserting dead code and we'd always transmit packets with VID=0 in the injection frame header. I can't actually test the ocelot switchdev driver and rely exclusively on code inspection, but I don't think traffic from 8021q uppers has ever been injected properly, and not double-tagged. Thus I'm blaming the introduction of VLAN fields in the injection header - early driver code. As hinted at in the early conclusion, what we _want_ to happen for VLAN transmission was already described once in commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit"). ocelot_xmit_get_vlan_info() intends to ensure that if the port through which we're transmitting is under a VLAN-aware bridge, the outer VLAN tag from the skb head is stripped from there and inserted into the injection frame header (so that the packet is processed in hardware through that actual VLAN). And in all other cases, the packet is sent with VID=0 in the injection frame header, since the port is VLAN-unaware and has logic to strip this VID on egress (making it invisible to the wire). Fixes: 08d02364b12f ("net: mscc: fix the injection header") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-15dsa: lan9303: consistent naming for PHY address parameterChristian Eggers1-2/+2
Name it 'addr' instead of 'port' or 'phy'. Signed-off-by: Christian Eggers <ceggers@arri.de> Link: https://patch.msgid.link/20240715123050.21202-1-ceggers@arri.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: prepare 'dsa_tag_8021q_bridge_join' for standalone usePawel Dembicki1-1/+2
The 'dsa_tag_8021q_bridge_join' could be used as a generic implementation of the 'ds->ops->port_bridge_join()' function. However, it is necessary to synchronize their arguments. This patch also moves the 'tx_fwd_offload' flag configuration line into 'dsa_tag_8021q_bridge_join' body. Currently, every (sja1105) driver sets it, and the future vsc73xx implementation will also need it for simplification. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-11-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: Define max num of bridges in tag8021q implementationPawel Dembicki1-0/+5
Max number of bridges in tag8021q implementation is strictly limited by VBID size: 3 bits. But zero is reserved and only 7 values can be used. This patch adds define which describe maximum possible value. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-10-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: dsa: Use conduit and user termsFlorian Fainelli1-1/+1
Use more inclusive terms throughout the DSA subsystem by moving away from "master" which is replaced by "conduit" and "slave" which is replaced by "user". No functional changes. Acked-by: Rob Herring <robh@kernel.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-2-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-04net: dsa: sja1105: always enable the send_meta optionsVladimir Oltean1-4/+0
incl_srcpt has the limitation, mentioned in commit b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"), that frames with a MAC DA of 01:80:c2:xx:yy:zz will be received as 01:80:c2:00:00:zz unless PTP RX timestamping is enabled. The incl_srcpt option was initially unconditionally enabled, then that changed with commit 42824463d38d ("net: dsa: sja1105: Limit use of incl_srcpt to bridge+vlan mode"), then again with b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"). Bottom line is that it now needs to be always enabled, otherwise the driver does not have a reliable source of information regarding source_port and switch_id for link-local traffic (tag_8021q VLANs may be imprecise since now they identify an entire bridging domain when ports are not standalone). If we accept that PTP RX timestamping (and therefore, meta frame generation) is always enabled in hardware, then that limitation could be avoided and packets with any MAC DA can be properly received, because meta frames do contain the original bytes from the MAC DA of their associated link-local packet. This change enables meta frame generation unconditionally, which also has the nice side effects of simplifying the switch control path (a switch reset is no longer required on hwtstamping settings change) and the tagger data path (it no longer needs to be informed whether to expect meta frames or not - it always does). Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: move pdelay_rsp correction field to tail tagChristian Eggers1-0/+2
For PDelay_Resp messages we will likely have a negative value in the correction field. The switch hardware cannot correctly update such values (produces an off by one error in the UDP checksum), so it must be moved to the time stamp field in the tail tag. Format of the correction field is 48 bit ns + 16 bit fractional ns. After updating the correction field, clone is no longer required hence it is freed. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: add packet transmission timestampingChristian Eggers1-0/+8
This patch adds the routines for transmission of ptp packets. When the ptp pdelay_req packet to be transmitted, it uses the deferred xmit worker to schedule the packets. During irq_setup, interrupt for Sync, Pdelay_req and Pdelay_rsp are enabled. So interrupt is triggered for all three packets. But for p2p1step, we require only time stamp of Pdelay_req packet. Hence to avoid posting of the completion from ISR routine for Sync and Pdelay_resp packets, ts_en flag is introduced. This controls which packets need to processed for timestamp. After the packet is transmitted, ISR is triggered. The time at which packet transmitted is recorded to separate register. This value is reconstructed to absolute time and posted to the user application through socket error queue. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: add packet reception timestampingChristian Eggers1-0/+21
Rx Timestamping is done through 4 additional bytes in tail tag. Whenever the ptp packet is received, the 4 byte hardware time stamped value is added before 1 byte tail tag. Also, bit 7 in tail tag indicates it as PTP frame. This 4 byte value is extracted from the tail tag and reconstructed to absolute time and assigned to skb hwtstamp. If the packet received in PDelay_Resp, then partial ingress timestamp is subtracted from the correction field. Since user space tools expects to be done in hardware. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: add 4 bytes in tail tag when ptp enabledArun Ramadoss1-0/+22
When the PTP is enabled in hardware bit 6 of PTP_MSG_CONF1 register, the transmit frame needs additional 4 bytes before the tail tag. It is needed for all the transmission packets irrespective of PTP packets or not. The 4-byte timestamp field is 0 for frames other than Pdelay_Resp. For the one-step Pdelay_Resp, the switch needs the receive timestamp of the Pdelay_Req message so that it can put the turnaround time in the correction field. Since PTP has to be enabled for both Transmission and reception timestamping, driver needs to track of the tx and rx setting of the all the user ports in the switch. Two flags hw_tx_en and hw_rx_en are added in ksz_port to track the timestampping setting of each port. When any one of ports has tx or rx timestampping enabled, bit 6 of PTP_MSG_CONF1 is set and it is indicated to tag_ksz.c through tagger bytes. This flag adds 4 additional bytes to the tail tag. When tx and rx timestamping of all the ports are disabled, then 4 bytes are not added. Tested using hwstamp -i <interface> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> # mostly api Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-01net: dsa: tag_qca: fix wrong MGMT_DATA2 sizeChristian Marangi1-2/+2
It was discovered that MGMT_DATA2 can contain up to 28 bytes of data instead of the 12 bytes written in the Documentation by accounting the limit of 16 bytes declared in Documentation subtracting the first 4 byte in the packet header. Update the define with the real world value. Tested-by: Ronald Wahl <ronald.wahl@raritan.com> Fixes: c2ee8181fddb ("net: dsa: tag_qca: add define for handling mgmt Ethernet packet") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Cc: stable@vger.kernel.org # v5.18+ Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-22net: dsa: move tag_8021q headers to their proper placeVladimir Oltean1-30/+1
tag_8021q definitions are all over the place. Some are exported to linux/dsa/8021q.h (visible by DSA core, taggers, switch drivers and everyone else), and some are in dsa_priv.h. Move the structures that don't need external visibility into tag_8021q.c, and the ones which don't need the world or switch drivers to see them into tag_8021q.h. We also have the tag_8021q.h inclusion from switch.c, which is basically the entire reason why tag_8021q.c was built into DSA in commit 8b6e638b4be2 ("net: dsa: build tag_8021q.c as part of DSA core"). I still don't know how to better deal with that, so leave it alone. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-14net: dsa: qca8k: fix ethtool autocast mib for big-endian systemsChristian Marangi1-1/+1
The switch sends autocast mib in little-endian. This is problematic for big-endian system as the values needs to be converted. Fix this by converting each mib value to cpu byte order. Fixes: 5c957c7ca78c ("net: dsa: qca8k: add support for mib autocast in Ethernet packet") Tested-by: Pawel Dembicki <paweldembicki@gmail.com> Tested-by: Lech Perczak <lech.perczak@gmail.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14net: dsa: qca8k: fix inband mgmt for big-endian systemsChristian Marangi1-3/+3
The header and the data of the skb for the inband mgmt requires to be in little-endian. This is problematic for big-endian system as the mgmt header is written in the cpu byte order. Fix this by converting each value for the mgmt header and data to little-endian, and convert to cpu byte order the mgmt header and data sent by the switch. Fixes: 5950c7c0a68c ("net: dsa: qca8k: add support for mgmt read/write in Ethernet packet") Tested-by: Pawel Dembicki <paweldembicki@gmail.com> Tested-by: Lech Perczak <lech.perczak@gmail.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Lech Perczak <lech.perczak@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-28add missing includes and forward declarations to networking includes under ↵Jakub Kicinski1-0/+5
linux/ Similarly to a recent include/net/ cleanup, this patch adds missing includes to networking headers under include/linux. All these problems are currently masked by the existing users including the missing dependency before the broken header. Link: https://lore.kernel.org/all/20220723045755.2676857-1-kuba@kernel.org/ v1 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220726215652.158167-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-02-27net: dsa: tag_8021q: rename dsa_8021q_bridge_tx_fwd_offload_vidVladimir Oltean1-1/+1
The dsa_8021q_bridge_tx_fwd_offload_vid is no longer used just for bridge TX forwarding offload, it is the private VLAN reserved for VLAN-unaware bridging in a way that is compatible with FDB isolation. So just rename it dsa_tag_8021q_bridge_vid. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: tag_8021q: merge RX and TX VLANsVladimir Oltean1-7/+1
In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: tag_8021q: add support for imprecise RX based on the VBIDVladimir Oltean1-1/+5
The sja1105 switch can't populate the PORT field of the tag_8021q header when sending a frame to the CPU with a non-zero VBID. Similar to dsa_find_designated_bridge_port_by_vid() which performs imprecise RX for VLAN-aware bridges, let's introduce a helper in tag_8021q for performing imprecise RX based on the VLAN that it has allocated for a VLAN-unaware bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridgingVladimir Oltean1-6/+6
For VLAN-unaware bridging, tag_8021q uses something perhaps a bit too tied with the sja1105 switch: each port uses the same pvid which is also used for standalone operation (a unique one from which the source port and device ID can be retrieved when packets from that port are forwarded to the CPU). Since each port has a unique pvid when performing autonomous forwarding, the switch must be configured for Shared VLAN Learning (SVL) such that the VLAN ID itself is ignored when performing FDB lookups. Without SVL, packets would always be flooded, since FDB lookup in the source port's VLAN would never find any entry. First of all, to make tag_8021q more palatable to switches which might not support Shared VLAN Learning, let's just use a common VLAN for all ports that are under the same bridge. Secondly, using Shared VLAN Learning means that FDB isolation can never be enforced. But if all ports under the same VLAN-unaware bridge share the same VLAN ID, it can. The disadvantage is that the CPU port can no longer perform precise source port identification for these packets. But at least we have a mechanism which has proven to be adequate for that situation: imprecise RX (dsa_find_designated_bridge_port_by_vid), which is what we use for termination on VLAN-aware bridges. The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the same one as we were previously using for imprecise TX (bridge TX forwarding offload). It is already allocated, it is just a matter of using it. Note that because now all ports under the same bridge share the same VLAN, the complexity of performing a tag_8021q bridge join decreases dramatically. We no longer have to install the RX VLAN of a newly joining port into the port membership of the existing bridge ports. The newly joining port just becomes a member of the VLAN corresponding to that bridge, and the other ports are already members of it from when they joined the bridge themselves. So forwarding works properly. This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put these calls directly into the sja1105 driver. With this new mode of operation, a port controlled by tag_8021q can have two pvids whereas before it could only have one. The pvid for standalone operation is different from the pvid used for VLAN-unaware bridging. This is done, again, so that FDB isolation can be enforced. Let tag_8021q manage this by deleting the standalone pvid when a port joins a bridge, and restoring it when it leaves it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-02net: dsa: tag_qca: add support for handling mgmt and MIB Ethernet packetAnsuel Smith1-0/+7
Add connect/disconnect helper to assign private struct to the DSA switch. Add support for Ethernet mgmt and MIB if the DSA driver provide an handler to correctly parse and elaborate the data. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-02net: dsa: tag_qca: add define for handling MIB packetAnsuel Smith1-0/+10
Add struct to correctly parse a mib Ethernet packet. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-02net: dsa: tag_qca: add define for handling mgmt Ethernet packetAnsuel Smith1-0/+44
Add all the required define to prepare support for mgmt read/write in Ethernet packet. Any packet of this type has to be dropped as the only use of these special packet is receive ack for an mgmt write request or receive data for an mgmt read request. A struct is used that emulates the Ethernet header but is used for a different purpose. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-02net: dsa: tag_qca: move define to include linux/dsaAnsuel Smith1-0/+21
Move tag_qca define to include dir linux/dsa as the qca8k require access to the tagger define to support in-band mdio read/write using ethernet packet. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-0/+1
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-29net: Don't include filter.h from net/sock.hJakub Kicinski1-0/+1
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-14net: dsa: sja1105: fix broken connection with the sja1110 taggerVladimir Oltean1-1/+2
The driver was incorrectly converted assuming that "sja1105" is the only tagger supported by this driver. This results in SJA1110 switches failing to probe: sja1105 spi1.0: Unable to connect to tag protocol "sja1110": -EPROTONOSUPPORT sja1105: probe of spi1.2 failed with error -93 Add DSA_TAG_PROTO_SJA1110 to the list of supported taggers by the sja1105 driver. The sja1105_tagger_data structure format is common for the two tagging protocols. Fixes: c79e84866d2a ("net: dsa: tag_sja1105: convert to tagger-owned data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: tag_sja1105: split sja1105_tagger_data into private and public ↵Vladimir Oltean1-9/+4
sections The sja1105 driver messes with the tagging protocol's state when PTP RX timestamping is enabled/disabled. This is fundamentally necessary because the tagger needs to know what to do when it receives a PTP packet. If RX timestamping is enabled, then a metadata follow-up frame is expected, and this holds the (partial) timestamp. So the tagger plays hide-and-seek with the network stack until it also gets the metadata frame, and then presents a single packet, the timestamped PTP packet. But when RX timestamping isn't enabled, there is no metadata frame expected, so the hide-and-seek game must be turned off and the packet must be delivered right away to the network stack. Considering this, we create a pseudo isolation by devising two tagger methods callable by the switch: one to get the RX timestamping state, and one to set it. Since we can't export symbols between the tagger and the switch driver, these methods are exposed through function pointers. After this change, the public portion of the sja1105_tagger_data contains only function pointers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12Revert "net: dsa: move sja1110_process_meta_tstamp inside the tagging ↵Vladimir Oltean1-19/+7
protocol driver" This reverts commit 6d709cadfde68dbd12bef12fcced6222226dcb06. The above change was done to avoid calling symbols exported by the switch driver from the tagging protocol driver. With the tagger-owned storage model, we have a new option on our hands, and that is for the switch driver to provide a data consumer handler in the form of a function pointer inside the ->connect_tag_protocol() method. Having a function pointer avoids the problems of the exported symbols approach. By creating a handler for metadata frames holding TX timestamps on SJA1110, we are able to eliminate an skb queue from the tagger data, and replace it with a simple, and stateless, function pointer. This skb queue is now handled exclusively by sja1105_ptp.c, which makes the code easier to follow, as it used to be before the reverted patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: tag_sja1105: convert to tagger-owned dataVladimir Oltean1-2/+5
Currently, struct sja1105_tagger_data is a part of struct sja1105_private, and is used by the sja1105 driver to populate dp->priv. With the movement towards tagger-owned storage, the sja1105 driver should not be the owner of this memory. This change implements the connection between the sja1105 switch driver and its tagging protocol, which means that sja1105_tagger_data no longer stays in dp->priv but in ds->tagger_data, and that the sja1105 driver now only populates the sja1105_port_deferred_xmit callback pointer. The kthread worker is now the responsibility of the tagger. The sja1105 driver also alters the tagger's state some more, especially with regard to the PTP RX timestamping state. This will be fixed up a bit in further changes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: sja1105: move ts_id from sja1105_tagger_dataVladimir Oltean1-1/+0
The TX timestamp ID is incremented by the SJA1110 PTP timestamping callback (->port_tx_timestamp) for every packet, when cloning it. It isn't used by the tagger at all, even though it sits inside the struct sja1105_tagger_data. Also, serialization to this structure is currently done through tagger_data->meta_lock, which is a cheap hack because the meta_lock isn't used for anything else on SJA1110 (sja1105_rcv_meta_state_machine isn't called). This change moves ts_id from sja1105_tagger_data to sja1105_private and introduces a dedicated spinlock for it, also in sja1105_private. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: sja1105: make dp->priv point directly to sja1105_tagger_dataVladimir Oltean1-7/+1
The design of the sja1105 tagger dp->priv is that each port has a separate struct sja1105_port, and the sp->data pointer points to a common struct sja1105_tagger_data. We have removed all per-port members accessible by the tagger, and now only struct sja1105_tagger_data remains. Make dp->priv point directly to this. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: sja1105: remove hwts_tx_en from tagger dataVladimir Oltean1-1/+0
This tagger property is in fact not used at all by the tagger, only by the switch driver. Therefore it makes sense to be moved to sja1105_private. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021qVladimir Oltean1-3/+8
When the ocelot-8021q driver was converted to deferred xmit as part of commit 8d5f7954b7c8 ("net: dsa: felix: break at first CPU port during init and teardown"), the deferred implementation was deliberately made subtly different from what sja1105 has. The implementation differences lied on the following observations: - There might be a race between these two lines in tag_sja1105.c: skb_queue_tail(&sp->xmit_queue, skb_get(skb)); kthread_queue_work(sp->xmit_worker, &sp->xmit_work); and the skb dequeue logic in sja1105_port_deferred_xmit(). For example, the xmit_work might be already queued, however the work item has just finished walking through the skb queue. Because we don't check the return code from kthread_queue_work, we don't do anything if the work item is already queued. However, nobody will take that skb and send it, at least until the next timestampable skb is sent. This creates additional (and avoidable) TX timestamping latency. To close that race, what the ocelot-8021q driver does is it doesn't keep a single work item per port, and a skb timestamping queue, but rather dynamically allocates a work item per packet. - It is also unnecessary to have more than one kthread that does the work. So delete the per-port kthread allocations and replace them with a single kthread which is global to the switch. This change brings the two implementations in line by applying those observations to the sja1105 driver as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: tag_ocelot: convert to tagger-owned dataVladimir Oltean1-2/+10
The felix driver makes very light use of dp->priv, and the tagger is effectively stateless. dp->priv is practically only needed to set up a callback to perform deferred xmit of PTP and STP packets using the ocelot-8021q tagging protocol (the main ocelot tagging protocol makes no use of dp->priv, although this driver sets up dp->priv irrespective of actual tagging protocol in use). struct felix_port (what used to be pointed to by dp->priv) is removed and replaced with a two-sided structure. The public side of this structure, visible to the switch driver, is ocelot_8021q_tagger_data. The private side is ocelot_8021q_tagger_private, and the latter structure physically encapsulates the former. The public half of the tagger data structure can be accessed through a helper of the same name (ocelot_8021q_tagger_data) which also sanity-checks the protocol currently in use by the switch. The public/private split was requested by Andrew Lunn. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-08net: dsa: keep the bridge_dev and bridge_num as part of the same structureVladimir Oltean1-4/+3
The main desire behind this is to provide coherent bridge information to the fast path without locking. For example, right now we set dp->bridge_dev and dp->bridge_num from separate code paths, it is theoretically possible for a packet transmission to read these two port properties consecutively and find a bridge number which does not correspond with the bridge device. Another desire is to start passing more complex bridge information to dsa_switch_ops functions. For example, with FDB isolation, it is expected that drivers will need to be passed the bridge which requested an FDB/MDB entry to be offloaded, and along with that bridge_dev, the associated bridge_num should be passed too, in case the driver might want to implement an isolation scheme based on that number. We already pass the {bridge_dev, bridge_num} pair to the TX forwarding offload switch API, however we'd like to remove that and squash it into the basic bridge join/leave API. So that means we need to pass this pair to the bridge join/leave API. During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we call the driver's .port_bridge_leave with what used to be our dp->bridge_dev, but provided as an argument. When bridge_dev and bridge_num get folded into a single structure, we need to preserve this behavior in dsa_port_bridge_leave: we need a copy of what used to be in dp->bridge. Switch drivers check bridge membership by comparing dp->bridge_dev with the provided bridge_dev, but now, if we provide the struct dsa_bridge as a pointer, they cannot keep comparing dp->bridge to the provided pointer, since this only points to an on-stack copy. To make this obvious and prevent driver writers from forgetting and doing stupid things, in this new API, the struct dsa_bridge is provided as a full structure (not very large, contains an int and a pointer) instead of a pointer. An explicit comparison function needs to be used to determine bridge membership: dsa_port_offloads_bridge(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: make dp->bridge_num one-basedVladimir Oltean1-3/+3
I have seen too many bugs already due to the fact that we must encode an invalid dp->bridge_num as a negative value, because the natural tendency is to check that invalid value using (!dp->bridge_num). Latest example can be seen in commit 1bec0f05062c ("net: dsa: fix bridge_num not getting cleared after ports leaving the bridge"). Convert the existing users to assume that dp->bridge_num == 0 is the encoding for invalid, and valid bridge numbers start from 1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-03net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridgeVladimir Oltean1-0/+1
Normally it is expected that the dsa_device_ops :: rcv() method finishes parsing the DSA tag and consumes it, then never looks at it again. But commit c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix") added support for RX timestamping in a very unconventional way. On this switch, a partial timestamp is available in the DSA header, but the driver got away with not parsing that timestamp right away, but instead delayed that parsing for a little longer: dsa_switch_rcv(): nskb = cpu_dp->rcv(skb, dev); <------------- not here -> ocelot_rcv() ... skb = nskb; skb_push(skb, ETH_HLEN); skb->pkt_type = PACKET_HOST; skb->protocol = eth_type_trans(skb, skb->dev); ... if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here -> felix_rxtstamp() return 0; When in felix_rxtstamp(), this driver accounted for the fact that eth_type_trans() happened in the meanwhile, so it got a hold of the extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes from the current skb->data. This worked for quite some time but was quite fragile from the very beginning. Not to mention that having DSA tag parsing split in two different files, under different folders (net/dsa/tag_ocelot.c vs drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to come that they might break this. Finally, the blamed commit does the following: at the end of ocelot_rcv(), it checks whether the skb payload contains a VLAN header. If it does, and this port is under a VLAN-aware bridge, that VLAN ID might not be correct in the sense that the packet might have suffered VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID from the skb payload using __skb_vlan_pop(), and take the classified VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the classified VLAN, and the skb payload is VLAN-untagged. The big problem is that __skb_vlan_pop() does: memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); __skb_pull(skb, VLAN_HLEN); aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes from the skb headroom (effectively also moving skb->data, by definition). So for felix_rxtstamp()'s fragile logic, all bets are off now. Instead of having the "extraction" pointer point to the DSA header, it actually points to 4 bytes _inside_ the extraction header. Corollary, the last 4 bytes of the "extraction" header are in fact 4 stale bytes of the destination MAC address from the Ethernet header, from prior to the __skb_vlan_pop() movement. So of course, RX timestamps are completely bogus when the system is configured in this way. The fix is actually very simple: just don't structure the code like that. For better or worse, the DSA PTP timestamping API does not offer a straightforward way for drivers to present their RX timestamps, but other drivers (sja1105) have established a simple mechanism to carry their RX timestamp from dsa_device_ops :: rcv() all the way to dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to simply save the partial timestamp to the skb->cb, and complete it later. Question: why don't we simply populate the skb's struct skb_shared_hwtstamps from ocelot_rcv(), and bother with this complication of propagating the timestamp to felix_rxtstamp()? Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether PTP packets need sleepable context to retrieve the full RX timestamp. Currently felix_rxtstamp() answers "no, thanks" to that question, and calls ocelot_ptp_gettime64() from softirq atomic context. This is understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so hardware access does not require sleeping. But the felix driver is preparing for the introduction of other switches where hardware access is over a slow bus like SPI or MDIO: https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/ So I would like to keep this code structure, so the rework needed when that driver will need PTP support will be minimal (answer "yes, I need deferred context for this skb's RX timestamp", then the partial timestamp will still be found in the skb->cb. Fixes: ea440cd2d9b2 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available") Reported-by: Po Liu <po.liu@nxp.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: tag_8021q: make dsa_8021q_{rx,tx}_vid take dp as argumentVladimir Oltean1-2/+3
Pass a single argument to dsa_8021q_rx_vid and dsa_8021q_tx_vid that contains the necessary information from the two arguments that are currently provided: the switch and the port number. Also rename those functions so that they have a dsa_port_* prefix, since they operate on a struct dsa_port *. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-0/+62
tools/testing/selftests/net/ioam6.sh 7b1700e009cc ("selftests: net: modify IOAM tests for undef bits") bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch libVladimir Oltean1-0/+12
Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driverVladimir Oltean1-0/+37
As explained here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ DSA tagging protocol drivers cannot depend on symbols exported by switch drivers, because this creates a circular dependency that breaks module autoloading. The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function exported by the common ocelot switch lib. This function looks at OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the DSA tag, for PTP timestamping