summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2024-10-10close_range(): fix the logics in descriptor table trimmingAl Viro1-4/+4
commit 678379e1d4f7443b170939525d3312cfc37bf86b upstream. Cloning a descriptor table picks the size that would cover all currently opened files. That's fine for clone() and unshare(), but for close_range() there's an additional twist - we clone before we close, and it would be a shame to have close_range(3, ~0U, CLOSE_RANGE_UNSHARE) leave us with a huge descriptor table when we are not going to keep anything past stderr, just because some large file descriptor used to be open before our call has taken it out. Unfortunately, it had been dealt with in an inherently racy way - sane_fdtable_size() gets a "don't copy anything past that" argument (passed via unshare_fd() and dup_fd()), close_range() decides how much should be trimmed and passes that to unshare_fd(). The problem is, a range that used to extend to the end of descriptor table back when close_range() had looked at it might very well have stuff grown after it by the time dup_fd() has allocated a new files_struct and started to figure out the capacity of fdtable to be attached to that. That leads to interesting pathological cases; at the very least it's a QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes a snapshot of descriptor table one might have observed at some point. Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination of unshare(CLONE_FILES) with plain close_range(), ending up with a weird state that would never occur with unshare(2) is confusing, to put it mildly. It's not hard to get rid of - all it takes is passing both ends of the range down to sane_fdtable_size(). There we are under ->files_lock, so the race is trivially avoided. So we do the following: * switch close_files() from calling unshare_fd() to calling dup_fd(). * undo the calling convention change done to unshare_fd() in 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" * introduce struct fd_range, pass a pointer to that to dup_fd() and sane_fdtable_size() instead of "trim everything past that point" they are currently getting. NULL means "we are not going to be punching any holes"; NR_OPEN_MAX is gone. * make sane_fdtable_size() use find_last_bit() instead of open-coding it; it's easier to follow that way. * while we are at it, have dup_fd() report errors by returning ERR_PTR(), no need to use a separate int *errorp argument. Fixes: 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10cpufreq: Avoid a bad reference count on CPU nodeMiquel Sabaté Solà1-5/+1
commit c0f02536fffbbec71aced36d52a765f8c4493dc2 upstream. In the parse_perf_domain function, if the call to of_parse_phandle_with_args returns an error, then the reference to the CPU device node that was acquired at the start of the function would not be properly decremented. Address this by declaring the variable with the __free(device_node) cleanup attribute. Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20240917134246.584026-1-mikisabate@gmail.com Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10mm/hugetlb: fix memfd_pin_folios resv_huge_pages leakSteve Sistare1-0/+10
commit 26a8ea80929c518bdec5e53a5776f95919b7c88e upstream. memfd_pin_folios followed by unpin_folios leaves resv_huge_pages elevated if the pages were not already faulted in. During a normal page fault, resv_huge_pages is consumed here: hugetlb_fault() alloc_hugetlb_folio() dequeue_hugetlb_folio_vma() dequeue_hugetlb_folio_nodemask() dequeue_hugetlb_folio_node_exact() free_huge_pages-- resv_huge_pages-- During memfd_pin_folios, the page is created by calling alloc_hugetlb_folio_nodemask instead of alloc_hugetlb_folio, and resv_huge_pages is not modified: memfd_alloc_folio() alloc_hugetlb_folio_nodemask() dequeue_hugetlb_folio_nodemask() dequeue_hugetlb_folio_node_exact() free_huge_pages-- alloc_hugetlb_folio_nodemask has other callers that must not modify resv_huge_pages. Therefore, to fix, define an alternate version of alloc_hugetlb_folio_nodemask for this call site that adjusts resv_huge_pages. Link: https://lkml.kernel.org/r/1725373521-451395-4-git-send-email-steven.sistare@oracle.com Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10dt-bindings: clock: qcom: Add GPLL9 support on gcc-sc8180xSatya Priya Kakitapalli1-0/+1
commit 648b4bde0aca2980ebc0b90cdfbb80d222370c3d upstream. Add the missing GPLL9 which is required for the gcc sdcc2 clock. Fixes: 0fadcdfdcf57 ("dt-bindings: clock: Add SC8180x GCC binding") Cc: stable@vger.kernel.org Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com> Link: https://lore.kernel.org/r/20240812-gcc-sc8180x-fixes-v2-2-8b3eaa5fb856@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flagsHans Verkuil1-1/+5
commit 599f6899051cb70c4e0aa9fd591b9ee220cb6f14 upstream. The cec_msg_set_reply_to() helper function never zeroed the struct cec_msg flags field, this can cause unexpected behavior if flags was uninitialized to begin with. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 0dbacebede1e ("[media] cec: move the CEC framework out of staging and to media") Cc: <stable@vger.kernel.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10sched/deadline: Comment sched_dl_entity::dl_server variableDaniel Bristot de Oliveira1-0/+2
commit f23c042ce34ba265cf3129d530702b5d218e3f4b upstream. Add an explanation for the newly added variable. Fixes: 63ba8422f876 ("sched/deadline: Introduce deadline servers") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/147f7aa8cb8fd925f36aa8059af6a35aad08b45a.1716811044.git.bristot@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10dt-bindings: clock: exynos7885: Fix duplicated bindingDavid Virag1-1/+1
commit abf3a3ea9acb5c886c8729191a670744ecd42024 upstream. The numbering in Exynos7885's FSYS CMU bindings has 4 duplicated by accident, with the rest of the bindings continuing with 5. Fix this by moving CLK_MOUT_FSYS_USB30DRD_USER to the end as 11. Since CLK_MOUT_FSYS_USB30DRD_USER is not used in any device tree as of now, and there are no other clocks affected (maybe apart from CLK_MOUT_FSYS_MMC_SDIO_USER which the number was shared with, also not used in a device tree), this is the least impactful way to solve this problem. Fixes: cd268e309c29 ("dt-bindings: clock: Add bindings for Exynos7885 CMU_FSYS") Cc: stable@vger.kernel.org Signed-off-by: David Virag <virag.david003@gmail.com> Link: https://lore.kernel.org/r/20240806121157.479212-2-virag.david003@gmail.com Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10i2c: core: Lock address during client device instantiationHeiner Kallweit1-0/+3
commit 8d3cefaf659265aa82b0373a563fdb9d16a2b947 upstream. Krzysztof reported an issue [0] which is caused by parallel attempts to instantiate the same I2C client device. This can happen if driver supports auto-detection, but certain devices are also instantiated explicitly. The original change isn't actually wrong, it just revealed that I2C core isn't prepared yet to handle this scenario. Calls to i2c_new_client_device() can be nested, therefore we can't use a simple mutex here. Parallel instantiation of devices at different addresses is ok, so we just have to prevent parallel instantiation at the same address. We can use a bitmap with one bit per 7-bit I2C client address, and atomic bit operations to set/check/clear bits. Now a parallel attempt to instantiate a device at the same address will result in -EBUSY being returned, avoiding the "sysfs: cannot create duplicate filename" splash. Note: This patch version includes small cosmetic changes to the Tested-by version, only functional change is that address locking is supported for slave addresses too. [0] https://lore.kernel.org/linux-i2c/9479fe4e-eb0c-407e-84c0-bd60c15baf74@ans.pl/T/#m12706546e8e2414d8f1a0dc61c53393f731685cc Fixes: caba40ec3531 ("eeprom: at24: Probe for DDR3 thermal sensor in the SPD case") Cc: stable@vger.kernel.org Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10perf,x86: avoid missing caller address in stack traces captured in uprobeAndrii Nakryiko1-0/+2
[ Upstream commit cfa7f3d2c526c224a6271cc78a4a27a0de06f4f0 ] When tracing user functions with uprobe functionality, it's common to install the probe (e.g., a BPF program) at the first instruction of the function. This is often going to be `push %rbp` instruction in function preamble, which means that within that function frame pointer hasn't been established yet. This leads to consistently missing an actual caller of the traced function, because perf_callchain_user() only records current IP (capturing traced function) and then following frame pointer chain (which would be caller's frame, containing the address of caller's caller). So when we have target_1 -> target_2 -> target_3 call chain and we are tracing an entry to target_3, captured stack trace will report target_1 -> target_3 call chain, which is wrong and confusing. This patch proposes a x86-64-specific heuristic to detect `push %rbp` (`push %ebp` on 32-bit architecture) instruction being traced. Given entire kernel implementation of user space stack trace capturing works under assumption that user space code was compiled with frame pointer register (%rbp/%ebp) preservation, it seems pretty reasonable to use this instruction as a strong indicator that this is the entry to the function. In that case, return address is still pointed to by %rsp/%esp, so we fetch it and add to stack trace before proceeding to unwind the rest using frame pointer-based logic. We also check for `endbr64` (for 64-bit modes) as another common pattern for function entry, as suggested by Josh Poimboeuf. Even if we get this wrong sometimes for uprobes attached not at the function entry, it's OK because stack trace will still be overall meaningful, just with one extra bogus entry. If we don't detect this, we end up with guaranteed to be missing caller function entry in the stack trace, which is worse overall. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20240729175223.23914-1-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10drm/printer: Allow NULL data in devcoredump printerMatthew Brost1-1/+53
[ Upstream commit 53369581dc0c68a5700ed51e1660f44c4b2bb524 ] We want to determine the size of the devcoredump before writing it out. To that end, we will run the devcoredump printer with NULL data to get the size, alloc data based on the generated offset, then run the devcorecump again with a valid data pointer to print. This necessitates not writing data to the data pointer on the initial pass, when it is NULL. v5: - Better commit message (Jonathan) - Add kerenl doc with examples (Jani) Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240801154118.2547543-3-matthew.brost@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10netfs: Cancel dirty folios that have no storage destinationDavid Howells1-0/+1
[ Upstream commit 8f246b7c0a1be0882374f2ff831a61f0dbe77678 ] Kafs wants to be able to cache the contents of directories (and symlinks), but whilst these are downloaded from the server with the FS.FetchData RPC op and similar, the same as for regular files, they can't be updated by FS.StoreData, but rather have special operations (FS.MakeDir, etc.). Now, rather than redownloading a directory's content after each change made to that directory, kafs modifies the local blob. This blob can be saved out to the cache, and since it's using netfslib, kafs just marks the folios dirty and lets ->writepages() on the directory take care of it, as for an regular file. This is fine as long as there's a cache as although the upload stream is disabled, there's a cache stream to drive the procedure. But if the cache goes away in the meantime, suddenly there's no way do any writes and the code gets confused, complains "R=%x: No submit" to dmesg and leaves the dirty folio hanging. Fix this by just cancelling the store of the folio if neither stream is active. (If there's no cache at the time of dirtying, we should just not mark the folio dirty). Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-23-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10net: napi: Prevent overflow of napi_defer_hard_irqsJoe Damato1-2/+2
[ Upstream commit 08062af0a52107a243f7608fd972edb54ca5b7f8 ] In commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature") napi_defer_irqs was added to net_device and napi_defer_irqs_count was added to napi_struct, both as type int. This value never goes below zero, so there is not reason for it to be a signed int. Change the type for both from int to u32, and add an overflow check to sysfs to limit the value to S32_MAX. The limit of S32_MAX was chosen because the practical limit before this patch was S32_MAX (anything larger was an overflow) and thus there are no behavioral changes introduced. If the extra bit is needed in the future, the limit can be raised. Before this patch: $ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs' $ cat /sys/class/net/eth4/napi_defer_hard_irqs -2147483647 After this patch: $ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs' bash: line 0: echo: write error: Numerical result out of range Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned: include/linux/netdevice.h: unsigned int tx_queue_len; And has an overflow check: dev_change_tx_queue_len(..., unsigned long new_len): if (new_len != (unsigned int)new_len) return -ERANGE; Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10drivers/perf: arm_spe: Use perf_allow_kernel() for permissionsJames Clark1-7/+1
[ Upstream commit 5e9629d0ae977d6f6916d7e519724804e95f0b07 ] Use perf_allow_kernel() for 'pa_enable' (physical addresses), 'pct_enable' (physical timestamps) and context IDs. This means that perf_event_paranoid is now taken into account and LSM hooks can be used, which is more consistent with other perf_event_open calls. For example PERF_SAMPLE_PHYS_ADDR uses perf_allow_kernel() rather than just perfmon_capable(). This also indirectly fixes the following error message which is misleading because perf_event_paranoid is not taken into account by perfmon_capable(): $ perf record -e arm_spe/pa_enable/ Error: Access to performance monitoring and observability operations is limited. Consider adjusting /proc/sys/kernel/perf_event_paranoid setting ... Suggested-by: Al Grant <al.grant@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240827145113.1224604-1-james.clark@linaro.org Link: https://lore.kernel.org/all/20240807120039.GD37996@noisy.programming.kicks-ass.net/ Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10crypto: simd - Do not call crypto_alloc_tfm during registrationHerbert Xu1-10/+2
[ Upstream commit 3c44d31cb34ce4eb8311a2e73634d57702948230 ] Algorithm registration is usually carried out during module init, where as little work as possible should be carried out. The SIMD code violated this rule by allocating a tfm, this then triggers a full test of the algorithm which may dead-lock in certain cases. SIMD is only allocating the tfm to get at the alg object, which is in fact already available as it is what we are registering. Use that directly and remove the crypto_alloc_tfm call. Also remove some obsolete and unused SIMD API. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10nvme-tcp: check for invalidated or revoked keyHannes Reinecke1-1/+5
[ Upstream commit 5bc46b49c828a6dfaab80b71ecb63fe76a1096d2 ] key_lookup() will always return a key, even if that key is revoked or invalidated. So check for invalid keys before continuing. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10drm/connector: hdmi: Fix writing Dynamic Range Mastering infoframesDerek Foreman1-0/+9
[ Upstream commit f0fa69b5011a45394554fb8061d74fee4d7cd72c ] The largest infoframe we create is the DRM (Dynamic Range Mastering) infoframe which is 26 bytes + a 4 byte header, for a total of 30 bytes. With HDMI_MAX_INFOFRAME_SIZE set to 29 bytes, as it is now, we allocate too little space to pack a DRM infoframe in write_device_infoframe(), leading to an ENOSPC return from hdmi_infoframe_pack(), and never calling the connector's write_infoframe() vfunc. Instead of having HDMI_MAX_INFOFRAME_SIZE defined in two places, replace HDMI_MAX_INFOFRAME_SIZE with HDMI_INFOFRAME_SIZE(MAX) and make MAX 27 bytes - which is defined by the HDMI specification to be the largest infoframe payload. Fixes: f378b77227bc ("drm/connector: hdmi: Add Infoframes generation") Fixes: c602e4959a0c ("drm/connector: hdmi: Create Infoframe DebugFS entries") Signed-off-by: Derek Foreman <derek.foreman@collabora.com> Acked-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240827163918.48160-1-derek.foreman@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10net: test for not too small csum_start in virtio_net_hdr_to_skb()Eric Dumazet1-1/+3
[ Upstream commit 49d14b54a527289d09a9480f214b8c586322310a ] syzbot was able to trigger this warning [1], after injecting a malicious packet through af_packet, setting skb->csum_start and thus the transport header to an incorrect value. We can at least make sure the transport header is after the end of the network header (with a estimated minimal size). [1] [ 67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0 mac=(-1,-1) mac_len=0 net=(16,-6) trans=10 shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0)) csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0 priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) [ 67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9 [ 67.877764] sk family=17 type=3 proto=0 [ 67.878279] skb linear: 00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00 [ 67.879128] skb frag: 00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02 [ 67.879877] skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.880647] skb frag: 00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00 [ 67.881156] skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.881753] skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882173] skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882790] skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883171] skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883733] skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.884206] skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e [ 67.884704] skb frag: 000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00 [ 67.885139] skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.885677] skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886042] skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886408] skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887020] skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887384] skb frag: 00000100: 00 00 [ 67.887878] ------------[ cut here ]------------ [ 67.887908] offset (-6) >= skb_headlen() (14) [ 67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs [ 67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme #1011 [ 67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891043] Call Trace: [ 67.891173] <TASK> [ 67.891274] ? __warn (kernel/panic.c:741) [ 67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 67.891348] ? handle_bug (arch/x86/kernel/traps.c:239) [ 67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) [ 67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1)) [ 67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113) [ 67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200) [ 67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13)) [ 67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1)) [ 67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670) [ 67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227) [ 67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604) [ 67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25)) [ 67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1)) [ 67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4)) [ 67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657) [ 67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3)) [ 67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1)) [ 67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4)) [ 67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1)) [ 67.891734] ? do_sock_setsockopt (net/socket.c:2335) [ 67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355) [ 67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1)) [ 67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fixes: 9181d6f8a2bb ("net: add more sanity check in virtio_net_hdr_to_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_sizeDaniel Borkmann1-0/+9
[ Upstream commit e609c959a939660c7519895f853dfa5624c6827a ] Commit 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") added a dev->gso_max_size test to gso_features_check() in order to fall back to GSO when needed. This was added as it was noticed that some drivers could misbehave if TSO packets get too big. However, the check doesn't respect dev->gso_ipv4_max_size limit. For instance, a device could be configured with BIG TCP for IPv4, but not IPv6. Therefore, add a netif_get_gso_max_size() equivalent to netif_get_gro_max_size() and use the helper to respect both limits before falling back to GSO engine. Fixes: 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240923212242.15669-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10net: Add netif_get_gro_max_size helper for GRODaniel Borkmann1-0/+9
[ Upstream commit e8d4d34df715133c319fabcf63fdec684be75ff8 ] Add a small netif_get_gro_max_size() helper which returns the maximum IPv4 or IPv6 GRO size of the netdevice. We later add a netif_get_gso_max_size() equivalent as well for GSO, so that these helpers can be used consistently instead of open-coded checks. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240923212242.15669-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: e609c959a939 ("net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_size") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTEDPhil Sutter1-1/+1
[ Upstream commit 76f1ed087b562a469f2153076f179854b749c09a ] Fix the comment which incorrectly defines it as NLA_U32. Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04compiler.h: specify correct attribute for .rodata..c_jump_tableTiezhu Yang1-1/+1
commit c5b1184decc819756ae549ba54c63b6790c4ddfd upstream. Currently, there is an assembler message when generating kernel/bpf/core.o under CONFIG_OBJTOOL with LoongArch compiler toolchain: Warning: setting incorrect section attributes for .rodata..c_jump_table This is because the section ".rodata..c_jump_table" should be readonly, but there is a "W" (writable) part of the flags: $ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c" [34] .rodata..c_j[...] PROGBITS 0000000000000000 0000d2e0 0000000000000800 0000000000000000 WA 0 0 8 There is no above issue on x86 due to the generated section flag is only "A" (allocatable). In order to silence the warning on LoongArch, specify the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly, then the section attribute of ".rodata..c_jump_table" must be readonly in the kernel/bpf/core.o file. Before: $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c" 21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3 CONTENTS, ALLOC, LOAD, RELOC, DATA After: $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c" 21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3 CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA By the way, AFAICT, maybe the root cause is related with the different compiler behavior of various archs, so to some extent this change is a workaround for LoongArch, and also there is no effect for x86 which is the only port supported by objtool before LoongArch with this patch. Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> [6.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04lsm: infrastructure management of the sock securityCasey Schaufler1-0/+1
[ Upstream commit 2aff9d20d50ac45dd13a013ef5231f4fb8912356 ] Move management of the sock->sk_security blob out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules the modules tell the infrastructure how much space is required, and the space is allocated there. Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subject tweak] Signed-off-by: Paul Moore <paul@paul-moore.com> Stable-dep-of: 63dff3e48871 ("lsm: add the inode_free_security_rcu() LSM implementation hook") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04soc: qcom: geni-se: add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registersDouglas Anderson1-0/+9
[ Upstream commit b03ffc76b83c1a7d058454efbcf1bf0e345ef1c2 ] For UART devices the M_GP_LENGTH is the TX word count. For other devices this is the transaction word count. For UART devices the S_GP_LENGTH is the RX word count. The IRQ_EN set/clear registers allow you to set or clear bits in the IRQ_EN register without needing a read-modify-write. Acked-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20240610152420.v4.1.Ife7ced506aef1be3158712aa3ff34a006b973559@changeid Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20240906131336.23625-4-johan+linaro@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: cc4a0e5754a1 ("serial: qcom-geni: fix console corruption") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04lsm: add the inode_free_security_rcu() LSM implementation hookPaul Moore1-0/+1
commit 63dff3e48871b0583be5032ff8fb7260c349a18c upstream. The LSM framework has an existing inode_free_security() hook which is used by LSMs that manage state associated with an inode, but due to the use of RCU to protect the inode, special care must be taken to ensure that the LSMs do not fully release the inode state until it is safe from a RCU perspective. This patch implements a new inode_free_security_rcu() implementation hook which is called when it is safe to free the LSM's internal inode state. Unfortunately, this new hook does not have access to the inode itself as it may already be released, so the existing inode_free_security() hook is retained for those LSMs which require access to the inode. Cc: stable@vger.kernel.org Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04icmp: change the order of rate limitsEric Dumazet1-0/+2
commit 8c2bd38b95f75f3d2a08c93e35303e26d480d24e upstream. ICMP messages are ratelimited : After the blamed commits, the two rate limiters are applied in this order: 1) host wide ratelimit (icmp_global_allow()) 2) Per destination ratelimit (inetpeer based) In order to avoid side-channels attacks, we need to apply the per destination check first. This patch makes the following change : 1) icmp_global_allow() checks if the host wide limit is reached. But credits are not yet consumed. This is deferred to 3) 2) The per destination limit is checked/updated. This might add a new node in inetpeer tree. 3) icmp_global_consume() consumes tokens if prior operations succeeded. This means that host wide ratelimit is still effective in keeping inetpeer tree small even under DDOS. As a bonus, I removed icmp_global.lock as the fast path can use a lock-free operation. Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Fixes: 4cdf507d5452 ("icmp: add a global rate limitation") Reported-by: Keyu Man <keyu.man@email.ucr.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240829144641.3880376-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04usbnet: fix cyclical race on disconnect with work queueOliver Neukum1-0/+15
commit 04e906839a053f092ef53f4fb2d610983412b904 upstream. The work can submit URBs and the URBs can schedule the work. This cycle needs to be broken, when a device is to be stopped. Use a flag to do so. This is a design issue as old as the driver. Signed-off-by: Oliver Neukum <oneukum@suse.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") CC: stable@vger.kernel.org Link: https://patch.msgid.link/20240919123525.688065-1-oneukum@suse.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04tcp: check skb is non-NULL in tcp_rto_delta_us()Josh Hunt1-2/+19
[ Upstream commit c8770db2d54437a5f49417ae7b46f7de23d14db6 ] We have some machines running stock Ubuntu 20.04.6 which is their 5.4.0-174-generic kernel that are running ceph and recently hit a null ptr dereference in tcp_rearm_rto(). Initially hitting it from the TLP path, but then later we also saw it getting hit from the RACK case as well. Here are examples of the oops messages we saw in each of those cases: Jul 26 15:05:02 rx [11061395.780353] BUG: kernel NULL pointer dereference, address: 0000000000000020 Jul 26 15:05:02 rx [11061395.787572] #PF: supervisor read access in kernel mode Jul 26 15:05:02 rx [11061395.792971] #PF: error_code(0x0000) - not-present page Jul 26 15:05:02 rx [11061395.798362] PGD 0 P4D 0 Jul 26 15:05:02 rx [11061395.801164] Oops: 0000 [#1] SMP NOPTI Jul 26 15:05:02 rx [11061395.805091] CPU: 0 PID: 9180 Comm: msgr-worker-1 Tainted: G W 5.4.0-174-generic #193-Ubuntu Jul 26 15:05:02 rx [11061395.814996] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023 Jul 26 15:05:02 rx [11061395.825952] RIP: 0010:tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061395.830656] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3 Jul 26 15:05:02 rx [11061395.849665] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246 Jul 26 15:05:02 rx [11061395.855149] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000 Jul 26 15:05:02 rx [11061395.862542] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60 Jul 26 15:05:02 rx [11061395.869933] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8 Jul 26 15:05:02 rx [11061395.877318] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900 Jul 26 15:05:02 rx [11061395.884710] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30 Jul 26 15:05:02 rx [11061395.892095] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000 Jul 26 15:05:02 rx [11061395.900438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 26 15:05:02 rx [11061395.906435] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0 Jul 26 15:05:02 rx [11061395.913822] PKRU: 55555554 Jul 26 15:05:02 rx [11061395.916786] Call Trace: Jul 26 15:05:02 rx [11061395.919488] Jul 26 15:05:02 rx [11061395.921765] ? show_regs.cold+0x1a/0x1f Jul 26 15:05:02 rx [11061395.925859] ? __die+0x90/0xd9 Jul 26 15:05:02 rx [11061395.929169] ? no_context+0x196/0x380 Jul 26 15:05:02 rx [11061395.933088] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0 Jul 26 15:05:02 rx [11061395.938216] ? ip6_sublist_rcv_finish+0x3d/0x50 Jul 26 15:05:02 rx [11061395.943000] ? __bad_area_nosemaphore+0x50/0x1a0 Jul 26 15:05:02 rx [11061395.947873] ? bad_area_nosemaphore+0x16/0x20 Jul 26 15:05:02 rx [11061395.952486] ? do_user_addr_fault+0x267/0x450 Jul 26 15:05:02 rx [11061395.957104] ? ipv6_list_rcv+0x112/0x140 Jul 26 15:05:02 rx [11061395.961279] ? __do_page_fault+0x58/0x90 Jul 26 15:05:02 rx [11061395.965458] ? do_page_fault+0x2c/0xe0 Jul 26 15:05:02 rx [11061395.969465] ? page_fault+0x34/0x40 Jul 26 15:05:02 rx [11061395.973217] ? tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061395.977313] ? tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061395.981408] tcp_send_loss_probe+0x10b/0x220 Jul 26 15:05:02 rx [11061395.985937] tcp_write_timer_handler+0x1b4/0x240 Jul 26 15:05:02 rx [11061395.990809] tcp_write_timer+0x9e/0xe0 Jul 26 15:05:02 rx [11061395.994814] ? tcp_write_timer_handler+0x240/0x240 Jul 26 15:05:02 rx [11061395.999866] call_timer_fn+0x32/0x130 Jul 26 15:05:02 rx [11061396.003782] __run_timers.part.0+0x180/0x280 Jul 26 15:05:02 rx [11061396.008309] ? recalibrate_cpu_khz+0x10/0x10 Jul 26 15:05:02 rx [11061396.012841] ? native_x2apic_icr_write+0x30/0x30 Jul 26 15:05:02 rx [11061396.017718] ? lapic_next_event+0x21/0x30 Jul 26 15:05:02 rx [11061396.021984] ? clockevents_program_event+0x8f/0xe0 Jul 26 15:05:02 rx [11061396.027035] run_timer_softirq+0x2a/0x50 Jul 26 15:05:02 rx [11061396.031212] __do_softirq+0xd1/0x2c1 Jul 26 15:05:02 rx [11061396.035044] do_softirq_own_stack+0x2a/0x40 Jul 26 15:05:02 rx [11061396.039480] Jul 26 15:05:02 rx [11061396.041840] do_softirq.part.0+0x46/0x50 Jul 26 15:05:02 rx [11061396.046022] __local_bh_enable_ip+0x50/0x60 Jul 26 15:05:02 rx [11061396.050460] _raw_spin_unlock_bh+0x1e/0x20 Jul 26 15:05:02 rx [11061396.054817] nf_conntrack_tcp_packet+0x29e/0xbe0 [nf_conntrack] Jul 26 15:05:02 rx [11061396.060994] ? get_l4proto+0xe7/0x190 [nf_conntrack] Jul 26 15:05:02 rx [11061396.066220] nf_conntrack_in+0xe9/0x670 [nf_conntrack] Jul 26 15:05:02 rx [11061396.071618] ipv6_conntrack_local+0x14/0x20 [nf_conntrack] Jul 26 15:05:02 rx [11061396.077356] nf_hook_slow+0x45/0xb0 Jul 26 15:05:02 rx [11061396.081098] ip6_xmit+0x3f0/0x5d0 Jul 26 15:05:02 rx [11061396.084670] ? ipv6_anycast_cleanup+0x50/0x50 Jul 26 15:05:02 rx [11061396.089282] ? __sk_dst_check+0x38/0x70 Jul 26 15:05:02 rx [11061396.093381] ? inet6_csk_route_socket+0x13b/0x200 Jul 26 15:05:02 rx [11061396.098346] inet6_csk_xmit+0xa7/0xf0 Jul 26 15:05:02 rx [11061396.102263] __tcp_transmit_skb+0x550/0xb30 Jul 26 15:05:02 rx [11061396.106701] tcp_write_xmit+0x3c6/0xc20 Jul 26 15:05:02 rx [11061396.110792] ? __alloc_skb+0x98/0x1d0 Jul 26 15:05:02 rx [11061396.114708] __tcp_push_pending_frames+0x37/0x100 Jul 26 15:05:02 rx [11061396.119667] tcp_push+0xfd/0x100 Jul 26 15:05:02 rx [11061396.123150] tcp_sendmsg_locked+0xc70/0xdd0 Jul 26 15:05:02 rx [11061396.127588] tcp_sendmsg+0x2d/0x50 Jul 26 15:05:02 rx [11061396.131245] inet6_sendmsg+0x43/0x70 Jul 26 15:05:02 rx [11061396.135075] __sock_sendmsg+0x48/0x70 Jul 26 15:05:02 rx [11061396.138994] ____sys_sendmsg+0x212/0x280 Jul 26 15:05:02 rx [11061396.143172] ___sys_sendmsg+0x88/0xd0 Jul 26 15:05:02 rx [11061396.147098] ? __seccomp_filter+0x7e/0x6b0 Jul 26 15:05:02 rx [11061396.151446] ? __switch_to+0x39c/0x460 Jul 26 15:05:02 rx [11061396.155453] ? __switch_to_asm+0x42/0x80 Jul 26 15:05:02 rx [11061396.159636] ? __switch_to_asm+0x5a/0x80 Jul 26 15:05:02 rx [11061396.163816] __sys_sendmsg+0x5c/0xa0 Jul 26 15:05:02 rx [11061396.167647] __x64_sys_sendmsg+0x1f/0x30 Jul 26 15:05:02 rx [11061396.171832] do_syscall_64+0x57/0x190 Jul 26 15:05:02 rx [11061396.175748] entry_SYSCALL_64_after_hwframe+0x5c/0xc1 Jul 26 15:05:02 rx [11061396.181055] RIP: 0033:0x7f1ef692618d Jul 26 15:05:02 rx [11061396.184893] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ca ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e8 fe ee ff ff 48 Jul 26 15:05:02 rx [11061396.203889] RSP: 002b:00007f1ef4a26aa0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e Jul 26 15:05:02 rx [11061396.211708] RAX: ffffffffffffffda RBX: 000000000000084b RCX: 00007f1ef692618d Jul 26 15:05:02 rx [11061396.219091] RDX: 0000000000004000 RSI: 00007f1ef4a26b10 RDI: 0000000000000275 Jul 26 15:05:02 rx [11061396.226475] RBP: 0000000000004000 R08: 0000000000000000 R09: 0000000000000020 Jul 26 15:05:02 rx [11061396.233859] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000084b Jul 26 15:05:02 rx [11061396.241243] R13: 00007f1ef4a26b10 R14: 0000000000000275 R15: 000055592030f1e8 Jul 26 15:05:02 rx [11061396.248628] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler nft_ct sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 mlx5_core hid_generic pci_hyperv_intf crc32_pclmul tls usbhid ahci mlxfw bnxt_en libahci hid nvme i2c_piix4 nvme_core wmi Jul 26 15:05:02 rx [11061396.324334] CR2: 0000000000000020 Jul 26 15:05:02 rx [11061396.327944] ---[ end trace 68a2b679d1cfb4f1 ]--- Jul 26 15:05:02 rx [11061396.433435] RIP: 0010:tcp_rearm_rto+0xe4/0x160 Jul 26 15:05:02 rx [11061396.438137] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3 Jul 26 15:05:02 rx [11061396.457144] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246 Jul 26 15:05:02 rx [11061396.462629] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000 Jul 26 15:05:02 rx [11061396.470012] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60 Jul 26 15:05:02 rx [11061396.477396] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8 Jul 26 15:05:02 rx [11061396.484779] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900 Jul 26 15:05:02 rx [11061396.492164] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30 Jul 26 15:05:02 rx [11061396.499547] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000 Jul 26 15:05:02 rx [11061396.507886] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 26 15:05:02 rx [11061396.513884] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0 Jul 26 15:05:02 rx [11061396.521267] PKRU: 55555554 Jul 26 15:05:02 rx [11061396.524230] Kernel panic - not syncing: Fatal exception in interrupt Jul 26 15:05:02 rx [11061396.530885] Kernel Offset: 0x1b200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Jul 26 15:05:03 rx [11061396.660181] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- After we hit this we disabled TLP by setting tcp_early_retrans to 0 and then hit the crash in the RACK case: Aug 7 07:26:16 rx [1006006.265582] BUG: kernel NULL pointer dereference, address: 0000000000000020 Aug 7 07:26:16 rx [1006006.272719] #PF: supervisor read access in kernel mode Aug 7 07:26:16 rx [1006006.278030] #PF: error_code(0x0000) - not-present page Aug 7 07:26:16 rx [1006006.283343] PGD 0 P4D 0 Aug 7 07:26:16 rx [1006006.286057] Oops: 0000 [#1] SMP NOPTI Aug 7 07:26:16 rx [1006006.289896] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.4.0-174-generic #193-Ubuntu Aug 7 07:26:16 rx [1006006.299107] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023 Aug 7 07:26:16 rx [1006006.309970] RIP: 0010:tcp_rearm_rto+0xe4/0x160 Aug 7 07:26:16 rx [1006006.314584] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3 Aug 7 07:26:16 rx [1006006.333499] RSP: 0018:ffffb42600a50960 EFLAGS: 00010246 Aug 7 07:26:16 rx [1006006.338895] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000 Aug 7 07:26:16 rx [1006006.346193] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff92d687ed8160 Aug 7 07:26:16 rx [1006006.353489] RBP: ffffb42600a50978 R08: 0000000000000000 R09: 00000000cd896dcc Aug 7 07:26:16 rx [1006006.360786] R10: ffff92dc3404f400 R11: 0000000000000001 R12: ffff92d687ed8000 Aug 7 07:26:16 rx [1006006.368084] R13: ffff92d687ed8160 R14: 00000000cd896dcc R15: 00000000cd8fca81 Aug 7 07:26:16 rx [1006006.375381] FS: 0000000000000000(0000) GS:ffff93158ad40000(0000) knlGS:0000000000000000 Aug 7 07:26:16 rx [1006006.383632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 7 07:26:16 rx [1006006.389544] CR2: 0000000000000020 CR3: 0000003e775ce006 CR4: 0000000000760ee0 Aug 7 07:26:16 rx [1006006.396839] PKRU: 55555554 Aug 7 07:26:16 rx [1006006.399717] Call Trace: Aug 7 07:26:16 rx [1006006.402335] Aug 7 07:26:16 rx [1006006.404525] ? show_regs.cold+0x1a/0x1f Aug 7 07:26:16 rx [1006006.408532] ? __die+0x90/0xd9 Aug 7 07:26:16 rx [1006006.411760] ? no_context+0x196/0x380 Aug 7 07:26:16 rx [1006006.415599] ? __bad_area_nosemaphore+0x50/0x1a0 Aug 7 07:26:16 rx [1006006.420392] ? _raw_spin_lock+0x1e/0x30 Aug 7 07:26:16 rx [1006006.424401] ? bad_area_nosemaphore+0x16/0x20 Aug 7 07:26:16 rx [1006006.428927] ? do_user_addr_fault+0x267/0x450 Aug 7 07:26:16 rx [1006006.433450] ? __do_page_fault+0x58/0x90 Aug 7 07:26:16 rx [1006006.437542] ? do_page_fault+0x2c/0xe0 Aug 7 07:26:16 rx [1006006.441470] ? page_fault+0x34/0x40 Aug 7 07:26:16 rx [1006006.445134] ? tcp_rearm_rto+0xe4/0x160 Aug 7 07:26:16 rx [1006006.449145] tcp_ack+0xa32/0xb30 Aug 7 07:26:16 rx [1006006.452542] tcp_rcv_established+0x13c/0x670 Aug 7 07:26:16 rx [1006006.456981] ? sk_filter_trim_cap+0x48/0x220 Aug 7 07:26:16 rx [1006006.461419] tcp_v6_do_rcv+0xdb/0x450 Aug 7 07:26:16 rx [1006006.465257] tcp_v6_rcv+0xc2b/0xd10 Aug 7 07:26:16 rx [1006006.468918] ip6_protocol_deliver_rcu+0xd3/0x4e0 Aug 7 07:26:16 rx [1006006.473706] ip6_input_finish+0x15/0x20 Aug 7 07:26:16 rx [1006006.477710] ip6_input+0xa2/0xb0 Aug 7 07:26:16 rx [1006006.481109] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0 Aug 7 07:26:16 rx [1006006.486151] ip6_sublist_rcv_finish+0x3d/0x50 Aug 7 07:26:16 rx [1006006.490679] ip6_sublist_rcv+0x1aa/0x250 Aug 7 07:26:16 rx [1006006.494779] ? ip6_rcv_finish_core.isra.0+0xa0/0xa0 Aug 7 07:26:16 rx [1006006.499828] ipv6_list_rcv+0x112/0x140 Aug 7 07:26:16 rx [1006006.503748] __netif_receive_skb_list_core+0x1a4/0x250 Aug 7 07:26:16 rx [1006006.509057] netif_receive_skb_list_internal+0x1a1/0x2b0 Aug 7 07:26:16 rx [1006006.514538] gro_normal_list.part.0+0x1e/0x40 Aug 7 07:26:16 rx [1006006.519068] napi_complete_done+0x91/0x130 Aug 7 07:26:16 rx [1006006.523352] mlx5e_napi_poll+0x18e/0x610 [mlx5_core] Aug 7 07:26:16 rx [1006006.528481] net_rx_action+0x142/0x390 Aug 7 07:26:16 rx [1006006.532398] __do_softirq+0xd1/0x2c1 Aug 7 07:26:16 rx [1006006.536142] irq_exit+0xae/0xb0 Aug 7 07:26:16 rx [1006006.