From 9630b585b607bd26f505d34620b14d75b9a5af7d Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Fri, 4 Nov 2022 19:04:59 +0300 Subject: drm/msm/gem: Prevent blocking within shrinker loop Consider this scenario: 1. APP1 continuously creates lots of small GEMs 2. APP2 triggers `drop_caches` 3. Shrinker starts to evict APP1 GEMs, while APP1 produces new purgeable GEMs 4. msm_gem_shrinker_scan() returns non-zero number of freed pages and causes shrinker to try shrink more 5. msm_gem_shrinker_scan() returns non-zero number of freed pages again, goto 4 6. The APP2 is blocked in `drop_caches` until APP1 stops producing purgeable GEMs To prevent this blocking scenario, check number of remaining pages that GPU shrinker couldn't release due to a GEM locking contention or shrinking rejection. If there are no remaining pages left to shrink, then there is no need to free up more pages and shrinker may break out from the loop. This problem was found during shrinker/madvise IOCTL testing of virtio-gpu driver. The MSM driver is affected in the same way. Reviewed-by: Rob Clark Reviewed-by: Thomas Zimmermann Fixes: b352ba54a820 ("drm/msm/gem: Convert to using drm_gem_lru") Signed-off-by: Dmitry Osipenko Link: https://lore.kernel.org/all/20230108210445.3948344-2-dmitry.osipenko@collabora.com/ --- include/drm/drm_gem.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index a17c2f903f81..b46ade812443 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -475,7 +475,9 @@ int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev, void drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock); void drm_gem_lru_remove(struct drm_gem_object *obj); void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj); -unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan, +unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, + unsigned int nr_to_scan, + unsigned long *remaining, bool (*shrink)(struct drm_gem_object *obj)); #endif /* __DRM_GEM_H__ */ -- cgit v1.2.3 From c28cd1f3433c7e339315d1ddacaeacf0fdfbe252 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Mar 2023 17:46:38 -0800 Subject: clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro We already mark fwnodes as initialized when they are registered as clock providers. We do this so that fw_devlink can tell when a clock driver doesn't use the driver core framework to probe/initialize its device. This ensures fw_devlink doesn't block the consumers of such a clock provider indefinitely. However, some users of CLK_OF_DECLARE() macros don't use the same node that matches the macro as the node for the clock provider, but they initialize the entire node. To cover these cases, also mark the nodes that match the macros as initialized when the init callback function is called. An example of this is "stericsson,u8500-clks" that's handled using CLK_OF_DECLARE() and looks something like this: clocks { compatible = "stericsson,u8500-clks"; prcmu_clk: prcmu-clock { #clock-cells = <1>; }; prcc_pclk: prcc-periph-clock { #clock-cells = <2>; }; prcc_kclk: prcc-kernel-clock { #clock-cells = <2>; }; prcc_reset: prcc-reset-controller { #reset-cells = <2>; }; ... }; This patch makes sure that "clocks" is marked as initialized so that fw_devlink knows that all nodes under it have been initialized. If the driver creates struct devices for some of the subnodes, fw_devlink is smart enough to know to wait for those devices to probe, so no special handling is required for those cases. Cc: Greg Kroah-Hartman Reported-by: Linus Walleij Link: https://lore.kernel.org/lkml/CACRpkdamxDX6EBVjKX5=D3rkHp17f5pwGdBVhzFU90-0MHY6dQ@mail.gmail.com/ Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()") Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20230302014639.297514-1-saravanak@google.com Reviewed-by: Linus Walleij Tested-by: Linus Walleij Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 842e72a5348f..c9f5276006a0 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -1363,7 +1363,13 @@ struct clk_hw_onecell_data { struct clk_hw *hws[]; }; -#define CLK_OF_DECLARE(name, compat, fn) OF_DECLARE_1(clk, name, compat, fn) +#define CLK_OF_DECLARE(name, compat, fn) \ + static void __init name##_of_clk_init_declare(struct device_node *np) \ + { \ + fn(np); \ + fwnode_dev_initialized(of_fwnode_handle(np), true); \ + } \ + OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare) /* * Use this macro when you have a driver that requires two initialization -- cgit v1.2.3 From 5adc409340b1fc82bc1175e602d14ac82ac685e3 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 1 Mar 2023 11:04:34 +0100 Subject: ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper x86 ACPI boards which ship with only Android as their factory image usually have pretty broken ACPI tables, relying on everything being hardcoded in the factory kernel image and often disabling parts of the ACPI enumeration kernel code to avoid the broken tables causing issues. Part of this broken ACPI code is that sometimes these boards have _AEI ACPI GPIO event handlers which are broken. So far this has been dealt with in the platform/x86/x86-android-tablets.c module, which contains various workarounds for these devices, by it calling acpi_gpiochip_free_interrupts() on gpiochip-s with troublesome handlers to disable the handlers. But in some cases this is too late, if the handlers are of the edge type then gpiolib-acpi.c's code will already have run them at boot. This can cause issues such as GPIOs ending up as owned by "ACPI:OpRegion", making them unavailable for drivers which actually need them. Boards with these broken ACPI tables are already listed in drivers/acpi/x86/utils.c for e.g. acpi_quirk_skip_i2c_client_enumeration(). Extend the quirks mechanism for a new acpi_quirk_skip_gpio_event_handlers() helper, this re-uses the DMI-ids rather then having to duplicate the same DMI table in gpiolib-acpi.c . Also add the new ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS quirk to existing boards with troublesome ACPI gpio event handlers, so that the current acpi_gpiochip_free_interrupts() hack can be removed from x86-android-tablets.c . Signed-off-by: Hans de Goede Acked-by: Andy Shevchenko Signed-off-by: Rafael J. Wysocki --- include/acpi/acpi_bus.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 0584e9f6e339..57acb895c038 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -657,6 +657,7 @@ static inline bool acpi_quirk_skip_acpi_ac_and_battery(void) #if IS_ENABLED(CONFIG_X86_ANDROID_TABLETS) bool acpi_quirk_skip_i2c_client_enumeration(struct acpi_device *adev); int acpi_quirk_skip_serdev_enumeration(struct device *controller_parent, bool *skip); +bool acpi_quirk_skip_gpio_event_handlers(void); #else static inline bool acpi_quirk_skip_i2c_client_enumeration(struct acpi_device *adev) { @@ -668,6 +669,10 @@ acpi_quirk_skip_serdev_enumeration(struct device *controller_parent, bool *skip) *skip = false; return 0; } +static inline bool acpi_quirk_skip_gpio_event_handlers(void) +{ + return false; +} #endif #ifdef CONFIG_PM -- cgit v1.2.3 From eb59eca0d8ac15f8c1b7f1cd35999455a90292c0 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 6 Mar 2023 08:56:31 +0100 Subject: interconnect: fix provider registration API The current interconnect provider interface is inherently racy as providers are expected to be added before being fully initialised. Specifically, nodes are currently not added and the provider data is not initialised until after registering the provider which can cause racing DT lookups to fail. Add a new provider API which will be used to fix up the interconnect drivers. The old API is reimplemented using the new interface and will be removed once all drivers have been fixed. Fixes: 11f1ceca7031 ("interconnect: Add generic on-chip interconnect API") Fixes: 87e3031b6fbd ("interconnect: Allow endpoints translation via DT") Cc: stable@vger.kernel.org # 5.1 Reviewed-by: Konrad Dybcio Signed-off-by: Johan Hovold Tested-by: Luca Ceresoli # i.MX8MP MSC SM2-MB-EP1 Board Link: https://lore.kernel.org/r/20230306075651.2449-4-johan+linaro@kernel.org Signed-off-by: Georgi Djakov --- include/linux/interconnect-provider.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include') diff --git a/include/linux/interconnect-provider.h b/include/linux/interconnect-provider.h index cd5c5a27557f..d12cd18aab3f 100644 --- a/include/linux/interconnect-provider.h +++ b/include/linux/interconnect-provider.h @@ -122,6 +122,9 @@ int icc_link_destroy(struct icc_node *src, struct icc_node *dst); void icc_node_add(struct icc_node *node, struct icc_provider *provider); void icc_node_del(struct icc_node *node); int icc_nodes_remove(struct icc_provider *provider); +void icc_provider_init(struct icc_provider *provider); +int icc_provider_register(struct icc_provider *provider); +void icc_provider_deregister(struct icc_provider *provider); int icc_provider_add(struct icc_provider *provider); void icc_provider_del(struct icc_provider *provider); struct icc_node_data *of_icc_get_from_provider(struct of_phandle_args *spec); @@ -167,6 +170,15 @@ static inline int icc_nodes_remove(struct icc_provider *provider) return -ENOTSUPP; } +static inline void icc_provider_init(struct icc_provider *provider) { } + +static inline int icc_provider_register(struct icc_provider *provider) +{ + return -ENOTSUPP; +} + +static inline void icc_provider_deregister(struct icc_provider *provider) { } + static inline int icc_provider_add(struct icc_provider *provider) { return -ENOTSUPP; -- cgit v1.2.3 From 5cf9d015be160e2d90d29ae74ef1364390e8fce8 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Wed, 8 Mar 2023 13:47:11 -0700 Subject: clk: Avoid invalid function names in CLK_OF_DECLARE() After commit c28cd1f3433c ("clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro"), drivers/clk/mvebu/kirkwood.c fails to build: drivers/clk/mvebu/kirkwood.c:358:1: error: expected identifier or '(' CLK_OF_DECLARE(98dx1135_clk, "marvell,mv98dx1135-core-clock", ^ include/linux/clk-provider.h:1367:21: note: expanded from macro 'CLK_OF_DECLARE' static void __init name##_of_clk_init_declare(struct device_node *np) \ ^ :124:1: note: expanded from here 98dx1135_clk_of_clk_init_declare ^ drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE' OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare) ^ :125:3: note: expanded from here 98dx1135_clk_of_clk_init_declare ^ drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE' OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare) ^ :125:3: note: expanded from here 98dx1135_clk_of_clk_init_declare ^ drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE' OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare) ^ :125:3: note: expanded from here 98dx1135_clk_of_clk_init_declare ^ C function names must start with either an alphabetic letter or an underscore. To avoid generating invalid function names from clock names, add two underscores to the beginning of the identifier. Fixes: c28cd1f3433c ("clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro") Suggested-by: Saravana Kannan Signed-off-by: Nathan Chancellor Link: https://lore.kernel.org/r/20230308-clk_of_declare-fix-v1-1-317b741e2532@kernel.org Reviewed-by: Saravana Kannan Reported-by: Naresh Kamboju Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index c9f5276006a0..6f3175f0678a 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -1364,12 +1364,12 @@ struct clk_hw_onecell_data { }; #define CLK_OF_DECLARE(name, compat, fn) \ - static void __init name##_of_clk_init_declare(struct device_node *np) \ + static void __init __##name##_of_clk_init_declare(struct device_node *np) \ { \ fn(np); \ fwnode_dev_initialized(of_fwnode_handle(np), true); \ } \ - OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare) + OF_DECLARE_1(clk, name, compat, __##name##_of_clk_init_declare) /* * Use this macro when you have a driver that requires two initialization -- cgit v1.2.3 From 4b1a2c2a8e0ddcb89c5f6c5003bd9b53142f69e3 Mon Sep 17 00:00:00 2001 From: Lee Duncan Date: Wed, 28 Sep 2022 11:13:50 -0700 Subject: scsi: core: Add BLIST_NO_VPD_SIZE for some VDASD Some storage, such as AIX VDASD (virtual storage) and IBM 2076 (front end), fail as a result of commit c92a6b5d6335 ("scsi: core: Query VPD size before getting full page"). That commit changed getting SCSI VPD pages so that we now read just enough of the page to get the actual page size, then read the whole page in a second read. The problem is that the above mentioned hardware returns zero for the page size, because of a firmware error. In such cases, until the firmware is fixed, this new blacklist flag says to revert to the original method of reading the VPD pages, i.e. try to read a whole buffer's worth on the first try. [mkp: reworked somewhat] Fixes: c92a6b5d6335 ("scsi: core: Query VPD size before getting full page") Reported-by: Martin Wilck Suggested-by: Hannes Reinecke Signed-off-by: Lee Duncan Link: https://lore.kernel.org/r/20220928181350.9948-1-leeman.duncan@gmail.com Tested-by: Srikar Dronamraju Signed-off-by: Martin K. Petersen --- include/scsi/scsi_device.h | 2 ++ include/scsi/scsi_devinfo.h | 6 +++--- 2 files changed, 5 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index de310f21406c..f10a008e5bfa 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -145,6 +145,7 @@ struct scsi_device { const char * model; /* ... after scan; point to static string */ const char * rev; /* ... "nullnullnullnull" before scan */ +#define SCSI_DEFAULT_VPD_LEN 255 /* default SCSI VPD page size (max) */ struct scsi_vpd __rcu *vpd_pg0; struct scsi_vpd __rcu *vpd_pg83; struct scsi_vpd __rcu *vpd_pg80; @@ -215,6 +216,7 @@ struct scsi_device { * creation time */ unsigned ignore_media_change:1; /* Ignore MEDIA CHANGE on resume */ unsigned silence_suspend:1; /* Do not print runtime PM related messages */ + unsigned no_vpd_size:1; /* No VPD size reported in header */ unsigned int queue_stopped; /* request queue is quiesced */ bool offline_already; /* Device offline message logged */ diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h index 5d14adae21c7..6b548dc2c496 100644 --- a/include/scsi/scsi_devinfo.h +++ b/include/scsi/scsi_devinfo.h @@ -32,7 +32,8 @@ #define BLIST_IGN_MEDIA_CHANGE ((__force blist_flags_t)(1ULL << 11)) /* do not do automatic start on add */ #define BLIST_NOSTARTONADD ((__force blist_flags_t)(1ULL << 12)) -#define __BLIST_UNUSED_13 ((__force blist_flags_t)(1ULL << 13)) +/* do not ask for VPD page size first on some broken targets */ +#define BLIST_NO_VPD_SIZE ((__force blist_flags_t)(1ULL << 13)) #define __BLIST_UNUSED_14 ((__force blist_flags_t)(1ULL << 14)) #define __BLIST_UNUSED_15 ((__force blist_flags_t)(1ULL << 15)) #define __BLIST_UNUSED_16 ((__force blist_flags_t)(1ULL << 16)) @@ -74,8 +75,7 @@ #define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \ (__force blist_flags_t) \ ((__force __u64)__BLIST_LAST_USED - 1ULL))) -#define __BLIST_UNUSED_MASK (__BLIST_UNUSED_13 | \ - __BLIST_UNUSED_14 | \ +#define __BLIST_UNUSED_MASK (__BLIST_UNUSED_14 | \ __BLIST_UNUSED_15 | \ __BLIST_UNUSED_16 | \ __BLIST_UNUSED_24 | \ -- cgit v1.2.3 From fe9ae05cfbe587dda724fcf537c00bc2f287da62 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 8 Mar 2023 11:50:12 +0100 Subject: fbdev: Fix incorrect page mapping clearance at fb_deferred_io_release() The recent fix for the deferred I/O by the commit 3efc61d95259 ("fbdev: Fix invalid page access after closing deferred I/O devices") caused a regression when the same fb device is opened/closed while it's being used. It resulted in a frozen screen even if something is redrawn there after the close. The breakage is because the patch was made under a wrong assumption of a single open; in the current code, fb_deferred_io_release() cleans up the page mapping of the pageref list and it calls cancel_delayed_work_sync() unconditionally, where both are no correct behavior for multiple opens. This patch adds a refcount for the opens of the device, and applies the cleanup only when all files get closed. As both fb_deferred_io_open() and _close() are called always in the fb_info lock (mutex), it's safe to use the normal int for the refcounting. Also, a useless BUG_ON() is dropped. Fixes: 3efc61d95259 ("fbdev: Fix invalid page access after closing deferred I/O devices") Cc: Signed-off-by: Takashi Iwai Reviewed-by: Patrik Jakobsson Signed-off-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20230308105012.1845-1-tiwai@suse.de --- include/linux/fb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/fb.h b/include/linux/fb.h index 73eb1f85ea8e..05e40fcc7696 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -212,6 +212,7 @@ struct fb_deferred_io { /* delay between mkwrite and deferred handler */ unsigned long delay; bool sort_pagereflist; /* sort pagelist by offset */ + int open_count; /* number of opened files; protected by fb_info lock */ struct mutex lock; /* mutex that protects the pageref list */ struct list_head pagereflist; /* list of pagerefs for touched pages */ /* callback */ -- cgit v1.2.3 From 0b04d4c0542e8573a837b1d81b94209e48723b25 Mon Sep 17 00:00:00 2001 From: Douglas Raillard Date: Mon, 6 Mar 2023 12:25:49 +0000 Subject: f2fs: Fix f2fs_truncate_partial_nodes ftrace event Fix the nid_t field so that its size is correctly reported in the text format embedded in trace.dat files. As it stands, it is reported as being of size 4: field:nid_t nid[3]; offset:24; size:4; signed:0; Instead of 12: field:nid_t nid[3]; offset:24; size:12; signed:0; This also fixes the reported offset of subsequent fields so that they match with the actual struct layout. Signed-off-by: Douglas Raillard Reviewed-by: Mukesh Ojha Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim --- include/trace/events/f2fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h index 1322d34a5dfc..99cbc5949e3c 100644 --- a/include/trace/events/f2fs.h +++ b/include/trace/events/f2fs.h @@ -512,7 +512,7 @@ TRACE_EVENT(f2fs_truncate_partial_nodes, TP_STRUCT__entry( __field(dev_t, dev) __field(ino_t, ino) - __field(nid_t, nid[3]) + __array(nid_t, nid, 3) __field(int, depth) __field(int, err) ), -- cgit v1.2.3 From f85949f98206b3b11d92d695cea4efda6a81f00e Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Thu, 9 Mar 2023 13:25:27 +0100 Subject: xdp: add xdp_set_features_flag utility routine Introduce xdp_set_features_flag utility routine in order to update dynamically xdp_features according to the dynamic hw configuration via ethtool (e.g. changing number of hw rx/tx queues). Add xdp_clear_features_flag() in order to clear all xdp_feature flag. Reviewed-by: Shay Agroskin Signed-off-by: Lorenzo Bianconi Signed-off-by: Jakub Kicinski --- include/net/xdp.h | 11 +++++++++++ include/uapi/linux/netdev.h | 2 ++ 2 files changed, 13 insertions(+) (limited to 'include') diff --git a/include/net/xdp.h b/include/net/xdp.h index d517bfac937b..41c57b8b1671 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -428,12 +428,18 @@ MAX_XDP_METADATA_KFUNC, #ifdef CONFIG_NET u32 bpf_xdp_metadata_kfunc_id(int id); bool bpf_dev_bound_kfunc_id(u32 btf_id); +void xdp_set_features_flag(struct net_device *dev, xdp_features_t val); void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg); void xdp_features_clear_redirect_target(struct net_device *dev); #else static inline u32 bpf_xdp_metadata_kfunc_id(int id) { return 0; } static inline bool bpf_dev_bound_kfunc_id(u32 btf_id) { return false; } +static inline void +xdp_set_features_flag(struct net_device *dev, xdp_features_t val) +{ +} + static inline void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg) { @@ -445,4 +451,9 @@ xdp_features_clear_redirect_target(struct net_device *dev) } #endif +static inline void xdp_clear_features_flag(struct net_device *dev) +{ + xdp_set_features_flag(dev, 0); +} + #endif /* __LINUX_NET_XDP_H__ */ diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index 8c4e3e536c04..ed134fbdfd32 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -33,6 +33,8 @@ enum netdev_xdp_act { NETDEV_XDP_ACT_HW_OFFLOAD = 16, NETDEV_XDP_ACT_RX_SG = 32, NETDEV_XDP_ACT_NDO_XMIT_SG = 64, + + NETDEV_XDP_ACT_MASK = 127, }; enum { -- cgit v1.2.3 From 47053904e18282af4525a02e3e0f519f014fc7f9 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 24 Feb 2023 19:16:40 +0000 Subject: KVM: arm64: timers: Convert per-vcpu virtual offset to a global value Having a per-vcpu virtual offset is a pain. It needs to be synchronized on each update, and expands badly to a setup where different timers can have different offsets, or have composite offsets (as with NV). So let's start by replacing the use of the CNTVOFF_EL2 shadow register (which we want to reclaim for NV anyway), and make the virtual timer carry a pointer to a VM-wide offset. This simplifies the code significantly. It also addresses two terrible bugs: - The use of CNTVOFF_EL2 leads to some nice offset corruption when the sysreg gets reset, as reported by Joey. - The kvm mutex is taken from a vcpu ioctl, which goes against the locking rules... Reported-by: Joey Gouly Reviewed-by: Reiji Watanabe Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20230224173915.GA17407@e124191.cambridge.arm.com Tested-by: Joey Gouly Link: https://lore.kernel.org/r/20230224191640.3396734-1-maz@kernel.org Signed-off-by: Oliver Upton --- include/kvm/arm_arch_timer.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include') diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h index 71916de7c6c4..c52a6e6839da 100644 --- a/include/kvm/arm_arch_timer.h +++ b/include/kvm/arm_arch_timer.h @@ -23,6 +23,19 @@ enum kvm_arch_timer_regs { TIMER_REG_CTL, }; +struct arch_timer_offset { + /* + * If set, pointer to one of the offsets in the kvm's offset + * structure. If NULL, assume a zero offset. + */ + u64 *vm_offset; +}; + +struct arch_timer_vm_data { + /* Offset applied to the virtual timer/counter */ + u64 voffset; +}; + struct arch_timer_context { struct kvm_vcpu *vcpu; @@ -32,6 +45,8 @@ struct arch_timer_context { /* Emulated Timer (may be unused) */ struct hrtimer hrtimer; + /* Offset for this counter/timer */ + struct arch_timer_offset offset; /* * We have multiple paths which can save/restore the timer state onto * the hardware, so we need some way of keeping track of where the -- cgit v1.2.3 From ab909509850b27fd39b8ba99e44cda39dbc3858c Mon Sep 17 00:00:00 2001 From: Niklas Schnelle Date: Mon, 6 Mar 2023 16:10:11 +0100 Subject: PCI: s390: Fix use-after-free of PCI resources with per-function hotplug On s390 PCI functions may be hotplugged individually even when they belong to a multi-function device. In particular on an SR-IOV device VFs may be removed and later re-added. In commit a50297cf8235 ("s390/pci: separate zbus creation from scanning") it was missed however that struct pci_bus and struct zpci_bus's resource list retained a reference to the PCI functions MMIO resources even though those resources are released and freed on hot-unplug. These stale resources may subsequently be claimed when the PCI function re-appears resulting in use-after-free. One idea of fixing this use-after-free in s390 specific code that was investigated was to simply keep resources around from the moment a PCI function first appeared until the whole virtual PCI bus created for a multi-function device disappears. The problem with this however is that due to the requirement of artificial MMIO addreesses (address cookies) extra logic is then needed to keep the address cookies compatible on re-plug. At the same time the MMIO resources semantically belong to the PCI function so tying their lifecycle to the function seems more logical. Instead a simpler approach is to remove the resources of an individually hot-unplugged PCI function from the PCI bus's resource list while keeping the resources of other PCI functions on the PCI bus untouched. This is done by introducing pci_bus_remove_resource() to remove an individual resource. Similarly the resource also needs to be removed from the struct zpci_bus's resource list. It turns out however, that there is really no need to add the MMIO resources to the struct zpci_bus's resource list at all and instead we can simply use the zpci_bar_struct's resource pointer directly. Fixes: a50297cf8235 ("s390/pci: separate zbus creation from scanning") Signed-off-by: Niklas Schnelle Reviewed-by: Matthew Rosato Acked-by: Bjorn Helgaas Link: https://lore.kernel.org/r/20230306151014.60913-2-schnelle@linux.ibm.com Signed-off-by: Vasily Gorbik --- include/linux/pci.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/pci.h b/include/linux/pci.h index fafd8020c6d7..b50e5c79f7e3 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1438,6 +1438,7 @@ void pci_bus_add_resource(struct pci_bus *bus, struct resource *res, unsigned int flags); struct resource *pci_bus_resource_n(const struct pci_bus *bus, int n); void pci_bus_remove_resources(struct pci_bus *bus); +void pci_bus_remove_resource(struct pci_bus *bus, struct resource *res); int devm_request_pci_bus_resources(struct device *dev, struct list_head *resources); -- cgit v1.2.3 From 8b3a149db461d3286d1e211112de3b44ccaeaf71 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sun, 12 Mar 2023 23:00:03 +0100 Subject: efi: earlycon: Reprobe after parsing config tables Commit 732ea9db9d8a ("efi: libstub: Move screen_info handling to common code") reorganized the earlycon handling so that all architectures pass the screen_info data via a EFI config table instead of populating struct screen_info directly, as the latter is only possible when the EFI stub is baked into the kernel (and not into the decompressor). However, this means that struct screen_info may not have been populated yet by the time the earlycon probe takes place, and this results in a non-functional early console. So let's probe again right after parsing the config tables and populating struct screen_info. Note that this means that earlycon output starts a bit later than before, and so it may fail to capture issues that occur while doing the early EFI initialization. Fixes: 732ea9db9d8a ("efi: libstub: Move screen_info handling to common code") Reported-by: Shawn Guo Tested-by: Shawn Guo Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/efi.h b/include/linux/efi.h index 04a733f0ba95..7aa62c92185f 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -693,6 +693,7 @@ efi_guid_to_str(efi_guid_t *guid, char *out) } extern void efi_init (void); +extern void efi_earlycon_reprobe(void); #ifdef CONFIG_EFI extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */ #else -- cgit v1.2.3 From 934ef33ee75c3846f605f18b65048acd147e3918 Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Mon, 13 Mar 2023 15:45:48 +0100 Subject: x86/PVH: obtain VGA console info in Dom0 A new platform-op was added to Xen to allow obtaining the same VGA console information PV Dom0 is handed. Invoke the new function and have the output data processed by xen_init_vga(). Signed-off-by: Jan Beulich Reviewed-by: Juergen Gross Link: https://lore.kernel.org/r/8f315e92-7bda-c124-71cc-478ab9c5e610@suse.com Signed-off-by: Juergen Gross --- include/xen/interface/platform.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h index 655d92e803e1..79a443c65ea9 100644 --- a/include/xen/interface/platform.h +++ b/include/xen/interface/platform.h @@ -483,6 +483,8 @@ struct xenpf_symdata { }; DEFINE_GUEST_HANDLE_STRUCT(xenpf_symdata); +#define XENPF_get_dom0_console 64 + struct xen_platform_op { uint32_t cmd; uint32_t interface_version; /* XENPF_INTERFACE_VERSION */ @@ -506,6 +508,7 @@ struct xen_platform_op { struct xenpf_mem_hotadd mem_add; struct xenpf_core_parking core_parking; struct xenpf_symdata symdata; + struct dom0_vga_console_info dom0_console; uint8_t pad[128]; } u; }; -- cgit v1.2.3 From 34e0a279a993debaff03158fc2fbf6a00c093643 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 13 Mar 2023 10:30:02 +0100 Subject: block: do not reverse request order when flushing plug list Commit 26fed4ac4eab ("block: flush plug based on hardware and software queue order") changed flushing of plug list to submit requests one device at a time. However while doing that it also started using list_add_tail() instead of list_add() used previously thus effectively submitting requests in reverse order. Also when forming a rq_list with remaining requests (in case two or more devices are used), we effectively reverse the ordering of the plug list for each device we process. Submitting requests in reverse order has negative impact on performance for rotational disks (when BFQ is not in use). We observe 10-25% regression in random 4k write throughput, as well as ~20% regression in MariaDB OLTP benchmark on rotational storage on btrfs filesystem. Fix the problem by preserving ordering of the plug list when inserting requests into the queuelist as well as by appending to requeue_list instead of prepending to it. Fixes: 26fed4ac4eab ("block: flush plug based on hardware and software queue order") Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20230313093002.11756-1-jack@suse.cz Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index dd5ce1137f04..de0b0c3e7395 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -228,6 +228,12 @@ static inline unsigned short req_get_ioprio(struct request *req) *(listptr) = rq; \ } while (0) +#define rq_list_add_tail(lastpptr, rq) do { \ + (rq)->rq_next = NULL; \ + **(lastpptr) = rq; \ + *(lastpptr) = &rq->rq_next; \ +} while (0) + #define rq_list_pop(listptr) \ ({ \ struct request *__req = NULL; \ -- cgit v1.2.3 From 7ff84910c66c9144cc0de9d9deed9fb84c03aff0 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 14 Mar 2023 06:20:58 -0400 Subject: lockd: set file_lock start and end when decoding nlm4 testargs Commit 6930bcbfb6ce dropped the setting of the file_lock range when decoding a nlm_lock off the wire. This causes the client side grant callback to miss matching blocks and reject the lock, only to rerequest it 30s later. Add a helper function to set the file_lock range from the start and end values that the protocol uses, and have the nlm_lock decoder call that to set up the file_lock args properly. Fixes: 6930bcbfb6ce ("lockd: detect and reject lock arguments that overflow") Reported-by: Amir Goldstein Signed-off-by: Jeff Layton Tested-by: Amir Goldstein Cc: stable@vger.kernel.org #6.0 Signed-off-by: Anna Schumaker --- include/linux/lockd/xdr4.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 9a6b55da8fd6..72831e35dca3 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -22,6 +22,7 @@ #define nlm4_fbig cpu_to_be32(NLM_FBIG) #define nlm4_failed cpu_to_be32(NLM_FAILED) +void nlm4svc_set_file_lock_range(struct file_lock *fl, u64 off, u64 len); bool nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -- cgit v1.2.3 From c2679254b9c9980d9045f0f722cf093a2b1f7590 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Fri, 10 Mar 2023 17:28:56 -0500 Subject: tracing: Make tracepoint lockdep check actually test something A while ago where the trace events had the following: rcu_read_lock_sched_notrace(); rcu_dereference_sched(...); rcu_read_unlock_sched_notrace(); If the tracepoint is enabled, it could trigger RCU issues if called in the wrong place. And this warning was only triggered if lockdep was enabled. If the tracepoint was never enabled with lockdep, the bug would not be caught. To handle this, the above sequence was done when lockdep was enabled regardless if the tracepoint was enabled or not (although the always enabled code really didn't do anything, it would still trigger a warning). But a lot has changed since that lockdep code was added. One is, that sequence no longer triggers any warning. Another is, the tracepoint when enabled doesn't even do that sequence anymore. The main check we care about today is whether RCU is "watching" or not. So if lockdep is enabled, always check if rcu_is_watching() which will trigger a warning if it is not (tracepoints require RCU to be watching). Note, that old sequence did add a bit of overhead when lockdep was enabled, and with the latest kernel updates, would cause the system to slow down enough to trigger kernel "stalled" warnings. Link: http://lore.kernel.org/lkml/20140806181801.GA4605@redhat.com Link: http://lore.kernel.org/lkml/20140807175204.C257CAC5@viggo.jf.intel.com Link: https://lore.kernel.org/lkml/20230307184645.521db5c9@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20230310172856.77406446@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu Cc: Dave Hansen Cc: "Paul E. McKenney" Cc: Mathieu Desnoyers Cc: Joel Fernandes Acked-by: Peter Zijlstra (Intel) Acked-by: Paul E. McKenney Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Steven Rostedt (Google) --- include/linux/tracepoint.h | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) (limited to 'include') diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index fa1004fcf810..2083f2d2f05b 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -231,12 +231,11 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) * not add unwanted padding between the beginning of the section and the * structure. Force alignment to the same alignment as the section start. * - * When lockdep is enabled, we make sure to always do the RCU portions of - * the tracepoint code, regardless of whether tracing is on. However, - * don't check if the condition is false, due to interaction with idle - * instrumentation. This lets us find RCU issues triggered with tracepoints - * even when this tracepoint is off. This code has no purpose other than - * poking RCU a bit. + * When lockdep is enabled, we make sure to always test if RCU is + * "watching" regardless if the tracepoint is enabled or not. Tracepoints + * require RCU to be active, and it should always warn at the tracepoint + * site if it is not watching, as it will need to be active when the + * tracepoint is enabled. */ #define __DECLARE_TRACE(name, proto, args, cond, data_proto) \ extern int __traceiter_##name(data_proto); \ @@ -249,9 +248,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) TP_ARGS(args), \ TP_CONDITION(cond), 0); \ if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \ - rcu_read_lock_sched_notrace(); \ - rcu_dereference_sched(__tracepoint_##name.funcs);\ - rcu_read_unlock_sched_notrace(); \ + WARN_ON_ONCE(!rcu_is_watching()); \ } \ } \ __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args), \ -- cgit v1.2.3 From 4b397c06cb987935b1b097336532aa6b4210e091 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 10 Mar 2023 19:11:09 +0000 Subject: net: tunnels: annotate lockless accesses to dev->needed_headroom IP tunnels can apparently update dev->needed_headroom in their xmit path. This patch takes care of three tunnels xmit, and also the core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA() helpers. More changes might be needed for completeness. BUG: KCSAN: data-race in ip_tunnel_xmit / ip_tunnel_xmit read to 0xffff88815b9da0ec of 2 bytes by task 888 on cpu 1: ip_tunnel_xmit+0x1270/0x1730 net/ipv4/ip_tunnel.c:803 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 write to 0xffff88815b9da0ec of 2 bytes by task 2379 on cpu 0: ip_tunnel_xmit+0x1294/0x1730 net/ipv4/ip_tunnel.c:804 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip6_finish_output2+0x9bc/0xc50 net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x39a/0x4e0 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] mld_sendpack+0x438/0x6a0 net/ipv6/mcast.c:1820 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x519/0x7b0 net/ipv6/mcast.c:2653 process_one_work+0x3e6/0x750 kernel/workqueue.c:2390 worker_thread+0x5f2/0xa10 kernel/workqueue.c:2537 kthread+0x1ac/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 value changed: 0x0dd4 -> 0x0e14 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 2379 Comm: kworker/0:0 Not tainted 6.3.0-rc1-syzkaller-00002-g8ca09d5fa354-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 Workqueue: mld mld_ifc_work Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit") Reported-by: syzbot Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230310191109.2384387-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 6a14b7b11766..470085b121d3 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -297,9 +297,11 @@ struct hh_cache { * relationship HH alignment <= LL alignment. */ #define LL_RESERVED_SPACE(dev) \ - ((((dev)->hard_header_len+(dev)->needed_headroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + ((((dev)->hard_header_len + READ_ONCE((dev)->needed_headroom)) \ + & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD) #define LL_RESERVED_SPACE_EXTRA(dev,extra) \ - ((((dev)->hard_header_len+(dev)->needed_headroom+(extra))&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + ((((dev)->hard_header_len + READ_ONCE((dev)->needed_headroom) + (extra)) \ + & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD) struct header_ops { int (*create) (struct sk_buff *skb, struct net_device *dev, -- cgit v1.2.3 From 0d3c9333d976af41d7dbc6bf4d9d2e95fbdf9c89 Mon Sep 17 00:00:00 2001 From: Liu Ying Date: Tue, 14 Mar 2023 13:50:35 +0800 Subject: drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc The returned array size for input formats is set through atomic_get_input_bus_fmts()'s 'num_input_fmts' argument, so use 'num_input_fmts' to represent the array size in the function's kdoc, not 'num_output_fmts'. Fixes: 91ea83306bfa ("drm/bridge: Fix the bridge kernel doc") Fixes: f32df58acc68 ("drm/bridge: Add the necessary bits to support bus format negotiation") Signed-off-by: Liu Ying Reviewed-by: Robert Foss Signed-off-by: Neil Armstrong Link: https://patchwork.freedesktop.org/patch/msgid/20230314055035.3731179-1-victor.liu@nxp.com --- include/drm/drm_bridge.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h index 42f86327b40a..bf964cdfb330 100644 --- a/include/drm/drm_bridge.h +++ b/include/drm/drm_bridge.h @@ -423,11 +423,11 @@ struct drm_bridge_funcs { * * The returned array must be allocated with kmalloc() and will be * freed by the caller. If the allocation fails, NULL should be - * returned. num_output_fmts must be set to the returned array size. + * returned. num_input_fmts must be set to the returned array size. * Formats listed in the returned array should be listed in decreasing * preference order (the core will try all formats until it finds one * that works). When the format is not supported NULL should be - * returned and num_output_fmts should be set to 0. + * returned and num_input_fmts should be set to 0. * * This method is called on all elements of the bridge chain as part of * the bus format negotiation process that happens in -- cgit v1.2.3 From 8e19b87cfce2de2125f11363d7dea3d08f16ccae Mon Sep 17 00:00:00 2001 From: Minwoo Im Date: Thu, 9 Mar 2023 23:31:18 +0900 Subject: nvme-trace: show more opcode names We have more commands to show in the trace. Sync up. Signed-off-by: Minwoo Im Reviewed-by: Chaitanya Kulkarni Signed-off-by: Christoph Hellwig --- include/linux/nvme.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 4fad4aa245fb..779507ac750b 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -812,6 +812,7 @@ enum nvme_opcode { nvme_opcode_name(nvme_cmd_compare), \ nvme_opcode_name(nvme_cmd_write_zeroes), \ nvme_opcode_name(nvme_cmd_dsm), \ + nvme_opcode_name(nvme_cmd_verify), \ nvme_opcode_name(nvme_cmd_resv_register), \ nvme_opcode_name(nvme_cmd_resv_report), \ nvme_opcode_name(nvme_cmd_resv_acquire), \ @@ -1144,10 +1145,14 @@ enum nvme_admin_opcode { nvme_admin_opcode_name(nvme_admin_ns_mgmt), \ nvme_admin_opcode_name(nvme_admin_activate_fw), \ nvme_admin_opcode_name(nvme_admin_download_fw), \ + nvme_admin_opcode_name(nvme_admin_dev_self_test), \ nvme_admin_opcode_name(nvme_admin_ns_attach), \ nvme_admin_opcode_name(nvme_admin_keep_alive), \ nvme_admin_opcode_name(nvme_admin_directive_send), \ nvme_admin_opcode_name(nvme_admin_directive_recv), \ + nvme_admin_opcode_name(nvme_admin_virtual_mgmt), \ + nvme_admin_opcode_name(nvme_admin_nvme_mi_send), \ + nvme_admin_opcode_name(nvme_admin_nvme_mi_recv), \ nvme_admin_opcode_name(nvme_admin_dbbuf), \ nvme_admin_opcode_name(nvme_admin_format_nvm), \ nvme_admin_opcode_name(nvme_admin_security_send), \ -- cgit v1.2.3 From 5f27571382ca42daa3e3d40d1b252bf18c2b61d2 Mon Sep 17 00:00:00 2001 From: Yu Kuai Date: Thu, 23 Feb 2023 17:12:26 +0800 Subject: block: count 'ios' and 'sectors' when io is done for bio-based device While using iostat for raid, I observed very strange 'await' occasionally, and turns out it's due to that 'ios' and 'sectors' is counted in bdev_start_io_acct(), while 'nsecs' is counted in bdev_end_io_acct(). I'm not sure why they are ccounted like that but I think this behaviour is obviously wrong because user will get wrong disk stats. Fix the problem by counting 'ios' and 'sectors' when io is done, like what rq-based device does. Fixes: 394ffa503bc4 ("blk: introduce generic io stat accounting help function") Signed-off-by: Yu Kuai Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d1aee08f8c18..941304f17492 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1446,11 +1446,10 @@ static inline void blk_wake_io_task(struct task_struct *waiter) wake_up_process(waiter); } -unsigned long bdev_start_io_acct(struct block_device *bdev, - unsigned int sectors, enum req_op op, +unsigned long bdev_start_io_acct(struct block_device *bdev, enum req_op op, unsigned long start_time); void bdev_end_io_acct(struct block_device *bdev, enum req_op op, - unsigned long start_time); + unsigned int sectors, unsigned long start_time); unsigned long bio_start_io_acct(struct bio *bio); void bio_end_io_acct_remapped(struct bio *bio, unsigned long start_time, -- cgit v1.2.3 From 4e16b6a748df52800c90f3ab181aef8129bd332f Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 15 Mar 2023 16:03:50 -0700 Subject: ynl: broaden the license even more I relicensed Netlink spec code to GPL-2.0 OR BSD-3-Clause but we still put a slightly different license on the uAPI header than the rest of the code. Use the Linux-syscall-note on all the specs and all generated code. It's moot for kernel code, but should not hurt. This way the licenses match everywhere. Cc: Chuck Lever Fixes: 37d9df224d1e ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause") Reviewed-by: Chuck Lever Signed-off-by: Jakub Kicinski --- include/uapi/linux/fou.h | 2 +- include/uapi/linux/netdev.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/fou.h b/include/uapi/linux/fou.h index 5041c3598493..b5cd3e7b3775 100644 --- a/include/uapi/linux/fou.h +++ b/include/uapi/linux/fou.h @@ -1,4 +1,4 @@ -/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause */ +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */ /* Do not edit directly, auto-generated from: */ /* Documentation/netlink/specs/fou.yaml */ /* YNL-GEN uapi header */ diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index ed134fbdfd32..639524b59930 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -1,4 +1,4 @@ -/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause */ +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */ /* Do not edit directly, auto-generated from: */ /* Documentation/netlink/specs/netdev.yaml */ /* YNL-GEN uapi header */ -- cgit v1.2.3 From 2f59823fe696caa844249a90bb3f9aeda69cfe5c Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Thu, 16 Mar 2023 11:37:53 +0800 Subject: net/sched: act_api: add specific EXT_WARN_MSG for tc action In my previous commit 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") I didn't notice the tc action use different enum with filter. So we can't use TCA_EXT_WARN_MSG directly for tc action. Let's add a TCA_ROOT_EXT_WARN_MSG for tc action specifically and put this param before going to the TCA_ACT_TAB nest. Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") Signed-off-by: Hangbin Liu Acked-by: Jamal Hadi Salim Signed-off-by: Jakub Kicinski --- include/uapi/linux/rtnetlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 25a0af57dd5e..51c13cf9c5ae 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -789,6 +789,7 @@ enum { TCA_ROOT_FLAGS, TCA_ROOT_COUNT, TCA_ROOT_TIME_DELTA, /* in msecs */ + TCA_ROOT_EXT_WARN_MSG, __TCA_ROOT_MAX, #define TCA_ROOT_MAX (__TCA_ROOT_MAX - 1) }; -- cgit v1.2.3 From 3615c78673c332b69aaacefbcde5937c5c706686 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Tue, 14 Mar 2023 13:31:02 +0100 Subject: efi: sysfb_efi: Fix DMI quirks not working for simpledrm Commit 8633ef82f101 ("drivers/firmware: consolidate EFI framebuffer setup for all arches") moved the sysfb_apply_efi_quirks() call in sysfb_init() from before the [sysfb_]parse_mode() call to after it. But sysfb_apply_efi_quirks() modifies the global screen_info struct which [sysfb_]parse_mode() parses, so doing it later is too late. This has broken all DMI based quirks for correcting wrong firmware efifb settings when simpledrm is used. To fix this move the sysfb_apply_efi_quirks() call back to its old place and split the new setup of the efifb_fwnode (which requires the platform_device) into its own function and call that at the place of the moved sysfb_apply_efi_quirks(pd) calls. Fixes: 8633ef82f101 ("drivers/firmware: consolidate EFI framebuffer setup for all arches") Cc: stable@vger.kernel.org Cc: Javier Martinez Canillas Cc: Thomas Zimmermann Signed-off-by: Hans de Goede Reviewed-by: Javier Martinez Canillas Signed-off-by: Ard Biesheuvel --- include/linux/sysfb.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h index 8ba8b5be5567..c1ef5fc60a3c 100644 --- a/include/linux/sysfb.h +++ b/include/linux/sysfb.h @@ -70,11 +70,16 @@ static inline void sysfb_disable(void) #ifdef CONFIG_EFI extern struct efifb_dmi_info efifb_dmi_list[]; -void sysfb_apply_efi_quirks(struct platform_device *pd); +void sysfb_apply_efi_quirks(void); +void sysfb_set_efifb_fwnode(struct platform_device *pd); #else /* CONFIG_EFI */ -static inline void sysfb_apply_efi_quirks(struct platform_device *pd) +static inline void sysfb_apply_efi_quirks(void) +{ +} + +static inline void sysfb_set_efifb_fwnode(struct platform_device *pd) { } -- cgit v1.2.3 From 99669259f3361d759219811e670b7e0742668556 Mon Sep 17 00:00:00 2001 From: Maxime Bizon Date: Thu, 16 Mar 2023 16:33:16 -0700 Subject: net: mdio: fix owner field for mdio buses registered using device-tree Bus ownership is wrong when using of_mdiobus_register() to register an mdio bus. That function is not inline, so when it calls mdiobus_register() the wrong THIS_MODULE value is captured. Signed-off-by: Maxime Bizon Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs") [florian: fix kdoc, added Fixes tag] Signed-off-by: Florian Fainelli Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/of_mdio.h | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/of_mdio.h b/include/linux/of_mdio.h index da633d34ab86..8a52ef2e6fa6 100644 --- a/include/linux/of_mdio.h +++ b/include/linux/of_mdio.h @@ -14,9 +14,25 @@ #if IS_ENABLED(CONFIG_OF_MDIO) bool of_mdiobus_child_is_phy(struct device_node *child); -int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np); -int devm_of_mdiobus_register(struct device *dev, struct mii_bus *mdio, - struct device_node *np); +int __of_mdiobus_register(struct mii_bus *mdio, struct device_node *np, + struct module *owner); + +static inline int of_mdiobus_register(struct mii_bus *mdio, + struct device_node *np) +{ + return __of_mdiobus_register(mdio, np, THIS_MODULE); +} + +int __devm_of_mdiobus_register(struct device *dev, struct mii_bus *mdio, + struct device_node *np, struct module *owner); + +static inline int devm_of_mdiobus_register(struct device *dev, + struct mii_bus *mdio, + struct device_node *np) +{ + return __devm_of_mdiobus_register(dev, mdio, np, THIS_MODULE); +} + struct mdio_device *of_mdio_find_device(struct device_node *np); struct phy_device *of_phy_find_device(struct device_node *phy_np); struct phy_device * -- cgit v1.2.3 From 30b605b8501e321f79e19c3238aa6ca31da6087c Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 16 Mar 2023 16:33:17 -0700 Subject: net: mdio: fix owner field for mdio buses registered using ACPI Bus ownership is wrong when using acpi_mdiobus_register() to register an mdio bus. That function is not inline, so when it calls mdiobus_register() the wrong THIS_MODULE value is captured. CC: Maxime Bizon Fixes: 803ca24d2f92 ("net: mdio: Add ACPI support code for mdio") Signed-off-by: Florian Fainelli Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/acpi_mdio.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/acpi_mdio.h b/include/linux/acpi_mdio.h index 0a24ab7cb66f..8e2eefa9fbc0 100644 --- a/include/linux/acpi_mdio.h +++ b/include/linux/acpi_mdio.h @@ -9,7 +9,14 @@ #include #if IS_ENABLED(CONFIG_ACPI_MDIO) -int acpi_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode); +int __acpi_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode, + struct module *owner); + +static inline int +acpi_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *handle) +{ + return __acpi_mdiobus_register(mdio, handle, THIS_MODULE); +} #else /* CONFIG_ACPI_MDIO */ static inline int acpi_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode) -- cgit v1.2.3 From 070246e4674b125860d311c18ce2623e73e2bd51 Mon Sep 17 00:00:00 2001 From: Jochen Henneberg Date: Fri, 17 Mar 2023 09:08:17 +0100 Subject: net: stmmac: Fix for mismatched host/device DMA address width Currently DMA address width is either read from a RO device register or force set from the platform data. This breaks DMA when the host DMA address width is <=32it but the device is >32bit. Right now the driver may decide to use a 2nd DMA descriptor for another buffer (happens in case of TSO xmit) assuming that 32bit addressing is used due to platform configuration but the device will still use both descriptor addresses as one address. This can be observed with the Intel EHL platform driver that sets 32bit for addr64 but the MAC reports 40bit. The TX queue gets stuck in case of TCP with iptables NAT configuration on TSO packets. The logic should be like this: Whatever we do on the host side (memory allocation GFP flags) should happen with the host DMA width, whenever we decide how to set addresses on the device registers we must use the device DMA address width. This patch renames the platform address width field from addr64 (term used in device datasheet) to host_addr and uses this value exclusively for host side operations while all chip operations consider the device DMA width as read from the device register. Fixes: 7cfc4486e7ea ("stmmac: intel: Configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing") Signed-off-by: Jochen Henneberg Signed-off-by: David S. Miller --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index a152678b82b7..a2414c187483 100644 --- a/include/linux/stm