summaryrefslogtreecommitdiff
path: root/kernel/resource.c
AgeCommit message (Collapse)AuthorFilesLines
2026-01-11Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"Ilias Stamatis1-1/+9
[ Upstream commit 6fb3acdebf65a72df0a95f9fd2c901ff2bc9a3a2 ] Commit 97523a4edb7b ("kernel/resource: remove first_lvl / siblings_only logic") removed an optimization introduced by commit 756398750e11 ("resource: avoid unnecessary lookups in find_next_iomem_res()"). That was not called out in the message of the first commit explicitly so it's not entirely clear whether removing the optimization happened inadvertently or not. As the original commit message of the optimization explains there is no point considering the children of a subtree in find_next_iomem_res() if the top level range does not match. Reinstating the optimization results in performance improvements in systems where /proc/iomem is ~5k lines long. Calling mmap() on /dev/mem in such platforms takes 700-1500μs without the optimisation and 10-50μs with the optimisation. Note that even though commit 97523a4edb7b removed the 'sibling_only' parameter from next_resource(), newer kernels have basically reinstated it under the name 'skip_children'. Link: https://lore.kernel.org/all/20251124165349.3377826-1-ilstam@amazon.com/T/#u Fixes: 97523a4edb7b ("kernel/resource: remove first_lvl / siblings_only logic") Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Huang, Ying" <huang.ying.caritas@gmail.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-11resource: introduce is_type_match() helper and use itAndy Shevchenko1-13/+10
[ Upstream commit ba1eccc114ffc62c4495a5e15659190fa2c42308 ] There are already a couple of places where we may replace a few lines of code by calling a helper, which increases readability while deduplicating the code. Introduce is_type_match() helper and use it. Link: https://lkml.kernel.org/r/20240925154355.1170859-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 6fb3acdebf65 ("Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-11resource: replace open coded resource_intersection()Andy Shevchenko1-9/+6
[ Upstream commit 5c1edea773c98707fbb23d1df168bcff52f61e4b ] Patch series "resource: A couple of cleanups". A couple of ad-hoc cleanups since there was a recent development of the code in question. No functional changes intended. This patch (of 2): __region_intersects() uses open coded resource_intersection(). Replace it with existing API which also make more clear what we are checking. Link: https://lkml.kernel.org/r/20240925154355.1170859-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20240925154355.1170859-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 6fb3acdebf65 ("Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-11resource: Reuse for_each_resource() macroAndy Shevchenko1-15/+19
[ Upstream commit 441f0dd8fa035a2c7cfe972047bb905d3be05c1b ] We have a few places where for_each_resource() is open coded. Replace that by the macro. This makes code easier to read and understand. With this, compile r_next() only for CONFIG_PROC_FS=y. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230912165312.402422-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 6fb3acdebf65 ("Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-11resource: Replace printk(KERN_WARNING) by pr_warn(), printk() by pr_info()Andy Shevchenko1-10/+7
[ Upstream commit 2a4e628570d42fcc13a94f1acf25e3cfeaec08f6 ] Replace printk(KERN_WARNING) by pr_warn() and printk() by pr_info(). While at it, use %pa for the resource_size_t variables. With that, for the sake of consistency, introduce a temporary variable for the end address in iomem_map_sanity_check() like it's done in another function in the same module. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20221109155618.42276-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 6fb3acdebf65 ("Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17resource: fix region_intersects() vs add_memory_driver_managed()Huang Ying1-8/+50
commit b4afe4183ec77f230851ea139d91e5cf2644c68b upstream. On a system with CXL memory, the resource tree (/proc/iomem) related to CXL memory may look like something as follows. 490000000-50fffffff : CXL Window 0 490000000-50fffffff : region0 490000000-50fffffff : dax0.0 490000000-50fffffff : System RAM (kmem) Because drivers/dax/kmem.c calls add_memory_driver_managed() during onlining CXL memory, which makes "System RAM (kmem)" a descendant of "CXL Window X". This confuses region_intersects(), which expects all "System RAM" resources to be at the top level of iomem_resource. This can lead to bugs. For example, when the following command line is executed to write some memory in CXL memory range via /dev/mem, $ dd if=data of=/dev/mem bs=$((1 << 10)) seek=$((0x490000000 >> 10)) count=1 dd: error writing '/dev/mem': Bad address 1+0 records in 0+0 records out 0 bytes copied, 0.0283507 s, 0.0 kB/s the command fails as expected. However, the error code is wrong. It should be "Operation not permitted" instead of "Bad address". More seriously, the /dev/mem permission checking in devmem_is_allowed() passes incorrectly. Although the accessing is prevented later because ioremap() isn't allowed to map system RAM, it is a potential security issue. During command executing, the following warning is reported in the kernel log for calling ioremap() on system RAM. ioremap on RAM at 0x0000000490000000 - 0x0000000490000fff WARNING: CPU: 2 PID: 416 at arch/x86/mm/ioremap.c:216 __ioremap_caller.constprop.0+0x131/0x35d Call Trace: memremap+0xcb/0x184 xlate_dev_mem_ptr+0x25/0x2f write_mem+0x94/0xfb vfs_write+0x128/0x26d ksys_write+0xac/0xfe do_syscall_64+0x9a/0xfd entry_SYSCALL_64_after_hwframe+0x4b/0x53 The details of command execution process are as follows. In the above resource tree, "System RAM" is a descendant of "CXL Window 0" instead of a top level resource. So, region_intersects() will report no System RAM resources in the CXL memory region incorrectly, because it only checks the top level resources. Consequently, devmem_is_allowed() will return 1 (allow access via /dev/mem) for CXL memory region incorrectly. Fortunately, ioremap() doesn't allow to map System RAM and reject the access. So, region_intersects() needs to be fixed to work correctly with the resource tree with "System RAM" not at top level as above. To fix it, if we found a unmatched resource in the top level, we will continue to search matched resources in its descendant resources. So, we will not miss any matched resources in resource tree anymore. In the new implementation, an example resource tree |------------- "CXL Window 0" ------------| |-- "System RAM" --| will behave similar as the following fake resource tree for region_intersects(, IORESOURCE_SYSTEM_RAM, ), |-- "System RAM" --||-- "CXL Window 0a" --| Where "CXL Window 0a" is part of the original "CXL Window 0" that isn't covered by "System RAM". Link: https://lkml.kernel.org/r/20240906030713.204292-2-ying.huang@intel.com Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12x86/kaslr: Expose and use the end of the physical memory address spaceThomas Gleixner1-4/+2
commit ea72ce5da22806d5713f3ffb39a6d5ae73841f93 upstream. iounmap() on x86 occasionally fails to unmap because the provided valid ioremap address is not below high_memory. It turned out that this happens due to KASLR. KASLR uses the full address space between PAGE_OFFSET and vaddr_end to randomize the starting points of the direct map, vmalloc and vmemmap regions. It thereby limits the size of the direct map by using the installed memory size plus an extra configurable margin for hot-plug memory. This limitation is done to gain more randomization space because otherwise only the holes between the direct map, vmalloc, vmemmap and vaddr_end would be usable for randomizing. The limited direct map size is not exposed to the rest of the kernel, so the memory hot-plug and resource management related code paths still operate under the assumption that the available address space can be determined with MAX_PHYSMEM_BITS. request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1 downwards. That means the first allocation happens past the end of the direct map and if unlucky this address is in the vmalloc space, which causes high_memory to become greater than VMALLOC_START and consequently causes iounmap() to fail for valid ioremap addresses. MAX_PHYSMEM_BITS cannot be changed for that because the randomization does not align with address bit boundaries and there are other places which actually require to know the maximum number of address bits. All remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found to be correct. Cure this by exposing the end of the direct map via PHYSMEM_END and use that for the memory hot-plug and resource management related places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END maps to a variable which is initialized by the KASLR initialization and otherwise it is based on MAX_PHYSMEM_BITS as before. To prevent future hickups add a check into add_pages() to catch callers trying to add memory above PHYSMEM_END. Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions") Reported-by: Max Ramanouski <max8rr8@gmail.com> Reported-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-By: Max Ramanouski <max8rr8@gmail.com> Tested-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Kees Cook <kees@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13PCI: Allow drivers to request exclusive config regionsIra Weiny1-5/+8
[ Upstream commit 278294798ac9118412c9624a801d3f20f2279363 ] PCI config space access from user space has traditionally been unrestricted with writes being an understood risk for device operation. Unfortunately, device breakage or odd behavior from config writes lacks indicators that can leave driver writers confused when evaluating failures. This is especially true with the new PCIe Data Object Exchange (DOE) mailbox protocol where backdoor shenanigans from user space through things such as vendor defined protocols may affect device operation without complete breakage. A prior proposal restricted read and writes completely.[1] Greg and Bjorn pointed out that proposal is flawed for a couple of reasons. First, lspci should always be allowed and should not interfere with any device operation. Second, setpci is a valuable tool that is sometimes necessary and it should not be completely restricted.[2] Finally methods exist for full lock of device access if required. Even though access should not be restricted it would be nice for driver writers to be able to flag critical parts of the config space such that interference from user space can be detected. Introduce pci_request_config_region_exclusive() to mark exclusive config regions. Such regions trigger a warning and kernel taint if accessed via user space. Create pci_warn_once() to restrict the user from spamming the log. [1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/ [2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/ Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20220926215711.2893286-2-ira.weiny@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Stable-dep-of: 5e70d0acf082 ("PCI: Add locking to RMW PCI Express Capability Register accessors") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10dax/kmem: Fix leak of memory-hotplug resourcesDan Williams1-14/+0
commit e686c32590f40bffc45f105c04c836ffad3e531a upstream. While experimenting with CXL region removal the following corruption of /proc/iomem appeared. Before: f010000000-f04fffffff : CXL Window 0 f010000000-f02fffffff : region4 f010000000-f02fffffff : dax4.0 f010000000-f02fffffff : System RAM (kmem) After (modprobe -r cxl_test): f010000000-f02fffffff : **redacted binary garbage** f010000000-f02fffffff : System RAM (kmem) ...and testing further the same is visible with persistent memory assigned to kmem: Before: 480000000-243fffffff : Persistent Memory 480000000-57e1fffff : namespace3.0 580000000-243fffffff : dax3.0 580000000-243fffffff : System RAM (kmem) After (ndctl disable-region all): 480000000-243fffffff : Persistent Memory 580000000-243fffffff : ***redacted binary garbage*** 580000000-243fffffff : System RAM (kmem) The corrupted data is from a use-after-free of the "dax4.0" and "dax3.0" resources, and it also shows that the "System RAM (kmem)" resource is not being removed. The bug does not appear after "modprobe -r kmem", it requires the parent of "dax4.0" and "dax3.0" to be removed which re-parents the leaked "System RAM (kmem)" instances. Those in turn reference the freed resource as a parent. First up for the fix is release_mem_region_adjustable() needs to reliably delete the resource inserted by add_memory_driver_managed(). That is thwarted by a check for IORESOURCE_SYSRAM that predates the dax/kmem driver, from commit: 65c78784135f ("kernel, resource: check for IORESOURCE_SYSRAM in release_mem_region_adjustable") That appears to be working around the behavior of HMM's "MEMORY_DEVICE_PUBLIC" facility that has since been deleted. With that check removed the "System RAM (kmem)" resource gets removed, but corruption still occurs occasionally because the "dax" resource is not reliably removed. The dax range information is freed before the device is unregistered, so the driver can not reliably recall (another use after free) what it is meant to release. Lastly if that use after free got lucky, the driver was covering up the leak of "System RAM (kmem)" due to its use of release_resource() which detaches, but does not free, child resources. The switch to remove_resource() forces remove_memory() to be responsible for the deletion of the resource added by add_memory_driver_managed(). Fixes: c2f3011ee697 ("device-dax: add an allocation interface for device-dax instances") Cc: <stable@vger.kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167653656244.3147810.5705900882794040229.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-21resource: Introduce alloc_free_mem_region()Dan Williams1-35/+143
The core of devm_request_free_mem_region() is a helper that searches for free space in iomem_resource and performs __request_region_locked() on the result of that search. The policy choices of the implementation conform to what CONFIG_DEVICE_PRIVATE users want which is memory that is immediately marked busy, and a preference to search for the first-fit free range in descending order from the top of the physical address space. CXL has a need for a similar allocator, but with the following tweaks: 1/ Search for free space in ascending order 2/ Search for free space relative to a given CXL window 3/ 'insert' rather than 'request' the new resource given downstream drivers from the CXL Region driver (like the pmem or dax drivers) are responsible for request_mem_region() when they activate the memory range. Rework __request_free_mem_region() into get_free_mem_region() which takes a set of GFR_* (Get Free Region) flags to control the allocation policy (ascending vs descending), and "busy" policy (insert_resource() vs request_region()). As part of the consolidation of the legacy GFR_REQUEST_REGION case with the new default of just inserting a new resource into the free space some minor cleanups like not checking for NULL before calling devres_free() (which does its own check) is included. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-cxl/20220420143406.GY2120790@nvidia.com/ Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333333.1758207.13703329337805274043.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/acpi: Track CXL resources in iomem_resourceDan Williams1-0/+7
Recall that CXL capable address ranges, on ACPI platforms, are published in the CEDT.CFMWS (CXL Early Discovery Table: CXL Fixed Memory Window Structures). These windows represent both the actively mapped capacity and the potential address space that can be dynamically assigned to a new CXL decode configuration (region / interleave-set). CXL endpoints like DDR DIMMs can be mapped at any physical address including 0 and legacy ranges. There is an expectation and requirement that the /proc/iomem interface and the iomem_resource tree in the kernel reflect the full set of platform address ranges. I.e. that every address range that platform firmware and bus drivers enumerate be reflected as an iomem_resource entry. The hard requirement to do this for CXL arises from the fact that facilities like CONFIG_DEVICE_PRIVATE expect to be able to treat empty iomem_resource ranges as free for software to use as proxy address space. Without CXL publishing its potential address ranges in iomem_resource, the CONFIG_DEVICE_PRIVATE mechanism may inadvertently steal capacity reserved for runtime provisioning of new CXL regions. So, iomem_resource needs to know about both active and potential CXL resource ranges. The active CXL resources might already be reflected in iomem_resource as "System RAM". insert_resource_expand_to_fit() handles re-parenting "System RAM" underneath a CXL window. The "_expand_to_fit()" behavior handles cases where a CXL window is not a strict superset of an existing entry in the iomem_resource tree. The "_expand_to_fit()" behavior is acceptable from the perspective of resource allocation. The expansion happens because a conflicting resource range is already populated, which means the resource boundary expansion does not result in any additional free CXL address space being made available. CXL address space allocation is always bounded by the orginal unexpanded address range. However, the potential for expansion does mean that something like walk_iomem_res_desc(IORES_DESC_CXL...) can only return fuzzy answers on corner case platforms that cause the resource tree to expand a CXL window resource over a range that is not decoded by CXL. This would be an odd platform configuration, but if it becomes a problem in practice the CXL subsytem could just publish an API that returns definitive answers. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/165784325943.1758207.5310344844375305118.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-23kernel/resource: fix kfree() of bootmem memory againMiaohe Lin1-33/+8
Since commit ebff7d8f270d ("mem hotunplug: fix kfree() of bootmem memory"), we could get a resource allocated during boot via alloc_resource(). And it's required to release the resource using free_resource(). Howerver, many people use kfree directly which will result in kernel BUG. In order to fix this without fixing every call site, just leak a couple of bytes in such corner case. Link: https://lkml.kernel.org/r/20220217083619.19305-1-linmiaohe@huawei.com Fixes: ebff7d8f270d ("mem hotunplug: fix kfree() of bootmem memory") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22proc: remove PDE_DATA() completelyMuchun Song1-2/+2
Remove PDE_DATA() completely and replace it with pde_data(). [akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c] [akpm@linux-foundation.org: now fix it properly] Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09kernel/resource: disallow access to exclusive system RAM regionsDavid Hildenbrand1-10/+19
virtio-mem dynamically exposes memory inside a device memory region as system RAM to Linux, coordinating with the hypervisor which parts are actually "plugged" and consequently usable/accessible. On the one hand, the virtio-mem driver adds/removes whole memory blocks, creating/removing busy IORESOURCE_SYSTEM_RAM resources, on the other hand, it logically (un)plugs memory inside added memory blocks, dynamically either exposing them to the buddy or hiding them from the buddy and marking them PG_offline. In contrast to physical devices, like a DIMM, the virtio-mem driver is required to actually make use of any of the device-provided memory, because it performs the handshake with the hypervisor. virtio-mem memory cannot simply be access via /dev/mem without a driver. There is no safe way to: a) Access plugged memory blocks via /dev/mem, as they might contain unplugged holes or might get silently unplugged by the virtio-mem driver and consequently turned inaccessible. b) Access unplugged memory blocks via /dev/mem because the virtio-mem driver is required to make them actually accessible first. The virtio-spec states that unplugged memory blocks MUST NOT be written, and only selected unplugged memory blocks MAY be read. We want to make sure, this is the case in sane environments -- where the virtio-mem driver was loaded. We want to make sure that in a sane environment, nobody "accidentially" accesses unplugged memory inside the device managed region. For example, a user might spot a memory region in /proc/iomem and try accessing it via /dev/mem via gdb or dumping it via something else. By the time the mmap() happens, the memory might already have been removed by the virtio-mem driver silently: the mmap() would succeeed and user space might accidentially access unplugged memory. So once the driver was loaded and detected the device along the device-managed region, we just want to disallow any access via /dev/mem to it. In an ideal world, we would mark the whole region as busy ("owned by a driver") and exclude it; however, that would be wrong, as we don't really have actual system RAM at these ranges added to Linux ("busy system RAM"). Instead, we want to mark such ranges as "not actual busy system RAM but still soft-reserved and prepared by a driver for future use." Let's teach iomem_is_exclusive() to reject access to any range with "IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE", even if not busy and even if "iomem=relaxed" is set. Introduce EXCLUSIVE_SYSTEM_RAM to make it easier for applicable drivers to depend on this setting in their Kconfig. For now, there are no applicable ranges and we'll modify virtio-mem next to properly set IORESOURCE_EXCLUSIVE on the parent resource container it creates to contain all actual busy system RAM added via add_memory_driver_managed(). Link: https://lkml.kernel.org/r/20210920142856.17758-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09kernel/resource: clean up and optimize iomem_is_exclusive()David Hildenbrand1-5/+20
Patch series "virtio-mem: disallow mapping virtio-mem memory via /dev/mem", v5. Let's add the basic infrastructure to exclude some physical memory regions marked as "IORESOURCE_SYSTEM_RAM" completely from /dev/mem access, even though they are not marked IORESOURCE_BUSY and even though "iomem=relaxed" is set. Resource IORESOURCE_EXCLUSIVE for that purpose instead of adding new flags to express something similar to "soft-busy" or "not busy yet, but already prepared by a driver and not to be mapped by user space". Use it for virtio-mem, to disallow mapping any virtio-mem memory via /dev/mem to user space after the virtio-mem driver was loaded. This patch (of 3): We end up traversing subtrees of ranges we are not interested in; let's optimize this case, skipping such subtrees, cleaning up the function a bit. For example, in the following configuration (/proc/iomem): 00000000-00000fff : Reserved 00001000-00057fff : System RAM 00058000-00058fff : Reserved 00059000-0009cfff : System RAM 0009d000-000fffff : Reserved 000a0000-000bffff : PCI Bus 0000:00 000c0000-000c3fff : PCI Bus 0000:00 000c4000-000c7fff : PCI Bus 0000:00 000c8000-000cbfff : PCI Bus 0000:00 000cc000-000cffff : PCI Bus 0000:00 000d0000-000d3fff : PCI Bus 0000:00 000d4000-000d7fff : PCI Bus 0000:00 000d8000-000dbfff : PCI Bus 0000:00 000dc000-000dffff : PCI Bus 0000:00 000e0000-000e3fff : PCI Bus 0000:00 000e4000-000e7fff : PCI Bus 0000:00 000e8000-000ebfff : PCI Bus 0000:00 000ec000-000effff : PCI Bus 0000:00 000f0000-000fffff : PCI Bus 0000:00 000f0000-000fffff : System ROM 00100000-3fffffff : System RAM 40000000-403fffff : Reserved 40000000-403fffff : pnp 00:00 40400000-80a79fff : System RAM ... We don't have to look at any children of "0009d000-000fffff : Reserved" if we can just skip these 15 items directly because the parent range is not of interest. Link: https://lkml.kernel.org/r/20210920142856.17758-1-david@redhat.com Link: https://lkml.kernel.org/r/20210920142856.17758-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14kernel/resource: fix return code check in __request_free_mem_regionAlistair Popple1-1/+1
Splitting an earlier version of a patch that allowed calling __request_region() while holding the resource lock into a series of patches required changing the return code for the newly introduced __request_region_locked(). Unfortunately this change was not carried through to a subsequent commit 56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region") in the series. This resulted in a use-after-free due to freeing the struct resource without properly releasing it. Fix this by correcting the return code check so that the struct is not freed if the request to add it was successful. Link: https://lkml.kernel.org/r/20210512073528.22334-1-apopple@nvidia.com Fixes: 56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region") Signed-off-by: Alistair Popple <apopple@nvidia.com> Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Muchun Song <smuchun@gmail.com> Cc: Oliver Sang <oliver.sang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07kernel/resource: fix locking in request_free_mem_regionAlistair Popple1-7/+38
request_free_mem_region() is used to find an empty range of physical addresses for hotplugging ZONE_DEVICE memory. It does this by iterating over the range of possible addresses using region_intersects() to see if the range is free before calling request_mem_region() to allocate the region. However the resource_lock is dropped between these two calls meaning by the time request_mem_region() is called in request_free_mem_region() another thread may have already reserved the requested region. This results in unexpected failures and a message in the kernel log from hitting this condition: /* * mm/hmm.c reserves physical addresses which then * become unavailable to other users. Conflicts are * not expected. Warn to aid debugging if encountered. */ if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) { pr_warn("Unaddressable device %s %pR conflicts with %pR", conflict->name, conflict, res); These unexpected failures can be corrected by holding resource_lock across the two calls. This also requires memory allocation to be performed prior to taking the lock. Link: https://lkml.kernel.org/r/20210419070109.4780-3-apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Muchun Song <smuchun@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07kernel/resource: refactor __request_region to allow external lockingAlistair Popple1-20/+32
Refactor the portion of __request_region() done whilst holding the resource_lock into a separate function to allow callers to hold the lock. Link: https://lkml.kernel.org/r/20210419070109.4780-2-apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Muchun Song <smuchun@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07kernel/resource: allow region_intersects users to hold resource_lockAlistair Popple1-21/+31
Introduce a version of region_intersects() that can be called with the resource_lock already held. This will be used in a future fix to __request_free_mem_region(). [akpm@linux-foundation.org: make __region_intersects static] Link: https://lkml.kernel.org/r/20210419070109.4780-1-apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Muchun Song <smuchun@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07kernel/resource: remove first_lvl / siblings_only logicDavid Hildenbrand1-33/+12
All functions that search for IORESOURCE_SYSTEM_RAM or IORESOURCE_MEM resources now properly consider the whole resource tree, not just the first level. Let's drop the unused first_lvl / siblings_only logic. Remove documentation that indicates that some functions behave differently, all consider the full resource tree now. Link: https://lkml.kernel.org/r/20210325115326.7826-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Oscar Salvador <osalvador@suse.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resourcesDavid Hildenbrand1-1/+1
It used to be true that we can have system RAM (IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY) only on the first level in the resource tree. However, this is no longer holds for driver-managed system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on lower levels, for example, inside device containers. IORESOURCE_SYSTEM_RAM is defined as IORESOURCE_MEM | IORESOURCE_SYSRAM and just a special type of IORESOURCE_MEM. The function walk_mem_res() only considers the first level and is used in arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently fail to identify System RAM added by dax/kmem and virtio-mem as "IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such "normal RAM" in __ioremap_caller(). Let's find all IORESOURCE_MEM | IORESOURCE_BUSY resources, making the function behave similar to walk_system_ram_res(). Link: https://lkml.kernel.org/r/20210325115326.7826-3-david@redhat.com Fixes: ebf71552bb0e ("virtio-mem: Add parent resource for all added "System RAM"") Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Oscar Salvador <osalvador@suse.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07kernel/resource: make walk_system_ram_res() find all busy ↵David Hildenbrand1-1/+1
IORESOURCE_SYSTEM_RAM resources Patch series "kernel/resource: make walk_system_ram_res() and walk_mem_res() search the whole tree", v2. Playing with kdump+virtio-mem I noticed that kexec_file_load() does not consider System RAM added via dax/kmem and virtio-mem when preparing the elf header for kdump. Looking into the details, the logic used in walk_system_ram_res() and walk_mem_res() seems to be outdated. walk_system_ram_range() already does the right thing, let's change walk_system_ram_res() and walk_mem_res(), and clean up. Loading a kdump kernel via "kexec -p -s" ... will result in the kdump kernel to also dump dax/kmem and virtio-mem added System RAM now. Note: kexec-tools on x86-64 also have to be updated to consider this memory in the kexec_load() case when processing /proc/iomem. This patch (of 3): It used to be true that we can have system RAM (IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY) only on the first level in the resource tree. However, this is no longer holds for driver-managed system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on lower levels, for example, inside device containers. We have two users of walk_system_ram_res(), which currently only consideres the first level: a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip IORESOURCE_SYSRAM_DRIVER_MANAGED resources via locate_mem_hole_callback(), so even after this change, we won't be placing kexec images onto dax/kmem and virtio-mem added memory. No change. b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently not adding relevant ranges to the crash elf header, resulting in them not getting dumped via kdump. This change fixes loading a crashkernel via kexec_file_load() and including dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note that e.g,, arm64 relies on memblock data and, therefore, always considers all added System RAM already. Let's find all IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY resources, making the function behave like walk_system_ram_range(). Link: https://lkml.kernel.org/r/20210325115326.7826-1-david@redhat.com Link: https://lkml.kernel.org/r/20210325115326.7826-2-david@redhat.com Fixes: ebf71552bb0e ("virtio-mem: Add parent resource for all added "System RAM"") Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Oscar Salvador <osalvador@suse.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-12resource: Move devmem revoke code to resource frameworkDaniel Vetter1-1/+97
We want all iomem mmaps to consistently revoke ptes when the kernel takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the pci bar mmaps available through procfs and sysfs, which currently do not revoke mappings. To prepare for this, move the code from the /dev/kmem driver to kernel/resource.c. During review Jason spotted that barriers are used somewhat inconsistently. Fix that up while we shuffle this code, since it doesn't have an actual impact at runtime. Otherwise no semantic and behavioural changes intended, just code extraction and adjusting comments and names. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Hildenbrand <david@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-11-daniel.vetter@ffwll.ch
2020-12-15kernel/resource.c: fix kernel-doc markupsMauro Carvalho Chehab1-10/+14
Kernel-doc markups should use this format: identifier - description While here, fix a kernel-doc tag that was using, instead, a normal comment block. [akpm@linux-foundation.org: coding style fixes] Link: https://lkml.kernel.org/r/c5e38e1070f8dbe2f9607a10b44afe2875bd966c.1605521731.git.mchehab+huawei@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: "Jonathan Corbet" <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-17resource: Simplify region_intersects() by reducing conditionalsAndy Shevchenko1-5/+5
Now we have for 'other' and 'type' variables other type return 0 0 REGION_DISJOINT 0 x REGION_INTERSECTS x 0 REGION_DISJOINT x x REGION_MIXED Obviously it's easier to check 'type' for 0 first instead of currently checked 'other'. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-16kernel/resource: make iomem_resource implicit in release_mem_region_adjustable()David Hildenbrand1-3/+2
"mem" in the name already indicates the root, similar to release_mem_region() and devm_request_mem_region(). Make it implicit. The only single caller always passes iomem_resource, other parents are not applicable. Suggested-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Baoquan He <bhe@redhat.com> Link: https://lkml.kernel.org/r/20200916073041.10355-1-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM ↵David Hildenbrand1-0/+60
resources Some add_memory*() users add memory in small, contiguous memory blocks. Examples include virtio-mem, hyper-v balloon, and the XEN balloon. This can quickly result in a lot of memory resources, whereby the actual resource boundaries are not of interest (e.g., it might be relevant for DIMMs, exposed via /proc/iomem to user space). We really want to merge added resources in this scenario where possible. Let's provide a flag (MEMHP_MERGE_RESOURCE) to specify that a resource either created within add_memory*() or passed via add_memory_resource() shall be marked mergeable and merged with applicable siblings. To implement that, we need a kernel/resource interface to mark selected System RAM resources mergeable (IORESOURCE_SYSRAM_MERGEABLE) and trigger merging. Note: We really want to merge after the whole operation succeeded, not directly when adding a resource to the resource tree (it would break add_memory_resource() and require splitting resources again when the operation failed - e.g., due to -ENOMEM). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Julien Grall <julien@xen.org> Cc: Baoquan He <bhe@redhat.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Leonardo Bras <leobras.c@gmail.com> Cc: Libor Pechacek <lpechacek@suse.cz> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: "Oliver