summaryrefslogtreecommitdiff
path: root/drivers/xen
AgeCommit message (Collapse)AuthorFilesLines
2022-03-11xen/gnttab: fix gnttab_end_foreign_access() without page specifiedJuergen Gross1-7/+29
Commit 42baefac638f06314298087394b982ead9ec444b upstream. gnttab_end_foreign_access() is used to free a grant reference and optionally to free the associated page. In case the grant is still in use by the other side processing is being deferred. This leads to a problem in case no page to be freed is specified by the caller: the caller doesn't know that the page is still mapped by the other side and thus should not be used for other purposes. The correct way to handle this situation is to take an additional reference to the granted page in case handling is being deferred and to drop that reference when the grant reference could be freed finally. This requires that there are no users of gnttab_end_foreign_access() left directly repurposing the granted page after the call, as this might result in clobbered data or information leaks via the not yet freed grant reference. This is part of CVE-2022-23041 / XSA-396. Reported-by: Simon Gaiser <simon@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11xen/pvcalls: use alloc/free_pages_exact()Juergen Gross1-4/+4
Commit b0576cc9c6b843d99c6982888d59a56209341888 upstream. Instead of __get_free_pages() and free_pages() use alloc_pages_exact() and free_pages_exact(). This is in preparation of a change of gnttab_end_foreign_access() which will prohibit use of high-order pages. This is part of CVE-2022-23041 / XSA-396. Reported-by: Simon Gaiser <simon@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11xen: remove gnttab_query_foreign_access()Juergen Gross1-25/+0
Commit 1dbd11ca75fe664d3e54607547771d021f531f59 upstream. Remove gnttab_query_foreign_access(), as it is unused and unsafe to use. All previous use cases assumed a grant would not be in use after gnttab_query_foreign_access() returned 0. This information is useless in best case, as it only refers to a situation in the past, which could have changed already. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11xen/gntalloc: don't use gnttab_query_foreign_access()Juergen Gross1-18/+7
Commit d3b6372c5881cb54925212abb62c521df8ba4809 upstream. Using gnttab_query_foreign_access() is unsafe, as it is racy by design. The use case in the gntalloc driver is not needed at all. While at it replace the call of gnttab_end_foreign_access_ref() with a call of gnttab_end_foreign_access(), which is what is really wanted there. In case the grant wasn't used due to an allocation failure, just free the grant via gnttab_free_grant_reference(). This is CVE-2022-23039 / part of XSA-396. Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11xen/grant-table: add gnttab_try_end_foreign_access()Juergen Gross1-2/+12
Commit 6b1775f26a2da2b05a6dc8ec2b5d14e9a4701a1a upstream. Add a new grant table function gnttab_try_end_foreign_access(), which will remove and free a grant if it is not in use. Its main use case is to either free a grant if it is no longer in use, or to take some other action if it is still in use. This other action can be an error exit, or (e.g. in the case of blkfront persistent grant feature) some special handling. This is CVE-2022-23036, CVE-2022-23038 / part of XSA-396. Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11xen/xenbus: don't let xenbus_grant_ring() remove grants in error caseJuergen Gross1-13/+11
Commit 3777ea7bac3113005b7180e6b9dadf16d19a5827 upstream. Letting xenbus_grant_ring() tear down grants in the error case is problematic, as the other side could already have used these grants. Calling gnttab_end_foreign_access_ref() without checking success is resulting in an unclear situation for any caller of xenbus_grant_ring() as in the error case the memory pages of the ring page might be partially mapped. Freeing them would risk unwanted foreign access to them, while not freeing them would leak memory. In order to remove the need to undo any gnttab_grant_foreign_access() calls, use gnttab_alloc_grant_references() to make sure no further error can occur in the loop granting access to the ring pages. It should be noted that this way of handling removes leaking of grant entries in the error case, too. This is CVE-2022-23040 / part of XSA-396. Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27xen/gntdev: fix unmap notification orderOleksandr Andrushchenko1-3/+3
commit ce2f46f3531a03781181b7f4bd1ff9f8c5086e7e upstream. While working with Xen's libxenvchan library I have faced an issue with unmap notifications sent in wrong order if both UNMAP_NOTIFY_SEND_EVENT and UNMAP_NOTIFY_CLEAR_BYTE were requested: first we send an event channel notification and then clear the notification byte which renders in the below inconsistency (cli_live is the byte which was requested to be cleared on unmap): [ 444.514243] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6 libxenvchan_is_open cli_live 1 [ 444.515239] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14 Thus it is not possible to reliably implement the checks like - wait for the notification (UNMAP_NOTIFY_SEND_EVENT) - check the variable (UNMAP_NOTIFY_CLEAR_BYTE) because it is possible that the variable gets checked before it is cleared by the kernel. To fix that we need to re-order the notifications, so the variable is first gets cleared and then the event channel notification is sent. With this fix I can see the correct order of execution: [ 54.522611] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14 [ 54.537966] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6 libxenvchan_is_open cli_live 0 Cc: stable@vger.kernel.org Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20211210092817.580718-1-andr2000@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01xen: detect uninitialized xenbus in xenbus_initStefano Stabellini1-0/+23
commit 36e8f60f0867d3b70d398d653c17108459a04efe upstream. If the xenstore page hasn't been allocated properly, reading the value of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return error. Instead, it will succeed and return zero. Instead of attempting to xen_remap a bad guest physical address, detect this condition and return early. Note that although a guest physical address of zero for HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice and zero has never been validly used in that capacity. Also recognize all bits set as an invalid value. For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns above ULONG_MAX should never be passed by the Xen tools to HVM guests anyway, so check for this condition and return early. Cc: stable@vger.kernel.org Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Link: https://lore.kernel.org/r/20211123210748.1910236-1-sstabellini@kernel.org Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01xen: don't continue xenstore initialization in case of errorsStefano Stabellini1-1/+3
commit 08f6c2b09ebd4b326dbe96d13f94fee8f9814c78 upstream. In case of errors in xenbus_init (e.g. missing xen_store_gfn parameter), we goto out_error but we forget to reset xen_store_domain_type to XS_UNKNOWN. As a consequence xenbus_probe_initcall and other initcalls will still try to initialize xenstore resulting into a crash at boot. [ 2.479830] Call trace: [ 2.482314] xb_init_comms+0x18/0x150 [ 2.486354] xs_init+0x34/0x138 [ 2.489786] xenbus_probe+0x4c/0x70 [ 2.498432] xenbus_probe_initcall+0x2c/0x7c [ 2.503944] do_one_initcall+0x54/0x1b8 [ 2.507358] kernel_init_freeable+0x1ac/0x210 [ 2.511617] kernel_init+0x28/0x130 [ 2.516112] ret_from_fork+0x10/0x20 Cc: <Stable@vger.kernel.org> Cc: jbeulich@suse.com Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Link: https://lore.kernel.org/r/20211115222719.2558207-1-sstabellini@kernel.org Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18xen-pciback: Fix return in pm_ctrl_init()YueHaibing1-1/+1
[ Upstream commit 4745ea2628bb43a7ec34b71763b5a56407b33990 ] Return NULL instead of passing to ERR_PTR while err is zero, this fix smatch warnings: drivers/xen/xen-pciback/conf_space_capability.c:163 pm_ctrl_init() warn: passing zero to 'ERR_PTR' Fixes: a92336a1176b ("xen/pciback: Drop two backends, squash and cleanup some code.") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20211008074417.8260-1-yuehaibing@huawei.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18xen/balloon: add late_initcall_sync() for initial ballooning doneJuergen Gross1-23/+63
commit 40fdea0284bb20814399da0484a658a96c735d90 upstream. When running as PVH or HVM guest with actual memory < max memory the hypervisor is using "populate on demand" in order to allow the guest to balloon down from its maximum memory size. For this to work correctly the guest must not touch more memory pages than its target memory size as otherwise the PoD cache will be exhausted and the guest is crashed as a result of that. In extreme cases ballooning down might not be finished today before the init process is started, which can consume lots of memory. In order to avoid random boot crashes in such cases, add a late init call to wait for ballooning down having finished for PVH/HVM guests. Warn on console if initial ballooning fails, panic() after stalling for more than 3 minutes per default. Add a module parameter for changing this timeout. [boris: replaced pr_info() with pr_notice()] Cc: <stable@vger.kernel.org> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20211102091944.17487-1-jgross@suse.com Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-13xen/balloon: fix cancelled balloon actionJuergen Gross1-6/+15
commit 319933a80fd4f07122466a77f93e5019d71be74c upstream. In case a ballooning action is cancelled the new kernel thread handling the ballooning might end up in a busy loop. Fix that by handling the cancelled action gracefully. While at it introduce a short wait for the BP_WAIT case. Cc: stable@vger.kernel.org Fixes: 8480ed9c2bbd56 ("xen/balloon: use a kernel thread instead a workqueue") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20211005133433.32008-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-13xen/privcmd: fix error handling in mmap-resource processingJan Beulich1-3/+4
commit e11423d6721dd63b23fb41ade5e8d0b448b17780 upstream. xen_pfn_t is the same size as int only on 32-bit builds (and not even on Arm32). Hence pfns[] can't be used directly to read individual error values returned from xen_remap_domain_mfn_array(); every other error indicator would be skipped/ignored on 64-bit. Fixes: 3ad0876554ca ("xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE") Cc: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/aa6d6a67-6889-338a-a910-51e889f792d5@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-30xen/balloon: fix balloon kthread freezingJuergen Gross1-2/+2
commit 96f5bd03e1be606987644b71899ea56a8d05f825 upstream. Commit 8480ed9c2bbd56 ("xen/balloon: use a kernel thread instead a workqueue") switched the Xen balloon driver to use a kernel thread. Unfortunately the patch omitted to call try_to_freeze() or to use wait_event_freezable_timeout(), causing a system suspend to fail. Fixes: 8480ed9c2bbd56 ("xen/balloon: use a kernel thread instead a workqueue") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210920100345.21939-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-30xen/balloon: use a kernel thread instead a workqueueJuergen Gross1-17/+45
[ Upstream commit 8480ed9c2bbd56fc86524998e5f2e3e22f5038f6 ] Today the Xen ballooning is done via delayed work in a workqueue. This might result in workqueue hangups being reported in case of large amounts of memory are being ballooned in one go (here 16GB): BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 64s! Showing busy workqueues and worker pools: workqueue events: flags=0x0 pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=2/256 refcnt=3 in-flight: 229:balloon_process pending: cache_reap workqueue events_freezable_power_: flags=0x84 pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 pending: disk_events_workfn workqueue mm_percpu_wq: flags=0x8 pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 pending: vmstat_update pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=64s workers=3 idle: 2222 43 This can easily be avoided by using a dedicated kernel thread for doing the ballooning work. Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210827123206.15429-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-18xen/events: Fix race in set_evtchn_to_irqMaximilian Heyne1-6/+14
[ Upstream commit 88ca2521bd5b4e8b83743c01a2d4cb09325b51e9 ] There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq mapping are lazily allocated in this function. The check whether the row is already present and the row initialization is not synchronized. Two threads can at the same time allocate a new row for evtchn_to_irq and add the irq mapping to the their newly allocated row. One thread will overwrite what the other has set for evtchn_to_irq[row] and therefore the irq mapping is lost. This will trigger a BUG_ON later in bind_evtchn_to_cpu: INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802 INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002) INFO: nvme nvme77: 1/0/0 default/read/poll queues CRIT: kernel BUG at drivers/xen/events/events_base.c:427! WARN: invalid opcode: 0000 [#1] SMP NOPTI WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme] WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0 WARN: Call Trace: WARN: set_affinity_irq+0x121/0x150 WARN: irq_do_set_affinity+0x37/0xe0 WARN: irq_setup_affinity+0xf6/0x170 WARN: irq_startup+0x64/0xe0 WARN: __setup_irq+0x69e/0x740 WARN: ? request_threaded_irq+0xad/0x160 WARN: request_threaded_irq+0xf5/0x160 WARN: ? nvme_timeout+0x2f0/0x2f0 [nvme] WARN: pci_request_irq+0xa9/0xf0 WARN: ? pci_alloc_irq_vectors_affinity+0xbb/0x130 WARN: queue_request_irq+0x4c/0x70 [nvme] WARN: nvme_reset_work+0x82d/0x1550 [nvme] WARN: ? check_preempt_wakeup+0x14f/0x230 WARN: ? check_preempt_curr+0x29/0x80 WARN: ? nvme_irq_check+0x30/0x30 [nvme] WARN: process_one_work+0x18e/0x3c0 WARN: worker_thread+0x30/0x3a0 WARN: ? process_one_work+0x3c0/0x3c0 WARN: kthread+0x113/0x130 WARN: ? kthread_park+0x90/0x90 WARN: ret_from_fork+0x3a/0x50 This patch sets evtchn_to_irq rows via a cmpxchg operation so that they will be set only once. The row is now cleared before writing it to evtchn_to_irq in order to not create a race once the row is visible for other threads. While at it, do not require the page to be zeroed, because it will be overwritten with -1's in clear_evtchn_to_irq_row anyway. Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Fixes: d0b075ffeede ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated") Link: https://lore.kernel.org/r/20210812130930.127134-1-mheyne@amazon.de Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-11xen/events: reset active flag for lateeoi events laterJuergen Gross1-4/+19
commit 3de218ff39b9e3f0d453fe3154f12a174de44b25 upstream. In order to avoid a race condition for user events when changing cpu affinity reset the active flag only when EOI-ing the event. This is working fine as all user events are lateeoi events. Note that lateeoi_ack_mask_dynirq() is not modified as there is no explicit call to xen_irq_lateeoi() expected later. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Fixes: b6622798bc50b62 ("xen/events: avoid handling the same event on two cpus at the same time") Tested-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com> Link: https://lore.kernel.org/r/20210623130913.9405-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-26xen-pciback: reconfigure also from backend watch handlerJan Beulich1-5/+17
commit c81d3d24602540f65256f98831d0a25599ea6b87 upstream. When multiple PCI devices get assigned to a guest right at boot, libxl incrementally populates the backend tree. The writes for the first of the devices trigger the backend watch. In turn xen_pcibk_setup_backend() will set the XenBus state to Initialised, at which point no further reconfigures would happen unless a device got hotplugged. Arrange for reconfigure to also get triggered from the backend watch handler. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/2337cbd6-94b9-4187-9862-c03ea12e0c61@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-26xen-pciback: redo VF placement in the virtual topologyJan Beulich1-6/+8
commit 4ba50e7c423c29639878c00573288869aa627068 upstream. The commit referenced below was incomplete: It merely affected what would get written to the vdev-<N> xenstore node. The guest would still find the function at the original function number as long as __xen_pcibk_get_pci_dev() wouldn't be in sync. The same goes for AER wrt __xen_pcibk_get_pcifront_dev(). Undo overriding the function to zero and instead make sure that VFs at function zero remain alone in their slot. This has the added benefit of improving overall capacity, considering that there's only a total of 32 slots available right now (PCI segment and bus can both only ever be zero at present). Fixes: 8a5248fe10b1 ("xen PV passthru: assign SR-IOV virtual functions to separate virtual slots") Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/8def783b-404c-3452-196d-3f3fd4d72c9e@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19xen/gntdev: fix gntdev_mmap() error exit pathJuergen Gross1-1/+3
commit 970655aa9b42461f8394e4457307005bdeee14d9 upstream. Commit d3eeb1d77c5d0af ("xen/gntdev: use mmu_interval_notifier_insert") introduced an error in gntdev_mmap(): in case the call of mmu_interval_notifier_insert_locked() fails the exit path should not call mmu_interval_notifier_remove(), as this might result in NULL dereferences. One reason for failure is e.g. a signal pending for the running process. Fixes: d3eeb1d77c5d0af ("xen/gntdev: use mmu_interval_notifier_insert") Cc: stable@vger.kernel.org Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Link: https://lore.kernel.org/r/20210423054038.26696-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19xen/unpopulated-alloc: fix error return code in fill_list()Zhen Lei1-1/+3
[ Upstream commit dbc03e81586fc33e4945263fd6e09e22eb4b980f ] Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: a4574f63edc6 ("mm/memremap_pages: convert to 'struct range'") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20210508021913.1727-1-thunder.leizhen@huawei.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19xen/unpopulated-alloc: consolidate pgmap manipulationDan Williams1-7/+7
[ Upstream commit 3a250629d7325f27b278dad1aaf44eab00090e76 ] Cleanup fill_list() to keep all the pgmap manipulations in a single location of the function. Update the exit unwind path accordingly. Link: http://lore.kernel.org/r/6186fa28-d123-12db-6171-a75cb6e615a5@oracle.com Link: https://lkml.kernel.org/r/160272253442.3136502.16683842453317773487.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-16xen/events: fix setting irq affinityJuergen Gross1-2/+2
The backport of upstream patch 25da4618af240fbec61 ("xen/events: don't unmask an event channel when an eoi is pending") introduced a regression for stable kernels 5.10 and older: setting IRQ affinity for IRQs related to interdomain events would no longer work, as moving the IRQ to its new cpu was not included in the irq_ack callback for those events. Fix that by adding the needed call. Note that kernels 5.11 and later don't need the explicit moving of the IRQ to the target cpu in the irq_ack callback, due to a rework of the affinity setting in kernel 5.11. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14xen/evtchn: Change irq_info lock to raw_spinlock_tLuca Fancellu1-6/+6
commit d120198bd5ff1d41808b6914e1eb89aff937415c upstream. Unmask operation must be called with interrupt disabled, on preempt_rt spin_lock_irqsave/spin_unlock_irqrestore don't disable/enable interrupts, so use raw_* implementation and change lock variable in struct irq_info from spinlock_t to raw_spinlock_t Cc: stable@vger.kernel.org Fixes: 25da4618af24 ("xen/events: don't unmask an event channel when an eoi is pending") Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20210406105105.10141-1-luca.fancellu@arm.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30xen/x86: make XEN_BALLOON_MEMORY_HOTPLUG_LIMIT depend on MEMORY_HOTPLUGRoger Pau Monne1-2/+2
[ Upstream commit 2b514ec72706a31bea0c3b97e622b81535b5323a ] The Xen memory hotplug limit should depend on the memory hotplug generic option, rather than the Xen balloon configuration. It's possible to have a kernel with generic memory hotplug enabled, but without Xen balloon enabled, at which point memory hotplug won't work correctly due to the size limitation of the p2m. Rename the option to XEN_MEMORY_HOTPLUG_LIMIT since it's no longer tied to ballooning. Fixes: 9e2369c06c8a18 ("xen: add helpers to allocate unpopulated memory") Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210324122424.58685-2-roger.pau@citrix.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17xen/events: avoid handling the same event on two cpus at the same timeJuergen Gross1-8/+18
commit b6622798bc50b625a1e62f82c7190df40c1f5b21 upstream. When changing the cpu affinity of an event it can happen today that (with some unlucky timing) the same event will be handled on the old and the new cpu at the same time. Avoid that by adding an "event active" flag to the per-event data and call the handler only if this flag isn't set. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Link: https://lore.kernel.org/r/20210306161833.4552-4-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17xen/events: don't unmask an event channel when an eoi is pendingJuergen Gross4-49/+88
commit 25da4618af240fbec6112401498301a6f2bc9702 upstream. An event channel should be kept masked when an eoi is pending for it. When being migrated to another cpu it might be unmasked, though. In order to avoid this keep three different flags for each event channel to be able to distinguish "normal" masking/unmasking from eoi related masking/unmasking and temporary masking. The event channel should only be able to generate an interrupt if all flags are cleared. Cc: stable@vger.kernel.org Fixes: 54c9de89895e ("xen/events: add a new "late EOI" evtchn framework") Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Ross Lagerwall <ross.lagerwall@citrix.com> Link: https://lore.kernel.org/r/20210306161833.4552-3-jgross@suse.com [boris -- corrected Fixed tag format] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17xen/events: reset affinity of 2-level event when tearing it downJuergen Gross3-0/+24
commit 9e77d96b8e2724ed00380189f7b0ded61113b39f upstream. When creating a new event channel with 2-level events the affinity needs to be reset initially in order to avoid using an old affinity from earlier usage of the event channel port. So when tearing an event channel down reset all affinity bits. The same applies to the affinity when onlining a vcpu: all old affinity settings for this vcpu must be reset. As percpu events get initialized before the percpu event channel hook is called, resetting of the affinities happens after offlining a vcpu (this is working, as initial percpu memory is zeroed out). Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Link: https://lore.kernel.org/r/20210306161833.4552-2-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23xen-scsiback: don't "handle" error by BUG()Jan Beulich1-2/+2
commit 7c77474b2d22176d2bfb592ec74e0f2cb71352c9 upstream. In particular -ENOMEM may come back here, from set_foreign_p2m_mapping(). Don't make problems worse, the more that handling elsewhere (together with map's status fields now indicating whether a mapping wasn't even attempted, and hence has to be considered failed) doesn't require this odd way of dealing with errors. This is part of XSA-362. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23Xen/gntdev: correct error checking in gntdev_map_grant_pages()Jan Beulich1-8/+9
commit ebee0eab08594b2bd5db716288a4f1ae5936e9bc upstream. Failure of the kernel part of the mapping operation should also be indicated as an error to the caller, or else it may assume the respective kernel VA is okay to access. Furthermore gnttab_map_refs() failing still requires recording successfully mapped handles, so they can be unmapped subsequently. This in turn requires there to be a way to tell full hypercall failure from partial success - preset map_op status fields such that they won't "happen" to look as if the operation succeeded. Also again use GNTST_okay instead of implying its value (zero). This is part of XSA-361. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()Jan Beulich1-11/+13
commit dbe5283605b3bc12ca45def09cc721a0a5c853a2 upstream. We may not skip setting the field in the unmap structure when GNTMAP_device_map is in use - such an unmap would fail to release the respective resources (a page ref in the hypervisor). Otoh the field doesn't need setting at all when GNTMAP_device_map is not in use. To record the value for unmapping, we also better don't use our local p2m: In particular after a subsequent change it may not have got updated for all the batch elements. Instead it can simply be taken from the respective map's results. We can additionally avoid playing this game altogether for the kernel part of the mappings in (x86) PV mode. This is part of XSA-361. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17arm/xen: Don't probe xenbus as part of an early initcallJulien Grall2-2/+1
commit c4295ab0b485b8bc50d2264bcae2acd06f25caaf upstream. After Commit 3499ba8198cad ("xen: Fix event channel callback via INTX/GSI"), xenbus_probe() will be called too early on Arm. This will recent to a guest hang during boot. If the hang wasn't there, we would have ended up to call xenbus_probe() twice (the second time is in xenbus_probe_initcall()). We don't need to initialize xenbus_probe() early for Arm guest. Therefore, the call in xen_guest_init() is now removed. After this change, there is no more external caller for xenbus_probe(). So the function is turned to a static one. Interestingly there were two prototypes for it. Cc: stable@vger.kernel.org Fixes: 3499ba8198cad ("xen: Fix event channel callback via INTX/GSI") Reported-by: Ian Jackson <iwj@xenproject.org> Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20210210170654.5377-1-julien@xen.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03xen: Fix XenStore initialisation for XS_LOCALDavid Woodhouse1-0/+31
commit 5f46400f7a6a4fad635d5a79e2aa5a04a30ffea1 upstream. In commit 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI") I reworked the triggering of xenbus_probe(). I tried to simplify things by taking out the workqueue based startup triggered from wake_waiting(); the somewhat poorly named xenbus IRQ handler. I missed the fact that in the XS_LOCAL case (Dom0 starting its own xenstored or xenstore-stubdom, which happens after the kernel is booted completely), that IRQ-based trigger is still actually needed. So... put it back, except more cleanly. By just spawning a xenbus_probe thread which waits on xb_waitq and runs the probe the first time it gets woken, just as the workqueue-based hack did. This is actually a nicer approach for *all* the back ends with different interrupt methods, and we can switch them all over to that without the complex conditions for when to trigger it. But not in -rc6. This is the minimal fix for the regression, although it's a step in the right direction instead of doing a partial revert and actually putting the workqueue back. It's also simpler than the workqueue. Fixes: 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI") Reported-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/4c9af052a6e0f6485d1de43f2c38b1461996db99.camel@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Salvatore Bonaccorso <carnil@debian.org> Cc: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27xen: Fix event channel callback via INTX/GSIDavid Woodhouse5-33/+68
[ Upstream commit 3499ba8198cad47b731792e5e56b9ec2a78a83a2 ] For a while, event channel notification via the PCI platform device has been broken, because we attempt to communicate with xenstore before we even have notifications working, with the xs_reset_watches() call in xs_init(). We tend to get away with this on Xen versions below 4.0 because we avoid calling xs_reset_watches() anyway, because xenstore might not cope with reading a non-existent key. And newer Xen *does* have the vector callback support, so we rarely fall back to INTX/GSI delivery. To fix it, clean up a bit of the mess of xs_init() and xenbus_probe() startup. Call xs_init() directly from xenbus_init() only in the !XS_HVM case, deferring it to be called from xenbus_probe() in the XS_HVM case instead. Then fix up the invocation of xenbus_probe() to happen either from its device_initcall if the callback is available early enough, or when the callback is finally set up. This means that the hack of calling xenbus_probe() from a workqueue after the first interrupt, or directly from the PCI platform device setup, is no longer needed. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210113132606.422794-2-dwmw2@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19xen/privcmd: allow fetching resource sizesRoger Pau Monne1-6/+19
commit ef3a575baf53571dc405ee4028e26f50856898e7 upstream. Allow issuing an IOCTL_PRIVCMD_MMAP_RESOURCE ioctl with num = 0 and addr = 0 in order to fetch the size of a specific resource. Add a shortcut to the default map resource path, since fetching the size requires no address to be passed in, and thus no VMA to setup. This is missing from the initial implementation, and causes issues when mapping resources that don't have fixed or known sizes. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Cc: stable@vger.kernel.org # >= 4.18 Link: https://lore.kernel.org/r/20210112115358.23346-1-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30xenbus/xenbus_backend: Disallow pending watch messagesSeongJae Park1-0/+7
commit 9996bd494794a2fe393e97e7a982388c6249aa76 upstream. 'xenbus_backend' watches 'state' of devices, which is writable by guests. Hence, if guests intensively updates it, dom0 will have lots of pending events that exhausting memory of dom0. In other words, guests can trigger dom0 memory pressure. This is known as XSA-349. However, the watch callback of it, 'frontend_changed()', reads only 'state', so doesn't need to have the pending events. To avoid the problem, this commit disallows pending watch messages for 'xenbus_backend' using the 'will_handle()' watch callback. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30xen/xenbus: Count pending messages for each watchSeongJae Park1-11/+18
commit 3dc86ca6b4c8cfcba9da7996189d1b5a358a94fc upstream. This commit adds a counter of pending messages for each watch in the struct. It is used to skip unnecessary pending messages lookup in 'unregister_xenbus_watch()'. It could also be used in 'will_handle' callback. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30xen/xenbus/xen_bus_type: Support will_handle watch callbackSeongJae Park2-1/+4
commit be987200fbaceaef340872841d4f7af2c5ee8dc3 upstream. This commit adds support of the 'will_handle' watch callback for 'xen_bus_type' users. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()SeongJae Park3-4/+9
commit 2e85d32b1c865bec703ce0c962221a5e955c52c2 upstream. Some code does not directly make 'xenbus_watch' object and call 'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This commit adds support of 'will_handle' callback in the 'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30xen/xenbus: Allow watches discard events before queueingSeongJae Park2-1/+5
commit fed1755b118147721f2c87b37b9d66e62c39b668 upstream. If handling logics of watch events are slower than the events enqueue logic and the events can be created from the guests, the guests could trigger memory pressure by intensively inducing the events, because it will create a huge number of pending events that exhausting the memory. Fortunately, some watch events could be ignored, depending on its handler callback. For example, if the callback has interest in only one single path, the watch wouldn't want multiple pending events. Or, some watches could ignore events to same path. To let such watches to volutarily help avoiding the memory pressure situation, this commit introduces new watch callback, 'will_handle'. If it is not NULL, it will be called for each new event just before enqueuing it. Then, if the callback returns false, the event will be discarded. No watch is using the callback for now, though. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-09xen: don't use page->lru for ZONE_DEVICE memoryJuergen Gross2-16/+69
Commit 9e2369c06c8a18 ("xen: add helpers to allocate unpopulated memory") introduced usage of ZONE_DEVICE memory for foreign memory mappings. Unfortunately this collides with using page->lru for Xen backend private page caches. Fix that by using page->zone_device_data instead. Cc: <stable@vger.kernel.org> # 5.9 Fixes: 9e2369c06c8a18 ("xen: add helpers to allocate unpopulated memory") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovksy@oracle.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2020-12-09xen: add helpers for caching grant mapping pagesJuergen Gross2-49/+83
Instead of having similar helpers in multiple backend drivers use common helpers for caching pages allocated via gnttab_alloc_pages(). Make use of those helpers in blkback and scsiback. Cc: <stable@vger.kernel.org> # 5.9 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovksy@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2020-11-02swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_singleChristoph Hellwig1-2/+1
The tbl_dma_addr argument is used to check the DMA boundary for the allocations, and thus needs to be a dma_addr_t. swiotlb-xen instead passed a physical address, which could lead to incorrect results for strange offsets. Fix this by removing the parameter entirely and hard code the DMA address for io_tlb_start instead. Fixes: 91ffe4ad534a ("swiotlb-xen: introduce phys_to_dma/dma_to_phys translations") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-10-25Merge tag 'for-linus-5.10b-rc1c-tag' of ↵Linus Torvalds4-98/+82
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - a series for the Xen pv block drivers adding module parameters for better control of resource usge - a cleanup series for the Xen event driver * tag 'for-linus-5.