path: root/drivers/iommu/intel
Age | Commit message | Author | Files | Lines
2024-10-15 | iommu/vt-d: Fix incorrect pci_for_each_dma_alias() for non-PCI devices | Lu Baolu | 1 | -1/+3
Previously, the domain_context_clear() function incorrectly called pci_for_each_dma_alias() to set up context entries for non-PCI devices. This could lead to kernel hangs or other unexpected behavior. Add a check to only call pci_for_each_dma_alias() for PCI devices. For non-PCI devices, domain_context_clear_one() is called directly. Reported-by: Todd Brandt <todd.e.brandt@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219363 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219349 Fixes: 9a16ab9d6402 ("iommu/vt-d: Make context clearing consistent with context mapping") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20241014013744.102197-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
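The control flow described in this fix can be sketched in userspace as follows. This is a minimal model, not the driver code: struct model_device, model_context_clear() and the callback type are hypothetical names, and the alias array stands in for whatever pci_for_each_dma_alias() would visit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; not the kernel type. */
struct model_device {
    bool is_pci;        /* what a dev_is_pci()-style check would report */
    const int *aliases; /* DMA alias IDs; only meaningful for PCI devices */
    size_t nr_aliases;
    int own_id;         /* source ID of the device itself */
};

typedef void (*model_clear_fn)(int source_id, void *ctx);

/* Walk DMA aliases only for PCI devices; clear once, directly, otherwise. */
static void model_context_clear(const struct model_device *dev,
                                model_clear_fn clear_one, void *ctx)
{
    assert(dev && clear_one);

    if (!dev->is_pci) {
        clear_one(dev->own_id, ctx);
        return;
    }

    for (size_t i = 0; i < dev->nr_aliases; i++)
        clear_one(dev->aliases[i], ctx);
}

/* Example callback: count how many context entries were cleared. */
static void model_count_clear(int source_id, void *ctx)
{
    (void)source_id;
    ++*(int *)ctx;
}
```

The point of the guard is that the alias walk is never entered for a device that has no PCI topology to walk.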
2024-09-19 | Merge tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping | Linus Torvalds | 1 | -1/+0
Pull dma-mapping updates from Christoph Hellwig:
 - support DMA zones for arm64 systems where memory starts at > 4GB (Baruch Siach, Catalin Marinas)
 - support direct calls into dma-iommu and thus obsolete dma_map_ops for many common configurations (Leon Romanovsky)
 - add DMA-API tracing (Sean Anderson)
 - remove the not very useful return value from various dma_set_* APIs (Christoph Hellwig)
 - misc cleanups and minor optimizations (Chen Y, Yosry Ahmed, Christoph Hellwig)
* tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: reflow dma_supported
  dma-mapping: reliably inform about DMA support for IOMMU
  dma-mapping: add tracing for dma-mapping API calls
  dma-mapping: use IOMMU DMA calls for common alloc/free page calls
  dma-direct: optimize page freeing when it is not addressable
  dma-mapping: clearly mark DMA ops as an architecture feature
  vdpa_sim: don't select DMA_OPS
  arm64: mm: keep low RAM dma zone
  dma-mapping: don't return errors from dma_set_max_seg_size
  dma-mapping: don't return errors from dma_set_seg_boundary
  dma-mapping: don't return errors from dma_set_min_align_mask
  scsi: check that busses support the DMA API before setting dma parameters
  arm64: mm: fix DMA zone when dma-ranges is missing
  dma-mapping: direct calls for dma-iommu
  dma-mapping: call ->unmap_page and ->unmap_sg unconditionally
  arm64: support DMA zone above 4GB
  dma-mapping: replace zone_dma_bits by zone_dma_limit
  dma-mapping: use bit masking to check VM_DMA_COHERENT
2024-09-18 | Merge tag 'perf-core-2024-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -111/+2
Pull perf events updates from Ingo Molnar:
 - Implement per-PMU context rescheduling to significantly improve single-PMU performance, and related cleanups/fixes (Peter Zijlstra and Namhyung Kim)
 - Fix ancient bug resulting in a lot of events being dropped erroneously at higher sampling frequencies (Luo Gengkun)
 - uprobes enhancements:
    - Implement RCU-protected hot path optimizations for better performance:
      "For baseline vs SRCU, peak throughput increased from 3.7 M/s (million uprobe triggerings per second) up to about 8 M/s. For uretprobes it's a bit more modest with bump from 2.4 M/s to 5 M/s. For SRCU vs RCU Tasks Trace, peak throughput for uprobes increases further from 8 M/s to 10.3 M/s (+28%!), and for uretprobes from 5.3 M/s to 5.8 M/s (+11%), as we have more work to do on uretprobes side. Even single-thread (no contention) performance is slightly better: 3.276 M/s to 3.396 M/s (+3.5%) for uprobes, and 2.055 M/s to 2.174 M/s (+5.8%) for uretprobes."
      (Andrii Nakryiko et al)
    - Document mmap_lock, don't abuse get_user_pages_remote() (Oleg Nesterov)
    - Cleanups & fixes to prepare for future work:
       - Remove uprobe_register_refctr()
       - Simplify error handling for alloc_uprobe()
       - Make uprobe_register() return struct uprobe *
       - Fold __uprobe_unregister() into uprobe_unregister()
       - Shift put_uprobe() from delete_uprobe() to uprobe_unregister()
       - BPF: Fix use-after-free in bpf_uprobe_multi_link_attach() (Oleg Nesterov)
 - New feature & ABI extension: allow events to use PERF_SAMPLE READ with inheritance, enabling sample based profiling of a group of counters over a hierarchy of processes or threads (Ben Gainey)
 - Intel uncore & power events updates:
    - Add Arrow Lake and Lunar Lake support
    - Add PERF_EV_CAP_READ_SCOPE
    - Clean up and enhance cpumask and hotplug support (Kan Liang)
    - Add LNL uncore iMC freerunning support
    - Use D0:F0 as a default device (Zhenyu Wang)
 - Intel PT: fix AUX snapshot handling race (Adrian Hunter)
 - Misc fixes and cleanups (James Clark, Jiri Olsa, Oleg Nesterov and Peter Zijlstra)
* tag 'perf-core-2024-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  dmaengine: idxd: Clean up cpumask and hotplug for perfmon
  iommu/vt-d: Clean up cpumask and hotplug for perfmon
  perf/x86/intel/cstate: Clean up cpumask and hotplug
  perf: Add PERF_EV_CAP_READ_SCOPE
  perf: Generic hotplug support for a PMU with a scope
  uprobes: perform lockless SRCU-protected uprobes_tree lookup
  rbtree: provide rb_find_rcu() / rb_find_add_rcu()
  perf/uprobe: split uprobe_unregister()
  uprobes: travers uprobe's consumer list locklessly under SRCU protection
  uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks
  uprobes: protected uprobe lifetime with SRCU
  uprobes: revamp uprobe refcounting and lifetime management
  bpf: Fix use-after-free in bpf_uprobe_multi_link_attach()
  perf/core: Fix small negative period being ignored
  perf: Really fix event_function_call() locking
  perf: Optimize __pmu_ctx_sched_out()
  perf: Add context time freeze
  perf: Fix event_function_call() locking
  perf: Extract a few helpers
  perf: Optimize context reschedule for single PMU cases
  ...
2024-09-18 | Merge tag 'iommu-updates-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux | Linus Torvalds | 7 | -454/+548
Pull iommu updates from Joerg Roedel:
"Core changes:
 - Allow ATS on VF when parent device is identity mapped
 - Optimize unmap path on ARM io-pagetable implementation
 - Use of_property_present()
ARM-SMMU changes:
 - SMMUv2:
    - Devicetree binding updates for Qualcomm MMU-500 implementations
    - Extend workarounds for broken Qualcomm hypervisor to avoid touching features that are not available (e.g. 16KiB page support, reserved context banks)
 - SMMUv3:
    - Support for NVIDIA's custom virtual command queue hardware
    - Fix Stage-2 stall configuration and extend tests to cover this area
    - A bunch of driver cleanups, including simplification of the master rbtree code
 - Minor cleanups and fixes across both drivers
Intel VT-d changes:
 - Retire si_domain and convert to use static identity domain
 - Batched IOTLB/dev-IOTLB invalidation
 - Small code refactoring and cleanups
AMD-Vi changes:
 - Cleanup and refactoring of io-pagetable code
 - Add parameter to limit the used io-pagesizes
 - Other cleanups and fixes"
* tag 'iommu-updates-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (77 commits)
  dt-bindings: arm-smmu: Add compatible for QCS8300 SoC
  iommu/amd: Test for PAGING domains before freeing a domain
  iommu/amd: Fix argument order in amd_iommu_dev_flush_pasid_all()
  iommu/amd: Add kernel parameters to limit V1 page-sizes
  iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg
  iommu/arm-smmu-v3: Add types for each level of the CD table
  iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
  iommu/arm-smmu-v3: Do not use devm for the cd table allocations
  iommu/arm-smmu-v3: Remove strtab_base/cfg
  iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg
  iommu/arm-smmu-v3: Add types for each level of the 2 level stream table
  iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()
  iommu/arm-smmu-qcom: apply num_context_bank fixes for SDM630 / SDM660
  iommu/arm-smmu-v3: Use the new rb tree helpers
  dt-bindings: arm-smmu: document the support on SA8255p
  iommu/tegra241-cmdqv: Do not allocate vcmdq until dma_set_mask_and_coherent
  iommu/tegra241-cmdqv: Drop static at local variable
  iommu/tegra241-cmdqv: Fix ioremap() error handling in probe()
  iommu/amd: Do not set the D bit on AMD v2 table entries
  iommu/amd: Correct the reported page sizes from the V1 table
  ...
2024-09-17 | Merge tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -6/+5
Pull x86 APIC updates from Thomas Gleixner:
 - Handle an allocation failure in the IO/APIC code gracefully instead of crashing the machine.
 - Remove support for APIC local destination mode on 64bit
   Logical destination mode of the local APIC is used for systems with up to 8 CPUs. It has an advantage over physical destination mode as it allows targeting multiple CPUs at once with IPIs. That advantage was definitely worth it when systems with up to 8 CPUs were state of the art for servers and workstations, but that's history.
   In the recent past there were quite some reports of new laptops failing to boot with logical destination mode, but they work fine with physical destination mode. That's not a surprise because physical destination mode is guaranteed to work as it's the only way to get a CPU up and running via the INIT/INIT/STARTUP sequence. Some of the affected systems were cured by BIOS updates, but not all OEMs provide them.
   As the number of CPUs keeps increasing, logical destination mode becomes less used and the benefit for small systems, like laptops, is not really worth the trouble. So just remove logical destination mode support for 64bit and be done with it.
 - Code and comment cleanups in the APIC area.
* tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Fix comment on IRQ vector layout
  x86/apic: Remove unused extern declarations
  x86/apic: Remove logical destination mode for 64-bit
  x86/apic: Remove unused inline function apic_set_eoi_cb()
  x86/ioapic: Cleanup remaining coding style issues
  x86/ioapic: Cleanup line breaks
  x86/ioapic: Cleanup bracket usage
  x86/ioapic: Cleanup comments
  x86/ioapic: Move replace_pin_at_irq_node() to the call site
  iommu/vt-d: Cleanup apic_printk()
  x86/mpparse: Cleanup apic_printk()s
  x86/ioapic: Cleanup guarded debug printk()s
  x86/ioapic: Cleanup apic_printk()s
  x86/apic: Cleanup apic_printk()s
  x86/apic: Provide apic_printk() helpers
  x86/ioapic: Use guard() for locking where applicable
  x86/ioapic: Cleanup structs
  x86/ioapic: Mark mp_alloc_timer_irq() __init
  x86/ioapic: Handle allocation failures gracefully
2024-09-13 | Merge branches 'fixes', 'arm/smmu', 'intel/vt-d', 'amd/amd-vi' and 'core' into next | Joerg Roedel | 7 | -454/+548
2024-09-10 | iommu/vt-d: Clean up cpumask and hotplug for perfmon | Kan Liang | 2 | -111/+2
The iommu PMU is system-wide scope, which is supported by the generic perf_event subsystem now. Set the scope for the iommu PMU and remove all the cpumask and hotplug codes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240802151643.1691631-5-kan.liang@linux.intel.com
2024-09-02 | iommu/vt-d: Introduce batched cache invalidation | Tina Zhang | 1 | -15/+107
Converts IOTLB and Dev-IOTLB invalidation to a batched model. Cache tag invalidation requests for a domain are now accumulated in a qi_batch structure before being flushed in bulk. It replaces the previous per-request qi_flush approach with a more efficient batching mechanism. Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20240815065221.50328-5-tina.zhang@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Add qi_batch for dmar_domain | Lu Baolu | 5 | -1/+27
Introduces a qi_batch structure to hold batched cache invalidation descriptors on a per-dmar_domain basis. A fixed-size descriptor array is used for simplicity. The qi_batch is allocated when the first cache tag is added to the domain and freed during iommu_free_domain(). Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20240815065221.50328-4-tina.zhang@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
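The batching idea (a fixed-size descriptor array that is filled and then submitted in bulk) can be sketched roughly like this. It is a userspace model under stated assumptions: the model_* names, the two-word descriptor layout and the capacity of 16 are illustrative choices, not values taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_BATCH_SIZE 16 /* assumed capacity, chosen for illustration */

struct model_desc {
    uint64_t qw0, qw1; /* placeholder descriptor payload */
};

struct model_batch {
    struct model_desc descs[MODEL_BATCH_SIZE];
    size_t index;     /* number of descriptors queued so far */
    unsigned flushes; /* how many bulk submissions happened */
};

/* Submit everything queued so far in one go, then reset the batch. */
static void model_batch_flush(struct model_batch *b)
{
    assert(b);
    if (b->index == 0)
        return;
    /* A real driver would hand descs[0..index) to the hardware queue here. */
    b->flushes++;
    b->index = 0;
}

/* Queue one descriptor; flush automatically once the array is full. */
static void model_batch_add(struct model_batch *b, struct model_desc d)
{
    assert(b);
    b->descs[b->index++] = d;
    if (b->index == MODEL_BATCH_SIZE)
        model_batch_flush(b);
}
```

A caller queues descriptors with model_batch_add() and issues one final model_batch_flush() at the end of the operation, so a partially filled batch is never left pending.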
2024-09-02 | iommu/vt-d: Refactor IOTLB and Dev-IOTLB flush for batching | Tina Zhang | 3 | -67/+83
Extracts IOTLB and Dev-IOTLB invalidation logic from cache tag flush interfaces into dedicated helper functions. It prepares the codebase for upcoming changes to support batched cache invalidations. To enable direct use of qi_flush helpers in the new functions, iommu->flush.flush_iotlb and quirk_extra_dev_tlb_flush() are opened up. No functional changes are intended. Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20240815065221.50328-3-tina.zhang@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Factor out invalidation descriptor composition | Tina Zhang | 2 | -87/+115
Separate the logic for constructing IOTLB and device TLB invalidation descriptors from the qi_flush interfaces. New helpers, qi_desc(), are introduced to encapsulate this common functionality. Moving descriptor composition code to new helpers enables its reuse in the upcoming qi_batch interfaces. No functional changes are intended. Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20240815065221.50328-2-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Unconditionally flush device TLB for pasid table updates | Lu Baolu | 1 | -9/+3
The caching mode of an IOMMU is irrelevant to the behavior of the device TLB. Previously, commit 304b3bde24b5 ("iommu/vt-d: Remove caching mode check before device TLB flush") removed this redundant check in the domain unmap path. Checking the caching mode before flushing the device TLB after a pasid table entry is updated is unnecessary and can lead to inconsistent behavior. Extend this consistency by removing the caching mode check in the pasid table update path. Suggested-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240820030208.20020-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Move PCI PASID enablement to probe path | Lu Baolu | 1 | -14/+15
Currently, PCI PASID is enabled alongside PCI ATS when an iommu domain is attached to the device and disabled when the device transitions to block translation mode. This approach is inappropriate as PCI PASID is a device feature independent of the type of the attached domain. Enable PCI PASID during the IOMMU device probe and disable it during the release path. Suggested-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240819051805.116936-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Fix potential lockup if qi_submit_sync called with 0 count | Sanjay K Kumar | 1 | -5/+11
If qi_submit_sync() is invoked with 0 invalidation descriptors (for instance, for DMA draining purposes), we can run into a bug where a submitting thread fails to detect the completion of invalidation_wait. Subsequently, this leads to a soft lockup. Currently, this bug has no impact on existing users because no callers submit invalidations with 0 descriptors. This fix will enable future users (such as DMA drain) to call qi_submit_sync() with 0 count.
Suppose thread T1 invokes qi_submit_sync() with non-zero descriptors, while concurrently, thread T2 calls qi_submit_sync() with zero descriptors. Both threads then enter a while loop, waiting for their respective descriptors to complete. T1 detects its completion (i.e., T1's invalidation_wait status changes to QI_DONE by HW) and proceeds to call reclaim_free_desc() to reclaim all descriptors, potentially including adjacent ones of other threads that are also marked as QI_DONE.
During this time, while T2 is waiting to acquire the qi->q_lock, the IOMMU hardware may complete the invalidation for T2, setting its status to QI_DONE. However, if T1's execution of reclaim_free_desc() frees T2's invalidation_wait descriptor and changes its status to QI_FREE, T2 will not observe the QI_DONE status for its invalidation_wait and will indefinitely remain stuck.
This soft lockup does not occur when only non-zero descriptors are submitted. In such cases, invalidation descriptors are interspersed among wait descriptors with the status QI_IN_USE, acting as barriers. These barriers prevent the reclaim code from mistakenly freeing descriptors belonging to other submitters.
Consider the following example timeline:
  T1                          T2
  ========================================
  ID1
  WD1
  while(WD1!=QI_DONE)
  unlock
                              lock
  WD1=QI_DONE*                WD2
                              while(WD2!=QI_DONE)
                              unlock
  lock
  WD1==QI_DONE?
  ID1=QI_DONE                 WD2=DONE*
  reclaim()
  ID1=FREE
  WD1=FREE
  WD2=FREE
  unlock
                              soft lockup!
                              T2 never sees QI_DONE in WD2
Where:
  ID = invalidation descriptor
  WD = wait descriptor
  *  = Written by hardware
The root of the problem is that the descriptor status QI_DONE flag is used for two conflicting purposes:
 1. signal a descriptor is ready for reclaim (to be freed)
 2. signal by the hardware that a wait descriptor is complete
The solution (in this patch) is state separation by using QI_FREE flag for #1. Once a thread's invalidation descriptors are complete, their status would be set to QI_FREE. The reclaim_free_desc() function would then only free descriptors marked as QI_FREE instead of those marked as QI_DONE. This change ensures that T2 (from the previous example) will correctly observe the completion of its invalidation_wait (marked as QI_DONE).
Signed-off-by: Sanjay K Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240728210059.1964602-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
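The state separation can be illustrated with a small single-threaded model of the descriptor ring. This is only a sketch: the enum, the ring length and model_reclaim() are hypothetical simplifications, and the real reclaim_free_desc() tracks more state than shown here.

```c
#include <assert.h>

#define MODEL_RING_LEN 8 /* illustrative ring size */

enum model_status {
    MODEL_FREE = 0, /* slot is free, or (after the fix) ready for reclaim */
    MODEL_IN_USE,   /* submitted, not yet complete */
    MODEL_DONE      /* written by "hardware" when a wait descriptor completes */
};

/*
 * Reclaim consecutive descriptors in [tail, head) whose status equals
 * `ready`, marking them free. Returns how many slots were reclaimed.
 * Passing MODEL_DONE models the old behaviour; MODEL_FREE models the fix.
 */
static int model_reclaim(enum model_status *ring, int tail, int head,
                         enum model_status ready)
{
    int reclaimed = 0;

    assert(tail >= 0 && tail < MODEL_RING_LEN);
    assert(head >= 0 && head < MODEL_RING_LEN);

    while (tail != head && ring[tail] == ready) {
        ring[tail] = MODEL_FREE;
        tail = (tail + 1) % MODEL_RING_LEN;
        reclaimed++;
    }
    return reclaimed;
}
```

With slots 0 and 1 belonging to T1 (ID1, WD1) and slot 2 holding T2's wait descriptor, reclaiming on MODEL_DONE sweeps up all three slots, so T2's completion marker is overwritten before T2 reads it. When T1 instead marks only its own slots MODEL_FREE and reclaim matches on MODEL_FREE, the sweep stops at slot 2 and T2 still observes MODEL_DONE.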
2024-09-02 | iommu/vt-d: Cleanup si_domain | Lu Baolu | 1 | -72/+19
The static identity domain has been introduced, rendering the si_domain obsolete. Remove si_domain and cleanup the code accordingly. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240809055431.36513-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Add support for static identity domain | Lu Baolu | 2 | -5/+111
Software determines VT-d hardware support for passthrough translation by inspecting the capability register. If passthrough translation is not supported, the device is instructed to use DMA domain for its default domain. Add a global static identity domain with guaranteed attach semantics for IOMMUs that support passthrough translation mode. The global static identity domain is a dummy domain without corresponding dmar_domain structure. Consequently, the device's info->domain will be NULL when the identity domain is attached. Refactor the code accordingly. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240809055431.36513-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Factor out helpers from domain_context_mapping_one() | Lu Baolu | 1 | -41/+58
Extract common code from domain_context_mapping_one() into new helpers, making it reusable by other functions such as the upcoming identity domain implementation. No intentional functional changes. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20240809055431.36513-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Remove has_iotlb_device flag | Lu Baolu | 3 | -37/+1
The has_iotlb_device flag was used to indicate if a domain had attached devices with ATS enabled. Domains without this flag didn't require device TLB invalidation during unmap operations, optimizing performance by avoiding unnecessary device iteration. With the introduction of cache tags, this flag is no longer needed. The code to iterate over attached devices was removed by commit 06792d067989 ("iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi()"). Remove has_iotlb_device to avoid unnecessary code. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240809055431.36513-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Always reserve a domain ID for identity setup | Lu Baolu | 1 | -3/+3
We will use a global static identity domain. Reserve a static domain ID for it. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20240809055431.36513-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Remove identity mappings from si_domain | Lu Baolu | 1 | -118/+4
As the driver has enforced DMA domains for devices managed by an IOMMU hardware that doesn't support passthrough translation mode, there is no need for static identity mappings in the si_domain. Remove the identity mapping code to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240809055431.36513-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02 | iommu/vt-d: Require DMA domain if hardware not support passthrough | Lu Baolu | 1 | -0/+10
The iommu core defines the def_domain_type callback to query the iommu driver about hardware capability and quirks. The iommu driver should declare IOMMU_DOMAIN_DMA requirement for hardware lacking pass-through capability. Earlier VT-d hardware implementations did not support pass-through translation mode. The iommu driver relied on a paging domain with all physical system memory addresses identically mapped to the same IOVA to simulate pass-through translation before the def_domain_type was introduced and it has been kept until now. It's time to adjust it now to make the Intel iommu driver follow the def_domain_type semantics. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20240809055431.36513-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
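The decision this callback makes can be sketched as a pure function of a capability word. Everything named model_* below is hypothetical, including the enum values and the position of the pass-through bit, which is assumed here purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values; the kernel's IOMMU_DOMAIN_* constants differ. */
enum model_domain_type {
    MODEL_DOMAIN_ANY = 0, /* no requirement; the core may choose */
    MODEL_DOMAIN_DMA = 1  /* a translated DMA domain is required */
};

/* Assumed position of the pass-through capability bit (illustration only). */
#define MODEL_CAP_PASS_THROUGH (1ULL << 6)

static bool model_supports_passthrough(uint64_t cap)
{
    return (cap & MODEL_CAP_PASS_THROUGH) != 0;
}

/* Require a DMA domain whenever the hardware cannot do pass-through. */
static enum model_domain_type model_def_domain_type(uint64_t cap)
{
    if (!model_supports_passthrough(cap))
        return MODEL_DOMAIN_DMA;
    return MODEL_DOMAIN_ANY;
}
```

Expressing the hardware limitation through the default domain type is what removes the need for a paging domain that merely imitates pass-through.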
2024-08-30 | iommu: Allow ATS to work on VFs when the PF uses IDENTITY | Jason Gunthorpe | 1 | -0/+1
PCI ATS has a global Smallest Translation Unit field that is located in the PF but shared by all of the VFs. The expectation is that the STU will be set to the root port's global STU capability which is driven by the IO page table configuration of the iommu HW. Today it becomes set when the iommu driver first enables ATS. Thus, to enable ATS on the VF, the PF must have already had the correct STU programmed, even if ATS is off on the PF. Unfortunately the PF only programs the STU when the PF enables ATS. The iommu drivers tend to leave ATS disabled when IDENTITY translation is being used. Thus we can get into a state where the PF is setup to use IDENTITY with the DMA API while the VF would like to use VFIO with a PAGING domain and have ATS turned on. This fails because the PF never loaded a PAGING domain and so it never setup the STU, and the VF can't do it. The simplest solution is to have the iommu driver set the ATS STU when it probes the device. This way the ATS STU is loaded immediately at boot time to all PFs and there is no issue when a VF comes to use it. Add a new call pci_prepare_ats() which should be called by iommu drivers in their probe_device() op for every PCI device if the iommu driver supports ATS. This will setup the STU based on whatever page size capability the iommu HW has. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-0fb4d2ab6770+7e706-ats_vf_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-08-26 | iommu/vt-d: Fix incorrect domain ID in context flush helper | Lu Baolu | 3 | -6/+11
The helper intel_context_flush_present() is designed to flush all related caches when a context entry with the present bit set is modified. It currently retrieves the domain ID from the context entry and uses it to flush the IOTLB and context caches. This is incorrect when the context entry transitions from present to non-present, as the domain ID field is cleared before calling the helper. Fix it by passing the domain ID programmed in the context entry before the change to intel_context_flush_present(). This ensures that the correct domain ID is used for cache invalidation. Fixes: f90584f4beb8 ("iommu/vt-d: Add helper to flush caches for context change") Reported-by: Alex Williamson <alex.williamson@redhat.com> Closes: https://lore.kernel.org/linux-iommu/20240814162726.5efe1a6e.alex.williamson@redhat.com/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jacob Pan <jacob.pan@linux.microsoft.com> Link: https://lore.kernel.org/r/20240815124857.70038-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
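The ordering problem is easy to reproduce in a toy model: once the entry has been cleared, the domain ID can no longer be read back from it. The struct and both functions below are hypothetical and only return the ID that would be used for the flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified context entry. */
struct model_context_entry {
    bool present;
    uint16_t did; /* domain ID programmed into the entry */
};

/* Buggy order: clear first, then read the (already zeroed) domain ID. */
static uint16_t model_teardown_buggy(struct model_context_entry *e)
{
    assert(e);
    e->present = false;
    e->did = 0;
    return e->did; /* always 0: the ID needed for the flush is gone */
}

/* Fixed order: capture the domain ID before the entry is modified. */
static uint16_t model_teardown_fixed(struct model_context_entry *e)
{
    uint16_t did;

    assert(e);
    did = e->did;
    e->present = false;
    e->did = 0;
    return did; /* the ID the caches were actually tagged with */
}
```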
2024-08-22 | dma-mapping: direct calls for dma-iommu | Leon Romanovsky | 1 | -1/+0
Directly call into dma-iommu just like we have been doing for dma-direct for a while. This avoids the indirect call overhead for IOMMU ops and removes the need to have DMA ops entirely for many common configurations. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-08-07 | iommu/vt-d: Cleanup apic_printk() | Thomas Gleixner | 1 | -6/+5
Use the new apic_pr_verbose() helper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Tested-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/all/20240802155440.843266805@linutronix.de
2024-07-19 | Merge tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux | Linus Torvalds | 7 | -101/+249
Pull iommu updates from Will Deacon:
"Core:
 - Support for the "ats-supported" device-tree property
 - Removal of the 'ops' field from 'struct iommu_fwspec'
 - Introduction of iommu_paging_domain_alloc() and partial conversion of existing users
 - Introduce 'struct iommu_attach_handle' and provide corresponding IOMMU interfaces which will be used by the IOMMUFD subsystem
 - Remove stale documentation
 - Add missing MODULE_DESCRIPTION() macro
 - Misc cleanups
Allwinner Sun50i:
 - Ensure bypass mode is disabled on H616 SoCs
 - Ensure page-tables are allocated below 4GiB for the 32-bit page-table walker
 - Add new device-tree compatible strings
AMD Vi:
 - Use try_cmpxchg64() instead of cmpxchg64() when updating pte
Arm SMMUv2:
 - Print much more useful information on context faults
 - Fix Qualcomm TBU probing when CONFIG_ARM_SMMU_QCOM_DEBUG=n
 - Add new Qualcomm device-tree bindings
Arm SMMUv3:
 - Support for hardware update of access/dirty bits and reporting via IOMMUFD
 - More driver rework from Jason, this time updating the PASID/SVA support to prepare for full IOMMUFD support
 - Add missing MODULE_DESCRIPTION() macro
 - Minor fixes and cleanups
NVIDIA Tegra:
 - Fix for benign fwspec initialisation issue exposed by rework on the core branch
Intel VT-d:
 - Use try_cmpxchg64() instead of cmpxchg64() when updating pte
 - Use READ_ONCE() to read volatile descriptor status
 - Remove support for handling Execute-Requested requests
 - Avoid calling iommu_domain_alloc()
 - Minor fixes and refactoring
Qualcomm MSM:
 - Updates to the device-tree bindings"
* tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (72 commits)
  iommu/tegra-smmu: Pass correct fwnode to iommu_fwspec_init()
  iommu/vt-d: Fix identity map bounds in si_domain_init()
  iommu: Move IOMMU_DIRTY_NO_CLEAR define
  dt-bindings: iommu: Convert msm,iommu-v0 to yaml
  iommu/vt-d: Fix aligned pages in calculate_psi_aligned_address()
  iommu/vt-d: Limit max address mask to MAX_AGAW_PFN_WIDTH
  docs: iommu: Remove outdated Documentation/userspace-api/iommu.rst
  arm64: dts: fvp: Enable PCIe ATS for Base RevC FVP
  iommu/of: Support ats-supported device-tree property
  dt-bindings: PCI: generic: Add ats-supported property
  iommu: Remove iommu_fwspec ops
  OF: Simplify of_iommu_configure()
  ACPI: Retire acpi_iommu_fwspec_ops()
  iommu: Resolve fwspec ops automatically
  iommu/mediatek-v1: Clean up redundant fwspec checks
  RDMA/usnic: Use iommu_paging_domain_alloc()
  wifi: ath11k: Use iommu_paging_domain_alloc()
  wifi: ath10k: Use iommu_paging_domain_alloc()
  drm/msm: Use iommu_paging_domain_alloc()
  vhost-vdpa: Use iommu_paging_domain_alloc()
  ...
2024-07-12 | iommu/vt-d: Fix identity map bounds in si_domain_init() | Jon Pan-Doh | 1 | -1/+1
Intel IOMMU operates on inclusive bounds (both generally as well as in iommu_domain_identity_map()). Meanwhile, for_each_mem_pfn_range() uses exclusive bounds for end_pfn. This creates an off-by-one error when switching between the two. Fixes: c5395d5c4a82 ("intel-iommu: Clean up iommu_domain_identity_map()") Signed-off-by: Jon Pan-Doh <pandoh@google.com> Tested-by: Sudheer Dantuluri <dantuluris@google.com> Suggested-by: Gary Zibrat <gzibrat@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240709234913.2749386-1-pandoh@google.com Signed-off-by: Will Deacon <will@kernel.org>
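A short worked example of the off-by-one, with hypothetical helper names: a half-open range [start_pfn, end_pfn) has to be converted to an inclusive last PFN before it is handed to an interface that treats both bounds as inclusive.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a half-open PFN range [start_pfn, end_pfn) to its inclusive last PFN. */
static uint64_t model_last_pfn(uint64_t start_pfn, uint64_t end_pfn)
{
    assert(end_pfn > start_pfn);
    return end_pfn - 1;
}

/* Number of pages covered when both bounds are inclusive. */
static uint64_t model_pages_inclusive(uint64_t first_pfn, uint64_t last_pfn)
{
    assert(last_pfn >= first_pfn);
    return last_pfn - first_pfn + 1;
}
```

For the range [0x100, 0x200), the inclusive last PFN is 0x1ff and 0x100 pages are covered. Passing 0x200 unchanged as an inclusive bound would cover 0x101 pages, one more than the memory range actually contains.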
2024-07-10 | iommu/vt-d: Fix aligned pages in calculate_psi_aligned_address() | Lu Baolu | 1 | -0/+1
The helper calculate_psi_aligned_address() is used to convert an arbitrary range into a size-aligned one. The aligned_pages variable is calculated from input start and end, but is not adjusted when the start pfn is not aligned and the mask is adjusted, which results in an incorrect number of pages returned. The number of pages is used by qi_flush_piotlb() to flush caches for the first-stage translation. With the wrong number of pages, the cache is not synchronized, leading to inconsistencies in some cases. Fixes: c4d27ffaa8eb ("iommu/vt-d: Add cache tag invalidation helpers") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240709152643.28109-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
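One way to picture the size-alignment step is the following userspace sketch. It works on page frame numbers rather than addresses, uses hypothetical model_* names, assumes a maximum mask of 52 (64 address bits minus a 12-bit page offset), and relies on the GCC/Clang builtin __builtin_ctzll(); it is not the driver's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_MAX_PFN_WIDTH (64 - MODEL_PAGE_SHIFT) /* assumed upper bound */

struct model_psi_range {
    uint64_t base_pfn; /* start of the size-aligned range */
    uint64_t pages;    /* number of pages in the aligned range */
    unsigned mask;     /* log2(pages) */
};

static uint64_t model_roundup_pow2(uint64_t n)
{
    uint64_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

static unsigned model_ilog2(uint64_t n)
{
    unsigned r = 0;

    while (n >>= 1)
        r++;
    return r;
}

/* Grow [pfn, pfn + pages) to a naturally aligned power-of-two range. */
static struct model_psi_range model_psi_align(uint64_t pfn, uint64_t pages)
{
    uint64_t aligned_pages = model_roundup_pow2(pages);
    uint64_t bitmask = aligned_pages - 1;
    unsigned mask = model_ilog2(aligned_pages);
    struct model_psi_range r;

    assert(pages > 0);

    if (pfn & bitmask) {
        /* Start is not aligned: widen the mask until one aligned block
         * contains both the first and the last page of the request. */
        uint64_t end_pfn = pfn + pages - 1;
        uint64_t shared = ~(pfn ^ end_pfn) & ~bitmask;

        mask = shared ? (unsigned)__builtin_ctzll(shared) : MODEL_MAX_PFN_WIDTH;
        /* The step the fix is about: keep the page count in sync with the mask. */
        aligned_pages = 1ULL << mask;
    }

    r.base_pfn = pfn & ~((1ULL << mask) - 1);
    r.pages = aligned_pages;
    r.mask = mask;
    return r;
}
```

For a request of 2 pages starting at PFN 3, the pages are 3 and 4, which straddle a 4-page boundary, so the smallest aligned block containing both is PFNs 0 to 7: mask 3 and 8 pages. Reporting 2 pages alongside mask 3 is the kind of mismatch the commit describes.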
2024-07-10 | iommu/vt-d: Limit max address mask to MAX_AGAW_PFN_WIDTH | Lu Baolu | 1 | -1/+1
Address mask specifies the number of low order bits of the address field that must be masked for the invalidation operation. Since address bits masked start from bit 12, the max address mask should be MAX_AGAW_PFN_WIDTH, as defined in Table 19 ("Invalidate Descriptor Address Mask Encodings") of the spec. Limit the max address mask returned from calculate_psi_aligned_address() to MAX_AGAW_PFN_WIDTH to prevent potential integer overflow in the following code:
  qi_flush_dev_iotlb():
      ...
      addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
      ...
Fixes: c4d27ffaa8eb ("iommu/vt-d: Add cache tag invalidation helpers") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240709152643.28109-2-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
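The arithmetic behind the limit can be checked with a small sketch. The constants are assumptions for illustration: a page shift of 12, and a maximum mask of 52 so that the shift count never exceeds 63. In C, shifting a 64-bit value by 64 or more is undefined, which is why an uncapped mask is dangerous in the quoted expression.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_MAX_PFN_WIDTH (64 - MODEL_PAGE_SHIFT) /* 52, assumed cap */

/*
 * Fold a range size into the low bits of an address, following the
 * expression quoted in the commit message. The assert documents the
 * precondition that keeps the shift count at 63 or below.
 */
static uint64_t model_dev_iotlb_addr(uint64_t addr, unsigned mask)
{
    assert(mask <= MODEL_MAX_PFN_WIDTH);

    if (mask)
        addr |= (1ULL << (MODEL_PAGE_SHIFT + mask - 1)) - 1;
    return addr;
}
```

With mask 2 the shift is 13 and the low 13 bits are set; with the maximum mask of 52 the shift is 63, the largest value that is still well defined for a 64-bit operand.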
2024-07-03iommu/vt-d: Refactor PCI PRI enabling/disabling callbacksLu Baolu3-7/+61
Commit 0095bf83554f8 ("iommu: Improve iopf_queue_remove_device()") specified the flow for disabling the PRI on a device. Refactor the PRI callbacks in the intel iommu driver to better manage PRI enabling and disabling and align it with the device queue interfaces in the iommu core. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240701112317.94022-3-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-8-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Add helper to flush caches for context changeLu Baolu3-50/+92
This helper is used to flush the related caches following a change in a context table entry that was previously present. The VT-d specification provides guidance for such invalidations in section 6.5.3.3. This helper replaces the existing open code in the code paths where a present context entry is being torn down. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240701112317.94022-2-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-7-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Add helper to allocate paging domainLu Baolu1-9/+81
The domain_alloc_user operation is currently implemented by allocating a paging domain using iommu_domain_alloc(). This is because it needs to fully initialize the domain before return. Add a helper to do this to avoid using iommu_domain_alloc(). Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240610085555.88197-16-baolu.lu@linux.intel.com Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20240702130839.108139-6-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Downgrade warning for pre-enabled IRLu Baolu1-2/+2
Emitting a warning is overkill in intel_setup_irq_remapping() since the interrupt remapping is pre-enabled. For example, there's no guarantee that kexec will explicitly disable interrupt remapping before booting a new kernel. As a result, users are seeing warning messages like below when they kexec boot a kernel, though there is nothing wrong: DMAR-IR: IRQ remapping was enabled on dmar18 but we are not in kdump mode DMAR-IR: IRQ remapping was enabled on dmar17 but we are not in kdump mode DMAR-IR: IRQ remapping was enabled on dmar16 but we are not in kdump mode ... ... Downgrade the severity of this message to avoid user confusion. CC: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://lore.kernel.org/linux-iommu/5517f76a-94ad-452c-bae6-34ecc0ec4831@molgen.mpg.de/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240625043912.258036-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-5-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Remove control over Execute-Requested requestsLu Baolu4-17/+4
The VT-d specification has removed architectural support for requests-with-PASID that have the Execute-Requested (ER) field set to 1. The NXE bit in the PASID table entry and the XD bit in the first-stage paging entries are deprecated accordingly. Remove the programming of these bits to make it consistent with the spec. Suggested-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240624032351.249858-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-4-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Remove comment for def_domain_typeLu Baolu1-11/+0
The comment for def_domain_type is outdated. Part of it is irrelevant. Furthermore, it could just be deleted since the iommu_ops::def_domain_type callback is properly documented in iommu.h, so individual implementations shouldn't need to repeat that. Remove it to avoid confusion. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240624024327.234979-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Handle volatile descriptor status readJacob Pan1-1/+1
Queued invalidation wait descriptor status is volatile in that IOMMU hardware writes the data upon completion. Use READ_ONCE() to prevent compiler optimizations, which ensures the status is read from memory every time. As a side effect, READ_ONCE() also enforces strict types and may add an extra instruction. But it should not have a negative performance impact since we use cpu_relax anyway, and the extra time (from the added instruction) may make it easier for IOMMU HW to request cacheline ownership. e.g. gcc 12.3 BEFORE: 81 38 ad de 00 00 cmpl $0x2,(%rax) AFTER (with READ_ONCE()) 772f: 8b 00 mov (%rax),%eax 7731: 3d ad de 00 00 cmp $0x2,%eax //status data is 32 bit Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20240607173817.3914600-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240702130839.108139-2-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
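A minimal userspace approximation of the polling pattern, assuming a 32-bit status word as the message states. `read_once_u32()`, `wait_for_status()` and `SKETCH_QI_DONE` are hypothetical stand-ins. The volatile access only imitates the effect of the kernel's READ_ONCE() and is not its implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QI_DONE 0x2u	/* assumed completion value, illustration only */

/* Force a fresh load on each call so the compiler cannot cache the value. */
static inline uint32_t read_once_u32(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}

/*
 * Poll a status word that another agent (hardware, in the commit's case)
 * may update. Returns the number of polls that saw an unfinished status,
 * or -1 if the status never reached SKETCH_QI_DONE within max_polls.
 */
static int wait_for_status(const uint32_t *status, int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		if (read_once_u32(status) == SKETCH_QI_DONE)
			return i;
	}
	return -1;
}
```

With a plain load, the compiler may hoist the read out of the loop. The volatile access makes every iteration perform its own memory read.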
2024-06-27iommu/vt-d: Fix missed device TLB cache tagLu Baolu1-10/+10
When a domain is attached to a device, the required cache tags are assigned to the domain so that the related caches can be flushed whenever it is needed. The device TLB cache tag is created based on whether the ats_enabled field of the device's iommu data is set. This creates an ordered dependency between cache tag assignment and ATS enabling. The device TLB cache tag would not be created if device's ATS is enabled after the cache tag assignment. This causes devices with PCI ATS support to malfunction. The ATS control is exclusively owned by the iommu driver. Hence, move cache_tag_assign_domain() after PCI ATS enabling to make sure that the device TLB cache tag is created for the domain. Fixes: 3b1d9e2b2d68 ("iommu/vt-d: Add cache tag assignment interface") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240620062940.201786-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-25iommu/vt-d: Use try_cmpxchg64() in intel_pasid_get_entry()Uros Bizjak1-2/+5
Use try_cmpxchg64() instead of cmpxchg64 (*ptr, old, new) != old in intel_pasid_get_entry(). cmpxchg returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240522082729.971123-2-ubizjak@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
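The two calling styles can be compared in userspace with GCC/Clang builtins. This is an analogy only: `__sync_val_compare_and_swap()` plays the role of the value-returning cmpxchg64(), and `__atomic_compare_exchange_n()` plays the role of the boolean try_cmpxchg64(). Both wrapper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value-returning style: the caller compares the old value again itself. */
static bool install_cmpxchg_style(uint64_t *slot, uint64_t newval)
{
	assert(slot);
	return __sync_val_compare_and_swap(slot, (uint64_t)0, newval) == 0;
}

/*
 * Try style: the primitive reports success directly and, on failure,
 * writes the value it found into 'expected', so no separate compare
 * or reload is needed.
 */
static bool install_try_style(uint64_t *slot, uint64_t newval, uint64_t *seen)
{
	uint64_t expected = 0;
	bool ok;

	assert(slot && seen);
	ok = __atomic_compare_exchange_n(slot, &expected, newval, false,
					 __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	*seen = expected;	/* on failure this is the current content of *slot */
	return ok;
}
```

On failure the try style hands back the value currently in the slot, so a caller that needs the existing entry can use it without a second read.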
2024-05-22tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)1-7/+7
With the rework of how the __string() handles dynamic strings where it saves off the source string in a field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) which used to be assigned with __assign_str(field, mystring), the second parameter is no longer needed and is unused. With this, __assign_str() will now only get a single parameter. There are over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-21Merge tag 'pci-v6.10-changes' of ↵Linus Torvalds1-16/+3
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Skip E820 checks for MCFG ECAM regions for new (2016+) machines, since there's no requirement to describe them in E820 and some platforms require ECAM to work (Bjorn Helgaas) - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien Le Moal) - Remove last user and pci_enable_device_io() (Heiner Kallweit) - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen) - Skip waiting for devices that have been disconnected while suspended (Ilpo Järvinen) - Clear Secondary Status errors after enumeration since Master Aborts and Unsupported Request errors are an expected part of enumeration (Vidya Sagar) MSI: - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas) Error handling: - Mask Genesys GL975x SD host controller Replay Timer Timeout correctable errors caused by a hardware defect; the errors cause interrupts that prevent system suspend (Kai-Heng Feng) - Fix EDR-related _DSM support, which previously evaluated revision 5 but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan) ASPM: - Simplify link state definitions and mask calculation (Ilpo Järvinen) Power management: - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS apparently doesn't know how to put them back in D0 (Mario Limonciello) CXL: - Support resetting CXL devices; special handling required because CXL Ports mask Secondary Bus Reset by default (Dave Jiang) DOE: - Support DOE Discovery Version 2 (Alexey Kardashevskiy) Endpoint framework: - Set endpoint BAR to be 64-bit if the driver says that's all the device supports, in addition to doing so if the size is >2GB (Niklas Cassel) - Simplify endpoint BAR allocation and setting interfaces (Niklas Cassel) Cadence PCIe controller driver: - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof Kozlowski) Cadence PCIe endpoint driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not 
the BAR value (Niklas Cassel) Freescale Layerscape PCIe controller driver: - Convert DT binding to YAML (Frank Li) MediaTek MT7621 PCIe controller driver: - Add DT binding missing 'reg' property for child Root Ports (Krzysztof Kozlowski) - Fix theoretical string truncation in PHY name (Sergio Paracuellos) NVIDIA Tegra194 PCIe controller driver: - Return success for endpoint probe instead of falling through to the f