<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/dma/direct.c, branch v6.18.21</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>dma-direct: Fix missing sg_dma_len assignment in P2PDMA bus mappings</title>
<updated>2025-11-26T20:47:13+00:00</updated>
<author>
<name>Pranjal Shrivastava</name>
<email>praan@google.com</email>
</author>
<published>2025-11-26T11:41:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d0d08f4bd7f667dc7a65cd7133c0a94a6f02aca3'/>
<id>d0d08f4bd7f667dc7a65cd7133c0a94a6f02aca3</id>
<content type='text'>
Prior to commit a25e7962db0d7 ("PCI/P2PDMA: Refactor the p2pdma mapping
helpers"), P2P segments were mapped using the pci_p2pdma_map_segment()
helper. This helper was responsible for populating sg-&gt;dma_address,
marking the bus address, and also setting sg_dma_len(sg).

The refactor[1] removed this helper and moved the mapping logic directly
into the callers. While iommu_dma_map_sg() was correctly updated to set
the length in the new flow, it was missed in dma_direct_map_sg().

Thus, in dma_direct_map_sg(), the PCI_P2PDMA_MAP_BUS_ADDR case sets the
dma_address and marks the segment, but immediately executes 'continue',
which causes the loop to skip the standard assignment logic at the end:

    sg_dma_len(sg) = sg-&gt;length;

As a result, when CONFIG_NEED_SG_DMA_LENGTH is enabled, the dma_length
field remains uninitialized (zero) for P2P bus address mappings. This
breaks upper-layer drivers (e.g. RDMA/IB) that rely on sg_dma_len()
to determine the transfer size.

Fix this by explicitly setting the DMA length in the
PCI_P2PDMA_MAP_BUS_ADDR case before continuing to the next scatterlist
entry.
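
For illustration, the fixed case looks roughly like this (abbreviated
sketch, not the exact hunk):

    case PCI_P2PDMA_MAP_BUS_ADDR:
            /* dma_address and the bus-address mark are set as before;
             * additionally set the DMA length here, since 'continue'
             * skips the common assignment at the end of the loop. */
            sg_dma_len(sg) = sg-&gt;length;
            continue;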

Fixes: a25e7962db0d7 ("PCI/P2PDMA: Refactor the p2pdma mapping helpers")
Reported-by: Jacob Moroni &lt;jmoroni@google.com&gt;
Signed-off-by: Pranjal Shrivastava &lt;praan@google.com&gt;

[1]
https://lore.kernel.org/all/ac14a0e94355bf898de65d023ccf8a2ad22a3ece.1746424934.git.leon@kernel.org/

Reviewed-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Reviewed-by: Shivaji Kant &lt;shivajikant@google.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20251126114112.3694469-1-praan@google.com
</content>
</entry>
<entry>
<title>dma-mapping: export new dma_*map_phys() interface</title>
<updated>2025-09-11T22:18:21+00:00</updated>
<author>
<name>Leon Romanovsky</name>
<email>leonro@nvidia.com</email>
</author>
<published>2025-09-09T13:27:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=f7326196a781622b33bfbdabb00f5e72b5fb5679'/>
<id>f7326196a781622b33bfbdabb00f5e72b5fb5679</id>
<content type='text'>
Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
that operate directly on physical addresses instead of page+offset
parameters. This provides a more efficient interface for drivers that
already have physical addresses available.

The new functions are implemented as the primary mapping layer, with
the existing dma_map_page_attrs()/dma_map_resource() and
dma_unmap_page_attrs()/dma_unmap_resource() functions converted to simple
wrappers around the phys-based implementations.

In the case of dma_map_page_attrs(), the struct page is converted to a
physical address with the help of page_to_phys(), while dma_map_resource()
passes the physical address through as-is and additionally sets the
DMA_ATTR_MMIO attribute.
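
As a rough sketch of the resulting wrapper shape (the dma_map_phys()
signature below is assumed, not the exact upstream code):

    dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
                    size_t offset, size_t size, enum dma_data_direction dir,
                    unsigned long attrs)
    {
            /* page-based callers reduce to a phys_addr_t computation */
            return dma_map_phys(dev, page_to_phys(page) + offset, size,
                            dir, attrs);
    }

    dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
                    size_t size, enum dma_data_direction dir,
                    unsigned long attrs)
    {
            /* MMIO resources pass the address through, tagged as MMIO */
            return dma_map_phys(dev, phys_addr, size, dir,
                            attrs | DMA_ATTR_MMIO);
    }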

The old page-based API is preserved in mapping.c to ensure that existing
code won't be affected by the switch to the EXPORT_SYMBOL_GPL variant for
dma_*map_phys().

Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/54cc52af91777906bbe4a386113437ba0bcfba9c.1757423202.git.leonro@nvidia.com
</content>
</entry>
<entry>
<title>dma-mapping: convert dma_direct_*map_page to be phys_addr_t based</title>
<updated>2025-09-11T22:18:20+00:00</updated>
<author>
<name>Leon Romanovsky</name>
<email>leonro@nvidia.com</email>
</author>
<published>2025-09-09T13:27:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e53d29f957b36ba1666331956c6ccb047bb157d2'/>
<id>e53d29f957b36ba1666331956c6ccb047bb157d2</id>
<content type='text'>
Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.

The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).
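
Sketch of the prototype change (illustrative; the trailing parameters may
differ slightly from the actual declarations):

    /* before */
    dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
                    unsigned long offset, size_t size,
                    enum dma_data_direction dir, unsigned long attrs);

    /* after: callers pass the physical address directly */
    dma_addr_t dma_direct_map_phys(struct device *dev, phys_addr_t phys,
                    size_t size, enum dma_data_direction dir,
                    unsigned long attrs);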

Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().

The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks
to allow integration with dma_direct_map_resource(), and
dma_direct_map_phys() is extended to support the MMIO path as well.

Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/bb15a22f76dc2e26683333ff54e789606cfbfcf0.1757423202.git.leonro@nvidia.com
</content>
</entry>
<entry>
<title>dma-direct: clean up the logic in __dma_direct_alloc_pages()</title>
<updated>2025-08-11T09:28:04+00:00</updated>
<author>
<name>Petr Tesarik</name>
<email>ptesarik@suse.com</email>
</author>
<published>2025-07-10T08:38:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=9f683dfe8099639f9ac859287744a9ed1c3698a0'/>
<id>9f683dfe8099639f9ac859287744a9ed1c3698a0</id>
<content type='text'>
Convert a goto-based loop to a while() loop. To allow the simplification,
return early when allocation from CMA is successful. As a bonus, this early
return avoids a repeated dma_coherent_ok() check.
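
The shape of the change, as a generic sketch (try_alloc(), usable(),
free_it() and narrower_gfp() are placeholder names, not the kernel
helpers):

    /* before: goto-based retry */
    again:
            page = try_alloc(gfp);
            if (page &amp;&amp; !usable(page)) {
                    free_it(page);
                    page = NULL;
                    if ((gfp = narrower_gfp(gfp)))
                            goto again;
            }

    /* after: the same retry as a while () loop; the CMA success path
     * returns early before this loop is reached */
    while (gfp) {
            page = try_alloc(gfp);
            if (!page || usable(page))
                    break;
            free_it(page);
            page = NULL;
            gfp = narrower_gfp(gfp);
    }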

No functional change.

Signed-off-by: Petr Tesarik &lt;ptesarik@suse.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20250710083829.1853466-1-ptesarik@suse.com
</content>
</entry>
<entry>
<title>dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h</title>
<updated>2025-05-06T06:36:53+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-05-05T07:01:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ca2c2e4a78c619bcf1c1df29fee5981035f3fcbc'/>
<id>ca2c2e4a78c619bcf1c1df29fee5981035f3fcbc</id>
<content type='text'>
To support the upcoming non-scatterlist mapping helpers, we need to go
back to having them called outside of the DMA API.  Thus move them out of
dma-map-ops.h, which is only for DMA API implementations, and into
pci-p2pdma.h, which is for driver use.

Note that the core helper is still not exported, as the mapping is
expected to be done only by very high-level subsystem code, at least for
now.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Acked-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Tested-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
</content>
</entry>
<entry>
<title>PCI/P2PDMA: Refactor the p2pdma mapping helpers</title>
<updated>2025-05-06T06:36:53+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-05-05T07:01:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=a25e7962db0d79882c9d32fd2a67a02e79129c0e'/>
<id>a25e7962db0d79882c9d32fd2a67a02e79129c0e</id>
<content type='text'>
The current scheme, with a single helper that determines the P2P status
and maps a scatterlist segment, forces users to always use the map_sg
helper to DMA map, which we're trying to get away from because
scatterlists are very cache-inefficient.

Refactor the code so that there is a single helper that checks the P2P
state for a page, including the case that it is not a P2P page at all,
to simplify the callers, and a second one that performs the address
translation for a bus-mapped P2P transfer without depending on the
scatterlist structure.
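
Assuming the two helpers end up as pci_p2pdma_state() and
pci_p2pdma_bus_addr_map() (the names and the snippet below are
illustrative, not taken verbatim from the patch), a caller then looks
roughly like:

    struct pci_p2pdma_map_state p2pdma_state = {};

    switch (pci_p2pdma_state(&amp;p2pdma_state, dev, page)) {
    case PCI_P2PDMA_MAP_NONE:
            break;          /* not a P2P page, use the normal path */
    case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
            break;          /* map like regular host memory */
    case PCI_P2PDMA_MAP_BUS_ADDR:
            /* translate to a PCI bus address, no further mapping */
            dma_addr = pci_p2pdma_bus_addr_map(&amp;p2pdma_state, phys);
            break;
    default:
            return -EREMOTEIO;
    }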

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Acked-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Tested-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
</content>
</entry>
<entry>
<title>dma-mapping: fix missing clear bdr in check_ram_in_range_map()</title>
<updated>2025-03-12T12:41:44+00:00</updated>
<author>
<name>Baochen Qiang</name>
<email>quic_bqiang@quicinc.com</email>
</author>
<published>2025-03-07T03:03:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8324993f60305e50f27b98358b01b9837e10d159'/>
<id>8324993f60305e50f27b98358b01b9837e10d159</id>
<content type='text'>
As discussed in [1], once 'bdr' is set it is never cleared, hence 0 is
always returned.

Refactor the range check hunk into a new helper dma_find_range(),
which allows 'bdr' to be cleared in each iteration.
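
Illustrative shape of the problem and the fix (simplified; helper names
other than dma_find_range() are placeholders):

    /* before: 'bdr' lives outside the loop, so once a covering range is
     * found it is never cleared and the !bdr check can no longer fire
     * for later, uncovered parts of RAM */
    const struct bus_dma_region *bdr = NULL;

    while (start_pfn &lt; end_pfn) {
            for (m = dev-&gt;dma_range_map; m_is_valid(m); m++)
                    if (range_covers(m, start_pfn))
                            bdr = m;
            if (!bdr)
                    return 1;       /* some RAM is not covered */
            start_pfn = range_end_pfn(bdr);
    }

    /* after: the lookup moves into dma_find_range(), so 'bdr' is
     * re-derived (and may be NULL) on every iteration */
    while (start_pfn &lt; end_pfn) {
            bdr = dma_find_range(dev, start_pfn);
            if (!bdr)
                    return 1;
            start_pfn = range_end_pfn(bdr);
    }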

Link: https://lore.kernel.org/all/64931fac-085b-4ff3-9314-84bac2fa9bdb@quicinc.com/ # [1]
Fixes: a409d9600959 ("dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM")
Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Baochen Qiang &lt;quic_bqiang@quicinc.com&gt;
Link: https://lore.kernel.org/r/20250307030350.69144-1-quic_bqiang@quicinc.com
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
</content>
</entry>
<entry>
<title>dma-direct: optimize page freeing when it is not addressable</title>
<updated>2024-09-04T04:08:51+00:00</updated>
<author>
<name>Chen Yu</name>
<email>yu.c.chen@intel.com</email>
</author>
<published>2024-08-31T11:01:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=f689a3ab7b8ece9e5787ff058b96b8630e4931ad'/>
<id>f689a3ab7b8ece9e5787ff058b96b8630e4931ad</id>
<content type='text'>
When the CMA allocation succeeds but isn't addressable, its buffer has
already been released and the page is set to NULL.  So later when the
normal page allocation succeeds but isn't addressable, __free_pages()
can be used to free that normal page rather than dma_free_contiguous(),
which does extra checks that are not needed.
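
Roughly (simplified sketch of the relevant branch, not the exact hunk):

    /* A non-addressable CMA buffer was already freed above and 'page'
     * reset to NULL, so a non-NULL page here came from the plain page
     * allocator. */
    if (page &amp;&amp; !dma_coherent_ok(dev, page_to_phys(page), size)) {
            __free_pages(page, get_order(size));
            page = NULL;
    }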

Signed-off-by: Chen Yu &lt;yu.c.chen@intel.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>dma-mapping: replace zone_dma_bits by zone_dma_limit</title>
<updated>2024-08-22T04:18:00+00:00</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2024-08-11T07:09:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ba0fb44aed47693cc2482427f63ba6cd19051327'/>
<id>ba0fb44aed47693cc2482427f63ba6cd19051327</id>
<content type='text'>
The hardware DMA limit might not be a power of 2. When the RAM range
starts above 0, say at 4GB, a DMA limit of 30 bits should end at 5GB.  A
single high bit cannot encode this limit.
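
Worked example (values illustrative):

    /*
     * 30-bit device limit with RAM starting at 4GB:
     *   zone_dma_bits = 30            limit = 1 &lt;&lt; 30 = 1GB, below the
     *                                 start of RAM, cannot express 5GB
     *   zone_dma_limit = 0x140000000  i.e. 5GB, held as a plain address
     */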

Use a plain address for the DMA zone limit instead.

Since the DMA zone can now potentially span beyond the 4GB physical limit
of DMA32, make sure to use the DMA zone for GFP_DMA32 allocations in that
case.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Co-developed-by: Baruch Siach &lt;baruch@tkos.co.il&gt;
Signed-off-by: Baruch Siach &lt;baruch@tkos.co.il&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Petr Tesarik &lt;ptesarik@suse.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>swiotlb: reduce swiotlb pool lookups</title>
<updated>2024-07-10T05:59:03+00:00</updated>
<author>
<name>Michael Kelley</name>
<email>mhklinux@outlook.com</email>
</author>
<published>2024-07-08T19:41:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7296f2301a057493e97b07739213c6e864f76891'/>
<id>7296f2301a057493e97b07739213c6e864f76891</id>
<content type='text'>
With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
places, the pool is found and used in one function, and then must
be found again in the next function that is called because only the
tlb_addr is passed as an argument. These are the six call sites:

dma_direct_map_page:
 1. swiotlb_map -&gt; swiotlb_tbl_map_single -&gt; swiotlb_bounce

dma_direct_unmap_page:
 2. dma_direct_sync_single_for_cpu -&gt; is_swiotlb_buffer
 3. dma_direct_sync_single_for_cpu -&gt; swiotlb_sync_single_for_cpu -&gt;
	swiotlb_bounce
 4. is_swiotlb_buffer
 5. swiotlb_tbl_unmap_single -&gt; swiotlb_del_transient
 6. swiotlb_tbl_unmap_single -&gt; swiotlb_release_slots

Reduce the number of calls by finding the pool at a higher level, and
passing it as an argument instead of searching again. A key change is
for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
and then pass this pool pointer to subsequent swiotlb functions.

There are 9 occurrences of is_swiotlb_buffer() used to test if a buffer
is a swiotlb buffer before calling a swiotlb function. To reduce code
duplication in getting the pool pointer and passing it as an argument,
introduce inline wrappers for this pattern. The generated code is
essentially unchanged.
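
The wrapper pattern looks roughly like this (the wrapper name and exact
signatures below are illustrative, not the upstream helpers):

    static inline void swiotlb_sync_single_for_cpu(struct device *dev,
                    phys_addr_t paddr, size_t size,
                    enum dma_data_direction dir)
    {
            /* one lookup here; the pool is passed down instead of being
             * searched for again inside the callee */
            struct io_tlb_pool *pool = swiotlb_find_pool(dev, paddr);

            if (pool)
                    __swiotlb_sync_single_for_cpu(dev, paddr, size, dir,
                                    pool);
    }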

Since is_swiotlb_buffer() no longer returns a boolean, rename some
functions to reflect the change:

 * swiotlb_find_pool() becomes __swiotlb_find_pool()
 * is_swiotlb_buffer() becomes swiotlb_find_pool()
 * is_xen_swiotlb_buffer() becomes xen_swiotlb_find_pool()

With these changes, a round-trip map/unmap pair requires only 2 pool
lookups (listed using the new names and wrappers):

dma_direct_unmap_page:
 1. dma_direct_sync_single_for_cpu -&gt; swiotlb_find_pool
 2. swiotlb_tbl_unmap_single -&gt; swiotlb_find_pool

These changes come from noticing the inefficiencies in a code review,
not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
__swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
so avoiding the redundant calls helps performance in a hot path.
When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
is minimal and the perf benefits are likely negligible, but no
harm is done.

No functional change is intended.

Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Reviewed-by: Petr Tesarik &lt;petr@tesarici.cz&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
</feed>
