<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/xdp/xsk_queue.h, branch v6.19.9</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>xsk: Harden userspace-supplied xdp_desc validation</title>
<updated>2025-10-10T17:07:48+00:00</updated>
<author>
<name>Alexander Lobakin</name>
<email>aleksander.lobakin@intel.com</email>
</author>
<published>2025-10-08T16:56:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=07ca98f906a403637fc5e513a872a50ef1247f3b'/>
<id>07ca98f906a403637fc5e513a872a50ef1247f3b</id>
<content type='text'>
It turned out that certain clearly invalid values passed in xdp_desc from
userspace can pass xp_{,un}aligned_validate_desc() and then lead
to UB or to invalid frames being queued for xmit.

A desc-&gt;len close to ``U32_MAX`` combined with a non-zero
pool-&gt;tx_metadata_len can cause a positive integer overflow and
wraparound; in the same way, a low enough desc-&gt;addr with a non-zero
pool-&gt;tx_metadata_len can cause a negative integer overflow. Both
scenarios can then pass the validation successfully.
This doesn't happen with valid XSk applications, but can be used
to perform attacks.

Always promote desc-&gt;len to ``u64`` first to rule out positive
overflows of it. Use the explicit check_{add,sub}_overflow() helpers when
validating desc-&gt;addr (which is ``u64`` already).
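
As an illustration, here is a small, hypothetical userspace sketch of
the two guards; __builtin_{add,sub}_overflow() is what the kernel's
check_{add,sub}_overflow() boils down to, and validate_desc() is a
heavily simplified stand-in for the real checks, not the actual kernel
code:

#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

static bool validate_desc(uint64_t addr, uint32_t len32,
			  uint32_t tx_metadata_len, uint64_t chunk_size)
{
	uint64_t len = len32;	/* promoted first: len + meta cannot wrap */
	uint64_t meta_start;

	/* the metadata sits in front of addr, so this must not underflow */
	if (__builtin_sub_overflow(addr % chunk_size,
				   (uint64_t)tx_metadata_len, &amp;meta_start))
		return false;

	return meta_start + len + tx_metadata_len &lt;= chunk_size;
}

int main(void)
{
	/* a len close to U32_MAX used to wrap once the metadata was added */
	printf("%d\n", validate_desc(256, UINT32_MAX - 8, 16, 2048)); /* 0 */
	printf("%d\n", validate_desc(256, 1400, 16, 2048));           /* 1 */
}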

bloat-o-meter reports a small growth in code size:

add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
Function                                     old     new   delta
xskq_cons_peek_desc                          299     330     +31
xsk_tx_peek_release_desc_batch               973    1002     +29
xsk_generic_xmit                            3148    3132     -16

but hopefully this doesn't hurt performance much.

Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
Cc: stable@vger.kernel.org # 6.8+
Signed-off-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Reviewed-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Link: https://lore.kernel.org/r/20251008165659.4141318-1-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: Fix immature cq descriptor production</title>
<updated>2025-09-09T22:09:55+00:00</updated>
<author>
<name>Maciej Fijalkowski</name>
<email>maciej.fijalkowski@intel.com</email>
</author>
<published>2025-09-04T19:49:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=30f241fcf52aaaef7ac16e66530faa11be78a865'/>
<id>30f241fcf52aaaef7ac16e66530faa11be78a865</id>
<content type='text'>
Eryk reported an issue, referenced by the Closes: tag below, related to
umem addrs being prematurely produced onto the pool's completion queue.
Let us make the skb's destructor responsible for producing all addrs
that a given skb used.

The commit from the Fixes: tag introduced the buggy behavior; it was not
broken from day 1, but rather from when xsk multi-buffer got introduced.

In order to mitigate the performance impact as much as possible, mimic
the linear and frag parts of an skb by storing the first address from the
XSK descriptor at sk_buff::destructor_arg. For fragments, store them at
::cb via a list. The nodes that go onto the list are allocated via
kmem_cache. xsk_destruct_skb() will consume the address stored at
::destructor_arg and optionally walk the list from ::cb, if the count of
descriptors associated with this particular skb is greater than 1.
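
Below is a compact, hypothetical userspace sketch of that layout;
toy_skb, toy_destruct() and addr_node are illustrative stand-ins for
sk_buff (::destructor_arg, ::cb), xsk_destruct_skb() and the
kmem_cache-backed list nodes:

#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

struct addr_node {			/* kmem_cache-backed in the kernel */
	struct addr_node *next;
	uint64_t addr;
};

struct toy_skb {			/* stand-in for sk_buff */
	uint64_t destructor_arg;	/* first descriptor's umem addr */
	struct addr_node *cb;		/* frag addrs stored here, via list */
};

/* role of xsk_destruct_skb(): only now produce all addrs onto the CQ */
static void toy_destruct(struct toy_skb *skb)
{
	printf("produce %#llx\n", (unsigned long long)skb-&gt;destructor_arg);
	while (skb-&gt;cb) {
		struct addr_node *n = skb-&gt;cb;

		printf("produce %#llx\n", (unsigned long long)n-&gt;addr);
		skb-&gt;cb = n-&gt;next;
		free(n);
	}
}

int main(void)
{
	struct toy_skb skb = { .destructor_arg = 0x1000, .cb = NULL };
	struct addr_node *frag = malloc(sizeof(*frag));

	frag-&gt;addr = 0x2000;	/* second desc of a multi-buffer frame */
	frag-&gt;next = NULL;
	skb.cb = frag;
	toy_destruct(&amp;skb);	/* both addrs are completed together */
}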

The previous approach, where a whole array for storing the UMEM addresses
from the XSK descriptors was pre-allocated during first-fragment
processing, yielded too big a performance regression for 64b traffic. With
the current approach the impact in my tests is much reduced, and for jumbo
frames I observed traffic being slower by at most 9%.

Magnus suggested special-casing this way of processing for
XDP_SHARED_UMEM, so that we would identify it during bind and set
different hooks for the 'backpressure mechanism' on the CQ and for the skb
destructor, but given that the results looked promising on my side, I
decided to keep a single data path for XSK generic Tx. I suppose other
auxiliary stuff would have to land as well in order to make that work.

Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
Reported-by: Eryk Kubanski &lt;e.kubanski@partner.samsung.com&gt;
Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.samsung.com/
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Tested-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Link: https://lore.kernel.org/r/20250904194907.2342177-1-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: Carry a copy of xdp_zc_max_segs within xsk_buff_pool</title>
<updated>2024-10-14T15:23:30+00:00</updated>
<author>
<name>Maciej Fijalkowski</name>
<email>maciej.fijalkowski@intel.com</email>
</author>
<published>2024-10-07T12:24:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=6e126872191df946a6fe01b79273119d32d96711'/>
<id>6e126872191df946a6fe01b79273119d32d96711</id>
<content type='text'>
This is so that we avoid dereferencing struct net_device in the hot path.

Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Link: https://lore.kernel.org/bpf/20241007122458.282590-5-maciej.fijalkowski@intel.com
</content>
</entry>
<entry>
<title>xsk: Bump xsk_queue::queue_empty_descs in xp_can_alloc()</title>
<updated>2024-09-05T13:56:49+00:00</updated>
<author>
<name>Maciej Fijalkowski</name>
<email>maciej.fijalkowski@intel.com</email>
</author>
<published>2024-09-04T16:28:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=6b083650a37318112fb60c65fbb6070584f53d93'/>
<id>6b083650a37318112fb60c65fbb6070584f53d93</id>
<content type='text'>
We have the STAT_FILL_EMPTY test case in xskxceiver that tries to process
traffic with the fill queue being empty, which currently fails for the
zero-copy ice driver after it started to use the xsk_buff_can_alloc() API.
That is because xsk_queue::queue_empty_descs is currently only increased
from the alloc APIs, and right now, if the driver sees that xsk_buff_pool
will be unable to provide the requested count of buffers, it bails out
early, skipping the calls to xsk_buff_alloc{,_batch}().

The mentioned statistic should have been handled in xsk_buff_can_alloc()
from the very beginning, so let's add this logic now. Do it by
open-coding xskq_cons_has_entries() and bumping queue_empty_descs in the
middle, when the fill queue currently has no entries.
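
A runnable toy model of the resulting flow, with hypothetical names
(the real ring logic lives in xsk_queue.h and uses cached producer and
consumer indices in the same way):

#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

struct toy_fq {
	uint32_t cached_prod, cached_cons;	/* consumer's view */
	uint32_t shared_prod;			/* producer's published index */
	uint64_t queue_empty_descs;
};

static bool toy_can_alloc(struct toy_fq *q, uint32_t cnt)
{
	uint32_t entries = q-&gt;cached_prod - q-&gt;cached_cons;

	if (entries &gt;= cnt)
		return true;

	/* open-coded "has entries": refresh the cached producer index */
	q-&gt;cached_prod = q-&gt;shared_prod;
	entries = q-&gt;cached_prod - q-&gt;cached_cons;
	if (!entries)
		q-&gt;queue_empty_descs++;	/* bump the stat in the middle */

	return entries &gt;= cnt;
}

int main(void)
{
	struct toy_fq q = { 0 };

	toy_can_alloc(&amp;q, 32);	/* fill queue is empty: stat goes up */
	printf("queue_empty_descs = %llu\n",
	       (unsigned long long)q.queue_empty_descs);
}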

Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Link: https://lore.kernel.org/bpf/20240904162808.249160-1-maciej.fijalkowski@intel.com
</content>
</entry>
<entry>
<title>xsk: Add TX timestamp and TX checksum offload support</title>
<updated>2023-11-29T22:59:40+00:00</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2023-11-27T19:03:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=48eb03dd26304c24f03bdbb9382e89c8564e71df'/>
<id>48eb03dd26304c24f03bdbb9382e89c8564e71df</id>
<content type='text'>
This change actually defines the (initial) metadata layout to be
used by AF_XDP userspace (xsk_tx_metadata). The first field is flags,
which requests the appropriate offloads, followed by the
offload-specific fields. The supported per-device offloads are
exported via netlink (the new xsk-flags).

The offloads themselves are still implemented in a bit of a
framework-y fashion that's left over from my initial kfunc attempt.
I'm introducing new xsk_tx_metadata_ops which drivers are
supposed to implement. The drivers are also supposed
to call xsk_tx_metadata_request/xsk_tx_metadata_complete in
the right places. Since xsk_tx_metadata_{request,complete}
are static inline, we don't incur any extra overhead doing
indirect calls.

The benefit of this scheme is as follows:
- keeps all metadata layout parsing away from driver code
- makes it easy to grep and see which drivers implement what
- no extra flags need to be maintained to keep track of which
  offloads are implemented; if the callback is implemented, the offload
  is supported (used by the netlink reporting code)

Two offloads are defined right now:
1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset
2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata
   area upon completion (tx_timestamp field)
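
A userspace sketch of how a producer might fill the headroom before a
Tx descriptor under this layout; the struct here is illustrative only
(the authoritative definition is struct xsk_tx_metadata in the UAPI),
and the flag values are local stand-ins:

#include &lt;stdint.h&gt;

#define TOY_TXMD_FLAGS_TIMESTAMP (1 &lt;&lt; 0)	/* stand-in flag values */
#define TOY_TXMD_FLAGS_CHECKSUM  (1 &lt;&lt; 1)

struct toy_tx_metadata {
	uint64_t flags;		/* requested offloads, first field */
	uint16_t csum_start;	/* XDP_TXMD_FLAGS_CHECKSUM inputs */
	uint16_t csum_offset;
	uint64_t tx_timestamp;	/* written back upon completion */
};

/* the metadata sits in the headroom right before desc-&gt;addr */
static void request_offloads(char *umem, uint64_t desc_addr)
{
	struct toy_tx_metadata *meta =
		(void *)(umem + desc_addr - sizeof(*meta));

	meta-&gt;flags = TOY_TXMD_FLAGS_CHECKSUM | TOY_TXMD_FLAGS_TIMESTAMP;
	meta-&gt;csum_start = 14 + 20;	/* e.g. L4 header after Eth + IPv4 */
	meta-&gt;csum_offset = 16;		/* e.g. TCP checksum field */
}

int main(void)
{
	static char umem[4096];

	request_offloads(umem, 256);	/* desc-&gt;addr == 256 in this toy */
	return 0;
}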

XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes
a SW timestamp from the skb destructor (note that I'm reusing hwtstamps to
pass the metadata pointer).

The struct is forward-compatible and can be extended in the future
by appending more fields.

Reviewed-by: Song Yoong Siang &lt;yoong.siang.song@intel.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: Support tx_metadata_len</title>
<updated>2023-11-29T22:59:40+00:00</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2023-11-27T19:03:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=341ac980eab90ac1f6c22ee9f9da83ed9604d899'/>
<id>341ac980eab90ac1f6c22ee9f9da83ed9604d899</id>
<content type='text'>
For zerocopy mode, tx_desc-&gt;addr can point at an arbitrary offset
and carry some TX metadata in the headroom. For copy mode, there
is currently no way to populate skb metadata.

Introduce a new tx_metadata_len umem config option that indicates how
many bytes to treat as metadata. The metadata bytes come right before the
tx_desc address (same as in the RX case).

The size of the metadata has mostly the same constraints as XDP:
- less than 256 bytes
- 8-byte aligned (compared to the 4-byte alignment on xdp, due to the
  8-byte timestamp in the completion)
- non-zero

This data is not interpreted in any way right now.
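
A trivial sketch of those constraints as a userspace pre-check
(tx_metadata_len_ok() is a hypothetical helper, not the kernel's
actual validation):

#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/* non-zero, 8-byte aligned and less than 256 bytes */
static bool tx_metadata_len_ok(uint32_t len)
{
	return len &amp;&amp; len &lt; 256 &amp;&amp; !(len % 8);
}

int main(void)
{
	return tx_metadata_len_ok(16) &amp;&amp; !tx_metadata_len_ok(12) ? 0 : 1;
}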

Reviewed-by: Song Yoong Siang &lt;yoong.siang.song@intel.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: support ZC Tx multi-buffer in batch API</title>
<updated>2023-07-19T16:56:50+00:00</updated>
<author>
<name>Maciej Fijalkowski</name>
<email>maciej.fijalkowski@intel.com</email>
</author>
<published>2023-07-19T13:24:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d5581966040f313c76b0f54ad76b3fb4095fddcd'/>
<id>d5581966040f313c76b0f54ad76b3fb4095fddcd</id>
<content type='text'>
Modify xskq_cons_read_desc_batch() in such a way that each processed
descriptor is checked for being an EOP one or not, and is acted upon
accordingly.

Change the behavior of the mentioned function to break processing when
stumbling upon an invalid descriptor, instead of skipping it. Furthermore,
let us hand only full packets down to the ZC driver.
With these two assumptions, ZC drivers will not have to take care of an
intermediate state of incomplete frames, which will simplify their
implementations a lot.

Last but not least, stop processing when the count of frags would exceed
the max supported segments on the underlying device.
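
A runnable toy of those batching rules (names are illustrative; only
the count of descs up to the last EOP is handed to the driver, so it
never sees a partial packet):

#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define XDP_PKT_CONTD (1 &lt;&lt; 0)	/* UAPI: more frags follow */

struct xdp_desc { uint64_t addr; uint32_t len; uint32_t options; };

static uint32_t toy_read_batch(const struct xdp_desc *ring, uint32_t n,
			       uint32_t max_segs)
{
	uint32_t done = 0, frags = 0;

	for (uint32_t i = 0; i &lt; n; i++) {
		if (!ring[i].len || ++frags &gt; max_segs)
			break;			/* invalid desc or too long */
		if (!(ring[i].options &amp; XDP_PKT_CONTD)) {
			done = i + 1;		/* a full packet ends here */
			frags = 0;
		}
	}
	return done;	/* descs that are safe to give to the ZC driver */
}

int main(void)
{
	struct xdp_desc ring[] = {
		{ 0x1000, 1500, XDP_PKT_CONTD },
		{ 0x2000, 800, 0 },		/* EOP of packet 1 */
		{ 0x3000, 64, XDP_PKT_CONTD },	/* packet 2 is incomplete */
	};

	printf("%u\n", toy_read_batch(ring, 3, 2));	/* prints 2 */
}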

Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Link: https://lore.kernel.org/r/20230719132421.584801-15-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: discard zero length descriptors in Tx path</title>
<updated>2023-07-19T16:56:49+00:00</updated>
<author>
<name>Tirthendu Sarkar</name>
<email>tirthendu.sarkar@intel.com</email>
</author>
<published>2023-07-19T13:24:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=07428da9e25a5dfae7252cd554c90557f9086a73'/>
<id>07428da9e25a5dfae7252cd554c90557f9086a73</id>
<content type='text'>
Descriptors with zero length are not supported by many NICs. To preserve
uniform behavior, discard any zero-length desc as an invalid desc.

Signed-off-by: Tirthendu Sarkar &lt;tirthendu.sarkar@intel.com&gt;
Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Link: https://lore.kernel.org/r/20230719132421.584801-10-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: add support for AF_XDP multi-buffer on Tx path</title>
<updated>2023-07-19T16:56:49+00:00</updated>
<author>
<name>Tirthendu Sarkar</name>
<email>tirthendu.sarkar@intel.com</email>
</author>
<published>2023-07-19T13:24:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=cf24f5a5feeaae34c1a34d1e04f8ac697290427a'/>
<id>cf24f5a5feeaae34c1a34d1e04f8ac697290427a</id>
<content type='text'>
For transmitting an AF_XDP packet, allocate an skb while processing the
first desc and copy the data to it. The 'XDP_PKT_CONTD' flag in the
'options' field of the desc indicates the EOP status of the packet. If the
current desc is not EOP, store the skb, release the current desc and go
on to read the next descs.

Allocate a page for each subsequent desc, copy the data to it and add it
as a frag to the skb stored in the xsk. On processing the EOP, transmit
the skb with its frags. The addresses contained in the descs have already
been queued in the consumer queue, and the skb destructor updates the
completion count.

On transmit failure, cancel the releases, clear the descs from the
completion queue and consume the skb for retrying the packet transmission.

For any invalid descriptor (invalid length/address/options) in the middle
of a packet, all pending descriptors will be dropped by the xsk core along
with the invalid one, and the next descriptor is treated as the start of
a new packet.

The maximum number of supported frames for a packet is MAX_SKB_FRAGS + 1.
If this is exceeded, all descriptors accumulated so far are dropped.
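
A toy of the mid-packet error rule above: an invalid desc drops
everything accumulated for the current packet, and the next desc
starts a new one (zero length stands in for "invalid" here):

#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define XDP_PKT_CONTD (1 &lt;&lt; 0)

struct xdp_desc { uint64_t addr; uint32_t len; uint32_t options; };

int main(void)
{
	struct xdp_desc ring[] = {
		{ 0x1000, 1400, XDP_PKT_CONTD },	/* frag 1 of packet 1 */
		{ 0x2000, 0, 0 },		/* invalid, mid-packet */
		{ 0x3000, 64, 0 },		/* new single-frag packet */
	};
	uint32_t pending = 0;

	for (int i = 0; i &lt; 3; i++) {
		if (!ring[i].len) {	/* drop pending descs + this one */
			printf("drop %u descs\n", pending + 1);
			pending = 0;
		} else if (!(ring[i].options &amp; XDP_PKT_CONTD)) {
			printf("xmit packet of %u descs\n", pending + 1);
			pending = 0;	/* EOP: the packet is complete */
		} else {
			pending++;	/* store the skb, read next desc */
		}
	}
}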

Signed-off-by: Tirthendu Sarkar &lt;tirthendu.sarkar@intel.com&gt;
Link: https://lore.kernel.org/r/20230719132421.584801-9-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path</title>
<updated>2023-07-19T16:56:49+00:00</updated>
<author>
<name>Tirthendu Sarkar</name>
<email>tirthendu.sarkar@intel.com</email>
</author>
<published>2023-07-19T13:24:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=b7f72a30e9ac2555b05afc6cfddc9dbc98e1eb8d'/>
<id>b7f72a30e9ac2555b05afc6cfddc9dbc98e1eb8d</id>
<content type='text'>
In the Tx path, the xsk core reserves space in the completion queue for
each desc to be transmitted, and the address contained in the desc is
stored in the skb destructor arg. After successful transmission, the skb
destructor submits the addr, marking the completion.

To handle multiple descriptors per packet, now, along with reserving
space for each descriptor, the corresponding address is also stored in the
completion queue. The number of pending descriptors is stored in the skb
destructor arg and is used by the skb destructor to update the completions.

Introduce 'skb' in xdp_sock to store a partially built packet for when
__xsk_generic_xmit() must return before it sees the EOP descriptor for
the current packet, so that packet building can resume in the next call
of __xsk_generic_xmit().

Helper functions are introduced to set and get the pending descriptors
in the skb destructor arg. Also, wrappers are introduced for storing
descriptor addresses, and for submitting and cancelling (for unsuccessful
transmissions) the number of completions.
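
A tiny sketch of that destructor-arg bookkeeping, with a plain
pointer-sized slot standing in for the sk_buff field and hypothetical
helper names:

#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

struct toy_skb { void *destructor_arg; };	/* stand-in for sk_buff */

static void toy_set_pending(struct toy_skb *skb, uint32_t n)
{
	skb-&gt;destructor_arg = (void *)(uintptr_t)n;
}

static uint32_t toy_get_pending(const struct toy_skb *skb)
{
	return (uint32_t)(uintptr_t)skb-&gt;destructor_arg;
}

int main(void)
{
	struct toy_skb skb;

	toy_set_pending(&amp;skb, 3);	/* three descs built this packet */
	printf("submit %u completions\n", toy_get_pending(&amp;skb));
}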

Signed-off-by: Tirthendu Sarkar &lt;tirthendu.sarkar@intel.com&gt;
Link: https://lore.kernel.org/r/20230719132421.584801-7-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
