summaryrefslogtreecommitdiff
path: root/net/vmw_vsock
AgeCommit message (Collapse)AuthorFilesLines
2024-10-18Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds2-10/+12
Pull bpf fixes from Daniel Borkmann: - Fix BPF verifier to not affect subreg_def marks in its range propagation (Eduard Zingerman) - Fix a truncation bug in the BPF verifier's handling of coerce_reg_to_size_sx (Dimitar Kanaliev) - Fix the BPF verifier's delta propagation between linked registers under 32-bit addition (Daniel Borkmann) - Fix a NULL pointer dereference in BPF devmap due to missing rxq information (Florian Kauer) - Fix a memory leak in bpf_core_apply (Jiri Olsa) - Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for arrays of nested structs (Hou Tao) - Fix build ID fetching where memory areas backing the file were created with memfd_secret (Andrii Nakryiko) - Fix BPF task iterator tid filtering which was incorrectly using pid instead of tid (Jordan Rome) - Several fixes for BPF sockmap and BPF sockhash redirection in combination with vsocks (Michal Luczaj) - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri) - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility of an infinite BPF tailcall (Pu Lehui) - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot be resolved (Thomas Weißschuh) - Fix a bug in kfunc BTF caching for modules where the wrong BTF object was returned (Toke Høiland-Jørgensen) - Fix a BPF selftest compilation error in cgroup-related tests with musl libc (Tony Ambardar) - Several fixes to BPF link info dumps to fill missing fields (Tyrone Wu) - Add BPF selftests for kfuncs from multiple modules, checking that the correct kfuncs are called (Simon Sundberg) - Ensure that internal and user-facing bpf_redirect flags don't overlap (Toke Høiland-Jørgensen) - Switch to use kvzmalloc to allocate BPF verifier environment (Rik van Riel) - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat under RT (Wander Lairson Costa) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits) lib/buildid: Handle memfd_secret() files in build_id_parse() selftests/bpf: Add test case for delta propagation bpf: Fix print_reg_state's constant scalar dump bpf: Fix incorrect delta propagation between linked registers bpf: Properly test iter/task tid filtering bpf: Fix iter/task tid filtering riscv, bpf: Make BPF_CMPXCHG fully ordered bpf, vsock: Drop static vsock_bpf_prot initialization vsock: Update msg_count on read_skb() vsock: Update rx_bytes on read_skb() bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock selftests/bpf: Add asserts for netfilter link info bpf: Fix link info netfilter flags to populate defrag flag selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx() selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx() bpf: Fix truncation bug in coerce_reg_to_size_sx() selftests/bpf: Assert link info uprobe_multi count & path_size if unset bpf: Fix unpopulated path_size when uprobe_multi fields unset selftests/bpf: Fix cross-compiling urandom_read selftests/bpf: Add test for kfunc module order ...
2024-10-17bpf, vsock: Drop static vsock_bpf_prot initializationMichal Luczaj1-8/+0
vsock_bpf_prot is set up at runtime. Remove the superfluous init. No functional change intended. Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-4-d6577bbfe742@rbox.co
2024-10-17vsock: Update msg_count on read_skb()Michal Luczaj1-0/+3
Dequeuing via vsock_transport::read_skb() left msg_count outdated, which then confused SOCK_SEQPACKET recv(). Decrease the counter. Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-3-d6577bbfe742@rbox.co
2024-10-17vsock: Update rx_bytes on read_skb()Michal Luczaj1-2/+9
Make sure virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt() calls are balanced (i.e. virtio_vsock_sock::rx_bytes doesn't lie) after vsock_transport::read_skb(). While here, also inform the peer that we've freed up space and it has more credit. Failing to update rx_bytes after packet is dequeued leads to a warning on SOCK_STREAM recv(): [ 233.396654] rx_queue is empty, but rx_bytes is non-zero [ 233.396702] WARNING: CPU: 11 PID: 40601 at net/vmw_vsock/virtio_transport_common.c:589 Fixes: 634f1a7110b4 ("vsock: support sockmap") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-2-d6577bbfe742@rbox.co
2024-10-07Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-4/+4
Pull virtio fixes from Michael Tsirkin: "Several small bugfixes all over the place. Most notably, fixes the vsock allocation with GFP_KERNEL in atomic context, which has been triggering warnings for lots of testers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/scsi: null-ptr-dereference in vhost_scsi_get_req() vsock/virtio: use GFP_ATOMIC under RCU read lock virtio_console: fix misc probe bugs virtio_ring: tag event_triggered as racy for KCSAN vdpa/octeon_ep: Fix format specifier for pointers in debug messages
2024-10-07vsock/virtio: use GFP_ATOMIC under RCU read lockMichael S. Tsirkin1-4/+4
virtio_transport_send_pkt in now called on transport fast path, under RCU read lock. In that case, we have a bug: virtio_add_sgs is called with GFP_KERNEL, and might sleep. Pass the gfp flags as an argument, and use GFP_ATOMIC on the fast path. Link: https://lore.kernel.org/all/hfcr2aget2zojmqpr4uhlzvnep4vgskblx5b6xf2ddosbsrke7@nt34bxgp7j2x Fixes: efcd71af38be ("vsock/virtio: avoid queuing packets when intermediate queue is empty") Reported-by: Christian Brauner <brauner@kernel.org> Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: Luigi Leonardi <luigi.leonardi@outlook.com> Message-ID: <3fbfb6e871f625f89eb578c7228e127437b1975a.1727876449.git.mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2024-09-26Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-50/+94
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-balloon supports new stats - vdpa supports setting mac address - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster - virtio_fs supports new sysfs entries for queue info - virtio/vsock performance has been improved And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vsock/virtio: avoid queuing packets when intermediate queue is empty vsock/virtio: refactor virtio_transport_send_pkt_work fw_cfg: Constify struct kobj_type vdpa/mlx5: Postpone MR deletion vdpa/mlx5: Introduce init/destroy for MR resources vdpa/mlx5: Rename mr_mtx -> lock vdpa/mlx5: Extract mr members in own resource struct vdpa/mlx5: Rename function vdpa/mlx5: Delete direct MKEYs in parallel vdpa/mlx5: Create direct MKEYs in parallel MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section virtio_fs: add sysfs entries for queue information virtio_fs: introduce virtio_fs_put_locked helper vdpa: Remove unused declarations vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command vdpa/mlx5: Small improvement for change_num_qps() vdpa/mlx5: Keep notifiers during suspend but ignore vdpa/mlx5: Parallelize device resume vdpa/mlx5: Parallelize device suspend vdpa/mlx5: Use async API for vq modify commands ...
2024-09-25vsock/virtio: avoid queuing packets when intermediate queue is emptyLuigi Leonardi1-4/+35
When the driver needs to send new packets to the device, it always queues the new sk_buffs into an intermediate queue (send_pkt_queue) and schedules a worker (send_pkt_work) to then queue them into the virtqueue exposed to the device. This increases the chance of batching, but also introduces a lot of latency into the communication. So we can optimize this path by adding a fast path to be taken when there is no element in the intermediate queue, there is space available in the virtqueue, and no other process that is sending packets (tx_lock held). The following benchmarks were run to check improvements in latency and throughput. The test bed is a host with Intel i7-10700KF CPU @ 3.80GHz and L1 guest running on QEMU/KVM with vhost process and all vCPUs pinned individually to pCPUs. - Latency Tool: Fio version 3.37-56 Mode: pingpong (h-g-h) Test runs: 50 Runtime-per-test: 50s Type: SOCK_STREAM In the following fio benchmark (pingpong mode) the host sends a payload to the guest and waits for the same payload back. fio process pinned both inside the host and the guest system. Before: Linux 6.9.8 Payload 64B: 1st perc. overall 99th perc. Before 12.91 16.78 42.24 us After 9.77 13.57 39.17 us Payload 512B: 1st perc. overall 99th perc. Before 13.35 17.35 41.52 us After 10.25 14.11 39.58 us Payload 4K: 1st perc. overall 99th perc. Before 14.71 19.87 41.52 us After 10.51 14.96 40.81 us - Throughput Tool: iperf-vsock The size represents the buffer length (-l) to read/write P represents the number of parallel streams P=1 4K 64K 128K Before 6.87 29.3 29.5 Gb/s After 10.5 39.4 39.9 Gb/s P=2 4K 64K 128K Before 10.5 32.8 33.2 Gb/s After 17.8 47.7 48.5 Gb/s P=4 4K 64K 128K Before 12.7 33.6 34.2 Gb/s After 16.9 48.1 50.5 Gb/s The performance improvement is related to this optimization, I used a ebpf kretprobe on virtio_transport_send_skb to check that each packet was sent directly to the virtqueue Co-developed-by: Marco Pinna <marco.pinn95@gmail.com> Signed-off-by: Marco Pinna <marco.pinn95@gmail.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Message-Id: <20240730-pinna-v4-2-5c9179164db5@outlook.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2024-09-25vsock/virtio: refactor virtio_transport_send_pkt_workMarco Pinna1-46/+59
Preliminary patch to introduce an optimization to the enqueue system. All the code used to enqueue a packet into the virtqueue is removed from virtio_transport_send_pkt_work() and moved to the new virtio_transport_send_skb() function. Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Signed-off-by: Marco Pinna <marco.pinn95@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20240730-pinna-v4-1-5c9179164db5@outlook.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-08-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-23/+31
Cross-merge networking fixes after downstream PR. Conflicts: Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml c25504a0ba36 ("dt-bindings: net: fsl,qoriq-mc-dpmac: add missed property phys") be034ee6c33d ("dt-bindings: net: fsl,qoriq-mc-dpmac: using unevaluatedProperties") https://lore.kernel.org/20240815110934.56ae623a@canb.auug.org.au drivers/net/dsa/vitesse-vsc73xx-core.c 5b9eebc2c7a5 ("net: dsa: vsc73xx: pass value in phy_write operation") fa63c6434b6f ("net: dsa: vsc73xx: check busy flag in MDIO operations") 2524d6c28bdc ("net: dsa: vsc73xx: use defined values in phy operations") https://lore.kernel.org/20240813104039.429b9fe6@canb.auug.org.au Resolve by using FIELD_PREP(), Stephen's resolution is simpler. Adjacent changes: net/vmw_vsock/af_vsock.c 69139d2919dd ("vsock: fix recursive ->recvmsg calls") 744500d81f81 ("vsock: add support for SIOCOUTQ ioctl") Link: https://patch.msgid.link/20240815141149.33862-1-pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-15vsock: fix recursive ->recvmsg callsCong Wang2-23/+31
After a vsock socket has been added to a BPF sockmap, its prot->recvmsg has been replaced with vsock_bpf_recvmsg(). Thus the following recursiion could happen: vsock_bpf_recvmsg() -> __vsock_recvmsg() -> vsock_connectible_recvmsg() -> prot->recvmsg() -> vsock_bpf_recvmsg() again We need to fix it by calling the original ->recvmsg() without any BPF sockmap logic in __vsock_recvmsg(). Fixes: 634f1a7110b4 ("vsock: support sockmap") Reported-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com Tested-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com Cc: Bobby Eshleman <bobby.eshleman@bytedance.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20240812022153.86512-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-02vsock/virtio: add SIOCOUTQ support for all virtio based transportsLuigi Leonardi3-1/+44
Introduce support for virtio_transport_unsent_bytes ioctl for virtio_transport, vhost_vsock and vsock_loopback. For all transports the unsent bytes counter is incremented in virtio_transport_get_credit. In virtio_transport (G2H) and in vhost-vsock (H2G) the counter is decremented when the skbuff is consumed. In vsock_loopback the same skbuff is passed from the transmitter to the receiver, so the counter is decremented before queuing the skbuff to the receiver. Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02vsock: add support for SIOCOUTQ ioctlLuigi Leonardi1-3/+55
Add support for ioctl(s) in AF_VSOCK. The only ioctl available is SIOCOUTQ/TIOCOUTQ, which returns the number of unsent bytes in the socket. This information is transport-specific and is delegated to them using a callback. Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-17virtio: rename virtio_find_vqs_info() to virtio_find_vqs()Jiri Pirko1-2/+1
Since the original virtio_find_vqs() is no longer present, rename virtio_find_vqs_info() back to virtio_find_vqs(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-20-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()Jiri Pirko1-11/+6
Instead of passing separate names and callbacks arrays to virtio_find_vqs(), have one of virtual_queue_info structs and pass it to virtio_find_vqs_info(). Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-18-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-23Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-1/+0
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-net is finally supported in vduse - virtio (balloon and mem) interaction with suspend is improved - vhost-scsi now handles signals better/faster And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) virtio-pci: Check if is_avq is NULL virtio: delete vq in vp_find_vqs_msix() when request_irq() fails MAINTAINERS: add Eugenio Pérez as reviewer vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API vp_vdpa: don't allocate unused msix vectors sound: virtio: drop owner assignment fuse: virtio: drop owner assignment scsi: virtio: drop owner assignment rpmsg: virtio: drop owner assignment nvdimm: virtio_pmem: drop owner assignment wifi: mac80211_hwsim: drop owner assignment vsock/virtio: drop owner assignment net: 9p: virtio: drop owner assignment net: virtio: drop owner assignment net: caif: virtio: drop owner assignment misc: nsm: drop owner assignment iommu: virtio: drop owner assignment drm/virtio: drop owner assignment gpio: virtio: drop owner assignment firmware: arm_scmi: virtio: drop owner assignment ...
2024-05-22vsock/virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-19-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-05-13net: change proto and proto_ops accept typeJens Axboe1-3/+3
Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then holds all of these arguments, and prepares accept for being able to pass back more information. No functional changes in this patch. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-02vsock/virtio: fix packet delivery to tap deviceMarco Pinna1-1/+2
Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added virtio_transport_deliver_tap_pkt() for handing packets to the vsockmon device. However, in virtio_transport_send_pkt_work(), the function is called before actually sending the packet (i.e. before placing it in the virtqueue with virtqueue_add_sgs() and checking whether it returned successfully). Queuing the packet in the virtqueue can fail even multiple times. However, in virtio_transport_deliver_tap_pkt() we deliver the packet to the monitoring tap interface only the first time we call it. This certainly avoids seeing the same packet replicated multiple times in the monitoring interface, but it can show the packet sent with the wrong timestamp or even before we succeed to queue it in the virtqueue. Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs() and making sure it returned successfully. Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") Cc: stable@vge.kernel.org Signed-off-by: Marco Pinna <marco.pinn95@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-23sock_diag: add module pointer to "struct sock_diag_handler"Eric Dumazet1-0/+1
Following patch is going to use RCU instead of sock_diag_table_mutex acquisition. This patch is a preparation, no change of behavior yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-03vsock/virtio: use skb_frag_*() helpersMina Almasry1-3/+3
Minor fix for virtio: code wanting to access the fields inside an skb frag should use the skb_frag_*() helpers, instead of accessing the fields directly. This allows for extensions where the underlying memory is not a page. Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20240102205905.793738-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-15virtio/vsock: send credit update during setting SO_RCVLOWATArseniy Krasnov5-4/+41
Send credit update message when SO_RCVLOWAT is updated and it is bigger than number of bytes in rx queue. It is needed, because 'poll()' will wait until number of bytes in rx queue will be not smaller than O_RCVLOWAT, so kick sender to send more data. Otherwise mutual hungup for tx/rx is possible: sender waits for free space and receiver is waiting data in 'poll()'. Rename 'set_rcvlowat' callback to 'notify_set_rcvlowat' and set 'sk->sk_rcvlowat' only in one place (i.e. 'vsock_set_rcvlowat'), so the transport doesn't need to do it. Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-15virtio/vsock: fix logic which reduces credit update messagesArseniy Krasnov1-3/+10
Add one more condition for sending credit update during dequeue from stream socket: when number of bytes in the rx queue is smaller than SO_RCVLOWAT value of the socket. This is actual for non-default value of SO_RCVLOWAT (e.g. not 1) - idea is to "kick" peer to continue data transmission, because we need at least SO_RCVLOWAT bytes in our rx queue to wake up user for reading data (in corner case it is also possible to stuck both tx and rx sides, this is why 'Fixes' is used). Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()Nikolay Kuratov1-1/+1
We need to do signed arithmetic if we expect condition `if (bytes < 0)` to be possible Found by Linux Verification Center (linuxtesting.org) with SVACE Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20231211162317.4116625-1-kniv@yandex-team.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warningStefano Garzarella1-2/+1
After backporting commit 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") in CentOS Stream 9, CI reported the following error: In file included from ./include/linux/kernel.h:17, from ./include/linux/list.h:9, from ./include/linux/preempt.h:11, from ./include/linux/spinlock.h:56, from net/vmw_vsock/virtio_transport_common.c:9: net/vmw_vsock/virtio_transport_common.c: In function ‘virtio_transport_can_zcopy‘: ./include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror] 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ ./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck‘ 26 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~ ./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp‘ 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ ./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp‘ 45 | #define min(x, y) __careful_cmp(x, y, <) | ^~~~~~~~~~~~~ net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of macro ‘min‘ 63 | int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); We could solve it by using min_t(), but this operation seems entirely unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(), which performs almost the same check, returning at most MAX_SKB_FRAGS elements. So, let's eliminate this unnecessary comparison. Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Cc: avkrasnov@salutedevices.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Link: https://lore.kernel.org/r/20231206164143.281107-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-19net: fill in MODULE_DESCRIPTION()s for SOCK_DIAG modulesJakub Kicinski1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to all the sock diag modules in one fell swoop. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-07virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()Shigeru Yoshida1-0/+2
KMSAN reported the following uninit-value access issue: ===================================================== BUG: KMSAN: uninit-value in virtio_transport_recv_pkt+0x1dfb/0x26a0 net/vmw_vsock/virtio_transport_common.c:1421 virtio_transport_recv_pkt+0x1dfb/0x26a0 net/vmw_vsock/virtio_transport_common.c:1421 vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703 worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784 kthread+0x3cc/0x520 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 Uninit was stored to memory at: virtio_transport_space_update net/vmw_vsock/virtio_transport_common.c:1274 [inline] virtio_transport_recv_pkt+0x1ee8/0x26a0 net/vmw_vsock/virtio_transport_common.c:1415 vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703 worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784 kthread+0x3cc/0x520 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 Uninit was created at: slab_post_alloc_hook+0x105/0xad0 mm/slab.h:767 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x5a2/0xaf0 mm/slub.c:3523 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:559 __alloc_skb+0x2fd/0x770 net/core/skbuff.c:650 alloc_skb include/linux/skbuff.h:1286 [inline] virtio_vsock_alloc_skb include/linux/virtio_vsock.h:66 [inline] virtio_transport_alloc_skb+0x90/0x11e0 net/vmw_vsock/virtio_transport_common.c:58 virtio_transport_reset_no_sock net/vmw_vsock/virtio_transport_common.c:957 [inline] virtio_transport_recv_pkt+0x1279/0x26a0 net/vmw_vsock/virtio_transport_common.c:1387 vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703 worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784 kthread+0x3cc/0x520 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 CPU: 1 PID: 10664 Comm: kworker/1:5 Not tainted 6.6.0-rc3-00146-g9f3ebbef746f #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 Workqueue: vsock-loopback vsock_loopback_work ===================================================== The following simple reproducer can cause the issue described above: int main(void) { int sock; struct sockaddr_vm addr = { .svm_family = AF_VSOCK, .svm_cid = VMADDR_CID_ANY, .svm_port = 1234, }; sock = socket(AF_VSOCK, SOCK_STREAM, 0); connect(sock, (struct sockaddr *)&addr, sizeof(addr)); return 0; } This issue occurs because the `buf_alloc` and `fwd_cnt` fields of the `struct virtio_vsock_hdr` are not initialized when a new skb is allocated in `virtio_transport_init_hdr()`. This patch resolves the issue by initializing these fields during allocation. Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Reported-and-tested-by: syzbot+0c8ce1da0ac31abbadcd@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0c8ce1da0ac31abbadcd Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20231104150531.257952-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-07vsock/virtio: remove socket from connected/bound list on shutdownFilippo Storniolo1-5/+11
If the same remote peer, using the same port, tries to connect to a server on a listening port more than once, the server will reject the connection, causing a "connection reset by peer" error on the remote peer. This is due to the presence of a dangling socket from a previous connection in both the connected and bound socket lists. The inconsistency of the above lists only occurs when the remote peer disconnects and the server remains active. This bug does not occur when the server socket is closed: virtio_transport_release() will eventually schedule a call to virtio_transport_do_close() and the latter will remove the socket from the bound and connected socket lists and clear the sk_buff. However, virtio_transport_do_close() will only perform the above actions if it has been scheduled, and this will not happen if the server is processing the shutdown message from a remote peer. To fix this, introduce a call to vsock_remove_sock() when the server is handling a client disconnect. This is to remove the socket from the bound and connected socket lists without clearing the sk_buff. Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Reported-by: Daan De Meyer <daan.j.demeyer@gmail.com> Tested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Signed-off-by: Filippo Storniolo <f.storniolo95@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+17
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25vsock/virtio: initialize the_virtio_vsock before using VQsAlexandru Matei1-1/+17
Once VQs are filled with empty buffers and we kick the host, it can send connection requests. If the_virtio_vsock is not initialized before, replies are silently dropped and do not reach the host. virtio_transport_send_pkt() can queue packets once the_virtio_vsock is set, but they won't be processed until vsock->tx_run is set to true. We queue vsock->send_pkt_work when initialization finishes to send those packets queued earlier. Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock") Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20231024191742.14259-1-alexandru.matei@uipath.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-15vsock: enable setting SO_ZEROCOPYArseniy Krasnov1-2/+43
For AF_VSOCK, zerocopy tx mode depends on transport, so this option must be set in AF_VSOCK implementation where transport is accessible (if transport is not set during setting SO_ZEROCOPY: for example socket is not connected, then SO_ZEROCOPY will be enabled, but once transport will be assigned, support of this type of transmission will be checked). To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT bit, thus handling SOL_SOCKET option operations, but all of them except SO_ZEROCOPY will be forwarded to the generic handler by calling 'sock_setsockopt()'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock/loopback: support MSG_ZEROCOPY for transportArseniy Krasnov1-0/+6
Add 'msgzerocopy_allow()' callback for loopback transport. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock/virtio: support MSG_ZEROCOPY for transportArseniy Krasnov1-0/+7
Add 'msgzerocopy_allow()' callback for virtio transport. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: enable SOCK_SUPPORT_ZC bitArseniy Krasnov1-0/+6
This bit is used by io_uring in case of zerocopy tx mode. io_uring code checks, that socket has this feature. This patch sets it in two places: 1) For socket in 'connect()' call. 2) For new socket which is returned by 'accept()' call. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: check for MSG_ZEROCOPY support on sendArseniy Krasnov1-0/+6
This feature totally depends on transport, so if transport doesn't support it, return error. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: read from socket's error queueArseniy Krasnov1-0/+6
This adds handling of MSG_ERRQUEUE input flag in receive call. This flag is used to read socket's error queue instead of data queue. Possible scenario of error queue usage is receiving completions for transmission with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: set EPOLLERR on non-empty error queueArseniy Krasnov1-1/+1
If socket's error queue is not empty, EPOLLERR must be set. Otherwise, reader of error queue won't detect data in it using EPOLLERR bit. Currently for AF_VSOCK this is actual only with MSG_ZEROCOPY, as this feature is the only user of an error queue of the socket. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-21vsock/virtio: MSG_ZEROCOPY flag supportArseniy Krasnov2-58/+224
This adds handling of MSG_ZEROCOPY flag on transmission path: 1) If this flag is set and zerocopy transmission is possible (enabled in socket options and transport allows zerocopy), then non-linear skb will be created and filled with the pages of user's buffer. Pages of user's buffer are locked in memory by 'get_user_pages()'. 2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. Reason of this change is that '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so to decrease this field correctly, proper skb destructor is needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'. 3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'. If this callback is set, then transport needs extra check to be able to send provided number of buffers in zerocopy mode. Currently, the only transport that needs this callback set is virtio, because this transport adds new buffers to the virtio queue and we need to check, that number of these buffers is less than size of the queue (it is required by virtio spec). vhost and loopback transports don't need this check. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21vsock/virtio: non-linear skb handling for tapArseniy Krasnov1-3/+28
For tap device new skb is created and data from the current skb is copied to it. This adds copying data from non-linear skb to new the skb. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21vsock/virtio: support to send non-linear skbArseniy Krasnov1-7/+53
For non-linear skb use its pages from fragment array as buffers in virtio tx queue. These pages are already pinned by 'get_user_pages()' during such skb creation. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21vsock/virtio/vhost: read data from non-linear skbArseniy Krasnov1-13/+19
This is preparation patch for MSG_ZEROCOPY support. It adds handling of non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with 'skb_copy_datagram_iter()'. Main advantage of the second one is that it can handle paged part of the skb by using 'kmap()' on each page, but if there are no pages in the skb, it behaves like simple copying to iov iterator. This patch also adds new field to the control block of skb - this value shows current offset in the skb to read next portion of data (it doesn't matter linear it or not). Idea behind this field is that 'skb_copy_datagram_iter()' handles both types of skb internally - it just needs an offset from which to copy data from the given skb. This offset is incremented on each read from skb. This approach allows to simplify handling of both linear and non-linear skbs, because for linear skb we need to call 'skb_pull()' after reading data from it, while in non-linear case we need to update 'data_len'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14vsock: send SIGPIPE on write to shutdowned socketArseniy Krasnov1-0/+3
POSIX requires to send SIGPIPE on write to SOCK_STREAM socket which was shutdowned with SHUT_WR flag or its peer was shutdowned with SHUT_RD flag. Also we must not send SIGPIPE if MSG_NOSIGNAL flag is set. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-31vsock: Remove unused function declarationsYue Haibing1-3/+0
These are never implemented since introduction in commit d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20230729122036.32988-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27virtio/vsock: support MSG_PEEK for SOCK_SEQPACKETArseniy Krasnov1-3/+60
This adds support of MSG_PEEK flag for SOCK_SEQPACKET type of socket. Difference with SOCK_STREAM is that this callback returns either length of the message or error. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27virtio/vsock: rework MSG_PEEK for SOCK_STREAMArseniy Krasnov1-22/+19
This reworks current implementation of MSG_PEEK logic: 1) Replaces 'skb_queue_walk_safe()' with 'skb_queue_walk()'. There is no need in the first one, as there are no removes of skb in loop. 2) Removes nested while loop - MSG_PEEK logic could be implemented without it: just iterate over skbs without removing it and copy data from each until destination buffer is not full. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-24sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)David Howells1-3/+0
Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages and multipage folios to be passed through. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23bpf, sockmap: Pass skb ownership through read_skbJohn Fastabend1-4/+1
The read_skb hook calls consume_skb() now, but this means that if the recv_actor program wants to use the skb it needs to inc the ref cnt so that the consume_skb() doesn't kfree the sk_buff. This is problematic because in some error cases under memory pressure we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue(). Then we get this, skb_linearize() __pskb_pull_tail() pskb_expand_head() BUG_ON(skb_shared(skb)) Because we incremented users refcnt from sk_psock_verdict_recv() we hit the bug on with refcnt > 1 and trip it. To fix lets simply pass ownership of the sk_buff through the skb_read call. Then we can drop the consume from read_skb handlers and assume the verdict recv does any required kfree. Bug found while testing in our CI which runs in VMs that hit memory constraints rather regularly. William tested TCP read_skb handlers. [ 106.536188] ------------[ cut here ]------------ [ 106.536197] kernel BUG at net/core/skbuff.c:1693! [ 106.536479] invalid opcode: 0000 [#1]