linux.git - Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

Age	Commit message (Collapse)	Author	Files	Lines
2024-07-09	vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()	Dragos Tatulea	1	-0/+3
	VQ indices in the range [cur_num_qps, max_vqs) represent queues that have not yet been activated. .set_vq_ready should not activate these VQs. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-24-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Don't reset VQs more than necessary	Dragos Tatulea	1	-3/+27
	The vdpa device can be reset many times in sequence without any significant state changes in between. Previously this was not a problem: VQs were torn down only on first reset. But after VQ pre-creation was introduced, each reset will delete and re-create the hardware VQs and their associated resources. To solve this problem, avoid resetting hardware VQs if the VQs are still in a blank state. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-23-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Re-create HW VQs under certain conditions	Dragos Tatulea	2	-0/+16
	There are a few conditions under which the hardware VQs need a full teardown and setup: - VQ size changed to something else than default value. Hardware VQ size modification is not supported. - User turns off certain device features: mergeable buffers, checksum virtio 1.0 compliance. In these cases, the TIR and RQT need to be re-created. Add a needs_teardown configuration variable and set it when detecting the above scenarios. On next DRIVER_OK, the resources will be torn down first. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-22-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time	Dragos Tatulea	1	-5/+32
	Currently, hardware VQs are created right when the vdpa device gets into DRIVER_OK state. That is easier because most of the VQ state is known by then. This patch switches to creating all VQs and their associated resources at device creation time. The motivation is to reduce the vdpa device live migration downtime by moving the expensive operation of creating all the hardware VQs and their associated resources out of downtime on the destination VM. The VQs are now created in a blank state. The VQ configuration will happen later, on DRIVER_OK. Then the configuration will be applied when the VQs are moved to the Ready state. When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is needed: now that the VQ is already created a resume_vq() will be triggered too early when no mr has been configured yet. Skip calling resume_vq() in this case, let it be handled during DRIVER_OK. For virtio-vdpa, the device configuration is done earlier during .vdpa_dev_add() by vdpa_register_device(). Avoid calling setup_vq_resources() a second time in that case. On a 64 CPU, 256 GB VM with 1 vDPA device of 16 VQps, the full VQ resource creation + resume time was ~370ms. Now it's down to 60 ms (only VQ config and resume). The measurements were done on a ConnectX6DX based vDPA device. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-21-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Use suspend/resume during VQP change	Dragos Tatulea	1	-3/+11
	Resume a VQ if it is already created when the number of VQ pairs increases. This is done in preparation for VQ pre-creation which is coming in a later patch. It is necessary because calling setup_vq() on an already created VQ will return early and will not enable the queue. For symmetry, suspend a VQ instead of tearing it down when the number of VQ pairs decreases. But only if the resume operation is supported. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-20-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Forward error in suspend/resume device	Dragos Tatulea	1	-4/+8
	Start using the suspend/resume_vq() error return codes previously added. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-19-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2024-07-09	vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq()	Dragos Tatulea	1	-12/+6
	There are a few more places modifying the VQ to Ready directly. Let's consolidate them into resume_vq(). The redundant warnings for resume_vq() errors can also be dropped. There is one special case that needs to be handled for virtio-vdpa: the initialized flag must be set to true earlier in setup_vq() so that resume_vq() doesn't return early. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-18-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Add error code for suspend/resume VQ	Dragos Tatulea	1	-23/+54
	Instead of blindly calling suspend/resume_vqs(), make then return error codes. To keep compatibility, keep suspending or resuming VQs on error and return the last error code. The assumption here is that the error code would be the same. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-17-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq()	Dragos Tatulea	1	-2/+22
	Until now resume_vq() was used only for the suspend/resume scenario. This change also allows calling resume_vq() to bring it from Init to Ready state (VQ initialization). Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-16-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2024-07-09	vdpa/mlx5: Allow creation of blank VQs	Dragos Tatulea	1	-30/+55
	Based on the filled flag, create VQs that are filled or blank. Blank VQs will be filled in later through VQ modify. Downstream patches will make use of this to pre-create blank VQs at vdpa device creation. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-15-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2024-07-09	vdpa/mlx5: Set mkey modified flags on all VQs	Dragos Tatulea	1	-1/+1
	Otherwise, when virtqueues are moved from INIT to READY the latest mkey will not be set appropriately. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-14-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Start off rqt_size with max VQPs	Dragos Tatulea	1	-5/+5
	Currently rqt_size is initialized during device flag configuration. That's because it is the earliest moment when device knows if MQ (multi queue) is on or off. Shift this configuration earlier to device creation time. This implies that non-MQ devices will have a larger RQT size. But the configuration will still be correct. This is done in preparation for the pre-creation of hardware virtqueues at device add time. When that change will be added, RQT will be created at device creation time so it needs to be initialized to its max size. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-13-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Set an initial size on the VQ	Dragos Tatulea	1	-3/+3
	The virtqueue size is a pre-requisite for setting up any virtqueue resources. For the upcoming optimization of creating virtqueues at device add, the virtqueue size has to be configured. The queue size check in setup_vq() will always be false. So remove it. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-12-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Add support for modifying the VQ features field	Dragos Tatulea	1	-1/+11
	This is done in preparation for the pre-creation of hardware virtqueues at device add time. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-11-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Add support for modifying the virtio_version VQ field	Dragos Tatulea	1	-0/+16
	This is done in preparation for the pre-creation of hardware virtqueues at device add time. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-10-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Rename init_mvqs	Dragos Tatulea	1	-5/+5
	Function is used to set default values, so name it accordingly. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-9-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2024-07-09	vdpa/mlx5: Clear and reinitialize software VQ data on reset	Dragos Tatulea	1	-13/+3
	The hardware VQ configuration is mirrored by data in struct mlx5_vdpa_virtqueue . Instead of clearing just a few fields at reset, fully clear the struct and initialize with the appropriate default values. As clear_vqs_ready() is used only during reset, get rid of it. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-8-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Initialize and reset device with one queue pair	Dragos Tatulea	1	-11/+12
	The virtio spec says that a vdpa device should start off with one queue pair. The driver is already compliant. This patch moves the initialization to device add and reset times. This is done in preparation for the pre-creation of hardware virtqueues at device add time. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-7-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2024-07-09	vdpa/mlx5: Remove duplicate suspend code	Dragos Tatulea	1	-6/+1
	Use the dedicated suspend_vqs() function instead. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-6-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Iterate over active VQs during suspend/resume	Dragos Tatulea	1	-2/+2
	No need to iterate over max number of VQs. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-5-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Drop redundant check in teardown_virtqueues()	Dragos Tatulea	1	-8/+2
	The check is done inside teardown_vq(). Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-4-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Drop redundant code	Dragos Tatulea	1	-6/+0
	Originally, the second loop initialized the CVQ. But (acde3929492b ("vdpa/mlx5: Use consistent RQT size") initialized all the queues in the first loop, so the second iteration in init_mvqs() is never called because the first one will iterate up to max_vqs. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-3-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical	Dragos Tatulea	1	-5/+5
	... by changing the setup_vq_resources() parameter type. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-2-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09	vdpa/mlx5: Clarify meaning thorough function rename	Dragos Tatulea	1	-14/+14
	setup_driver()/teardown_driver() are a bit vague. These functions are used for virtqueue resources. Same for alloc_resources()/teardown_resources(): they represent fixed resources that are meant to exist during the device lifetime. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-1-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19	vdpa/mlx5: Allow CVQ size changes	Jonah Palmer	1	-4/+9
	The MLX driver was not updating its control virtqueue size at set_vq_num and instead always initialized to MLX5_CVQ_MAX_ENT (16) at setup_cvq_vring. Qemu would try to set the size to 64 by default, however, because the CVQ size always was initialized to 16, an error would be thrown when sending >16 control messages (as used-ring entry 17 is initialized to 0). For example, starting a guest with x-svq=on and then executing the following command would produce the error below: # for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done qemu-system-x86_64: Insufficient written data (0) [ 435.331223] virtio_net virtio0: Failed to set mac address by vq command. SIOCSIFHWADDR: Invalid argument Acked-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20240216142502.78095-1-jonah.palmer@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting")
2024-01-10	vdpa/mlx5: Add mkey leak detection	Dragos Tatulea	3	-0/+27
	Track allocated mrs in a list and show warning when leaks are detected on device free or reset. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-9-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10	vdpa/mlx5: Introduce reference counting to mrs	Dragos Tatulea	3	-25/+78
	Deleting the old mr during mr update (.set_map) and then modifying the vqs with the new mr is not a good flow for firmware. The firmware expects that mkeys are deleted after there are no more vqs referencing them. Introduce reference counting for mrs to fix this. It is the only way to make sure that mkeys are not in use by vqs. An mr reference is taken when the mr is associated to the mr asid table and when the mr is linked to the vq on create/modify. The reference is released when the mkey is unlinked from the vq (trough modify/destroy) and from the mr asid table. To make things consistent, get rid of mlx5_vdpa_destroy_mr and use get/put semantics everywhere. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-8-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10	vdpa/mlx5: Use vq suspend/resume during .set_map	Dragos Tatulea	1	-8/+38
	Instead of tearing down and setting up vq resources, use vq suspend/resume during .set_map to speed things up a bit. The vq mr is updated with the new mapping while the vqs are suspended. If the device doesn't support resumable vqs, do the old teardown and setup dance. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-7-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10	vdpa/mlx5: Mark vq state for modification in hw vq	Dragos Tatulea	1	-0/+8
	.set_vq_state will set the indices and mark the fields to be modified in the hw vq. Advertise that the device supports changing the vq state when the device is in DRIVER_OK state and suspended. Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231225151203.152687-6-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10	vdpa/mlx5: Mark vq addrs for modification in hw vq	Dragos Tatulea	1	-0/+9
	Addresses get set by .set_vq_address. hw vq addresses will be updated on next modify_virtqueue. Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10	vdpa/mlx5: Introduce per vq and device resume	Dragos Tatulea	1	-7/+62
	Implement vdpa vq and device resume if capability detected. Add support for suspend -> ready state change. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-4-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10	vdpa/mlx5: Allow modifying multiple vq fields in one modify command	Dragos Tatulea	1	-8/+40
	Add a bitmask variable that tracks hw vq field changes that are supposed to be modified on next hw vq change command. This will be useful to set multiple vq fields when resuming the vq. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-01	vdpa/mlx5: preserve CVQ vringh index	Steve Sistare	1	-1/+6
	mlx5_vdpa does not preserve userland's view of vring base for the control queue in the following sequence: ioctl VHOST_SET_VRING_BASE ioctl VHOST_VDPA_SET_STATUS VIRTIO_CONFIG_S_DRIVER_OK mlx5_vdpa_set_status() setup_cvq_vring() vringh_init_iotlb() vringh_init_kern() vrh->last_avail_idx = 0; ioctl VHOST_GET_VRING_BASE To fix, restore the value of cvq->vring.last_avail_idx after calling vringh_init_iotlb. Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <1699014387-194368-1-git-send-email-steven.sistare@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-01	vdpa/mlx5: implement .reset_map driver op	Si-Wei Liu	3	-3/+42
	Since commit 6f5312f80183 ("vdpa/mlx5: Add support for running with virtio_vdpa"), mlx5_vdpa starts with preallocate 1:1 DMA MR at device creation time. This 1:1 DMA MR will be implicitly destroyed while the first .set_map call is invoked, in which case callers like vhost-vdpa will start to set up custom mappings. When the .reset callback is invoked, the custom mappings will be cleared and the 1:1 DMA MR will be re-created. In order to reduce excessive memory mapping cost in live migration, it is desirable to decouple the vhost-vdpa IOTLB abstraction from the virtio device life cycle, i.e. mappings can be kept around intact across virtio device reset. Leverage the .reset_map callback, which is meant to destroy the regular MR (including cvq mapping) on the given ASID and recreate the initial DMA mapping. That way, the device .reset op runs free from having to maintain and clean up memory mappings by itself. Additionally, implement .compat_reset to cater for older userspace, which may wish to see mapping to be cleared during reset. Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <1697880319-4937-7-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK	Eugenio Pérez	1	-0/+7
	Offer this backend feature as mlx5 is compatible with it. It allows it to do live migration with CVQ, dynamically switching between passthrough and shadow virtqueue. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230703142514.363256-1-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-01	vdpa/mlx5: Update cvq iotlb mapping on ASID change	Dragos Tatulea	3	-1/+36
	For the following sequence: - cvq group is in ASID 0 - .set_map(1, cvq_iotlb) - .set_group_asid(cvq_group, 1) ... the cvq mapping from ASID 0 will be used. This is not always correct behaviour. This patch adds support for the above mentioned flow by saving the iotlb on each .set_map and updating the cvq iotlb with it on a cvq group change. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-18-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Make iotlb helper functions more generic	Dragos Tatulea	1	-8/+11
	They will be used in a follow-up patch. For dup_iotlb, avoid the src == dst case. This is an error. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-17-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Enable hw support for vq descriptor mapping	Dragos Tatulea	1	-1/+23
	Vq descriptor mappings are supported in hardware by filling in an additional mkey which contains the descriptor mappings to the hw vq. A previous patch in this series added support for hw mkey (mr) creation for ASID 1. This patch fills in both the vq data and vq descriptor mkeys based on group ASID mapping. The feature is signaled to the vdpa core through the presence of the .get_vq_desc_group op. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-16-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Introduce mr for vq descriptor	Dragos Tatulea	3	-14/+25
	Introduce the vq descriptor group and mr per ASID. Until now .set_map on ASID 1 was only updating the cvq iotlb. From now on it also creates a mkey for it. The current patch doesn't use it but follow-up patches will add hardware support for mapping the vq descriptors. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-15-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Improve mr update flow	Dragos Tatulea	3	-72/+82
	The current flow for updating an mr works directly on mvdev->mr which makes it cumbersome to handle multiple new mr structs. This patch makes the flow more straightforward by having mlx5_vdpa_create_mr return a new mr which will update the old mr (if any). The old mr will be deleted and unlinked from mvdev. For the case when the iotlb is empty (not NULL), the old mr will be cleared. This change paves the way for adding mrs for different ASIDs. The initialized bool is no longer needed as mr is now a pointer in the mlx5_vdpa_dev struct which will be NULL when not initialized. Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-14-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Move mr mutex out of mr struct	Dragos Tatulea	3	-11/+12
	The mutex is named like it is supposed to protect only the mkey but in reality it is a global lock for all mr resources. Shift the mutex to it's rightful location (struct mlx5_vdpa_dev) and give it a more appropriate name. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231018171456.1624030-13-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Allow creation/deletion of any given mr struct	Dragos Tatulea	3	-35/+36
	This patch adapts the mr creation/deletion code to be able to work with any given mr struct pointer. All the APIs are adapted to take an extra parameter for the mr. mlx5_vdpa_create/delete_mr doesn't need a ASID parameter anymore. The check is done in the caller instead (mlx5_set_map). This change is needed for a followup patch which will introduce an additional mr for the vq descriptor data. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231018171456.1624030-12-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Rename mr destroy functions	Dragos Tatulea	3	-11/+11
	Make mlx5_destroy_mr symmetric to mlx5_create_mr. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-11-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Collapse "dvq" mr add/delete functions	Dragos Tatulea	1	-11/+5
	Now that the cvq code is out of mlx5_vdpa_create/destroy_mr, the "dvq" functions can be folded into their callers. Having "dvq" in the naming will no longer be accurate in the downstream patches. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-10-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Take cvq iotlb lock during refresh	Dragos Tatulea	1	-1/+9
	The reslock is taken while refresh is called but iommu_lock is more specific to this resource. So take the iommu_lock during cvq iotlb refresh. Based on Eugenio's patch [0]. [0] https://lore.kernel.org/lkml/20230112142218.725622-4-eperezma@redhat.com/ Acked-by: Jason Wang <jasowang@redhat.com> Suggested-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-9-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code	Dragos Tatulea	3	-39/+28
	The handling of the cvq iotlb is currently coupled with the creation and destruction of the hardware mkeys (mr). This patch moves cvq iotlb handling into its own function and shifts it to a scope that is not related to mr handling. As cvq handling is just a prune_iotlb + dup_iotlb cycle, put it all in the same "update" function. Finally, the destruction path is handled by directly pruning the iotlb. After this move is done the ASID mr code can be collapsed into a single function. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-8-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-11-01	vdpa/mlx5: Create helper function for dma mappings	Dragos Tatulea	3	-2/+8
	Necessary for upcoming cvq separation from mr allocation. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231018171456.1624030-7-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-10-18	vdpa/mlx5: Fix firmware error on creation of 1k VQs	Dragos Tatulea	2	-9/+63
	A firmware error is triggered when configuring a 9k MTU on the PF after switching to switchdev mode and then using a vdpa device with larger (1k) rings: mlx5_cmd_out_err: CREATE_GENERAL_OBJECT(0xa00) op_mod(0xd) failed, status bad resource(0x5), syndrome (0xf6db90), err(-22) This is due to the fact that the hw VQ size parameters are computed based on the umem_1/2/3_buffer_param_a/b capabilities and all device capabilities are read only when the driver is moved to switchdev mode. The problematic configuration flow looks like this: 1) Create VF 2) Unbind VF 3) Switch PF to switchdev mode. 4) Bind VF 5) Set PF MTU to 9k 6) create vDPA device 7) Start VM with vDPA device and 1K queue size Note that setting the MTU before step 3) doesn't trigger this issue. This patch reads the forementioned umem parameters at the latest point possible before the VQs of the device are created. v2: - Allocate output with kmalloc to reduce stack frame size. - Removed stable from cc. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230831155702.1080754-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-10-18	vdpa/mlx5: Fix double release of debugfs entry	Dragos Tatulea	3	-8/+6
	The error path in setup_driver deletes the debugfs entry but doesn't clear the pointer. During .dev_del the invalid pointer will be released again causing a crash. This patch fixes the issue by always clearing the debugfs entry in mlx5_vdpa_remove_debugfs. Also, stop removing the debugfs entry in .dev_del op: the debugfs entry is already handled within the setup_driver/teardown_driver scope. Cc: stable@vger.kernel.org Fixes: f0417e72add5 ("vdpa/mlx5: Add and remove debugfs in setup/teardown driver") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Message-Id: <20230829174014.928189-2-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-09-04	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds	1	-3/+0
	Pull virtio updates from Michael Tsirkin: "A small pull request this time around, mostly because the vduse network got postponed to next relase so we can be sure we got the security store right" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_ring: fix avail_wrap_counter in virtqueue_add_packed virtio_vdpa: build affinity masks conditionally virtio_net: merge dma operations when filling mergeable buffers virtio_ring: introduce dma sync api for virtqueue virtio_ring: introduce dma map api for virtqueue virtio_ring: introduce virtqueue_reset() virtio_ring: separate the logic of reset/enable from virtqueue_resize virtio_ring: correct the expression of the description of virtqueue_resize() virtio_ring: skip unmap for premapped virtio_ring: introduce virtqueue_dma_dev() virtio_ring: support add premapped buf virtio_ring: introduce virtqueue_set_dma_premapped() virtio_ring: put mapping error check in vring_map_one_sg virtio_ring: check use_dma_api before unmap desc for indirect vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK vdpa: add get_backend_features vdpa operation vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag vdpa/mlx5: Remove unused function declarations