path: root/drivers/s390
2021-09-18  s390/qdio: cancel the ESTABLISH ccw after timeout  (Julian Wiedmann, 1 file, -21/+30)
commit 1c1dc8bda3a05c60877a6649775894db5343bdea upstream. When the ESTABLISH ccw does not complete within the specified timeout, try our best to cancel the ccw program that is still active on the device. Otherwise the IO subsystem might be accessing it even after the driver eg. called qdio_free(). Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Cc: <stable@vger.kernel.org> # 2.6.27 Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-18  s390/qdio: fix roll-back after timeout on ESTABLISH ccw  (Julian Wiedmann, 1 file, -12/+19)
commit 2c197870e4701610ec3b1143808d4e31152caf30 upstream.

When qdio_establish() times out while waiting for the ESTABLISH ccw to complete, it calls qdio_shutdown() to roll back all of its previous actions. But at this point the qdio_irq's state is still QDIO_IRQ_STATE_INACTIVE, so qdio_shutdown() will exit immediately without doing any actual work. Which means that eg. the qdio_irq's thinint-indicator stays registered, and cdev->handler isn't restored to its old value. And since commit 954d6235be41 ("s390/qdio: make thinint registration symmetric") the qdio_irq also stays on the tiq_list, so on the next qdio_establish() we might get a helpful BUG from the list-debugging code:

    ...
    [ 4633.512591] list_add double add: new=00000000005a4110, prev=00000001b357db78, next=00000000005a4110.
    [ 4633.512621] ------------[ cut here ]------------
    [ 4633.512623] kernel BUG at lib/list_debug.c:29!
    ...
    [ 4633.512796] [<00000001b2c6ee9a>] __list_add_valid+0x82/0xa0
    [ 4633.512798] ([<00000001b2c6ee96>] __list_add_valid+0x7e/0xa0)
    [ 4633.512800] [<00000001b2fcecce>] qdio_establish_thinint+0x116/0x190
    [ 4633.512805] [<00000001b2fcbe58>] qdio_establish+0x128/0x498
    ...

Fix this by extracting a goto-chain from the existing error exits in qdio_establish(), and check the return value of the wait_event_...() to detect the timeout condition.

Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.")
Root-caused-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 2.6.27
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15  s390/ap: fix state machine hang after failure to enable irq  (Harald Freudenberger, 3 files, -34/+21)
[ Upstream commit cabebb697c98fb1f05cc950a747a9b6ec61a5b01 ] If for any reason the interrupt enable for an ap queue fails, the state machine run for the queue returned wrong return codes to the caller. The caller therefore assumed that interrupt support for this queue is enabled and did not re-establish the high resolution timer used for polling. In the end this led to a hang for the user space process waiting "forever" for the reply. This patch reworks these return codes to return correct indications for the caller to re-establish the timer when a queue runs without interrupt support. Please note that this fixes wrong behavior after a first attempt to enable interrupt support for the queue has failed. However, it looks like this occasionally happens on KVM systems. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15  s390/zcrypt: fix wrong offset index for APKA master key valid state  (Harald Freudenberger, 1 file, -4/+4)
[ Upstream commit 8617bb74006252cb2286008afe7d6575a6425857 ] Tests showed a mismatch between what the CCA tool reports about the APKA master key state and what's displayed by the zcrypt dd in sysfs. After some investigation, we found out that the documentation which was the source for the zcrypt dd implementation lacks the listing of 3 fields. So this patch now moves the evaluation of the APKA master key state to the correct offset. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15  s390/cio: add dev_busid sysfs entry for each subchannel  (Vineeth Vijayan, 1 file, -0/+17)
[ Upstream commit d3683c055212bf910d4e318f7944910ce10dbee6 ] Introduce dev_busid, which exports the device-id associated with the io-subchannel (and message-subchannel). The dev_busid identifies the device which may be physically installed on the corresponding subchannel. The dev_busid value "none" indicates that the subchannel is not valid and that there is no I/O device currently associated with it. The dev_busid information is helpful for writing device-specific udev-rules associated with the subchannel. The dev_busid interface is available even when the subchannel is not bound to any driver or when there is no operational device connected to it. Hence this attribute can be used to write udev-rules which are specific to the device associated with the subchannel. Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20  s390/sclp_vt220: fix console name to match device  (Valentin Vidic, 1 file, -2/+2)
[ Upstream commit b7d91d230a119fdcc334d10c9889ce9c5e15118b ]

Console name reported in /proc/consoles:

    ttyS1 -W- (EC p ) 4:65

does not match the char device name:

    crw--w---- 1 root root 4, 65 May 17 12:18 /dev/ttysclp0

so debian-installer inside a QEMU s390x instance gets confused and fails to start with the following error:

    steal-ctty: No such file or directory

Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Link: https://lore.kernel.org/r/20210427194010.9330-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20  scsi: zfcp: Report port fc_security as unknown early during remote cable pull  (Steffen Maier, 1 file, -0/+1)
commit 8b3bdd99c092bbaeaa7d9eecb1a3e5dc9112002b upstream. On remote cable pull, a zfcp_port keeps its status and only gets ZFCP_STATUS_PORT_LINK_TEST added. Only after an ADISC timeout, we would actually start port recovery and remove ZFCP_STATUS_COMMON_UNBLOCKED which zfcp_sysfs_port_fc_security_show() detected and reported as "unknown" instead of the old and possibly stale zfcp_port->connection_info. Add check for ZFCP_STATUS_PORT_LINK_TEST for timely "unknown" report. Link: https://lore.kernel.org/r/20210702160922.2667874-1-maier@linux.ibm.com Fixes: a17c78460093 ("scsi: zfcp: report FC Endpoint Security in sysfs") Cc: <stable@vger.kernel.org> #5.7+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14  s390/cio: dont call css_wait_for_slow_path() inside a lock  (Vineeth Vijayan, 2 files, -2/+3)
commit c749d8c018daf5fba6dfac7b6c5c78b27efd7d65 upstream.

Currently css_wait_for_slow_path() gets called inside the chp->lock. The path-verification-loop of slowpath inside this lock could lead to a deadlock, as reported by the lockdep validator. The ccw_device_get_chp_desc() during the instance of a device-set-online would try to acquire the same 'chp->lock' to read the chp->desc. This function can get called in multiple scenarios, like probing or setting a device online manually, which in some corner cases could lead to the deadlock.

The lockdep validator reported this as:

    CPU0                        CPU1
    ----                        ----
    lock(&chp->lock);
                                lock(kn->active#43);
                                lock(&chp->lock);
    lock((wq_completion)cio);

The chp->lock was introduced to serialize the access of struct channel_path. This lock is not needed for the css_wait_for_slow_path() function, so invoke the slow-path function outside this lock.

Fixes: b730f3a93395 ("[S390] cio: add lock to struct channel_path")
Cc: <stable@vger.kernel.org>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
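As an illustration of the pattern this fix describes (take what is needed under the lock, then run the slow path with the lock dropped), here is a minimal C sketch; the names chp_like and chp_do_slow_path() are made up and this is not the actual cio code:

    #include <linux/spinlock.h>

    struct chp_like {
        spinlock_t lock;
        int desc;                       /* state protected by the lock */
    };

    static void chp_do_slow_path(int snapshot);  /* hypothetical, possibly sleeping work */

    static void chp_process(struct chp_like *chp)
    {
        int snapshot;

        spin_lock(&chp->lock);
        snapshot = chp->desc;           /* only touch protected state under the lock */
        spin_unlock(&chp->lock);

        chp_do_slow_path(snapshot);     /* slow work runs outside the lock */
    }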
2021-06-23  s390/ap: Fix hanging ioctl caused by wrong msg counter  (Harald Freudenberger, 1 file, -2/+9)
commit e73a99f3287a740a07d6618e9470f4d6cb217da8 upstream. When an AP queue is switched to soft offline, all pending requests are purged out of the pending requests list and 'received' by the upper layer like zcrypt device drivers. This is also done for requests which are already enqueued into the firmware queue. A request in a firmware queue may eventually produce a response message, but there is no waiting process any more. However, the response was counted with the queue_counter, and as this counter was reset to 0 with the offline switch, the pending response caused the queue_counter to go negative. The next request increased this counter to 0 (instead of 1), which caused the ap code to assume there is nothing to receive, and so the response for this valid request was never fetched from the firmware queue. All this caused a queue to not work properly after a switch offline/online, and in the end processes would hang forever when trying to send a crypto request after a queue offline/online switch cycle. Fixed by a) making sure the counter does not drop below 0, and b) making sure the counter has a value of at least 1 after a successful enqueue of a message. Additionally a warning is emitted when a reply can't get assigned to a waiting process. This may be normal operation (the process had a timeout or has been killed) but may give a hint that something unexpected happened (like the odd behavior described above). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
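The two counter rules from this fix can be pictured with an atomic counter; this is a simplified sketch, not the actual ap_queue code, and the function names are invented:

    #include <linux/atomic.h>

    static atomic_t queue_count = ATOMIC_INIT(0);

    /* rule b): right after a successful enqueue the counter is at least 1 */
    static void msg_enqueued(void)
    {
        if (atomic_inc_return(&queue_count) < 1)
            atomic_set(&queue_count, 1);
    }

    /* rule a): the counter never drops below 0 */
    static void reply_received(void)
    {
        atomic_dec_if_positive(&queue_count);
    }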
2021-06-16  vfio-ccw: Serialize FSM IDLE state with I/O completion  (Eric Farman, 1 file, -2/+10)
[ Upstream commit 2af7a834a435460d546f0cf0a8b8e4d259f1d910 ]

Today, the stacked call to vfio_ccw_sch_io_todo() does three things:

  1) Update a solicited IRB with CP information, and release the CP if the interrupt was the end of a START operation.
  2) Copy the IRB data into the io_region, under the protection of the io_mutex
  3) Reset the vfio-ccw FSM state to IDLE to acknowledge that vfio-ccw can accept more work.

The trouble is that step 3 is (A) invoked for both solicited and unsolicited interrupts, and (B) sitting after the mutex for step 2. This second piece becomes a problem if it processes an interrupt for a CLEAR SUBCHANNEL while another thread initiates a START, thus allowing the CP and FSM states to get out of sync. That is:

    CPU 1                       CPU 2
    fsm_do_clear()
    fsm_irq()
                                fsm_io_request()
    vfio_ccw_sch_io_todo()
                                fsm_io_helper()

Since the FSM state and CP should be kept in sync, let's make a note when the CP is released, and rely on that as an indication that the FSM should also be reset at the end of this routine and open up the device for more work.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20210511195631.3995081-4-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-16  vfio-ccw: Reset FSM state to IDLE inside FSM  (Eric Farman, 2 files, -2/+1)
[ Upstream commit 6c02ac4c9211edabe17bda437ac97e578756f31b ] When an I/O request is made, the fsm_io_request() routine moves the FSM state from IDLE to CP_PROCESSING, and then fsm_io_helper() moves it to CP_PENDING if the START SUBCHANNEL received a cc0. Yet, the error case to go from CP_PROCESSING back to IDLE is done after the FSM call returns. Let's move this up into the FSM proper, to provide some better symmetry when unwinding in this case. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Matthew Rosato <mjrosato@linux.ibm.com> Message-Id: <20210511195631.3995081-3-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03  vfio-ccw: Check initialized flag in cp_init()  (Eric Farman, 1 file, -0/+4)
[ Upstream commit c6c82e0cd8125d30f2f1b29205c7e1a2f1a6785b ] We have a really nice flag in the channel_program struct that indicates if it had been initialized by cp_init(), and use it as a guard in the other cp accessor routines, but not for a duplicate call into cp_init(). The possibility of this occurring is low, because that flow is protected by the private->io_mutex and FSM CP_PROCESSING state. But then why bother checking it in (for example) cp_prefetch() then? Let's just be consistent and check for that in cp_init() too. Fixes: 71189f263f8a3 ("vfio-ccw: make it safe to access channel programs") Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Matthew Rosato <mjrosato@linux.ibm.com> Message-Id: <20210511195631.3995081-2-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11  s390/zcrypt: fix zcard and zqueue hot-unplug memleak  (Harald Freudenberger, 2 files, -0/+2)
commit 70fac8088cfad9f3b379c9082832b4d7532c16c2 upstream. Tests with KVM and a kmemdebug kernel showed that on hot unplug the zcard and zqueue structs for the unplugged card or queue are not properly freed because of a mismatch between get/put for the embedded kref counter. This fix now adjusts the handling of the kref counters. With init the kref counter starts at 1. This initial value needs to drop to zero with the unregister of the card or queue to trigger the release and free the object. Fixes: 29c2680fd2bf ("s390/ap: fix ap devices reference counting") Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
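For context, the kref lifetime that this fix restores looks roughly like the following sketch (hypothetical zobj struct, not the real zcard/zqueue code): kref_init() starts the counter at 1, and the unregister path must drop that initial reference so the release callback can run once all other users are gone.

    #include <linux/kref.h>
    #include <linux/slab.h>

    struct zobj {
        struct kref kref;
        /* ... card or queue data ... */
    };

    static void zobj_release(struct kref *kref)
    {
        kfree(container_of(kref, struct zobj, kref));
    }

    static struct zobj *zobj_create(void)
    {
        struct zobj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

        if (obj)
            kref_init(&obj->kref);              /* refcount starts at 1 */
        return obj;
    }

    static void zobj_unregister(struct zobj *obj)
    {
        kref_put(&obj->kref, zobj_release);     /* drop the initial reference */
    }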
2021-03-25  s390/qeth: schedule TX NAPI on QAOB completion  (Julian Wiedmann, 1 file, -6/+12)
[ Upstream commit 3e83d467a08e25b27c44c885f511624a71c84f7c ] When a QAOB notifies us that a pending TX buffer has been delivered, the actual TX completion processing by qeth_tx_complete_pending_bufs() is done within the context of a TX NAPI instance. We shouldn't rely on this instance being scheduled by some other TX event, but just do it ourselves. qeth_qdio_handle_aob() is called from qeth_poll(), ie. our main NAPI instance. To avoid touching the TX queue's NAPI instance before/after it is (un-)registered, reorder the code in qeth_open() and qeth_stop() accordingly. Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17  s390/dasd: fix hanging IO request during DASD driver unbind  (Stefan Haberland, 1 file, -1/+2)
commit 66f669a272898feb1c69b770e1504aa2ec7723d1 upstream. Prevent an IO request from being built during a device shutdown initiated by a driver unbind. Such a request will never be able to be processed or canceled and will hang forever, which also leads to a hanging unbind. Fix by checking not only that the device is in READY state but also that no device offline has been initiated before building a new IO request. Fixes: e443343e509a ("s390/dasd: blk-mq conversion") Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Tested-by: Bjoern Walk <bwalk@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17  s390/dasd: fix hanging DASD driver unbind  (Stefan Haberland, 1 file, -2/+1)
commit 7d365bd0bff3c0310c39ebaffc9a8458e036d666 upstream. In case of an unbind of the DASD device driver the function dasd_generic_remove() is called, which shuts down the device. Among other things this function removes the int_handler from the cdev. During shutdown the device cancels all outstanding IO requests and waits for completion of the clear request. Unfortunately the clear interrupt will never be received when there is no interrupt handler connected. Fix by moving the int_handler removal after the call to the state machine, where no request or interrupt is outstanding. Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Tested-by: Bjoern Walk <bwalk@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17  s390/qeth: fix notification for pending buffers during teardown  (Julian Wiedmann, 1 file, -3/+3)
[ Upstream commit 7eefda7f353ef86ad82a2dc8329e8a3538c08ab6 ] The cited commit reworked the state machine for pending TX buffers. In qeth_iqd_tx_complete() it turned PENDING into a transient state, and uses NEED_QAOB for buffers that get parked while waiting for their QAOB completion. But it missed adjusting the check in qeth_tx_complete_buf(). So if qeth_tx_complete_pending_bufs() is called during teardown to drain the parked TX buffers, we no longer raise a notification for af_iucv. Instead of updating the checked state, just move this code into qeth_tx_complete_pending_bufs() itself. This also gets rid of the special-case in the common TX completion path. Fixes: 8908f36d20d8 ("s390/qeth: fix af_iucv notification race") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17  s390/qeth: improve completion of pending TX buffers  (Julian Wiedmann, 2 files, -42/+30)
[ Upstream commit c20383ad1656b0f6354dd50e4acd894f9d94090d ] The current design attaches a pending TX buffer to a custom single-linked list, which is anchored at the buffer's slot on the TX ring. The buffer is then checked for final completion whenever this slot is processed during a subsequent TX NAPI poll cycle. But if there's insufficient traffic on the ring, we might never make enough progress to get back to this ring slot and discover the pending buffer's final TX completion. This is a problem in particular if the missing TX completion blocks the application from sending further traffic. So convert the custom single-linked list code to a per-queue list_head, and scan this list on every TX NAPI cycle. Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
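The data-structure change can be pictured with a small sketch (hypothetical names, not the actual qeth structures): pending buffers are linked on a per-queue list_head and every TX NAPI poll walks that list, instead of waiting for the buffer's ring slot to be processed again.

    #include <linux/list.h>

    struct tx_buf {
        struct list_head list_entry;
        bool completed;                         /* set once the QAOB arrived */
    };

    struct tx_queue {
        struct list_head pending_bufs;          /* INIT_LIST_HEAD()ed at queue setup */
    };

    static void tx_buf_park(struct tx_queue *q, struct tx_buf *buf)
    {
        list_add_tail(&buf->list_entry, &q->pending_bufs);
    }

    /* called from every TX NAPI poll cycle */
    static void tx_complete_pending_bufs(struct tx_queue *q)
    {
        struct tx_buf *buf, *tmp;

        list_for_each_entry_safe(buf, tmp, &q->pending_bufs, list_entry) {
            if (buf->completed) {
                list_del(&buf->list_entry);
                /* finish the TX completion processing for buf here */
            }
        }
    }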
2021-03-17  s390/qeth: remove QETH_QDIO_BUF_HANDLED_DELAYED state  (Julian Wiedmann, 2 files, -5/+2)
[ Upstream commit 75cf3854dcdf7b5c583538cae12ffa054d237d93 ] Reuse the QETH_QDIO_BUF_EMPTY state to indicate that a TX buffer has been completed with a QAOB notification, and may be cleaned up by qeth_cleanup_handled_pending(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17  s390/qeth: don't replace a fully completed async TX buffer  (Julian Wiedmann, 1 file, -38/+51)
[ Upstream commit db4ffdcef7c9a842e55228c9faef7abf8b72382f ] For TX buffers that require an additional async notification via QAOB, the TX completion code can now manage all the necessary processing if the notification has already occurred (or is occurring concurrently). In such cases we can avoid replacing the metadata that is associated with the buffer's slot on the ring, and just keep using the current one. As qeth_clear_output_buffer() will also handle any kmem cache-allocated memory that was mapped into the TX buffer, qeth_qdio_handle_aob() doesn't need to worry about it. While at it, also remove the unneeded forward declaration for qeth_init_qdio_out_buf(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17  s390/crypto: return -EFAULT if copy_to_user() fails  (Wang Qing, 1 file, -1/+1)
commit 942df4be7ab40195e2a839e9de81951a5862bc5b upstream. The copy_to_user() function returns the number of bytes remaining to be copied, but we want to return -EFAULT if the copy doesn't complete. Fixes: e06670c5fe3b ("s390: vfio-ap: implement VFIO_DEVICE_GET_INFO ioctl") Signed-off-by: Wang Qing <wangqing@vivo.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/1614600502-16714-1-git-send-email-wangqing@vivo.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
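The idiom behind this group of -EFAULT fixes, shown as a generic sketch (not the patched driver code): copy_to_user() returns the number of bytes it could not copy, so any non-zero result is translated into -EFAULT instead of being returned as-is.

    #include <linux/uaccess.h>

    static long report_info(void __user *uptr, const void *info, unsigned long len)
    {
        /* copy_to_user() returns the number of bytes NOT copied */
        if (copy_to_user(uptr, info, len))
            return -EFAULT;
        return 0;
    }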
2021-03-17  s390/cio: return -EFAULT if copy_to_user() fails  (Eric Farman, 1 file, -1/+1)
commit d9c48a948d29bcb22f4fe61a81b718ef6de561a0 upstream. Fixes: 120e214e504f ("vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls") Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17  s390/cio: return -EFAULT if copy_to_user() fails again  (Wang Qing, 1 file, -2/+2)
commit 51c44babdc19aaf882e1213325a0ba291573308f upstream. The copy_to_user() function returns the number of bytes remaining to be copied, but we want to return -EFAULT if the copy doesn't complete. Fixes: e01bcdd61320 ("vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl") Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/1614600093-13992-1-git-send-email-wangqing@vivo.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17  s390/qeth: fix memory leak after failed TX Buffer allocation  (Julian Wiedmann, 1 file, -18/+17)
commit e7a36d27f6b9f389e41d8189a8a08919c6835732 upstream. When qeth_alloc_qdio_queues() fails to allocate one of the buffers that back an Output Queue, the 'out_freeoutqbufs' path will free all previously allocated buffers for this queue. But it fails to free the half-finished queue struct itself. Move the buffer allocation into qeth_alloc_output_queue(), and deal with such errors internally. Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04  virtio/s390: implement virtio-ccw revision 2 correctly  (Cornelia Huck, 1 file, -2/+2)
commit 182f709c5cff683e6732d04c78e328de0532284f upstream. CCW_CMD_READ_STATUS was introduced with revision 2 of virtio-ccw, and drivers should only rely on it being implemented when they negotiated at least that revision with the device. However, virtio_ccw_get_status() issued READ_STATUS for any device operating at least at revision 1. If the device accepts READ_STATUS regardless of the negotiated revision (which some implementations like QEMU do, even though the spec currently does not allow it), everything works as intended. While a device rejecting the command should also be handled gracefully, we will not be able to see any changes the device makes to the status, such as setting NEEDS_RESET or setting the status to zero after a completed reset. We negotiated the revision to at most 1, as we never bumped the maximum revision; let's do that now and properly send READ_STATUS only if we are operating at least at revision 2. Cc: stable@vger.kernel.org Fixes: 7d3ce5ab9430 ("virtio/s390: support READ_STATUS command for virtio-ccw") Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20210216110645.1087321-1-cohuck@redhat.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04  s390/zcrypt: return EIO when msg retry limit reached  (Harald Freudenberger, 1 file, -0/+14)
[ Upstream commit d39fae45c97c67b1b4da04773f2bb5a2f29088c4 ] When a msg is retried because the lower ap layer returns -EAGAIN there is a retry limit (currently 10). When this limit is reached the last return code from the lower layer is returned, causing the userspace to get -1 on the ioctl with errno EAGAIN. This EAGAIN is misleading here. After 10 retry attempts the userspace should receive a clear failure indication like EINVAL or EIO or ENODEV. However, the reason why these retries all fail is unclear. On an invalid message EINVAL would be returned by the lower layer, and if devices go away or are not available an ENODEV is seen. So this patch now reworks the retry loops to return EIO to userspace when the retry limit is reached. Fixes: 91ffc519c199 ("s390/zcrypt: introduce msg tracking in zcrypt functions") Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
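The reworked retry loop can be sketched like this (hypothetical helper lower_layer_send(), not the actual zcrypt code): once the retry budget is used up, the final -EAGAIN is replaced by -EIO so userspace sees an unambiguous failure.

    #define MSG_RETRY_LIMIT 10

    struct msg;                                   /* opaque request, sketch only */
    static int lower_layer_send(struct msg *m);   /* hypothetical, may return -EAGAIN */

    static int send_msg_with_retries(struct msg *m)
    {
        int rc;
        int tries;

        for (tries = 0; tries < MSG_RETRY_LIMIT; tries++) {
            rc = lower_layer_send(m);
            if (rc != -EAGAIN)
                return rc;                        /* success or a real error code */
        }
        return -EIO;                              /* retry limit reached: don't leak -EAGAIN */
    }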
2021-02-03  s390/vfio-ap: No need to disable IRQ after queue reset  (Tony Krowiak, 3 files, -49/+69)
commit 6c12a6384e0c0b96debd88b24028e58f2ebd417b upstream.

The queues assigned to a matrix mediated device are currently reset when:

  * The VFIO_DEVICE_RESET ioctl is invoked
  * The mdev fd is closed by userspace (QEMU)
  * The mdev is removed from sysfs.

Immediately after the reset of a queue, a call is made to disable interrupts for the queue. This is entirely unnecessary because the reset of a queue disables interrupts, so this will be removed. Furthermore, vfio_ap_irq_disable() does an unconditional PQAP/AQIC which can result in a specification exception (when the corresponding facility is not available), so this is actually a bugfix.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
[pasic@linux.ibm.com: minor rework before merging]
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: ec89b55e3bce ("s390: ap: implement PAPQ AQIC interception in kernel")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17  s390/qeth: fix L2 header access in qeth_l3_osa_features_check()  (Julian Wiedmann, 1 file, -1/+1)
[ Upstream commit f9c4845385c8f6631ebd5dddfb019ea7a285fba4 ] ip_finish_output_gso() may call .ndo_features_check() even before the skb has a L2 header. This conflicts with qeth_get_ip_version()'s attempt to inspect the L2 header via vlan_eth_hdr(). Switch to vlan_get_protocol(), as already used further down in the common qeth_features_check() path. Fixes: f13ade199391 ("s390/qeth: run non-offload L3 traffic over common xmit path") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
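For context, the distinction matters because .ndo_features_check() can run before the L2 header exists; vlan_get_protocol() derives the protocol from skb->protocol (honouring a VLAN tag) instead of reading the Ethernet header. A rough sketch follows, with an illustrative policy that is not the actual qeth logic:

    #include <linux/if_vlan.h>
    #include <linux/netdevice.h>

    static netdev_features_t features_check_sketch(struct sk_buff *skb,
                                                   struct net_device *dev,
                                                   netdev_features_t features)
    {
        __be16 proto = vlan_get_protocol(skb);    /* safe even without an L2 header */

        if (proto != htons(ETH_P_IP) && proto != htons(ETH_P_IPV6))
            features &= ~NETIF_F_GSO_MASK;        /* illustrative policy only */

        return features;
    }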
2021-01-17  s390/qeth: fix locking for discipline setup / removal  (Julian Wiedmann, 3 files, -10/+7)
[ Upstream commit b41b554c1ee75070a14c02a88496b1f231c7eacc ] Due to insufficient locking, qeth_core_set_online() and qeth_dev_layer2_store() can run in parallel, both attempting to load & set up the discipline (and stepping on each other's toes along the way). A similar race can also occur between qeth_core_remove_device() and qeth_dev_layer2_store(). Access to .discipline is meant to be protected by the discipline_mutex, so add/expand the locking in qeth_core_remove_device() and qeth_core_set_online(). Adjust the locking in qeth_l*_remove_device() accordingly, as it's now handled by the callers in a consistent manner. Based on an initial patch by Ursula Braun. Fixes: 9dc48ccc68b9 ("qeth: serialize sysfs-triggered device configurations") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17  s390/qeth: fix deadlock during recovery  (Julian Wiedmann, 4 files, -18/+34)
[ Upstream commit 0b9902c1fcc59ba75268386c0420a554f8844168 ] When qeth_dev_layer2_store() - holding the discipline_mutex - waits inside qeth_l*_remove_device() for a qeth_do_reset() thread to complete, we can hit a deadlock if qeth_do_reset() concurrently calls qeth_set_online() and thus tries to acquire the discipline_mutex. Move the discipline_mutex locking outside of qeth_set_online() and qeth_set_offline(), and turn the discipline into a parameter so that callers understand the dependency. To fix the deadlock, we can now relax the locking: As already established, qeth_l*_remove_device() waits for qeth_do_reset() to complete. So qeth_do_reset() itself is under no risk of having card->discipline ripped out while it's running, and thus doesn't need to take the discipline_mutex. Fixes: 9dc48ccc68b9 ("qeth: serialize sysfs-triggered device configurations") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30  s390/dasd: fix list corruption of lcu list  (Stefan Haberland, 1 file, -1/+1)
commit 53a7f655834c7c335bf683f248208d4fbe4b47bc upstream. In dasd_alias_disconnect_device_from_lcu the device is removed from any list on the LCU. Afterwards the LCU is removed from the lcu list if it does not contain devices any longer. The lcu->lock protects the lcu from parallel updates. But to cancel all workers and wait for completion, the lcu->lock has to be unlocked. If two devices are removed in parallel and both are removed from the LCU, the first device that takes the lcu->lock again will delete the LCU because it is already empty, but the second device also tries to free the LCU, which leads to a list corruption of the lcu list. Fix by removing the device right before the lcu is checked, without unlocking the lcu->lock in between. Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
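The essence of the fix is that removing the device and testing whether the LCU became empty happen in one critical section, as in this simplified sketch (hypothetical names, not the actual dasd_alias code):

    #include <linux/list.h>
    #include <linux/spinlock.h>

    struct lcu {
        spinlock_t lock;
        struct list_head devices;
    };

    /* returns true if the caller should also clean up the now-empty LCU */
    static bool lcu_remove_device(struct lcu *lcu, struct list_head *dev_entry)
    {
        bool empty;

        spin_lock(&lcu->lock);
        list_del_init(dev_entry);               /* take the device off the LCU ...  */
        empty = list_empty(&lcu->devices);      /* ... and check emptiness without  */
        spin_unlock(&lcu->lock);                /*     dropping the lock in between */

        return empty;
    }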
2020-12-30  s390/dasd: fix list corruption of pavgroup group list  (Stefan Haberland, 1 file, -0/+1)
commit 0ede91f83aa335da1c3ec68eb0f9e228f269f6d8 upstream. dasd_alias_add_device() moves devices to the active_devices list in case of a scheduled LCU update, regardless of whether they have previously been in a pavgroup or not. Example: devices A and B are in the same pavgroup. Device A has already been in a pavgroup, so the private->pavgroup pointer is set and points to a valid pavgroup. While going through dasd_add_device it is moved from the pavgroup to the active_devices list. In parallel, device B might be removed from the same pavgroup in remove_device_from_lcu(), which in turn checks if the group is empty and deletes it accordingly, because device A has already been removed from there. When device A now enters remove_device_from_lcu(), it tries to remove itself from the pavgroup again because the pavgroup pointer is still set, and the empty group is cleaned up once more, which leads to a list corruption. Fix by setting private->pavgroup to NULL in dasd_add_device. If the device has been the last device in the pavgroup, an empty pavgroup remains, but this will be cleaned up by the scheduled lcu_update, which iterates over all existing pavgroups. Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30  s390/dasd: prevent inconsistent LCU device data  (Stefan Haberland, 1 file, -0/+9)
commit a29ea01653493b94ea12bb2b89d1564a265081b6 upstream. Prevent _lcu_update from adding a device to a pavgroup if the LCU still requires an update. The data is not reliable any longer and in parallel devices might have been moved on the lists already. This might lead to list corruptions or invalid PAV grouping. Only add devices to a pavgroup if the LCU is up to date. Additional steps are taken by the scheduled lcu update. Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30  s390/dasd: fix hanging device offline processing  (Stefan Haberland, 1 file, -1/+9)
commit 658a337a606f48b7ebe451591f7681d383fa115e upstream. For an LCU update a read unit address configuration IO is required. This is started using sleep_on(), which has early exit paths in case the device is not usable for IO. For example when it is in offline processing. In those cases the LCU update should fail and not be retried. Therefore lcu_update_work checks if EOPNOTSUPP is returned or not. Commit 41995342b40c ("s390/dasd: fix endless loop after read unit address configuration") accidentally removed the EOPNOTSUPP return code from read_unit_address_configuration(), which in turn might lead to an endless loop of the LCU update in offline processing. Fix by returning EOPNOTSUPP again if the device is not able to perform the request. Fixes: 41995342b40c ("s390/dasd: fix endless loop after read unit address configuration") Cc: stable@vger.kernel.org #5.3 Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30  s390/cio: fix use-after-free in ccw_device_destroy_console  (Qinglang Miao, 1 file, -2/+2)
[ Upstream commit 14d4c4fa46eeaa3922e8e1c4aa727eb0a1412804 ] Using the sch->dev reference after the put_device() call could trigger use-after-free bugs. Fix this by simply adjusting the position of the put_device() call. Fixes: 37db8985b211 ("s390/cio: add basic protected virtualization support") Reported-by: Hulk Robot <hulkci@huawei.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> [vneethv@linux.ibm.com: Slight modification in the commit-message] Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
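The ordering rule behind this fix, as a generic sketch (hypothetical struct, not the actual cio code): everything that touches the embedded struct device must happen before the final put_device(), since that call may free the enclosing object.

    #include <linux/device.h>

    struct subchannel_like {
        struct device dev;
        /* ... */
    };

    static void destroy_console_like(struct subchannel_like *sch)
    {
        dev_info(&sch->dev, "tearing down console subchannel\n"); /* use first ... */

        put_device(&sch->dev);  /* ... drop the reference last; sch may be freed here */
    }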
2020-11-27  Merge tag 'net-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds, 3 files, -47/+62)
Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.10-rc6, including fixes from the WiFi driver, and CAN subtrees.

  Current release - regressions:
   - gro_cells: reduce number of synchronize_net() calls
   - ch_ktls: release a lock before jumping to an error path

  Current release - always broken:
   - tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header

  Previous release - regressions:
   - net/tls: fix missing received data after fast remote close
   - vsock/virtio: discard packets only when socket is really closed
   - sock: set sk_err to ee_errno on dequeue from errq
   - cxgb4: fix the panic caused by non smac rewrite

  Previous release - always broken:
   - tcp: fix corner cases around setting ECN with BPF selection of congestion control
   - tcp: fix race condition when creating child sockets from syncookies on loopback interface
   - usbnet: ipheth: fix connectivity with iOS 14
   - tun: honor IOCB_NOWAIT flag
   - net/packet: fix packet receive on L3 devices without visible hard header
   - devlink: Make sure devlink instance and port are in same net namespace
   - net: openvswitch: fix TTL decrement action netlink message format
   - bonding: wait for sysfs kobject destruction before freeing struct slave
   - net: stmmac: fix upstream patch applied to the wrong context
   - bnxt_en: fix return value and unwind in probe error paths

  Misc:
   - devlink: add extra layer of categorization to the reload stats uAPI before it's released"

* tag 'net-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
  sock: set sk_err to ee_errno on dequeue from errq
  mptcp: fix NULL ptr dereference on bad MPJ
  net: openvswitch: fix TTL decrement action netlink message format
  can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check
  can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0
  can: m_can: fix nominal bitiming tseg2 min for version >= 3.1
  can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags
  can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given
  can: gs_usb: fix endianess problem with candleLight firmware
  ch_ktls: lock is not freed
  net/tls: Protect from calling tls_dev_del for TLS RX twice
  devlink: Make sure devlink instance and port are in same net namespace
  devlink: Hold rtnl lock while reading netdev attributes
  ptp: clockmatrix: bug fix for idtcm_strverscmp
  enetc: Let the hardware auto-advance the taprio base-time of 0
  gro_cells: reduce number of synchronize_net() calls
  net: stmmac: fix incorrect merge of patch upstream
  ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init
  Documentation: netdev-FAQ: suggest how to post co-dependent series
  ibmvnic: enhance resetting status check during module exit
  ...
2020-11-20  s390/qeth: fix tear down of async TX buffers  (Julian Wiedmann, 1 file, -6/+0)
When qeth_iqd_tx_complete() detects that a TX buffer requires additional async completion via QAOB, it might fail to replace the queue entry's metadata (and ends up triggering recovery). Assume now that the device gets torn down, overruling the recovery. If the QAOB notification then arrives before the tear down has sufficiently progressed, the buffer state is changed to QETH_QDIO_BUF_HANDLED_DELAYED by qeth_qdio_handle_aob(). The tear down code calls qeth_drain_output_queue(), where qeth_cleanup_handled_pending() will then attempt to replace such a buffer _again_. If it succeeds this time, the buffer ends up dangling in its replacement's ->next_pending list ... where it will never be freed, since there's no further call to qeth_cleanup_handled_pending(). But the second attempt isn't actually needed, we can simply leave the buffer on the queue and re-use it after a potential recovery has completed. The qeth_clear_output_buffer() in qeth_drain_output_queue() will ensure that it's in a clean state again. Fixes: 72861ae792c2 ("qeth: recovery through asynchronous delivery") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20  s390/qeth: fix af_iucv notification race  (Julian Wiedmann, 2 files, -24/+58)
The two expected notification sequences are 1. TX_NOTIFY_PENDING with a subsequent TX_NOTIFY_DELAYED_*, when our TX completion code first observed the pending TX and the QAOB then completes at a later time; or 2. TX_NOTIFY_OK, when qeth_qdio_handle_aob() picked up the QAOB completion before our TX completion code even noticed that the TX was pending. But as qeth_iqd_tx_complete() and qeth_qdio_handle_aob() can run concurrently, we may end up with a race that results in a sequence of TX_NOTIFY_DELAYED_* followed by TX_NOTIFY_PENDING. Which would confuse the af_iucv code in its tracking of pending transmits. Rework the notification code, so that qeth_qdio_handle_aob() defers its notification if the TX completion code is still active. Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20  s390/qeth: make af_iucv TX notification call more robust  (Julian Wiedmann, 1 file, -1/+2)
Calling into socket code is ugly already, at least check whether we are dealing with the expected sk_family. Only looking at skb->protocol is bound to cause troubles (consider eg. af_packet). Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
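A minimal sketch of the guard this commit describes (not the exact qeth code): check the socket's sk_family before handing the skb to af_iucv-specific notification code, rather than inferring it from skb->protocol.

    #include <linux/socket.h>
    #include <linux/skbuff.h>
    #include <net/sock.h>

    static bool skb_belongs_to_af_iucv(const struct sk_buff *skb)
    {
        /* only call into af_iucv code for sockets that really are AF_IUCV */
        return skb->sk && skb->sk->sk_family == AF_IUCV;
    }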
2020-11-20  s390/qeth: Remove pnso workaround  (Alexandra Winter, 1 file, -16/+2)
Remove workaround that supported early hardware implementations of PNSO OC3. Rely on the 'enarf' feature bit instead. Fixes: fa115adff2c1 ("s390/qeth: Detect PNSO OC3 capability") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> [jwi: use logical instead of bit-wise AND] Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20  Merge tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block  (Linus Torvalds, 1 file, -0/+6)
Pull block fixes from Jens Axboe:

 - NVMe pull request from Christoph:
     - Doorbell Buffer freeing fix (Minwoo Im)
     - CSE log leak fix (Keith Busch)

 - blk-cgroup hd_struct leak fix (Christoph)

 - Flush request state fix (Ming)

 - dasd NULL deref fix (Stefan)

* tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
  s390/dasd: fix null pointer dereference for ERP requests
  blk-cgroup: fix a hd_struct leak in blkcg_fill_root_iostats
  nvme: fix memory leak freeing command effects
  nvme: directly cache command effects log
  nvme: free sq/cq dbbuf pointers when dbbuf set fails
  block: mark flush request as IDLE when it is really finished
2020-11-16  s390/dasd: fix null pointer dereference for ERP requests  (Stefan Haberland, 1 file, -0/+6)
When requeueing all requests on the device request queue to the blocklayer we might get to an ERP (error recovery) request that is a copy of an original CQR. Those requests do not have blocklayer request information or a pointer to the dasd_queue set. When trying to access those data it will lead to a null pointer dereference in dasd_requeue_all_requests(). Fix by checking if the request is an ERP request that can simply be ignored. The blocklayer request will be requeued by the original CQR that is on the device queue right behind the ERP request. Fixes: 9487cfd3430d ("s390/dasd: fix handling of internal requests") Cc: <stable@vger.kernel.org> #4.16 Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-03  s390/pkey: fix paes selftest failure with paes and pkey static build  (Harald Freudenberger, 1 file, -14/+16)
When both the paes and the pkey kernel modules are statically built into the kernel, the paes cipher selftests run before the pkey kernel module is initialized. So a static variable set in the pkey init function and used in the pkey_clr2protkey function is not initialized when the paes cipher's selftests request to call pckmo for transforming a clear key value into a protected key. This patch moves the initial setup of the static variable into the function pkey_clr2protkey. So it's possible to use the function for transforming a clear key into a protected key even before the pkey init function has been called.