summaryrefslogtreecommitdiff
path: root/drivers/scsi/hisi_sas
AgeCommit message (Collapse)AuthorFilesLines
2023-11-28scsi: hisi_sas: Set debugfs_dir pointer to NULL after removing debugfsYihang Li1-6/+7
[ Upstream commit 6de426f9276c448e2db7238911c97fb157cb23be ] If init debugfs failed during device registration due to memory allocation failure, debugfs_remove_recursive() is called, after which debugfs_dir is not set to NULL. debugfs_remove_recursive() will be called again during device removal. As a result, illegal pointer is accessed. [ 1665.467244] hisi_sas_v3_hw 0000:b4:02.0: failed to init debugfs! ... [ 1669.836708] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0 [ 1669.872669] pc : down_write+0x24/0x70 [ 1669.876315] lr : down_write+0x1c/0x70 [ 1669.879961] sp : ffff000036f53a30 [ 1669.883260] x29: ffff000036f53a30 x28: ffffa027c31549f8 [ 1669.888547] x27: ffffa027c3140000 x26: 0000000000000000 [ 1669.893834] x25: ffffa027bf37c270 x24: ffffa027bf37c270 [ 1669.899122] x23: ffff0000095406b8 x22: ffff0000095406a8 [ 1669.904408] x21: 0000000000000000 x20: ffffa027bf37c310 [ 1669.909695] x19: 00000000000000a0 x18: ffff8027dcd86f10 [ 1669.914982] x17: 0000000000000000 x16: 0000000000000000 [ 1669.920268] x15: 0000000000000000 x14: ffffa0274014f870 [ 1669.925555] x13: 0000000000000040 x12: 0000000000000228 [ 1669.930842] x11: 0000000000000020 x10: 0000000000000bb0 [ 1669.936129] x9 : ffff000036f537f0 x8 : ffff80273088ca10 [ 1669.941416] x7 : 000000000000001d x6 : 00000000ffffffff [ 1669.946702] x5 : ffff000008a36310 x4 : ffff80273088be00 [ 1669.951989] x3 : ffff000009513e90 x2 : 0000000000000000 [ 1669.957276] x1 : 00000000000000a0 x0 : ffffffff00000001 [ 1669.962563] Call trace: [ 1669.965000] down_write+0x24/0x70 [ 1669.968301] debugfs_remove_recursive+0x5c/0x1b0 [ 1669.972905] hisi_sas_debugfs_exit+0x24/0x30 [hisi_sas_main] [ 1669.978541] hisi_sas_v3_remove+0x130/0x150 [hisi_sas_v3_hw] [ 1669.984175] pci_device_remove+0x48/0xd8 [ 1669.988082] device_release_driver_internal+0x1b4/0x250 [ 1669.993282] device_release_driver+0x28/0x38 [ 1669.997534] pci_stop_bus_device+0x84/0xb8 [ 1670.001611] pci_stop_and_remove_bus_device_locked+0x24/0x40 [ 1670.007244] remove_store+0xfc/0x140 [ 1670.010802] dev_attr_store+0x44/0x60 [ 1670.014448] sysfs_kf_write+0x58/0x80 [ 1670.018095] kernfs_fop_write+0xe8/0x1f0 [ 1670.022000] __vfs_write+0x60/0x190 [ 1670.025472] vfs_write+0xac/0x1c0 [ 1670.028771] ksys_write+0x6c/0xd8 [ 1670.032071] __arm64_sys_write+0x24/0x30 [ 1670.035977] el0_svc_common+0x78/0x130 [ 1670.039710] el0_svc_handler+0x38/0x78 [ 1670.043442] el0_svc+0x8/0xc To fix this, set debugfs_dir to NULL after debugfs_remove_recursive(). Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1694571327-78697-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13scsi: hisi_sas: Fix normally completed I/O analysed as failedXingui Yang2-4/+13
[ Upstream commit f5393a5602cacfda2014e0ff8220e5a7564e7cd1 ] The PIO read command has no response frame and the struct iu[1024] won't be filled. I/Os which are normally completed will be treated as failed in sas_ata_task_done() when iu contains abnormal dirty data. Consequently ending_fis should not be filled by iu when the response frame hasn't been written to memory. Fixes: d380f55503ed ("scsi: hisi_sas: Don't bother clearing status buffer IU in task prep") Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1689045300-44318-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13scsi: hisi_sas: Fix warnings detected by sparseXingui Yang1-3/+4
[ Upstream commit c0328cc595124579328462fc45d7a29a084cf357 ] This patch fixes the following warning: drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:2168:43: sparse: sparse: restricted __le32 degrades to integer Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304161254.NztCVZIO-lkp@intel.com/ Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1684118481-95908-4-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: f5393a5602ca ("scsi: hisi_sas: Fix normally completed I/O analysed as failed") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11scsi: hisi_sas: Handle NCQ error when IPTT is validXingui Yang3-3/+15
[ Upstream commit bb544224da77b96b2c11a13872bf91ede1e015be ] If an NCQ error occurs when the IPTT is valid and slot->abort flag is set in completion path, sas_task_abort() will be called to abort only one NCQ command now, and the host would be set to SHOST_RECOVERY state. But this may not kick-off EH Immediately until other outstanding QCs timeouts. As a result, the host may remain in the SHOST_RECOVERY state for up to 30 seconds, such as follows: [7972317.645234] hisi_sas_v3_hw 0000:74:04.0: erroneous completion iptt=3264 task=00000000466116b8 dev id=2 sas_addr=0x5000000000000502 CQ hdr: 0x1883 0x20cc0 0x40000 0x20420000 Error info: 0x0 0x0 0x200000 0x0 [7972341.508264] sas: Enter sas_scsi_recover_host busy: 32 failed: 32 [7972341.984731] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 32 tries: 1 All NCQ commands that are in the queue should be aborted when an NCQ error occurs in this scenario. Fixes: 05d91b557af9 ("scsi: hisi_sas: Directly trigger SCSI error handling for completion errors") Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1679283265-115066-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-30scsi: hisi_sas: Check devm_add_action() return valueKang Chen1-2/+1
[ Upstream commit 06d1a90de60208054cca15ef200138cfdbb642a9 ] In case devm_add_action() fails, check it in the caller of interrupt_preinit_v3_hw(). Link: https://lore.kernel.org/r/20230227031030.893324-1-void0red@gmail.com Signed-off-by: Kang Chen <void0red@gmail.com> Acked-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-25scsi: hisi_sas: Fix SATA devices missing issue during I_T nexus resetJie Zhan1-3/+5
[ Upstream commit 3c2673a09cf1181318c07b7dbc1bc532ba3d33e3 ] SATA devices on an expander may be removed and not be found again when I_T nexus reset and revalidation are processed simultaneously. The issue comes from: - Revalidation can remove SATA devices in link reset, e.g. in hisi_sas_clear_nexus_ha(). - However, hisi_sas_debug_I_T_nexus_reset() polls the state of a SATA device on an expander after sending link_reset, where it calls: hisi_sas_debug_I_T_nexus_reset sas_ata_wait_after_reset ata_wait_after_reset ata_wait_ready smp_ata_check_ready sas_ex_phy_discover sas_ex_phy_discover_helper sas_set_ex_phy The ex_phy's change count is updated in sas_set_ex_phy(), so SATA devices after a link reset may not be found later through revalidation. A similar issue was reported in: commit 0f3fce5cc77e ("[SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready") commit 87c8331fcf72 ("[SCSI] libsas: prevent domain rediscovery competing with ata error handling"). To address this issue, in hisi_sas_debug_I_T_nexus_reset(), we now call smp_ata_check_ready_type() that only polls the device type while not updating the ex_phy's data of libsas. Fixes: 71453bd9d1bf ("scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset") Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Link: https://lore.kernel.org/r/20221118083714.4034612-5-zhanjie9@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01scsi: hisi_sas: Set a port invalid only if there are no devices attached ↵Yihang Li1-1/+1
when refreshing port id [ Upstream commit f58c89700630da6554b24fd3df293a24874c10c1 ] Currently the driver sets the port invalid if one phy in the port is not enabled, which may cause issues in expander situation. In directly attached situation, if phy up doesn't occur in time when refreshing port id, the port is incorrectly set to invalid which will also cause disk lost. Therefore set a port invalid only if there are no devices attached to the port. Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1672805000-141102-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01scsi: hisi_sas: Use abort task set to reset SAS disks when discoveredXingui Yang1-1/+1
[ Upstream commit 037b48057e8b485a8d72f808122796aeadbbee32 ] Currently clear task set is used to abort all commands remaining in the disk when the SAS disk is discovered, and if the disk is discovered by two initiators, other I_T nexuses are also affected. So use abort task set instead and take effect only on the specified I_T nexus. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1672805000-141102-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-07Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds5-25/+38
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (qla2xxx, lpfc, ufs, hisi_sas, mpi3mr, mpt3sas, target). The biggest change (from my biased viewpoint) being that the mpi3mr now attached to the SAS transport class, making it the first fusion type device to do so. Beyond the usual bug fixing and security class reworks, there aren't a huge number of core changes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits) scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername() scsi: mpi3mr: Remove unnecessary cast scsi: stex: Properly zero out the passthrough command structure scsi: mpi3mr: Update driver version to 8.2.0.3.0 scsi: mpi3mr: Fix scheduling while atomic type bug scsi: mpi3mr: Scan the devices during resume time scsi: mpi3mr: Free enclosure objects during driver unload scsi: mpi3mr: Handle 0xF003 Fault Code scsi: mpi3mr: Graceful handling of surprise removal of PCIe HBA scsi: mpi3mr: Schedule IRQ kthreads only on non-RT kernels scsi: mpi3mr: Support new power management framework scsi: mpi3mr: Update mpi3 header files scsi: mpt3sas: Revert "scsi: mpt3sas: Fix ioc->base_readl() use" scsi: mpt3sas: Revert "scsi: mpt3sas: Fix writel() use" scsi: wd33c93: Remove dead code related to the long-gone config WD33C93_PIO scsi: core: Add I/O timeout count for SCSI device scsi: qedf: Populate sysfs attributes for vport scsi: pm8001: Replace one-element array with flexible-array member scsi: 3w-xxxx: Replace one-element array with flexible-array member scsi: hptiop: Replace one-element array with flexible-array member in struct hpt_iop_request_ioctl_command() ...
2022-09-06scsi: hisi_sas: Don't send bcast events from HW during nexus HA resetJohn Garry1-4/+12
Remote devices may go missing from the per-device nexus reset part of the HA nexus, i.e after the controller reset. This is because libsas may find the devices to be gone as the phy may be temporarily down when processing the bcast event generated from the nexus reset. Filter out bcast events during this time to stop the devices being lost. Link: https://lore.kernel.org/r/1662378529-101489-6-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: hisi_sas: Add helper to process bcast eventsJohn Garry5-13/+18
Add a helper for bcast processing to reduce duplication. Link: https://lore.kernel.org/r/1662378529-101489-5-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: hisi_sas: Drain bcast events in hisi_sas_rescan_topology()John Garry1-0/+7
In resetting the controller, SATA devices may be lost. The issue is that when we insert the bcast events to rescan the topology in hisi_sas_rescan_topology(), when we subsequently nexus reset the SATA devices in hisi_sas_async_I_T_nexus_reset(), there is a small timing window in which the remote phy is down and we process the bcast event (meaning that libsas judges that the disk is lost). Ensure that all bcast events are processed prior to the nexus reset to close this window. Link: https://lore.kernel.org/r/1662378529-101489-4-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: hisi_sas: Clear HISI_SAS_HW_FAULT_BIT earlierJohn Garry1-1/+1
Once the controller HW has been reset then we can unset flag HISI_SAS_HW_FAULT_BIT. In clearing this flag earlier we can now successfully execute commands in hisi_sas_controller_reset_done(), like bcast processing. Link: https://lore.kernel.org/r/1662378529-101489-3-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: hisi_sas: Revert change to limit max hw sectors for v3 HWJohn Garry1-7/+0
Now that libsas and the SCSI core code limits the default sectors from commit 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit") and commit 608128d391fa ("scsi: sd: allow max_sectors be capped at DMA optimal size limit"), there is no need for the hack to limit the max HW sectors. Link: https://lore.kernel.org/r/1662378529-101489-2-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-08-22block: Change the return type of blk_mq_map_queues() into voidBart Van Assche2-7/+3
Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Wang <jasowang@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-18scsi: hisi_sas: Modify v3 HW SATA completion error processingXingui Yang1-1/+8
If the I/O completion response frame returned by the target device has been written to the host memory and the err bit in the status field of the received fis is 1, ts->stat should set to SAS_PROTO_RESPONSE, and this will let EH analyze and further determine cause of failure. Link: https://lore.kernel.org/r/1657823002-139010-5-git-send-email-john.garry@huawei.com Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-18scsi: hisi_sas: Relocate DMA unmap of SMP taskXiang Chen4-7/+5
Currently SMP tasks are DMA unmapped only when cq of SMP I/O is returned normally. If the cq of SMP I/O is returned with exception actually SMP TAS is never unmapped. Relocate DMA unmap of SMP task to fix the issue. Link: https://lore.kernel.org/r/1657823002-139010-4-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-18scsi: hisi_sas: Remove unnecessary variable to hold DMA map elementsXiang Chen1-25/+18
Use slot->n_elem to store the return value of dma_map_sg() for SSP and SMP IOs, and remove unnecessary variable n_elem_req. Link: https://lore.kernel.org/r/1657823002-139010-3-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-18scsi: hisi_sas: Call hisi_sas_slave_configure() from slave_configure_v3_hw()John Garry1-4/+1
There is duplicated code between slave_configure_v3_hw() and hisi_sas_slave_configure(), so call common function hisi_sas_slave_configure() from slave_configure_v3_hw(). Link: https://lore.kernel.org/r/1657823002-139010-2-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-07Merge branch '5.19/scsi-fixes' into 5.20/scsi-stagingMartin K. Petersen1-0/+7
Bring in fixes to resolve a merge conflict in the lpfc driver update. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-27scsi: hisi_sas: Limit max hw sectors for v3 HWJohn Garry1-0/+7
If the controller is behind an IOMMU then the IOMMU IOVA caching range can affect performance, as discussed in [0]. Limit the max HW sectors to not exceed this limit. We need to hardcode the value until a proper DMA mapping API is available. [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/ Link: https://lore.kernel.org/r/1655988119-223714-1-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-21scsi: hisi_sas: Align commentsJiang Jian1-2/+2
Properly align comment lines in slot_index_alloc_quirk_v2_hw(). Link: https://lore.kernel.org/r/20220621072405.34394-1-jiangjian@cdjrlc.com Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-19scsi: hisi_sas: Fix memory ordering in hisi_sas_task_deliver()John Garry1-0/+2
The memories for the slot should be observed to be written prior to observing the slot as ready. Prior to commit 26fc0ea74fcb ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR"), we had a spin_lock() + spin_unlock() immediately before marking the slot as ready. The spin_unlock() - with release semantics - caused the slot memory to be observed to be written. Now that the spin_lock() + spin_unlock() is gone, use a smp_wmb(). Link: https://lore.kernel.org/r/1652774661-12935-1-git-send-email-john.garry@huawei.com Fixes: 26fc0ea74fcb ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR") Reported-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Yihang Li <liyihang6@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-19scsi: hisi_sas: Fix rescan after deleting a diskJohn Garry1-29/+18
Removing an ATA device via sysfs means that the device may not be found through re-scanning: root@ubuntu:/home/john# lsscsi [0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda [0:0:1:0] disk ATA HGST HUS724040AL A8B0 /dev/sdb [0:0:8:0] enclosu 12G SAS Expander RevB - root@ubuntu:/home/john# echo 1 > /sys/block/sdb/device/delete root@ubuntu:/home/john# echo "- - -" > /sys/class/scsi_host/host0/scan root@ubuntu:/home/john# lsscsi [0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda [0:0:8:0] enclosu 12G SAS Expander RevB - root@ubuntu:/home/john# The problem is that the rescan of the device may conflict with the device in being re-initialized, as follows: - In the rescan we call hisi_sas_slave_alloc() in store_scan() -> sas_user_scan() -> [__]scsi_scan_target() -> scsi_probe_and_add_lunc() -> scsi_alloc_sdev() -> hisi_sas_slave_alloc() -> hisi_sas_init_device() In hisi_sas_init_device() we issue an IT nexus reset for ATA devices - That IT nexus causes the remote PHY to go down and this triggers a bcast event - In parallel libsas processes the bcast event, finds that the phy is down and marks the device as gone The hard reset issued in hisi_sas_init_device() is unncessary - as described in the code comment - so remove it. Also set dev status as HISI_SAS_DEV_NORMAL as the hisi_sas_init_device() call. Link: https://lore.kernel.org/r/1652354134-171343-4-git-send-email-john.garry@huawei.com Fixes: 36c6b7613ef1 ("scsi: hisi_sas: Initialise devices in .slave_alloc callback") Tested-by: Yihang Li <liyihang6@hisilicon.com> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-19scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus resetJohn Garry1-7/+12
We have seen errors like this when a SATA device is probed: [524.566298] hisi_sas_v3_hw 0000L74:02.0: erroneous completion iptt=4096 ... [524.582827] sas: TMF task open reject failed 500e004aaaaaaaa00 Since commit 21c7e972475e ("scsi: hisi_sas: Disable SATA disk phy for severe I_T nexus reset failure"), we issue an ATA softreset to disks after a phy reset to ensure that they are in sound working order. If the softreset is issued before the remote phy has come back up then the softreset will fail (errors as above). Remedy this by waiting for the phy to come back up after the reset. Link: https://lore.kernel.org/r/1652354134-171343-3-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-10scsi: hisi_sas: Undo RPM resume for failed notify phy event for v3 HWXiang Chen1-2/+8
If we fail to notify the phy up event then undo the RPM resume, as the phy up notify event handling pairs with that RPM resume. Link: https://lore.kernel.org/r/1651839939-101188-1-git-send-email-john.garry@huawei.com Reported-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Yihang Li <liyihang6@hisilicon.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29scsi: hisi_sas: Remove stray fallthrough annotationDan Carpenter1-1/+0
This case statement doesn't fall through any more so remove the fallthrough annotation. Link: https://lore.kernel.org/r/20220317075214.GC25237@kili Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-14scsi: hisi_sas: Use libsas internal abort supportJohn Garry4-319/+171
Use the common libsas internal abort functionality. In addition, this driver has special handling for internal abort timeouts - specifically whether to reset the controller in that instance, so extend the API for that. Timeout is now increased to 20 * Hz from 6 * Hz. We also retry for failure now, but this should not make a difference. Link: https://lore.kernel.org/r/1647001432-239276-5-git-send-email-john.garry@huawei.com Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27scsi: hisi_sas: Modify v3 HW SSP underflow error processingXingui Yang1-13/+26
In case of SSP underflow allow the response frame IU to be examined for setting the response stat value rather than always setting SAS_DATA_UNDERRUN. This will mean that we call sas_ssp_task_response() in those scenarios and may send sense data to upper layer. Such a condition would be for bad blocks were we just reporting an underflow error to upper layer, but now the sense data will tell immediately that the media is faulty. Link: https://lore.kernel.org/r/1645703489-87194-7-git-send-email-john.garry@huawei.com Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27scsi: hisi_sas: Limit users changing debugfs BIST count valueXiang Chen1-2/+50
Add a file operation for "cnt" file under bist directory, so users can only read "cnt" or clear "cnt" to zero, but cannot randomly modify. Link: https://lore.kernel.org/r/1645703489-87194-6-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27scsi: hisi_sas: Rename error labels in hisi_sas_v3_probe()Qi Liu1-10/+10
To avoid doubt, rename the error labels to indicate the action they will take. Link: https://lore.kernel.org/r/1645703489-87194-5-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27scsi: hisi_sas: Free irq vectors in order for v3 HWQi Liu1-5/+11
If the driver probe fails to request the channel IRQ or fatal IRQ, the driver will free the IRQ vectors before freeing the IRQs in free_irq(), and this will cause a kernel BUG like this: ------------[ cut here ]------------ kernel BUG at drivers/pci/msi.c:369! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Call trace: free_msi_irqs+0x118/0x13c pci_disable_msi+0xfc/0x120 pci_free_irq_vectors+0x24/0x3c hisi_sas_v3_probe+0x360/0x9d0 [hisi_sas_v3_hw] local_pci_probe+0x44/0xb0 work_for_cpu_fn+0x20/0x34 process_one_work+0x1d0/0x340 worker_thread+0x2e0/0x460 kthread+0x180/0x190 ret_from_fork+0x10/0x20 ---[ end trace b88990335b610c11 ]--- So we use devm_add_action() to control the order in which we free the vectors. Link: https://lore.kernel.org/r/1645703489-87194-4-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27scsi: hisi_sas: Change hisi_sas_control_phy() phyup timeoutXiang Chen2-2/+3
The time of phyup not only depends on the controller but also the type of disk connected. As an example, from experience, for some SATA disks the amount of time from reset/power-on to receive the D2H FIS for phyup can take upto and more than 10s sometimes. According to the specification of some SATA disks such as ST14000NM0018, the max time from power-on to ready is 30s. Based on this the current timeout of phyup at 2s which is not enough. So set the value as HISI_SAS_WAIT_PHYUP_TIMEOUT (30s) in hisi_sas_control_phy(). For v3 hw there is a pre-existing workaround for a HW bug, being that we issue a link reset when the OOB occurs but the phyup does not. The current phyup timeout is HISI_SAS_WAIT_PHYUP_TIMEOUT. So if this does occur from when issuing a phy enable or similar via hisi_sas_control_phy(), the subsequent HW workaround linkreset processing calls hisi_sas_control_phy(), but this will pend the original phy reset timing out, so it is safe. Link: https://lore.kernel.org/r/1645703489-87194-3-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27scsi: hisi_sas: Change permission of parameter prot_maskXiang Chen1-1/+1
Currently the permission of parameter prot_mask is 0x0, which means that the member does not appear in sysfs. Change it as other module parameters to 0444 for world-readable. [mkp: s/v3/v2/] Link: https://lore.kernel.org/r/1645703489-87194-2-git-send-email-john.garry@huawei.com Fixes: d6a9000b81be ("scsi: hisi_sas: Add support for DIF feature for v2 hw") Reported-by: Yihang Li <liyihang6@hisilicon.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22scsi: hisi_sas: Remove unnecessary print function dev_err()Yang Li1-12/+3
The print function dev_err() is redundant because platform_get_irq() already prints an error. Eliminate the follow coccicheck warnings: ./drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:1661:3-10: line 1661 is redundant because platform_get_irq() already prints an error ./drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:1642:4-11: line 1642 is redundant because platform_get_irq() already prints an error ./drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:1679:3-10: line 1679 is redundant because platform_get_irq() already prints an error Link: https://lore.kernel.org/r/20220215020524.44268-1-yang.lee@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22scsi: libsas: Add sas_execute_ata_cmd()John Garry2-129/+6
Add a function to execute an ATA command using the TMF code, and use in the hisi_sas driver. That driver needs to be able to issue the command on a specific phy, so add an interface for that. With that, hisi_sas_exec_internal_tmf_task() may be deleted. Link: https://lore.kernel.org/r/1645534259-27068-19-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: libsas: Add sas_abort_task()John Garry1-26/+1
Add a generic implementation of abort task TMF handler, and use in LLDDs. With that, some LLDDs custom TMF functions can now be deleted. Link: https://lore.kernel.org/r/1645112566-115804-18-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: libsas: Add sas_query_task()John Garry1-11/+1
Add a generic implementation of query task TMF handler, and use in LLDDs. Link: https://lore.kernel.org/r/1645112566-115804-17-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: libsas: Add sas_lu_reset()John Garry1-3/+1
Add a generic implementation of LU reset TMF handler, and use in LLDDs. Link: https://lore.kernel.org/r/1645112566-115804-16-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: libsas: Add sas_clear_task_set()John Garry1-4/+1
Add a generic implementation of clear task set TMF handler, and use in LLDDs. Link: https://lore.kernel.org/r/1645112566-115804-15-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: libsas: Add sas_abort_task_set()John Garry1-4/+1
Add a generic implementation of abort task set TMF handler, and use in LLDDs. Link: https://lore.kernel.org/r/1645112566-115804-14-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: libsas: Add TMF handler aborted callbackJohn Garry1-0/+20
The hisi_sas and pm8001 TMF handlers have some special processing for when the TMF is aborted, so add a callback and fill it in for those drivers. Link: https://lore.kernel.org/r/1645112566-115804-13-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: libsas: Add sas_task.tmfJohn Garry1-10/+6
Add a pointer to a sas_tmf_task to the sas_task struct, as this will be used when the common LLDD TMF code is factored out. Also set it for the LLDDs to store per-sas_task TMF info. Link: https://lore.kernel.org/r/1645112566-115804-9-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: libsas: Add struct sas_tmf_taskJohn Garry5-23/+16
Some of the LLDDs which use libsas have their own definition of a struct to hold TMF info, so add a common struct for libsas. Also add an interim force phy id field for hisi_sas driver, which will be removed once the STP "TMF" code is factored out. Even though some LLDDs (pm8001) use a u32 for the tag, u16 will be adequate, as that named driver only uses tags in range [0, 1024). Link: https://lore.kernel.org/r/1645112566-115804-8-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: hisi_sas: Delete unused I_T_NEXUS_RESET_PHYUP_TIMEOUTJohn Garry1-2/+0
There is no user, so delete it. Link: https://lore.kernel.org/r/1645112566-115804-6-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19scsi: libsas: Delete lldd_clear_aca callbackJohn Garry1-12/+0
This callback is never called, so remove support. Link: https://lore.kernel.org/r/1645112566-115804-4-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-14Merge branch '5.17/scsi-fixes' into 5.18/scsi-stagingMartin K. Petersen2-13/+6
Pull 5.17 fixes branch into 5.18 tree to resolve a few pm8001 driver merge conflicts. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-11scsi: libsas: Drop SAS_TASK_AT_INITIATORJohn Garry4-13/+4
This flag is now only ever set, so delete it. This also avoids a use-after-free in the pm8001 queue path, as reported in the following: https://lore.kernel.org/linux-scsi/c3cb7228-254e-9584-182b-007ac5e6fe0a@huawei.com/T/#m28c94c6d3ff582ec4a9fa54819180740e8bd4cfb https://lore.kernel.org/linux-scsi/0cc0c435-b4f2-9c76-258d-865ba50a29dd@huawei.com/ [mkp: checkpatch + two SAS_TASK_AT_INITIATOR references] Link: https://lore.kernel.org/r/1644489804-85730-3-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internalJohn Garry1-8/+6
The hisi_sas_slot.is_internal member is not set properly for ATA commands which the driver sends directly. A TMF struct pointer is normally used as a test to set this, but it is NULL for those commands. It's not ideal, but pass an empty TMF struct to set that member properly. Link: https://lore.kernel.org/r/1643627607-138785-1-git-send-email-john.garry@huawei.com Fixes: dc313f6b125b ("scsi: hisi_sas: Factor out task prep and delivery code") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24scsi: hisi_sas: Remove useless DMA-32 fallback configurationChristophe JAILLET2-5/+0
As stated in [1], dma_set_mask() with a 64-bit mask never fails if dev->dma_mask is non-NULL. So, if it fails, the 32-bit case will also fail for the same reason. Simplify code and remove some dead code accordingly. [1]: https://lore.kernel.org/linux-kernel/YL3vSPK5DXTNvgdx@infradead.org/#t Link: https://lore.kernel.org/r/1bf2d3660178b0e6f172e5208bc0bd68d31d9268.1642237482.git.christophe.jaillet@wanadoo.fr Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>