summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)AuthorFilesLines
2024-11-08scsi: scsi_transport_fc: Allow setting rport state to current stateBenjamin Marzinski1-2/+2
[ Upstream commit d539a871ae47a1f27a609a62e06093fa69d7ce99 ] The only input fc_rport_set_marginal_state() currently accepts is "Marginal" when port_state is "Online", and "Online" when the port_state is "Marginal". It should also allow setting port_state to its current state, either "Marginal or "Online". Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Link: https://lore.kernel.org/r/20240917230643.966768-1-bmarzins@redhat.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08scsi: scsi_debug: Fix do_device_access() handling of unexpected SG copy lengthJohn Garry1-6/+4
[ Upstream commit d28d17a845600dd9f7de241de9b1528a1b138716 ] If the sg_copy_buffer() call returns less than sdebug_sector_size, then we drop out of the copy loop. However, we still report that we copied the full expected amount, which is not proper. Fix by keeping a running total and return that value. Fixes: 84f3a3c01d70 ("scsi: scsi_debug: Atomic write support") Reported-by: Colin Ian King <colin.i.king@gmail.com> Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241018101655.4207-1-john.g.garry@oracle.com Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-22scsi: mpi3mr: Validate SAS port assignmentsRanjan Kumar2-17/+29
commit b9e63d6c7c0e94a99e1af7c9c0c7fad13a2f2453 upstream. A sanity check on phy_mask was added in commit 3668651def2c ("scsi: mpi3mr: Sanitise num_phys"). This causes warning messages when more than 64 phys are detected and devices connected to phys greater than 64 are dropped. The phy_mask bitmap is only needed for controller phys and not required for expander phys. Controller phys can go up to a maximum of 64 and therefore u64 is good enough to contain phy_mask bitmap. To suppress those warnings and allow devices to be discovered as before the offending commit, restrict the phy_mask setting and lowest phy setting only to the controller phys. Fixes: 3668651def2c ("scsi: mpi3mr: Sanitise num_phys") Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410051943.Mp9o5DlF-lkp@intel.com/ Reported-by: Alexander Motin <mav@ixsystems.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20241008074353.200379-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17scsi: fnic: Move flush_work initialization out of if blockMartin Wilck1-1/+1
commit f30e5f77d2f205ac14d09dec40fd4bb76712f13d upstream. After commit 379a58caa199 ("scsi: fnic: Move fnic_fnic_flush_tx() to a work queue"), it can happen that a work item is sent to an uninitialized work queue. This may has the effect that the item being queued is never actually queued, and any further actions depending on it will not proceed. The following warning is observed while the fnic driver is loaded: kernel: WARNING: CPU: 11 PID: 0 at ../kernel/workqueue.c:1524 __queue_work+0x373/0x410 kernel: <IRQ> kernel: queue_work_on+0x3a/0x50 kernel: fnic_wq_copy_cmpl_handler+0x54a/0x730 [fnic 62fbff0c42e7fb825c60a55cde2fb91facb2ed24] kernel: fnic_isr_msix_wq_copy+0x2d/0x60 [fnic 62fbff0c42e7fb825c60a55cde2fb91facb2ed24] kernel: __handle_irq_event_percpu+0x36/0x1a0 kernel: handle_irq_event_percpu+0x30/0x70 kernel: handle_irq_event+0x34/0x60 kernel: handle_edge_irq+0x7e/0x1a0 kernel: __common_interrupt+0x3b/0xb0 kernel: common_interrupt+0x58/0xa0 kernel: </IRQ> It has been observed that this may break the rediscovery of Fibre Channel devices after a temporary fabric failure. This patch fixes it by moving the work queue initialization out of an if block in fnic_probe(). Signed-off-by: Martin Wilck <mwilck@suse.com> Fixes: 379a58caa199 ("scsi: fnic: Move fnic_fnic_flush_tx() to a work queue") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240930133014.71615-1-mwilck@suse.com Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17scsi: wd33c93: Don't use stale scsi_pointer valueDaniel Palmer1-1/+1
commit 9023ed8d91eb1fcc93e64dc4962f7412b1c4cbec upstream. A regression was introduced with commit dbb2da557a6a ("scsi: wd33c93: Move the SCSI pointer to private command data") which results in an oops in wd33c93_intr(). That commit added the scsi_pointer variable and initialized it from hostdata->connected. However, during selection, hostdata->connected is not yet valid. Fix this by getting the current scsi_pointer from hostdata->selecting. Cc: Daniel Palmer <daniel@0x0f.com> Cc: Michael Schmitz <schmitzmic@gmail.com> Cc: stable@kernel.org Fixes: dbb2da557a6a ("scsi: wd33c93: Move the SCSI pointer to private command data") Signed-off-by: Daniel Palmer <daniel@0x0f.com> Co-developed-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/09e11a0a54e6aa2a88bd214526d305aaf018f523.1727926187.git.fthain@linux-m68k.org Reviewed-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17scsi: lpfc: Revise TRACE_EVENT log flag severities from KERN_ERR to KERN_WARNINGJustin Tee2-71/+64
[ Upstream commit 1af9af1f8ab38f1285b27581a5e6920ec58296ba ] Revise certain log messages marked as KERN_ERR LOG_TRACE_EVENT to KERN_WARNING and use the lpfc_vlog_msg() macro to still log the event. The benefit is that events of interest are still logged and the entire trace buffer is not dumped with extraneous logging information when using default lpfc_log_verbose driver parameter settings. Also, delete the keyword "fail" from such log messages as they aren't really causes for concern. The log messages are more for warnings to a SAN admin about SAN activity. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240912232447.45607-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17scsi: lpfc: Ensure DA_ID handling completion before deleting an NPIV instanceJustin Tee3-7/+55
[ Upstream commit 0a3c84f71680684c1d41abb92db05f95c09111e8 ] Deleting an NPIV instance requires all fabric ndlps to be released before an NPIV's resources can be torn down. Failure to release fabric ndlps beforehand opens kref imbalance race conditions. Fix by forcing the DA_ID to complete synchronously with usage of wait_queue. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240912232447.45607-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17scsi: lpfc: Add ELS_RSP cmd to the list of WQEs to flush in lpfc_els_flush_cmd()Justin Tee1-3/+4
[ Upstream commit 93bcc5f3984bf4f51da1529700aec351872dbfff ] During HBA stress testing, a spam of received PLOGIs exposes a resource recovery bug causing leakage of lpfc_sqlq entries from the global phba->sli4_hba.lpfc_els_sgl_list. The issue is in lpfc_els_flush_cmd(), where the driver attempts to recover outstanding ELS sgls when walking the txcmplq. Only CMD_ELS_REQUEST64_CRs and CMD_GEN_REQUEST64_CRs are added to the abort and cancel lists. A check for CMD_XMIT_ELS_RSP64_WQE is missing in order to recover LS_ACC usages of the phba->sli4_hba.lpfc_els_sgl_list too. Fix by adding CMD_XMIT_ELS_RSP64_WQE as part of the txcmplq walk when adding WQEs to the abort and cancel list in lpfc_els_flush_cmd(). Also, update naming convention from CRs to WQEs. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240912232447.45607-2-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: NCR5380: Initialize buffer for MSG IN and STATUS transfersFinn Thain1-0/+4
[ Upstream commit 1c71065df2df693d208dd32758171c1dece66341 ] Following an incomplete transfer in MSG IN phase, the driver would not notice the problem and would make use of invalid data. Initialize 'tmp' appropriately and bail out if no message was received. For STATUS phase, preserve the existing status code unless a new value was transferred. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/52e02a8812ae1a2d810d7f9f7fd800c3ccc320c4.1723001788.git.fthain@linux-m68k.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: lpfc: Update PRLO handling in direct attached topologyJustin Tee2-13/+36
[ Upstream commit 1f0f7679ad8942f810b0f19ee9cf098c3502d66a ] A kref imbalance occurs when handling an unsolicited PRLO in direct attached topology. Rework PRLO rcv handling when in MAPPED state. Save the state that we were handling a PRLO by setting nlp_last_elscmd to ELS_CMD_PRLO. Then in the lpfc_cmpl_els_logo_acc() completion routine, manually restart discovery. By issuing the PLOGI, which nlp_gets, before nlp_put at the end of the lpfc_cmpl_els_logo_acc() routine, we are saving us from a final nlp_put. And, we are still allowing the unreg_rpi to happen. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240726231512.92867-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: lpfc: Fix unsolicited FLOGI kref imbalance when in direct attached ↵Justin Tee3-23/+46
topology [ Upstream commit b5c18c9dd138733c16893613345af44deadcf05e ] In direct attached topology, certain target vendors that are quick to issue FLOGI followed by a cable pull for more than dev_loss_tmo may result in a kref imbalance for the remote port ndlp object. Add an nlp_get when the defer_flogi_acc flag is set. This is expected to balance the nlp_put in the defer_flogi_acc clause in the lpfc_issue_els_flogi() routine. Because we need to retain the ndlp ptr, reorganize all of the defer_flogi_acc information into one lpfc_defer_flogi_acc struct. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240726231512.92867-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: lpfc: Validate hdwq pointers before dereferencing in reset/errata pathsJustin Tee3-3/+24
[ Upstream commit 2be1d4f11944cd6283cb97268b3e17c4424945ca ] When the HBA is undergoing a reset or is handling an errata event, NULL ptr dereference crashes may occur in routines such as lpfc_sli_flush_io_rings(), lpfc_dev_loss_tmo_callbk(), or lpfc_abort_handler(). Add NULL ptr checks before dereferencing hdwq pointers that may have been freed due to operations colliding with a reset or errata event handler. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240726231512.92867-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: aacraid: Rearrange order of struct aac_srb_unitKees Cook1-1/+1
[ Upstream commit 6e5860b0ad4934baee8c7a202c02033b2631bb44 ] struct aac_srb_unit contains struct aac_srb, which contains struct sgmap, which ends in a (currently) "fake" (1-element) flexible array. Converting this to a flexible array is needed so that runtime bounds checking won't think the array is fixed size (i.e. under CONFIG_FORTIFY_SOURCE=y and/or CONFIG_UBSAN_BOUNDS=y), as other parts of aacraid use struct sgmap as a flexible array. It is not legal to have a flexible array in the middle of a structure, so it either needs to be split up or rearranged so that it is at the end of the structure. Luckily, struct aac_srb_unit, which is exclusively consumed/updated by aac_send_safw_bmic_cmd(), does not depend on member ordering. The values set in the on-stack struct aac_srb_unit instance "srbu" by the only two callers, aac_issue_safw_bmic_identify() and aac_get_safw_ciss_luns(), do not contain anything in srbu.srb.sgmap.sg, and they both implicitly initialize srbu.srb.sgmap.count to 0 during memset(). For example: memset(&srbu, 0, sizeof(struct aac_srb_unit)); srbcmd = &srbu.srb; srbcmd->flags = cpu_to_le32(SRB_DataIn); srbcmd->cdb[0] = CISS_REPORT_PHYSICAL_LUNS; srbcmd->cdb[1] = 2; /* extended reporting */ srbcmd->cdb[8] = (u8)(datasize >> 8); srbcmd->cdb[9] = (u8)(datasize); rcode = aac_send_safw_bmic_cmd(dev, &srbu, phys_luns, datasize); During aac_send_safw_bmic_cmd(), a separate srb is mapped into DMA, and has srbu.srb copied into it: srb = fib_data(fibptr); memcpy(srb, &srbu->srb, sizeof(struct aac_srb)); Only then is srb.sgmap.count written and srb->sg populated: srb->count = cpu_to_le32(xfer_len); sg64 = (struct sgmap64 *)&srb->sg; sg64->count = cpu_to_le32(1); sg64->sg[0].addr[1] = cpu_to_le32(upper_32_bits(addr)); sg64->sg[0].addr[0] = cpu_to_le32(lower_32_bits(addr)); sg64->sg[0].count = cpu_to_le32(xfer_len); But this is happening in the DMA memory, not in srbu.srb. An attempt to copy the changes back to srbu does happen: /* * Copy the updated data for other dumping or other usage if * needed */ memcpy(&srbu->srb, srb, sizeof(struct aac_srb)); But this was never correct: the sg64 (3 u32s) overlap of srb.sg (2 u32s) always meant that srbu.srb would have held truncated information and any attempt to walk srbu.srb.sg.sg based on the value of srbu.srb.sg.count would result in attempting to parse past the end of srbu.srb.sg.sg[0] into srbu.srb_reply. After getting a reply from hardware, the reply is copied into srbu.srb_reply: srb_reply = (struct aac_srb_reply *)fib_data(fibptr); memcpy(&srbu->srb_reply, srb_reply, sizeof(struct aac_srb_reply)); This has always been fixed-size, so there's no issue here. It is worth noting that the two callers _never check_ srbu contents -- neither srbu.srb nor srbu.srb_reply is examined. (They depend on the mapped xfer_buf instead.) Therefore, the ordering of members in struct aac_srb_unit does not matter, and the flexible array member can moved to the end. (Additionally, the two memcpy()s that update srbu could be entirely removed as they are never consumed, but I left that as-is.) Signed-off-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20240711215739.208776-1-kees@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: smartpqi: add new controller PCI IDsDavid Strahan1-0/+24
[ Upstream commit dbc39b84540f746cc814e69b21e53e6d3e12329a ] All PCI ID entries in Hex. Add new cisco pci ids: VID / DID / SVID / SDID ---- ---- ---- ---- 9005 028f 1137 02fe 9005 028f 1137 02ff 9005 028f 1137 0300 Add new h3c pci ids: VID / DID / SVID / SDID ---- ---- ---- ---- 9005 028f 193d 0462 9005 028f 193d 8462 Add new ieit pci ids: VID / DID / SVID / SDID ---- ---- ---- ---- 9005 028f 1ff9 00a3 Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: David Strahan <David.Strahan@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240827185501.692804-5-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: smartpqi: correct stream detectionMahesh Rajashekhara1-1/+1
[ Upstream commit 4c76114932d1d6fad2e72823e7898a3c960cf2a7 ] Correct stream detection by initializing the structure pqi_scsi_dev_raid_map_data to 0s. When the OS issues SCSI READ commands, the driver erroneously considers them as SCSI WRITES. If they are identified as sequential IOs, the driver then submits those requests via the RAID path instead of the AIO path. The 'is_write' flag might be set for SCSI READ commands also. The driver may interpret SCSI READ commands as SCSI WRITE commands, resulting in IOs being submitted through the RAID path. Note: This does not cause data corruption. Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240827185501.692804-3-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: smartpqi: Add new controller PCI IDsDavid Strahan1-0/+104
[ Upstream commit 0e21e73384d324f75ea16f3d622cfc433fa6209b ] All PCI ID entries in hex. Add new inagile PCI IDs: VID / DID / SVID / SDID ---- ---- ---- ---- SMART-HBA 8242-24i 9005 / 028f / 1ff9 / 0045 RAID 8236-16i 9005 / 028f / 1ff9 / 0046 RAID 8240-24i 9005 / 028f / 1ff9 / 0047 SMART-HBA 8238-16i 9005 / 028f / 1ff9 / 0048 PM8222-SHBA 9005 / 028f / 1ff9 / 004a RAID PM8204-2GB 9005 / 028f / 1ff9 / 004b RAID PM8204-4GB 9005 / 028f / 1ff9 / 004c PM8222-HBA 9005 / 028f / 1ff9 / 004f MT0804M6R 9005 / 028f / 1ff9 / 0051 MT0801M6E 9005 / 028f / 1ff9 / 0052 MT0808M6R 9005 / 028f / 1ff9 / 0053 MT0800M6H 9005 / 028f / 1ff9 / 0054 RS0800M5H24i 9005 / 028f / 1ff9 / 006b RS0800M5E8i 9005 / 028f / 1ff9 / 006c RS0800M5H8i 9005 / 028f / 1ff9 / 006d RS0804M5R16i 9005 / 028f / 1ff9 / 006f RS0800M5E24i 9005 / 028f / 1ff9 / 0070 RS0800M5H16i 9005 / 028f / 1ff9 / 0071 RS0800M5E16i 9005 / 028f / 1ff9 / 0072 RT0800M7E 9005 / 028f / 1ff9 / 0086 RT0800M7H 9005 / 028f / 1ff9 / 0087 RT0804M7R 9005 / 028f / 1ff9 / 0088 RT0808M7R 9005 / 028f / 1ff9 / 0089 RT1608M6R16i 9005 / 028f / 1ff9 / 00a1 Add new h3c pci_id: VID / DID / SVID / SDID ---- ---- ---- ---- UN RAID P4408-Mr-2 9005 / 028f / 193d / 1110 Add new powerleader pci ids: VID / DID / SVID / SDID ---- ---- ---- ---- PL SmartROC PM8204 9005 / 028f / 1f3a / 0104 Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: David Strahan <David.Strahan@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240711194704.982400-2-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: pm8001: Do not overwrite PCI queue mappingDaniel Wagner1-2/+4
[ Upstream commit a141c17a543332fc1238eb5cba562bfc66879126 ] blk_mq_pci_map_queues() maps all queues but right after this, we overwrite these mappings by calling blk_mq_map_queues(). Just use one helper but not both. Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/r/20240912-do-not-overwrite-pci-mapping-v1-1-85724b6cec49@suse.de Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10scsi: st: Fix input/output error on empty drive resetRafael Rocha1-2/+3
[ Upstream commit 3d882cca73be830549833517ddccb3ac4668c04e ] A previous change was introduced to prevent data loss during a power-on reset when a tape is present inside the drive. This commit set the "pos_unknown" flag to true to avoid operations that could compromise data by performing actions from an untracked position. The relevant change is commit 9604eea5bd3a ("scsi: st: Add third party poweron reset handling") As a consequence of this change, a new issue has surfaced: the driver now returns an "Input/output error" even for empty drives when the drive, host, or bus is reset. This issue stems from the "flush_buffer" function, which first checks whether the "pos_unknown" flag is set. If the flag is set, the user will encounter an "Input/output error" until the tape position is known again. This behavior differs from the previous implementation, where empty drives were not affected at system start up time, allowing tape software to send commands to the driver to retrieve the drive's status and other information. The current behavior prioritizes the "pos_unknown" flag over the "ST_NO_TAPE" status, leading to issues for software that detects drives during system startup. This software will receive an "Input/output error" until a tape is loaded and its position is known. To resolve this, the "ST_NO_TAPE" status should take priority when the drive is empty, allowing communication with the drive following a power-on reset. At the same time, the change should continue to protect data by maintaining the "pos_unknown" flag when the drive contains a tape and its position is unknown. Signed-off-by: Rafael Rocha <rrochavi@fnal.gov> Link: https://lore.kernel.org/r/20240905173921.10944-1-rrochavi@fnal.gov Fixes: 9604eea5bd3a ("scsi: st: Add third party poweron reset handling") Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04scsi: mac_scsi: Disallow bus errors during PDMA sendFinn Thain1-25/+19
commit 5551bc30e4a69ad86d0d008e2f56cd59b6583476 upstream. SD cards can produce write latency spikes on the order of a hundred milliseconds. If the target firmware does not hide that latency during DATA IN and OUT phases it can cause the PDMA circuitry to raise a processor bus fault which in turn leads to an unreliable byte count and a DMA overrun. The Last Byte Sent flag is used to detect the overrun but this mechanism is unreliable on some systems. Instead, set a DID_ERROR result whenever there is a bus fault during a PDMA send, unless the cause was a phase mismatch. Cc: stable@vger.kernel.org # 5.15+ Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Fixes: 7c1f3e3447a1 ("scsi: mac_scsi: Treat Last Byte Sent time-out as failure") Signed-off-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/cc38df687ace2c4ffc375a683b2502fc476b600d.1723001788.git.fthain@linux-m68k.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04scsi: mac_scsi: Refactor polling loopFinn Thain1-38/+42
commit 5545c3165cbc98615fe65a44f41167cbb557e410 upstream. Before the error handling can be revised, some preparation is needed. Refactor the polling loop with a new function, macscsi_wait_for_drq(). This function will gain more call sites in the next patch. Cc: stable@vger.kernel.org # 5.15+ Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/6a5ffabb4290c0d138c6d285fda8fa3902e926f0.1723001788.git.fthain@linux-m68k.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04scsi: mac_scsi: Revise printk(KERN_DEBUG ...) messagesFinn Thain1-20/+22
commit 5ec4f820cb9766e4583df947150a6febce8da794 upstream. After a bus fault, capture and log the chip registers immediately, if the NDEBUG_PSEUDO_DMA macro is defined. Remove some printk(KERN_DEBUG ...) messages that aren't needed any more. Don't skip the debug message when bytes == 0. Show all of the byte counters in the debug messages. Cc: stable@vger.kernel.org # 5.15+ Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/7573c79f4e488fc00af2b8a191e257ca945e0409.1723001788.git.fthain@linux-m68k.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04scsi: lpfc: Restrict support for 32 byte CDBs to specific HBAsJustin Tee3-4/+22
commit 05ab4e7846f1103377133c00295a9a910cc6dfc2 upstream. An older generation of HBAs are failing FCP discovery due to usage of an outdated field in FCP command WQEs. Fix by checking the SLI Interface Type register for applicable support of 32 Byte CDB commands, and restore a setting for a WQE path using normal 16 byte CDBs. Fixes: af20bb73ac25 ("scsi: lpfc: Add support for 32 byte CDBs") Cc: stable@vger.kernel.org # v6.10+ Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240912232447.45607-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04scsi: sd: Fix off-by-one error in sd_read_block_characteristics()Martin Wilck1-1/+1
commit f81eaf08385ddd474a2f41595a7757502870c0eb upstream. Ff the device returns page 0xb1 with length 8 (happens with qemu v2.x, for example), sd_read_block_characteristics() may attempt an out-of-bounds memory access when accessing the zoned field at offset 8. Fixes: 7fb019c46eee ("scsi: sd: Switch to using scsi_device VPD pages") Cc: stable@vger.kernel.org Signed-off-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20240912134308.282824-1-mwilck@suse.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04scsi: elx: libefc: Fix potential use after free in efc_nport_vport_del()Dan Carpenter1-1/+1
[ Upstream commit 2e4b02fad094976763af08fec2c620f4f8edd9ae ] The kref_put() function will call nport->release if the refcount drops to zero. The nport->release release function is _efc_nport_free() which frees "nport". But then we dereference "nport" on the next line which is a use after free. Re-order these lines to avoid the use after free. Fixes: fcd427303eb9 ("scsi: elx: libefc: SLI and FC PORT state machine interfaces") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/b666ab26-6581-4213-9a3d-32a9147f0399@stanley.mountain Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04scsi: NCR5380: Check for phase match during PDMA fixupFinn Thain1-39/+39
[ Upstream commit 5768718da9417331803fc4bc090544c2a93b88dc ] It's not an error for a target to change the bus phase during a transfer. Unfortunately, the FLAG_DMA_FIXUP workaround does not allow for that -- a phase change produces a DRQ timeout error and the device borken flag will be set. Check the phase match bit during FLAG_DMA_FIXUP processing. Don't forget to decrement the command residual. While we are here, change shost_printk() into scmd_printk() for better consistency with other DMA error messages. Tested-by: Stan Johnson <userm57@yahoo.com> Fixes: 55181be8ced1 ("ncr5380: Replace redundant flags with FLAG_NO_DMA_FIXUP") Signed-off-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/99dc7d1f4c825621b5b120963a69f6cd3e9ca659.1723001788.git.fthain@linux-m68k.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04scsi: sd: Don't check if a write for REQ_ATOMICJohn Garry1-1/+1
[ Upstream commit 0c150b30d3d51a8c2e09fadd004a640fa12985c6 ] Flag REQ_ATOMIC can only be set for writes, so don't check if the operation is also a write in sd_setup_read_write_cmnd(). Fixes: bf4ae8f2e640 ("scsi: sd: Atomic write support") Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240805113315.1048591-2-john.g.garry@oracle.com Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04scsi: smartpqi: revert propagate-the-multipath-failure-to-SML-quicklyGilbert Wu1-18/+2
[ Upstream commit f1393d52e6cda9c20f12643cbecf1e1dc357e0e2 ] Correct a rare multipath failure issue by reverting commit 94a68c814328 ("scsi: smartpqi: Quickly propagate path failures to SCSI midlayer") [1]. Reason for revert: The patch propagated the path failure to SML quickly when one of the path fails during IO and AIO path gets disabled for a multipath device. But it created a new issue: when creating a volume on an encryption-enabled controller, the firmware reports the AIO path is disabled, which cause the driver to report a path failure to SML for a multipath device. There will be a new fix to handle "Illegal request" and "Invalid field in parameter list" on RAID path when the AIO path is disabled on a multipath device. [1] https://lore.kernel.org/all/164375209313.440833.9992416628621839233.stgit@brunhilda.pdev.net/ Fixes: 94a68c814328 ("scsi: smartpqi: Quickly propagate path failures to SCSI midlayer") Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Gilbert Wu <Gilbert.Wu@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240711194704.982400-4-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-01Merge tag 'scsi-fixes' of ↵Linus Torvalds3-6/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Minor fixes only. The sd.c one ignores a sync cache request if format is in progress which can happen if formatting a drive across suspend/resume" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progress scsi: aacraid: Fix double-free on probe failure scsi: lpfc: Fix overflow build issue
2024-08-25Merge tag 'scsi-fixes' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "The important core fix is another tweak to our discard discovery issues. The off by 512 in logical block count seems bad, but in fact the inline was only ever used in debug prints, which is why no-one noticed" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Do not attempt to configure discard unless LBPME is set scsi: MAINTAINERS: Add header files to SCSI SUBSYSTEM scsi: ufs: qcom: Add UFSHCD_QUIRK_BROKEN_LSDBS_CAP for SM8550 SoC scsi: ufs: core: Add a quirk for handling broken LSDBS field in controller capabilities register scsi: core: Fix the return value of scsi_logical_block_count() scsi: MAINTAINERS: Update HiSilicon SAS controller driver maintainer
2024-08-22scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progressYihang Li1-5/+7
If formatting a suspended disk (such as formatting with different DIF type), the disk will be resuming first, and then the format command will submit to the disk through SG_IO ioctl. When the disk is processing the format command, the system does not submit other commands to the disk. Therefore, the system attempts to suspend the disk again and sends the SYNCHRONIZE CACHE command. However, the SYNCHRONIZE CACHE command will fail because the disk is in the formatting process. This will cause the runtime_status of the disk to error and it is difficult for user to recover it. Error info like: [ 669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache [ 670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK [ 670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current] [ 670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4 To solve the issue, ignore the error and return success/0 when format is in progress. Cc: stable@vger.kernel.org Signed-off-by: Yihang Li <liyihang9@huawei.com> Link: https://lore.kernel.org/r/20240819090934.2130592-1-liyihang9@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-22scsi: aacraid: Fix double-free on probe failureBen Hutchings1-0/+2
aac_probe_one() calls hardware-specific init functions through the aac_driver_ident::init pointer, all of which eventually call down to aac_init_adapter(). If aac_init_adapter() fails after allocating memory for aac_dev::queues, it frees the memory but does not clear that member. After the hardware-specific init function returns an error, aac_probe_one() goes down an error path that frees the memory pointed to by aac_dev::queues, resulting.in a double-free. Reported-by: Michael Gordon <m.gordon.zelenoborsky@gmail.com> Link: https://bugs.debian.org/1075855 Fixes: 8e0c5ebde82b ("[SCSI] aacraid: Newer adapter communication iterface support") Signed-off-by: Ben Hutchings <benh@debian.org> Link: https://lore.kernel.org/r/ZsZvfqlQMveoL5KQ@decadent.org.uk Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-22scsi: lpfc: Fix overflow build issueSherry Yang1-1/+1
Build failed while enabling "CONFIG_GCOV_KERNEL=y" and "CONFIG_GCOV_PROFILE_ALL=y" with following error: BUILDSTDERR: drivers/scsi/lpfc/lpfc_bsg.c: In function 'lpfc_get_cgnbuf_info': BUILDSTDERR: ./include/linux/fortify-string.h:114:33: error: '__builtin_memcpy' accessing 18446744073709551615 bytes at offsets 0 and 0 overlaps 9223372036854775807 bytes at offset -9223372036854775808 [-Werror=restrict] BUILDSTDERR: 114 | #define __underlying_memcpy __builtin_memcpy BUILDSTDERR: | ^ BUILDSTDERR: ./include/linux/fortify-string.h:637:9: note: in expansion of macro '__underlying_memcpy' BUILDSTDERR: 637 | __underlying_##op(p, q, __fortify_size); \ BUILDSTDERR: | ^~~~~~~~~~~~~ BUILDSTDERR: ./include/linux/fortify-string.h:682:26: note: in expansion of macro '__fortify_memcpy_chk' BUILDSTDERR: 682 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ BUILDSTDERR: | ^~~~~~~~~~~~~~~~~~~~ BUILDSTDERR: drivers/scsi/lpfc/lpfc_bsg.c:5468:9: note: in expansion of macro 'memcpy' BUILDSTDERR: 5468 | memcpy(cgn_buff, cp, cinfosz); BUILDSTDERR: | ^~~~~~ This happens from the commit 06bb7fc0feee ("kbuild: turn on -Wrestrict by default"). Address this issue by using size_t type. Signed-off-by: Sherry Yang <sherry.yang@oracle.com> Link: https://lore.kernel.org/r/20240821065131.1180791-1-sherry.yang@oracle.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-17Merge tag 'scsi-fixes' of ↵Linus Torvalds2-3/+9
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes to the mpi3mr driver. One to avoid oversize allocations in tracing and the other to fix an uninitialized spinlock in the user to driver feature request code (used to trigger dumps and the like)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpi3mr: Avoid MAX_PAGE_ORDER WARNING for buffer allocations scsi: mpi3mr: Add missing spin_lock_init() for mrioc->trigger_lock
2024-08-16scsi: sd: Do not attempt to configure discard unless LBPME is setMartin K. Petersen1-0/+3
Commit f874d7210d88 ("scsi: sd: Keep the discard mode stable") attempted to address an issue where one mode of discard operation got configured prior to the device completing full discovery. Unfortunately this change assumed discard was always enabled on the device. Do not attempt to configure discard unless LBPME is enabled. Link: https://lore.kernel.org/r/20240817005325.3319384-1-martin.petersen@oracle.com Fixes: f874d7210d88 ("scsi: sd: Keep the discard mode stable") Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com> Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-12scsi: mpi3mr: Avoid MAX_PAGE_ORDER WARNING for buffer allocationsShin'ichiro Kawasaki1-3/+8
Commit fc4444941140 ("scsi: mpi3mr: HDB allocation and posting for hardware and firmware buffers") added mpi3mr_alloc_diag_bufs() which calls dma_alloc_coherent() to allocate the trace buffer and the firmware buffer. mpi3mr_alloc_diag_bufs() decides the buffer sizes from the driver configuration. In my environment, the sizes are 8MB. With the sizes, dma_alloc_coherent() fails and report this WARNING: WARNING: CPU: 4 PID: 438 at mm/page_alloc.c:4676 __alloc_pages_noprof+0x52f/0x640 The WARNING indicates that the order of the allocation size is larger than MAX_PAGE_ORDER. After this failure, mpi3mr_alloc_diag_bufs() reduces the buffer sizes and retries dma_alloc_coherent(). In the end, the buffer allocations succeed with 4MB size in my environment, which corresponds to MAX_PAGE_ORDER=10. Though the allocations succeed, the WARNING message is misleading and should be avoided. To avoid the WARNING, check the orders of the buffer allocation sizes before calling dma_alloc_coherent(). If the orders are larger than MAX_PAGE_ORDER, fall back to the retry path. Fixes: fc4444941140 ("scsi: mpi3mr: HDB allocation and posting for hardware and firmware buffers") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20240810042701.661841-3-shinichiro.kawasaki@wdc.com Acked-by: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-12scsi: mpi3mr: Add missing spin_lock_init() for mrioc->trigger_lockShin'ichiro Kawasaki1-0/+1
Commit fc4444941140 ("scsi: mpi3mr: HDB allocation and posting for hardware and firmware buffers") added the spinlock trigger_lock to the struct mpi3mr_ioc. However, spin_lock_init() call was not added for it, then the lock does not work as expected. Also, the kernel reports the message below when lockdep is enabled. INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? To fix the issue and to avoid the INFO message, add the missing spin_lock_init() call. Fixes: fc4444941140 ("scsi: mpi3mr: HDB allocation and posting for hardware and firmware buffers") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20240810042701.661841-2-shinichiro.kawasaki@wdc.com Acked-by: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-11Merge tag 'scsi-fixes' of ↵Linus Torvalds1-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two core fixes: one to prevent discard type changes (seen on iSCSI) during intermittent errors and the other is fixing a lockdep problem caused by the queue limits change. And one driver fix in ufs" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Keep the discard mode stable scsi: sd: Move sd_read_cpr() out of the q->limits_lock region scsi: ufs: core: Fix hba->last_dme_cmd_tstamp timestamp updating logic
2024-08-03Merge tag 'scsi-fixes' of ↵Linus Torvalds3-4/+32
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "One core change that reverts the double message print patch in sd.c (it was causing regressions on embedded systems). The rest are driver fixes in ufs, mpt3sas and mpi3mr" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: exynos: Don't resume FMP when crypto support is disabled scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES scsi: mpi3mr: Avoid IOMMU page faults on REPORT ZONES scsi: ufs: core: Do not set link to OFF state while waking up from hibernation scsi: Revert "scsi: sd: Do not repeat the starting disk message" scsi: ufs: core: Fix deadlock during RTC update scsi: ufs: core: Bypass quick recovery if force reset is needed scsi: ufs: core: Check LSDBS cap when !mcq
2024-08-02scsi: sd: Keep the discard mode stableLi Feng1-4/+2
There is a scenario where a large number of discard commands are issued when the iscsi initiator connects to the target and then performs a session rescan operation. There is a time window, most of the commands are in UNMAP mode, and some discard commands become WRITE SAME with UNMAP. The discard mode has been negotiated during the SCSI probe. If the mode is temporarily changed from UNMAP to WRITE SAME with UNMAP, an I/O ERROR may occur because the target may not implement WRITE SAME with UNMAP. Keep the discard mode stable to fix this issue. Signed-off-by: Li Feng <fengli@smartx.com> Link: https://lore.kernel.org/r/20240718080751.313102-2-fengli@smartx.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-01scsi: sd: Move sd_read_cpr() out of the q->limits_lock regionShin'ichiro Kawasaki1-1/+8
Commit 804e498e0496 ("sd: convert to the atomic queue limits API") introduced pairs of function calls to queue_limits_start_update() and queue_limits_commit_update(). These two functions lock and unlock q->limits_lock. In sd_revalidate_disk(), sd_read_cpr() is called after queue_limits_start_update() call and before queue_limits_commit_update() call. sd_read_cpr() locks q->sysfs_dir_lock and &q->sysfs_lock. Then new lock dependencies were created between q->limits_lock, q->sysfs_dir_lock and q->sysfs_lock, as follows: sd_revalidate_disk queue_limits_start_update mutex_lock(&q->limits_lock) sd_read_cpr disk_set_independent_access_ranges mutex_lock(&q->sysfs_dir_lock) mutex_lock(&q->sysfs_lock) mutex_unlock(&q->sysfs_lock) mutex_unlock(&q->sysfs_dir_lock) queue_limits_commit_update mutex_unlock(&q->limits_lock) However, the three locks already had reversed dependencies in other places. Then the new dependencies triggered the lockdep WARN "possible circular locking dependency detected" [1]. This WARN was observed by running the blktests test case srp/002. To avoid the WARN, move the sd_read_cpr() call in sd_revalidate_disk() after the queue_limits_commit_update() call. In other words, move the sd_read_cpr() call out of the q->limits_lock region. [1] https://lore.kernel.org/linux-scsi/vlmv53ni3ltwxplig5qnw4xsl2h6ccxijfbqzekx76vxoim5a5@dekv7q3es3tx/ Fixes: 804e498e0496 ("sd: convert to the atomic queue limi