Age | Commit message (Collapse) | Author | Files | Lines |
|
Use the right helper to avoid poking around in the list's internals.
Link: https://lore.kernel.org/r/ed669555c73aab95b29444c10066f492c0c43391.1599765652.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull SCSI updates from James Bottomley:
"This consists of the usual driver updates (ufs, qla2xxx, tcmu, lpfc,
hpsa, zfcp, scsi_debug) and minor bug fixes.
We also have a huge docbook fix update like most other subsystems and
no major update to the core (the few non trivial updates are either
minor fixes or removing an unused feature [scsi_sdb_cache])"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (307 commits)
scsi: scsi_transport_srp: Sanitize scsi_target_block/unblock sequences
scsi: ufs-mediatek: Apply DELAY_AFTER_LPM quirk to Micron devices
scsi: ufs: Introduce device quirk "DELAY_AFTER_LPM"
scsi: virtio-scsi: Correctly handle the case where all LUNs are unplugged
scsi: scsi_debug: Implement tur_ms_to_ready parameter
scsi: scsi_debug: Fix request sense
scsi: lpfc: Fix typo in comment for ULP
scsi: ufs-mediatek: Prevent LPM operation on undeclared VCC
scsi: iscsi: Do not put host in iscsi_set_flashnode_param()
scsi: hpsa: Correct ctrl queue depth
scsi: target: tcmu: Make TMR notification optional
scsi: target: tcmu: Implement tmr_notify callback
scsi: target: tcmu: Fix and simplify timeout handling
scsi: target: tcmu: Factor out new helper ring_insert_padding
scsi: target: tcmu: Do not queue aborted commands
scsi: target: tcmu: Use priv pointer in se_cmd
scsi: target: Add tmr_notify backend function
scsi: target: Modify core_tmr_abort_task()
scsi: target: iscsi: Fix inconsistent debug message
scsi: target: iscsi: Fix login error when receiving
...
|
|
We already maintain a pointer to act->adapter. Use it consistently to avoid
any confusion about whose ->erp_ready_head and ->erp_ready_wq we are
accessing.
Link: https://lore.kernel.org/r/d1bb04322f240dee32f4c4a551bc93bc736f4b01.1593780621.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Suppose that, for unrelated reasons, FSF requests on behalf of recovery are
very slow and can run into the ERP timeout.
In the case at hand, we did adapter recovery to a large degree. However
due to the slowness a LUN open is pending so the corresponding fc_rport
remains blocked. After fast_io_fail_tmo we trigger close physical port
recovery for the port under which the LUN should have been opened. The new
higher order port recovery dismisses the pending LUN open ERP action and
dismisses the pending LUN open FSF request. Such dismissal decouples the
ERP action from the pending corresponding FSF request by setting
zfcp_fsf_req->erp_action to NULL (among other things)
[zfcp_erp_strategy_check_fsfreq()].
If now the ERP timeout for the pending open LUN request runs out, we must
not use zfcp_fsf_req->erp_action in the ERP timeout handler. This is a
problem since v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use
timer_setup()"). Before that we intentionally only passed zfcp_erp_action
as context argument to zfcp_erp_timeout_handler().
Note: The lifetime of the corresponding zfcp_fsf_req object continues until
a (late) response or an (unrelated) adapter recovery.
Just like the regular response path ignores dismissed requests
[zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the
ERP timeout handler now needs to ignore dismissed requests. So simply
return early in the ERP timeout handler if the FSF request is marked as
dismissed in its status flags. To protect against the race where
zfcp_erp_strategy_check_fsfreq() dismisses and sets
zfcp_fsf_req->erp_action to NULL after our previous status flag check,
return early if zfcp_fsf_req->erp_action is NULL. After all, the former
ERP action does not need to be woken up as that was already done as part of
the dismissal above [zfcp_erp_action_dismiss()].
This fixes the following panic due to kernel page fault in IRQ context:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007 S:00000e3e7ffd7000 P:000000000000013d
Oops: 0004 ilc:2 [#1] SMP
Modules linked in: ...
CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G E X ...
Hardware name: IBM 8561 T01 701 (LPAR)
Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0 0000000000030000
000000010028e700 000000000400a39c 000000010028e700 00000e3e7cf87e02
0000000010000000 0700098591cb67f0 0000000000000000 0000000000000000
0000033840e9a000 0000000000000000 001fffe008d6bc18 001fffe008d6bbc8
Krnl Code: 001fffff80549bd4: a7180000 lhi %r1,0
001fffff80549bd8: 4120a0f0 la %r2,240(%r10)
#001fffff80549bdc: a53e0003 llilh %r3,3
>001fffff80549be0: ba132000 cs %r1,%r3,0(%r2)
001fffff80549be4: a7740037 brc 7,1fffff80549c52
001fffff80549be8: e320b0180004 lg %r2,24(%r11)
001fffff80549bee: e31020e00004 lg %r1,224(%r2)
001fffff80549bf4: 412020e0 la %r2,224(%r2)
Call Trace:
[<001fffff80549be0>] zfcp_erp_notify+0x40/0xc0 [zfcp]
[<00000985915e26f0>] call_timer_fn+0x38/0x190
[<00000985915e2944>] expire_timers+0xfc/0x190
[<00000985915e2ac4>] run_timer_softirq+0xec/0x218
[<0000098591ca7c4c>] __do_softirq+0x144/0x398
[<00000985915110aa>] do_softirq_own_stack+0x72/0x88
[<0000098591551b58>] irq_exit+0xb0/0xb8
[<0000098591510c6a>] do_IRQ+0x82/0xb0
[<0000098591ca7140>] ext_int_handler+0x128/0x12c
[<0000098591722d98>] clear_subpage.constprop.13+0x38/0x60
([<000009859172ae4c>] clear_huge_page+0xec/0x250)
[<000009859177e7a2>] do_huge_pmd_anonymous_page+0x32a/0x768
[<000009859172a712>] __handle_mm_fault+0x88a/0x900
[<000009859172a860>] handle_mm_fault+0xd8/0x1b0
[<0000098591529ef6>] do_dat_exception+0x136/0x3e8
[<0000098591ca6d34>] pgm_check_handler+0x1c8/0x220
Last Breaking-Event-Address:
[<001fffff80549c88>] zfcp_erp_timeout_handler+0x10/0x18 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
Link: https://lore.kernel.org/r/20200623140242.98864-1-maier@linux.ibm.com
Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()")
Cc: <stable@vger.kernel.org> #4.15+
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
At the moment we allocate and register the Scsi_Host object corresponding
to a zfcp adapter (FCP device) very early in the life cycle of the adapter
- even before we fully discover and initialize the underlying
firmware/hardware. This had the advantage that we could already use the
Scsi_Host object, and fill in all its information during said discover and
initialize.
Due to commit 737eb78e82d5 ("block: Delay default elevator initialization")
(first released in v5.4), we noticed a regression that would prevent us
from using any storage volume if zfcp is configured with support for DIF or
DIX (zfcp.dif=1 || zfcp.dix=1). Doing so would result in an illegal memory
access as soon as the first request is sent with such an configuration. As
example for a crash resulting from this:
scsi host0: scsi_eh_0: sleeping
scsi host0: zfcp
qdio: 0.0.1900 ZFCP on SC 4bd using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000000035c7c007 R3:00000001effcc007 S:00000001effd1000 P:000000000000003d
Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: ...
CPU: 1 PID: 783 Comm: kworker/u760:5 Kdump: loaded Not tainted 5.6.0-rc2-bb-next+ #1
Hardware name: ...
Workqueue: scsi_wq_0 fc_scsi_scan_rport [scsi_transport_fc]
Krnl PSW : 0704e00180000000 000003ff801fcdae (scsi_queue_rq+0x436/0x740 [scsi_mod])
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0fffffffffffffff 0000000000000000 0000000187150120 0000000000000000
000003ff80223d20 000000000000018e 000000018adc6400 0000000187711000
000003e0062337e8 00000001ae719000 0000000187711000 0000000187150000
00000001ab808100 0000000187150120 000003ff801fcd74 000003e0062336a0
Krnl Code: 000003ff801fcd9e: e310a35c0012 lt %r1,860(%r10)
000003ff801fcda4: a7840010 brc 8,000003ff801fcdc4
#000003ff801fcda8: e310b2900004 lg %r1,656(%r11)
>000003ff801fcdae: d71710001000 xc 0(24,%r1),0(%r1)
000003ff801fcdb4: e310b2900004 lg %r1,656(%r11)
000003ff801fcdba: 41201018 la %r2,24(%r1)
000003ff801fcdbe: e32010000024 stg %r2,0(%r1)
000003ff801fcdc4: b904002b lgr %r2,%r11
Call Trace:
[<000003ff801fcdae>] scsi_queue_rq+0x436/0x740 [scsi_mod]
([<000003ff801fcd74>] scsi_queue_rq+0x3fc/0x740 [scsi_mod])
[<00000000349c9970>] blk_mq_dispatch_rq_list+0x390/0x680
[<00000000349d1596>] blk_mq_sched_dispatch_requests+0x196/0x1a8
[<00000000349c7a04>] __blk_mq_run_hw_queue+0x144/0x160
[<00000000349c7ab6>] __blk_mq_delay_run_hw_queue+0x96/0x228
[<00000000349c7d5a>] blk_mq_run_hw_queue+0xd2/0xe0
[<00000000349d194a>] blk_mq_sched_insert_request+0x192/0x1d8
[<00000000349c17b8>] blk_execute_rq_nowait+0x80/0x90
[<00000000349c1856>] blk_execute_rq+0x6e/0xb0
[<000003ff801f8ac2>] __scsi_execute+0xe2/0x1f0 [scsi_mod]
[<000003ff801fef98>] scsi_probe_and_add_lun+0x358/0x840 [scsi_mod]
[<000003ff8020001c>] __scsi_scan_target+0xc4/0x228 [scsi_mod]
[<000003ff80200254>] scsi_scan_target+0xd4/0x100 [scsi_mod]
[<000003ff802d8b96>] fc_scsi_scan_rport+0x96/0xc0 [scsi_transport_fc]
[<0000000034245ce8>] process_one_work+0x458/0x7d0
[<00000000342462a2>] worker_thread+0x242/0x448
[<0000000034250994>] kthread+0x15c/0x170
[<0000000034e1979c>] ret_from_fork+0x30/0x38
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<000003ff801fbc36>] scsi_add_cmd_to_list+0x9e/0xa8 [scsi_mod]
Kernel panic - not syncing: Fatal exception: panic_on_oops
While this issue is exposed by the commit named above, this is only by
accident. The real issue exists for longer already - basically since it's
possible to use blk-mq via scsi-mq, and blk-mq pre-allocates all requests
for a tag-set during initialization of the same. For a given Scsi_Host
object this is done when adding the object to the midlayer
(`scsi_add_host()` and such). In `scsi_mq_setup_tags()` the midlayer
calculates how much memory is required for a single scsi_cmnd, and its
additional data, which also might include space for additional protection
data - depending on whether the Scsi_Host has any form of protection
capabilities (`scsi_host_get_prot()`).
The problem is now thus, because zfcp does this step before we actually
know whether the firmware/hardware has these capabilities, we don't set any
protection capabilities in the Scsi_Host object. And so, no space is
allocated for additional protection data for requests in the Scsi_Host
tag-set.
Once we go through discover and initialize the FCP device firmware/hardware
fully (this is done via the firmware commands "Exchange Config Data" and
"Exchange Port Data") we find out whether it actually supports DIF and DIX,
and we set the corresponding capabilities in the Scsi_Host object (in
`zfcp_scsi_set_prot()`). Now the Scsi_Host potentially has protection
capabilities, but the already allocated requests in the tag-set don't have
any space allocated for that.
When we then trigger target scanning or add scsi_devices manually, the
midlayer will use requests from that tag-set, and before sending most
requests, it will also call `scsi_mq_prep_fn()`. To prepare the scsi_cmnd
this function will check again whether the used Scsi_Host has any
protection capabilities - and now it potentially has - and if so, it will
try to initialize the assumed to be preallocated structures and thus it
causes the crash, like shown above.
Before delaying the default elevator initialization with the commit named
above, we always would also allocate an elevator for any scsi_device before
ever sending any requests - in contrast to now, where we do it after
device-probing. That elevator in turn would have its own tag-set, and that
is initialized after we went through discovery and initialization of the
underlying firmware/hardware. So requests from that tag-set can be
allocated properly, and if used - unless the user changes/disabled the
default elevator - this would hide the underlying issue.
To fix this for any configuration - with or without an elevator - we move
the allocation and registration of the Scsi_Host object for a given FCP
device to after the first complete discovery and initialization of the
underlying firmware/hardware. By doing that we can make all basic
properties of the Scsi_Host known to the midlayer by the time we call
`scsi_add_host()`, including whether we have any protection capabilities.
To do that we have to delay all the accesses that we would have done in the
past during discovery and initialization, and do them instead once we are
finished with it. The previous patches ramp up to this by fencing and
factoring out all these accesses, and make it possible to re-do them later
on. In addition we make also use of the diagnostic buffers we recently
added with
commit 92953c6e0aa7 ("scsi: zfcp: signal incomplete or error for sync exchange config/port data")
commit 7e418833e689 ("scsi: zfcp: diagnostics buffer caching and use for exchange port data")
commit 088210233e6f ("scsi: zfcp: add diagnostics buffer for exchange config data")
(first released in v5.5), because these already cache all the information
we need for that "re-do operation" - the information cached are always
updated during xconf or xport data, so it won't be stale.
In addition to the move and re-do, this patch also updates the
function-documentation of `zfcp_scsi_adapter_register()` and changes how it
reports if a Scsi_Host object already exists. In that case future
recovery-operations can skip this step completely and behave much like they
would do in the past - zfcp does not release a once allocated Scsi_Host
object unless the corresponding FCP device is deconstructed completely.
Link: https://lore.kernel.org/r/030dd6da318bbb529f0b5268ec65cebcd20fc0a3.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Common status flags that all main objects - adapter, port, and unit -
support are propagated to sub-objects when set or cleared. For instance,
when setting the status ZFCP_STATUS_COMMON_ERP_INUSE for an adapter object,
we will propagate this to all its child ports and units - same for when
clearing a common status flag.
Units of an adapter object are enumerated via __shost_for_each_device()
over the scsi host object of the corresponding adapter.
Once we move the scsi host object allocation and registration to after the
first exchange config and exchange port data, this won't be possible for
cases where we set or clear common statuses during the very first adapter
recovery.
But since we won't have any port or unit objects yet at that point of time,
we can just fence the status propagation for cases where the scsi host
object is not yet set in the adapter object. It won't change any effective
status propagations, but will prevent us from dereferencing invalid
pointers.
For any later point in the work flow the scsi host object will be set and
thus nothing is changed then.
Link: https://lore.kernel.org/r/f51fe5f236a1e3d1ce53379c308777561bfe35e1.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When doing the very first adapter recovery - initialization - for a FCP
device in a point-to-point topology we also allocate the port object
corresponding to the attached remote port, and trigger a port recovery for
it that will run after the adapter recovery finished.
Right now this happens right after we finished with the exchange config
data command, and uses the fibre channel host object corresponding to the
FCP device to determine whether a point-to-point topology is used.
When moving the scsi host object allocation and registration - and thus
also the fibre channel host object allocation - to after the first exchange
config and exchange port data, this use of the fc_host object is not
possible anymore at that point in the work flow.
But the allocation and recovery trigger doesn't have notable side-effects
on the following exchange port data processing, so we can move those to
after xport data, and thus also to after the scsi host object allocation,
once we move it. Then the fc_host object can be used again, like it is now.
For any further adapter recoveries this doesn't change anything, because at
that point the port object already exists and recovery is triggered
elsewhere for existing port objects.
Link: https://lore.kernel.org/r/73e5d4ac21e2b37bf0c3ca8e530bc5a5c6e74f8f.1588956679.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Convert the various uses of fallthrough comments to fallthrough;
Done via script
Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
[bblock@linux.ibm.com: resolved merge conflict with recently upstream-sent patch "zfcp: expose fabric name as common fc_host sysfs attribute"]
Link: https://lore.kernel.org/r/d14669a67a17392490d3184117941123765db1a4.1585663010.git.bblock@linux.ibm.com
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote
ports") introduced zfcp automatic port scan.
Before that, the user had to use the sysfs attribute "port_add" of an FCP
device (adapter) to add and open remote (target) ports, even for the remote
peer port in point-to-point topology. That code path did a proper port open
recovery trigger taking the erp_lock.
Since above commit, a new helper function zfcp_erp_open_ptp_port()
performed an UNlocked port open recovery trigger. This can race with other
parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt
e.g. adapter->erp_total_count or adapter->erp_ready_head.
As already found for fabric topology in v4.17 commit fa89adba1941 ("scsi:
zfcp: fix infinite iteration on ERP ready list"), there was an endless loop
during tracing of rport (un)block. A subsequent v4.18 commit 9e156c54ace3
("scsi: zfcp: assert that the ERP lock is held when tracing a recovery
trigger") introduced a lockdep assertion for that case.
As a side effect, that lockdep assertion now uncovered the unlocked code
path for PtP. It is from within an adapter ERP action:
zfcp_erp_strategy[1479] intentionally DROPs erp lock around
zfcp_erp_strategy_do_action()
zfcp_erp_strategy_do_action[1441] NO erp lock
zfcp_erp_adapter_strategy[876] NO erp lock
zfcp_erp_adapter_strategy_open[855] NO erp lock
zfcp_erp_adapter_strategy_open_fsf[806]NO erp lock
zfcp_erp_adapter_strat_fsf_xconf[772] erp lock only around
zfcp_erp_action_to_running(),
BUT *_not_* around
zfcp_erp_enqueue_ptp_port()
zfcp_erp_enqueue_ptp_port[728] BUG: *_not_* taking erp lock
_zfcp_erp_port_reopen[432] assumes to be called with erp lock
zfcp_erp_action_enqueue[314] assumes to be called with erp lock
zfcp_dbf_rec_trig[288] _checks_ to be called with erp lock:
lockdep_assert_held(&adapter->erp_lock);
It causes the following lockdep warning:
WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288
zfcp_dbf_rec_trig+0x16a/0x188
no locks held by zfcperp0.0.17c0/775.
Fix this by using the proper locked recovery trigger helper function.
Link: https://lore.kernel.org/r/20200312174505.51294-2-maier@linux.ibm.com
Fixes: cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports")
Cc: <stable@vger.kernel.org> #v2.6.27+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
No functional change.
The unary not operator only applies to the sub expression before the
logical or. So we return early if (not running) or failed.
Link: https://lore.kernel.org/r/df4f897f6e83eaa528465d0858d5a22daac47a2f.1572018132.git.bblock@linux.ibm.com
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
GCC v9 emits this warning:
CC drivers/s390/scsi/zfcp_erp.o
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_action_enqueue':
drivers/s390/scsi/zfcp_erp.c:217:26: warning: 'erp_action' may be used uninitialized in this function [-Wmaybe-uninitialized]
217 | struct zfcp_erp_action *erp_action;
| ^~~~~~~~~~
This is a possible false positive case, as also documented in the GCC
documentations:
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized
The actual code-sequence is like this:
Various callers can invoke the function below with the argument "want"
being one of:
ZFCP_ERP_ACTION_REOPEN_ADAPTER,
ZFCP_ERP_ACTION_REOPEN_PORT_FORCED,
ZFCP_ERP_ACTION_REOPEN_PORT, or
ZFCP_ERP_ACTION_REOPEN_LUN.
zfcp_erp_action_enqueue(want, ...)
...
need = zfcp_erp_required_act(want, ...)
need = want
...
maybe: need = ZFCP_ERP_ACTION_REOPEN_PORT
maybe: need = ZFCP_ERP_ACTION_REOPEN_ADAPTER
...
return need
...
zfcp_erp_setup_act(need, ...)
struct zfcp_erp_action *erp_action; // <== line 217
...
switch(need) {
case ZFCP_ERP_ACTION_REOPEN_LUN:
...
erp_action = &zfcp_sdev->erp_action;
WARN_ON_ONCE(erp_action->port != port); // <== access
...
break;
case ZFCP_ERP_ACTION_REOPEN_PORT:
case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
...
erp_action = &port->erp_action;
WARN_ON_ONCE(erp_action->port != port); // <== access
...
break;
case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
...
erp_action = &adapter->erp_action;
WARN_ON_ONCE(erp_action->port != NULL); // <== access
...
break;
}
...
WARN_ON_ONCE(erp_action->adapter != adapter); // <== access
When zfcp_erp_setup_act() is called, 'need' will never be anything else
than one of the 4 possible enumeration-names that are used in the
switch-case, and 'erp_action' is initialized for every one of them, before
it is used. Thus the warning is a false positive, as documented.
We introduce the extra if{} in the beginning to create an extra code-flow,
so the compiler can be convinced that the switch-case will never see any
other value.
BUG_ON()/BUG() is intentionally not used to not crash anything, should
this ever happen anyway - right now it's impossible, as argued above; and
it doesn't introduce a 'default:' switch-case to retain warnings should
'enum zfcp_erp_act_type' ever be extended and no explicit case be
introduced. See also v5.0 commit 399b6c8bc9f7 ("scsi: zfcp: drop old
default switch case which might paper over missing case").
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Suppose more than one non-NPIV FCP device is active on the same channel.
Send I/O to storage and have some of the pending I/O run into a SCSI
command timeout, e.g. due to bit errors on the fibre. Now the error
situation stops. However, we saw FCP requests continue to timeout in the
channel. The abort will be successful, but the subsequent TUR fails.
Scsi_eh starts. The LUN reset fails. The target reset fails. The host
reset only did an FCP device recovery. However, for non-NPIV FCP devices,
this does not close and reopen ports on the SAN-side if other non-NPIV FCP
device(s) share the same open ports.
In order to resolve the continuing FCP request timeouts, we need to
explicitly close and reopen ports on the SAN-side.
This was missing since the beginning of zfcp in v2.6.0 history commit
ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.").
Note: The FSF requests for forced port reopen could run into FSF request
timeouts due to other reasons. This would trigger an internal FCP device
recovery. Pending forced port reopen recoveries would get dismissed. So
some ports might not get fully reopened during this host reset handler.
However, subsequent I/O would trigger the above described escalation and
eventually all ports would be forced reopen to resolve any continuing FCP
request timeouts due to earlier bit errors.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org> #3.0+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
An already deleted SCSI device can exist on the Scsi_Host and remain there
because something still holds a reference. A new SCSI device with the same
H:C:T:L and FCP device, target port WWPN, and FCP LUN can be created. When
we try to unblock an rport, we still find the deleted SCSI device and
return early because the zfcp_scsi_dev of that SCSI device is not
ZFCP_STATUS_COMMON_UNBLOCKED. Hence we miss to unblock the rport, even if
the new proper SCSI device would be in good state.
Therefore, skip deleted SCSI devices when iterating the sdevs of the shost.
[cf. __scsi_device_lookup{_by_target}() or scsi_device_get()]
The following abbreviated trace sequence can indicate such problem:
Area : REC
Tag : ersfs_3
LUN : 0x4045400300000000
WWPN : 0x50050763031bd327
LUN status : 0x40000000 not ZFCP_STATUS_COMMON_UNBLOCKED
Ready count : n not incremented yet
Running count : 0x00000000
ERP want : 0x01
ERP need : 0xc1 ZFCP_ERP_ACTION_NONE
Area : REC
Tag : ersfs_3
LUN : 0x4045400300000000
WWPN : 0x50050763031bd327
LUN status : 0x41000000
Ready count : n+1
Running count : 0x00000000
ERP want : 0x01
ERP need : 0x01
...
Area : REC
Level : 4 only with increased trace level
Tag : ertru_l
LUN : 0x4045400300000000
WWPN : 0x50050763031bd327
LUN status : 0x40000000
Request ID : 0x0000000000000000
ERP status : 0x01800000
ERP step : 0x1000
ERP action : 0x01
ERP count : 0x00
NOT followed by a trace record with tag "scpaddy"
for WWPN 0x50050763031bd327.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery")
Cc: <stable@vger.kernel.org> #2.6.32+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This was introduced with v2.6.27 commit 287ac01acf22 ("[SCSI] zfcp: Cleanup
code in zfcp_erp.c") but would now suppress helpful -Wswitch compiler
warnings when building with W=1 such as the following forced example:
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_setup_act':
drivers/s390/scsi/zfcp_erp.c:220:2: warning: enumeration value 'ZFCP_ERP_ACTION_REOPEN_PORT' not handled in switch [-Wswitch]
switch (need) {
^~~~~~
But then again, only with W=1 we would notice unhandled enum cases.
Without the default cases and a missed unhandled enum case, the code might
perform unforeseen things we might not want...
As of today, we never run through the removed default case, so removing it
is no functional change. In the future, we never should run through a
default case but introduce the necessary specific case(s) to handle new
functionality.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This was introduced with v4.18 commit 8c3d20aada70 ("scsi: zfcp: fix
missing REC trigger trace for all objects in ERP_FAILED") but would now
suppress helpful -Wswitch compiler warnings when building with W=1 such as
the following forced example:
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_handle_failed':
drivers/s390/scsi/zfcp_erp.c:126:2: warning: enumeration value 'ZFCP_ERP_ACTION_REOPEN_PORT_FORCED' not handled in switch [-Wswitch]
switch (want) {
^~~~~~
But then again, only with W=1 we would notice unhandled enum cases.
Without the default cases and a missed unhandled enum case, the code might
perform unforeseen things we might not want...
As of today, we never run through the removed default case, so removing it
is no functional change. In the future, we never should run through a
default case but introduce the necessary specific case(s) to handle new
functionality.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
For some reason the already existing substring "fall through" in the
comment is not sufficient for GCC to silence -Wimplicit-fallthrough.
CC [M] drivers/s390/scsi/zfcp_erp.o
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_lun_strategy':
drivers/s390/scsi/zfcp_erp.c:1065:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_OPEN)
^
drivers/s390/scsi/zfcp_erp.c:1068:2: note: here
case ZFCP_ERP_STEP_LUN_CLOSING:
^~~~
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
While at it also improve some copy & paste kdoc mistakes.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
functions
With that instead of just "int" it becomes clear which functions return
this type and which ones also accept it as argument they just pass through
in some cases or modify in other cases. v2.6.27 commit 287ac01acf22
("[SCSI] zfcp: Cleanup code in zfcp_erp.c") introduced the enum which was
cpp defines previously.
Silence some false -Wswitch compiler warning cases with individual
NOP cases. When adding more enum values and building with W=1 we
would get compiler warnings about missed new cases.
Consistently use the variable name "result", so change "retval" in
zfcp_erp_strategy() to "result". This avoids confusion with other compile
unit variables "retval" having different semantics and type.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use the already defined enum for this purpose to get at least some build
checking (even though an enum is type equivalent to an int in C). v2.6.27
commit 287ac01acf22 ("[SCSI] zfcp: Cleanup code in zfcp_erp.c") introduced
the enum which was cpp defines previously.
Since struct zfcp_erp_action type is embedded into other structures living
in zfcp_def.h, we have to move enum zfcp_erp_act_type from its private
definition in zfcp_erp.c to the zfcp-global zfcp_def.h
Silence some false -Wswitch compiler warning cases with individual NOP
cases. When adding more enum values and building with W=1 we would get
compiler warnings about missed new cases.
Add missing break statements in some of the above switch cases. No
functional change, but making it future-proof. I think all of these should
have had a break statement ever since, even if these switch cases happened
to be the last ones in the switch statement body.
"Fall through" in the context of switch case usually means not to have a
break and fall through to the subsequent switch case. However, I think this
old comment meant that here we do not have an _early return_ in the switch
case but the code path continues after the switch case body.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
&zfcp_erp_action.action ==> &zfcp_erp_action.type
While at it, make use of the already defined enum for this purpose to get
at least some build checking (even though an enum is type equivalent to an
int in C). v2.6.27 commit 287ac01acf22 ("[SCSI] zfcp: Cleanup code in
zfcp_erp.c") introduced the enum which was cpp defines previously.
To prevent compiler warnings with the switch(act->type), we have to
separate the recently added eyecatchers from enum zfcp_erp_act_type.
Since struct zfcp_erp_action type is embedded into other structures living
in zfcp_def.h, we have to move enum zfcp_erp_act_type from its private
definition in zfcp_erp.c to the zfcp-global zfcp_def.h.
Silence one false -Wswitch compiler warning case: LUNs as the leaves in our
object tree do not have any follow-up success recovery.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
v2.6.30 commit 5ffd51a5e495 ("[SCSI] zfcp: replace current ERP logging with
a more convenient version") changed trace record distinguishing from a
numerical ID to a 7 character string called "trace tag". While starting to
use function arguments with different type and semantics, it did not change
the argument name accordingly.
v2.6.38 commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing
for recovery actions.") renamed variable names "id" into "tag" but only
within zfcp_dbf.*, not within zfcp_erp.c.
This was a bit confusing since the remainder of zfcp does use the term
"trace tag". Also "id" is quite generic and it's not obvious for what.
Just unify it consistently and use the "dbf" prefix to relate the arguments
to the code in zfcp_dbf.*.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
zfcp_erp_thread_setup() update complements v2.6.32 commit 347c6a965dc1
("[SCSI] zfcp: Use kthread API for zfcp erp thread").
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Make use of feature introduced with v3.2 commit 294436914454 ("[SCSI] scsi:
Added support for adapter and firmware reset"). The common code interface
was introduced for commit 95d31262b3c1 ("[SCSI] qla4xxx: Added support for
adapter and firmware reset").
$ echo adapter > /sys/class/scsi_host/host<N>/host_reset
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : scshr_y SCSI sysfs host_reset yes
LUN : 0xffffffffffffffff none (invalid)
WWPN : 0x0000000000000000 none (invalid)
D_ID : 0x00000000 none (invalid)
Adapter status : 0x4500050b
Port status : 0x00000000 none (invalid)
LUN status : 0x00000000 none (invalid)
Ready count : 0x00000001
Running count : 0x00000000
ERP want : 0x04 ZFCP_ERP_ACTION_REOPEN_ADAPTER
ERP need : 0x04 ZFCP_ERP_ACTION_REOPEN_ADAPTER
This is the common code equivalent to the zfcp-specific
&dev_attr_adapter_failed.attr in zfcp_sysfs_adapter_attrs.attrs[]:
$ echo 0 > /sys/bus/ccw/drivers/zfcp/<devbusid>/failed
The unsupported case returns EOPNOTSUPP:
$ echo firmware > /sys/class/scsi_host/host<N>/host_reset
-bash: echo: write error: Operation not supported
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : scshr_n SCSI sysfs host_reset no
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0xffffffff none (invalid)
SCSI LUN : 0xffffffff none (invalid)
SCSI LUN high : 0xffffffff none (invalid)
SCSI result : 0xffffffa1 -EOPNOTSUPP==-95
SCSI retries : 0xff none (invalid)
SCSI allowed : 0xff none (invalid)
SCSI scribble : 0xffffffffffffffff none (invalid)
SCSI opcode : ffffffff ffffffff ffffffff ffffffff none (invalid)
FCP rsp inf cod: 0xff none (invalid)
FCP rsp IU : 00000000 00000000 00000000 00000000 none (invalid)
00000000 00000000
For any other invalid value, common code returns EINVAL without invoking
our callback:
$ echo foo > /sys/class/scsi_host/host<N>/host_reset
-bash: echo: write error: Invalid argument
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Since v2.6.27 commit 553448f6c483 ("[SCSI] zfcp: Message cleanup"), none of
the callers has been interested any more. Values were not returned
consistently in all ERP trigger functions.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Simplify its signature to return boolean and rename it to
zfcp_erp_action_is_running() to indicate its actual unmodified semantics.
It has always been used like this since v2.6.0 history commit ea127f975424
("[PATCH] s390 (7/7): zfcp host adapter.").
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
All constant defines were introduced with v2.6.0 history commit ea127f975424
("[PATCH] s390 (7/7): zfcp host adapter.") and refactored into enums with
commit 287ac01acf22 ("[SCSI] zfcp: Cleanup code in zfcp_erp.c").
ZFCP_STATUS_ERP_DISMISSING and ZFCP_ERP_STEP_FSF_XCONFIG were never used.
v2.6.27 commit 287ac01acf22 ("[SCSI] zfcp: Cleanup code in zfcp_erp.c")
removed the use of ZFCP_ERP_ACTION_READY on refactoring
zfcp_erp_action_exists() to now only check adapter->erp_running_head but no
longer adapter->erp_ready_head. The same commit could have changed the
function return type from int to "enum zfcp_erp_act_state".
ZFCP_ERP_ACTION_READY was never used outside of zfcp_erp_action_exists().
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
I've been mixing up
zfcp_task_mgmt_function() [SCSI] and
zfcp_fsf_fcp_task_mgmt() [FSF]
so often lately that I wanted to fix this.
SCSI changes complement v2.6.27 commit f76af7d7e363 ("[SCSI] zfcp: Cleanup
of code in zfcp_scsi.c").
While at it, also fixup the other inconsistencies elsewhere.
ERP changes complement v2.6.27 commit 287ac01acf22 ("[SCSI] zfcp: Cleanup
code in zfcp_erp.c") which introduced status_change_set().
FC changes complement v2.6.32 commit 6f53a2d2ecae ("[SCSI] zfcp: Apply
common naming conventions to zfcp_fc"). by renaming a leftover introduced
with v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote
ports").
FSF changes fixup v2.6.32 commit a4623c467ff7 ("[SCSI] zfcp: Improve request
allocation through mempools"). which replaced zfcp_fsf_alloc_qtcb()
introduced with v2.6.27 commit c41f8cbddd4e ("[SCSI] zfcp: zfcp_fsf
cleanup.").
SCSI fc_host statistics were introduced with v2.6.16 commit f6cd94b126aa
("[SCSI] zfcp: transport class adaptations").
SCSI fc_host port_state was introduced with v2.6.27 commit 85a82392fe6f
("[SCSI] zfcp: Add port_state attribute to sysfs").
SCSI rport setter for dev_loss_tmo was introduced with v2.6.18 commit
338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target
port is unavailable").
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : .......
LUN : 0x...
WWPN : 0x...
D_ID : 0x...
Adapter status : 0x...
Port status : 0x...
LUN status : 0x...
Ready count : 0x...
Running count : 0x...
ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_...
ERP need : 0xc0 ZFCP_ERP_ACTION_NONE
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
That other commit introduced an inconsistency because it would trace on
ERP_FAILED for all callers of port forced reopen triggers (not just
terminate_rport_io), but it would not trace on ERP_FAILED for all callers of
other ERP triggers such as adapter, port regular, LUN.
Therefore, generalize that other commit. zfcp_erp_action_enqueue() already
had two early outs which re-used the one zfcp_dbf_rec_trig() call. All ERP
trigger functions finally run through zfcp_erp_action_enqueue(). So move
the special handling for ZFCP_STATUS_COMMON_ERP_FAILED into
zfcp_erp_action_enqueue() and add another early out with new trace marker
for pseudo ERP need in this case. This removes all early returns from all
ERP trigger functions so we always end up at zfcp_dbf_rec_trig().
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : .......
LUN : 0x...
WWPN : 0x...
D_ID : 0x...
Adapter status : 0x...
Port status : 0x...
LUN status : 0x...
Ready count : 0x...
Running count : 0x...
ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_...
ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
For problem determination we always want to see when we were invoked on the
terminate_rport_io callback whether we perform something or not.
Temporal |