linux.git/drivers/nvme, branch v4.18.19

nvme: remove ns sibling before clearing path

2018-11-13T19:12:16+00:00

[ Upstream commit 48f78be3326052a7718678ff9a78d6d884a50323 ]

The code had been clearing a namespace being deleted as the current
path while that namespace was still in the path siblings list. It is
possible a new IO could set that namespace back to the current path
since it appeared to be an eligable path to select, which may result in
a use-after-free error.

This patch ensures a namespace being removed is not eligable to be reset
as a current path prior to clearing it as the current path.

Signed-off-by: Keith Busch 
Reviewed-by: Sagi Grimberg 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

nvmet-rdma: fix possible bogus dereference under heavy load

2018-10-10T06:56:00+00:00

[ Upstream commit 8407879c4e0d7731f6e7e905893cecf61a7762c7 ]

Currently we always repost the recv buffer before we send a response
capsule back to the host. Since ordering is not guaranteed for send
and recv completions, it is posible that we will receive a new request
from the host before we got a send completion for the response capsule.

Today, we pre-allocate 2x rsps the length of the queue, but in reality,
under heavy load there is nothing that is really preventing the gap to
expand until we exhaust all our rsps.

To fix this, if we don't have any pre-allocated rsps left, we dynamically
allocate a rsp and make sure to free it when we are done. If under memory
pressure we fail to allocate a rsp, we silently drop the command and
wait for the host to retry.

Reported-by: Steve Wise 
Tested-by: Steve Wise 
Signed-off-by: Sagi Grimberg 
[hch: dropped a superflous assignment]
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

nvme-fcloop: Fix dropped LS's to removed target port

2018-10-03T23:59:26+00:00

[ Upstream commit afd299ca996929f4f98ac20da0044c0cdc124879 ]

When a targetport is removed from the config, fcloop will avoid calling
the LS done() routine thinking the targetport is gone. This leaves the
initiator reset/reconnect hanging as it waits for a status on the
Create_Association LS for the reconnect.

Change the filter in the LS callback path. If tport null (set when
failed validation before "sending to remote port"), be sure to call
done. This was the main bug. But, continue the logic that only calls
done if tport was set but there is no remoteport (e.g. case where
remoteport has been removed, thus host doesn't expect a completion).

Signed-off-by: James Smart 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

nvme-rdma: unquiesce queues when deleting the controller

2018-09-26T06:39:26+00:00

[ Upstream commit 90140624e8face94207003ac9a9d2a329b309d68 ]

If the controller is going away, we need to unquiesce the IO queues so
that all pending request can fail gracefully before moving forward with
controller deletion. Do that before we destroy the IO queues so
blk_cleanup_queue won't block in freeze.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

nvmet: fix file discard return status

2018-09-26T06:39:26+00:00

[ Upstream commit 1b72b71faccee986e2128a271125177dfe91f7b7 ]

If nvmet_copy_from_sgl failed, we falsly return successful
completion status.

Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support")
Signed-off-by: Sagi Grimberg 
Reviewed-by: Chaitanya Kulkarni 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event

2018-09-05T07:29:47+00:00

commit f1ed3df20d2d223e0852cc4ac1f19bba869a7e3c upstream.

In many architectures loads may be reordered with older stores to
different locations.  In the nvme driver the following two operations
could be reordered:

 - Write shadow doorbell (dbbuf_db) into memory.
 - Read EventIdx (dbbuf_ei) from memory.

This can result in a potential race condition between driver and VM host
processing requests (if given virtual NVMe controller has a support for
shadow doorbell).  If that occurs, then the NVMe controller may decide to
wait for MMIO doorbell from guest operating system, and guest driver may
decide not to issue MMIO doorbell on any of subsequent commands.

This issue is purely timing-dependent one, so there is no easy way to
reproduce it. Currently the easiest known approach is to run "Oracle IO
Numbers" (orion) that is shipped with Oracle DB:

orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
	concat -write 40 -duration 120 -matrix row -testname nvme_test

Where nvme_test is a .lun file that contains a list of NVMe block
devices to run test against. Limiting number of vCPUs assigned to given
VM instance seems to increase chances for this bug to occur. On test
environment with VM that got 4 NVMe drives and 1 vCPU assigned the
virtual NVMe controller hang could be observed within 10-20 minutes.
That correspond to about 400-500k IO operations processed (or about
100GB of IO read/writes).

Orion tool was used as a validation and set to run in a loop for 36
hours (equivalent of pushing 550M IO operations). No issues were
observed. That suggest that the patch fixes the issue.

Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices")
Signed-off-by: Michal Wnukowski 
Reviewed-by: Keith Busch 
Reviewed-by: Sagi Grimberg 
[hch: updated changelog and comment a bit]
Signed-off-by: Christoph Hellwig 
Signed-off-by: Greg Kroah-Hartman

Merge tag 'for-linus-20180727' of git://git.kernel.dk/linux-block

2018-07-27T19:51:00+00:00

Pull block fixes from Jens Axboe:
 "Bigger than usual at this time, mostly due to the O_DIRECT corruption
  issue and the fact that I was on vacation last week. This contains:

   - NVMe pull request with two fixes for the FC code, and two target
     fixes (Christoph)

   - a DIF bio reset iteration fix (Greg Edwards)

   - two nbd reply and requeue fixes (Josef)

   - SCSI timeout fixup (Keith)

   - a small series that fixes an issue with bio_iov_iter_get_pages(),
     which ended up causing corruption for larger sized O_DIRECT writes
     that ended up racing with buffered writes (Martin Wilck)"

* tag 'for-linus-20180727' of git://git.kernel.dk/linux-block:
  block: reset bi_iter.bi_done after splitting bio
  block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
  blkdev: __blkdev_direct_IO_simple: fix leak in error case
  block: bio_iov_iter_get_pages: fix size of last iovec
  nvmet: only check for filebacking on -ENOTBLK
  nvmet: fixup crash on NULL device path
  scsi: set timed out out mq requests to complete
  blk-mq: export setting request completion state
  nvme: if_ready checks to fail io to deleting controller
  nvmet-fc: fix target sgl list on large transfers
  nbd: handle unexpected replies better
  nbd: don't requeue the same request twice.

nvmet: only check for filebacking on -ENOTBLK

2018-07-25T11:14:04+00:00

We only need to check for a file-backed namespace if
nvmet_bdev_ns_enable() returns -ENOTBLK. For any other error
it's pointless as the open() error will remain the same.

Fixes: d5eff33e ("nvmet: add simple file backed ns support")
Signed-off-by: Hannes Reinecke 
Reviewed-by: Chaitanya Kulkarni 
Signed-off-by: Christoph Hellwig

nvmet: fixup crash on NULL device path

2018-07-25T11:14:03+00:00

When writing an empty string into the device_path attribute the kernel
will crash with

nvmet: failed to open block device (null): (-22)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000

This patch sanitizes the error handling for invalid device path settings.

Fixes: a07b4970 ("nvmet: add a generic NVMe target")
Signed-off-by: Hannes Reinecke 
Reviewed-by: Chaitanya Kulkarni 
Signed-off-by: Christoph Hellwig

nvme: if_ready checks to fail io to deleting controller

2018-07-24T11:44:40+00:00

The revised if_ready checks skipped over the case of returning error when
the controller is being deleted.  Instead it was returning BUSY, which
caused the ios to retry, which caused the ns delete to hang waiting for
the ios to drain.

Stack trace of hang looks like:
 kworker/u64:2   D    0    74      2 0x80000000
 Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
 Call Trace:
  ? __schedule+0x26d/0x820
  schedule+0x32/0x80
  blk_mq_freeze_queue_wait+0x36/0x80
  ? remove_wait_queue+0x60/0x60
  blk_cleanup_queue+0x72/0x160
  nvme_ns_remove+0x106/0x140 [nvme_core]
  nvme_remove_namespaces+0x7e/0xa0 [nvme_core]
  nvme_delete_ctrl_work+0x4d/0x80 [nvme_core]
  process_one_work+0x160/0x350
  worker_thread+0x1c3/0x3d0
  kthread+0xf5/0x130
  ? process_one_work+0x350/0x350
  ? kthread_bind+0x10/0x10
  ret_from_fork+0x1f/0x30

Extend nvmf_fail_nonready_command() to supply the controller pointer so
that the controller state can be looked at. Fail any io to a controller
that is deleting.

Fixes: 3bc32bb1186c ("nvme-fabrics: refactor queue ready check")
Fixes: 35897b920c8a ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
Signed-off-by: James Smart 
Signed-off-by: Christoph Hellwig 
Tested-by: Ewan D. Milne 
Reviewed-by: Ewan D. Milne