<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/block/blk-mq.c, branch v4.4.122</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>blk-mq: Avoid memory reclaim when remapping queues</title>
<updated>2017-04-18T05:14:37+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@linux.vnet.ibm.com</email>
</author>
<published>2016-12-06T15:31:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=f4522e36edaa9ec0cada0daa5c2628db762dd3d9'/>
<id>f4522e36edaa9ec0cada0daa5c2628db762dd3d9</id>
<content type='text'>
commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream.

While stressing memory and IO and changing SMT settings at the same time,
we were able to consistently trigger deadlocks in the mm subsystem that
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger reclaim, which stalls
waiting for the block layer remapping to complete, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event to complete.

The simplest fix for this is to allocate with GFP_NOIO in this path,
so that reclaim triggered by these allocations cannot issue I/O or call
back into the filesystem.  With this patch, we could no longer hit the
issue.
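
<![CDATA[A user-space model (not kernel code) of why GFP_NOIO breaks the cycle in the trace above. It assumes, as I understand the kernel, that GFP_KERNEL carries __GFP_IO and __GFP_FS while GFP_NOIO carries neither, and that the superblock shrinker, super_cache_scan(), stops when __GFP_FS is clear. The MODEL_* bit values and model_may_reach_fs_shrinker() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bits only; the real __GFP_* values are different. */
#define MODEL_GFP_RECLAIM (1u << 0) /* may enter direct reclaim */
#define MODEL_GFP_IO      (1u << 1) /* reclaim may start physical I/O */
#define MODEL_GFP_FS      (1u << 2) /* reclaim may call into filesystems */
#define MODEL_GFP_ALL     (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

/* Composite masks, built the way GFP_KERNEL and GFP_NOIO are built. */
#define MODEL_GFP_KERNEL  (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOIO    (MODEL_GFP_RECLAIM)

/*
 * Hypothetical helper: can direct reclaim for an allocation with these
 * flags reach a filesystem shrinker such as super_cache_scan(), the
 * path that blocked in xfs_reclaim_inodes_ag() in the trace above?
 */
static bool model_may_reach_fs_shrinker(unsigned int gfp)
{
	assert((gfp & ~MODEL_GFP_ALL) == 0); /* only modelled bits allowed */

	if (!(gfp & MODEL_GFP_RECLAIM))
		return false;              /* no direct reclaim at all */
	return (gfp & MODEL_GFP_FS) != 0;  /* the shrinker bails without FS */
}
```

model_may_reach_fs_shrinker(MODEL_GFP_KERNEL) returns true, and model_may_reach_fs_shrinker(MODEL_GFP_NOIO) returns false. Unlike GFP_NOWAIT from v1, GFP_NOIO still allows direct reclaim. It only rules out reclaim that issues I/O or enters the filesystem.]]>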

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@linux.vnet.ibm.com&gt;
Cc: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Cc: Douglas Miller &lt;dougmill@linux.vnet.ibm.com&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream.

While stressing memory and IO and changing SMT settings at the same time,
we were able to consistently trigger deadlocks in the mm subsystem that
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger reclaim, which stalls
waiting for the block layer remapping to complete, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event to complete.

The simplest fix for this is to allocate with GFP_NOIO in this path,
so that reclaim triggered by these allocations cannot issue I/O or call
back into the filesystem.  With this patch, we could no longer hit the
issue.

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@linux.vnet.ibm.com&gt;
Cc: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Cc: Douglas Miller &lt;dougmill@linux.vnet.ibm.com&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: really fix plug list flushing for nomerge queues</title>
<updated>2017-02-26T10:07:49+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2016-06-02T05:18:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e8330cb5ae475f59308297e2ac9a496d4192912e'/>
<id>e8330cb5ae475f59308297e2ac9a496d4192912e</id>
<content type='text'>
commit 87c279e613f848c691111b29d49de8df3f4f56da upstream.

Commit 0809e3ac6231 ("block: fix plug list flushing for nomerge queues")
updated blk_mq_make_request() to set request_count even when
blk_queue_nomerges() returns true. However, blk_mq_make_request() only
does limited plugging and doesn't use request_count;
blk_sq_make_request() is the one that should have been fixed. Do that
and get rid of the unnecessary work in the mq version.
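
<![CDATA[A simplified model of the plugging decision described above. It assumes that no bio ever merges and that the flush threshold is 16, the role BLK_MAX_REQUEST_COUNT plays in the kernel. struct model_plug and model_sq_add_request() are hypothetical. The fixed argument switches between the behaviour before and after the fix for a nomerges queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed threshold, playing the role of BLK_MAX_REQUEST_COUNT. */
#define MODEL_MAX_REQUEST_COUNT 16

struct model_plug {
	int queued;   /* requests currently held on the plug list */
	int flushes;  /* how many times the plug list was flushed */
};

/*
 * Hypothetical model of adding one request on the single-queue path,
 * assuming no bio ever merges. With merging enabled, the plug-merge
 * attempt also reports how many requests are already plugged. Before
 * the fix, the nomerges path never computed that count, so
 * request_count stayed 0 and the threshold flush could not fire.
 */
static void model_sq_add_request(struct model_plug *plug, bool nomerges,
				 bool fixed)
{
	int request_count = 0;

	assert(plug);
	if (!nomerges || fixed)
		request_count = plug->queued; /* count already-plugged requests */

	if (request_count >= MODEL_MAX_REQUEST_COUNT) {
		plug->queued = 0;             /* flush everything plugged so far */
		plug->flushes++;
	}
	plug->queued++;                       /* plug the new request */
}
```

When 33 requests are fed to a nomerges queue, the old logic leaves all 33 plugged. The fixed logic flushes twice and leaves one request plugged.]]>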

Fixes: 0809e3ac6231 ("block: fix plug list flushing for nomerge queues")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 87c279e613f848c691111b29d49de8df3f4f56da upstream.

Commit 0809e3ac6231 ("block: fix plug list flushing for nomerge queues")
updated blk_mq_make_request() to set request_count even when
blk_queue_nomerges() returns true. However, blk_mq_make_request() only
does limited plugging and doesn't use request_count;
blk_sq_make_request() is the one that should have been fixed. Do that
and get rid of the unnecessary work in the mq version.

Fixes: 0809e3ac6231 ("block: fix plug list flushing for nomerge queues")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: Always schedule hctx-&gt;next_cpu</title>
<updated>2017-01-19T19:17:22+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@linux.vnet.ibm.com</email>
</author>
<published>2016-09-28T03:24:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=6e8210ad2585ba8292d91374e2d0750dd45773e9'/>
<id>6e8210ad2585ba8292d91374e2d0750dd45773e9</id>
<content type='text'>
commit c02ebfdddbafa9a6a0f52fbd715e6bfa229af9d3 upstream.

Commit 0e87e58bf60e ("blk-mq: improve warning for running a queue on the
wrong CPU") attempts to avoid triggering the WARN_ON in
__blk_mq_run_hw_queue when the expected CPU is dead.  The problem is that,
on the last run of a batch before the round-robin switch,
blk_mq_hctx_next_cpu can schedule a dead CPU while also updating next_cpu
to the next online CPU in the mask, which triggers the WARN_ON despite
the previous workaround.

This patch fixes this scenario by always scheduling the value in
hctx-&gt;next_cpu.  This changes the moment at which the CPU running the
hctx is rotated, but that does not matter, since the hctx still runs
BLK_MQ_CPU_WORK_BATCH times in a row before switching to another CPU.
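
<![CDATA[A user-space model of the round-robin after the fix. It assumes a batch size of 8 for BLK_MQ_CPU_WORK_BATCH and uses a 32-bit mask as a stand-in for hctx->cpumask. All model_* names are hypothetical. The key point is that the function always returns the current hctx->next_cpu, after any update, rather than the CPU it is moving away from.

```c
#include <assert.h>

/* Assumed batch size, playing the role of BLK_MQ_CPU_WORK_BATCH. */
#define MODEL_CPU_WORK_BATCH 8

struct model_hctx {
	unsigned int cpumask; /* bit n set: CPU n may run this hctx */
	int next_cpu;         /* CPU that should run the next work item */
	int next_cpu_batch;   /* runs left before rotating to another CPU */
};

/* Next set bit after cpu, wrapping to the first set bit. */
static int model_mask_next(unsigned int mask, int cpu)
{
	assert(mask != 0);
	for (int i = cpu + 1; i < 32; i++)
		if (mask & (1u << i))
			return i;
	for (int i = 0; i < 32; i++)
		if (mask & (1u << i))
			return i;
	return -1; /* unreachable while mask != 0 */
}

/* Rotate when the batch runs out, then always return next_cpu. */
static int model_hctx_next_cpu(struct model_hctx *hctx)
{
	if (--hctx->next_cpu_batch <= 0) {
		hctx->next_cpu = model_mask_next(hctx->cpumask, hctx->next_cpu);
		hctx->next_cpu_batch = MODEL_CPU_WORK_BATCH;
	}
	return hctx->next_cpu;
}
```

With CPUs 0, 1 and 3 in the mask, the first seven calls return 0 and the eighth returns 1. When the following batch ends, CPU 2 is skipped and the rotation moves to CPU 3.]]>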

Fixes: 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU")
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@linux.vnet.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c02ebfdddbafa9a6a0f52fbd715e6bfa229af9d3 upstream.

Commit 0e87e58bf60e ("blk-mq: improve warning for running a queue on the
wrong CPU") attempts to avoid triggering the WARN_ON in
__blk_mq_run_hw_queue when the expected CPU is dead.  The problem is that,
on the last run of a batch before the round-robin switch,
blk_mq_hctx_next_cpu can schedule a dead CPU while also updating next_cpu
to the next online CPU in the mask, which triggers the WARN_ON despite
the previous workaround.

This patch fixes this scenario by always scheduling the value in
hctx-&gt;next_cpu.  This changes the moment at which the CPU running the
hctx is rotated, but that does not matter, since the hctx still runs
BLK_MQ_CPU_WORK_BATCH times in a row before switching to another CPU.

Fixes: 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU")
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@linux.vnet.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: Do not invoke .queue_rq() for a stopped queue</title>
<updated>2017-01-06T10:16:14+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bart.vanassche@sandisk.com</email>
</author>
<published>2016-10-29T00:18:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=f0898dc2852b2e12afabff150bbf2390efec0395'/>
<id>f0898dc2852b2e12afabff150bbf2390efec0395</id>
<content type='text'>
commit bc27c01b5c46d3bfec42c96537c7a3fae0bb2cc4 upstream.

The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.
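
<![CDATA[A minimal model of the rule stated above: a stopped queue parks requests instead of calling .queue_rq(), and those requests are issued once the queue is restarted. struct model_hw_queue and both functions are hypothetical stand-ins for the blk-mq hardware context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a blk-mq hardware queue context. */
struct model_hw_queue {
	bool stopped;  /* analogue of BLK_MQ_S_STOPPED being set */
	int issued;    /* requests handed to the driver's .queue_rq() */
	int queued;    /* requests parked until the queue restarts */
};

/* Submission: never call .queue_rq() on a stopped queue. */
static void model_make_request(struct model_hw_queue *hq)
{
	assert(hq);
	if (hq->stopped)
		hq->queued++;  /* queue instead of issue */
	else
		hq->issued++;  /* issue directly to the driver */
}

/* Restart: clear the stopped state and dispatch parked requests. */
static void model_start_queue(struct model_hw_queue *hq)
{
	assert(hq);
	hq->stopped = false;
	hq->issued += hq->queued;
	hq->queued = 0;
}
```]]>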

Reported-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bc27c01b5c46d3bfec42c96537c7a3fae0bb2cc4 upstream.

The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.

Reported-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: actually hook up defer list when running requests</title>
<updated>2016-10-07T13:23:44+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2016-06-09T01:22:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=4042db311c0fc3ebea3f85e39adba70ef01d5756'/>
<id>4042db311c0fc3ebea3f85e39adba70ef01d5756</id>
<content type='text'>
commit 52b9c330c6a8a4b5a1819bdaddf4ec76ab571e81 upstream.

If -&gt;queue_rq() returns BLK_MQ_RQ_QUEUE_OK, we use continue and skip
over the rest of the loop body. However, dptr is assigned later in the
loop body, and the BLK_MQ_RQ_QUEUE_OK case is exactly the case that we'd
want it for.
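
<![CDATA[A simplified model of the dispatch loop, assuming every ->queue_rq() call returns BLK_MQ_RQ_QUEUE_OK. The defer flag stands in for dptr, which should be set once more than one request remains on the list. model_run_hw_queue() is hypothetical, and buggy_continue reproduces the early continue.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns how many ->queue_rq() calls would have seen a non-NULL
 * defer list (dptr) in this model.
 */
static int model_run_hw_queue(int nr_requests, bool buggy_continue)
{
	bool defer = false;  /* dptr starts NULL: issue the first request now */
	int deferred_issues = 0;
	int remaining = nr_requests;

	while (remaining > 0) {
		remaining--;            /* take the request off rq_list */
		if (defer)
			deferred_issues++;  /* issued with the defer list set */

		if (buggy_continue)
			continue;           /* old code: QUEUE_OK skipped the rest */

		/* More than one request left: defer issue from now on. */
		if (!defer && remaining > 1)
			defer = true;
	}
	assert(remaining == 0);
	return deferred_issues;
}
```

For three requests, the fixed loop issues the last two with the defer list set, while the buggy loop never sets it. model_run_hw_queue(3, false) returns 2, and model_run_hw_queue(3, true) returns 0.]]>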

NVMe isn't actually using BLK_MQ_F_DEFER_ISSUE yet, nor is any other
in-tree driver, but if the code's going to be there, it might as well
work.

Fixes: 74c450521dd8 ("blk-mq: add a 'list' parameter to -&gt;queue_rq()")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 52b9c330c6a8a4b5a1819bdaddf4ec76ab571e81 upstream.

If -&gt;queue_rq() returns BLK_MQ_RQ_QUEUE_OK, we use continue and skip
over the rest of the loop body. However, dptr is assigned later in the
loop body, and the BLK_MQ_RQ_QUEUE_OK case is exactly the case that we'd
want it for.

NVMe isn't actually using BLK_MQ_F_DEFER_ISSUE yet, nor is any other
in-tree driver, but if the code's going to be there, it might as well
work.

Fixes: 74c450521dd8 ("blk-mq: add a 'list' parameter to -&gt;queue_rq()")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: End unstarted requests on dying queue</title>
<updated>2016-09-15T06:27:47+00:00</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2016-05-26T16:25:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=4af03e19226cebe72d45c669ddc3b33666adf18e'/>
<id>4af03e19226cebe72d45c669ddc3b33666adf18e</id>
<content type='text'>
[ Upstream commit a59e0f5795fe52dad42a99c00287e3766153b312 ]

Go directly to ending a request if it wasn't started. Previously, completing
the request could invoke a driver callback for a request the driver never
initialized.
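
<![CDATA[A minimal model of the change that considers only the two outcomes named in the message. struct model_request and model_fail_unstarted() are hypothetical. The old path is modelled as always reaching the driver callback, which is the worst case the message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request state for this sketch. */
struct model_request {
	bool started;          /* request was started (handed to the driver) */
	int error;
	bool driver_callback;  /* a driver completion callback ran */
	bool ended;            /* request returned to its submitter */
};

/* Fail a request that was never started on a dying queue. */
static void model_fail_unstarted(struct model_request *rq, bool fixed)
{
	assert(rq && !rq->started);
	rq->error = -EIO;
	if (!fixed)
		rq->driver_callback = true; /* driver sees a request it never set up */
	rq->ended = true;                   /* fixed path: end it directly */
}
```]]>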

Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn at suse.de&gt;
Acked-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit a59e0f5795fe52dad42a99c00287e3766153b312 ]

Go directly to ending a request if it wasn't started. Previously, completing
the request could invoke a driver callback for a request the driver never
initialized.

Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn at suse.de&gt;
Acked-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: fix calling unplug callbacks with preempt disabled</title>
<updated>2015-11-21T03:29:45+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2015-11-21T03:29:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=b094f89ca42fbb8ce40174d5f85ca8430e499da6'/>
<id>b094f89ca42fbb8ce40174d5f85ca8430e499da6</id>
<content type='text'>
Liu reported that running certain parts of xfstests threw the
following error:

BUG: sleeping function called from invalid context at mm/page_alloc.c:3190
in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u16:0
3 locks held by kworker/u16:0/6:
 #0:  ("writeback"){++++.+}, at: [&lt;ffffffff8107f083&gt;] process_one_work+0x173/0x730
 #1:  ((&amp;(&amp;wb-&gt;dwork)-&gt;work)){+.+.+.}, at: [&lt;ffffffff8107f083&gt;] process_one_work+0x173/0x730
 #2:  (&amp;type-&gt;s_umount_key#44){+++++.}, at: [&lt;ffffffff811e6805&gt;] trylock_super+0x25/0x60
CPU: 5 PID: 6 Comm: kworker/u16:0 Tainted: G           OE   4.3.0+ #3
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
Workqueue: writeback wb_workfn (flush-btrfs-108)
 ffffffff81a3abab ffff88042e282ba8 ffffffff8130191b ffffffff81a3abab
 0000000000000c76 ffff88042e282ba8 ffff88042e27c180 ffff88042e282bd8
 ffffffff8108ed95 ffff880400000004 0000000000000000 0000000000000c76
Call Trace:
 [&lt;ffffffff8130191b&gt;] dump_stack+0x4f/0x74
 [&lt;ffffffff8108ed95&gt;] ___might_sleep+0x185/0x240
 [&lt;ffffffff8108eea2&gt;] __might_sleep+0x52/0x90
 [&lt;ffffffff811817e8&gt;] __alloc_pages_nodemask+0x268/0x410
 [&lt;ffffffff8109a43c&gt;] ? sched_clock_local+0x1c/0x90
 [&lt;ffffffff8109a6d1&gt;] ? local_clock+0x21/0x40
 [&lt;ffffffff810b9eb0&gt;] ? __lock_release+0x420/0x510
 [&lt;ffffffff810b534c&gt;] ? __lock_acquired+0x16c/0x3c0
 [&lt;ffffffff811ca265&gt;] alloc_pages_current+0xc5/0x210
 [&lt;ffffffffa0577105&gt;] ? rbio_is_full+0x55/0x70 [btrfs]
 [&lt;ffffffff810b7ed8&gt;] ? mark_held_locks+0x78/0xa0
 [&lt;ffffffff81666d50&gt;] ? _raw_spin_unlock_irqrestore+0x40/0x60
 [&lt;ffffffffa0578c0a&gt;] full_stripe_write+0x5a/0xc0 [btrfs]
 [&lt;ffffffffa0578ca9&gt;] __raid56_parity_write+0x39/0x60 [btrfs]
 [&lt;ffffffffa0578deb&gt;] run_plug+0x11b/0x140 [btrfs]
 [&lt;ffffffffa0578e33&gt;] btrfs_raid_unplug+0x23/0x70 [btrfs]
 [&lt;ffffffff812d36c2&gt;] blk_flush_plug_list+0x82/0x1f0
 [&lt;ffffffff812e0349&gt;] blk_sq_make_request+0x1f9/0x740
 [&lt;ffffffff812ceba2&gt;] ? generic_make_request_checks+0x222/0x7c0
 [&lt;ffffffff812cf264&gt;] ? blk_queue_enter+0x124/0x310
 [&lt;ffffffff812cf1d2&gt;] ? blk_queue_enter+0x92/0x310
 [&lt;ffffffff812d0ae2&gt;] generic_make_request+0x172/0x2c0
 [&lt;ffffffff812d0ad4&gt;] ? generic_make_request+0x164/0x2c0
 [&lt;ffffffff812d0ca0&gt;] submit_bio+0x70/0x140
 [&lt;ffffffffa0577b29&gt;] ? rbio_add_io_page+0x99/0x150 [btrfs]
 [&lt;ffffffffa0578a89&gt;] finish_rmw+0x4d9/0x600 [btrfs]
 [&lt;ffffffffa0578c4c&gt;] full_stripe_write+0x9c/0xc0 [btrfs]
 [&lt;ffffffffa057ab7f&gt;] raid56_parity_write+0xef/0x160 [btrfs]
 [&lt;ffffffffa052bd83&gt;] btrfs_map_bio+0xe3/0x2d0 [btrfs]
 [&lt;ffffffffa04fbd6d&gt;] btrfs_submit_bio_hook+0x8d/0x1d0 [btrfs]
 [&lt;ffffffffa05173c4&gt;] submit_one_bio+0x74/0xb0 [btrfs]
 [&lt;ffffffffa0517f55&gt;] submit_extent_page+0xe5/0x1c0 [btrfs]
 [&lt;ffffffffa0519b18&gt;] __extent_writepage_io+0x408/0x4c0 [btrfs]
 [&lt;ffffffffa05179c0&gt;] ? alloc_dummy_extent_buffer+0x140/0x140 [btrfs]
 [&lt;ffffffffa051dc88&gt;] __extent_writepage+0x218/0x3a0 [btrfs]
 [&lt;ffffffff810b7ed8&gt;] ? mark_held_locks+0x78/0xa0
 [&lt;ffffffffa051e2c9&gt;] extent_write_cache_pages.clone.0+0x2f9/0x400 [btrfs]
 [&lt;ffffffffa051e422&gt;] extent_writepages+0x52/0x70 [btrfs]
 [&lt;ffffffffa05001f0&gt;] ? btrfs_set_inode_index+0x70/0x70 [btrfs]
 [&lt;ffffffffa04fcc17&gt;] btrfs_writepages+0x27/0x30 [btrfs]
 [&lt;ffffffff81184df3&gt;] do_writepages+0x23/0x40
 [&lt;ffffffff81212229&gt;] __writeback_single_inode+0x89/0x4d0
 [&lt;ffffffff81212a60&gt;] ? writeback_sb_inodes+0x260/0x480
 [&lt;ffffffff81212a60&gt;] ? writeback_sb_inodes+0x260/0x480
 [&lt;ffffffff8121295f&gt;] ? writeback_sb_inodes+0x15f/0x480
 [&lt;ffffffff81212ad2&gt;] writeback_sb_inodes+0x2d2/0x480
 [&lt;ffffffff810b1397&gt;] ? down_read_trylock+0x57/0x60
 [&lt;ffffffff811e6805&gt;] ? trylock_super+0x25/0x60
 [&lt;ffffffff810d629f&gt;] ? rcu_read_lock_sched_held+0x4f/0x90
 [&lt;ffffffff81212d0c&gt;] __writeback_inodes_wb+0x8c/0xc0
 [&lt;ffffffff812130b5&gt;] wb_writeback+0x2b5/0x500
 [&lt;ffffffff810b7ed8&gt;] ? mark_held_locks+0x78/0xa0
 [&lt;ffffffff810660a8&gt;] ? __local_bh_enable_ip+0x68/0xc0
 [&lt;ffffffff81213362&gt;] ? wb_do_writeback+0x62/0x310
 [&lt;ffffffff812133c1&gt;] wb_do_writeback+0xc1/0x310
 [&lt;ffffffff8107c3d9&gt;] ? set_worker_desc+0x79/0x90
 [&lt;ffffffff81213842&gt;] wb_workfn+0x92/0x330
 [&lt;ffffffff8107f133&gt;] process_one_work+0x223/0x730
 [&lt;ffffffff8107f083&gt;] ? process_one_work+0x173/0x730
 [&lt;ffffffff8108035f&gt;] ? worker_thread+0x18f/0x430
 [&lt;ffffffff810802ed&gt;] worker_thread+0x11d/0x430
 [&lt;ffffffff810801d0&gt;] ? maybe_create_worker+0xf0/0xf0
 [&lt;ffffffff810801d0&gt;] ? maybe_create_worker+0xf0/0xf0
 [&lt;ffffffff810858df&gt;] kthread+0xef/0x110
 [&lt;ffffffff8108f74e&gt;] ? schedule_tail+0x1e/0xd0
 [&lt;ffffffff810857f0&gt;] ? __init_kthread_worker+0x70/0x70
 [&lt;ffffffff816673bf&gt;] ret_from_fork+0x3f/0x70
 [&lt;ffffffff810857f0&gt;] ? __init_kthread_worker+0x70/0x70

The issue is that we've got the software context pinned while
calling blk_flush_plug_list(), which flushes callbacks that
are allowed to sleep. btrfs and raid have such callbacks.

Flip the checks around a bit, so we can re-enable preemption
earlier and flush plugs without having preemption disabled.
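
<![CDATA[A user-space model of the ordering problem. It assumes that pinning the software context disables preemption (as I understand it, blk_mq_get_ctx() uses get_cpu()) and that an unplug callback may sleep. All model_* names are hypothetical. The fixed path releases the context before flushing.

```c
#include <assert.h>
#include <stdbool.h>

static int model_preempt_depth;     /* > 0: pinned to a CPU, must not sleep */
static bool model_slept_in_atomic;  /* set if a callback "slept" while pinned */

static void model_get_ctx(void) { model_preempt_depth++; } /* like get_cpu() */

static void model_put_ctx(void)                             /* like put_cpu() */
{
	assert(model_preempt_depth > 0);
	model_preempt_depth--;
}

/* An unplug callback that may sleep, like the btrfs raid56 one. */
static void model_unplug_cb(void)
{
	if (model_preempt_depth > 0)
		model_slept_in_atomic = true;  /* would trip might_sleep() */
}

/* Ordering in the single-queue submit path, before and after the fix. */
static void model_sq_make_request(bool fixed)
{
	model_get_ctx();
	if (fixed) {
		model_put_ctx();    /* release the context first ... */
		model_unplug_cb();  /* ... then flush with preemption enabled */
	} else {
		model_unplug_cb();  /* old: flush while still pinned */
		model_put_ctx();
	}
}
```

Calling model_sq_make_request(true) leaves model_slept_in_atomic false. Calling it with false sets the flag, which corresponds to the "sleeping function called from invalid context" report above.]]>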

This only affects blk-mq driven devices, and only those that
register a single queue.

Reported-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Tested-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Liu reported that running certain parts of xfstests threw the
following error:

BUG: sleeping function called from invalid context at mm/page_alloc.c:3190
in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u16:0
3 locks held by kworker/u16:0/6:
 #0:  ("writeback"){++++.+}, at: [&lt;ffffffff8107f083&gt;] process_one_work+0x173/0x730
 #1:  ((&amp;(&amp;wb-&gt;dwork)-&gt;work)){+.+.+.}, at: [&lt;ffffffff8107f083&gt;] process_one_work+0x173/0x730
 #2:  (&amp;type-&gt;s_umount_key#44){+++++.}, at: [&lt;ffffffff811e6805&gt;] trylock_super+0x25/0x60
CPU: 5 PID: 6 Comm: kworker/u16:0 Tainted: G           OE   4.3.0+ #3
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
Workqueue: writeback wb_workfn (flush-btrfs-108)
 ffffffff81a3abab ffff88042e282ba8 ffffffff8130191b ffffffff81a3abab
 0000000000000c76 ffff88042e282ba8 ffff88042e27c180 ffff88042e282bd8
 ffffffff8108ed95 ffff880400000004 0000000000000000 0000000000000c76
Call Trace:
 [&lt;ffffffff8130191b&gt;] dump_stack+0x4f/0x74
 [&lt;ffffffff8108ed95&gt;] ___might_sleep+0x185/0x240
 [&lt;ffffffff8108eea2&gt;] __might_sleep+0x52/0x90
 [&lt;ffffffff811817e8&gt;] __alloc_pages_nodemask+0x268/0x410
 [&lt;ffffffff8109a43c&gt;] ? sched_clock_local+0x1c/0x90
 [&lt;ffffffff8109a6d1&gt;] ? local_clock+0x21/0x40
 [&lt;ffffffff810b9eb0&gt;] ? __lock_release+0x420/0x510
 [&lt;ffffffff810b534c&gt;] ? __lock_acquired+0x16c/0x3c0
 [&lt;ffffffff811ca265&gt;] alloc_pages_current+0xc5/0x210
 [&lt;ffffffffa0577105&gt;] ? rbio_is_full+0x55/0x70 [btrfs]
 [&lt;ffffffff810b7ed8&gt;] ? mark_held_locks+0x78/0xa0
 [&lt;ffffffff81666d50&gt;] ? _raw_spin_unlock_irqrestore+0x40/0x60
 [&lt;ffffffffa0578c0a&gt;] full_stripe_write+0x5a/0xc0 [btrfs]
 [&lt;ffffffffa0578ca9&gt;] __raid56_parity_write+0x39/0x60 [btrfs]
 [&lt;ffffffffa0578deb&gt;] run_plug+0x11b/0x140 [btrfs]
 [&lt;ffffffffa0578e33&gt;] btrfs_raid_unplug+0x23/0x70 [btrfs]
 [&lt;ffffffff812d36c2&gt;] blk_flush_plug_list+0x82/0x1f0
 [&lt;ffffffff812e0349&gt;] blk_sq_make_request+0x1f9/0x740
 [&lt;ffffffff812ceba2&gt;] ? generic_make_request_checks+0x222/0x7c0
 [&lt;ffffffff812cf264&gt;] ? blk_queue_enter+0x124/0x310
 [&lt;ffffffff812cf1d2&gt;] ? blk_queue_enter+0x92/0x310
 [&lt;ffffffff812d0ae2&gt;] generic_make_request+0x172/0x2c0
 [&lt;ffffffff812d0ad4&gt;] ? generic_make_request+0x164/0x2c0
 [&lt;ffffffff812d0ca0&gt;] submit_bio+0x70/0x140
 [&lt;ffffffffa0577b29&gt;] ? rbio_add_io_page+0x99/0x150 [btrfs]
 [&lt;ffffffffa0578a89&gt;] finish_rmw+0x4d9/0x600 [btrfs]
 [&lt;ffffffffa0578c4c&gt;] full_stripe_write+0x9c/0xc0 [btrfs]
 [&lt;ffffffffa057ab7f&gt;] raid56_parity_write+0xef/0x160 [btrfs]
 [&lt;ffffffffa052bd83&gt;] btrfs_map_bio+0xe3/0x2d0 [btrfs]
 [&lt;ffffffffa04fbd6d&gt;] btrfs_submit_bio_hook+0x8d/0x1d0 [btrfs]
 [&lt;ffffffffa05173c4&gt;] submit_one_bio+0x74/0xb0 [btrfs]
 [&lt;ffffffffa0517f55&gt;] submit_extent_page+0xe5/0x1c0 [btrfs]
 [&lt;ffffffffa0519b18&gt;] __extent_writepage_io+0x408/0x4c0 [btrfs]
 [&lt;ffffffffa05179c0&gt;] ? alloc_dummy_extent_buffer+0x140/0x140 [btrfs]
 [&lt;ffffffffa051dc88&gt;] __extent_writepage+0x218/0x3a0 [btrfs]
 [&lt;ffffffff810b7ed8&gt;] ? mark_held_locks+0x78/0xa0
 [&lt;ffffffffa051e2c9&gt;] extent_write_cache_pages.clone.0+0x2f9/0x400 [btrfs]
 [&lt;ffffffffa051e422&gt;] extent_writepages+0x52/0x70 [btrfs]
 [&lt;ffffffffa05001f0&gt;] ? btrfs_set_inode_index+0x70/0x70 [btrfs]
 [&lt;ffffffffa04fcc17&gt;] btrfs_writepages+0x27/0x30 [btrfs]
 [&lt;ffffffff81184df3&gt;] do_writepages+0x23/0x40
 [&lt;ffffffff81212229&gt;] __writeback_single_inode+0x89/0x4d0
 [&lt;ffffffff81212a60&gt;] ? writeback_sb_inodes+0x260/0x480
 [&lt;ffffffff81212a60&gt;] ? writeback_sb_inodes+0x260/0x480
 [&lt;ffffffff8121295f&gt;] ? writeback_sb_inodes+0x15f/0x480
 [&lt;ffffffff81212ad2&gt;] writeback_sb_inodes+0x2d2/0x480
 [&lt;ffffffff810b1397&gt;] ? down_read_trylock+0x57/0x60
 [&lt;ffffffff811e6805&gt;] ? trylock_super+0x25/0x60
 [&lt;ffffffff810d629f&gt;] ? rcu_read_lock_sched_held+0x4f/0x90
 [&lt;ffffffff81212d0c&gt;] __writeback_inodes_wb+0x8c/0xc0
 [&lt;ffffffff812130b5&gt;] wb_writeback+0x2b5/0x500
 [&lt;ffffffff810b7ed8&gt;] ? mark_held_locks+0x78/0xa0
 [&lt;ffffffff810660a8&gt;] ? __local_bh_enable_ip+0x68/0xc0
 [&lt;ffffffff81213362&gt;] ? wb_do_writeback+0x62/0x310
 [&lt;ffffffff812133c1&gt;] wb_do_writeback+0xc1/0x310
 [&lt;ffffffff8107c3d9&gt;] ? set_worker_desc+0x79/0x90
 [&lt;ffffffff81213842&gt;] wb_workfn+0x92/0x330
 [&lt;ffffffff8107f133&gt;] process_one_work+0x223/0x730
 [&lt;ffffffff8107f083&gt;] ? process_one_work+0x173/0x730
 [&lt;ffffffff8108035f&gt;] ? worker_thread+0x18f/0x430
 [&lt;ffffffff810802ed&gt;] worker_thread+0x11d/0x430
 [&lt;ffffffff810801d0&gt;] ? maybe_create_worker+0xf0/0xf0
 [&lt;ffffffff810801d0&gt;] ? maybe_create_worker+0xf0/0xf0
 [&lt;ffffffff810858df&gt;] kthread+0xef/0x110
 [&lt;ffffffff8108f74e&gt;] ? schedule_tail+0x1e/0xd0
 [&lt;ffffffff810857f0&gt;] ? __init_kthread_worker+0x70/0x70
 [&lt;ffffffff816673bf&gt;] ret_from_fork+0x3f/0x70
 [&lt;ffffffff810857f0&gt;] ? __init_kthread_worker+0x70/0x70

The issue is that we've got the software context pinned while
calling blk_flush_plug_list(), which flushes callbacks that
are allowed to sleep. btrfs and raid have such callbacks.

Flip the checks around a bit, so we can re-enable preemption
earlier and flush plugs without having preemption disabled.

This only affects blk-mq driven devices, and only those that
register a single queue.

Reported-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Tested-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: mark __blk_mq_complete_request() static</title>
<updated>2015-11-11T16:36:56+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2015-11-05T21:32:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=1fa8cc52f46c14fb1afc20c220855c40a5d28fcd'/>
<id>1fa8cc52f46c14fb1afc20c220855c40a5d28fcd</id>
<content type='text'>
It's no longer used outside of blk-mq core.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's no longer used outside of blk-mq core.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block</title>
<updated>2015-11-11T01:23:49+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-11T01:23:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=3419b45039c6b799c974a8019361c045e7ca232c'/>
<id>3419b45039c6b799c974a8019361c045e7ca232c</id>
<content type='text'>
Pull block IO poll support from Jens Axboe:
 "Various groups have been doing experimentation around IO polling for
  (really) fast devices.  The code has been reviewed and has been
  sitting on the side for a few releases, but this is now good enough
  for coordinated benchmarking and further experimentation.

  Currently O_DIRECT sync read/write are supported.  A framework is in
  the works that allows scalable stats tracking so we can auto-tune
  this.  And we'll add libaio support as well soon.  For now, it's an
  opt-in feature for test purposes"

* 'for-4.4/io-poll' of git://git.kernel.dk/linux-block:
  direct-io: be sure to assign dio-&gt;bio_bdev for both paths
  directio: add block polling support
  NVMe: add blk polling support
  block: add block polling support
  blk-mq: return tag/queue combo in the make_request_fn handlers
  block: change -&gt;make_request_fn() and users to return a queue cookie
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block IO poll support from Jens Axboe:
 "Various groups have been doing experimentation around IO polling for
  (really) fast devices.  The code has been reviewed and has been
  sitting on the side for a few releases, but this is now good enough
  for coordinated benchmarking and further experimentation.

  Currently O_DIRECT sync read/write are supported.  A framework is in
  the works that allows scalable stats tracking so we can auto-tune
  this.  And we'll add libaio support as well soon.  For now, it's an
  opt-in feature for test purposes"

* 'for-4.4/io-poll' of git://git.kernel.dk/linux-block:
  direct-io: be sure to assign dio-&gt;bio_bdev for both paths
  directio: add block polling support
  NVMe: add blk polling support
  block: add block polling support
  blk-mq: return tag/queue combo in the make_request_fn handlers
  block: change -&gt;make_request_fn() and users to return a queue cookie
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: return tag/queue combo in the make_request_fn handlers</title>
<updated>2015-11-07T17:40:47+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2015-11-05T17:41:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7b371636fb6d187873d9d2730c2b1febc48a9b47'/>
<id>7b371636fb6d187873d9d2730c2b1febc48a9b47</id>
<content type='text'>
Return a cookie, blk_qc_t, from the blk-mq make request functions; it
allows a later caller to uniquely identify a specific IO. The cookie
doesn't mean anything to the caller, but the caller can later pass it
back to the block layer, which can then identify the hardware queue and
request from that cookie.
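
<![CDATA[A sketch of how such a cookie can pack both values into one integer. It assumes the 4.4 layout as I recall it from include/linux/blk_types.h: blk_qc_t is an unsigned int, the tag occupies the low 16 bits, and the hardware queue number sits above it. The model_* names are hypothetical stand-ins for the kernel helpers.

```c
#include <assert.h>

typedef unsigned int model_qc_t;  /* stand-in for blk_qc_t */

#define MODEL_QC_T_SHIFT 16       /* assumed: tag in the low 16 bits */

/* Pack a request tag and hardware queue number into one cookie. */
static model_qc_t model_tag_to_qc_t(unsigned int tag, unsigned int queue_num)
{
	assert(tag < (1u << MODEL_QC_T_SHIFT)); /* tag must fit its field */
	return tag | (queue_num << MODEL_QC_T_SHIFT);
}

/* Recover the hardware queue number from a cookie. */
static unsigned int model_qc_t_to_queue_num(model_qc_t cookie)
{
	return cookie >> MODEL_QC_T_SHIFT;
}

/* Recover the request tag from a cookie. */
static unsigned int model_qc_t_to_tag(model_qc_t cookie)
{
	return cookie & ((1u << MODEL_QC_T_SHIFT) - 1);
}
```

model_tag_to_qc_t(42, 3) yields a cookie from which model_qc_t_to_tag() recovers 42 and model_qc_t_to_queue_num() recovers 3.]]>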

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Acked-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Keith Busch &lt;keith.busch@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Return a cookie, blk_qc_t, from the blk-mq make request functions; it
allows a later caller to uniquely identify a specific IO. The cookie
doesn't mean anything to the caller, but the caller can later pass it
back to the block layer, which can then identify the hardware queue and
request from that cookie.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Acked-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Keith Busch &lt;keith.busch@intel.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
