<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs, branch v4.10.4</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>Btrfs: fix data loss after truncate when using the no-holes feature</title>
<updated>2017-03-15T02:20:10+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2017-02-14T16:56:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2708a2d33e04af19f7d7ab4ce8982b89288ebd42'/>
<id>2708a2d33e04af19f7d7ab4ce8982b89288ebd42</id>
<content type='text'>
commit 76b42abbf7488121c4f9f1ea5941123306e25d99 upstream.

If we have a file with an implicit hole (NO_HOLES feature enabled) that
has an extent following the hole, delayed writes against regions of the
file behind the hole happened before but were not yet flushed and then
we truncate the file to a smaller size that lies inside the hole, we
end up persisting a wrong disk_i_size value for our inode that leads to
data loss after umounting and mounting again the filesystem or after
the inode is evicted and loaded again.

This happens because at inode.c:btrfs_truncate_inode_items() we end up
setting last_size to the offset of the extent that we deleted and that
followed the hole. We then pass that value to btrfs_ordered_update_i_size()
which updates the inode's disk_i_size to a value smaller then the offset
of the buffered (delayed) writes.

Example reproducer:

 $ mkfs.btrfs -f /dev/sdb
 $ mount /dev/sdb /mnt

 $ xfs_io -f -c "pwrite -S 0x01 0K 32K" /mnt/foo
 $ xfs_io -d -c "pwrite -S 0x02 -b 32K 64K 32K" /mnt/foo
 $ xfs_io -c "truncate 60K" /mnt/foo
   --&gt; inode's disk_i_size updated to 0

 $ md5sum /mnt/foo
 3c5ca3c3ab42f4b04d7e7eb0b0d4d806  /mnt/foo

 $ umount /dev/sdb
 $ mount /dev/sdb /mnt

 $ md5sum /mnt/foo
 d41d8cd98f00b204e9800998ecf8427e  /mnt/foo
   --&gt; Empty file, all data lost!

Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 76b42abbf7488121c4f9f1ea5941123306e25d99 upstream.

If we have a file with an implicit hole (NO_HOLES feature enabled) that
has an extent following the hole, delayed writes against regions of the
file behind the hole happened before but were not yet flushed and then
we truncate the file to a smaller size that lies inside the hole, we
end up persisting a wrong disk_i_size value for our inode that leads to
data loss after umounting and mounting again the filesystem or after
the inode is evicted and loaded again.

This happens because at inode.c:btrfs_truncate_inode_items() we end up
setting last_size to the offset of the extent that we deleted and that
followed the hole. We then pass that value to btrfs_ordered_update_i_size()
which updates the inode's disk_i_size to a value smaller then the offset
of the buffered (delayed) writes.

Example reproducer:

 $ mkfs.btrfs -f /dev/sdb
 $ mount /dev/sdb /mnt

 $ xfs_io -f -c "pwrite -S 0x01 0K 32K" /mnt/foo
 $ xfs_io -d -c "pwrite -S 0x02 -b 32K 64K 32K" /mnt/foo
 $ xfs_io -c "truncate 60K" /mnt/foo
   --&gt; inode's disk_i_size updated to 0

 $ md5sum /mnt/foo
 3c5ca3c3ab42f4b04d7e7eb0b0d4d806  /mnt/foo

 $ umount /dev/sdb
 $ mount /dev/sdb /mnt

 $ md5sum /mnt/foo
 d41d8cd98f00b204e9800998ecf8427e  /mnt/foo
   --&gt; Empty file, all data lost!

Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs</title>
<updated>2017-02-11T17:15:58+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-02-11T17:15:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2b95550a4323e501e133dac1c9c9cad6ff17f4c1'/>
<id>2b95550a4323e501e133dac1c9c9cad6ff17f4c1</id>
<content type='text'>
Pull btrfs fixes from Chris Mason:
 "This has two last minute fixes. The highest priority here is a
  regression fix for the decompression code, but we also fixed up a
  problem with the 32-bit compat ioctls.

  The decompression bug could hand back the wrong data on big reads when
  zlib was used. I have a larger cleanup to make the math here less
  error prone, but at this stage in the release Omar's patch is the best
  choice"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix btrfs_decompress_buf2page()
  btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs fixes from Chris Mason:
 "This has two last minute fixes. The highest priority here is a
  regression fix for the decompression code, but we also fixed up a
  problem with the 32-bit compat ioctls.

  The decompression bug could hand back the wrong data on big reads when
  zlib was used. I have a larger cleanup to make the math here less
  error prone, but at this stage in the release Omar's patch is the best
  choice"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix btrfs_decompress_buf2page()
  btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix btrfs_decompress_buf2page()</title>
<updated>2017-02-11T03:11:03+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2017-02-10T23:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=6e78b3f7a193546b1c00a6d084596e774f147169'/>
<id>6e78b3f7a193546b1c00a6d084596e774f147169</id>
<content type='text'>
If btrfs_decompress_buf2page() is handed a bio with its page in the
middle of the working buffer, then we adjust the offset into the working
buffer. After we copy into the bio, we advance the iterator by the
number of bytes we copied. Then, we have some logic to handle the case
of discontiguous pages and adjust the offset into the working buffer
again. However, if we didn't advance the bio to a new page, we may enter
this case in error, essentially repeating the adjustment that we already
made when we entered the function. The end result is bogus data in the
bio.

Previously, we only checked for this case when we advanced to a new
page, but the conversion to bio iterators changed that. This restores
the old, correct behavior.

A case I saw when testing with zlib was:

    buf_start = 42769
    total_out = 46865
    working_bytes = total_out - buf_start = 4096
    start_byte = 45056

The condition (total_out &gt; start_byte &amp;&amp; buf_start &lt; start_byte) is
true, so we adjust the offset:

    buf_offset = start_byte - buf_start = 2287
    working_bytes -= buf_offset = 1809
    current_buf_start = buf_start = 42769

Then, we copy

    bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
    buf_offset += bytes = 4096
    working_bytes -= bytes = 0
    current_buf_start += bytes = 44578

After bio_advance(), we are still in the same page, so start_byte is the
same. Then, we check (total_out &gt; start_byte &amp;&amp; current_buf_start &lt; start_byte),
which is true! So, we adjust the values again:

    buf_offset = start_byte - buf_start = 2287
    working_bytes = total_out - start_byte = 1809
    current_buf_start = buf_start + buf_offset = 45056

But note that working_bytes was already zero before this, so we should
have stopped copying.

Fixes: 974b1adc3b10 ("btrfs: use bio iterators for the decompression handlers")
Reported-by: Pat Erley &lt;pat-lkml@erley.org&gt;
Reviewed-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Tested-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If btrfs_decompress_buf2page() is handed a bio with its page in the
middle of the working buffer, then we adjust the offset into the working
buffer. After we copy into the bio, we advance the iterator by the
number of bytes we copied. Then, we have some logic to handle the case
of discontiguous pages and adjust the offset into the working buffer
again. However, if we didn't advance the bio to a new page, we may enter
this case in error, essentially repeating the adjustment that we already
made when we entered the function. The end result is bogus data in the
bio.

Previously, we only checked for this case when we advanced to a new
page, but the conversion to bio iterators changed that. This restores
the old, correct behavior.

A case I saw when testing with zlib was:

    buf_start = 42769
    total_out = 46865
    working_bytes = total_out - buf_start = 4096
    start_byte = 45056

The condition (total_out &gt; start_byte &amp;&amp; buf_start &lt; start_byte) is
true, so we adjust the offset:

    buf_offset = start_byte - buf_start = 2287
    working_bytes -= buf_offset = 1809
    current_buf_start = buf_start = 42769

Then, we copy

    bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
    buf_offset += bytes = 4096
    working_bytes -= bytes = 0
    current_buf_start += bytes = 44578

After bio_advance(), we are still in the same page, so start_byte is the
same. Then, we check (total_out &gt; start_byte &amp;&amp; current_buf_start &lt; start_byte),
which is true! So, we adjust the values again:

    buf_offset = start_byte - buf_start = 2287
    working_bytes = total_out - start_byte = 1809
    current_buf_start = buf_start + buf_offset = 45056

But note that working_bytes was already zero before this, so we should
have stopped copying.

Fixes: 974b1adc3b10 ("btrfs: use bio iterators for the decompression handlers")
Reported-by: Pat Erley &lt;pat-lkml@erley.org&gt;
Reviewed-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Tested-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls</title>
<updated>2017-02-08T16:47:30+00:00</updated>
<author>
<name>Jeff Mahoney</name>
<email>jeffm@suse.com</email>
</author>
<published>2017-02-07T00:39:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2a362249187a8d0f6d942d6e1d763d150a296f47'/>
<id>2a362249187a8d0f6d942d6e1d763d150a296f47</id>
<content type='text'>
Commit 4c63c2454ef incorrectly assumed that returning -ENOIOCTLCMD would
cause the native ioctl to be called.  The -&gt;compat_ioctl callback is
expected to handle all ioctls, not just compat variants.  As a result,
when using 32-bit userspace on 64-bit kernels, everything except those
three ioctls would return -ENOTTY.

Fixes: 4c63c2454ef ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 4c63c2454ef incorrectly assumed that returning -ENOIOCTLCMD would
cause the native ioctl to be called.  The -&gt;compat_ioctl callback is
expected to handle all ioctls, not just compat variants.  As a result,
when using 32-bit userspace on 64-bit kernels, everything except those
three ioctls would return -ENOTTY.

Fixes: 4c63c2454ef ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs</title>
<updated>2017-01-27T20:41:46+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-01-27T20:41:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=5906374446386fd16fe562b042429d905d231ec3'/>
<id>5906374446386fd16fe562b042429d905d231ec3</id>
<content type='text'>
Pull btrfs updates from Chris Mason:
 "Some fixes that we've collected from the list.

  We still have one more pending to nail down a regression in lzo
  compression, but I wanted to get this batch out the door"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove -&gt;{get, set}_acl() from btrfs_dir_ro_inode_operations
  Btrfs: disable xattr operations on subvolume directories
  Btrfs: remove old tree_root case in btrfs_read_locked_inode()
  Btrfs: fix truncate down when no_holes feature is enabled
  Btrfs: Fix deadlock between direct IO and fast fsync
  btrfs: fix false enospc error when truncating heavily reflinked file
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs updates from Chris Mason:
 "Some fixes that we've collected from the list.

  We still have one more pending to nail down a regression in lzo
  compression, but I wanted to get this batch out the door"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove -&gt;{get, set}_acl() from btrfs_dir_ro_inode_operations
  Btrfs: disable xattr operations on subvolume directories
  Btrfs: remove old tree_root case in btrfs_read_locked_inode()
  Btrfs: fix truncate down when no_holes feature is enabled
  Btrfs: Fix deadlock between direct IO and fast fsync
  btrfs: fix false enospc error when truncating heavily reflinked file
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: remove -&gt;{get, set}_acl() from btrfs_dir_ro_inode_operations</title>
<updated>2017-01-26T23:48:56+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2017-01-26T01:06:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=57b59ed2e5b91e958843609c7884794e29e6c4cb'/>
<id>57b59ed2e5b91e958843609c7884794e29e6c4cb</id>
<content type='text'>
Subvolume directory inodes can't have ACLs.

Cc: &lt;stable@vger.kernel.org&gt; # 4.9.x
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Subvolume directory inodes can't have ACLs.

Cc: &lt;stable@vger.kernel.org&gt; # 4.9.x
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: disable xattr operations on subvolume directories</title>
<updated>2017-01-26T23:48:55+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2017-01-26T01:06:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=1fdf41941b8010691679638f8d0c8d08cfee7726'/>
<id>1fdf41941b8010691679638f8d0c8d08cfee7726</id>
<content type='text'>
When you snapshot a subvolume containing a subvolume, you get a
placeholder directory where the subvolume would be. These directory
inodes have -&gt;i_ops set to btrfs_dir_ro_inode_operations. Previously,
these i_ops didn't include the xattr operation callbacks. The conversion
to xattr_handlers missed this case, leading to bogus attempts to set
xattrs on these inodes. This manifested itself as failures when running
delayed inodes.

To fix this, clear IOP_XATTR in -&gt;i_opflags on these inodes.

Fixes: 6c6ef9f26e59 ("xattr: Stop calling {get,set,remove}xattr inode operations")
Cc: Andreas Gruenbacher &lt;agruenba@redhat.com&gt;
Reported-by: Chris Murphy &lt;lists@colorremedies.com&gt;
Tested-by: Chris Murphy &lt;lists@colorremedies.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 4.9.x
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When you snapshot a subvolume containing a subvolume, you get a
placeholder directory where the subvolume would be. These directory
inodes have -&gt;i_ops set to btrfs_dir_ro_inode_operations. Previously,
these i_ops didn't include the xattr operation callbacks. The conversion
to xattr_handlers missed this case, leading to bogus attempts to set
xattrs on these inodes. This manifested itself as failures when running
delayed inodes.

To fix this, clear IOP_XATTR in -&gt;i_opflags on these inodes.

Fixes: 6c6ef9f26e59 ("xattr: Stop calling {get,set,remove}xattr inode operations")
Cc: Andreas Gruenbacher &lt;agruenba@redhat.com&gt;
Reported-by: Chris Murphy &lt;lists@colorremedies.com&gt;
Tested-by: Chris Murphy &lt;lists@colorremedies.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 4.9.x
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: remove old tree_root case in btrfs_read_locked_inode()</title>
<updated>2017-01-26T23:48:55+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2017-01-26T01:06:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=67ade058ef2c65a3e56878af9c293ec76722a2e5'/>
<id>67ade058ef2c65a3e56878af9c293ec76722a2e5</id>
<content type='text'>
As Jeff explained in c2951f32d36c ("btrfs: remove old tree_root dirent
processing in btrfs_real_readdir()"), supporting this old format is no
longer necessary since the Btrfs magic number has been updated since we
changed to the current format. There are other places where we still
handle this old format, but since this is part of a fix that is going to
stable, I'm only removing this one for now.

Cc: &lt;stable@vger.kernel.org&gt; # 4.9.x
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As Jeff explained in c2951f32d36c ("btrfs: remove old tree_root dirent
processing in btrfs_real_readdir()"), supporting this old format is no
longer necessary since the Btrfs magic number has been updated since we
changed to the current format. There are other places where we still
handle this old format, but since this is part of a fix that is going to
stable, I'm only removing this one for now.

Cc: &lt;stable@vger.kernel.org&gt; # 4.9.x
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix truncate down when no_holes feature is enabled</title>
<updated>2017-01-19T17:02:22+00:00</updated>
<author>
<name>Liu Bo</name>
<email>bo.li.liu@oracle.com</email>
</author>
<published>2016-12-01T21:43:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=91298eec05cd8d4e828cf7ee5d4a6334f70cf69a'/>
<id>91298eec05cd8d4e828cf7ee5d4a6334f70cf69a</id>
<content type='text'>
For such a file mapping,

[0-4k][hole][8k-12k]

In NO_HOLES mode, we don't have the [hole] extent any more.
Commit c1aa45759e90 ("Btrfs: fix shrinking truncate when the no_holes feature is enabled")
 fixed disk isize not being updated in NO_HOLES mode when data is not flushed.

However, even if data has been flushed, we can still have trouble
in updating disk isize since we updated disk isize to 'start' of
the last evicted extent.

Reviewed-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For such a file mapping,

[0-4k][hole][8k-12k]

In NO_HOLES mode, we don't have the [hole] extent any more.
Commit c1aa45759e90 ("Btrfs: fix shrinking truncate when the no_holes feature is enabled")
 fixed disk isize not being updated in NO_HOLES mode when data is not flushed.

However, even if data has been flushed, we can still have trouble
in updating disk isize since we updated disk isize to 'start' of
the last evicted extent.

Reviewed-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: Fix deadlock between direct IO and fast fsync</title>
<updated>2017-01-19T17:01:02+00:00</updated>
<author>
<name>Chandan Rajendra</name>
<email>chandan@linux.vnet.ibm.com</email>
</author>
<published>2016-12-23T09:30:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=97dcdea076ecef41ea4aaa23d4397c2f622e4265'/>
<id>97dcdea076ecef41ea4aaa23d4397c2f622e4265</id>
<content type='text'>
The following deadlock is seen when executing generic/113 test,

 ---------------------------------------------------------+----------------------------------------------------
  Direct I/O task                                           Fast fsync task
 ---------------------------------------------------------+----------------------------------------------------
  btrfs_direct_IO
    __blockdev_direct_IO
     do_blockdev_direct_IO
      do_direct_IO
       btrfs_get_blocks_direct
        while (blocks needs to written)
         get_more_blocks (first iteration)
          btrfs_get_blocks_direct
           btrfs_create_dio_extent
             down_read(&amp;BTRFS_I(inode) &gt;dio_sem)
             Create and add extent map and ordered extent
             up_read(&amp;BTRFS_I(inode) &gt;dio_sem)
                                                            btrfs_sync_file
                                                              btrfs_log_dentry_safe
                                                               btrfs_log_inode_parent
                                                                btrfs_log_inode
                                                                 btrfs_log_changed_extents
                                                                  down_write(&amp;BTRFS_I(inode) &gt;dio_sem)
                                                                   Collect new extent maps and ordered extents
                                                                    wait for ordered extent completion
         get_more_blocks (second iteration)
          btrfs_get_blocks_direct
           btrfs_create_dio_extent
             down_read(&amp;BTRFS_I(inode) &gt;dio_sem)
 --------------------------------------------------------------------------------------------------------------

In the above description, Btrfs direct I/O code path has not yet started
submitting bios for file range covered by the initial ordered
extent. Meanwhile, The fast fsync task obtains the write semaphore and
waits for I/O on the ordered extent to get completed. However, the
Direct I/O task is now blocked on obtaining the read semaphore.

To resolve the deadlock, this commit modifies the Direct I/O code path
to obtain the read semaphore before invoking
__blockdev_direct_IO(). The semaphore is then given up after
__blockdev_direct_IO() returns. This allows the Direct I/O code to
complete I/O on all the ordered extents it creates.

Signed-off-by: Chandan Rajendra &lt;chandan@linux.vnet.ibm.com&gt;
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following deadlock is seen when executing generic/113 test,

 ---------------------------------------------------------+----------------------------------------------------
  Direct I/O task                                           Fast fsync task
 ---------------------------------------------------------+----------------------------------------------------
  btrfs_direct_IO
    __blockdev_direct_IO
     do_blockdev_direct_IO
      do_direct_IO
       btrfs_get_blocks_direct
        while (blocks needs to written)
         get_more_blocks (first iteration)
          btrfs_get_blocks_direct
           btrfs_create_dio_extent
             down_read(&amp;BTRFS_I(inode) &gt;dio_sem)
             Create and add extent map and ordered extent
             up_read(&amp;BTRFS_I(inode) &gt;dio_sem)
                                                            btrfs_sync_file
                                                              btrfs_log_dentry_safe
                                                               btrfs_log_inode_parent
                                                                btrfs_log_inode
                                                                 btrfs_log_changed_extents
                                                                  down_write(&amp;BTRFS_I(inode) &gt;dio_sem)
                                                                   Collect new extent maps and ordered extents
                                                                    wait for ordered extent completion
         get_more_blocks (second iteration)
          btrfs_get_blocks_direct
           btrfs_create_dio_extent
             down_read(&amp;BTRFS_I(inode) &gt;dio_sem)
 --------------------------------------------------------------------------------------------------------------

In the above description, Btrfs direct I/O code path has not yet started
submitting bios for file range covered by the initial ordered
extent. Meanwhile, The fast fsync task obtains the write semaphore and
waits for I/O on the ordered extent to get completed. However, the
Direct I/O task is now blocked on obtaining the read semaphore.

To resolve the deadlock, this commit modifies the Direct I/O code path
to obtain the read semaphore before invoking
__blockdev_direct_IO(). The semaphore is then given up after
__blockdev_direct_IO() returns. This allows the Direct I/O code to
complete I/O on all the ordered extents it creates.

Signed-off-by: Chandan Rajendra &lt;chandan@linux.vnet.ibm.com&gt;
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
