linux.git/fs/xfs, branch v3.2.21

xfs: Fix oops on IO error during xlog_recover_process_iunlinks()

2012-04-02T16:53:06+00:00

commit d97d32edcd732110758799ae60af725e5110b3dc upstream.

When an IO error happens during inode deletion run from
xlog_recover_process_iunlinks(), the filesystem gets shut down. Thus any
subsequent attempt to read buffers fails. The code in
xlog_recover_process_iunlinks() does not account for the fact that reading a
buffer which was read a while ago can really fail, which results in an oops on
  agi = XFS_BUF_TO_AGI(agibp);

Fix the problem by cleaning up the buffer handling in
xlog_recover_process_iunlinks() as suggested by Dave Chinner. We release the
buffer lock but keep a reference to the AG buffer. That is enough for the
buffer to stay pinned in memory and we don't have to call xfs_read_agi() all
the time.
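
The idea can be sketched in userspace as follows. This is only an
illustrative model, not the kernel code: struct mock_buf, mock_read(),
mock_unlock_keep_ref(), mock_release() and process_items() are hypothetical
names, and the "shutdown" is a plain flag. The buffer is read once, its lock
is dropped while the reference is kept, and the walk never has to re-read
(and therefore never has to cope with a failed re-read).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a metadata buffer; not the kernel's xfs_buf. */
struct mock_buf {
	int  refcount;	/* holders keeping the buffer in memory */
	bool locked;	/* exclusive access for modification */
	int  payload;	/* stands in for the AGI contents */
};

static bool fs_shut_down;

/* Models a buffer read: fails (returns NULL) once the fs is shut down. */
static struct mock_buf *mock_read(struct mock_buf *bp)
{
	if (fs_shut_down)
		return NULL;
	bp->refcount++;
	bp->locked = true;
	return bp;
}

/* Drop the lock but keep the reference, so the buffer stays in memory. */
static void mock_unlock_keep_ref(struct mock_buf *bp)
{
	bp->locked = false;
}

static void mock_release(struct mock_buf *bp)
{
	bp->refcount--;
}

/*
 * Walk n items using a single read of the buffer. A simulated I/O error
 * shuts the filesystem down at item fail_at. Returns the sum of the payload
 * reads, or -1 if the initial read failed.
 */
static int process_items(struct mock_buf *backing, int n, int fail_at)
{
	struct mock_buf *bp = mock_read(backing);
	int sum = 0;
	int i;

	if (!bp)
		return -1;	/* check the read before dereferencing */
	mock_unlock_keep_ref(bp);

	for (i = 0; i < n; i++) {
		if (i == fail_at)
			fs_shut_down = true;	/* later reads would fail */
		sum += bp->payload;	/* still valid: we hold a reference */
	}

	mock_release(bp);
	return sum;
}
```

With a walk of four items and a simulated error at the third, the loop still
completes because it never calls mock_read() again.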

Signed-off-by: Jan Kara 
Reviewed-by: Dave Chinner 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: fix inode lookup race

2012-04-02T16:52:50+00:00

commit f30d500f809eca67a21704347ab14bb35877b5ee upstream.

When we get concurrent lookups of the same inode that is not in the
per-AG inode cache, there is a race condition that triggers warnings
in unlock_new_inode() indicating that we are initialising an inode
that isn't in the correct state for a new inode.

When we do an inode lookup via a file handle or a bulkstat, we don't
serialise lookups at a higher level through the dentry cache (i.e.
pathless lookup), and so we can get concurrent lookups of the same
inode.

The race condition is between the insertion of the inode into the
cache in the case of a cache miss and a concurrent lookup:

Thread 1			Thread 2
xfs_iget()
  xfs_iget_cache_miss()
    xfs_iread()
    lock radix tree
    radix_tree_insert()
				rcu_read_lock
				radix_tree_lookup
				lock inode flags
				XFS_INEW not set
				igrab()
				unlock inode flags
				rcu_read_unlock
				use uninitialised inode
				.....
    lock inode flags
    set XFS_INEW
    unlock inode flags
    unlock radix tree
  xfs_setup_inode()
    inode flags = I_NEW
    unlock_new_inode()
      WARNING as inode flags != I_NEW

This can lead to inode corruption, inode list corruption, etc, and
is generally a bad thing to occur.

Fix this by setting XFS_INEW before inserting the inode into the
radix tree. This will ensure any concurrent lookup will find the new
inode with XFS_INEW set and that forces the lookup to wait until the
XFS_INEW flag is removed before allowing the lookup to succeed.
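
The ordering can be illustrated with a small single-threaded model. It is a
sketch only: MOCK_INEW, struct mock_inode, cache_slot and the functions below
are hypothetical stand-ins, and the real code additionally depends on the
radix tree lock, the inode flags lock and RCU, none of which are modelled
here. The point shown is just that the flag is set before the inode becomes
visible, so a lookup can never observe a published inode without it.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_INEW 0x1u	/* hypothetical flag modelled on XFS_INEW */

struct mock_inode {
	unsigned int flags;
	int initialised;
};

enum lookup_result { LOOKUP_MISS, LOOKUP_MUST_WAIT, LOOKUP_HIT };

/* A one-entry "cache" standing in for the per-AG radix tree. */
static struct mock_inode *cache_slot;

/* Fixed ordering: flag first, publish second. */
static void cache_insert_fixed(struct mock_inode *ip)
{
	ip->flags |= MOCK_INEW;	/* mark as under construction */
	cache_slot = ip;	/* only now make it visible to lookups */
}

/* Finish initialisation and clear the flag so lookups may succeed. */
static void finish_setup(struct mock_inode *ip)
{
	ip->initialised = 1;
	ip->flags &= ~MOCK_INEW;
}

/* A lookup that finds the flag set must wait rather than use the inode. */
static enum lookup_result cache_lookup(void)
{
	if (!cache_slot)
		return LOOKUP_MISS;
	if (cache_slot->flags & MOCK_INEW)
		return LOOKUP_MUST_WAIT;
	return LOOKUP_HIT;
}
```

Between cache_insert_fixed() and finish_setup() every lookup in this model
returns LOOKUP_MUST_WAIT, which corresponds to the window that was previously
open to the race.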

Signed-off-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()

2012-02-03T17:21:27+00:00

commit 9b025eb3a89e041bab6698e3858706be2385d692 upstream.

Commit b52a360b forgot to call xfs_iunlock() when it detected a corrupted
symlink and bailed out. Fix it by jumping to 'out' instead of returning
directly.
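
A minimal sketch of the single-exit pattern, under stated assumptions:
lock_depth, mock_lock(), mock_unlock(), MOCK_MAXPATHLEN and mock_readlink()
are hypothetical, and the error value used here is only a placeholder for
whatever the real function returns.

```c
#include <assert.h>
#include <errno.h>

#define MOCK_MAXPATHLEN 1024	/* assumed limit, for illustration only */

static int lock_depth;	/* hypothetical counter standing in for the inode lock */

static void mock_lock(void)   { lock_depth++; }
static void mock_unlock(void) { lock_depth--; }

static int mock_readlink(int pathlen)
{
	int error = 0;

	mock_lock();

	if (pathlen < 0 || pathlen > MOCK_MAXPATHLEN) {
		error = -EINVAL;	/* placeholder error code */
		goto out;		/* not "return": the lock must be dropped */
	}

	/* ... the link target would be copied here ... */

out:
	mock_unlock();
	return error;
}
```

Every path through mock_readlink() leaves lock_depth at zero, which is the
property the original early return violated.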

CC: Carlos Maiolino 
Signed-off-by: Jan Kara 
Reviewed-by: Alex Elder 
Reviewed-by: Dave Chinner 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: fix endian conversion issue in discard code

2012-01-26T00:13:55+00:00

commit b1c770c273a4787069306fc82aab245e9ac72e9d upstream.

When finding the longest extent in an AG, we read the value directly
out of the AGF buffer without endian conversion. This will give an
incorrect length, resulting in FITRIM operations potentially not
trimming everything that they should.
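
To illustrate the effect, the sketch below decodes the same four on-disk
bytes both ways. load_be32() and load_le32() are hypothetical helpers written
for this example; in the kernel the conversion would be done with
be32_to_cpu(). On-disk XFS metadata is big-endian, so load_le32() models what
an unconverted read produces on a little-endian CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Portable big-endian decode; a userspace stand-in for be32_to_cpu(). */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* What reading the raw field yields on a little-endian CPU. */
static uint32_t load_le32(const unsigned char *p)
{
	return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

For the bytes 00 00 01 00, the converted value is 256 blocks while the
unconverted read gives 65536, so any length comparison made on the raw value
is wrong.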

Signed-off-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: fix acl count validation in xfs_acl_from_disk()

2012-01-12T19:29:46+00:00

commit 093019cf1b18dd31b2c3b77acce4e000e2cbc9ce upstream.

Commit fa8b18ed didn't prevent the integer overflow and possible
memory corruption.  "count" can go negative and bypass the check.
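
The bypass can be shown with a small example. This is a sketch, not the
actual patch: MOCK_ACL_MAX_ENTRIES and both functions are hypothetical, and
the limit value is an assumption. A signed count that is negative passes an
upper-bound check, whereas the same bit pattern treated as unsigned does not.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_ACL_MAX_ENTRIES 25	/* assumed limit, for illustration only */

/* Upper-bound check on a signed count: a negative value slips through. */
static int count_ok_signed(int32_t count)
{
	return !(count > MOCK_ACL_MAX_ENTRIES);
}

/* Same check on an unsigned count: the value is seen as huge and rejected. */
static int count_ok_unsigned(uint32_t count)
{
	return !(count > MOCK_ACL_MAX_ENTRIES);
}
```

An alternative with the same effect would be to keep the signed type and add
an explicit count < 0 check.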

Signed-off-by: Xi Wang 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: log all dirty inodes in xfs_fs_sync_fs

2011-12-23T22:41:47+00:00

Since Linux 2.6.36 the writeback code has introduced various measures for
live lock prevention during sync().  Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.

The older_than_this checks are now more strictly enforced since

    writeback: avoid livelocking WB_SYNC_ALL writeback

by only calling into __writeback_inodes_sb and thus only sampling the
current cut-off time once.  But on slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut-off time for
the blocking pass already happened.  I have not reproduced this myself
on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.

Fix this by iterating over all XFS inodes in ->sync_fs and log all that
are dirty.  This might log inodes that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is it
isn't a major concern for performance.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Tested-by: Mark Tinguely 
Reviewed-by: Mark Tinguely 
Signed-off-by: Ben Myers

xfs: log the inode in ->write_inode calls for kupdate

2011-12-23T22:41:47+00:00

If the writeback code writes back an inode because it has expired we currently
use the non-blocking ->write_inode path.  This means any inode that is pinned
is skipped.  With delayed logging and a workload that has very little log
traffic otherwise it is very likely that an inode that gets constantly
written to is always pinned, and thus we keep refusing to write it.  The VM
writeback code at that point redirties it and doesn't try to write it again
for another 30 seconds.  This means under certain scenarios time-based
metadata writeback never happens.

Fix this by calling into xfs_log_inode for kupdate in addition to data
integrity syncs, and thus transfer the inode to the log ASAP.
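
The resulting decision can be sketched as a predicate. The types and names
here (enum mock_sync_mode, struct mock_wbc, should_log_inode()) are
hypothetical userspace stand-ins modelled loosely on the writeback control
structure, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the writeback sync modes. */
enum mock_sync_mode { MOCK_SYNC_NONE, MOCK_SYNC_ALL };

struct mock_wbc {
	enum mock_sync_mode sync_mode;
	bool for_kupdate;	/* set for periodic, time-based writeback */
};

/*
 * Log the inode for data integrity syncs and, with this change, also for
 * kupdate-style writeback, instead of taking the non-blocking path that
 * skips pinned inodes.
 */
static bool should_log_inode(const struct mock_wbc *wbc)
{
	return wbc->sync_mode == MOCK_SYNC_ALL || wbc->for_kupdate;
}
```

Only writeback that is neither a data integrity sync nor kupdate still takes
the non-blocking path in this model.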

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Tested-by: Mark Tinguely 
Reviewed-by: Mark Tinguely 
Signed-off-by: Ben Myers

xfs: fix the logspace waiting algorithm

2011-12-06T20:19:47+00:00

Apply the scheme used in log_regrant_write_log_space to wake up any other
threads waiting for log space before the newly added one to
xlog_grant_log_space as well, and factor the code into readable
helpers.  For each of the queues we add two helpers:

 - one to try to wake up all waiting threads.  This helper will also be
   usable by xfs_log_move_tail once we remove the current opportunistic
   wakeups in it.
 - one to sleep on t_wait until enough log space is available, loosely
   modelled after Linux waitqueues.
 
And use them to reimplement the guts of log_regrant_write_log_space and
xlog_grant_log_space.  These two functions now use one and the same
algorithm for waiting on log space instead of the subtly different ones used
before, with an option to completely unify them in the near future.
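
The wake-up helper described above could look roughly like the following
userspace sketch. It is an illustration under assumptions, not the patch
itself: struct mock_waiter and mock_queue_wake() are hypothetical, the queue
is a plain array, and "waking" is just setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter record: how much log space it needs. */
struct mock_waiter {
	int  need_bytes;
	bool woken;
};

/*
 * Wake waiters in queue order while enough free space remains.
 * Returns true if every waiter on the queue could be woken, false as soon
 * as one cannot be satisfied. Later waiters are then left asleep so that
 * they do not jump ahead of earlier ones.
 */
static bool mock_queue_wake(struct mock_waiter *q, size_t n, int *free_bytes)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (q[i].woken)
			continue;
		if (*free_bytes < q[i].need_bytes)
			return false;
		*free_bytes -= q[i].need_bytes;
		q[i].woken = true;
	}
	return true;
}
```

A newly arriving thread would call such a helper first and only take space
for itself if it returns true; otherwise it queues up behind the existing
waiters and sleeps.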

Also move the filesystem shutdown handling to the common caller given
that we had to touch it anyway.

Based on hard debugging and an earlier patch from
Chandra Seetharaman.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Chandra Seetharaman 
Tested-by: Chandra Seetharaman 
Signed-off-by: Ben Myers

xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels

2011-12-06T16:46:23+00:00

The i_ino field in the VFS inode is of type unsigned long and thus can't
hold the full 64-bit inode number on 32-bit kernels.  We have the full
inode number in the XFS inode, so use that one for nfs exports.  Note
that I've also switched the 32-bit file handle types to it, just to make
the code more consistent and copy & paste errors less likely to happen.
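
The truncation can be modelled in userspace as below. This is a sketch with
hypothetical names: struct mock_inode, mock_set_ino() and the two accessors
are not kernel interfaces, and a uint32_t field stands in for an unsigned
long i_ino on a 32-bit kernel.

```c
#include <assert.h>
#include <stdint.h>

struct mock_inode {
	uint32_t vfs_ino;	/* models unsigned long i_ino on 32-bit */
	uint64_t fs_ino;	/* models the full number kept in the XFS inode */
};

static void mock_set_ino(struct mock_inode *ip, uint64_t ino)
{
	ip->fs_ino = ino;
	ip->vfs_ino = (uint32_t)ino;	/* upper 32 bits are lost here */
}

/* Building a file handle from the VFS field gives the truncated number. */
static uint64_t fh_ino_truncated(const struct mock_inode *ip)
{
	return ip->vfs_ino;
}

/* Building it from the filesystem's own field keeps all 64 bits. */
static uint64_t fh_ino_full(const struct mock_inode *ip)
{
	return ip->fs_ino;
}
```

For inode number 0x100000080 the truncated handle would carry 0x80, which
refers to a different inode (or none at all).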

Reported-by: Guoquan Yang 
Reported-by: Hank Peng 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Ben Myers

xfs: fix allocation length overflow in xfs_bmapi_write()

2011-12-02T22:24:02+00:00

When testing the new xfstests --large-fs option that does very large
file preallocations, this assert was tripped deep in
xfs_alloc_vextent():

XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239

The allocation was trying to allocate a zero length extent because
the lower 32 bits of the allocation length were zero. The remaining
length of the allocation to be done was an exact multiple of 2^32 -
the first case I saw was at 496TB remaining to be allocated.

This turns out to be an overflow when converting the allocation
length (a 64 bit quantity) into the extent length to allocate (a 32
bit quantity), and it requires the length to be allocated to be an exact
multiple of 2^32 blocks to trip the assert.

Fix it by limiting the extent length to allocate to MAXEXTLEN.
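
The overflow and the clamp can be shown with a short sketch. The function
names are hypothetical, and MOCK_MAXEXTLEN is an assumed value used for
illustration (a 21-bit block count), not a quotation of the kernel header.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_MAXEXTLEN 0x001fffffu	/* assumed per-extent block limit */

/* Plain narrowing: a multiple of 2^32 blocks becomes a zero-length request. */
static uint32_t extlen_truncating(uint64_t len)
{
	return (uint32_t)len;
}

/* Clamp before narrowing so the result is never zero for a non-zero length. */
static uint32_t extlen_clamped(uint64_t len)
{
	return len > MOCK_MAXEXTLEN ? MOCK_MAXEXTLEN : (uint32_t)len;
}
```

With a remaining length of exactly 2^32 blocks, extlen_truncating() returns 0
while extlen_clamped() returns the maximum extent length, so the allocation
proceeds in bounded chunks.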

Signed-off-by: Dave Chinner 
Signed-off-by: Ben Myers 
Reviewed-by: Christoph Hellwig