linux.git/fs, branch v4.6.3

fix d_walk()/non-delayed __d_free() race

2016-06-24T17:22:04+00:00

commit 3d56c25e3bb0726a5c5e16fc2d9e38f8ed763085 upstream.

The ascend-to-parent logic in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay.  Unfortunately, in
quite a few cases that is not true, with a hard-to-hit oopsable race as
the result.

Fortunately, the fix is simple; right now the rule is "if it has ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover.  Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.
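
The difference between the two rules can be sketched in user space as
follows.  This is only an illustration: struct model_dentry and both
helper functions are hypothetical names, not the kernel's struct dentry
or its freeing path.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a dentry; NOT the kernel layout. */
struct model_dentry {
	bool ever_hashed;     /* was it ever inserted into the dcache hash? */
	bool ever_had_parent; /* was it ever attached below another dentry? */
};

/* Old rule: delay the free only if the dentry has ever been hashed. */
bool must_delay_free_old(const struct model_dentry *d)
{
	return d->ever_hashed;
}

/* New rule: delay the free if the dentry ever had a parent. */
bool must_delay_free_new(const struct model_dentry *d)
{
	return d->ever_had_parent;
}
```

In this model a child that was never hashed is missed by the old rule
but caught by the new one, while a parentless pipe or socket dentry is
freed without delay under both.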

Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

proc: prevent stacking filesystems on top

2016-06-24T17:22:04+00:00

commit e54ad7f1ee263ffa5a2de9c609d58dfa27b21cd9 upstream.

This prevents stacking filesystems (ecryptfs and overlayfs) from using
procfs as lower filesystem.  There is too much magic going on inside
procfs, and there is no good reason to stack stuff on top of procfs.

(For example, procfs does access checks in VFS open handlers, and
ecryptfs by design calls open handlers from a kernel thread that doesn't
drop privileges or so.)

Signed-off-by: Jann Horn 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ecryptfs: forbid opening files without mmap handler

2016-06-24T17:22:03+00:00

commit 2f36db71009304b3f0b95afacd8eba1f9f046b87 upstream.

This prevents users from triggering a stack overflow through a recursive
invocation of pagefault handling that involves mapping procfs files into
virtual memory.
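
The kind of check the subject line describes might look roughly like
the user-space sketch below.  All names (model_file, model_file_ops,
model_check_lower_file) are hypothetical, and the -EINVAL return value
is chosen only for illustration; the error code the real patch uses is
not stated here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's file and file_operations. */
struct model_file_ops {
	int (*mmap)(void *file, void *vma); /* NULL if mapping is unsupported */
};

struct model_file {
	const struct model_file_ops *f_op;
};

/* Dummy handler so a "mappable" lower file can be modelled. */
int model_dummy_mmap(void *file, void *vma)
{
	(void)file;
	(void)vma;
	return 0;
}

/*
 * Refuse a lower file that has no mmap handler.
 * Returns 0 if the file is acceptable, a negative errno otherwise.
 */
int model_check_lower_file(const struct model_file *lower)
{
	if (lower->f_op == NULL || lower->f_op->mmap == NULL)
		return -EINVAL; /* illustrative error code, an assumption */
	return 0;
}
```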

Signed-off-by: Jann Horn 
Acked-by: Tyler Hicks 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

xfs: skip stale inodes in xfs_iflush_cluster

2016-06-08T01:23:43+00:00

commit 7d3aa7fe970791f1a674b14572a411accf2f4d4e upstream.

We don't write back stale inodes so we should skip them in
xfs_iflush_cluster, too.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: fix inode validity check in xfs_iflush_cluster

2016-06-08T01:23:43+00:00

commit 51b07f30a71c27405259a0248206ed4e22adbee2 upstream.

Some careless idiot(*) wrote crap code in commit 1a3e8f3 ("xfs:
convert inode cache lookups to use RCU locking") back in late 2010,
and so xfs_iflush_cluster checks the wrong inode for whether it is
still valid under RCU protection. Fix it to lock and check the
correct inode.

(*) Careless-idiot: Dave Chinner 

Discovered-by: Brian Foster
Signed-off-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: xfs_iflush_cluster fails to abort on error

2016-06-08T01:23:43+00:00

commit b1438f477934f5a4d5a44df26f3079a7575d5946 upstream.

When a failure due to an inode buffer occurs, the error handling
fails to abort the inode writeback correctly. This can result in the
inode being reclaimed whilst still in the AIL, leading to
use-after-free situations as well as filesystems that cannot be
unmounted as the inode log items left in the AIL never get removed.

Fix this by ensuring fatal errors from xfs_imap_to_bp() result in
the inode flush being aborted correctly.

Reported-by: Shyam Kaushik 
Diagnosed-by: Shyam Kaushik 
Tested-by: Shyam Kaushik 
Signed-off-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: remove xfs_fs_evict_inode()

2016-06-08T01:23:43+00:00

commit 8179c03629de67f515d3ab825b5a9428687d4b85 upstream.

Joe Lawrence reported a list_add corruption with 4.6-rc1 when
testing some custom md administration code that made its own
block device nodes for the md array. The simple test loop of:

for i in {0..100}; do
	mknod --mode=0600 $tmp/tmp_node b $MAJOR $MINOR
	mdadm --detail --export $tmp/tmp_node > /dev/null
	rm -f $tmp/tmp_node
done


Would produce this warning in bd_acquire() when mdadm opened the
device node:

list_add double add: new=ffff88043831c7b8, prev=ffff8804380287d8, next=ffff88043831c7b8.

And then produce this from bd_forget from kdevtmpfs evicting a block
dev inode:

list_del corruption. prev->next should be ffff8800bb83eb10, but was ffff88043831c7b8

This is a regression caused by commit c19b3b05 ("xfs: move di_mode
to vfs inode"). The issue is that xfs_inactive() frees the
unlinked inode, and the above commit meant that this freeing zeroed
the mode in the struct inode. The problem is that after evict() has
called ->evict_inode, it expects the i_mode to be intact so that it
can call bd_forget() or cd_forget() to drop the reference to the
block device inode attached to the XFS inode.

In reality, the only thing we do in xfs_fs_evict_inode() that is not
generic is call xfs_inactive(). We can move the xfs_inactive() call
to xfs_fs_destroy_inode() without any problems at all, and this
will leave the VFS inode intact until it is completely done with it.

So, remove xfs_fs_evict_inode(), and do the work it used to do in
->destroy_inode instead.

Reported-by: Joe Lawrence 
Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: Don't wrap growfs AGFL indexes

2016-06-08T01:23:43+00:00

commit ad747e3b299671e1a53db74963cc6c5f6cdb9f6d upstream.

Commit 96f859d ("libxfs: pack the agfl header structure so
XFS_AGFL_SIZE is correct") allowed the freelist to use the empty
slot at the end of the freelist on 64 bit systems that was not
being used due to sizeof() rounding up the structure size.

This has caused versions of xfs_repair prior to 4.5.0 (which also
has the fix) to report this as a corruption once the filesystem has
been grown. Older kernels can also have problems (seen from a whacky
container/vm management environment) mounting filesystems grown on a
system with a newer kernel than the vm/container it is deployed on.

To avoid this problem, change the initial free list indexes not to
wrap across the end of the AGFL, hence avoiding the initialisation
of agf_fllast to the last index in the AGFL.
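
As a rough illustration, an empty circular free list can be
initialised either with its last index wrapped around to the final
slot or without wrapping.  The model below is a user-space sketch; the
struct, the helper names and the concrete index values (first = 1,
last = 0 for the non-wrapping case) are assumptions made for the
example, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the three AGF free list fields. */
struct model_agf {
	uint32_t flfirst; /* index of the first active entry */
	uint32_t fllast;  /* index of the last active entry */
	uint32_t flcount; /* number of active entries */
};

/* Empty list whose last index wraps to the final slot of the AGFL. */
struct model_agf model_agf_init_wrapping(uint32_t agfl_size)
{
	struct model_agf agf = { 0, agfl_size - 1, 0 };
	return agf;
}

/* Empty list that never references the final slot of the AGFL. */
struct model_agf model_agf_init_nonwrapping(void)
{
	struct model_agf agf = { 1, 0, 0 };
	return agf;
}

/* An empty list has no entries and "last" sits just before "first". */
bool model_agf_is_empty(const struct model_agf *agf, uint32_t agfl_size)
{
	return agf->flcount == 0 &&
	       (agf->fllast + 1) % agfl_size == agf->flfirst;
}
```

Both initialisations describe a valid empty list; only the second one
avoids pointing at the last index, which is the slot whose existence
older tools and kernels disagree about.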

Signed-off-by: Dave Chinner 
Reviewed-by: Carlos Maiolino 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: disallow rw remount on fs with unknown ro-compat features

2016-06-08T01:23:43+00:00

commit d0a58e833931234c44e515b5b8bede32bd4e6eed upstream.

Today, a kernel which refuses to mount a filesystem read-write
due to unknown ro-compat features can still transition to read-write
via the remount path.  The old kernel is most likely none the wiser,
because it's unaware of the new feature, and isn't using it.  However,
writing to the filesystem may well corrupt metadata related to that
new feature, and moving to a newer kernel which understands the feature
will have problems.

Right now the only ro-compat feature we have is the free inode btree,
which showed up in v3.16.  It would be good to push this back to
all the active stable kernels, I think, so that if anyone is using
newer mkfs (which enables the finobt feature) with older kernel
releases, they'll be protected.
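
A minimal sketch of such a remount check, assuming feature flags are
kept in a bitmask.  The macro, the bit value and the function name are
hypothetical, and -EINVAL is an illustrative error code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit for the free inode btree feature. */
#define MODEL_RO_COMPAT_FINOBT (1u << 0)

/*
 * Decide whether a read-only mount may be remounted read-write.
 * sb_ro_compat: ro-compat feature bits recorded in the superblock.
 * known:        ro-compat feature bits this kernel understands.
 * Returns 0 if the transition is allowed, -EINVAL otherwise.
 */
int model_remount_rw(uint32_t sb_ro_compat, uint32_t known)
{
	if (sb_ro_compat & ~known)
		return -EINVAL; /* unknown features: stay read-only */
	return 0;
}
```

With this shape the remount path applies the same test as the initial
mount, so an old kernel (known == 0) cannot start writing to a
filesystem that has the finobt bit set.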

Signed-off-by: Eric Sandeen 
Reviewed-by: Bill O'Donnell 
Reviewed-by: Dave Chinner 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

nfs: avoid race that crashes nfs_init_commit

2016-06-08T01:23:42+00:00

commit ade8febde0271513360bac44883dbebad44276c3 upstream.

Since the patch "NFS: Allow multiple commit requests in flight per file"
we can run multiple simultaneous commits on the same inode.  This
introduced a race over collecting pages to commit that made it possible
to call nfs_init_commit() with an empty list - which causes crashes like
the one below.

The fix is to catch this race and avoid calling nfs_init_commit and
initiate_commit when there is no work to do.
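
The guard can be pictured with the user-space sketch below.  The types
and the function are hypothetical models of a commit list and its
setup step, not the NFS client's real structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical singly linked list of pages waiting to be committed. */
struct model_req {
	struct model_req *next;
};

/* Hypothetical commit descriptor filled in by the setup step. */
struct model_commit {
	const struct model_req *first; /* first request, dereferenced later */
	bool initialized;
};

/*
 * Set up a commit only if there is something to commit.
 * Returns true if the commit was initialised, false if another
 * commit already collected every page and the list is empty.
 */
bool model_commit_list(struct model_commit *data,
		       const struct model_req *head)
{
	if (head == NULL)
		return false; /* lost the race: nothing to do */

	data->first = head;
	data->initialized = true;
	return true;
}
```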

Here is the crash:

[600522.076832] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[600522.078475] IP: [] nfs_init_commit+0x22/0x130 [nfs]
[600522.078745] PGD 4272b1067 PUD 4272cb067 PMD 0
[600522.078972] Oops: 0000 [#1] SMP
[600522.079204] Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vmw_vsock_vmci_transport vsock bonding ipmi_devintf ipmi_msghandler coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon parport_pc parport acpi_cpufreq vmw_vmci i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmxnet3
[600522.081380]  vmw_pvscsi ata_generic pata_acpi
[600522.081809] CPU: 3 PID: 15667 Comm: /usr/bin/python Not tainted 4.1.9-100.pd.88.el7.x86_64 #1
[600522.082281] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[600522.082814] task: ffff8800bbbfa780 ti: ffff88042ae84000 task.ti: ffff88042ae84000
[600522.083378] RIP: 0010:[]  [] nfs_init_commit+0x22/0x130 [nfs]
[600522.083973] RSP: 0018:ffff88042ae87438  EFLAGS: 00010246
[600522.084571] RAX: 0000000000000000 RBX: ffff880003485e40 RCX: ffff88042ae87588
[600522.085188] RDX: 0000000000000000 RSI: ffff88042ae874b0 RDI: ffff880003485e40
[600522.085756] RBP: ffff88042ae87448 R08: ffff880003486010 R09: ffff88042ae874b0
[600522.086332] R10: 0000000000000000 R11: 0000000000000005 R12: ffff88042ae872d0
[600522.086905] R13: ffff88042ae874b0 R14: ffff880003485e40 R15: ffff88042704c840
[600522.087484] FS:  00007f4728ff2740(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
[600522.088070] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[600522.088663] CR2: 0000000000000040 CR3: 000000042b6aa000 CR4: 00000000001406e0
[600522.089327] Stack:
[600522.089926]  0000000000000001 ffff88042ae87588 ffff88042ae874f8 ffffffffa04f09fa
[600522.090549]  0000000000017840 0000000000017840 ffff88042ae87588 ffff8803258d9930
[600522.091169]  ffff88042ae87578 ffffffffa0563d80 0000000000000000 ffff88042704c840
[600522.091789] Call Trace:
[600522.092420]  [] pnfs_generic_commit_pagelist+0x1da/0x320 [nfsv4]
[600522.093052]  [] ? ff_layout_commit_prepare_v3+0x30/0x30 [nfs_layout_flexfiles]
[600522.093696]  [] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles]
[600522.094359]  [] nfs_generic_commit_list+0xe8/0x120 [nfs]
[600522.095032]  [] nfs_commit_inode+0xba/0x110 [nfs]
[600522.095719]  [] nfs_release_page+0x44/0xd0 [nfs]
[600522.096410]  [] try_to_release_page+0x32/0x50
[600522.097109]  [] shrink_page_list+0x961/0xb30
[600522.097812]  [] shrink_inactive_list+0x1cd/0x550
[600522.098530]  [] shrink_lruvec+0x635/0x840
[600522.099250]  [] shrink_zone+0xf0/0x2f0
[600522.099974]  [] do_try_to_free_pages+0x192/0x470
[600522.100709]  [] try_to_free_pages+0xda/0x170
[600522.101464]  [] __alloc_pages_nodemask+0x588/0x970
[600522.102235]  [] alloc_pages_vma+0xb5/0x230
[600522.103000]  [] ? cpumask_any_but+0x39/0x50
[600522.103774]  [] wp_page_copy.isra.55+0x95/0x490
[600522.104558]  [] ? __wake_up+0x48/0x60
[600522.105357]  [] do_wp_page+0xab/0x4f0
[600522.106137]  [] ? release_task+0x36b/0x470
[600522.106902]  [] ? eventfd_ctx_read+0x67/0x1c0
[600522.107659]  [] handle_mm_fault+0xc78/0x1900
[600522.108431]  [] __do_page_fault+0x181/0x420
[600522.109173]  [] ? __audit_syscall_exit+0x1e6/0x280
[600522.109893]  [] do_page_fault+0x30/0x80
[600522.110594]  [] ? syscall_trace_leave+0xc6/0x120
[600522.111288]  [] page_fault+0x28/0x30
[600522.111947] Code: 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 4c 8d 87 d0 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce <48> 8b 40 40 4c 8b 50 30 74 24 48 8b 87 d0 01 00 00 48 8b 7e 08
[600522.113343] RIP  [] nfs_init_commit+0x22/0x130 [nfs]
[600522.114003]  RSP 
[600522.114636] CR2: 0000000000000040

Fixes: af7cf057 (NFS: Allow multiple commit requests in flight per file)
Signed-off-by: Weston Andros Adamson 
Signed-off-by: Anna Schumaker 
Signed-off-by: Greg Kroah-Hartman