Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 8286f8b622990194207df9ab852e0f87c60d35e9 ]
The error flow in nfsd4_copy() calls cleanup_async_copy(), which
already decrements nn->pending_async_copies.
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 63fab04cbd0f96191b6e5beedc3b643b01c15889 ]
Ensure the refcount and async_copies fields are initialized early.
cleanup_async_copy() will reference these fields if an error occurs
in nfsd4_copy(). If they are not correctly initialized, at the very
least, a refcount underflow occurs.
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a upstream.
There is a race between laundromat handling of revoked delegations
and a client sending free_stateid operation. Laundromat thread
finds that delegation has expired and needs to be revoked so it
marks the delegation stid revoked and it puts it on a reaper list
but then it unlock the state lock and the actual delegation revocation
happens without the lock. Once the stid is marked revoked a racing
free_stateid processing thread does the following (1) it calls
list_del_init() which removes it from the reaper list and (2) frees
the delegation stid structure. The laundromat thread ends up not
calling the revoke_delegation() function for this particular delegation
but that means it will no release the lock lease that exists on
the file.
Now, a new open for this file comes in and ends up finding that
lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
up trying to derefence a freed delegation stateid. Leading to the
followint use-after-free KASAN warning:
kernel: ==================================================================
kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
kernel:
kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
kernel: Call trace:
kernel: dump_backtrace+0x98/0x120
kernel: show_stack+0x1c/0x30
kernel: dump_stack_lvl+0x80/0xe8
kernel: print_address_description.constprop.0+0x84/0x390
kernel: print_report+0xa4/0x268
kernel: kasan_report+0xb4/0xf8
kernel: __asan_report_load8_noabort+0x1c/0x28
kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
kernel: nfsd4_open+0xa08/0xe80 [nfsd]
kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
kernel: svc_process+0x3d4/0x7e0 [sunrpc]
kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
kernel: nfsd+0x270/0x400 [nfsd]
kernel: kthread+0x288/0x310
kernel: ret_from_fork+0x10/0x20
This patch proposes a fixed that's based on adding 2 new additional
stid's sc_status values that help coordinate between the laundromat
and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()).
First to make sure, that once the stid is marked revoked, it is not
removed by the nfsd4_free_stateid(), the laundromat take a reference
on the stateid. Then, coordinating whether the stid has been put
on the cl_revoked list or we are processing FREE_STATEID and need to
make sure to remove it from the list, each check that state and act
accordingly. If laundromat has added to the cl_revoke list before
the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove
it from the list. If nfsd4_free_stateid() finds that operations arrived
before laundromat has placed it on cl_revoke list, it marks the state
freed and then laundromat will no longer add it to the list.
Also, for nfsd4_delegreturn() when looking for the specified stid,
we need to access stid that are marked removed or freeable, it means
the laundromat has started processing it but hasn't finished and this
delegreturn needs to return nfserr_deleg_revoked and not
nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the
lack of it will leave this stid on the cl_revoked list indefinitely.
Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
CC: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d5ff2fb2e7167e9483846e34148e60c0c016a1f6 ]
In the normal case, when we excute `echo 0 > /proc/fs/nfsd/threads`, the
function `nfs4_state_destroy_net` in `nfs4_state_shutdown_net` will
release all resources related to the hashed `nfs4_client`. If the
`nfsd_client_shrinker` is running concurrently, the `expire_client`
function will first unhash this client and then destroy it. This can
lead to the following warning. Additionally, numerous use-after-free
errors may occur as well.
nfsd_client_shrinker echo 0 > /proc/fs/nfsd/threads
expire_client nfsd_shutdown_net
unhash_client ...
nfs4_state_shutdown_net
/* won't wait shrinker exit */
/* cancel_work(&nn->nfsd_shrinker_work)
* nfsd_file for this /* won't destroy unhashed client1 */
* client1 still alive nfs4_state_destroy_net
*/
nfsd_file_cache_shutdown
/* trigger warning */
kmem_cache_destroy(nfsd_file_slab)
kmem_cache_destroy(nfsd_file_mark_slab)
/* release nfsd_file and mark */
__destroy_client
====================================================================
BUG nfsd_file (Not tainted): Objects remaining in nfsd_file on
__kmem_cache_shutdown()
--------------------------------------------------------------------
CPU: 4 UID: 0 PID: 764 Comm: sh Not tainted 6.12.0-rc3+ #1
dump_stack_lvl+0x53/0x70
slab_err+0xb0/0xf0
__kmem_cache_shutdown+0x15c/0x310
kmem_cache_destroy+0x66/0x160
nfsd_file_cache_shutdown+0xac/0x210 [nfsd]
nfsd_destroy_serv+0x251/0x2a0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
====================================================================
BUG nfsd_file_mark (Tainted: G B W ): Objects remaining
nfsd_file_mark on __kmem_cache_shutdown()
--------------------------------------------------------------------
dump_stack_lvl+0x53/0x70
slab_err+0xb0/0xf0
__kmem_cache_shutdown+0x15c/0x310
kmem_cache_destroy+0x66/0x160
nfsd_file_cache_shutdown+0xc8/0x210 [nfsd]
nfsd_destroy_serv+0x251/0x2a0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
To resolve this issue, cancel `nfsd_shrinker_work` using synchronous
mode in nfs4_state_shutdown_net.
Fixes: 7c24fa225081 ("NFSD: replace delayed_work with work_struct for nfsd_client_shrinker")
Signed-off-by: Yang Erkun <yangerkun@huaweicloud.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c88c150a467fcb670a1608e2272beeee3e86df6e ]
When multiple FREE_STATEIDs are sent for the same delegation stateid,
it can lead to a possible either use-after-free or counter refcount
underflow errors.
In nfsd4_free_stateid() under the client lock we find a delegation
stateid, however the code drops the lock before calling nfs4_put_stid(),
that allows another FREE_STATE to find the stateid again. The first one
will proceed to then free the stateid which leads to either
use-after-free or decrementing already zeroed counter.
Fixes: 3f29cc82a84c ("nfsd: split sc_status out of sc_type")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
failed
[ Upstream commit 53e4e17557049d7688ca9dadeae80864d40cf0b7 ]
If nfsd_startup_net() fails and so ->nfsd_net_up is false,
nfsd_destroy_serv() doesn't currently call svc_destroy(). It should.
Fixes: 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dc0d0f885aa422f621bc1c2124133eff566b0bc8 ]
NeilBrown says:
> The handling of NFSD_FILE_CACHE_UP is strange. nfsd_file_cache_init()
> sets it, but doesn't clear it on failure. So if nfsd_file_cache_init()
> fails for some reason, nfsd_file_cache_shutdown() would still try to
> clean up if it was called.
Reported-by: NeilBrown <neilb@suse.de>
Fixes: c7b824c3d06c ("NFSD: Replace the "init once" mechanism")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit aadc3bbea163b6caaaebfdd2b6c4667fbc726752 ]
Nothing appears to limit the number of concurrent async COPY
operations that clients can start. In addition, AFAICT each async
COPY can copy an unlimited number of 4MB chunks, so can run for a
long time. Thus IMO async COPY can become a DoS vector.
Add a restriction mechanism that bounds the number of concurrent
background COPY operations. Start simple and try to be fair -- this
patch implements a per-namespace limit.
An async COPY request that occurs while this limit is exceeded gets
NFS4ERR_DELAY. The requesting client can choose to send the request
again after a delay or fall back to a traditional read/write style
copy.
If there is need to make the mechanism more sophisticated, we can
visit that in future patches.
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9ed666eba4e0a2bb8ffaa3739d830b64d4f2aaad ]
Currently, when NFSD handles an asynchronous COPY, it returns a
zero write verifier, relying on the subsequent CB_OFFLOAD callback
to pass the write verifier and a stable_how4 value to the client.
However, if the CB_OFFLOAD never arrives at the client (for example,
if a network partition occurs just as the server sends the
CB_OFFLOAD operation), the client will never receive this verifier.
Thus, if the client sends a follow-up COMMIT, there is no way for
the client to assess the COMMIT result.
The usual recovery for a missing CB_OFFLOAD is for the client to
send an OFFLOAD_STATUS operation, but that operation does not carry
a write verifier in its result. Neither does it carry a stable_how4
value, so the client /must/ send a COMMIT in this case -- which will
always fail because currently there's still no write verifier in the
COPY result.
Thus the server needs to return a normal write verifier in its COPY
result even if the COPY operation is to be performed asynchronously.
If the server recognizes the callback stateid in subsequent
OFFLOAD_STATUS operations, then obviously it has not restarted, and
the write verifier the client received in the COPY result is still
valid and can be used to assess a COMMIT of the copied data, if one
is needed.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 60749cbe3d8ae572a6c7dda675de3e8b25797a18 ]
sp_nrthreads is only ever accessed under the service mutex
nlmsvc_mutex nfs_callback_mutex nfsd_mutex
so these is no need for it to be an atomic_t.
The fact that all code using it is single-threaded means that we can
simplify svc_pool_victim and remove the temporary elevation of
sp_nrthreads.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 202f39039a11402dcbcd5fece8d9fa6be83f49ae upstream.
According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH.
Replace the XDR decoder for PUTPUBFH with a "noop" since we no
longer want the minorversion check, and PUTPUBFH has no arguments to
decode. (Ideally nfsd4_decode_noop should really be called
nfsd4_decode_void).
PUTPUBFH should now behave just like PUTROOTFH.
Reported-by: Cedric Blancher <cedric.blancher@gmail.com>
Fixes: e1a90ebd8b23 ("NFSD: Combine decode operations for v4 and v4.1")
Cc: Dan Shelton <dan.f.shelton@gmail.com>
Cc: Roland Mainz <roland.mainz@nrubsig.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 340e61e44c1d2a15c42ec72ade9195ad525fd048 upstream.
Ext4 will throw -EBADMSG through ext4_readdir when a checksum error
occurs, resulting in the following WARNING.
Fix it by mapping EBADMSG to nfserr_io.
nfsd_buffered_readdir
iterate_dir // -EBADMSG -74
ext4_readdir // .iterate_shared
ext4_dx_readdir
ext4_htree_fill_tree
htree_dirblock_to_tree
ext4_read_dirblock
__ext4_read_dirblock
ext4_dirblock_csum_verify
warn_no_space_for_csum
__warn_no_space_for_csum
return ERR_PTR(-EFSBADCRC) // -EBADMSG -74
nfserrno // WARNING
[ 161.115610] ------------[ cut here ]------------
[ 161.116465] nfsd: non-standard errno: -74
[ 161.117315] WARNING: CPU: 1 PID: 780 at fs/nfsd/nfsproc.c:878 nfserrno+0x9d/0xd0
[ 161.118596] Modules linked in:
[ 161.119243] CPU: 1 PID: 780 Comm: nfsd Not tainted 5.10.0-00014-g79679361fd5d #138
[ 161.120684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qe
mu.org 04/01/2014
[ 161.123601] RIP: 0010:nfserrno+0x9d/0xd0
[ 161.124676] Code: 0f 87 da 30 dd 00 83 e3 01 b8 00 00 00 05 75 d7 44 89 ee 48 c7 c7 c0 57 24 98 89 44 24 04 c6
05 ce 2b 61 03 01 e8 99 20 d8 00 <0f> 0b 8b 44 24 04 eb b5 4c 89 e6 48 c7 c7 a0 6d a4 99 e8 cc 15 33
[ 161.127797] RSP: 0018:ffffc90000e2f9c0 EFLAGS: 00010286
[ 161.128794] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 161.130089] RDX: 1ffff1103ee16f6d RSI: 0000000000000008 RDI: fffff520001c5f2a
[ 161.131379] RBP: 0000000000000022 R08: 0000000000000001 R09: ffff8881f70c1827
[ 161.132664] R10: ffffed103ee18304 R11: 0000000000000001 R12: 0000000000000021
[ 161.133949] R13: 00000000ffffffb6 R14: ffff8881317c0000 R15: ffffc90000e2fbd8
[ 161.135244] FS: 0000000000000000(0000) GS:ffff8881f7080000(0000) knlGS:0000000000000000
[ 161.136695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 161.137761] CR2: 00007fcaad70b348 CR3: 0000000144256006 CR4: 0000000000770ee0
[ 161.139041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 161.140291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 161.141519] PKRU: 55555554
[ 161.142076] Call Trace:
[ 161.142575] ? __warn+0x9b/0x140
[ 161.143229] ? nfserrno+0x9d/0xd0
[ 161.143872] ? report_bug+0x125/0x150
[ 161.144595] ? handle_bug+0x41/0x90
[ 161.145284] ? exc_invalid_op+0x14/0x70
[ 161.146009] ? asm_exc_invalid_op+0x12/0x20
[ 161.146816] ? nfserrno+0x9d/0xd0
[ 161.147487] nfsd_buffered_readdir+0x28b/0x2b0
[ 161.148333] ? nfsd4_encode_dirent_fattr+0x380/0x380
[ 161.149258] ? nfsd_buffered_filldir+0xf0/0xf0
[ 161.150093] ? wait_for_concurrent_writes+0x170/0x170
[ 161.151004] ? generic_file_llseek_size+0x48/0x160
[ 161.151895] nfsd_readdir+0x132/0x190
[ 161.152606] ? nfsd4_encode_dirent_fattr+0x380/0x380
[ 161.153516] ? nfsd_unlink+0x380/0x380
[ 161.154256] ? override_creds+0x45/0x60
[ 161.155006] nfsd4_encode_readdir+0x21a/0x3d0
[ 161.155850] ? nfsd4_encode_readlink+0x210/0x210
[ 161.156731] ? write_bytes_to_xdr_buf+0x97/0xe0
[ 161.157598] ? __write_bytes_to_xdr_buf+0xd0/0xd0
[ 161.158494] ? lock_downgrade+0x90/0x90
[ 161.159232] ? nfs4svc_decode_voidarg+0x10/0x10
[ 161.160092] nfsd4_encode_operation+0x15a/0x440
[ 161.160959] nfsd4_proc_compound+0x718/0xe90
[ 161.161818] nfsd_dispatch+0x18e/0x2c0
[ 161.162586] svc_process_common+0x786/0xc50
[ 161.163403] ? nfsd_svc+0x380/0x380
[ 161.164137] ? svc_printk+0x160/0x160
[ 161.164846] ? svc_xprt_do_enqueue.part.0+0x365/0x380
[ 161.165808] ? nfsd_svc+0x380/0x380
[ 161.166523] ? rcu_is_watching+0x23/0x40
[ 161.167309] svc_process+0x1a5/0x200
[ 161.168019] nfsd+0x1f5/0x380
[ 161.168663] ? nfsd_shutdown_threads+0x260/0x260
[ 161.169554] kthread+0x1c4/0x210
[ 161.170224] ? kthread_insert_work_sanity_check+0x80/0x80
[ 161.171246] ret_from_fork+0x1f/0x30
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 45bb63ed20e02ae146336412889fe5450316a84f upstream.
The pair of bloom filtered used by delegation_blocked() was intended to
block delegations on given filehandles for between 30 and 60 seconds. A
new filehandle would be recorded in the "new" bit set. That would then
be switch to the "old" bit set between 0 and 30 seconds later, and it
would remain as the "old" bit set for 30 seconds.
Unfortunately the code intended to clear the old bit set once it reached
30 seconds old, preparing it to be the next new bit set, instead cleared
the *new* bit set before switching it to be the old bit set. This means
that the "old" bit set is always empty and delegations are blocked
between 0 and 30 seconds.
This patch updates bd->new before clearing the set with that index,
instead of afterwards.
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bf92e5008b17f935a6de8b708551e02c2294121c ]
At this point in compound processing, currentfh refers to the parent of
the file, not the file itself. Get the correct dentry from the delegation
stateid instead.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a078a7dc0eaa9db288ae45319f7f7503968af546 ]
The code in nfsd4_deleg_getattr_conflict() is convoluted and buggy.
With this patch we:
- properly handle non-nfsd leases. We must not assume flc_owner is a
delegation unless fl_lmops == &nfsd_lease_mng_ops
- move the main code out of the for loop
- have a single exit which calls nfs4_put_stid()
(and other exits which don't need to call that)
[ jlayton: refactored on top of Neil's other patch: nfsd: fix
nfsd4_deleg_getattr_conflict in presence of third party lease ]
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 22451a16b7ab7debefce660672566be887db1637 ]
When we have a corrupted main.sqlite in /var/lib/nfs/nfsdcld/, it may
result in namelen being 0, which will cause memdup_user() to return
ZERO_SIZE_PTR.
When we access the name.data that has been assigned the value of
ZERO_SIZE_PTR in nfs4_client_to_reclaim(), null pointer dereference is
triggered.
[ T1205] ==================================================================
[ T1205] BUG: KASAN: null-ptr-deref in nfs4_client_to_reclaim+0xe9/0x260
[ T1205] Read of size 1 at addr 0000000000000010 by task nfsdcld/1205
[ T1205]
[ T1205] CPU: 11 PID: 1205 Comm: nfsdcld Not tainted 5.10.0-00003-g2c1423731b8d #406
[ T1205] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ T1205] Call Trace:
[ T1205] dump_stack+0x9a/0xd0
[ T1205] ? nfs4_client_to_reclaim+0xe9/0x260
[ T1205] __kasan_report.cold+0x34/0x84
[ T1205] ? nfs4_client_to_reclaim+0xe9/0x260
[ T1205] kasan_report+0x3a/0x50
[ T1205] nfs4_client_to_reclaim+0xe9/0x260
[ T1205] ? nfsd4_release_lockowner+0x410/0x410
[ T1205] cld_pipe_downcall+0x5ca/0x760
[ T1205] ? nfsd4_cld_tracking_exit+0x1d0/0x1d0
[ T1205] ? down_write_killable_nested+0x170/0x170
[ T1205] ? avc_policy_seqno+0x28/0x40
[ T1205] ? selinux_file_permission+0x1b4/0x1e0
[ T1205] rpc_pipe_write+0x84/0xb0
[ T1205] vfs_write+0x143/0x520
[ T1205] ksys_write+0xc9/0x170
[ T1205] ? __ia32_sys_read+0x50/0x50
[ T1205] ? ktime_get_coarse_real_ts64+0xfe/0x110
[ T1205] ? ktime_get_coarse_real_ts64+0xa2/0x110
[ T1205] do_syscall_64+0x33/0x40
[ T1205] entry_SYSCALL_64_after_hwframe+0x67/0xd1
[ T1205] RIP: 0033:0x7fdbdb761bc7
[ T1205] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 514
[ T1205] RSP: 002b:00007fff8c4b7248 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ T1205] RAX: ffffffffffffffda RBX: 000000000000042b RCX: 00007fdbdb761bc7
[ T1205] RDX: 000000000000042b RSI: 00007fff8c4b75f0 RDI: 0000000000000008
[ T1205] RBP: 00007fdbdb761bb0 R08: 0000000000000000 R09: 0000000000000001
[ T1205] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000042b
[ T1205] R13: 0000000000000008 R14: 00007fff8c4b75f0 R15: 0000000000000000
[ T1205] ==================================================================
Fix it by checking namelen.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Fixes: 74725959c33c ("nfsd: un-deprecate nfsdcld")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Scott Mayhew <smayhew@redhat.com>
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d078cbf5c38de83bc31f83c47dcd2184c04a50c7 ]
If not enough buffer space available, but idmap_lookup has triggered
lookup_fn which calls cache_get and returns successfully. Then we
missed to call cache_put here which pairs with cache_get.
Fixes: ddd1ea563672 ("nfsd4: use xdr_reserve_space in attribute encoding")
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Reviwed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8a7926176378460e0d91e02b03f0ff20a8709a60 ]
If we wait_for_construction and find that the file is no longer hashed,
and we're going to retry the open, the old nfsd_file reference is
currently leaked. Put the reference before retrying.
Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 81a95c2b1d605743220f28db04b8da13a65c4059 ]
Given that we do the search and insertion while holding the i_lock, I
don't think it's possible for us to get EEXIST here. Remove this case.
Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- One more write delegation fix
* tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
|
|
It is not safe to dereference fl->c.flc_owner without first confirming
fl->fl_lmops is the expected manager. nfsd4_deleg_getattr_conflict()
tests fl_lmops but largely ignores the result and assumes that flc_owner
is an nfs4_delegation anyway. This is wrong.
With this patch we restore the "!= &nfsd_lease_mng_ops" case to behave
as it did before the change mentioned below. This is the same as the
current code, but without any reference to a possible delegation.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix a number of crashers
- Update email address for an NFSD reviewer
* tag 'nfsd-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
fs/nfsd: fix update of inode attrs in CB_GETATTR
nfsd: fix potential UAF in nfsd4_cb_getattr_release
nfsd: hold reference to delegation when updating it for cb_getattr
MAINTAINERS: Update Olga Kornievskaia's email address
nfsd: prevent panic for nfsv4.0 closed files in nfs4_show_open
nfsd: ensure that nfsd4_fattr_args.context is zeroed out
|
|
Currently, we copy the mtime and ctime to the in-core inode and then
mark the inode dirty. This is fine for certain types of filesystems, but
not all. Some require a real setattr to properly change these values
(e.g. ceph or reexported NFS).
Fix this code to call notify_change() instead, which is the proper way
to effect a setattr. There is one problem though:
In this case, the client is holding a write delegation and has sent us
attributes to update our cache. We don't want to break the delegation
for this since that would defeat the purpose. Add a new ATTR_DELEG flag
that makes notify_change bypass the try_break_deleg call.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Once we drop the delegation reference, the fields embedded in it are no
longer safe to access. Do that last.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Once we've dropped the flc_lock, there is nothing that ensures that the
delegation that was found will still be around later. Take a reference
to it while holding the lock and then drop it when we've finished with
the delegation.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Prior to commit 3f29cc82a84c ("nfsd: split sc_status out of
sc_type") states_show() relied on sc_type field to be of valid
type before calling into a subfunction to show content of a
particular stateid. From that commit, we split the validity of
the stateid into sc_status and no longer changed sc_type to 0
while unhashing the stateid. This resulted in kernel oopsing
for nfsv4.0 opens that stay around and in nfs4_show_open()
would derefence sc_file which was NULL.
Instead, for closed open stateids forgo displaying information
that relies of having a valid sc_file.
To reproduce: mount the server with 4.0, read and close
a file and then on the server cat /proc/fs/nfsd/clients/2/states
[ 513.590804] Call trace:
[ 513.590925] _raw_spin_lock+0xcc/0x160
[ 513.591119] nfs4_show_open+0x78/0x2c0 [nfsd]
[ 513.591412] states_show+0x44c/0x488 [nfsd]
[ 513.591681] seq_read_iter+0x5d8/0x760
[ 513.591896] seq_read+0x188/0x208
[ 513.592075] vfs_read+0x148/0x470
[ 513.592241] ksys_read+0xcc/0x178
Fixes: 3f29cc82a84c ("nfsd: split sc_status out of sc_type")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If nfsd4_encode_fattr4 ends up doing a "goto out" before we get to
checking for the security label, then args.context will be set to
uninitialized junk on the stack, which we'll then try to free.
Initialize it early.
Fixes: f59388a579c6 ("NFSD: Add nfsd4_encode_fattr4_sec_label()")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two minor fixes for recent changes
* tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets
sunrpc: avoid -Wformat-security warning
|
|
When creating nfsd sockets via the netlink interface, we do want to
register with the portmapper. Don't set SVC_SOCK_ANONYMOUS.
Reported-by: Steve Dickson <steved@redhat.com>
Fixes: 16a471177496 ("NFSD: add listener-{set,get} netlink command")
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull nfsd updates from Chuck Lever:
"This is a light release containing optimizations, code clean-ups, and
minor bug fixes.
This development cycle focused on work outside of upstream kernel
development:
- Continuing to build upstream CI for NFSD based on kdevops
- Continuing to focus on the quality of NFSD in LTS kernels
- Participation in IETF nfsv4 WG discussions about NFSv4 ACLs,
directory delegation, and NFSv4.2 COPY offload
Notable features for v6.11 that do not come through the NFSD tree
include NFS server-side support for the new pNFS NVMe layout type
[RFC9561]. Functional testing for pNFS block layouts like this one has
been introduced to our kdevops CI harness. Work on improving the
resolution of file attribute time stamps in local filesystems is also
ongoing tree-wide.
As always I am grateful to NFSD contributors, reviewers, testers, and
bug reporters who participated during this cycle"
* tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument
gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey
MAINTAINERS: Add a bugzilla link for NFSD
nfsd: new netlink ops to get/set server pool_mode
sunrpc: refactor pool_mode setting code
nfsd: allow passing in array of thread counts via netlink
nfsd: make nfsd_svc take an array of thread counts
sunrpc: fix up the special handling of sv_nrpools == 1
SUNRPC: Add a trace point in svc_xprt_deferred_close
NFSD: Support write delegations in LAYOUTGET
lockd: Use *-y instead of *-objs in Makefile
NFSD: Fix nfsdcld warning
svcrdma: Handle ADDR_CHANGE CM event properly
svcrdma: Refactor the creation of listener CMA ID
NFSD: remove unused structs 'nfsd3_voidargs'
NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support passing NULL along AT_EMPTY_PATH for statx().
NULL paths with any flag value other than AT_EMPTY_PATH go the
usual route and end up with -EFAULT to retain compatibility (Rust
is abusing calls of the sort to detect availability of statx)
This avoids path lookup code, lockref management, memory allocation
and in case of NULL path userspace memory access (which can be
quite expensive with SMAP on x86_64)
- Don't block i_writecount during exec. Remove the
deny_write_access() mechanism for executables
- Relax open_by_handle_at() permissions in specific cases where we
can prove that the caller had sufficient privileges to open a file
- Switch timespec64 fields in struct inode to discrete integers
freeing up 4 bytes
Fixes:
- Fix false positive circular locking warning in hfsplus
- Initialize hfs_inode_info after hfs_alloc_inode() in hfs
- Avoid accidental overflows in vfs_fallocate()
- Don't interrupt fallocate with EINTR in tmpfs to avoid constantly
restarting shmem_fallocate()
- Add missing quote in comment in fs/readdir
Cleanups:
- Don't assign and test in an if statement in mqueue. Move the
assignment out of the if statement
- Reflow the logic in may_create_in_sticky()
- Remove the usage of the deprecated ida_simple_xx() API from procfs
- Reject FSCONFIG_CMD_CREATE_EXCL requets that depend on the new
mount api early
- Rename variables in copy_tree() to make it easier to understand
- Replace WARN(down_read_trylock, ...) abuse with proper asserts in
various places in the VFS
- Get rid of user_path_at_empty() and drop the empty argument from
getname_flags()
- Check for error while copying and no path in one branch in
getname_flags()
- Avoid redundant smp_mb() for THP handling in do_dentry_open()
- Rename parent_ino to d_parent_ino and make it use RCU
- Remove unused header include in fs/readdir
- Export in_group_capable() helper and switch f2fs and fuse over to
it instead of open-coding the logic in both places"
* tag 'vfs-6.11.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
ipc: mqueue: remove assignment from IS_ERR argument
vfs: rename parent_ino to d_parent_ino and make it use RCU
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
stat: use vfs_empty_path() helper
fs: new helper vfs_empty_path()
fs: reflow may_create_in_sticky()
vfs: remove redundant smp_mb for thp handling in do_dentry_open
fuse: Use in_group_or_capable() helper
f2fs: Use in_group_or_capable() helper
fs: Export in_group_or_capable()
vfs: reorder checks in may_create_in_sticky
hfs: fix to initialize fields of hfs_inode_info after hfs_alloc_inode()
proc: Remove usage of the deprecated ida_simple_xx() API
hfsplus: fix to avoid false alarm of circular locking
Improve readability of copy_tree
vfs: shave a branch in getname_flags
vfs: retire user_path_at_empty and drop empty arg from getname_flags
vfs: stop using user_path_at_empty in do_readlinkat
tmpfs: don't interrupt fallocate with EINTR
fs: don't block i_writecount during exec
...
|
|
"data" actually refers to a file_lease and not a file_lock. Both structs
have their file_lock_core as the first field though, so this bug should
be harmless without struct randomization in play.
Reported-by: Florian Evers <florian-evers@gmx.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219008
Fixes: 05580bbfc6bc ("nfsd: adapt to breakup of struct file_lock")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Florian Evers <florian-evers@gmx.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now that nfsd_svc can handle an array of thread counts, fix up the
netlink threads interface to construct one from the netlink call
and pass it through so we can start a pooled server the same way we
would start a normal one.
Note that any unspecified values in the array are considered zeroes,
so it's possible to shut down a pooled server by passing in a short
array that has only zeros, or even an empty array.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now that the refcounting is fixed, rework nfsd_svc to use the same
thread setup as the pool_threads interface. Have it take an array of
thread counts instead of just a single value, and pass that from the
netlink threads set interface. Since the new netlink interface doesn't
have the same restriction as pool_threads, move the guard against
shutting down all threads to write_pool_threads.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
I noticed LAYOUTGET(LAYOUTIOMODE4_RW) returning NFS4ERR_ACCESS
unexpectedly. The NFS client had created a file with mode 0444, and
the server had returned a write delegation on the OPEN(CREATE). The
client was requesting a RW layout using the write delegation stateid
so that it could flush file modifications.
Creating a read-only file does not seem to be problematic for
NFSv4.1 without pNFS, so I began looking at NFSD's implementation of
LAYOUTGET.
The failure was because fh_verify() was doing a permission check as
part of verifying the FH presented during the LAYOUTGET. It uses the
loga_iomode value to specify the @accmode argument to fh_verify().
fh_verify(MAY_WRITE) on a file whose mode is 0444 fails with -EACCES.
To permit LAYOUT* operations in this case, add OWNER_OVERRIDE when
checking the access permission of the incoming file handle for
LAYOUTGET and LAYOUTCOMMIT.
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # v6.6+
Message-Id: 4E9C0D74-A06D-4DC3-A48A-73034DC40395@oracle.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since CONFIG_NFSD_LEGACY_CLIENT_TRACKING is a new config option, its
initial default setting should have been Y (if we are to follow the
common practice of "default Y, wait, default N, wait, remove code").
Paul also suggested adding a clearer remedy action to the warning
message.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Message-Id: <d2ab4ee7-ba0f-44ac-b921-90c8fa5a04d2@molgen.mpg.de>
Fixes: 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
'nfsd3_voidargs' in nfs[23]acl.c is unused since
commit 788f7183fba8 ("NFSD: Add common helpers to decode void args and
encode void results").
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
These lengths come from xdr_stream_decode_u32() and so we should be a
bit careful with them. Use size_add() and struct_size() to avoid
integer overflows. Saving size_add()/struct_size() results to a u32 is
unsafe because it truncates away the high bits.
Also generally storing sizes in longs is safer. Most systems these days
use 64 bit CPUs. It's harder for an addition to overflow 64 bits than
it is to overflow 32 bits. Also functions like vmalloc() can
successfully allocate UINT_MAX bytes, but nothing can allocate ULONG_MAX
bytes.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Due to a late review, revert and re-fix a recent crasher fix
* tag 'nfsd-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "nfsd: fix oops when reading pool_stats before server is started"
nfsd: initialise nfsd_info.mutex early.
|
|
nfsd_info.mutex can be dereferenced by svc_pool_stats_start()
immediately after the new netns is created. Currently this can
trigger an oops.
Move the initialisation earlier before it can possibly be dereferenced.
Fixes: 7b207ccd9833 ("svc: don't hold reference for poolstats, only mutex.")
Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Closes: https://lore.kernel.org/all/c2e9f6de-1ec4-4d3a-b18d-d5a6ec0814a0@linux.ibm.com/
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix crashes triggered by administrative operations on the server
* tag 'nfsd-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: grab nfsd_mutex in nfsd_nl_rpc_status_get_dumpit()
nfsd: fix oops when reading pool_stats before server is started
|
|
Grab nfsd_mutex lock in nfsd_nl_rpc_status_get_dumpit routine and remove
nfsd_nl_rpc_status_get_start() and nfsd_nl_rpc_status_get_do |