path: root/fs/nfs/flexfilelayout/flexfilelayout.c
Age | Commit message | Author | Files | Lines
2021-05-19 | pNFS/flexfiles: fix incorrect size check in decode_nfs_fh() | Nikola Livic | 1 | -1/+1
    [ Upstream commit ed34695e15aba74f45247f1ee2cf7e09d449f925 ]
    We (adam zabrocki, alexander matrosov, alexander tereshkin, maksym bazalii) observed the check:
        if (fh->size > sizeof(struct nfs_fh))
    should not use the size of the nfs_fh struct which includes an extra two bytes from the size field.
        struct nfs_fh {
                unsigned short size;
                unsigned char data[NFS_MAXFHSIZE];
        }
    but should determine the size from data[NFS_MAXFHSIZE] so the memcpy will not write 2 bytes beyond destination.
    The proposed fix is to compare against the NFS_MAXFHSIZE directly, as is done elsewhere in fs code base.
    Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver")
    Signed-off-by: Nikola Livic <nlivic@gmail.com>
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
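The off-by-two described in the commit above can be illustrated with a small userspace sketch. This is not the kernel's decode_nfs_fh(): `copy_fh()` is a hypothetical helper, and the value 128 for NFS_MAXFHSIZE is an assumption made for this example. The only point is that sizeof(struct nfs_fh) is larger than the capacity of data[], so the bound has to be NFS_MAXFHSIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFS_MAXFHSIZE 128	/* assumed value, for illustration only */

struct nfs_fh {
	unsigned short size;
	unsigned char data[NFS_MAXFHSIZE];
};

/* The struct is bigger than its data array, so it is the wrong bound. */
static_assert(sizeof(struct nfs_fh) > NFS_MAXFHSIZE,
	      "the size field makes the struct larger than data[]");

/*
 * Hypothetical helper: copy a file handle of len bytes into fh.
 * Returns 0 on success, -1 if the handle does not fit in fh->data.
 */
static int copy_fh(struct nfs_fh *fh, const unsigned char *src, size_t len)
{
	/* Compare against the array capacity, not sizeof(struct nfs_fh). */
	if (len > NFS_MAXFHSIZE)
		return -1;

	fh->size = (unsigned short)len;
	memcpy(fh->data, src, len);
	return 0;
}
```

With the check written this way, a length of NFS_MAXFHSIZE + 1 or NFS_MAXFHSIZE + 2 is rejected instead of overrunning data[].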
2020-12-30 | NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read() | Trond Myklebust | 1 | -1/+1
    [ Upstream commit 52104f274e2d7f134d34bab11cada8913d4544e2 ]
    Don't bump the index twice.
    Fixes: 563c53e73b8b ("NFS: Fix flexfiles read failover")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-30 | pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled | Trond Myklebust | 1 | -6/+21
    If the flexfiles mirroring is enabled, then the read code expects to be able to set pgio->pg_mirror_idx to point to the data server that is being used for this particular read. However it does not change the pg_mirror_count because we only need to send a single read.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-18 | pNFS/flexfiles: Be consistent about mirror index types | Trond Myklebust | 1 | -17/+17
    A mirror index is always of type u32.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-18 | pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read | Trond Myklebust | 1 | -5/+6
    While it is true that reading from an unmirrored source always uses index 0, that is no longer true for mirrored sources when we fail over.
    Fixes: 563c53e73b8b ("NFS: Fix flexfiles read failover")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-23 | treewide: Use fallthrough pseudo-keyword | Gustavo A. R. Silva | 1 | -2/+2
    Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case.
    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
    Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
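As a rough illustration of the conversion described above, the following userspace sketch defines a stand-in for the `fallthrough` pseudo-keyword and uses it in a switch. The macro definition is an approximation written for this example, and `cleanup_steps()` is a hypothetical function, not code from flexfilelayout.c.

```c
#include <assert.h>

/* Userspace stand-in for the fallthrough pseudo-keyword (an assumption). */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)	/* fallthrough */
#endif

/* Hypothetical example: count the cleanup steps that run from a stage. */
static int cleanup_steps(int stage)
{
	int steps = 0;

	assert(stage >= 0);

	switch (stage) {
	case 2:
		steps++;
		fallthrough;	/* previously a "fall through" comment */
	case 1:
		steps++;
		fallthrough;
	case 0:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Unlike a comment, the statement form lets the compiler check that every intentional fall-through is marked.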
2020-08-12 | NFS: Fix flexfiles read failover | Trond Myklebust | 1 | -14/+36
    The current mirrored read failover code is correctly resetting the mirror index between failed reads, however it is not able to actually flip the RPC call over to the next RPC client. The end result is that we keep resending the RPC call to the same client over and over.
    The fix is to use the pnfs_read_resend_pnfs() mechanism to schedule a new RPC call, but we need to add the ability to pass in a mirror index so that we always retry the next mirror in the list.
    Fixes: 166bd5b889ac ("pNFS/flexfiles: Fix layoutstats handling during read failovers")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
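The failure mode in the commit above (the index advances, but the call keeps going to the same client) can be sketched in userspace as follows. All names here (`rpc_client_stub`, `read_state`, `resend_read`) are hypothetical and do not correspond to the kernel's pnfs_read_resend_pnfs() signature. The sketch only shows the idea of passing the mirror index explicitly and rebinding the client from it.

```c
#include <assert.h>
#include <stddef.h>

struct rpc_client_stub {		/* hypothetical stand-in for an RPC client */
	int id;
};

struct read_state {			/* hypothetical per-read state */
	unsigned int mirror_idx;
	const struct rpc_client_stub *clnt;
};

/*
 * Re-issue a read against the mirror chosen by the caller.
 * Returns 0 if the read was rebound to that mirror's client,
 * -1 if mirror_idx is past the end of the mirror list.
 */
static int resend_read(struct read_state *rs,
		       const struct rpc_client_stub *clients,
		       unsigned int mirror_count, unsigned int mirror_idx)
{
	assert(rs != NULL && clients != NULL);

	if (mirror_idx >= mirror_count)
		return -1;

	rs->mirror_idx = mirror_idx;
	/* Updating the index alone is not enough: switch the client too. */
	rs->clnt = &clients[mirror_idx];
	return 0;
}
```

A caller would pass rs->mirror_idx + 1 exactly once per failure, and treat -1 as "all mirrors tried".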
2020-07-12 | pNFS/flexfiles: The mirror count could depend on the layout segment range | Trond Myklebust | 1 | -2/+2
    Make sure we specify the layout segment range when calculating the mirror count. In theory, that number could depend on the range to which we're writing.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-12 | pNFS/flexfiles: Clean up redundant calls to pnfs_put_lseg() | Trond Myklebust | 1 | -8/+2
    Both nfs_pageio_reset_read_mds() and nfs_pageio_reset_write_mds() do call pnfs_generic_pg_cleanup() for us.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-06-26 | pNFS/flexfiles: Fix list corruption if the mirror count changes | Trond Myklebust | 1 | -4/+7
    If the mirror count changes in the new layout we pick up inside ff_layout_pg_init_write(), then we can end up adding the request to the wrong mirror and corrupting the mirror->pg_list.
    Fixes: d600ad1f2bdb ("NFS41: pop some layoutget errors to application")
    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 | pNFS/flexfiles: Specify the layout segment range in LAYOUTGET | Trond Myklebust | 1 | -4/+4
    Move from requesting only full file layout segments, to requesting layout segments that match our I/O size. This means the server is still free to return a full file layout, but we will no longer error out if it does not.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 | pNFS/flexfiles: remove requirement for whole file layouts | Trond Myklebust | 1 | -21/+0
    Remove the requirement that the server always sends whole file layouts.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 | pNFS/flexfiles: Check the layout segment range before doing I/O | Trond Myklebust | 1 | -2/+10
    When starting to read or write with a layout segment, check that the range matches our request.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 | pNFS/flexfile: Don't merge layout segments if the mirrors don't match | Trond Myklebust | 1 | -0/+19
    Check that the number of mirrors, and the mirror information matches before deciding to merge layout segments in pNFS/flexfiles.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
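The merge precondition described above might be sketched as follows. This is a simplified userspace model: `struct segment`, its `mirror_ids` array, and `mirrors_match()` are hypothetical, and a plain integer id stands in for whatever "mirror information" the driver actually compares.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct segment {			/* hypothetical layout segment */
	unsigned int mirror_count;
	const unsigned int *mirror_ids;	/* one id per mirror */
};

/* Two segments may only be merged if their mirror lists are identical. */
static bool mirrors_match(const struct segment *a, const struct segment *b)
{
	unsigned int i;

	assert(a != NULL && b != NULL);

	if (a->mirror_count != b->mirror_count)
		return false;

	for (i = 0; i < a->mirror_count; i++)
		if (a->mirror_ids[i] != b->mirror_ids[i])
			return false;

	return true;
}
```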
2020-03-27 | NFS/pNFS: Clean up pNFS commit operations | Trond Myklebust | 1 | -7/+12
    Move the pNFS commit related operations into a separate structure that can be carried by the pnfs_ds_commit_info.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 | NFS: Remove bucket array from struct pnfs_ds_commit_info | Trond Myklebust | 1 | -76/+0
    Remove the unused bucket array in struct pnfs_ds_commit_info.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 | pNFS: Enable per-layout segment commit structures | Trond Myklebust | 1 | -0/+19
    Enable adding and lookup of per-layout segment commits in filelayout and flexfilelayout.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 | pNFS: Add infrastructure for cleaning up per-layout commit structures | Trond Myklebust | 1 | -0/+11
    Ensure that both the file and flexfiles layout types clean up when freeing the layout segments.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 | NFSv4/pnfs: Support a list of commit arrays in struct pnfs_ds_commit_info | Trond Myklebust | 1 | -0/+1
    When we have multiple layout segments with different lists of mirrored data, we need to track the commits on a per layout segment basis. This patch adds a list to support this tracking in struct pnfs_ds_commit_info.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-26 | pNFS/flexfiles: Simplify allocation of the mirror array | Trond Myklebust | 1 | -16/+5
    Just allocate the array at the end of the layout segment structure, instead of allocating it as a separate array of pointers.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
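Placing an array at the end of a structure is usually done in C with a flexible array member, so that the header and the array come from a single allocation. The sketch below shows that general technique in userspace. The names `layout_segment`, `alloc_segment()` and `segment_slot()` are hypothetical, and the kernel code uses its own allocator and size helpers rather than calloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct mirror;				/* opaque, hypothetical */

struct layout_segment {			/* hypothetical segment header */
	unsigned int mirror_count;
	struct mirror *mirrors[];	/* flexible array member */
};

/* Allocate the header and count mirror slots in one zeroed block. */
static struct layout_segment *alloc_segment(unsigned int count)
{
	struct layout_segment *seg;
	size_t elem = sizeof(seg->mirrors[0]);

	/* Guard the size computation against overflow. */
	if (count > (SIZE_MAX - sizeof(*seg)) / elem)
		return NULL;

	seg = calloc(1, sizeof(*seg) + (size_t)count * elem);
	if (seg)
		seg->mirror_count = count;
	return seg;
}

/* Bounds-checked access to one mirror slot. */
static struct mirror **segment_slot(struct layout_segment *seg,
				    unsigned int idx)
{
	assert(seg != NULL && idx < seg->mirror_count);
	return &seg->mirrors[idx];
}
```

A single free(seg) then releases both the header and the array, which removes one allocation and one failure path compared with a separately allocated pointer array.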
2020-03-16 | NFSv4: Ensure layout headers are RCU safe | Trond Myklebust | 1 | -3/+3
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | pNFS/flexfiles: Report DELAY and GRACE errors from the DS to the server | Trond Myklebust | 1 | -9/+11
    Ensure that if the DS is returning too many DELAY and GRACE errors, we also report that to the MDS through the layouterror mechanism.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-01-15 | pNFS/flexfiles: Add tracing for layout errors | Trond Myklebust | 1 | -9/+19
    Trace layout errors for pNFS/flexfiles on read/write/commit operations.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 | pNFS/flexfiles: Record resend attempts on I/O failure | Trond Myklebust | 1 | -3/+3
    If the attempt to do pNFS fails, then record what action we take to recover (resend, reset to pnfs or reset to mds).
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-08-26 | pNFS/flexfiles: Don't time out requests on hard mounts | Trond Myklebust | 1 | -2/+9
    If the mount is hard, we should ignore the 'io_maxretrans' module parameter so that we always keep retrying.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
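The retry rule stated above can be written as a small predicate. This is a sketch under assumptions: `struct retry_policy` and `should_retry_io()` are hypothetical names, and treating an `io_maxretrans` of 0 as "no limit" is assumed here to model a retry-forever default, not taken from the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct retry_policy {			/* hypothetical */
	bool hard_mount;		/* true for a hard mount */
	unsigned int io_maxretrans;	/* 0 is assumed to mean "no limit" */
};

/* Decide whether an I/O that has already been retried retrans times
 * should be retried again. */
static bool should_retry_io(const struct retry_policy *p, unsigned int retrans)
{
	assert(p != NULL);

	if (p->hard_mount)
		return true;		/* hard mounts never give up */
	if (p->io_maxretrans == 0)
		return true;		/* no limit configured */
	return retrans < p->io_maxretrans;
}
```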
2019-08-26 | Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated" | Trond Myklebust | 1 | -17/+0
    This reverts commit a79f194aa4879e9baad118c3f8bb2ca24dbef765.
    The mechanism for aborting I/O is racy, since we are not guaranteed that the request is asleep while we're changing both task->tk_status and task->tk_action.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: stable@vger.kernel.org # v5.1
2019-07-18 | pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS | Trond Myklebust | 1 | -0/+26
    Add tracepoints to allow debugging of the event chain leading to a pnfs fallback to doing I/O through the MDS.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-05-21 | treewide: Add SPDX license identifier for more missed files | Thomas Gleixner | 1 | -0/+1
    Add SPDX license identifiers to all files which:
     - Have no license information of any form
     - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file
    These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is:
        GPL-2.0-only
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25 | NFS: Add a helper to return a pointer to the open context of a struct nfs_page | Trond Myklebust | 1 | -3/+3
    Add a helper for when we remove the explicit pointer to the open context.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25 | pNFS: Add tracking to limit the number of pNFS retries | Trond Myklebust | 1 | -0/+8
    When the client is reading or writing using pNFS, and hits an error on the DS, then it typically sends a LAYOUTERROR and/or LAYOUTRETURN to the MDS, before redirtying the failed pages, and going for a new round of reads/writebacks.
    The problem is that if the server has no way to fix the DS, then we may need a way to interrupt this loop after a set number of attempts have been made.
    This patch adds an optional module parameter that allows the admin to specify how many times to retry the read/writeback process before failing with a fatal error. The default behaviour is to retry forever.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-03-23 | pNFS/flexfiles: Fix layoutstats handling during read failovers | Trond Myklebust | 1 | -1/+4
    During a read failover, we may end up changing the value of the pgio_mirror_idx, so make sure that we record the layout stats before that update.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid() | Trond Myklebust | 1 | -6/+2
    Pass in a pointer to the mirror rather than forcing another array access.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFS/flexfile: Simplify nfs4_ff_layout_ds_version() | Trond Myklebust | 1 | -3/+3
    Pass in a pointer to the mirror rather than forcing another array access.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFS/flexfiles: Simplify ff_layout_get_ds_cred() | Trond Myklebust | 1 | -3/+3
    Pass in a pointer to the mirror rather than forcing another array access.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client() | Trond Myklebust | 1 | -3/+3
    Pass in a pointer to the mirror rather than forcing another array access.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh() | Trond Myklebust | 1 | -2/+2
    Pass in a pointer to the mirror rather than having to retrieve it from the array and then verify the resulting pointer.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFS/flexfiles: Speed up read failover when DSes are down | Trond Myklebust | 1 | -12/+62
    If we notice that a DS may be down, we should attempt to read from the other mirrors first before we go back to retry the dead DS.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
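One way to read the policy above is as a two-pass selection: first look for a mirror whose data server is believed to be up, and only if none is found go back to a server that looks down. The sketch below models that in userspace. `choose_read_mirror()` and the `ds_down` flag array are hypothetical and are not the driver's actual selection code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Pick a mirror to read from, starting the search at index start.
 * ds_down[i] is true when mirror i's data server looks unreachable.
 * Returns the chosen index, or -1 if there is no mirror left to try.
 */
static int choose_read_mirror(const bool *ds_down, unsigned int count,
			      unsigned int start)
{
	unsigned int i;

	assert(ds_down != NULL);

	/* First pass: prefer a mirror whose data server seems to be up. */
	for (i = start; i < count; i++)
		if (!ds_down[i])
			return (int)i;

	/* Second pass: everything looks down, so retry one anyway. */
	if (start < count)
		return (int)start;

	return -1;
}
```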
2019-03-01 | NFS/flexfiles: Remove bogus checks for invalid deviceids | Trond Myklebust | 1 | -20/+0
    We already check the deviceids before we start the RPC call.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFS/flexfiles: refactor calls to fs4_ff_layout_prepare_ds() | Trond Myklebust | 1 | -6/+20
    While we may want to skip attempting to connect to a downed mirror when we're deciding which mirror to select for a read, we do not want to do so once we've committed to attempting the I/O in ff_layout_read/write_pagelist(), or ff_layout_initiate_commit()
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFS/flexfiles: Send LAYOUTERROR when failing over mirrored reads | Trond Myklebust | 1 | -5/+55
    When a read to the preferred mirror returns an error, the flexfiles driver records the error in the inode list and currently marks the layout for return before failing over the attempted read to the next mirror.
    What we actually want to do is fire off a LAYOUTERROR to notify the MDS that there is an issue with the preferred mirror, then we fail over. Only once we've failed to read from all mirrors should we return the layout.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated | Trond Myklebust | 1 | -0/+17
    If a layout segment gets invalidated while a pNFS I/O operation is queued for transmission, then we ideally want to abort immediately. This is particularly the case when there is a large number of I/O related RPCs queued in the RPC layer, and the layout segment gets invalidated due to an ENOSPC error, or an EACCES (because the client was fenced). We may end up forced to spam the MDS with a lot of otherwise unnecessary LAYOUTERRORs after that I/O fails.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 | NFS/flexfiles: Fix up sparse RCU annotations | Trond Myklebust | 1 | -2/+2
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-13 | SUNRPC: Add xdr_stream::rqst field | Chuck Lever | 1 | -1/+1
    Having access to the controlling rpc_rqst means a trace point in the XDR code can report:
     - the XID
     - the task ID and client ID
     - the p_name of RPC being processed
    Subsequent patches will introduce such trace points.
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 | NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'. | NeilBrown | 1 | -22/+11
    SUNRPC has two sorts of credentials, both of which appear as "struct rpc_cred". There are "generic credentials" which are supplied by clients such as NFS and passed in 'struct rpc_message' to indicate which user should be used to authorize the request, and there are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS which describe the credential to be sent over the wires.
    This patch replaces all the generic credentials by 'struct cred' pointers - the credential structure used throughout Linux.
    For machine credentials, there is a special 'struct cred *' pointer which is statically allocated and recognized where needed as having a special meaning. A look-up of a low-level cred will map this to a machine credential.
    Signed-off-by: NeilBrown <neilb@suse.com>
    Acked-by: J. Bruce Fields <bfields@redhat.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 | SUNRPC: remove uid and gid from struct auth_cred | NeilBrown | 1 | -6/+8
    Use cred->fsuid and cred->fsgid instead.
    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 | SUNRPC: remove groupinfo from struct auth_cred. | NeilBrown | 1 | -13/+1
    We can use cred->groupinfo (from the 'struct cred') instead.
    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 | SUNRPC: add 'struct cred *' to auth_cred and rpc_cred | NeilBrown | 1 | -0/+17
    The SUNRPC credential framework was put together before Linux has 'struct cred'. Now that we have it, it makes sense to use it.
    This first step just includes a suitable 'struct cred *' pointer in every 'struct auth_cred' and almost every 'struct rpc_cred'. The rpc_cred used for auth_null has a NULL 'struct cred *' as nothing else really makes sense.
    For rpc_cred, the pointer is reference counted. For auth_cred it isn't. struct auth_cred are either allocated on the stack, in which case the thread owns a reference to the auth, or are part of 'struct generic_cred' in which case gc_base owns the reference, and "acred" shares it.
    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-02 | flexfiles: enforce per-mirror stateid only for v4 DSes | Tigran Mkrtchyan | 1 | -2/+4
    Since commit bb21ce0ad227 we always enforce per-mirror stateid. However, this makes sense only for v4+ servers.
    Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-22 | flexfiles: use per-mirror specified stateid for IO | Tigran Mkrtchyan | 1 | -12/+9
    rfc8435 says:
        For tight coupling, ffds_stateid provides the stateid to be used by the client to access the file.
    However current implementation replaces per-mirror provided stateid with by open or lock stateid.
    Ensure that per-mirror stateid is used by ff_layout_write_prepare_v4 and nfs4_ff_layout_prepare_ds.
    Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
    Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 | pNFS: Don't allocate more pages than we need to fit a layoutget response | Trond Myklebust | 1 | -0/+1
    For the 'files' and 'flexfiles' layout types, we do not expect the reply to be any larger than 4k. The block and scsi layout types are a little more greedy, so we keep allocating the maximum response size for now.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>