| Age | Commit message (Collapse) | Author | Files | Lines |
|
If a delayed broadcast by a previous cluster lock holder arrives, the
new legitimate leader will accept this without questioning in
leader_handler(). Without this patch rec->leader will never be
overwritten, and because rec->pnn != rec->leader we'll also never send
out fresh leader broadcasts. And because we hold the cluster lock,
nobody else can step up.
Fix this in the next round of leader broadcast timeout.
Bug: https://bugzilla.samba.org/show_bug.cgi?id=15892
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Aug 7 02:59:20 UTC 2025 on atb-devel-224
|
|
Change the variable name to "path" so it makes sense to reuse it for
the directory.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Jul 23 00:02:47 UTC 2025 on atb-devel-224
|
|
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Dropping this from ctdb_tunable_load_file() allows that function to be
called multiple times for different files. The caller sets the
defaults.
In the test script, factor out the handling of a single tunables file
in a similar way. Ignoring missing/unreadable files is OK because
this function will only be called for test successes (hence "ok" in
the name). There will never be existing, unreadable files. The code
being tested ignores missing files, so do that here too.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu May 29 10:57:35 UTC 2025 on atb-devel-224
|
|
See documentation change for details.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Even though all nodes may be shutting down there is still a very small
window for a race when multiple nodes are shut down. For simplicity,
assume 2 nodes. Assume the shutdowns of nodes are staggered, which is
usual because they're usually initiated by a loop (e.g. onnode -p all
ctdb shutdown). Although commands can continue in parallel, some
commands are started later than others.
Consider this sequence:
1. Node 0 reaches ctdb_shutdown_takeover() in
ctdb_shutdown_sequence() and a takeover run starts
2. Node 1 has not yet set its runlevel to SHUTDOWN in
ctdb_shutdown_sequence()
3. The leader node asks node 1 which IPs it can host
4. Node 1 replies "all of them"
5. Node 1 now sets its runlevel to SHUTDOWN in
ctdb_shutdown_sequence()
6. The leader node continues with the takeover run, first asking all
nodes to run "startipreallocate"
7. Node 0 runs "startipreallocate", so its NFS server starts grace
8. Node 1 does not run "startipreallocate" because it is not in
RUNNING runstate, so its NFS server does not start grace
9. The leader node continues with the takeover run, first asking all
nodes to run "releaseip" for IPs they can no longer hold
10. Node 0 releases all IPs, since it is SHUTDOWN runstate (so can't
host IPs)
11. As part of this, the NFS server on node 0 releases locks held
against IPs it is releasing
12. A client connected to node 1, where the NFS server is not in
grace, takes ("steals") one of those locks
This client is then permitted to reclaim the lock when nodes are
restarted.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Allows the timeout for failover during shutdown to be modified.
Defaults to 10s.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
SQ
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Without this, NFS servers on other nodes will not go into grace before
this node releases locks. This should also support improved behaviour
for SMB durable file handles.
The timeout is currently a constant 10s. However, it will
subsequently be switched to an option.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
An early shutdown can put ctdbd into SHUTDOWN runstate before ctdbd
has completed all early initialisation. Some of the start-time
transitions then attempt to set the runstate to FIRST_RECOVERY or
RUNNING, which would make the runstate go backwards, so ctdbd aborts.
Upcoming changes cause ctdbd shutdown to take longer, so the problem
will become more likely. With those changes, this can be
unreliably (50% of the time?) triggered by:
ctdb/tests/INTEGRATION/simple/cluster.091.version_check.sh
since it does an early shutdown due to a version mismatch.
Avoid this by noticing when the runstate is SHUTDOWN and refusing to
continue with subsequent early initialisation steps, which aren't
needed when shutting down.
Earlier runstate transitions do not seems likely to cause an abort
during early shutdown. The following:
./tests/local_daemons.sh foo start 0; ./tests/local_daemons.sh foo stop 0
sees ctdbd already into FIRST_RECOVERY before the shutdown is
processed.
The change to ctdb_run_startup() probably isn't strictly necessary.
There will be no abort in this case. ctdb_shutdown_sequence() will
always run the "shutdown" event and then stop the event daemon, so it
doesn't seem possible that services could be left running. However,
we might as well avoid running the "startup" event when shutting down,
even if only to avoid confusing logs.
Ultimately, it seems like some redesign would be needed to avoid this
in a more predictable manner, rather than responding when an early
initialisation step inconveniently completes during shutdown. For
example, hanging a lot of the start-time event handling off a common
talloc context, could allow it to be cancelled with a single
TALLOC_FREE(). However, a change like that would involve a lot of
analysis to ensure that the talloc hierarchy is correct and there is
no change of free'd pointers being dereferenced. So, we're probably
better off just keeping this issue in mind during a broader redesign.
This workaround appears to be sufficient.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
bzero() function has been deprecated for a long time.
In samba it is replaced with memset(). But samba also
provides common memory-zeroing macros, like ZERO_STRUCT().
In all places where bzero() is used, it actually meant to
zero a structure or an array.
So replace these bzero() calls with ZERO_STRUCT() or
ZERO_ARRAY() as appropriate, and remove bzero() replacement
and testing entirely.
While at it, also stop checking for presence of memset() -
this function is standard for a very long time, and the
only conditional where HAVE_MEMSET were used, was to
provide replacement for bzero() - in all other places
memset() is used unconditionally.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Pavel Filipenský <pfilipensky@samba.org>
|
|
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
We should not directly overwrite the pointer we are realloc'ing
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
Initialise the pointer to NULL and fall through to let
talloc_realloc() do the allocation. talloc_realloc() does the right
thing with a NULL pointer...
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
This is cheap when tcparray is NULL and let's the code that now
follows be simplified.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
This is harmless, so it doesn't generally need to be logged.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
Apply README.Coding, modernise logging, pre-render connection as a
string for logging, switch terminology from "tickle" to "connection",
tidy up comments.
No changes in functionality.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
Reorder code to use early returns, modernise debug.
Best reviewed with "git show -w".
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Autobuild-User(master): Anoop C S <anoopcs@samba.org>
Autobuild-Date(master): Tue Oct 8 06:42:04 UTC 2024 on atb-devel-224
|
|
Fix the comment (NULL versus -1), apply some README.Coding.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Uses of CTDB_BASE in the subsequent code are now handled by the path
module, so there is no point getting the value of CTDB_BASE. Instead,
check that the attempt to set it worked, noting that:
[...] if overwrite is zero, then the value of name is not
changed (and setenv() returns a success status).
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
No need to use CTDB_BASE directly.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Add some missing error handling and error messages.
Remove a use of CTDB_NO_MEMORY(), which then renders the caller's use
of ctdb_errstr() pointless, so remove that too.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Modernise the debug macros along the way.
These are done separately because they will require a little more
patience to review.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Define a static function to return the string. This clearly doesn't
need a ctdb_ prefix, but it matches ctdb_vnn_iface_string(), so
doesn't look out of place.
Use it in the places where review is trivial.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
These are currently converted to strings constantly in log messages
and other places. This clutters the code and probably has a minor
performance impact.
Add a new string field to the VNN structure. Populate it when a
public address is added and the VNN structure is allocated. This is
consistent with how node addresses are handled.
Don't use it yet, or this commit becomes huge.
A short-term goal is that each VNN public address will be converted to
a string only once. A longer-term goal is to reduce use of
ctdb_addr_to_str().
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
The word "no" was accidentally dropped in commit
1e47a1b3f6ab1e2ad9d86dfb28c3e086c99a97e5.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Unused since commit a10545ab6bd8a1b9ca87b0fdba8381cb8af0e284.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Currently, event failures are completely ignored in favour of checking
if the IP is on an interface. This misses the case where event
scripts up to and including 10.interface succeed, but something later
fails. When that occurs, count is incremented, so the failure is
counted as a success in the summary that is logged.
Fail when releaseip fails even though 10.interface succeeded in
releasing the IP. This may result in the IP address coming back, but
that's a different problem.
Underlying this is a design question about when releaseip is
successful. Should releaseip be a distinct operation, with subsequent
reconfigurations considered separately?
Update logging to clearly identify each of the 3 possible errors.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
It is more efficient calling ctdb_sys_local_ip_check() inside a loop
compared to calling ctdb_sys_have_ip(). There is a chance that this
is premature optimisation... but it sure is easy. Fall back to
checking with bind().
I think these checks really exist because of the weirdness fixed by
commit 4b4e4d8870475d994fe42a7b2c57dc69842d91f6. However, we might as
well do what we can.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Improve readability by not repeating the complex expression now
assigned to addr. ctdb_sys_have_ip() is called in both arms of the
if/else, so call it once when declaring the new variable.
Modernise debug macros while touching lines.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Saves lines, str_list_add_printf takes care of NULL checks
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Sun Sep 22 10:44:59 UTC 2024 on atb-devel-224
|
|
I could not find out how to cast a char ** to const char ** without
warning. This transfers fine to the execv call as well.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
Saves lines, str_list_add_printf takes care of NULL checks
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Noel Power <noel.power@suse.com>
|
|
Don't hide the real action inside an if-branch
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Noel Power <noel.power@suse.com>
|
|
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jennifer Sutton <jsutton@samba.org>
|
|
Code to setup the transport is about to be cleaned up, including
removing uses of ctdb_set_error(), so avoid logging a NULL pointer or
some other old error.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
|
|
In a future commit we will add support for loading the config file from
the `ctdb` command line tool. Prior to this change the config file load
func always called D_NOTICE that causes the command to emit new text and
thus break all the tests that rely on the specific test output (not to
mention something users could notice). This change plumbs a new
`verbose` argument into some of the config file loading functions.
Generally, all existing functions will have verbose set to true to match
the existing behavior. Future callers of this function can set it to
false in order to avoid emitting the extra text.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
Rename ctdb_load_nodes_file to ctdb_load_nodes as it can now load nodes
from more than a regular file.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
Rename the `struct ctdb_context` field nodes_file to nodes_source to
better match that the field may indicate something other than a true
file.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
Use the new "nodes list" configuration option. Executing the given path
if the path is prefixed by a `!`. The use case is to decouple the nodes
file from the shared storage, especially in the case where the shared
storage is provided by a vfs module.
For an example, imagine a script that runs `curl` on a URL for a
highly-available web server where the URL provides the content
of the nodes file.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
While here, fix a trivial memory leak (ctdbd will exit anyway if this
function fails).
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jul 23 12:39:18 UTC 2024 on atb-devel-224
|
|
ctdb_control_getnodesfile() calls ctdb_read_nodes(), which returns a
struct ctdb_node_map rather than the old version, so update associated
marshalling. While here modernise a debug message and wrap the
function arguments.
For ctdb_load_nodes_file() to use ctdb_read_nodes(), tweak
convert_node_map_to_list() to also use the modern node map structure.
Remove unused copy of ctdb_read_nodes_file().
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
We might end up using it elsewhere.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Leave common/conf.[ch] where they are to make this commit
comprehensible.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Apr 17 00:54:55 UTC 2024 on atb-devel-224
|
|
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
|
|
We have the same function in tevent, no need to duplicate code.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
|