Reimplement the functionality of ip_maskbits_iface() using the
ip -brief option. This requires less parsing: instead of extracting
the mask bits, return the whole prefix.
Retain ip_maskbits_iface() for backward compatibility in case custom
event scripts are using it.
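A minimal sketch of parsing "ip -brief addr show" output, assuming the usual "IFNAME STATE ADDR/BITS ..." layout. The helper name ip_prefix_from_brief is hypothetical and is not the function added by this commit. It reads the output on stdin, so it can be exercised without touching real interfaces.

```sh
#!/bin/sh
# Hypothetical helper: print "<iface> <ip>/<bits>" for the given IP by
# parsing "ip -brief addr show" output on stdin. Each line is assumed to
# look like: IFNAME STATE ADDR/BITS [ADDR/BITS ...]
ip_prefix_from_brief()
{
	awk -v ip="$1" '
	{
		# Fields 3 onwards are addresses with their prefix length
		for (i = 3; i <= NF; i++) {
			# Match "IP/" at the start so 10.0.0.3 does not match 10.0.0.30
			if (index($i, ip "/") == 1) {
				print $1, $i
				found = 1
				exit
			}
		}
	}
	END { exit(found ? 0 : 1) }'
}

# Usage: ip -brief addr show | ip_prefix_from_brief 10.0.0.3
```

Returning the whole prefix means callers can pass it straight back to "ip addr" without reassembling it from separate IP and mask bits.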
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
|
|
ip addr assumes these defaults anyway. They are just noise.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
|
|
Using $_bcast to determine if the address is an IPv6 one is lazy. It
causes anyone reading the code (including the original author) to have
to go back and confirm that the condition makes sense.
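For illustration, a sketch of an explicit IPv6 test that looks at the address itself rather than at whether $_bcast is set. The helper name is_ipv6 is hypothetical; it relies only on the fact that IPv6 addresses contain ':' and dotted-quad IPv4 addresses do not.

```sh
#!/bin/sh
# Return success if $1 looks like an IPv6 address. IPv6 addresses always
# contain ':'; dotted-quad IPv4 addresses never do.
is_ipv6()
{
	case "$1" in
	*:*) return 0 ;;
	*) return 1 ;;
	esac
}

addr="fd00::10"
if is_ipv6 "$addr"; then
	echo "$addr is IPv6, so there is no broadcast address"
else
	echo "$addr is IPv4"
fi
```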
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
|
|
For consistency with new ip_addr_del().
Update all callers of add_ip_to_iface() to use this function
instead.
Retain add_ip_to_iface() for backward compatibility in case custom
event scripts are using it.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
|
|
Using a prefix is more natural because it matches "ip addr ..." usage.
It should also allow for less parsing.
Update all callers of delete_ip_from_iface() to use this function
instead.
Retain delete_ip_from_iface() for backward compatibility in case
custom event scripts are using it.
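A sketch of how a prefix-based function and a backward-compatible wrapper might fit together, assuming the old function took the interface, IP and mask bits as separate arguments. Both function names are hypothetical, and the sketch only prints the "ip addr del" command instead of running it.

```sh
#!/bin/sh
# New-style interface: takes a prefix (IP/bits), matching "ip addr" usage.
# Prints the command rather than running it, for illustration.
ip_addr_del_sketch()
{
	_iface="$1"
	_prefix="$2"
	echo ip addr del "$_prefix" dev "$_iface"
}

# Old-style interface kept for backward compatibility: separate IP and
# mask bits are joined into a prefix and passed to the new function.
delete_ip_from_iface_sketch()
{
	ip_addr_del_sketch "$1" "${2}/${3}"
}

delete_ip_from_iface_sketch eth0 10.0.0.3 24
```

Keeping the old name as a thin wrapper means custom event scripts continue to work while all in-tree callers use the prefix form.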
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
|
|
Well known, explicit structured programming constructs are arguably
easier to understand than implicit shell magic.
Only change instances that will be updated by subsequent commits.
Doing this separately, instead of in each subsequent commit, will make
those commits easier to understand.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
|
|
Best reviewed with "git show -w" or similar.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
|
|
tests/UNIT/eventscripts/10.interface.020.sh fails in case
"10.interface.script releaseip dev123 10.0.0.3 24" with:
--------------------------------------------------
Output (Exit status: 0):
--------------------------------------------------
Killed 10/10 TCP connections to released IP 10.0.0.3, using ss -K
--------------------------------------------------
Required output (Exit status: 0):
--------------------------------------------------
Killed 10/10 TCP connections to released IP 10.0.0.3, using ss -K
FAILED
==========================================================================
TEST FAILED: ./tests/UNIT/eventscripts/10.interface.020.sh (status 1) (duration: 1s)
==========================================================================
We have seen this type of thing before when output doesn't match
because FreeBSD wc -l space-pads output. For example, see commit
c6c81ea287924c2924aebc6dc0cdea1dc4322ae2.
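A sketch of one portable way to avoid the padding problem: let arithmetic expansion normalise the count before it is compared or printed. The helper name count_lines is hypothetical; this commit does not say exactly how the test was fixed.

```sh
#!/bin/sh
# Count lines on stdin and print the number without padding. FreeBSD
# wc -l left-pads its output with spaces; GNU wc reading stdin does not.
count_lines()
{
	_n=$(wc -l)
	# Arithmetic expansion ignores the surrounding whitespace
	echo "$(($_n))"
}

n=$(printf 'a\nb\nc\n' | count_lines)
echo "Killed ${n}/${n} TCP connections"
```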
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15935
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
|
|
Rename CTDB_NFS_STATE_MNT to CTDB_NFS_SHARED_STATE_DIR. It doesn't
have to be a mount but can be any directory in a cluster filesystem.
CTDB_NFS_SHARED_STATE_DIR will soon be used in statd_callout_helper,
so the variable name might as well be better.
With this change, it will still only be used by nfs-ganesha-callout,
which isn't yet supported (i.e. it still lives in doc/examples). The
rest of the comments below refer to behaviour changes in that script.
CTDB_NFS_SHARED_STATE_DIR is now mandatory when GPFS is used. This is
much saner than choosing the first GPFS filesystem - if the state
directory changes then connection metadata can be lost.
Drop CTDB_NFS_STATE_FS_TYPE. The filesystem type is now determined
from CTDB_NFS_SHARED_STATE_DIR and it is now checked against supported
filesystems. This will catch the case when the filesystem for the
specified directory has not been mounted and the filesystem for the
mountpoint (e.g. ext4) is not a supported filesystem for shared state.
A side-effect is that the filesystem containing
CTDB_NFS_SHARED_STATE_DIR must be mounted when nfs-ganesha-callout is
first run.
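A sketch of the kind of check described above, assuming "gpfs" as the only supported type and using a hypothetical function name; the real list of supported filesystems is not given here. Taking the type as an argument keeps the check testable.

```sh
#!/bin/sh
# Check that a shared state directory lives on a supported filesystem.
# $1: directory, $2: filesystem type reported for it
check_shared_state_fstype()
{
	case "$2" in
	gpfs) return 0 ;; # assumed to be the only supported type
	*)
		echo "Unsupported filesystem \"$2\" for $1" >&2
		return 1
		;;
	esac
}

# On Linux, util-linux findmnt can report the type of the filesystem
# containing a path (hypothetical usage):
#   fstype=$(findmnt -n -o FSTYPE --target "$CTDB_NFS_SHARED_STATE_DIR")
#   check_shared_state_fstype "$CTDB_NFS_SHARED_STATE_DIR" "$fstype"
# If the cluster filesystem is not mounted, this typically reports the
# type of the underlying filesystem (e.g. ext4), which the check rejects.
```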
While touching this file, my shfmt pre-commit hook wants to insert a
trailing ;; into a case statement. Let's sneak that in here too.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
This allows CTDB to be configured to use "ss -K" to reset TCP
connections on "releaseip". This is only supported when the kernel is
configured with CONFIG_INET_DIAG_DESTROY enabled.
From the documentation:
ss -K has been supported in ss since iproute 4.5 in March 2016 and
in the Linux kernel since 4.4 in December 2015. However, the
required kernel configuration item CONFIG_INET_DIAG_DESTROY is
disabled by default. Although enabled in Debian kernels since
~2017 and in Ubuntu since at least 18.04, this has only recently
been enabled in distributions such as RHEL. There seems to be no
way, including running ss -K, to determine if this is supported, so
use of this feature needs to be configurable. When available, it
should be the fastest, most reliable way of killing connections.
For RHEL and derivatives, this was enabled as follows:
* RHEL 8 via https://bugzilla.redhat.com/show_bug.cgi?id=2230213,
arriving in version kernel-4.18.0-513.5.1.el8_9
* RHEL 9 via https://issues.redhat.com/browse/RHEL-212, arriving in
kernel-5.14.0-360.el9
Enabling this option results in a small behaviour change because ss -K
always does a 2-way kill (i.e. it also sends a RST to the client).
Only a 1-way kill is done for SMB connections when ctdb_killtcp is
used - the reasons for this are shrouded in history and the 2-way kill
seems to work fine.
For the summary that is logged, when CTDB_KILLTCP_USE_SS_KILL is "yes"
or "try", always log the method used, even the fallback to
ctdb_killtcp. However, when set to "no", maintain the existing
output.
The decision to use -K rather than --kill is because short options are
trivial to implement in test stubs.
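A sketch of the method selection and summary logging, with stand-in kill functions. It assumes "yes" and "try" both try ss -K first and fall back to ctdb_killtcp, and that the "no" summary is the same message without the method; neither detail is stated in this commit. The message is modelled on the "Killed 10/10 TCP connections to released IP 10.0.0.3, using ss -K" test output quoted earlier in this log.

```sh
#!/bin/sh
# Stand-ins: print how many of $1 connections were killed, or fail.
kill_with_ss() { return 1; }            # pretend ss -K is unsupported
kill_with_ctdb_killtcp() { echo "$1"; } # pretend all were killed

# $1: released IP, $2: number of connections to kill
kill_tcp_connections()
{
	_method=""
	case "$CTDB_KILLTCP_USE_SS_KILL" in
	yes | try)
		if _killed=$(kill_with_ss "$2"); then
			_method="ss -K"
		else
			_killed=$(kill_with_ctdb_killtcp "$2")
			_method="ctdb_killtcp"
		fi
		;;
	*)
		_killed=$(kill_with_ctdb_killtcp "$2")
		;;
	esac
	_msg="Killed ${_killed}/${2} TCP connections to released IP ${1}"
	# Only report the method when ss -K is configured ("yes" or "try")
	if [ -n "$_method" ]; then
		_msg="${_msg}, using ${_method}"
	fi
	echo "$_msg"
}

CTDB_KILLTCP_USE_SS_KILL=try
kill_tcp_connections 10.0.0.3 10
```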
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Nov 7 00:12:34 UTC 2024 on atb-devel-224
|
|
This will be used in a slightly different context in a subsequent
commit. In that case, the number of killed connections will be passed
instead of the total number of connections, so support this here via
different modes instead of churning later.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
Currently TCP ports like NFS lock manager are not tracked. It is
easier to track all connections than to add a configuration system to
try to track specified ports, so do that.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
Running ss to get current connections before running ctdb gettickles
means the ss output might be out of date when the 2 lists are
compared. Some tickles might have been added after ss was run by some
other means (e.g. SMB tickles, added internally) and they would be
deleted according to the stale ss output.
This isn't currently a problem because update_tickles() is currently
only called with port 2049, so all tickles are managed by this code.
That will change in a subsequent commit.
Changing the order means the reverse problem can occur, where
update_tickles() attempts to delete an already deleted tickle. That
may happen occasionally but is harmless because it doesn't result in
missing information. It (currently) just causes a message to be
logged at DEBUG level.
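A sketch of the comparison with the new ordering: tickles are listed first and the connection snapshot second, then comm(1) computes what to add and what to delete. The one-connection-per-line format and the sample data are assumptions; printf stands in for "ctdb gettickles" and ss.

```sh
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# 1. Registered tickles first (stand-in for "ctdb gettickles")
printf '10.0.0.3:2049 192.168.1.5:901\n10.0.0.3:2049 192.168.1.6:902\n' |
	sort >"$tmp/tickles"
# 2. Current connections second (stand-in for ss), so they are never
#    older than the tickle list
printf '10.0.0.3:2049 192.168.1.6:902\n10.0.0.3:2049 192.168.1.7:903\n' |
	sort >"$tmp/connections"

# Only in connections: not yet registered, so add
comm -13 "$tmp/tickles" "$tmp/connections" >"$tmp/add"
# Only in tickles: no matching connection, so delete
comm -23 "$tmp/tickles" "$tmp/connections" >"$tmp/del"

echo "add: $(cat "$tmp/add")"
echo "del: $(cat "$tmp/del")"
```

With this ordering a tickle can disappear between the two snapshots, which only leads to a redundant delete, as described above.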
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
This option has been available since ~2018 and has been implemented in
the stub since then. I guess we didn't use it because CentOS 7?
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
Since commit 224e99804efef960ef4ce2ff2f4f6dced1e74146, square brackets
have been parsed by daemon and tool code, so drop the compatibility
code from here.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
This avoids duplicating logic.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
With an empty IP filter, all incoming connections to port 2049 will be
listed, not just those to public IP addresses. This causes error
messages like the following to be logged:
ctdb-eventd[...]: 60.nfs: Failed to add 1 tickles
since the connection being added seems to be for a random NFS mount
that doesn't use a public IP address.
This has been a problem for a long time (probably since commit
04fe9e20749985c71fef1bce7f6e4c439fe11c81 in 2015). It isn't currently
a huge deal because it only affects NFS connections. However, this
code will soon be used to track connections to public IP addresses on
all ports. This would result in a constant stream of log messages,
since there will always be some active connections.
The theory behind the fix is that if a node hosts no public IPs then
it should have no relevant connections and has no business changing
the list of registered tickles.
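A sketch of the early exit, assuming the filter is built from the node's public IPs using ss(8) "src" terms, which match the local address. The function name is hypothetical and IPv6 handling is omitted.

```sh
#!/bin/sh
# Build an ss filter matching connections to any of the given local
# (public) IPv4 addresses. With no addresses there is no filter: the
# caller should then leave the registered tickles alone.
build_ss_filter()
{
	_filter=""
	for _ip in "$@"; do
		if [ -z "$_filter" ]; then
			_filter="src ${_ip}"
		else
			_filter="${_filter} or src ${_ip}"
		fi
	done
	[ -n "$_filter" ] || return 1
	printf '( %s )\n' "$_filter"
}

if filter=$(build_ss_filter 10.0.0.1 10.0.0.2); then
	echo "filter: ${filter}"
else
	echo "no public IPs: not updating tickles"
fi
```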
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
|
|
If an NFS service check is set to, say, unhealthy_after=2 then it will
always switch from the (default startup) unhealthy state to healthy,
even if there is a fatal problem. If all services/scripts appear OK
then the node will become healthy. When the counter hits the limit it
will return to unhealthy. This is misleading.
Instead, never use the counter at startup, until the service becomes
healthy. This stops services flapping unhealthy-healthy-unhealthy.
A side-effect is that a service that starts in a broken state will
never be restarted to try to fix the problem. This makes sense. The
counting and restarting really exist to deal with problems that might
occur under load. The first monitor events occur before public IPs
are hosted, so there can be no load. If a service doesn't start
reliably the first time then the admin probably wants to know about
it.
nfs_iterate_test() is updated to run an initial monitor event to mark
the services as healthy. This initialises the counter so it can be
used for the important part of the test. Passing the -i option avoids
running the extra monitor event, so the first iteration will be the
initial monitor event.
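A sketch of the startup behaviour described above, keeping state in shell variables for brevity (real event scripts persist counters between events). The variable and function names are hypothetical, and treating unhealthy_after as "unhealthy once the count reaches the limit" is an assumption.

```sh
#!/bin/sh
# Failure counting that only starts once the service has been healthy.
ever_healthy=no
fail_count=0
unhealthy_after=2
state=unhealthy # default startup state

# $1 is the result of a service check: "ok" or "fail"
monitor_result()
{
	if [ "$1" = "ok" ]; then
		ever_healthy=yes
		fail_count=0
		state=healthy
	elif [ "$ever_healthy" = "no" ]; then
		# Still starting up: ignore the counter and stay unhealthy
		state=unhealthy
	else
		fail_count=$((fail_count + 1))
		if [ "$fail_count" -ge "$unhealthy_after" ]; then
			state=unhealthy
		fi
	fi
}

for r in fail ok fail fail; do
	monitor_result "$r"
	echo "$r -> $state"
done
```

The first "fail" leaves the service unhealthy instead of letting the counter flip it to healthy, which is the flapping this commit removes.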
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
rpc.statd is single-threaded and runs its HA callout synchronously. If
it is too slow then latency accumulates and rpc.statd's backlog grows.
Running a pair of add-client/del-client events with the current code
averages ~0.030s in my test environment. This means that 1000 clients
reclaiming locks after failover can easily cause tens of seconds of
latency (1000 x ~0.030s is ~30s). This could cause rpc.statd to become
unresponsive, resulting in a timeout for an rpcinfo-based health check
of the status service.
Split the add-client/del-client events out to a standalone
statd_callout executable, written in C, to be used as the HA callout
for rpc.statd. All other functions move to statd_callout_helper.
Now, running a pair of add-client/del-client events in my test
environment averages only ~0.002s. This seems less likely to cause
latency problems.
The standalone statd_callout executable needs to read a configuration
file, which is generated by statd_callout_helper from the "startup"
event. It also needs access to a list of currently assigned public
IPs.
For backward compatibility, during installation a symlink is created
from $CTDB_BASE/statd-callout to the new statd_callout, which is
installed in the helper directory.
Testing this as part of the eventscript unit tests starts to become
even more of a hack than it used to be. However, the dependency on
stubs and the corresponding setup of fake state makes it hard to move
this elsewhere.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jun 25 04:24:57 UTC 2024 on atb-devel-224
|
|
Exports may be contained in an include file rather than the top-level
ganesha.conf.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
|
|
This is way more complicated than I would like but, as per the
comment, this is due to complexities in the way public IPs work. The
main consumer will be statd-callout, which will then be able to run as
a non-root user.
Also generate the cache file in test code, whenever the PNN is set.
However, this can cause "ctdb ip" to generate a fake IP layout before
public IPs are setup. So, have the "ctdb ip" stub generate the IP
layout every time it is run to avoid it being stale.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
|
|
get_all_interfaces() gets the names of all public interfaces.
However, its name is misleading, so rename it to get_public_ifaces()
and move it into functions.
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
|
|
/etc/os-release is quite universal. It can be found on most Linux
distros and on FreeBSD.
Attempt to use /etc/os-release to detect Red Hat, SUSE and Debian
based distros. If /etc/os-release exists but the distro is unknown then
$ID is printed as the detected distro, which will probably result in
sub-optimal behaviour, but when tracing it will at least indicate that
a new distro needs to be handled.
The only way to handle missing /etc/os-release is to set
CTDB_INIT_STYLE - see ctdb.sysconfig(5) for details.
The event script unit tests are updated to use /etc/os-release so
the new logic is exercised.
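A sketch of os-release based detection. ID and ID_LIKE are standard os-release fields; the family names printed (redhat, suse, debian) and the function name are assumptions. The file is sourced in a subshell so its variables do not leak into the caller.

```sh
#!/bin/sh
# Print a distro family for an os-release file ($1, default
# /etc/os-release). Unknown distros print their $ID so tracing shows
# which distro needs handling.
detect_distro()
{
	_file="${1:-/etc/os-release}"
	[ -r "$_file" ] || return 1
	(
		# Subshell: variables set by the file do not leak out
		. "$_file"
		for _id in $ID $ID_LIKE; do
			case "$_id" in
			rhel | centos | fedora) echo redhat; exit 0 ;;
			suse | sles | opensuse*) echo suse; exit 0 ;;
			debian | ubuntu) echo debian; exit 0 ;;
			esac
		done
		echo "$ID"
	)
}

detect_distro || echo "No /etc/os-release: set CTDB_INIT_STYLE" >&2
```

Checking ID_LIKE as well as ID lets derivatives such as Rocky Linux (ID_LIKE containing "rhel") be handled without listing each one.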
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Oct 30 09:19:11 UTC 2023 on atb-devel-224
|
|
This can be used for simple failure counting, without restarts, as
used in the 40.vsftpd event script. That case will subsequently be
converted and this functionality can also be used elsewhere.
Add documentation to ctdb-script.options(5) to allow parameters that
use this to be more easily described.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Uninitialised counters are treated as 0, but still produce an error.
The redirect to stderr needs to come before the redirect for a missing
counter file.
The seemingly saner alternative of moving it outside the subshell
works when dash is /bin/sh (e.g. on Debian) but does not work when
bash is /bin/sh (e.g. on Fedora).
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Logging in statd-callout tests is currently useless. This will
provide a way of seeing errors in those tests.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
SC2162 read without -r will mangle backslashes.
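A short illustration of the difference: with -r the backslashes survive; without it, read treats each backslash as an escape character and drops it.

```sh
#!/bin/sh
# With -r, backslashes are kept; IFS= also preserves surrounding spaces.
printf '%s\n' 'C:\path\to\file' | while IFS= read -r line; do
	printf 'with -r:    %s\n' "$line"
done
# Without -r, read treats each backslash as an escape and removes it.
printf '%s\n' 'C:\path\to\file' | while IFS= read line; do
	printf 'without -r: %s\n' "$line"
done
```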
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
|
|
To debug ctdb_killtcp failures, add
CTDB_KILLTCP_DEBUGLEVEL=DEBUG
to script.options.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Sep 20 11:42:16 UTC 2022 on sn-devel-184
|
|
This can now be made trivial.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
VLAN configuration on Linux often uses a convention of naming a VLAN
on <iface> with VLAN ID <tag> as <iface>.<tag>. To be able to monitor
the underlying interface, the original 10.interface code naively
stripped off the '.' and everything after it (i.e. ".*", as a glob
pattern).
Some users do not use the above convention. A VLAN can be named
without including the underlying interface, but still with a
tag (e.g. vlan<tag> - the word "vlan" followed by the tag) or, more
generally, perhaps without a tag (e.g. <vlan> - an arbitrary name).
The ip(8) command lists a VLAN as <vlan>@<iface>. The underlying
interface can be found by stripping everything up to and including an
'@' (i.e. "*@").
Commit bc71251433ce618c95c674d7cbe75b01a94adad9 added support for
stripping "*@". However, on suspicion, it kept support for the case
where there is no '@', falling back to stripping ".*". If ip(8) ever
did this then it was a long time ago - it has been printing a format
including '@' since at least 2004.
Stripping ".*" interferes with interesting administrative decisions,
like having '.' in interface names.
So, drop the fallback to stripping ".*" because it appears to be
unnecessary and can cause inconvenience.
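A sketch of the remaining logic using a parameter expansion, with a hypothetical function name: strip everything up to and including '@' and leave names without '@' unchanged.

```sh
#!/bin/sh
# Print the underlying interface for a name as shown by ip(8), which
# lists a VLAN as <vlan>@<iface>. Names without '@' are returned as-is,
# so interface names containing '.' are left alone.
underlying_iface()
{
	printf '%s\n' "${1##*@}"
}

underlying_iface vlan100@eth0 # eth0
underlying_iface eth0.100     # eth0.100 (no '.' stripping)
```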
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
eval is not required and causes the following ShellCheck warning:
SC2294 (warning): eval negates the benefit of arrays. Drop eval to
preserve whitespace/symbols (or eval as string).
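A sketch of the pattern ShellCheck recommends, with a hypothetical function name: execute "$@" directly so each argument is passed through unchanged, whereas eval would join the arguments and parse them again.

```sh
#!/bin/sh
# Run a command given as arguments. "$@" passes each argument through
# unchanged; eval "$@" would join them and re-parse the result, so
# whitespace and characters like ';' would be interpreted by the shell.
run_cmd()
{
	"$@"
}

run_cmd printf '[%s]\n' 'two words' 'a;b'
```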
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 24 10:40:50 UTC 2022 on sn-devel-184
|
|
Note that the order of the sed expressions matters: the expression to
delete comment lines must come first, because the second expression
would transform
# comment
to
comment
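A sketch showing why the order matters. The first expression matches the description above; the second (strip leading non-alphanumeric characters) is a hypothetical stand-in that, like the real one, turns "# comment" into "comment".

```sh
#!/bin/sh
# Correct order: delete comment lines first ("d" ends processing of the
# line), then apply the stand-in stripping expression.
printf '# comment\n  10.0.0.1/24 eth0\n' |
	sed -e '/^[[:space:]]*#/d' -e 's/^[^[:alnum:]]*//'
# Output: 10.0.0.1/24 eth0

# Reversed order: the comment marker is stripped before the delete runs,
# so "comment" survives as if it were data.
printf '# comment\n' |
	sed -e 's/^[^[:alnum:]]*//' -e '/^[[:space:]]*#/d'
# Output: comment
```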
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14826
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
ss added square brackets around IPv6 addresses in versions > 4.12.0
via commit aba9c23a6e1cb134840c998df14888dca469a485. CentOS 7 added
this feature somewhere mid-release. So, backward compatibility is
obviously needed.
As per the comment, protocol/protocol_util.c should probably print and
parse such square brackets. However, for backward compatibility the
brackets would have to be stripped in both places in
update_tickles()... or added to the ss output when missing. Best to
leave this until we have a connection tracking daemon.
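A sketch of stripping the brackets so that output from old and new ss versions compares equal. The function name is hypothetical and the sed expression is an illustration only, not the compatibility code referred to above.

```sh
#!/bin/sh
# Remove square brackets around addresses, e.g. "[fd00::1]:2049" becomes
# "fd00::1:2049", so old and new ss output can be compared directly.
strip_brackets()
{
	sed -e 's/\[\([^]]*\)]/\1/g'
}

printf '%s\n' '[fd00::1]:2049 [fd00::99]:901' '10.0.0.3:2049 10.0.0.9:901' |
	strip_brackets
```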
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14227
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
The code has changed so this is no longer needed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
The situation for NFS config has got more complicated and is probably
broken in statd-callout on Debian-like systems at the moment. Allow
several alternative configuration names to be tried. Stop after the
first that is found and loaded.
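A sketch of "stop after the first that is found and loaded", with a hypothetical function name. The example paths in the usage comment are common Red Hat and Debian locations but are not necessarily the list used by statd-callout.

```sh
#!/bin/sh
# Source the first readable file from the argument list and stop.
# Returns 1 if none of the candidates exist.
load_first_config()
{
	for _f in "$@"; do
		if [ -r "$_f" ]; then
			. "$_f"
			return 0
		fi
	done
	return 1
}

# Usage (example paths only):
#   load_first_config /etc/sysconfig/nfs /etc/default/nfs-common ||
#       echo "No NFS configuration found" >&2
```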
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13860
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@samba.org>
|
|
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13520
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13520
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
|
|
This is the initial location that will be used by the new
multi-component aware event daemon.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Drop function loadconfig(), replacing uses with "load_system_config
ctdb". Drop translation of old-style configuration to new
configuration file. Drop export of debugging variables. Drop
documentation and configuration examples.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu May 17 07:03:04 CEST 2018 on sn-devel-144
|
|
This pulls database options from the configuration file, caches them
and makes the values available in scripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
This allows other scripts to use the given options for a particular
event script. One interesting example is that the ctdb_natgw tool
should look for configuration in events.d/11.natgw.options.
In the future this will be something like
events/failover/11.natgw.options, so require the component to be
specified even though it isn't yet used.
Test support is also updated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
This is no longer necessary after the removal of support for
CTDB_DBDIR=tmpfs.
File-local variable ctdb_rundir is no longer used, so drop it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue May 1 16:20:37 CEST 2018 on sn-devel-144
|
|
This is not used.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Apr 27 09:37:49 CEST 2018 on sn-devel-144
|
|
For now this loads the global CTDB configuration too. This will
change in the future after things are properly modularised.
This also anticipates a future change where event scripts end with a
".script" suffix.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
loadconfig() currently tries to load the CTDB configuration and also
any system configuration relevant to the current (event) script.
Instead add a new function load_system_config() to load the
distribution-specific system configuration for a component. Call this
directly in the rare scripts that need the system configuration.
Also call load_system_config when loading the CTDB configuration to
pull in anything from the CTDB system configuration. This is partly
for backward compatibility but also to get options that can be used
anywhere.
loadconfig() no longer takes an argument. It simply loads the CTDB
configuration.
Drop support for falling back to /etc/ctdb/sysconfig/ctdb (or
similar). Surely there's nobody who uses that!
Also, drop the indirection where loadconfig() calls _loadconfig().
This was used years ago as a test hook and is no longer required.
Inexplicably, this change introduces a new shellcheck test failure, so
silence this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|
|
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
|