linux.git/include/linux/bitmap.h, branch v6.6.131

lib/bitmap: add bitmap_{read,write}()

2024-10-04T14:29:59+00:00

[ Upstream commit 63c15822b8dd02a2423cfd92232245ace3f7a11b ]

The two new functions allow reading/writing values of length up to
BITS_PER_LONG bits at arbitrary position in the bitmap.

The code was taken from "bitops: Introduce the for_each_set_clump macro"
by Syed Nayyar Waris with a number of changes and simplifications:
 - instead of using roundup(), which adds an unnecessary dependency
   on , we calculate space as BITS_PER_LONG-offset;
 - indentation is reduced by not using else-clauses (suggested by
   checkpatch for bitmap_get_value());
 - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read()
   and bitmap_write();
 - some redundant computations are omitted.

Cc: Arnd Bergmann 
Signed-off-by: Syed Nayyar Waris 
Signed-off-by: William Breathitt Gray 
Link: https://lore.kernel.org/lkml/fe12eedf3666f4af5138de0e70b67a07c7f40338.1592224129.git.syednwaris@gmail.com/
Suggested-by: Yury Norov 
Co-developed-by: Alexander Potapenko 
Signed-off-by: Alexander Potapenko 
Reviewed-by: Andy Shevchenko 
Acked-by: Yury Norov 
Signed-off-by: Yury Norov 
Signed-off-by: Alexander Lobakin 
Signed-off-by: David S. Miller 
Stable-dep-of: 77b0b98bb743 ("btrfs: subpage: fix the bitmap dump which can cause bitmap corruption")
Signed-off-by: Sasha Levin

fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE

2024-08-29T15:33:14+00:00

commit 9a2fa1472083580b6c66bdaf291f591e1170123a upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor #128, despite #64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

bitmap: introduce generic optimized bitmap_size()

2024-08-29T15:33:14+00:00

commit a37fbe666c016fd89e4460d0ebfcea05baba46dc upstream.

The number of times yet another open coded
`BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge.
Some generic helper is long overdue.

Add one, bitmap_size(), but with one detail.
BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both
divident and divisor are compile-time constants or when the divisor
is not a pow-of-2. When it is however, the compilers sometimes tend
to generate suboptimal code (GCC 13):

48 83 c0 3f          	add    $0x3f,%rax
48 c1 e8 06          	shr    $0x6,%rax
48 8d 14 c5 00 00 00 00	lea    0x0(,%rax,8),%rdx

%BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does
full division of `nbits + 63` by it and then multiplication by 8.
Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC:

8d 50 3f             	lea    0x3f(%rax),%edx
c1 ea 03             	shr    $0x3,%edx
81 e2 f8 ff ff 1f    	and    $0x1ffffff8,%edx

Now it shifts `nbits + 63` by 3 positions (IOW performs fast division
by 8) and then masks bits[2:0]. bloat-o-meter:

add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617)

Clang does it better and generates the same code before/after starting
from -O1, except that with the ALIGN() approach it uses %edx and thus
still saves some bytes:

add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520)

Note that we can't expand DIV_ROUND_UP() by adding a check and using
this approach there, as it's used in array declarations where
expressions are not allowed.
Add this helper to tools/ as well.

Reviewed-by: Przemek Kitszel 
Acked-by: Yury Norov 
Signed-off-by: Alexander Lobakin 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

lib/bitmap: drop optimization of bitmap_{from,to}_arr64

2023-06-22T20:57:41+00:00

bitmap_{from,to}_arr64() optimization is overly optimistic on 32-bit LE
architectures when it's wired to bitmap_copy_clear_tail().

bitmap_copy_clear_tail() takes care of unused bits in the bitmap up to
the next word boundary. But on 32-bit machines when copying bits from
bitmap to array of 64-bit words, it's expected that the unused part of
a recipient array must be cleared up to 64-bit boundary, so the last 4
bytes may stay untouched when nbits % 64 <= 32.

While the copying part of the optimization works correct, that clear-tail
trick makes corresponding tests reasonably fail:

test_bitmap: bitmap_to_arr64(nbits == 1): tail is not safely cleared: 0xa5a5a5a500000001 (must be 0x0000000000000001)

Fix it by removing bitmap_{from,to}_arr64() optimization for 32-bit LE
arches.

Reported-by: Guenter Roeck 
Link: https://lore.kernel.org/lkml/20230225184702.GA3587246@roeck-us.net/
Fixes: 0a97953fd221 ("lib: add bitmap_{from,to}_arr64")
Signed-off-by: Yury Norov 
Tested-by: Guenter Roeck 
Reviewed-by: Andy Shevchenko 
Reviewed-by: Alexander Lobakin

lib/bitmap: remove bitmap_ord_to_pos

2022-09-26T19:19:12+00:00

Now that we have find_nth_bit(), we can drop bitmap_ord_to_pos().

Signed-off-by: Yury Norov

lib/bitmap: add bitmap_weight_and()

2022-09-26T19:19:12+00:00

The function calculates Hamming weight of (bitmap1 & bitmap2). Now we
have to do like this:
	tmp = bitmap_alloc(nbits);
	bitmap_and(tmp, map1, map2, nbits);
	weight = bitmap_weight(tmp, nbits);
	bitmap_free(tmp);

This requires additional memory, adds pressure on alloc subsystem, and
way less cache-friendly than just:
	weight = bitmap_weight_and(map1, map2, nbits);

The following patches apply it for cpumask functions.

Signed-off-by: Yury Norov

Merge tag 'bitmap-6.0-rc1' of https://github.com/norov/linux

2022-08-08T00:52:35+00:00

Pull bitmap updates from Yury Norov:

 - fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo)

 - optimize out non-atomic bitops on compile-time constants (Alexander
   Lobakin)

 - cleanup bitmap-related headers (Yury Norov)

 - x86/olpc: fix 'logical not is only applied to the left hand side'
   (Alexander Lobakin)

 - lib/nodemask: inline wrappers around bitmap (Yury Norov)

* tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits)
  lib/nodemask: inline next_node_in() and node_random()
  powerpc: drop dependency on  in archrandom.h
  x86/olpc: fix 'logical not is only applied to the left hand side'
  lib/cpumask: move some one-line wrappers to header file
  headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure
  headers/deps: mm: Split  out of 
  headers/deps: mm: Optimize  header dependencies
  lib/cpumask: move trivial wrappers around find_bit to the header
  lib/cpumask: change return types to unsigned where appropriate
  cpumask: change return types to bool where appropriate
  lib/bitmap: change type of bitmap_weight to unsigned long
  lib/bitmap: change return types to bool where appropriate
  arm: align find_bit declarations with generic kernel
  iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
  lib/test_bitmap: test the tail after bitmap_to_arr64()
  lib/bitmap: fix off-by-one in bitmap_to_arr64()
  lib: test_bitmap: add compile-time optimization/evaluations assertions
  bitmap: don't assume compiler evaluates small mem*() builtins calls
  net/ice: fix initializing the bitmap in the switch code
  bitops: let optimize out non-atomic bitops on compile-time constants
  ...

lib/bitmap: change type of bitmap_weight to unsigned long

2022-07-15T13:35:54+00:00

bitmap_weight() doesn't return negative values, so change it's type
to unsigned long. It may help compiler to generate better code and
catch bugs.

Signed-off-by: Yury Norov

lib/bitmap: change return types to bool where appropriate

2022-07-14T22:21:43+00:00

Some bitmap functions return boolean results in int variables. Fix it
by changing return types to bool.

Signed-off-by: Yury Norov

bitmap: don't assume compiler evaluates small mem*() builtins calls

2022-07-01T02:52:42+00:00

Intel kernel bot triggered the build bug on ARC architecture that
in fact is as follows:

	DECLARE_BITMAP(bitmap, BITS_PER_LONG);

	bitmap_clear(bitmap, 0, BITS_PER_LONG);
	BUILD_BUG_ON(!__builtin_constant_p(*bitmap));

which can be expanded to:

	unsigned long bitmap[1];

	memset(bitmap, 0, sizeof(*bitmap));
	BUILD_BUG_ON(!__builtin_constant_p(*bitmap));

In most cases, a compiler is able to expand small/simple mem*()
calls to simple assignments or bitops, in this case that would mean:

	unsigned long bitmap[1] = { 0 };

	BUILD_BUG_ON(!__builtin_constant_p(*bitmap));

and on most architectures this works, but not on ARC, despite having
-O3 for every build.
So, to make this work, in case when the last bit to modify is still
within the first long (small_const_nbits()), just use plain
assignments for the rest of bitmap_*() functions which still use
mem*(), but didn't receive such compile-time optimizations yet.
This doesn't have the same coverage as compilers provide, but at
least something to start:

text: add/remove: 3/7 grow/shrink: 43/78 up/down: 1848/-3370 (-1546)
data: add/remove: 1/11 grow/shrink: 0/8 up/down: 4/-356 (-352)

notably cpumask_*() family when NR_CPUS <= BITS_PER_LONG:

netif_get_num_default_rss_queues              38       4     -34
cpumask_copy                                  90       -     -90
cpumask_clear                                146       -    -146

and the abovementioned assertion started passing.

Signed-off-by: Alexander Lobakin 
Signed-off-by: Yury Norov