path: root/lib/compression/tests
Age | Commit message | Author | Files | Lines
2025-10-17 | Add missing include needed for cmocka.h | Andreas Schneider | 2 | -0/+2
This will be required in future. Signed-off-by: Andreas Schneider <asn@samba.org> Reviewed-by: Anoop C S <anoopcs@samba.org>
2025-08-27 | lib:compression: Fix code spelling | Jennifer Sutton | 1 | -1/+1
Signed-off-by: Jennifer Sutton <jennifersutton@catalyst.net.nz> Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2025-08-11 | compression/tests: Fix possible out of bound access CID:1517301 | Vinit Agnihotri | 1 | -0/+5
This also fixes an additional Coverity issue, CID 1517285. Signed-off-by: Vinit Agnihotri <vagnihot@redhat.com> Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Mon Aug 11 22:59:26 UTC 2025 on atb-devel-224
2025-03-29 | lib/compression: add a windows python script for test vectors | Douglas Bagnall | 1 | -0/+155
The C program we have (generate-windows-test-vectors.c) uses a higher level API than MS-XCA refers to, which plays tricks like refusing to do compression if the result would be larger than the original. It does that because I could not successfully compile something using the correct RtlCompressBuffer API in Cygwin. It turns out you don't need to compile anything; using the Python ctypes library, the Windows libraries are available to Python. The compression *is* the same, which is what we expected. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Ralph Boehme <slow@samba.org>
2023-04-03 | lib:compression: Fix code spelling | Andreas Schneider | 2 | -4/+4
Best reviewed with: `git show --word-diff`. Signed-off-by: Andreas Schneider <asn@samba.org> Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-12-19 | compression tests: avoid div by zero in failure (CID 1517297) | Douglas Bagnall | 2 | -0/+2
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Jeremy Allison <jra@samba.org>
2022-12-19 | compression/tests: calm the static analysts (CID: numerous) | Douglas Bagnall | 2 | -5/+21
None of our test vectors are 18446744073709551615 bytes long, which means we can know an `expected_length == returned_length` check will catch the case where the compression function returns -1 for error. We know that, but Coverity doesn't.

It's the same thing over and over again, in two different patterns:

    >>> CID 1517301: Memory - corruptions (OVERRUN)
    >>> Calling "memcmp" with "original.data" and "original.length" is suspicious because of the very large index, 18446744073709551615. The index may be due to a negative parameter being interpreted as unsigned.
    393     if (original.length != decomp_written ||
    394         memcmp(decompressed.data,
    395                original.data,
    396                original.length) != 0) {
    397             debug_message("\033[1;31mgot %zd, expected %zu\033[0m\n",
    398                           decomp_written,

    *** CID 1517299: Memory - corruptions (OVERRUN)
    /lib/compression/tests/test_lzxpress_plain.c: 296 in test_lzxpress_plain_decompress_more_compressed_files()
    290     debug_start_timer();
    291     written = lzxpress_decompress(p.compressed.data,
    292                                   p.compressed.length,
    293                                   dest,
    294                                   p.decompressed.length);
    295     debug_end_timer("decompress", p.decompressed.length);
    >>> CID 1517299: Memory - corruptions (OVERRUN)
    >>> Calling "memcmp" with "p.decompressed.data" and "p.decompressed.length" is suspicious because of the very large index, 18446744073709551615. The index may be due to a negative parameter being interpreted as unsigned.
    296     if (written == p.decompressed.length &&
    297         memcmp(dest, p.decompressed.data, p.decompressed.length) == 0) {
    298             debug_message("\033[1;32mdecompressed %s!

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Jeremy Allison <jra@samba.org>
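One way to spell that reasoning out for a static analyser is to reject the error return before any signed value is compared with, or used as, an unsigned length. The following is only a sketch: `output_matches` is a hypothetical helper, not a function from the Samba tests, and it is not necessarily what this commit does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/*
 * Hypothetical helper (not part of lib/compression/tests): report
 * whether a call that returned `written` produced exactly the
 * `expected_len` bytes found at `expected`.
 */
static bool output_matches(const unsigned char *got,
			   ssize_t written,
			   const unsigned char *expected,
			   size_t expected_len)
{
	/* Handle the -1 error return first, so a negative value is
	 * never converted to a huge unsigned length. */
	if (written < 0) {
		return false;
	}
	if ((size_t)written != expected_len) {
		return false;
	}
	/* Only reached when the length is known to be sane. */
	return memcmp(got, expected, expected_len) == 0;
}
```

A test would then pass the return value of the decompression call as `written` and the length of the test vector as `expected_len`, keeping the explicit sign check separate from the length comparison.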
2022-12-06 | lib/compression: Include missing stat header file | Anoop C S | 2 | -0/+2
<sys/stat.h> was missing from compression library tests which resulted in the following compile time error:

    ../../lib/compression/tests/test_lzx_huffman.c: In function ‘datablob_from_file’:
    ../../lib/compression/tests/test_lzx_huffman.c:383:21: error: storage size of ‘s’ isn’t known
      383 |         struct stat s;
          |                     ^
    ../../lib/compression/tests/test_lzx_huffman.c:389:15: warning: implicit declaration of function ‘fstat’ [-Wimplicit-function-declaration]
      389 |         ret = fstat(fileno(fh), &s);
          |               ^~~~~

Signed-off-by: Anoop C S <anoopcs@samba.org> Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Volker Lendecke <vl@samba.org> Autobuild-User(master): Volker Lendecke <vl@samba.org> Autobuild-Date(master): Tue Dec 6 11:39:16 UTC 2022 on sn-devel-184
2022-12-04 | lib:compression: Initialize variables | Andreas Schneider | 1 | -2/+2
    lib/compression/tests/test_lzx_huffman.c: In function ‘test_lzxpress_huffman_overlong_matches’:
    lib/compression/tests/test_lzx_huffman.c:1013:35: error: ‘j’ may be used uninitialized [-Werror=maybe-uninitialized]
     1013 |         assert_int_equal(score, i * j);
          |                                   ^
    lib/compression/tests/test_lzx_huffman.c:979:19: note: ‘j’ was declared here
      979 |         size_t i, j;
          |                   ^
    lib/compression/tests/test_lzx_huffman.c: In function ‘test_lzxpress_huffman_overlong_matches_abc’:
    lib/compression/tests/test_lzx_huffman.c:1059:39: error: ‘k’ may be used uninitialized [-Werror=maybe-uninitialized]
     1059 |         assert_int_equal(score, i * j * k);
          |                                       ^
    lib/compression/tests/test_lzx_huffman.c:1020:22: note: ‘k’ was declared here
     1020 |         size_t i, j, k;
          |                      ^
    lib/compression/tests/test_lzx_huffman.c:1059:35: error: ‘j’ may be used uninitialized [-Werror=maybe-uninitialized]
     1059 |         assert_int_equal(score, i * j * k);
          |                                   ^
    lib/compression/tests/test_lzx_huffman.c:1020:19: note: ‘j’ was declared here
     1020 |         size_t i, j, k;
          |                   ^

Signed-off-by: Andreas Schneider <asn@samba.org> Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org> Autobuild-Date(master): Sun Dec 4 09:12:30 UTC 2022 on sn-devel-184
2022-12-01 | lib/compression: more tests for lzxpress plain compression | Douglas Bagnall | 1 | -0/+749
These are based on (i.e. copied and pasted from) the LZ77 + Huffman tests. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 | testdata: move compression examples to re-use with lzxpress plain | Douglas Bagnall | 1 | -3/+3
Everything that is in testdata/compression/lzxpress-huffman/ can also be used for lzxpress plain tests, which is something we really need. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 | lib/compression/lzx-plain: relax size requirements on long file | Douglas Bagnall | 1 | -2/+8
We are going to change from a slow exact match algorithm to a fast heuristic search that will not always get the same results as the exhaustive search. To be precise, a million zeros will compress to 112 rather than 93 bytes. We don't insist on an exact size, because that is not an issue here. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 | lib/compression: convert test_lzxpress_plain to cmocka | Douglas Bagnall | 1 | -128/+69
Mainly so I can go

    make bin/test_lzxpress_plain && bin/test_lzxpress_plain
    valgrind bin/test_lzxpress_plain
    rr bin/test_lzxpress_plain
    rr replay

in a tight loop.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 | lib/compression: add test scripts README | Douglas Bagnall | 1 | -0/+19
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 | lib/compression: test util to generate fuzzing seeds | Douglas Bagnall | 1 | -0/+45
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 | lib/compression: Windows utility to generate test vectors | Douglas Bagnall | 1 | -0/+206
If compiled on Windows using Cygwin, MSYS2, or similar, this will output compressed versions of files exactly as specified by MS-XCA, if the following conditions are met:

1. The file is larger than 300 bytes.
2. The compressed file is smaller than the decompressed file.

Otherwise it returns the data unchanged. Without warning; that's just how the API works.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 | lib/compression: script to test 3 byte hash | Douglas Bagnall | 1 | -0/+49
Compression uses a 3 byte hash to remember LZ77 matches in a 14-bit table. This script runs the hash over all 16M combinations, then again over all ASCII combinations, counting collisions to find hot-spots. If you think you have a better hash, you are probably right, but you should try it here -- alter h() -- before committing to it. This one is literally the first one I thought of. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
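The kind of measurement described here can be sketched in C as follows. This is an illustration only: the `h()` below is a placeholder multiplicative hash chosen for the example, not the hash used in lib/compression, and `count_buckets` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS 14
#define HASH_SIZE (1u << HASH_BITS)

/* Placeholder hash (an assumption, NOT Samba's): pack the three bytes,
 * multiply by a mixing constant, keep the top 14 bits of the product. */
static uint16_t h(uint8_t a, uint8_t b, uint8_t c)
{
	uint32_t x = ((uint32_t)a << 16) | ((uint32_t)b << 8) | c;
	x *= 2654435761u;
	return (uint16_t)(x >> (32 - HASH_BITS));
}

/*
 * Count how many 3-byte strings, with every byte in lo..hi inclusive,
 * land in each of the HASH_SIZE buckets. Returns the size of the
 * fullest bucket, i.e. the worst hot-spot.
 */
static uint32_t count_buckets(uint32_t *counts, unsigned lo, unsigned hi)
{
	uint32_t worst = 0;
	unsigned a, b, c;
	size_t i;

	for (i = 0; i < HASH_SIZE; i++) {
		counts[i] = 0;
	}
	for (a = lo; a <= hi; a++) {
		for (b = lo; b <= hi; b++) {
			for (c = lo; c <= hi; c++) {
				uint32_t n = ++counts[h((uint8_t)a,
							(uint8_t)b,
							(uint8_t)c)];
				if (n > worst) {
					worst = n;
				}
			}
		}
	}
	return worst;
}
```

Calling `count_buckets(counts, 0, 255)` covers all 16M combinations; a perfectly even hash would put 1024 strings in each bucket, so the amount by which the returned maximum exceeds 1024 is a rough measure of hot-spots. A second call with a printable range such as `0x20` to `0x7e` approximates the ASCII pass.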
2022-12-01 | lib/compression: helper script to make unbalanced data | Douglas Bagnall | 1 | -0/+185
Huffman tree re-quantisation and perhaps other code paths are only triggered by pathological data like this. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
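As a rough illustration of what "unbalanced" can mean (an assumption; the helper script may generate its data quite differently), giving symbols Fibonacci-distributed frequencies makes an unconstrained Huffman tree degenerate into a chain, so with enough symbols the longest code would not fit in a 15-bit limit and the lengths have to be adjusted. The names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Cap chosen for this sketch so the byte counts stay small. */
#define MAX_FIB_SYMBOLS 40

/* Bytes needed when symbol i (0-based) occurs fib(i + 1) times,
 * with fib(1) = fib(2) = 1. */
static size_t fib_data_len(unsigned n_symbols)
{
	size_t a = 1, b = 1, total = 0;
	unsigned i;

	for (i = 0; i < n_symbols; i++) {
		size_t next = a + b;
		total += a;
		a = b;
		b = next;
	}
	return total;
}

/*
 * Fill buf with n_symbols distinct byte values whose run lengths follow
 * the Fibonacci sequence. Returns the number of bytes written, or 0 if
 * n_symbols is over the cap or the buffer is too small.
 */
static size_t fill_fib_data(uint8_t *buf, size_t buf_len, unsigned n_symbols)
{
	size_t a = 1, b = 1, pos = 0, j;
	unsigned i;

	if (n_symbols > MAX_FIB_SYMBOLS ||
	    fib_data_len(n_symbols) > buf_len) {
		return 0;
	}
	for (i = 0; i < n_symbols; i++) {
		size_t next = a + b;
		for (j = 0; j < a; j++) {
			buf[pos++] = (uint8_t)i;
		}
		a = b;
		b = next;
	}
	return pos;
}
```

With 20 symbols the output is 17710 bytes, and a plain Huffman construction over those frequencies would want a 19-bit code for the rarest symbols.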
2022-12-01 | lib/compression: add a debug script to describe headers | Douglas Bagnall | 1 | -0/+54
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 | lib/compression/tests: add lzhuffman timer functions | Douglas Bagnall | 1 | -5/+36
With LZXHUFF_DEBUG_VERBOSE set, we measure the compression and decompression rate relative to the decompressed size. On reasonably long strings on my laptop, compiled with -O0, it turns out to be between 20 and 500 MB/s, both ways, depending on the complexity of the string. Very short strings are of course dominated by overhead. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
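A rate "relative to the decompressed size" can be computed along these lines. This is a minimal sketch with hypothetical names (`seconds_now`, `rate_mb_per_s`), not the timer code in the tests; it uses the standard `clock()`, which measures CPU time, and treats 1 MB as 10^6 bytes.

```c
#include <stddef.h>
#include <time.h>

/* CPU time used so far, in seconds (hypothetical helper). */
static double seconds_now(void)
{
	return (double)clock() / CLOCKS_PER_SEC;
}

/*
 * Throughput in MB/s for `decompressed_len` bytes handled in
 * `elapsed_s` seconds. Very short inputs can finish within one clock
 * tick, so a non-positive interval yields 0.0 instead of a division
 * by zero.
 */
static double rate_mb_per_s(size_t decompressed_len, double elapsed_s)
{
	if (elapsed_s <= 0.0) {
		return 0.0;
	}
	return ((double)decompressed_len / 1e6) / elapsed_s;
}
```

A caller would record `seconds_now()` before and after the compression or decompression call and pass the difference, together with the decompressed length, to `rate_mb_per_s()`.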
2022-12-01 | lib/compression: LZ77 + Huffman compression | Douglas Bagnall | 1 | -0/+705
This compresses files as described in MS-XCA 2.2, and as decompressed by the decompressor in the previous commit. As with the decompressor, there are two public functions -- one that uses a talloc context, and one that uses pre-allocated memory. The compressor requires a tightly bounded amount of auxiliary memory (>220kB) in a few different buffers, which is all gathered together in the public struct lzxhuff_compressor_mem. An instantiated but not initialised copy of this struct is required by the non-talloc function; it can be used over and over again. Our compression speed is about the same as the decompression speed (between 20 and 500 MB/s on this laptop, depending on the data), and our compression ratio is very similar to that of Windows. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 | lib/compression: add LZ77 + Huffman decompression | Douglas Bagnall | 1 | -0/+514
This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in many posts on the cifs-protocol list[1].

The two public functions are:

    ssize_t lzxpress_huffman_decompress(const uint8_t *input,
                                        size_t input_size,
                                        uint8_t *output,
                                        size_t output_size);

    uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx,
                                                const uint8_t *input_bytes,
                                                size_t input_size,
                                                size_t output_size);

In both cases the caller needs to know the *exact* decompressed size, which is essential for decompression. The _talloc version allocates the buffer for you, and uses the talloc context to allocate a 128k working buffer. The non-talloc function will allocate the working buffer on the stack.

This compression format gives better compression for messages of several kilobytes than the "plain" LZXPRESS compression, but is probably a bit slower to decompress and is certainly worse for very short messages, having a fixed 256 byte overhead for the first Huffman table.

Experiments show decompression rates between 20 and 500 MB per second, depending on the compression ratio and data size, on an i5-1135G7 with no compiler optimisations.

This compression format is used in AD claims and in SMB, but that doesn't happen with this commit.

I will not try to describe LZ77 or Huffman encoding here. Don't expect an answer in MS-XCA either; instead read the code and/or Wikipedia.

[1] Much of that starts here: https://lists.samba.org/archive/cifs-protocol/2022-October/ but there's more earlier, particularly in June/July 2020, when Aurélien Aptel was working on an implementation that ended up in Wireshark.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
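The fixed 256 byte overhead comes from the table layout: as I read MS-XCA, each block starts with code lengths for 512 symbols, packed as two 4-bit values per byte with the even symbol in the low nibble. The sketch below unpacks such a table; the macro and function names are made up for the example and are not taken from the Samba code.

```c
#include <stddef.h>
#include <stdint.h>

#define XCA_TABLE_BYTES 256   /* packed table size */
#define XCA_SYMBOLS     512   /* two 4-bit code lengths per byte */

/*
 * Unpack a 256-byte table into 512 code lengths (0..15 each).
 * Assumed nibble order: symbol 2*i in the low nibble of byte i,
 * symbol 2*i + 1 in the high nibble.
 */
static void unpack_code_lengths(const uint8_t table[XCA_TABLE_BYTES],
				uint8_t lengths[XCA_SYMBOLS])
{
	size_t i;

	for (i = 0; i < XCA_TABLE_BYTES; i++) {
		lengths[2 * i] = table[i] & 0x0f;
		lengths[2 * i + 1] = table[i] >> 4;
	}
}
```

A length of 0 marks a symbol that is not used in the block, and because a length has to fit in 4 bits no code can be longer than 15 bits.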
2022-12-01 | lib/compression: move lzxpress_plain test into tests/ | Douglas Bagnall | 1 | -0/+483
We are going to add more tests for lib/compression, and they can't all be called "testsuite.c". Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>