linux.git/fs/unicode, branch v6.6.131

Revert "unicode: Don't special case ignorable code points"

2024-12-14T19:00:20+00:00

[ Upstream commit 231825b2e1ff6ba799c5eaf396d3ab2354e37c6b ]

This reverts commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91.

It turns out that we can't do this, because while the old behavior of
ignoring ignorable code points was most definitely wrong, we have
case-folding filesystems with on-disk hash values with that wrong
behavior.

So now you can't look up those names, because they hash to something
different.

Of course, it's also entirely possible that in the meantime people have
created *new* files with the new ("more correct") case folding logic,
and reverting will just make other things break.

The correct solution is to not do case folding in filesystems, but
sadly, people seem to never really understand that.  People still see it
as a feature, not a bug.

Reported-by: Qi Han 
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586
Cc: Gabriel Krisman Bertazi 
Requested-by: Jaegeuk Kim 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

unicode: Fix utf8_load() error path

2024-12-09T09:32:12+00:00

[ Upstream commit 156bb2c569cd869583c593d27a5bd69e7b2a4264 ]

utf8_load() requests the symbol "utf8_data_table" and then checks if the
requested UTF-8 version is supported. If it's unsupported, it tries to
put the data table using symbol_put(). If an unsupported version is
requested, symbol_put() fails like this:

 kernel BUG at kernel/module/main.c:786!
 RIP: 0010:__symbol_put+0x93/0xb0
 Call Trace:
  
  ? __die_body.cold+0x19/0x27
  ? die+0x2e/0x50
  ? do_trap+0xca/0x110
  ? do_error_trap+0x65/0x80
  ? __symbol_put+0x93/0xb0
  ? exc_invalid_op+0x51/0x70
  ? __symbol_put+0x93/0xb0
  ? asm_exc_invalid_op+0x1a/0x20
  ? __pfx_cmp_name+0x10/0x10
  ? __symbol_put+0x93/0xb0
  ? __symbol_put+0x62/0xb0
  utf8_load+0xf8/0x150

That happens because symbol_put() expects the unique string that
identify the symbol, instead of a pointer to the loaded symbol. Fix that
by using such string.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: André Almeida 
Reviewed-by: Theodore Ts'o 
Link: https://lore.kernel.org/r/20240902225511.757831-2-andrealmeid@igalia.com
Signed-off-by: Gabriel Krisman Bertazi 
Signed-off-by: Sasha Levin

unicode: Don't special case ignorable code points

2024-10-17T13:24:07+00:00

commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91 upstream.

We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.

Signed-off-by: Gabriel Krisman Bertazi 
Signed-off-by: Greg Kroah-Hartman

unicode: remove MODULE_LICENSE in non-modules

2023-04-13T20:13:54+00:00

Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as
modules.

Signed-off-by: Nick Alcock 
Suggested-by: Luis Chamberlain 
Acked-by: Gabriel Krisman Bertazi 
Cc: Luis Chamberlain 
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa 
Cc: Gabriel Krisman Bertazi 
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Luis Chamberlain

kbuild: unify cmd_copy and cmd_shipped

2022-02-14T01:37:32+00:00

cmd_copy and cmd_shipped have similar functionality. The difference is
that cmd_copy uses 'cp' while cmd_shipped 'cat'.

Unify them into cmd_copy because this macro name is more intuitive.

Going forward, cmd_copy will use 'cat' to avoid the permission issue.
I also thought of 'cp --no-preserve=mode' but this option is not
mentioned in the POSIX spec [1], so I am keeping the 'cat' command.

[1]: https://pubs.opengroup.org/onlinepubs/009695299/utilities/cp.html
Signed-off-by: Masahiro Yamada 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Gabriel Krisman Bertazi

Merge tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

2022-02-01T19:13:24+00:00

Pull unicode cleanup from Gabriel Krisman Bertazi:
 "A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into
  the previous CONFIG_UNICODE. It is -rc material since we don't want to
  expose the former symbol on 5.17.

  This has been living on linux-next for the past week"

* tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  unicode: clean up the Kconfig symbol confusion

unicode: clean up the Kconfig symbol confusion

2022-01-21T00:57:24+00:00

Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.

Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanic, so that should be cleaned up later.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds 
Reviewed-by: Eric Biggers 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi

unicode: fix .gitignore for generated utfdata file

2022-01-17T05:26:43+00:00

Commit 2b3d04787012 ("unicode: Add utf8-data module") changed the
generated utf8data file from 'utf8data.h' to 'utf8data.c', but didn't
change the comments or the .gitignore to match.

The comments should be updated too, but at least they don't cause any
visible breakage.  But the gitignore file needs changing to avoid git
complaining about untracked files.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: Linus Torvalds

unicode: only export internal symbols for the selftests

2021-10-12T14:41:39+00:00

The exported symbols in utf8-norm.c are not needed for normal
file system consumers, so move them to conditional _GPL exports
just for the selftest.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi

unicode: Add utf8-data module

2021-10-12T14:41:39+00:00

utf8data.h contains a large database table which is an auto-generated
decodification trie for the unicode normalization functions.

Allow building it into a separate module.

Based on a patch from Shreeya Patel .

Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi