<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/unicode, branch v6.6.131</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>Revert "unicode: Don't special case ignorable code points"</title>
<updated>2024-12-14T19:00:20+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-12-11T22:11:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7535956ffe5bd9060402b4e67f91792c4cdf1ed2'/>
<id>7535956ffe5bd9060402b4e67f91792c4cdf1ed2</id>
<content type='text'>
[ Upstream commit 231825b2e1ff6ba799c5eaf396d3ab2354e37c6b ]

This reverts commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91.

It turns out that we can't do this, because while the old behavior of
ignoring ignorable code points was most definitely wrong, we have
case-folding filesystems with on-disk hash values with that wrong
behavior.

So now you can't look up those names, because they hash to something
different.

Of course, it's also entirely possible that in the meantime people have
created *new* files with the new ("more correct") case folding logic,
and reverting will just make other things break.

The correct solution is to not do case folding in filesystems, but
sadly, people seem to never really understand that.  People still see it
as a feature, not a bug.

Reported-by: Qi Han &lt;hanqi@vivo.com&gt;
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586
Cc: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Requested-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 231825b2e1ff6ba799c5eaf396d3ab2354e37c6b ]

This reverts commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91.

It turns out that we can't do this, because while the old behavior of
ignoring ignorable code points was most definitely wrong, we have
case-folding filesystems with on-disk hash values with that wrong
behavior.

So now you can't look up those names, because they hash to something
different.

Of course, it's also entirely possible that in the meantime people have
created *new* files with the new ("more correct") case folding logic,
and reverting will just make other things break.

The correct solution is to not do case folding in filesystems, but
sadly, people seem to never really understand that.  People still see it
as a feature, not a bug.

Reported-by: Qi Han &lt;hanqi@vivo.com&gt;
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586
Cc: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Requested-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: Fix utf8_load() error path</title>
<updated>2024-12-09T09:32:12+00:00</updated>
<author>
<name>André Almeida</name>
<email>andrealmeid@igalia.com</email>
</author>
<published>2024-09-02T22:55:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=c4b6c1781f6cc4e2283120ac8d873864b8056f21'/>
<id>c4b6c1781f6cc4e2283120ac8d873864b8056f21</id>
<content type='text'>
[ Upstream commit 156bb2c569cd869583c593d27a5bd69e7b2a4264 ]

utf8_load() requests the symbol "utf8_data_table" and then checks if the
requested UTF-8 version is supported. If it's unsupported, it tries to
put the data table using symbol_put(). If an unsupported version is
requested, symbol_put() fails like this:

 kernel BUG at kernel/module/main.c:786!
 RIP: 0010:__symbol_put+0x93/0xb0
 Call Trace:
  &lt;TASK&gt;
  ? __die_body.cold+0x19/0x27
  ? die+0x2e/0x50
  ? do_trap+0xca/0x110
  ? do_error_trap+0x65/0x80
  ? __symbol_put+0x93/0xb0
  ? exc_invalid_op+0x51/0x70
  ? __symbol_put+0x93/0xb0
  ? asm_exc_invalid_op+0x1a/0x20
  ? __pfx_cmp_name+0x10/0x10
  ? __symbol_put+0x93/0xb0
  ? __symbol_put+0x62/0xb0
  utf8_load+0xf8/0x150

That happens because symbol_put() expects the unique string that
identify the symbol, instead of a pointer to the loaded symbol. Fix that
by using such string.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Reviewed-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Link: https://lore.kernel.org/r/20240902225511.757831-2-andrealmeid@igalia.com
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 156bb2c569cd869583c593d27a5bd69e7b2a4264 ]

utf8_load() requests the symbol "utf8_data_table" and then checks if the
requested UTF-8 version is supported. If it's unsupported, it tries to
put the data table using symbol_put(). If an unsupported version is
requested, symbol_put() fails like this:

 kernel BUG at kernel/module/main.c:786!
 RIP: 0010:__symbol_put+0x93/0xb0
 Call Trace:
  &lt;TASK&gt;
  ? __die_body.cold+0x19/0x27
  ? die+0x2e/0x50
  ? do_trap+0xca/0x110
  ? do_error_trap+0x65/0x80
  ? __symbol_put+0x93/0xb0
  ? exc_invalid_op+0x51/0x70
  ? __symbol_put+0x93/0xb0
  ? asm_exc_invalid_op+0x1a/0x20
  ? __pfx_cmp_name+0x10/0x10
  ? __symbol_put+0x93/0xb0
  ? __symbol_put+0x62/0xb0
  utf8_load+0xf8/0x150

That happens because symbol_put() expects the unique string that
identify the symbol, instead of a pointer to the loaded symbol. Fix that
by using such string.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Reviewed-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Link: https://lore.kernel.org/r/20240902225511.757831-2-andrealmeid@igalia.com
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: Don't special case ignorable code points</title>
<updated>2024-10-17T13:24:07+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-10-08T22:43:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ac20736861f3c9c8e0a78273a4c57e9bcb0d8cc6'/>
<id>ac20736861f3c9c8e0a78273a4c57e9bcb0d8cc6</id>
<content type='text'>
commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91 upstream.

We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91 upstream.

We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: remove MODULE_LICENSE in non-modules</title>
<updated>2023-04-13T20:13:54+00:00</updated>
<author>
<name>Nick Alcock</name>
<email>nick.alcock@oracle.com</email>
</author>
<published>2023-03-07T18:02:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=573858e85d7d390313fbc1f452ae764512f9819d'/>
<id>573858e85d7d390313fbc1f452ae764512f9819d</id>
<content type='text'>
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as
modules.

Signed-off-by: Nick Alcock &lt;nick.alcock@oracle.com&gt;
Suggested-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Acked-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa &lt;hasegawa-hitomi@fujitsu.com&gt;
Cc: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as
modules.

Signed-off-by: Nick Alcock &lt;nick.alcock@oracle.com&gt;
Suggested-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Acked-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa &lt;hasegawa-hitomi@fujitsu.com&gt;
Cc: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kbuild: unify cmd_copy and cmd_shipped</title>
<updated>2022-02-14T01:37:32+00:00</updated>
<author>
<name>Masahiro Yamada</name>
<email>masahiroy@kernel.org</email>
</author>
<published>2022-01-25T06:40:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=a5575df58004e8444e5a2a307407c3f1a6ecf175'/>
<id>a5575df58004e8444e5a2a307407c3f1a6ecf175</id>
<content type='text'>
cmd_copy and cmd_shipped have similar functionality. The difference is
that cmd_copy uses 'cp' while cmd_shipped 'cat'.

Unify them into cmd_copy because this macro name is more intuitive.

Going forward, cmd_copy will use 'cat' to avoid the permission issue.
I also thought of 'cp --no-preserve=mode' but this option is not
mentioned in the POSIX spec [1], so I am keeping the 'cat' command.

[1]: https://pubs.opengroup.org/onlinepubs/009695299/utilities/cp.html
Signed-off-by: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cmd_copy and cmd_shipped have similar functionality. The difference is
that cmd_copy uses 'cp' while cmd_shipped 'cat'.

Unify them into cmd_copy because this macro name is more intuitive.

Going forward, cmd_copy will use 'cat' to avoid the permission issue.
I also thought of 'cp --no-preserve=mode' but this option is not
mentioned in the POSIX spec [1], so I am keeping the 'cat' command.

[1]: https://pubs.opengroup.org/onlinepubs/009695299/utilities/cp.html
Signed-off-by: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode</title>
<updated>2022-02-01T19:13:24+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-02-01T19:13:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=630c12862c21a312c15a494922cdbf9c1beb1733'/>
<id>630c12862c21a312c15a494922cdbf9c1beb1733</id>
<content type='text'>
Pull unicode cleanup from Gabriel Krisman Bertazi:
 "A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into
  the previous CONFIG_UNICODE. It is -rc material since we don't want to
  expose the former symbol on 5.17.

  This has been living on linux-next for the past week"

* tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  unicode: clean up the Kconfig symbol confusion
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull unicode cleanup from Gabriel Krisman Bertazi:
 "A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into
  the previous CONFIG_UNICODE. It is -rc material since we don't want to
  expose the former symbol on 5.17.

  This has been living on linux-next for the past week"

* tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  unicode: clean up the Kconfig symbol confusion
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: clean up the Kconfig symbol confusion</title>
<updated>2022-01-21T00:57:24+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-01-18T06:56:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=5298d4bfe80f6ae6ae2777bcd1357b0022d98573'/>
<id>5298d4bfe80f6ae6ae2777bcd1357b0022d98573</id>
<content type='text'>
Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.

Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanic, so that should be cleaned up later.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.

Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanic, so that should be cleaned up later.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: fix .gitignore for generated utfdata file</title>
<updated>2022-01-17T05:26:43+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-01-17T05:26:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=98f2345773f9ac739350230a85f9a7f7b1fe21a6'/>
<id>98f2345773f9ac739350230a85f9a7f7b1fe21a6</id>
<content type='text'>
Commit 2b3d04787012 ("unicode: Add utf8-data module") changed the
generated utf8data file from 'utf8data.h' to 'utf8data.c', but didn't
change the comments or the .gitignore to match.

The comments should be updated too, but at least they don't cause any
visible breakage.  But the gitignore file needs changing to avoid git
complaining about untracked files.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 2b3d04787012 ("unicode: Add utf8-data module") changed the
generated utf8data file from 'utf8data.h' to 'utf8data.c', but didn't
change the comments or the .gitignore to match.

The comments should be updated too, but at least they don't cause any
visible breakage.  But the gitignore file needs changing to avoid git
complaining about untracked files.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: only export internal symbols for the selftests</title>
<updated>2021-10-12T14:41:39+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-09-15T07:00:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e2a58d2d3416aceeae63dfc7bf680dd390ff331d'/>
<id>e2a58d2d3416aceeae63dfc7bf680dd390ff331d</id>
<content type='text'>
The exported symbols in utf8-norm.c are not needed for normal
file system consumers, so move them to conditional _GPL exports
just for the selftest.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The exported symbols in utf8-norm.c are not needed for normal
file system consumers, so move them to conditional _GPL exports
just for the selftest.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: Add utf8-data module</title>
<updated>2021-10-12T14:41:39+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-09-15T07:00:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2b3d047870120bcd46d7cc257d19ff49328fd585'/>
<id>2b3d047870120bcd46d7cc257d19ff49328fd585</id>
<content type='text'>
utf8data.h contains a large database table which is an auto-generated
decodification trie for the unicode normalization functions.

Allow building it into a separate module.

Based on a patch from Shreeya Patel &lt;shreeya.patel@collabora.com&gt;.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
utf8data.h contains a large database table which is an auto-generated
decodification trie for the unicode normalization functions.

Allow building it into a separate module.

Based on a patch from Shreeya Patel &lt;shreeya.patel@collabora.com&gt;.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
