<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/unicode, branch v6.18.21</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>unicode: kunit: change tests filename and path</title>
<updated>2025-02-12T22:00:11+00:00</updated>
<author>
<name>Gabriela Bittencourt</name>
<email>gbittencourt@lkcamp.dev</email>
</author>
<published>2024-12-02T07:55:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2be6ce9d9bd040780dda8d456fe8696a48d805be'/>
<id>2be6ce9d9bd040780dda8d456fe8696a48d805be</id>
<content type='text'>
Change the utf8 KUnit test filename and path to follow the style
convention in Documentation/dev-tools/kunit/style.rst

Co-developed-by: Pedro Orlando &lt;porlando@lkcamp.dev&gt;
Signed-off-by: Pedro Orlando &lt;porlando@lkcamp.dev&gt;
Co-developed-by: Danilo Pereira &lt;dpereira@lkcamp.dev&gt;
Signed-off-by: Danilo Pereira &lt;dpereira@lkcamp.dev&gt;
Signed-off-by: Gabriela Bittencourt &lt;gbittencourt@lkcamp.dev&gt;
Reviewed-by: David Gow &lt;davidgow@google.com&gt;
Acked-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Reviewed-by: Shuah Khan &lt;skhan@linuxfoundation.org&gt;
Reviewed-by: Rae Moar &lt;rmoar@google.com&gt;
Link: https://lore.kernel.org/r/20241202075545.3648096-7-davidgow@google.com
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change the utf8 KUnit test filename and path to follow the style
convention in Documentation/dev-tools/kunit/style.rst

Co-developed-by: Pedro Orlando &lt;porlando@lkcamp.dev&gt;
Signed-off-by: Pedro Orlando &lt;porlando@lkcamp.dev&gt;
Co-developed-by: Danilo Pereira &lt;dpereira@lkcamp.dev&gt;
Signed-off-by: Danilo Pereira &lt;dpereira@lkcamp.dev&gt;
Signed-off-by: Gabriela Bittencourt &lt;gbittencourt@lkcamp.dev&gt;
Reviewed-by: David Gow &lt;davidgow@google.com&gt;
Acked-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Reviewed-by: Shuah Khan &lt;skhan@linuxfoundation.org&gt;
Reviewed-by: Rae Moar &lt;rmoar@google.com&gt;
Link: https://lore.kernel.org/r/20241202075545.3648096-7-davidgow@google.com
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: kunit: refactor selftest to kunit tests</title>
<updated>2025-02-11T02:25:39+00:00</updated>
<author>
<name>Gabriela Bittencourt</name>
<email>gbittencourt@lkcamp.dev</email>
</author>
<published>2024-12-02T07:55:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=62b9ef504e7f89d6ae3e9ab704cc4befab1d37f0'/>
<id>62b9ef504e7f89d6ae3e9ab704cc4befab1d37f0</id>
<content type='text'>
Refactor the 'test' functions into KUnit tests to test UTF-8 support in
the unicode subsystem.

This allows the utf8 tests to be run alongside the KUnit test suite
using kunit-tool, quickly compiling and running all desired tests as
part of the KUnit test suite, instead of compiling the selftest module
and loading it.

The refactoring keeps the original testing logic intact, while adopting a
testing pattern used across different kernel modules and leveraging KUnit's
benefits.

Co-developed-by: Pedro Orlando &lt;porlando@lkcamp.dev&gt;
Signed-off-by: Pedro Orlando &lt;porlando@lkcamp.dev&gt;
Co-developed-by: Danilo Pereira &lt;dpereira@lkcamp.dev&gt;
Signed-off-by: Danilo Pereira &lt;dpereira@lkcamp.dev&gt;
Signed-off-by: Gabriela Bittencourt &lt;gbittencourt@lkcamp.dev&gt;
Reviewed-by: David Gow &lt;davidgow@google.com&gt;
Acked-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Reviewed-by: Rae Moar &lt;rmoar@google.com&gt;
Link: https://lore.kernel.org/r/20241202075545.3648096-6-davidgow@google.com
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Refactor the 'test' functions into KUnit tests to test UTF-8 support in
the unicode subsystem.

This allows the utf8 tests to be run alongside the KUnit test suite
using kunit-tool, quickly compiling and running all desired tests as
part of the KUnit test suite, instead of compiling the selftest module
and loading it.

The refactoring keeps the original testing logic intact, while adopting a
testing pattern used across different kernel modules and leveraging KUnit's
benefits.

Co-developed-by: Pedro Orlando &lt;porlando@lkcamp.dev&gt;
Signed-off-by: Pedro Orlando &lt;porlando@lkcamp.dev&gt;
Co-developed-by: Danilo Pereira &lt;dpereira@lkcamp.dev&gt;
Signed-off-by: Danilo Pereira &lt;dpereira@lkcamp.dev&gt;
Signed-off-by: Gabriela Bittencourt &lt;gbittencourt@lkcamp.dev&gt;
Reviewed-by: David Gow &lt;davidgow@google.com&gt;
Acked-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Reviewed-by: Rae Moar &lt;rmoar@google.com&gt;
Link: https://lore.kernel.org/r/20241202075545.3648096-6-davidgow@google.com
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "unicode: Don't special case ignorable code points"</title>
<updated>2024-12-11T22:11:23+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-12-11T22:11:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=231825b2e1ff6ba799c5eaf396d3ab2354e37c6b'/>
<id>231825b2e1ff6ba799c5eaf396d3ab2354e37c6b</id>
<content type='text'>
This reverts commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91.

It turns out that we can't do this, because while the old behavior of
ignoring ignorable code points was most definitely wrong, we have
case-folding filesystems with on-disk hash values with that wrong
behavior.

So now you can't look up those names, because they hash to something
different.

Of course, it's also entirely possible that in the meantime people have
created *new* files with the new ("more correct") case folding logic,
and reverting will just make other things break.

The correct solution is to not do case folding in filesystems, but
sadly, people seem to never really understand that.  People still see it
as a feature, not a bug.

Reported-by: Qi Han &lt;hanqi@vivo.com&gt;
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586
Cc: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Requested-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91.

It turns out that we can't do this, because while the old behavior of
ignoring ignorable code points was most definitely wrong, we have
case-folding filesystems with on-disk hash values with that wrong
behavior.

So now you can't look up those names, because they hash to something
different.

Of course, it's also entirely possible that in the meantime people have
created *new* files with the new ("more correct") case folding logic,
and reverting will just make other things break.

The correct solution is to not do case folding in filesystems, but
sadly, people seem to never really understand that.  People still see it
as a feature, not a bug.

Reported-by: Qi Han &lt;hanqi@vivo.com&gt;
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586
Cc: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Requested-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode</title>
<updated>2024-11-23T04:50:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-23T04:50:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=060fc106b6854d3289d838ac3c98eb17afb261d7'/>
<id>060fc106b6854d3289d838ac3c98eb17afb261d7</id>
<content type='text'>
Pull unicode updates from Gabriel Krisman Bertazi:

 - constify a read-only struct (Thomas Weißschuh)

 - fix the error path of utf8_load(), avoiding a possible kernel oops
   if it fails to find the unicode module (André Almeida)

 - documentation fix, updating a filename in the README (Gan Jie)

 - add the link of my tree to MAINTAINERS (André Almeida)

* tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  MAINTAINERS: Add Unicode tree
  unicode: change the reference of database file
  unicode: Fix utf8_load() error path
  unicode: constify utf8 data table
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull unicode updates from Gabriel Krisman Bertazi:

 - constify a read-only struct (Thomas Weißschuh)

 - fix the error path of utf8_load(), avoiding a possible kernel oops
   if it fails to find the unicode module (André Almeida)

 - documentation fix, updating a filename in the README (Gan Jie)

 - add the link of my tree to MAINTAINERS (André Almeida)

* tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  MAINTAINERS: Add Unicode tree
  unicode: change the reference of database file
  unicode: Fix utf8_load() error path
  unicode: constify utf8 data table
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: Recreate utf8_parse_version()</title>
<updated>2024-10-28T12:36:54+00:00</updated>
<author>
<name>André Almeida</name>
<email>andrealmeid@igalia.com</email>
</author>
<published>2024-10-21T16:37:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=142fa60f61f93805471012f24e029af6d113c5cc'/>
<id>142fa60f61f93805471012f24e029af6d113c5cc</id>
<content type='text'>
All filesystems that currently support UTF-8 casefold can fetch the
UTF-8 version from the filesystem metadata stored on disk. They can get
the data stored and directly match it to an integer, so they can skip the
string parsing step, which motivated the removal of this function in the
first place.

However, for tmpfs, the only way to tell the kernel which UTF-8 version
we are about to use is via mount options, using a string. Re-introduce
utf8_parse_version() to be used by tmpfs.

This version differs from the original by skipping the intermediate step
of copying the version string to an auxiliary string before calling
match_token(). This version calls match_token() on the argument string.
The parameters are simpler now as well.

utf8_parse_version() was created by 9d53690f0d4 ("unicode: implement
higher level API for string handling") and later removed by 49bd03cc7e9
("unicode: pass a UNICODE_AGE() tripple to utf8_load").

Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-4-f443d5814194@igalia.com
Reviewed-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All filesystems that currently support UTF-8 casefold can fetch the
UTF-8 version from the filesystem metadata stored on disk. They can get
the data stored and directly match it to an integer, so they can skip the
string parsing step, which motivated the removal of this function in the
first place.

However, for tmpfs, the only way to tell the kernel which UTF-8 version
we are about to use is via mount options, using a string. Re-introduce
utf8_parse_version() to be used by tmpfs.

This version differs from the original by skipping the intermediate step
of copying the version string to an auxiliary string before calling
match_token(). This version calls match_token() on the argument string.
The parameters are simpler now as well.

utf8_parse_version() was created by 9d53690f0d4 ("unicode: implement
higher level API for string handling") and later removed by 49bd03cc7e9
("unicode: pass a UNICODE_AGE() tripple to utf8_load").

Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-4-f443d5814194@igalia.com
Reviewed-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: Export latest available UTF-8 version number</title>
<updated>2024-10-28T12:36:54+00:00</updated>
<author>
<name>André Almeida</name>
<email>andrealmeid@igalia.com</email>
</author>
<published>2024-10-21T16:37:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=04dad6c6d37d741bad9946a92171bfa637e989f0'/>
<id>04dad6c6d37d741bad9946a92171bfa637e989f0</id>
<content type='text'>
Export latest available UTF-8 version number so filesystems can easily
load the newest one.

Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-3-f443d5814194@igalia.com
Acked-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Export latest available UTF-8 version number so filesystems can easily
load the newest one.

Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-3-f443d5814194@igalia.com
Acked-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: Don't special case ignorable code points</title>
<updated>2024-10-09T17:34:01+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-10-08T22:43:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=5c26d2f1d3f5e4be3e196526bead29ecb139cf91'/>
<id>5c26d2f1d3f5e4be3e196526bead29ecb139cf91</id>
<content type='text'>
We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: change the reference of database file</title>
<updated>2024-09-13T15:23:01+00:00</updated>
<author>
<name>Gan Jie</name>
<email>ganjie182@gmail.com</email>
</author>
<published>2024-09-12T03:19:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=66715f005bdea3b58393ffe8c9be7d692b240558'/>
<id>66715f005bdea3b58393ffe8c9be7d692b240558</id>
<content type='text'>
Commit 2b3d04787012 ("unicode: Add utf8-data module") changed
the database file from 'utf8data.h' to 'utf8data.c' to build
a separate module, but forgot to update README.utf8data,
which may cause confusion. Update README.utf8data and
the default 'UTF8_NAME' in 'mkutf8data.c'.

Signed-off-by: Gan Jie &lt;ganjie182@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20240912031932.1161-1-ganjie182@gmail.com
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 2b3d04787012 ("unicode: Add utf8-data module") changed
the database file from 'utf8data.h' to 'utf8data.c' to build
a separate module, but forgot to update README.utf8data,
which may cause confusion. Update README.utf8data and
the default 'UTF8_NAME' in 'mkutf8data.c'.

Signed-off-by: Gan Jie &lt;ganjie182@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20240912031932.1161-1-ganjie182@gmail.com
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: Fix utf8_load() error path</title>
<updated>2024-09-03T16:45:07+00:00</updated>
<author>
<name>André Almeida</name>
<email>andrealmeid@igalia.com</email>
</author>
<published>2024-09-02T22:55:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=156bb2c569cd869583c593d27a5bd69e7b2a4264'/>
<id>156bb2c569cd869583c593d27a5bd69e7b2a4264</id>
<content type='text'>
utf8_load() requests the symbol "utf8_data_table" and then checks if the
requested UTF-8 version is supported. If it's unsupported, it tries to
put the data table using symbol_put(). If an unsupported version is
requested, symbol_put() fails like this:

 kernel BUG at kernel/module/main.c:786!
 RIP: 0010:__symbol_put+0x93/0xb0
 Call Trace:
  &lt;TASK&gt;
  ? __die_body.cold+0x19/0x27
  ? die+0x2e/0x50
  ? do_trap+0xca/0x110
  ? do_error_trap+0x65/0x80
  ? __symbol_put+0x93/0xb0
  ? exc_invalid_op+0x51/0x70
  ? __symbol_put+0x93/0xb0
  ? asm_exc_invalid_op+0x1a/0x20
  ? __pfx_cmp_name+0x10/0x10
  ? __symbol_put+0x93/0xb0
  ? __symbol_put+0x62/0xb0
  utf8_load+0xf8/0x150

That happens because symbol_put() expects the unique string that
identifies the symbol, instead of a pointer to the loaded symbol. Fix that
by passing that string.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Reviewed-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Link: https://lore.kernel.org/r/20240902225511.757831-2-andrealmeid@igalia.com
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
utf8_load() requests the symbol "utf8_data_table" and then checks if the
requested UTF-8 version is supported. If it's unsupported, it tries to
put the data table using symbol_put(). If an unsupported version is
requested, symbol_put() fails like this:

 kernel BUG at kernel/module/main.c:786!
 RIP: 0010:__symbol_put+0x93/0xb0
 Call Trace:
  &lt;TASK&gt;
  ? __die_body.cold+0x19/0x27
  ? die+0x2e/0x50
  ? do_trap+0xca/0x110
  ? do_error_trap+0x65/0x80
  ? __symbol_put+0x93/0xb0
  ? exc_invalid_op+0x51/0x70
  ? __symbol_put+0x93/0xb0
  ? asm_exc_invalid_op+0x1a/0x20
  ? __pfx_cmp_name+0x10/0x10
  ? __symbol_put+0x93/0xb0
  ? __symbol_put+0x62/0xb0
  utf8_load+0xf8/0x150

That happens because symbol_put() expects the unique string that
identifies the symbol, instead of a pointer to the loaded symbol. Fix that
by passing that string.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Reviewed-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Link: https://lore.kernel.org/r/20240902225511.757831-2-andrealmeid@igalia.com
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unicode: constify utf8 data table</title>
<updated>2024-08-13T19:21:50+00:00</updated>
<author>
<name>Thomas Weißschuh</name>
<email>linux@weissschuh.net</email>
</author>
<published>2024-08-09T15:38:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=43bf9d9755bd21970d8382dc88f071f74fc18fbf'/>
<id>43bf9d9755bd21970d8382dc88f071f74fc18fbf</id>
<content type='text'>
All users already handle the table as const data.
Move the table itself into .rodata to guard against accidental or
malicious modifications.

Signed-off-by: Thomas Weißschuh &lt;linux@weissschuh.net&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20240809-unicode-const-v1-1-69968a258092@weissschuh.net
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All users already handle the table as const data.
Move the table itself into .rodata to guard against accidental or
malicious modifications.

Signed-off-by: Thomas Weißschuh &lt;linux@weissschuh.net&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20240809-unicode-const-v1-1-69968a258092@weissschuh.net
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
