<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/fork.c, branch v4.13.13</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>mm, uprobes: fix multiple free of -&gt;uprobes_state.xol_area</title>
<updated>2017-08-31T23:33:15+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2017-08-31T23:15:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=355627f518978b5167256d27492fe0b343aaf2f2'/>
<id>355627f518978b5167256d27492fe0b343aaf2f2</id>
<content type='text'>
Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its -&gt;mmap_sem for write, in dup_mmap().

However, it was overlooked that this introduced an new error path before
the new mm_struct's -&gt;uprobes_state.xol_area has been set to NULL after
being copied from the old mm_struct by the memcpy in dup_mm().  For a
task that has previously hit a uprobe tracepoint, this resulted in the
'struct xol_area' being freed multiple times if the task was killed at
just the right time while forking.

Fix it by setting -&gt;uprobes_state.xol_area to NULL in mm_init() rather
than in uprobe_dup_mmap().

With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C
program given by commit 2b7e8665b4ff ("fork: fix incorrect fput of
-&gt;exe_file causing use-after-free"), provided that a uprobe tracepoint
has been set on the fork_thread() function.  For example:

    $ gcc reproducer.c -o reproducer -lpthread
    $ nm reproducer | grep fork_thread
    0000000000400719 t fork_thread
    $ echo "p $PWD/reproducer:0x719" &gt; /sys/kernel/debug/tracing/uprobe_events
    $ echo 1 &gt; /sys/kernel/debug/tracing/events/uprobes/enable
    $ ./reproducer

Here is the use-after-free reported by KASAN:

    BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200
    Read of size 8 at addr ffff8800320a8b88 by task reproducer/198

    CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f3fb5 #255
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
    Call Trace:
     dump_stack+0xdb/0x185
     print_address_description+0x7e/0x290
     kasan_report+0x23b/0x350
     __asan_report_load8_noabort+0x19/0x20
     uprobe_clear_state+0x1c4/0x200
     mmput+0xd6/0x360
     do_exit+0x740/0x1670
     do_group_exit+0x13f/0x380
     get_signal+0x597/0x17d0
     do_signal+0x99/0x1df0
     exit_to_usermode_loop+0x166/0x1e0
     syscall_return_slowpath+0x258/0x2c0
     entry_SYSCALL_64_fastpath+0xbc/0xbe

    ...

    Allocated by task 199:
     save_stack_trace+0x1b/0x20
     kasan_kmalloc+0xfc/0x180
     kmem_cache_alloc_trace+0xf3/0x330
     __create_xol_area+0x10f/0x780
     uprobe_notify_resume+0x1674/0x2210
     exit_to_usermode_loop+0x150/0x1e0
     prepare_exit_to_usermode+0x14b/0x180
     retint_user+0x8/0x20

    Freed by task 199:
     save_stack_trace+0x1b/0x20
     kasan_slab_free+0xa8/0x1a0
     kfree+0xba/0x210
     uprobe_clear_state+0x151/0x200
     mmput+0xd6/0x360
     copy_process.part.8+0x605f/0x65d0
     _do_fork+0x1a5/0xbd0
     SyS_clone+0x19/0x20
     do_syscall_64+0x22f/0x660
     return_from_SYSCALL_64+0x0/0x7a

Note: without KASAN, you may instead see a "Bad page state" message, or
simply a general protection fault.

Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com
Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;    [4.7+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its -&gt;mmap_sem for write, in dup_mmap().

However, it was overlooked that this introduced an new error path before
the new mm_struct's -&gt;uprobes_state.xol_area has been set to NULL after
being copied from the old mm_struct by the memcpy in dup_mm().  For a
task that has previously hit a uprobe tracepoint, this resulted in the
'struct xol_area' being freed multiple times if the task was killed at
just the right time while forking.

Fix it by setting -&gt;uprobes_state.xol_area to NULL in mm_init() rather
than in uprobe_dup_mmap().

With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C
program given by commit 2b7e8665b4ff ("fork: fix incorrect fput of
-&gt;exe_file causing use-after-free"), provided that a uprobe tracepoint
has been set on the fork_thread() function.  For example:

    $ gcc reproducer.c -o reproducer -lpthread
    $ nm reproducer | grep fork_thread
    0000000000400719 t fork_thread
    $ echo "p $PWD/reproducer:0x719" &gt; /sys/kernel/debug/tracing/uprobe_events
    $ echo 1 &gt; /sys/kernel/debug/tracing/events/uprobes/enable
    $ ./reproducer

Here is the use-after-free reported by KASAN:

    BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200
    Read of size 8 at addr ffff8800320a8b88 by task reproducer/198

    CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f3fb5 #255
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
    Call Trace:
     dump_stack+0xdb/0x185
     print_address_description+0x7e/0x290
     kasan_report+0x23b/0x350
     __asan_report_load8_noabort+0x19/0x20
     uprobe_clear_state+0x1c4/0x200
     mmput+0xd6/0x360
     do_exit+0x740/0x1670
     do_group_exit+0x13f/0x380
     get_signal+0x597/0x17d0
     do_signal+0x99/0x1df0
     exit_to_usermode_loop+0x166/0x1e0
     syscall_return_slowpath+0x258/0x2c0
     entry_SYSCALL_64_fastpath+0xbc/0xbe

    ...

    Allocated by task 199:
     save_stack_trace+0x1b/0x20
     kasan_kmalloc+0xfc/0x180
     kmem_cache_alloc_trace+0xf3/0x330
     __create_xol_area+0x10f/0x780
     uprobe_notify_resume+0x1674/0x2210
     exit_to_usermode_loop+0x150/0x1e0
     prepare_exit_to_usermode+0x14b/0x180
     retint_user+0x8/0x20

    Freed by task 199:
     save_stack_trace+0x1b/0x20
     kasan_slab_free+0xa8/0x1a0
     kfree+0xba/0x210
     uprobe_clear_state+0x151/0x200
     mmput+0xd6/0x360
     copy_process.part.8+0x605f/0x65d0
     _do_fork+0x1a5/0xbd0
     SyS_clone+0x19/0x20
     do_syscall_64+0x22f/0x660
     return_from_SYSCALL_64+0x0/0x7a

Note: without KASAN, you may instead see a "Bad page state" message, or
simply a general protection fault.

Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com
Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;    [4.7+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fork: fix incorrect fput of -&gt;exe_file causing use-after-free</title>
<updated>2017-08-25T23:12:46+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2017-08-25T22:55:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2b7e8665b4ff51c034c55df3cff76518d1a9ee3a'/>
<id>2b7e8665b4ff51c034c55df3cff76518d1a9ee3a</id>
<content type='text'>
Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its -&gt;mmap_sem for write, in dup_mmap().

However, it was overlooked that this introduced an new error path before
a reference is taken on the mm_struct's -&gt;exe_file.  Since the
-&gt;exe_file of the new mm_struct was already set to the old -&gt;exe_file by
the memcpy() in dup_mm(), it was possible for the mmput() in the error
path of dup_mm() to drop a reference to -&gt;exe_file which was never
taken.

This caused the struct file to later be freed prematurely.

Fix it by updating mm_init() to NULL out the -&gt;exe_file, in the same
place it clears other things like the list of mmaps.

This bug was found by syzkaller.  It can be reproduced using the
following C program:

    #define _GNU_SOURCE
    #include &lt;pthread.h&gt;
    #include &lt;stdlib.h&gt;
    #include &lt;sys/mman.h&gt;
    #include &lt;sys/syscall.h&gt;
    #include &lt;sys/wait.h&gt;
    #include &lt;unistd.h&gt;

    static void *mmap_thread(void *_arg)
    {
        for (;;) {
            mmap(NULL, 0x1000000, PROT_READ,
                 MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        }
    }

    static void *fork_thread(void *_arg)
    {
        usleep(rand() % 10000);
        fork();
    }

    int main(void)
    {
        fork();
        fork();
        fork();
        for (;;) {
            if (fork() == 0) {
                pthread_t t;

                pthread_create(&amp;t, NULL, mmap_thread, NULL);
                pthread_create(&amp;t, NULL, fork_thread, NULL);
                usleep(rand() % 10000);
                syscall(__NR_exit_group, 0);
            }
            wait(NULL);
        }
    }

No special kernel config options are needed.  It usually causes a NULL
pointer dereference in __remove_shared_vm_struct() during exit, or in
dup_mmap() (which is usually inlined into copy_process()) during fork.
Both are due to a vm_area_struct's -&gt;vm_file being used after it's
already been freed.

Google Bug Id: 64772007

Link: http://lkml.kernel.org/r/20170823211408.31198-1-ebiggers3@gmail.com
Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Tested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[v4.7+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its -&gt;mmap_sem for write, in dup_mmap().

However, it was overlooked that this introduced an new error path before
a reference is taken on the mm_struct's -&gt;exe_file.  Since the
-&gt;exe_file of the new mm_struct was already set to the old -&gt;exe_file by
the memcpy() in dup_mm(), it was possible for the mmput() in the error
path of dup_mm() to drop a reference to -&gt;exe_file which was never
taken.

This caused the struct file to later be freed prematurely.

Fix it by updating mm_init() to NULL out the -&gt;exe_file, in the same
place it clears other things like the list of mmaps.

This bug was found by syzkaller.  It can be reproduced using the
following C program:

    #define _GNU_SOURCE
    #include &lt;pthread.h&gt;
    #include &lt;stdlib.h&gt;
    #include &lt;sys/mman.h&gt;
    #include &lt;sys/syscall.h&gt;
    #include &lt;sys/wait.h&gt;
    #include &lt;unistd.h&gt;

    static void *mmap_thread(void *_arg)
    {
        for (;;) {
            mmap(NULL, 0x1000000, PROT_READ,
                 MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        }
    }

    static void *fork_thread(void *_arg)
    {
        usleep(rand() % 10000);
        fork();
    }

    int main(void)
    {
        fork();
        fork();
        fork();
        for (;;) {
            if (fork() == 0) {
                pthread_t t;

                pthread_create(&amp;t, NULL, mmap_thread, NULL);
                pthread_create(&amp;t, NULL, fork_thread, NULL);
                usleep(rand() % 10000);
                syscall(__NR_exit_group, 0);
            }
            wait(NULL);
        }
    }

No special kernel config options are needed.  It usually causes a NULL
pointer dereference in __remove_shared_vm_struct() during exit, or in
dup_mmap() (which is usually inlined into copy_process()) during fork.
Both are due to a vm_area_struct's -&gt;vm_file being used after it's
already been freed.

Google Bug Id: 64772007

Link: http://lkml.kernel.org/r/20170823211408.31198-1-ebiggers3@gmail.com
Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Tested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[v4.7+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: migrate: prevent racy access to tlb_flush_pending</title>
<updated>2017-08-10T22:54:07+00:00</updated>
<author>
<name>Nadav Amit</name>
<email>nadav.amit@gmail.com</email>
</author>
<published>2017-08-10T22:23:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=16af97dc5a8975371a83d9e30a64038b48f40a2d'/>
<id>16af97dc5a8975371a83d9e30a64038b48f40a2d</id>
<content type='text'>
Patch series "fixes of TLB batching races", v6.

It turns out that Linux TLB batching mechanism suffers from various
races.  Races that are caused due to batching during reclamation were
recently handled by Mel and this patch-set deals with others.  The more
fundamental issue is that concurrent updates of the page-tables allow
for TLB flushes to be batched on one core, while another core changes
the page-tables.  This other core may assume a PTE change does not
require a flush based on the updated PTE value, while it is unaware that
TLB flushes are still pending.

This behavior affects KSM (which may result in memory corruption) and
MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior).  A
proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED.
Memory corruption in KSM is harder to produce in practice, but was
observed by hacking the kernel and adding a delay before flushing and
replacing the KSM page.

Finally, there is also one memory barrier missing, which may affect
architectures with weak memory model.

This patch (of 7):

Setting and clearing mm-&gt;tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work().  If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.

This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending.  The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.

An actual data corruption was not observed, yet the race was was
confirmed by adding assertion to check tlb_flush_pending is not set by
two threads, adding artificial latency in change_protection_range() and
using sysctl to reduce kernel.numa_balancing_scan_delay_ms.

Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com
Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
change_protection_range")
Signed-off-by: Nadav Amit &lt;namit@vmware.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "fixes of TLB batching races", v6.

It turns out that Linux TLB batching mechanism suffers from various
races.  Races that are caused due to batching during reclamation were
recently handled by Mel and this patch-set deals with others.  The more
fundamental issue is that concurrent updates of the page-tables allow
for TLB flushes to be batched on one core, while another core changes
the page-tables.  This other core may assume a PTE change does not
require a flush based on the updated PTE value, while it is unaware that
TLB flushes are still pending.

This behavior affects KSM (which may result in memory corruption) and
MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior).  A
proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED.
Memory corruption in KSM is harder to produce in practice, but was
observed by hacking the kernel and adding a delay before flushing and
replacing the KSM page.

Finally, there is also one memory barrier missing, which may affect
architectures with weak memory model.

This patch (of 7):

Setting and clearing mm-&gt;tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work().  If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.

This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending.  The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.

An actual data corruption was not observed, yet the race was was
confirmed by adding assertion to check tlb_flush_pending is not set by
two threads, adding artificial latency in change_protection_range() and
using sysctl to reduce kernel.numa_balancing_scan_delay_ms.

Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com
Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
change_protection_range")
Signed-off-by: Nadav Amit &lt;namit@vmware.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fork,random: use get_random_canary() to set tsk-&gt;stack_canary</title>
<updated>2017-07-12T23:26:03+00:00</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2017-07-12T21:36:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7cd815bce828220deffd1654265f0ef891567774'/>
<id>7cd815bce828220deffd1654265f0ef891567774</id>
<content type='text'>
Use the ascii-armor canary to prevent unterminated C string overflows
from being able to successfully overwrite the canary, even if they
somehow obtain the canary value.

Inspired by execshield ascii-armor and Daniel Micay's linux-hardened
tree.

Link: http://lkml.kernel.org/r/20170524155751.424-3-riel@redhat.com
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Daniel Micay &lt;danielmicay@gmail.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use the ascii-armor canary to prevent unterminated C string overflows
from being able to successfully overwrite the canary, even if they
somehow obtain the canary value.

Inspired by execshield ascii-armor and Daniel Micay's linux-hardened
tree.

Link: http://lkml.kernel.org/r/20170524155751.424-3-riel@redhat.com
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Daniel Micay &lt;danielmicay@gmail.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fault-inject: support systematic fault injection</title>
<updated>2017-07-12T23:26:01+00:00</updated>
<author>
<name>Dmitry Vyukov</name>
<email>dvyukov@google.com</email>
</author>
<published>2017-07-12T21:34:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e41d58185f1444368873d4d7422f7664a68be61d'/>
<id>e41d58185f1444368873d4d7422f7664a68be61d</id>
<content type='text'>
Add /proc/self/task/&lt;current-tid&gt;/fail-nth file that allows failing
0-th, 1-st, 2-nd and so on calls systematically.
Excerpt from the added documentation:

 "Write to this file of integer N makes N-th call in the current task
  fail (N is 0-based). Read from this file returns a single char 'Y' or
  'N' that says if the fault setup with a previous write to this file
  was injected or not, and disables the fault if it wasn't yet injected.
  Note that this file enables all types of faults (slab, futex, etc).
  This setting takes precedence over all other generic settings like
  probability, interval, times, etc. But per-capability settings (e.g.
  fail_futex/ignore-private) take precedence over it. This feature is
  intended for systematic testing of faults in a single system call. See
  an example below"

Why add a new setting:
1. Existing settings are global rather than per-task.
   So parallel testing is not possible.
2. attr-&gt;interval is close but it depends on attr-&gt;count
   which is non reset to 0, so interval does not work as expected.
3. Trying to model this with existing settings requires manipulations
   of all of probability, interval, times, space, task-filter and
   unexposed count and per-task make-it-fail files.
4. Existing settings are per-failure-type, and the set of failure
   types is potentially expanding.
5. make-it-fail can't be changed by unprivileged user and aggressive
   stress testing better be done from an unprivileged user.
   Similarly, this would require opening the debugfs files to the
   unprivileged user, as he would need to reopen at least times file
   (not possible to pre-open before dropping privs).

The proposed interface solves all of the above (see the example).

We want to integrate this into syzkaller fuzzer.  A prototype has found
10 bugs in kernel in first day of usage:

  https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance

I've made the current interface work with all types of our sandboxes.
For setuid the secret sauce was prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) to
make /proc entries non-root owned.  So I am fine with the current
version of the code.

[akpm@linux-foundation.org: fix build]
Link: http://lkml.kernel.org/r/20170328130128.101773-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add /proc/self/task/&lt;current-tid&gt;/fail-nth file that allows failing
0-th, 1-st, 2-nd and so on calls systematically.
Excerpt from the added documentation:

 "Write to this file of integer N makes N-th call in the current task
  fail (N is 0-based). Read from this file returns a single char 'Y' or
  'N' that says if the fault setup with a previous write to this file
  was injected or not, and disables the fault if it wasn't yet injected.
  Note that this file enables all types of faults (slab, futex, etc).
  This setting takes precedence over all other generic settings like
  probability, interval, times, etc. But per-capability settings (e.g.
  fail_futex/ignore-private) take precedence over it. This feature is
  intended for systematic testing of faults in a single system call. See
  an example below"

Why add a new setting:
1. Existing settings are global rather than per-task.
   So parallel testing is not possible.
2. attr-&gt;interval is close but it depends on attr-&gt;count
   which is non reset to 0, so interval does not work as expected.
3. Trying to model this with existing settings requires manipulations
   of all of probability, interval, times, space, task-filter and
   unexposed count and per-task make-it-fail files.
4. Existing settings are per-failure-type, and the set of failure
   types is potentially expanding.
5. make-it-fail can't be changed by unprivileged user and aggressive
   stress testing better be done from an unprivileged user.
   Similarly, this would require opening the debugfs files to the
   unprivileged user, as he would need to reopen at least times file
   (not possible to pre-open before dropping privs).

The proposed interface solves all of the above (see the example).

We want to integrate this into syzkaller fuzzer.  A prototype has found
10 bugs in kernel in first day of usage:

  https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance

I've made the current interface work with all types of our sandboxes.
For setuid the secret sauce was prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) to
make /proc entries non-root owned.  So I am fine with the current
version of the code.

[akpm@linux-foundation.org: fix build]
Link: http://lkml.kernel.org/r/20170328130128.101773-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/fork.c: virtually mapped stacks: do not disable interrupts</title>
<updated>2017-07-12T23:25:59+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>cl@linux.com</email>
</author>
<published>2017-07-12T21:33:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=112166f88cf83dd11486cf1818672d42b540865b'/>
<id>112166f88cf83dd11486cf1818672d42b540865b</id>
<content type='text'>
The reason to disable interrupts seems to be to avoid switching to a
different processor while handling per cpu data using individual loads and
stores.  If we use per cpu RMV primitives we will not have to disable
interrupts.

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.org
Signed-off-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The reason to disable interrupts seems to be to avoid switching to a
different processor while handling per cpu data using individual loads and
stores.  If we use per cpu RMV primitives we will not have to disable
interrupts.

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.org
Signed-off-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2017-07-09T17:52:16+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-09T17:52:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=4fde846ac0f019b7c877da35e1c1517d79e17ffc'/>
<id>4fde846ac0f019b7c877da35e1c1517d79e17ffc</id>
<content type='text'>
Pull scheduler fixes from Thomas Gleixner:
 "This scheduler update provides:

   - The (hopefully) final fix for the vtime accounting issues which
     were around for quite some time

   - Use types known to user space in UAPI headers to unbreak user space
     builds

   - Make load balancing respect the current scheduling domain again
     instead of evaluating unrelated CPUs"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers/uapi: Fix linux/sched/types.h userspace compilation errors
  sched/fair: Fix load_balance() affinity redo path
  sched/cputime: Accumulate vtime on top of nsec clocksource
  sched/cputime: Move the vtime task fields to their own struct
  sched/cputime: Rename vtime fields
  sched/cputime: Always set tsk-&gt;vtime_snap_whence after accounting vtime
  vtime, sched/cputime: Remove vtime_account_user()
  Revert "sched/cputime: Refactor the cputime_adjust() code"
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull scheduler fixes from Thomas Gleixner:
 "This scheduler update provides:

   - The (hopefully) final fix for the vtime accounting issues which
     were around for quite some time

   - Use types known to user space in UAPI headers to unbreak user space
     builds

   - Make load balancing respect the current scheduling domain again
     instead of evaluating unrelated CPUs"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers/uapi: Fix linux/sched/types.h userspace compilation errors
  sched/fair: Fix load_balance() affinity redo path
  sched/cputime: Accumulate vtime on top of nsec clocksource
  sched/cputime: Move the vtime task fields to their own struct
  sched/cputime: Rename vtime fields
  sched/cputime: Always set tsk-&gt;vtime_snap_whence after accounting vtime
  vtime, sched/cputime: Remove vtime_account_user()
  Revert "sched/cputime: Refactor the cputime_adjust() code"
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcontrol: use generic mod_memcg_page_state for kmem pages</title>
<updated>2017-07-06T23:24:35+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2017-07-06T22:40:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ed52be7bfd45533b194b429f43361493d24599a7'/>
<id>ed52be7bfd45533b194b429f43361493d24599a7</id>
<content type='text'>
The kmem-specific functions do the same thing.  Switch and drop.

Link: http://lkml.kernel.org/r/20170530181724.27197-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The kmem-specific functions do the same thing.  Switch and drop.

Link: http://lkml.kernel.org/r/20170530181724.27197-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/cputime: Move the vtime task fields to their own struct</title>
<updated>2017-07-05T07:54:15+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2017-06-29T17:15:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=bac5b6b6b11560f323e71d0ebac4061cfe5f56c0'/>
<id>bac5b6b6b11560f323e71d0ebac4061cfe5f56c0</id>
<content type='text'>
We are about to add vtime accumulation fields to the task struct. Let's
avoid more bloatification and gather vtime information to their own
struct.

Tested-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1498756511-11714-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We are about to add vtime accumulation fields to the task struct. Let's
avoid more bloatification and gather vtime information to their own
struct.

Tested-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1498756511-11714-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/cputime: Rename vtime fields</title>
<updated>2017-07-05T07:54:14+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2017-06-29T17:15:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=60a9ce57e7c5ac1df3a39fb941022bbfa40c0862'/>
<id>60a9ce57e7c5ac1df3a39fb941022bbfa40c0862</id>
<content type='text'>
The current "snapshot" based naming on vtime fields suggests we record
some past event but that's a low level picture of their actual purpose
which comes out blurry. The real point of these fields is to run a basic
state machine that tracks down cputime entry while switching between
contexts.

So lets reflect that with more meaningful names.

Tested-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1498756511-11714-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current "snapshot" based naming on vtime fields suggests we record
some past event but that's a low level picture of their actual purpose
which comes out blurry. The real point of these fields is to run a basic
state machine that tracks down cputime entry while switching between
contexts.

So lets reflect that with more meaningful names.

Tested-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1498756511-11714-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
