<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/slab.c, branch v4.9.4</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>mm/slab: improve performance of gathering slabinfo stats</title>
<updated>2016-10-28T01:43:43+00:00</updated>
<author>
<name>Aruna Ramakrishna</name>
<email>aruna.ramakrishna@oracle.com</email>
</author>
<published>2016-10-28T00:46:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=07a63c41fa1f6533f5668e5b33a295bfd63aa534'/>
<id>07a63c41fa1f6533f5668e5b33a295bfd63aa534</id>
<content type='text'>
On large systems, when some slab caches grow to millions of objects (and
many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
seconds.  During this time, interrupts are disabled while walking the
slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
and this sometimes causes timeouts in other drivers (for instance,
Infiniband).

This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
total number of allocated slabs per node, per cache.  This counter is
updated when a slab is created or destroyed.  This enables us to skip
traversing the slabs_full list while gathering slabinfo statistics, and
since slabs_full tends to be the biggest list when the cache is large,
it results in a dramatic performance improvement.  Getting slabinfo
statistics now only requires walking the slabs_free and slabs_partial
lists, and those lists are usually much smaller than slabs_full.

We tested this after growing the dentry cache to 70GB, and the
performance improved from 2s to 5ms.

Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.com
Signed-off-by: Aruna Ramakrishna &lt;aruna.ramakrishna@oracle.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On large systems, when some slab caches grow to millions of objects (and
many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
seconds.  During this time, interrupts are disabled while walking the
slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
and this sometimes causes timeouts in other drivers (for instance,
Infiniband).

This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
total number of allocated slabs per node, per cache.  This counter is
updated when a slab is created or destroyed.  This enables us to skip
traversing the slabs_full list while gathering slabinfo statistics, and
since slabs_full tends to be the biggest list when the cache is large,
it results in a dramatic performance improvement.  Getting slabinfo
statistics now only requires walking the slabs_free and slabs_partial
lists, and those lists are usually much smaller than slabs_full.

We tested this after growing the dentry cache to 70GB, and the
performance improved from 2s to 5ms.

Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.com
Signed-off-by: Aruna Ramakrishna &lt;aruna.ramakrishna@oracle.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/slab: fix kmemcg cache creation delayed issue</title>
<updated>2016-10-28T01:43:42+00:00</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2016-10-28T00:46:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6'/>
<id>86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6</id>
<content type='text'>
There is a bug report that SLAB makes extreme load average due to over
2000 kworker thread.

  https://bugzilla.kernel.org/show_bug.cgi?id=172981

This issue is caused by kmemcg feature that try to create new set of
kmem_caches for each memcg.  Recently, kmem_cache creation is slowed by
synchronize_sched() and futher kmem_cache creation is also delayed since
kmem_cache creation is synchronized by a global slab_mutex lock.  So,
the number of kworker that try to create kmem_cache increases quietly.

synchronize_sched() is for lockless access to node's shared array but
it's not needed when a new kmem_cache is created.  So, this patch rules
out that case.

Fixes: 801faf0db894 ("mm/slab: lockless decision to grow cache")
Link: http://lkml.kernel.org/r/1475734855-4837-1-git-send-email-iamjoonsoo.kim@lge.com
Reported-by: Doug Smythies &lt;dsmythies@telus.net&gt;
Tested-by: Doug Smythies &lt;dsmythies@telus.net&gt;
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a bug report that SLAB makes extreme load average due to over
2000 kworker thread.

  https://bugzilla.kernel.org/show_bug.cgi?id=172981

This issue is caused by kmemcg feature that try to create new set of
kmem_caches for each memcg.  Recently, kmem_cache creation is slowed by
synchronize_sched() and futher kmem_cache creation is also delayed since
kmem_cache creation is synchronized by a global slab_mutex lock.  So,
the number of kworker that try to create kmem_cache increases quietly.

synchronize_sched() is for lockless access to node's shared array but
it's not needed when a new kmem_cache is created.  So, this patch rules
out that case.

Fixes: 801faf0db894 ("mm/slab: lockless decision to grow cache")
Link: http://lkml.kernel.org/r/1475734855-4837-1-git-send-email-iamjoonsoo.kim@lge.com
Reported-by: Doug Smythies &lt;dsmythies@telus.net&gt;
Tested-by: Doug Smythies &lt;dsmythies@telus.net&gt;
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>slab: Convert to hotplug state machine</title>
<updated>2016-09-06T16:30:20+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2016-08-23T12:53:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=6731d4f12315aed5f7eefc52dac30428e382d7d0'/>
<id>6731d4f12315aed5f7eefc52dac30428e382d7d0</id>
<content type='text'>
Install the callbacks via the state machine.

Signed-off-by: Richard Weinberger &lt;richard@nod.at&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: linux-mm@kvack.org
Cc: rt@linutronix.de
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Link: http://lkml.kernel.org/r/20160823125319.abeapfjapf2kfezp@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Install the callbacks via the state machine.

Signed-off-by: Richard Weinberger &lt;richard@nod.at&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: linux-mm@kvack.org
Cc: rt@linutronix.de
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Link: http://lkml.kernel.org/r/20160823125319.abeapfjapf2kfezp@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux</title>
<updated>2016-08-08T21:48:14+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-08-08T21:48:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=1eccfa090eaea22558570054bbdc147817e1df5e'/>
<id>1eccfa090eaea22558570054bbdc147817e1df5e</id>
<content type='text'>
Pull usercopy protection from Kees Cook:
 "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
  copy_from_user bounds checking for most architectures on SLAB and
  SLUB"

* tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mm: SLUB hardened usercopy support
  mm: SLAB hardened usercopy support
  s390/uaccess: Enable hardened usercopy
  sparc/uaccess: Enable hardened usercopy
  powerpc/uaccess: Enable hardened usercopy
  ia64/uaccess: Enable hardened usercopy
  arm64/uaccess: Enable hardened usercopy
  ARM: uaccess: Enable hardened usercopy
  x86/uaccess: Enable hardened usercopy
  mm: Hardened usercopy
  mm: Implement stack frame object validation
  mm: Add is_migrate_cma_page
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull usercopy protection from Kees Cook:
 "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
  copy_from_user bounds checking for most architectures on SLAB and
  SLUB"

* tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mm: SLUB hardened usercopy support
  mm: SLAB hardened usercopy support
  s390/uaccess: Enable hardened usercopy
  sparc/uaccess: Enable hardened usercopy
  powerpc/uaccess: Enable hardened usercopy
  ia64/uaccess: Enable hardened usercopy
  arm64/uaccess: Enable hardened usercopy
  ARM: uaccess: Enable hardened usercopy
  x86/uaccess: Enable hardened usercopy
  mm: Hardened usercopy
  mm: Implement stack frame object validation
  mm: Add is_migrate_cma_page
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: replace obsolete _refok by __ref</title>
<updated>2016-08-02T21:31:41+00:00</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2016-08-02T21:03:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=bd721ea73e1f965569b40620538c942001f76294'/>
<id>bd721ea73e1f965569b40620538c942001f76294</id>
<content type='text'>
There was only one use of __initdata_refok and __exit_refok

__init_refok was used 46 times against 82 for __ref.

Those definitions are obsolete since commit 312b1485fb50 ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")

This patch removes the following compatibility definitions and replaces
them treewide.

/* compatibility defines */
#define __init_refok     __ref
#define __initdata_refok __refdata
#define __exit_refok     __ref

I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Sam Ravnborg &lt;sam@ravnborg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There was only one use of __initdata_refok and __exit_refok

__init_refok was used 46 times against 82 for __ref.

Those definitions are obsolete since commit 312b1485fb50 ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")

This patch removes the following compatibility definitions and replaces
them treewide.

/* compatibility defines */
#define __init_refok     __ref
#define __initdata_refok __refdata
#define __exit_refok     __ref

I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Sam Ravnborg &lt;sam@ravnborg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/kasan: get rid of -&gt;state in struct kasan_alloc_meta</title>
<updated>2016-08-02T21:31:41+00:00</updated>
<author>
<name>Andrey Ryabinin</name>
<email>aryabinin@virtuozzo.com</email>
</author>
<published>2016-08-02T21:02:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=b3cbd9bf77cd1888114dbee1653e79aa23fd4068'/>
<id>b3cbd9bf77cd1888114dbee1653e79aa23fd4068</id>
<content type='text'>
The state of object currently tracked in two places - shadow memory, and
the -&gt;state field in struct kasan_alloc_meta.  We can get rid of the
latter.  The will save us a little bit of memory.  Also, this allow us
to move free stack into struct kasan_alloc_meta, without increasing
memory consumption.  So now we should always know when the last time the
object was freed.  This may be useful for long delayed use-after-free
bugs.

As a side effect this fixes following UBSAN warning:
	UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
	member access within misaligned address ffff88000d1efebc for type 'struct qlist_node'
	which requires 8 byte alignment

Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com
Reported-by: kernel test robot &lt;xiaolong.ye@intel.com&gt;
Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The state of object currently tracked in two places - shadow memory, and
the -&gt;state field in struct kasan_alloc_meta.  We can get rid of the
latter.  The will save us a little bit of memory.  Also, this allow us
to move free stack into struct kasan_alloc_meta, without increasing
memory consumption.  So now we should always know when the last time the
object was freed.  This may be useful for long delayed use-after-free
bugs.

As a side effect this fixes following UBSAN warning:
	UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
	member access within misaligned address ffff88000d1efebc for type 'struct qlist_node'
	which requires 8 byte alignment

Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com
Reported-by: kernel test robot &lt;xiaolong.ye@intel.com&gt;
Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/slab: use list_move instead of list_del/list_add</title>
<updated>2016-07-26T23:19:19+00:00</updated>
<author>
<name>Wei Yongjun</name>
<email>yongjun_wei@trendmicro.com.cn</email>
</author>
<published>2016-07-26T22:22:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=de24baecd7628aa19e8b53530bb33f8ffbaf5220'/>
<id>de24baecd7628aa19e8b53530bb33f8ffbaf5220</id>
<content type='text'>
Using list_move() instead of list_del() + list_add() to avoid needlessly
poisoning the next and prev values.

Link: http://lkml.kernel.org/r/1468929772-9174-1-git-send-email-weiyj_lk@163.com
Signed-off-by: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using list_move() instead of list_del() + list_add() to avoid needlessly
poisoning the next and prev values.

Link: http://lkml.kernel.org/r/1468929772-9174-1-git-send-email-weiyj_lk@163.com
Signed-off-by: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>slab: do not panic on invalid gfp_mask</title>
<updated>2016-07-26T23:19:19+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-07-26T22:22:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=72baeef0c2710a9ac99670e59d4865b24ffd2d18'/>
<id>72baeef0c2710a9ac99670e59d4865b24ffd2d18</id>
<content type='text'>
Both SLAB and SLUB BUG() when a caller provides an invalid gfp_mask.
This is a rather harsh way to announce a non-critical issue.  Allocator
is free to ignore invalid flags.  Let's simply replace BUG() by
dump_stack to tell the offender and fixup the mask to move on with the
allocation request.

This is an example for kmalloc(GFP_KERNEL|__GFP_HIGHMEM) from a test
module:

  Unexpected gfp: 0x2 (__GFP_HIGHMEM). Fixing up to gfp: 0x24000c0 (GFP_KERNEL). Fix your code!
  CPU: 0 PID: 2916 Comm: insmod Tainted: G           O    4.6.0-slabgfp2-00002-g4cdfc2ef4892-dirty #936
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
  Call Trace:
    dump_stack+0x67/0x90
    cache_alloc_refill+0x201/0x617
    kmem_cache_alloc_trace+0xa7/0x24a
    ? 0xffffffffa0005000
    mymodule_init+0x20/0x1000 [test_slab]
    do_one_initcall+0xe7/0x16c
    ? rcu_read_lock_sched_held+0x61/0x69
    ? kmem_cache_alloc_trace+0x197/0x24a
    do_init_module+0x5f/0x1d9
    load_module+0x1a3d/0x1f21
    ? retint_kernel+0x2d/0x2d
    SyS_init_module+0xe8/0x10e
    ? SyS_init_module+0xe8/0x10e
    do_syscall_64+0x68/0x13f
    entry_SYSCALL64_slow_path+0x25/0x25

Link: http://lkml.kernel.org/r/1465548200-11384-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky.work@gmail.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both SLAB and SLUB BUG() when a caller provides an invalid gfp_mask.
This is a rather harsh way to announce a non-critical issue.  Allocator
is free to ignore invalid flags.  Let's simply replace BUG() by
dump_stack to tell the offender and fixup the mask to move on with the
allocation request.

This is an example for kmalloc(GFP_KERNEL|__GFP_HIGHMEM) from a test
module:

  Unexpected gfp: 0x2 (__GFP_HIGHMEM). Fixing up to gfp: 0x24000c0 (GFP_KERNEL). Fix your code!
  CPU: 0 PID: 2916 Comm: insmod Tainted: G           O    4.6.0-slabgfp2-00002-g4cdfc2ef4892-dirty #936
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
  Call Trace:
    dump_stack+0x67/0x90
    cache_alloc_refill+0x201/0x617
    kmem_cache_alloc_trace+0xa7/0x24a
    ? 0xffffffffa0005000
    mymodule_init+0x20/0x1000 [test_slab]
    do_one_initcall+0xe7/0x16c
    ? rcu_read_lock_sched_held+0x61/0x69
    ? kmem_cache_alloc_trace+0x197/0x24a
    do_init_module+0x5f/0x1d9
    load_module+0x1a3d/0x1f21
    ? retint_kernel+0x2d/0x2d
    SyS_init_module+0xe8/0x10e
    ? SyS_init_module+0xe8/0x10e
    do_syscall_64+0x68/0x13f
    entry_SYSCALL64_slow_path+0x25/0x25

Link: http://lkml.kernel.org/r/1465548200-11384-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky.work@gmail.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>slab: make GFP_SLAB_BUG_MASK information more human readable</title>
<updated>2016-07-26T23:19:19+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-07-26T22:22:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=bacdcb346093794f292c2c9c67ae350895e8b7ef'/>
<id>bacdcb346093794f292c2c9c67ae350895e8b7ef</id>
<content type='text'>
printk offers %pGg for quite some time so let's use it to get a human
readable list of invalid flags.

The original output would be
  [  429.191962] gfp: 2

after the change
  [  429.191962] Unexpected gfp: 0x2 (__GFP_HIGHMEM)

Link: http://lkml.kernel.org/r/1465548200-11384-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky.work@gmail.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
printk offers %pGg for quite some time so let's use it to get a human
readable list of invalid flags.

The original output would be
  [  429.191962] gfp: 2

after the change
  [  429.191962] Unexpected gfp: 0x2 (__GFP_HIGHMEM)

Link: http://lkml.kernel.org/r/1465548200-11384-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky.work@gmail.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: reorganize SLAB freelist randomization</title>
<updated>2016-07-26T23:19:19+00:00</updated>
<author>
<name>Thomas Garnier</name>
<email>thgarnie@google.com</email>
</author>
<published>2016-07-26T22:21:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7c00fce98c3e15334a603925b41aa49f76e83227'/>
<id>7c00fce98c3e15334a603925b41aa49f76e83227</id>
<content type='text'>
The kernel heap allocators are using a sequential freelist making their
allocation predictable.  This predictability makes kernel heap overflow
easier to exploit.  An attacker can careful prepare the kernel heap to
control the following chunk overflowed.

For example these attacks exploit the predictability of the heap:
 - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
 - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

***Problems that needed solving:
 - Randomize the Freelist (singled linked) used in the SLUB allocator.
 - Ensure good performance to encourage usage.
 - Get best entropy in early boot stage.

***Parts:
 - 01/02 Reorganize the SLAB Freelist randomization to share elements
   with the SLUB implementation.
 - 02/02 The SLUB Freelist randomization implementation. Similar approach
   than the SLAB but tailored to the singled freelist used in SLUB.

***Performance data:

slab_test impact is between 3% to 4% on average for 100000 attempts
without smp.  It is a very focused testing, kernbench show the overall
impact on the system is way lower.

Before:

  Single thread testing
  =====================
  1. Kmalloc: Repeatedly allocate then free test
  100000 times kmalloc(8) -&gt; 49 cycles kfree -&gt; 77 cycles
  100000 times kmalloc(16) -&gt; 51 cycles kfree -&gt; 79 cycles
  100000 times kmalloc(32) -&gt; 53 cycles kfree -&gt; 83 cycles
  100000 times kmalloc(64) -&gt; 62 cycles kfree -&gt; 90 cycles
  100000 times kmalloc(128) -&gt; 81 cycles kfree -&gt; 97 cycles
  100000 times kmalloc(256) -&gt; 98 cycles kfree -&gt; 121 cycles
  100000 times kmalloc(512) -&gt; 95 cycles kfree -&gt; 122 cycles
  100000 times kmalloc(1024) -&gt; 96 cycles kfree -&gt; 126 cycles
  100000 times kmalloc(2048) -&gt; 115 cycles kfree -&gt; 140 cycles
  100000 times kmalloc(4096) -&gt; 149 cycles kfree -&gt; 171 cycles
  2. Kmalloc: alloc/free test
  100000 times kmalloc(8)/kfree -&gt; 70 cycles
  100000 times kmalloc(16)/kfree -&gt; 70 cycles
  100000 times kmalloc(32)/kfree -&gt; 70 cycles
  100000 times kmalloc(64)/kfree -&gt; 70 cycles
  100000 times kmalloc(128)/kfree -&gt; 70 cycles
  100000 times kmalloc(256)/kfree -&gt; 69 cycles
  100000 times kmalloc(512)/kfree -&gt; 70 cycles
  100000 times kmalloc(1024)/kfree -&gt; 73 cycles
  100000 times kmalloc(2048)/kfree -&gt; 72 cycles
  100000 times kmalloc(4096)/kfree -&gt; 71 cycles

After:

  Single thread testing
  =====================
  1. Kmalloc: Repeatedly allocate then free test
  100000 times kmalloc(8) -&gt; 57 cycles kfree -&gt; 78 cycles
  100000 times kmalloc(16) -&gt; 61 cycles kfree -&gt; 81 cycles
  100000 times kmalloc(32) -&gt; 76 cycles kfree -&gt; 93 cycles
  100000 times kmalloc(64) -&gt; 83 cycles kfree -&gt; 94 cycles
  100000 times kmalloc(128) -&gt; 106 cycles kfree -&gt; 107 cycles
  100000 times kmalloc(256) -&gt; 118 cycles kfree -&gt; 117 cycles
  100000 times kmalloc(512) -&gt; 114 cycles kfree -&gt; 116 cycles
  100000 times kmalloc(1024) -&gt; 115 cycles kfree -&gt; 118 cycles
  100000 times kmalloc(2048) -&gt; 147 cycles kfree -&gt; 131 cycles
  100000 times kmalloc(4096) -&gt; 214 cycles kfree -&gt; 161 cycles
  2. Kmalloc: alloc/free test
  100000 times kmalloc(8)/kfree -&gt; 66 cycles
  100000 times kmalloc(16)/kfree -&gt; 66 cycles
  100000 times kmalloc(32)/kfree -&gt; 66 cycles
  100000 times kmalloc(64)/kfree -&gt; 66 cycles
  100000 times kmalloc(128)/kfree -&gt; 65 cycles
  100000 times kmalloc(256)/kfree -&gt; 67 cycles
  100000 times kmalloc(512)/kfree -&gt; 67 cycles
  100000 times kmalloc(1024)/kfree -&gt; 64 cycles
  100000 times kmalloc(2048)/kfree -&gt; 67 cycles
  100000 times kmalloc(4096)/kfree -&gt; 67 cycles

Kernbench, before:

  Average Optimal load -j 12 Run (std deviation):
  Elapsed Time 101.873 (1.16069)
  User Time 1045.22 (1.60447)
  System Time 88.969 (0.559195)
  Percent CPU 1112.9 (13.8279)
  Context Switches 189140 (2282.15)
  Sleeps 99008.6 (768.091)

After:

  Average Optimal load -j 12 Run (std deviation):
  Elapsed Time 102.47 (0.562732)
  User Time 1045.3 (1.34263)
  System Time 88.311 (0.342554)
  Percent CPU 1105.8 (6.49444)
  Context Switches 189081 (2355.78)
  Sleeps 99231.5 (800.358)

This patch (of 2):

This commit reorganizes the previous SLAB freelist randomization to
prepare for the SLUB implementation.  It moves functions that will be
shared to slab_common.

The entropy functions are changed to align with the SLUB implementation,
now using get_random_(int|long) functions.  These functions were chosen
because they provide a bit more entropy early on boot and better
performance when specific arch instructions are not available.

[akpm@linux-foundation.org: fix build]
Link: http://lkml.kernel.org/r/1464295031-26375-2-git-send-email-thgarnie@google.com
Signed-off-by: Thomas Garnier &lt;thgarnie@google.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The kernel heap allocators are using a sequential freelist making their
allocation predictable.  This predictability makes kernel heap overflow
easier to exploit.  An attacker can careful prepare the kernel heap to
control the following chunk overflowed.

For example these attacks exploit the predictability of the heap:
 - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
 - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

***Problems that needed solving:
 - Randomize the Freelist (singled linked) used in the SLUB allocator.
 - Ensure good performance to encourage usage.
 - Get best entropy in early boot stage.

***Parts:
 - 01/02 Reorganize the SLAB Freelist randomization to share elements
   with the SLUB implementation.
 - 02/02 The SLUB Freelist randomization implementation. Similar approach
   than the SLAB but tailored to the singled freelist used in SLUB.

***Performance data:

slab_test impact is between 3% to 4% on average for 100000 attempts
without smp.  It is a very focused testing, kernbench show the overall
impact on the system is way lower.

Before:

  Single thread testing
  =====================
  1. Kmalloc: Repeatedly allocate then free test
  100000 times kmalloc(8) -&gt; 49 cycles kfree -&gt; 77 cycles
  100000 times kmalloc(16) -&gt; 51 cycles kfree -&gt; 79 cycles
  100000 times kmalloc(32) -&gt; 53 cycles kfree -&gt; 83 cycles
  100000 times kmalloc(64) -&gt; 62 cycles kfree -&gt; 90 cycles
  100000 times kmalloc(128) -&gt; 81 cycles kfree -&gt; 97 cycles
  100000 times kmalloc(256) -&gt; 98 cycles kfree -&gt; 121 cycles
  100000 times kmalloc(512) -&gt; 95 cycles kfree -&gt; 122 cycles
  100000 times kmalloc(1024) -&gt; 96 cycles kfree -&gt; 126 cycles
  100000 times kmalloc(2048) -&gt; 115 cycles kfree -&gt; 140 cycles
  100000 times kmalloc(4096) -&gt; 149 cycles kfree -&gt; 171 cycles
  2. Kmalloc: alloc/free test
  100000 times kmalloc(8)/kfree -&gt; 70 cycles
  100000 times kmalloc(16)/kfree -&gt; 70 cycles
  100000 times kmalloc(32)/kfree -&gt; 70 cycles
  100000 times kmalloc(64)/kfree -&gt; 70 cycles
  100000 times kmalloc(128)/kfree -&gt; 70 cycles
  100000 times kmalloc(256)/kfree -&gt; 69 cycles
  100000 times kmalloc(512)/kfree -&gt; 70 cycles
  100000 times kmalloc(1024)/kfree -&gt; 73 cycles
  100000 times kmalloc(2048)/kfree -&gt; 72 cycles
  100000 times kmalloc(4096)/kfree -&gt; 71 cycles

After:

  Single thread testing
  =====================
  1. Kmalloc: Repeatedly allocate then free test
  100000 times kmalloc(8) -&gt; 57 cycles kfree -&gt; 78 cycles
  100000 times kmalloc(16) -&gt; 61 cycles kfree -&gt; 81 cycles
  100000 times kmalloc(32) -&gt; 76 cycles kfree -&gt; 93 cycles
  100000 times kmalloc(64) -&gt; 83 cycles kfree -&gt; 94 cycles
  100000 times kmalloc(128) -&gt; 106 cycles kfree -&gt; 107 cycles
  100000 times kmalloc(256) -&gt; 118 cycles kfree -&gt; 117 cycles
  100000 times kmalloc(512) -&gt; 114 cycles kfree -&gt; 116 cycles
  100000 times kmalloc(1024) -&gt; 115 cycles kfree -&gt; 118 cycles
  100000 times kmalloc(2048) -&gt; 147 cycles kfree -&gt; 131 cycles
  100000 times kmalloc(4096) -&gt; 214 cycles kfree -&gt; 161 cycles
  2. Kmalloc: alloc/free test
  100000 times kmalloc(8)/kfree -&gt; 66 cycles
  100000 times kmalloc(16)/kfree -&gt; 66 cycles
  100000 times kmalloc(32)/kfree -&gt; 66 cycles
  100000 times kmalloc(64)/kfree -&gt; 66 cycles
  100000 times kmalloc(128)/kfree -&gt; 65 cycles
  100000 times kmalloc(256)/kfree -&gt; 67 cycles
  100000 times kmalloc(512)/kfree -&gt; 67 cycles
  100000 times kmalloc(1024)/kfree -&gt; 64 cycles
  100000 times kmalloc(2048)/kfree -&gt; 67 cycles
  100000 times kmalloc(4096)/kfree -&gt; 67 cycles

Kernbench, before:

  Average Optimal load -j 12 Run (std deviation):
  Elapsed Time 101.873 (1.16069)
  User Time 1045.22 (1.60447)
  System Time 88.969 (0.559195)
  Percent CPU 1112.9 (13.8279)
  Context Switches 189140 (2282.15)
  Sleeps 99008.6 (768.091)

After:

  Average Optimal load -j 12 Run (std deviation):
  Elapsed Time 102.47 (0.562732)
  User Time 1045.3 (1.34263)
  System Time 88.311 (0.342554)
  Percent CPU 1105.8 (6.49444)
  Context Switches 189081 (2355.78)
  Sleeps 99231.5 (800.358)

This patch (of 2):

This commit reorganizes the previous SLAB freelist randomization to
prepare for the SLUB implementation.  It moves functions that will be
shared to slab_common.

The entropy functions are changed to align with the SLUB implementation,
now using get_random_(int|long) functions.  These functions were chosen
because they provide a bit more entropy early on boot and better
performance when specific arch instructions are not available.

[akpm@linux-foundation.org: fix build]
Link: http://lkml.kernel.org/r/1464295031-26375-2-git-send-email-thgarnie@google.com
Signed-off-by: Thomas Garnier &lt;thgarnie@google.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
