<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/rmap.c, branch v2.6.26-rc7</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>mm: remove nopage</title>
<updated>2008-04-28T15:58:18+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-04-28T09:12:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=3c18ddd160d1fcd46d1131d9ad6c594dd8e9af99'/>
<id>3c18ddd160d1fcd46d1131d9ad6c594dd8e9af99</id>
<content type='text'>
Nothing in the tree uses nopage any more.  Remove support for it in the
core mm code and documentation (and a few stray references to it in
comments).

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Nothing in the tree uses nopage any more.  Remove support for it in the
core mm code and documentation (and a few stray references to it in
comments).

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>s390: KVM preparation: host memory management changes for s390 kvm</title>
<updated>2008-04-27T09:00:40+00:00</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntraeger@de.ibm.com</email>
</author>
<published>2008-03-25T17:47:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=5b7baf05783b1ac97a510243d7e82293416a7cf6'/>
<id>5b7baf05783b1ac97a510243d7e82293416a7cf6</id>
<content type='text'>
This patch changes the s390 memory management defintions to use the pgste field
for dirty and reference bit tracking of host and guest code. Usually on s390,
dirty and referenced are tracked in storage keys, which belong to the physical
page. This changes with virtualization: The guest and host dirty/reference bits
are defined to be the logical OR of the values for the mapping and the physical
page. This patch implements the necessary changes in pgtable.h for s390.

There is a common code change in mm/rmap.c, the call to
page_test_and_clear_young must be moved. This is a no-op for all
architecture but s390. page_referenced checks the referenced bits for
the physiscal page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.

Without pgstes (the current implementation on Linux s390) the physical page
check is implemented but the mapping callbacks are no-ops because dirty
and referenced are not tracked in the s390 page tables. The pgstes introduces
guest and host dirty and reference bits for s390 in the host mapping. These
mapping must be checked before page_test_and_clear_young resets the reference
bit.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Acked-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Carsten Otte &lt;cotte@de.ibm.com&gt;
Signed-off-by: Avi Kivity &lt;avi@qumranet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch changes the s390 memory management defintions to use the pgste field
for dirty and reference bit tracking of host and guest code. Usually on s390,
dirty and referenced are tracked in storage keys, which belong to the physical
page. This changes with virtualization: The guest and host dirty/reference bits
are defined to be the logical OR of the values for the mapping and the physical
page. This patch implements the necessary changes in pgtable.h for s390.

There is a common code change in mm/rmap.c, the call to
page_test_and_clear_young must be moved. This is a no-op for all
architecture but s390. page_referenced checks the referenced bits for
the physiscal page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.

Without pgstes (the current implementation on Linux s390) the physical page
check is implemented but the mapping callbacks are no-ops because dirty
and referenced are not tracked in the s390 page tables. The pgstes introduces
guest and host dirty and reference bits for s390 in the host mapping. These
mapping must be checked before page_test_and_clear_young resets the reference
bit.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Acked-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Carsten Otte &lt;cotte@de.ibm.com&gt;
Signed-off-by: Avi Kivity &lt;avi@qumranet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: rmap kernel-doc fixes</title>
<updated>2008-03-20T01:53:35+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>randy.dunlap@oracle.com</email>
</author>
<published>2008-03-20T00:00:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=43d8eac44f28d384d2377dcdd1407f51f79dda55'/>
<id>43d8eac44f28d384d2377dcdd1407f51f79dda55</id>
<content type='text'>
Correct kernel-doc function names and parameters in rmap.c.

Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Correct kernel-doc function names and parameters in rmap.c.

Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memcg: mm_match_cgroup not vm_match_cgroup</title>
<updated>2008-03-05T00:35:14+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hugh@veritas.com</email>
</author>
<published>2008-03-04T22:29:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=bd845e38c7a7251a95a8f2c38aa7fb87140b771d'/>
<id>bd845e38c7a7251a95a8f2c38aa7fb87140b771d</id>
<content type='text'>
vm_match_cgroup is a perverse name for a macro to match mm with cgroup: rename
it mm_match_cgroup, matching mm_init_cgroup and mm_free_cgroup.

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Hirokazu Takahashi &lt;taka@valinux.co.jp&gt;
Cc: YAMAMOTO Takashi &lt;yamamoto@valinux.co.jp&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
vm_match_cgroup is a perverse name for a macro to match mm with cgroup: rename
it mm_match_cgroup, matching mm_init_cgroup and mm_free_cgroup.

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Hirokazu Takahashi &lt;taka@valinux.co.jp&gt;
Cc: YAMAMOTO Takashi &lt;yamamoto@valinux.co.jp&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memcontrol: add vm_match_cgroup()</title>
<updated>2008-02-09T19:08:33+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2008-02-09T08:10:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=60c12b1202a60eabb1c61317e5d2678fcea9893f'/>
<id>60c12b1202a60eabb1c61317e5d2678fcea9893f</id>
<content type='text'>
mm_cgroup() is exclusively used to test whether an mm's mem_cgroup pointer
is pointing to a specific cgroup.  Instead of returning the pointer, we can
just do the test itself in a new macro:

	vm_match_cgroup(mm, cgroup)

returns non-zero if the mm's mem_cgroup points to cgroup.  Otherwise it
returns zero.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Adrian Bunk &lt;bunk@stusta.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
mm_cgroup() is exclusively used to test whether an mm's mem_cgroup pointer
is pointing to a specific cgroup.  Instead of returning the pointer, we can
just do the test itself in a new macro:

	vm_match_cgroup(mm, cgroup)

returns non-zero if the mm's mem_cgroup points to cgroup.  Otherwise it
returns zero.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Adrian Bunk &lt;bunk@stusta.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Memory controller: make page_referenced() cgroup aware</title>
<updated>2008-02-07T16:42:19+00:00</updated>
<author>
<name>Balbir Singh</name>
<email>balbir@linux.vnet.ibm.com</email>
</author>
<published>2008-02-07T08:14:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=bed7161a519a2faef53e1bce1b47595e297c1d14'/>
<id>bed7161a519a2faef53e1bce1b47595e297c1d14</id>
<content type='text'>
Make page_referenced() cgroup aware.  Without this patch, page_referenced()
can cause a page to be skipped while reclaiming pages.  This patch ensures
that other cgroups do not hold pages in a particular cgroup hostage.  It
is required to ensure that shared pages are freed from a cgroup when they
are not actively referenced from the cgroup that brought them in

Signed-off-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelianov &lt;xemul@openvz.org&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Kirill Korotaev &lt;dev@sw.ru&gt;
Cc: Herbert Poetzl &lt;herbert@13thfloor.at&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Vaidyanathan Srinivasan &lt;svaidy@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make page_referenced() cgroup aware.  Without this patch, page_referenced()
can cause a page to be skipped while reclaiming pages.  This patch ensures
that other cgroups do not hold pages in a particular cgroup hostage.  It
is required to ensure that shared pages are freed from a cgroup when they
are not actively referenced from the cgroup that brought them in

Signed-off-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelianov &lt;xemul@openvz.org&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Kirill Korotaev &lt;dev@sw.ru&gt;
Cc: Herbert Poetzl &lt;herbert@13thfloor.at&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Vaidyanathan Srinivasan &lt;svaidy@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Memory controller: memory accounting</title>
<updated>2008-02-07T16:42:18+00:00</updated>
<author>
<name>Balbir Singh</name>
<email>balbir@linux.vnet.ibm.com</email>
</author>
<published>2008-02-07T08:13:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8a9f3ccd24741b50200c3f33d62534c7271f3dfc'/>
<id>8a9f3ccd24741b50200c3f33d62534c7271f3dfc</id>
<content type='text'>
Add the accounting hooks.  The accounting is carried out for RSS and Page
Cache (unmapped) pages.  There is now a common limit and accounting for both.
The RSS accounting is accounted at page_add_*_rmap() and page_remove_rmap()
time.  Page cache is accounted at add_to_page_cache(),
__delete_from_page_cache().  Swap cache is also accounted for.

Each page's page_cgroup is protected with the last bit of the
page_cgroup pointer, this makes handling of race conditions involving
simultaneous mappings of a page easier.  A reference count is kept in the
page_cgroup to deal with cases where a page might be unmapped from the RSS
of all tasks, but still lives in the page cache.

Credits go to Vaidyanathan Srinivasan for helping with reference counting work
of the page cgroup.  Almost all of the page cache accounting code has help
from Vaidyanathan Srinivasan.

[hugh@veritas.com: fix swapoff breakage]
[akpm@linux-foundation.org: fix locking]
Signed-off-by: Vaidyanathan Srinivasan &lt;svaidy@linux.vnet.ibm.com&gt;
Signed-off-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelianov &lt;xemul@openvz.org&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Kirill Korotaev &lt;dev@sw.ru&gt;
Cc: Herbert Poetzl &lt;herbert@13thfloor.at&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;Valdis.Kletnieks@vt.edu&gt;
Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the accounting hooks.  The accounting is carried out for RSS and Page
Cache (unmapped) pages.  There is now a common limit and accounting for both.
The RSS accounting is accounted at page_add_*_rmap() and page_remove_rmap()
time.  Page cache is accounted at add_to_page_cache(),
__delete_from_page_cache().  Swap cache is also accounted for.

Each page's page_cgroup is protected with the last bit of the
page_cgroup pointer, this makes handling of race conditions involving
simultaneous mappings of a page easier.  A reference count is kept in the
page_cgroup to deal with cases where a page might be unmapped from the RSS
of all tasks, but still lives in the page cache.

Credits go to Vaidyanathan Srinivasan for helping with reference counting work
of the page cgroup.  Almost all of the page cache accounting code has help
from Vaidyanathan Srinivasan.

[hugh@veritas.com: fix swapoff breakage]
[akpm@linux-foundation.org: fix locking]
Signed-off-by: Vaidyanathan Srinivasan &lt;svaidy@linux.vnet.ibm.com&gt;
Signed-off-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelianov &lt;xemul@openvz.org&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Kirill Korotaev &lt;dev@sw.ru&gt;
Cc: Herbert Poetzl &lt;herbert@13thfloor.at&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;Valdis.Kletnieks@vt.edu&gt;
Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: don't waste swap on locked pages</title>
<updated>2008-02-05T17:44:18+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hugh@veritas.com</email>
</author>
<published>2008-02-05T06:29:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=5a9bbdcd29adbb786c53eba1dfc3c2d256020d6b'/>
<id>5a9bbdcd29adbb786c53eba1dfc3c2d256020d6b</id>
<content type='text'>
try_to_unmap always fails on a page found in a VM_LOCKED vma (unless
migrating), and recycles it back to the active list.  But if it's an
anonymous page, we've already allocated swap to it: just wasting swap.
Spot locked pages in page_referenced_one and treat them as referenced.

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Tested-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Ethan Solomita &lt;solo@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
try_to_unmap always fails on a page found in a VM_LOCKED vma (unless
migrating), and recycles it back to the active list.  But if it's an
anonymous page, we've already allocated swap to it: just wasting swap.
Spot locked pages in page_referenced_one and treat them as referenced.

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Tested-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Ethan Solomita &lt;solo@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>radix-tree: avoid atomic allocations for preloaded insertions</title>
<updated>2008-02-05T17:44:17+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-02-05T06:29:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e2848a0efedef4dad52d1334d37f8719cd6268fd'/>
<id>e2848a0efedef4dad52d1334d37f8719cd6268fd</id>
<content type='text'>
Most pagecache (and some other) radix tree insertions have the great
opportunity to preallocate a few nodes with relaxed gfp flags.  But the
preallocation is squandered when it comes time to allocate a node, we
default to first attempting a GFP_ATOMIC allocation -- that doesn't
normally fail, but it can eat into atomic memory reserves that we don't
need to be using.

Another upshot of this is that it removes the sometimes highly contended
zone-&gt;lock from underneath tree_lock.  Pagecache insertions are always
performed with a radix tree preload, and after this change, such a
situation will never fall back to kmem_cache_alloc within
radix_tree_node_alloc.

David Miller reports seeing this allocation fail on a highly threaded
sparc64 system:

[527319.459981] dd: page allocation failure. order:0, mode:0x20
[527319.460403] Call Trace:
[527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
[527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
[527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
[527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
[527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
[527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
[527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
[527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
[527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
[527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
[527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
[527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
[527319.461361]  [00000000004bb920] sys_read+0x34/0x60
[527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40

The calltrace is significant: __do_page_cache_readahead allocates a number
of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
goes to mpage_readpages, there can be significant intervals (including disk
IO) before all the pages are inserted into the radix-tree.  So the reserves
can easily be depleted at that point.  The patch is confirmed to fix the
problem.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Most pagecache (and some other) radix tree insertions have the great
opportunity to preallocate a few nodes with relaxed gfp flags.  But the
preallocation is squandered when it comes time to allocate a node, we
default to first attempting a GFP_ATOMIC allocation -- that doesn't
normally fail, but it can eat into atomic memory reserves that we don't
need to be using.

Another upshot of this is that it removes the sometimes highly contended
zone-&gt;lock from underneath tree_lock.  Pagecache insertions are always
performed with a radix tree preload, and after this change, such a
situation will never fall back to kmem_cache_alloc within
radix_tree_node_alloc.

David Miller reports seeing this allocation fail on a highly threaded
sparc64 system:

[527319.459981] dd: page allocation failure. order:0, mode:0x20
[527319.460403] Call Trace:
[527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
[527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
[527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
[527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
[527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
[527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
[527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
[527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
[527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
[527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
[527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
[527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
[527319.461361]  [00000000004bb920] sys_read+0x34/0x60
[527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40

The calltrace is significant: __do_page_cache_readahead allocates a number
of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
goes to mpage_readpages, there can be significant intervals (including disk
IO) before all the pages are inserted into the radix-tree.  So the reserves
can easily be depleted at that point.  The patch is confirmed to fix the
problem.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[S390] Optimize storage key handling for anonymous pages</title>
<updated>2007-11-20T10:13:46+00:00</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntraeger@de.ibm.com</email>
</author>
<published>2007-11-20T10:13:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ce7e9fae8db07af4080e868f4588f8f095f803dc'/>
<id>ce7e9fae8db07af4080e868f4588f8f095f803dc</id>
<content type='text'>
page_mkclean used to call page_clear_dirty for every given page. This
is different to all other architectures, where the dirty bit in the
PTEs is only resetted, if page_mapping() returns a non-NULL pointer.
We can move the page_test_dirty/page_clear_dirty sequence into the
2nd if to avoid unnecessary iske/sske sequences, which are expensive.

This change also helps kvm for s390 as the host must transfer the
dirty bit into the guest status bits. By moving the page_clear_dirty
operation into the 2nd if, the vm will only call page_clear_dirty
for pages where it walks the mapping anyway. There it calls
ptep_clear_flush for writable ptes, so we can transfer the dirty bit
to the guest.

Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
page_mkclean used to call page_clear_dirty for every given page. This
is different to all other architectures, where the dirty bit in the
PTEs is only resetted, if page_mapping() returns a non-NULL pointer.
We can move the page_test_dirty/page_clear_dirty sequence into the
2nd if to avoid unnecessary iske/sske sequences, which are expensive.

This change also helps kvm for s390 as the host must transfer the
dirty bit into the guest status bits. By moving the page_clear_dirty
operation into the 2nd if, the vm will only call page_clear_dirty
for pages where it walks the mapping anyway. There it calls
ptep_clear_flush for writable ptes, so we can transfer the dirty bit
to the guest.

Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
