linux.git/mm, branch v2.6.21.7

mm: kill validate_anon_vma to avoid mapcount BUG

2007-08-04T16:10:25+00:00

validate_anon_vma gave a useful check on the integrity of the anon_vma list
when Andrea was developing obj rmap; but it was not enabled in SLES9
itself, nor in mainline, until Nick changed commented-out RMAP_DEBUG to
configurable CONFIG_DEBUG_VM in 2.6.17.  Now Petr Vandrovec reports that
its BUG_ON(mapcount > 100000) can easily crash a CONFIG_DEBUG_VM=y system.

That limit was just an arbitrary number to protect against an infinite
loop.  We could raise it to something enormous (depending on sizeof struct
vma and size of memory?); but I rather think validate_anon_vma has outlived
its usefulness, and is better just removed - which gives a magnificent
performance boost to anything like Petr's test program ;)

Of course, a very long anon_vma list is bad news for preemption latency,
and I believe there has been one recent report of such: let's not forget
that, but validate_anon_vma only makes it worse not better.
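
The arbitrary limit described above can be illustrated with a small
userspace sketch.  This is not the removed kernel code: struct node,
count_bounded() and the -1 return convention are hypothetical, and only
the 100000 limit comes from the report.  It shows why a loop guard cannot
tell a corrupted (cyclic) list from a long but valid one.

```c
#include <stddef.h>

/* Hypothetical list node; stands in for an entry on an anon_vma list. */
struct node {
	struct node *next;
};

/* The arbitrary guard value quoted in the commit message. */
#define MAPCOUNT_LIMIT 100000

/*
 * Walk the list and count its entries.  Returns the count, or -1 when
 * the guard trips (the point where a BUG_ON would have fired).
 */
static long count_bounded(const struct node *head)
{
	long count = 0;

	for (const struct node *n = head; n; n = n->next) {
		if (++count > MAPCOUNT_LIMIT)
			return -1;	/* loop or merely long list: unknown */
	}
	return count;
}
```

A valid list of MAPCOUNT_LIMIT + 1 entries trips the guard exactly as a
cyclic list would, and every call pays for a full walk of the list.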

Signed-off-by: Hugh Dickins 
Cc: Petr Vandrovec 
Acked-by: Nick Piggin 
Cc: Andrea Arcangeli 
Signed-off-by: Andrew Morton 
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

[PATCH] x86_64: allocate sparsemem memmap above 4G

2007-06-11T18:36:48+00:00

On systems with a huge amount of physical memory, the VFS cache and the
memmap may consume all available system memory below 4G, after which the
system may fail to allocate the swiotlb bounce buffer.

There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix does
not cover the sparsemem model.

This patch adds the fix to the sparsemem model by first trying to allocate
the memmap above 4G.
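
The "try above 4G first" policy can be sketched in userspace as follows.
The pool structure, the region enum and alloc_memmap() are hypothetical
names for illustration only; the real patch works on the boot-time
allocator and is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical boot-time pool: free bytes below and above the 4G line. */
struct pool {
	uint64_t free_low;
	uint64_t free_high;
};

enum region { REGION_NONE, REGION_LOW, REGION_HIGH };

/* Prefer memory above 4G; fall back to low memory only if that fails. */
static enum region alloc_memmap(struct pool *p, uint64_t size)
{
	if (p->free_high >= size) {	/* preferred: keep low memory free */
		p->free_high -= size;
		return REGION_HIGH;
	}
	if (p->free_low >= size) {	/* fallback: take what is left */
		p->free_low -= size;
		return REGION_LOW;
	}
	return REGION_NONE;
}
```

Keeping the fallback means small machines with no memory above 4G behave
as before, while large machines leave low memory for bounce buffers.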

Signed-off-by: Zou Nan hai 
Acked-by: Suresh Siddha 
Cc: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[chrisw: trivial backport]
Signed-off-by: Chris Wright 
Signed-off-by: Greg Kroah-Hartman

[PATCH] fix leaky resv_huge_pages when cpuset is in use

2007-05-23T21:32:48+00:00

The internal hugetlb resv_huge_pages variable can permanently leak a
nonzero value in the error path of the hugetlb page fault handler when
hugetlb pages are used in combination with cpusets.  The leaked count can
permanently trap that many hugetlb pages in an unusable "reserved" state.

Steps to reproduce the bug:

  (1) create two cpusets, user1 and user2
  (2) reserve 50 htlb pages in cpuset user1
  (3) attempt to shmget/shmat 50 htlb pages inside cpuset user2
  (4) the kernel OOM-kills the user process from step 3
  (5) ipcrm the shm segment

At this point resv_huge_pages will have a count of 49, even though
there are no active hugetlbfs files or hugetlb shared memory segments
in the system.  The leak is permanent and there is no recovery method
other than a system reboot.  The leaked count will hold up all future use
of that many htlb pages in all cpusets.

The culprit is that the error path of alloc_huge_page() did not
properly undo the change it made to resv_huge_pages, leaving it in an
inconsistent state.
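
The shape of the fix, undoing the reservation change on the failure path,
can be sketched in userspace like this.  struct hstate_sketch,
alloc_huge_page_sketch() and the allowed_free parameter are hypothetical;
only the counter names echo the commit message, and the real function's
conditions are not reproduced.

```c
#include <stdbool.h>

/* Hypothetical counters named after those in the commit message. */
struct hstate_sketch {
	long free_huge_pages;	/* pages currently on the free lists */
	long resv_huge_pages;	/* pages promised to shared mappings */
};

/*
 * Take one page.  allowed_free models how many free pages sit on nodes
 * that the caller's cpuset permits (an assumption of this sketch).
 */
static bool alloc_huge_page_sketch(struct hstate_sketch *h,
				   long allowed_free, bool shared)
{
	if (shared)
		h->resv_huge_pages--;	/* consume the reservation up front */

	if (allowed_free <= 0 || h->free_huge_pages <= 0)
		goto fail;

	h->free_huge_pages--;
	return true;

fail:
	if (shared)
		h->resv_huge_pages++;	/* undo, or the count is left wrong */
	return false;
}
```

Without the undo under the fail label, every failed fault from a cpuset
with no usable pages would leave the counter permanently out of step.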

Signed-off-by: Ken Chen 
Cc: David Gibson 
Cc: Adam Litke 
Cc: Martin Bligh 
Acked-by: David Gibson 
Signed-off-by: Andrew Morton 
Signed-off-by: Chris Wright

[PATCH] slob: fix page order calculation on not 4KB page

2007-05-23T21:32:43+00:00

SLOB does not calculate the correct page order when the page size is not
4KB.  This patch fixes that by using get_order() instead of find_order(),
which is SLOB's own version of get_order().
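
A userspace sketch of the problem, assuming the SLOB helper hard-coded a
4096-byte page (find_order_4k() below is that assumption, not a copy of
the kernel source).  order_for() is a hypothetical stand-in written in
the spirit of get_order(), taking the page shift as a parameter so that
both page sizes can be compared in one program.

```c
/* Assumed shape of a helper that hard-codes a 4KB page. */
static int find_order_4k(unsigned long size)
{
	int order = 0;

	for (; size > 4096; size >>= 1)
		order++;
	return order;
}

/* Smallest order such that (page size << order) covers size bytes. */
static int order_for(unsigned long size, unsigned int page_shift)
{
	int order = 0;

	while ((1UL << (page_shift + order)) < size)
		order++;
	return order;
}
```

For a 64KB request both helpers agree on order 4 with 4KB pages, but with
64KB pages the correct order is 0, so the hard-coded helper would ask for
sixteen times more memory than needed.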

Signed-off-by: Akinobu Mita 
Acked-by: Matt Mackall 
Signed-off-by: Andrew Morton 
Signed-off-by: Chris Wright

[PATCH] oom: fix constraint deadlock

2007-05-23T21:32:43+00:00

Fixes a deadlock in the OOM killer for allocations that are not
__GFP_HARDWALL.

Before the OOM killer checks for the allocation constraint, it takes
callback_mutex.

constrained_alloc() iterates through each zone in the allocation zonelist
and calls cpuset_zone_allowed_softwall() to determine whether an allocation
for gfp_mask is possible.  If a zone's node is not in the OOM-triggering
task's mems_allowed, the task is not exiting, and we did not fail on a
__GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take
callback_mutex to check the nearest exclusive ancestor of current's cpuset.
Since callback_mutex is already held at that point, this results in
deadlock.

We now take callback_mutex only after iterating through the zonelist,
since we don't need it before then.
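
The ordering problem can be modelled in userspace with a non-recursive
lock that reports a second acquire instead of hanging.  Everything here
(struct sketch_mutex, check_constraint(), the two oom_*_order() functions)
is hypothetical and only mirrors the order of operations described above.

```c
#include <stdbool.h>

/* Hypothetical non-recursive lock; re-acquiring it would hang for real. */
struct sketch_mutex {
	bool held;
};

/* Returns false at the point where a real mutex_lock() would deadlock. */
static bool sketch_lock(struct sketch_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

static void sketch_unlock(struct sketch_mutex *m)
{
	m->held = false;
}

/* Stand-in for the zonelist walk, which may need the lock itself. */
static bool check_constraint(struct sketch_mutex *m)
{
	if (!sketch_lock(m))
		return false;	/* lock already held by us: self-deadlock */
	sketch_unlock(m);
	return true;
}

/* Old order: take the lock, then walk the zonelist. */
static bool oom_old_order(struct sketch_mutex *m)
{
	bool ok;

	sketch_lock(m);
	ok = check_constraint(m);
	sketch_unlock(m);
	return ok;
}

/* New order: walk the zonelist first, take the lock afterwards. */
static bool oom_new_order(struct sketch_mutex *m)
{
	bool ok = check_constraint(m);

	sketch_lock(m);
	/* ... select and kill a task while holding the lock ... */
	sketch_unlock(m);
	return ok;
}
```

In this model oom_old_order() reports the self-deadlock and
oom_new_order() does not, because the walk finishes before the lock is
taken.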

Cc: Andi Kleen 
Cc: Nick Piggin 
Cc: Christoph Lameter 
Cc: Martin J. Bligh 
Signed-off-by: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Chris Wright

page migration: fix NR_FILE_PAGES accounting

2007-04-24T15:23:08+00:00

NR_FILE_PAGES must be accounted for depending on the zone that the page
belongs to.  If we replace the page in the radix tree then we may have to
shift the count to another zone.
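
As a minimal userspace sketch of the accounting rule (struct zone_sketch
and move_file_page() are hypothetical; the kernel uses its per-zone
vmstat counters rather than a plain field):

```c
/* Hypothetical per-zone counter standing in for NR_FILE_PAGES. */
struct zone_sketch {
	long nr_file_pages;
};

/*
 * Account for replacing a page cache page that lives in old_zone with
 * one that lives in new_zone.  The total is unchanged; only the zone
 * that owns the count may change.
 */
static void move_file_page(struct zone_sketch *old_zone,
			   struct zone_sketch *new_zone)
{
	if (old_zone == new_zone)
		return;
	old_zone->nr_file_pages--;
	new_zone->nr_file_pages++;
}
```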

Suggested-by: Ethan Solomita 
Eventually-typed-in-by: Christoph Lameter 
Cc: Martin Bligh 
Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fix OOM killing processes wrongly thought MPOL_BIND

2007-04-24T15:23:07+00:00

I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog
to see lots of other processes killed with "No available memory
(MPOL_BIND)".  memhog is killed correctly once we initialize nodemask in
constrained_alloc().
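
Why the starting value of the mask matters can be sketched in userspace.
This assumes the check works by starting from a set of nodes, clearing
each node that appears in the zonelist, and treating any leftover node as
evidence of a bind policy; classify(), its parameters and the SKETCH_*
codes are hypothetical, and start_mask stands for whatever the mask holds
before the walk.

```c
/* Hypothetical result codes for this sketch. */
enum sketch_constraint { SKETCH_NONE, SKETCH_MPOL_BIND };

/*
 * start_mask: bit i set means node i is still unaccounted for.
 * zonelist_nodes: node ids reachable through the allocation zonelist.
 */
static enum sketch_constraint classify(unsigned long start_mask,
				       const int *zonelist_nodes, int n)
{
	unsigned long nodes = start_mask;

	for (int i = 0; i < n; i++)
		nodes &= ~(1UL << zonelist_nodes[i]);

	/* Any node left over was excluded from the zonelist. */
	return nodes ? SKETCH_MPOL_BIND : SKETCH_NONE;
}
```

With two nodes and a zonelist covering both, a mask that starts as 0x3
yields SKETCH_NONE, while a mask with stray high bits such as 0xF3 yields
SKETCH_MPOL_BIND even though no policy excludes anything.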

Signed-off-by: Hugh Dickins 
Acked-by: Christoph Lameter 
Acked-by: William Irwin 
Acked-by: KAMEZAWA Hiroyuki 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

oom: kill all threads that share mm with killed task

2007-04-24T15:11:49+00:00

oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
When finding other threads that share an mm with that task, we need to
kill each of those threads rather than killing the selected task again.
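
A userspace sketch of the corrected loop, with hypothetical types and
names (struct task_sketch, kill_task(), oom_kill_sharing()); an integer
mm_id stands in for the shared mm pointer:

```c
/* Hypothetical task record; mm_id stands in for the mm pointer. */
struct task_sketch {
	int mm_id;
	int killed;
};

static void kill_task(struct task_sketch *t)
{
	t->killed = 1;
}

/* Kill p, then every other task that shares p's mm. */
static void oom_kill_sharing(struct task_sketch *tasks, int n,
			     struct task_sketch *p)
{
	kill_task(p);

	for (int i = 0; i < n; i++) {
		struct task_sketch *q = &tasks[i];

		if (q != p && q->mm_id == p->mm_id)
			kill_task(q);	/* q, not p: the bug re-killed p */
	}
}
```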

(Bug introduced by f2a2a7108aa0039ba7a5fe7a0d2ecef2219a7584)

Acked-by: William Irwin 
Acked-by: Christoph Lameter 
Cc: Nick Piggin 
Cc: Andrew Morton 
Cc: Andi Kleen 
Signed-off-by: David Rientjes 
Signed-off-by: Linus Torvalds

[PATCH] nommu: fix bug ip_conntrack does not work on nommu

2007-04-12T22:31:42+00:00

num_physpages is not exported from mm/nommu.c, so linking the ip_conntrack
module fails.

Signed-off-by: Bryan Wu 
Acked-by: David Howells 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6

2007-04-04T17:11:16+00:00

* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] cio: Fix handling of interrupt for csch().
  [S390] page_mkclean data corruption.