linux.git/kernel/cpuset.c, branch v2.6.17.10

[PATCH] cpuset: might_sleep_if check in cpuset_zones_allowed

2006-05-21T19:59:18+00:00

It's too easy to incorrectly call cpuset_zone_allowed() in an atomic
context without __GFP_HARDWALL set, and when done, it is not noticed until
a tight memory situation forces allocations to be tried outside the current
cpuset.

Add a 'might_sleep_if()' check, to catch this earlier on, instead of
waiting for a similar check in the mutex_lock() code, which is only rarely
invoked.
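
For illustration only, a minimal user-space sketch of the idea behind this
check. All demo_* names and the DEMO_GFP_HARDWALL value are hypothetical
stand-ins, not the kernel's definitions; the point is that the debug check
fires on every call from an atomic context without the hardwall flag, and
not only when the rare sleeping path is actually reached.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for __GFP_HARDWALL; the value is illustrative. */
#define DEMO_GFP_HARDWALL 0x1u

static bool demo_in_atomic;   /* models "this context must not sleep" */
static int demo_warnings;     /* counts how many times the check fired */

/* Model of might_sleep_if(cond): warn if cond holds in atomic context. */
static void demo_might_sleep_if(bool cond)
{
    if (cond && demo_in_atomic) {
        demo_warnings++;
        fprintf(stderr, "demo: possibly sleeping call in atomic context\n");
    }
}

/* Model of the early check: every call without the hardwall flag is
 * treated as one that might sleep, whether or not it would this time. */
static bool demo_zone_allowed(unsigned int gfp_mask)
{
    demo_might_sleep_if(!(gfp_mask & DEMO_GFP_HARDWALL));
    return true; /* the real placement decision is omitted in this sketch */
}
```

Calling demo_zone_allowed(0) with demo_in_atomic set bumps demo_warnings
immediately, while the same call with DEMO_GFP_HARDWALL stays silent.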

Signed-off-by: Paul Jackson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] cpuset: update cpuset_zones_allowed comment

2006-05-21T19:59:18+00:00

Update the kernel/cpuset.c:cpuset_zone_allowed() comment.

The rule for when mm/page_alloc.c should call cpuset_zone_allowed()
was intended to be:

  Don't call cpuset_zone_allowed() if you can't sleep, unless you
  pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
  the code that might scan up ancestor cpusets and sleep.
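
A rough user-space model of that rule, under stated assumptions: nodes are
bits in an unsigned long, demo_lock_count stands in for taking a mutex that
may sleep, and every demo_* name is hypothetical. Other special cases that
the real function may handle are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for __GFP_HARDWALL; the value is illustrative. */
#define DEMO_GFP_HARDWALL 0x1u

struct demo_cpuset {
    unsigned long mems_allowed;        /* bit n set => node n allowed */
    bool mem_exclusive;
    const struct demo_cpuset *parent;  /* NULL for the top cpuset */
};

static int demo_lock_count; /* times the (possibly sleeping) path ran */

/* Walk up until a mem_exclusive cpuset or the top cpuset is reached. */
static const struct demo_cpuset *
demo_nearest_exclusive(const struct demo_cpuset *cs)
{
    while (!cs->mem_exclusive && cs->parent)
        cs = cs->parent;
    return cs;
}

static bool demo_node_allowed(const struct demo_cpuset *cs,
                              unsigned long task_mems, int node,
                              unsigned int gfp_mask)
{
    unsigned long bit = 1UL << node;

    if (task_mems & bit)               /* fast path, never sleeps */
        return true;
    if (gfp_mask & DEMO_GFP_HARDWALL)  /* hardwall: stop here, no scan */
        return false;
    demo_lock_count++;                 /* real code would lock, may sleep */
    return (demo_nearest_exclusive(cs)->mems_allowed & bit) != 0;
}
```

In this model a caller that cannot sleep is safe only if it passes
DEMO_GFP_HARDWALL, because that is the one case guaranteed never to reach
the ancestor scan.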

The explanation of this rule in the comment above cpuset_zone_allowed() was
stale, as a result of a restructuring of some __alloc_pages() code in
November 2005.

Rewrite that comment ...

Signed-off-by: Paul Jackson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] cpuset: memory migration interaction fix

2006-03-31T20:18:55+00:00

Fix memory migration so that it works regardless of what cpuset the invoking
task is in.

If a task invoked a memory migration, by doing one of:

       1) writing a different nodemask to a cpuset 'mems' file, or

       2) writing a task's pid to a different cpuset's 'tasks' file,

where the cpuset had its 'memory_migrate' option turned on, then the
allocation of the new pages for the migrated task(s) was constrained
by the invoking task's cpuset.

If this task wasn't in a cpuset that allowed the requested memory nodes, the
memory migration would happen to some other nodes that were in the invoking
task's cpuset.  This was usually surprising and puzzling behaviour: Why didn't
the pages move?  Why did the pages move -there-?

To fix this, temporarily change the invoking task's 'mems_allowed' task_struct
field to the nodes the migrating task is moving to, so that new pages can be
allocated there.
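
The idea can be sketched in user-space C as follows. This is a simplified
model, not the patch itself: demo_task, demo_alloc_page_node() and
demo_migrate() are hypothetical names, the allocator just picks the lowest
allowed node, and the sketch restores the saved mask directly, which may
differ from how the kernel re-establishes the task's allowed nodes.

```c
#include <stddef.h>

typedef unsigned long demo_nodemask;   /* bit n set => node n usable */

struct demo_task {
    demo_nodemask mems_allowed;
};

/* Hypothetical allocator: lowest-numbered node the task may use, or -1. */
static int demo_alloc_page_node(const struct demo_task *t)
{
    for (int n = 0; n < (int)(8 * sizeof(demo_nodemask)); n++)
        if (t->mems_allowed & (1UL << n))
            return n;
    return -1;
}

/* Allocate the destination page as if the invoker were confined to 'to'. */
static int demo_migrate(struct demo_task *invoker, demo_nodemask to)
{
    demo_nodemask saved = invoker->mems_allowed;
    int node;

    invoker->mems_allowed = to;            /* temporary override */
    node = demo_alloc_page_node(invoker);  /* new page lands within 'to' */
    invoker->mems_allowed = saved;         /* put the invoker's mask back */
    return node;
}
```

With an invoker limited to node 0 and a target mask of nodes 2 and 3,
demo_migrate() allocates on node 2, and the invoker's own mask is unchanged
afterwards.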

Signed-off-by: Paul Jackson 
Acked-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] cpuset: unsafe mm reference fix

2006-03-31T20:18:55+00:00

Fix an unsafe reference to a task's mm struct by moving the reference into a
nearby, properly guarded code block.

Signed-off-by: Paul Jackson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] cpuset: task_lock comment fix

2006-03-31T20:18:55+00:00

Fix cpuset comment involving the case of a task's cpuset pointer being NULL.
Thanks to "the_top_cpuset_hack", this code no longer sees NULL task->cpuset
pointers.

Signed-off-by: Paul Jackson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] cpuset: remove useless local variable initialization

2006-03-24T15:33:24+00:00

Remove a useless variable initialization in __cpuset_zone_allowed().  The
local variable 'allowed' is unconditionally set before use, later on in the
code, so it does not need to be initialized.

This does not seem to change the generated code, as the compiler optimizes
out the superfluous assignment anyway.

Signed-off-by: Paul Jackson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] cpuset: don't need to mark cpuset_mems_generation atomic

2006-03-24T15:33:24+00:00

Drop the atomic_t marking on the cpuset static global
cpuset_mems_generation.  Since all access to it is guarded by the global
manage_mutex, there is no need for further serialization of this value.

Signed-off-by: Paul Jackson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] cpuset: remove unnecessary NULL check

2006-03-24T15:33:23+00:00

Remove a no longer needed test for NULL cpuset pointer, with a little
comment explaining why the test isn't needed.

Signed-off-by: Paul Jackson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] cpuset memory spread basic implementation

2006-03-24T15:33:22+00:00

This patch provides the implementation and cpuset interface for an alternative
memory allocation policy that can be applied to certain kinds of memory
allocations, such as the page cache (file system buffers) and some slab caches
(such as inode caches).

The policy is called "memory spreading." If enabled, it spreads out these
kinds of memory allocations over all the nodes allowed to a task, instead of
preferring to place them on the node where the task is executing.

All other kinds of allocations, including anonymous pages for a task's stack
and data regions, are not affected by this policy choice, and continue to be
allocated preferring the node local to execution, as modified by the NUMA
mempolicy.

There are two boolean flag files per cpuset that control where the kernel
allocates pages for the file system buffers and related in-kernel data
structures.  They are called 'memory_spread_page' and 'memory_spread_slab'.

If the per-cpuset boolean flag file 'memory_spread_page' is set, then the
kernel will spread the file system buffers (page cache) evenly over all the
nodes that the faulting task is allowed to use, instead of preferring to put
those pages on the node where the task is running.

If the per-cpuset boolean flag file 'memory_spread_slab' is set, then the
kernel will spread some file system related slab caches, such as those for
inodes and dentries, evenly over all the nodes that the faulting task is
allowed to use, instead of preferring to put those pages on the node where
the task is running.

The implementation is simple.  Setting the cpuset flags 'memory_spread_page'
or 'memory_spread_slab' turns on the per-process flags PF_SPREAD_PAGE or
PF_SPREAD_SLAB, respectively, for each task that is in the cpuset or
subsequently joins that cpuset.  In subsequent patches, the page allocation
calls for the affected page cache and slab caches are modified to perform an
inline check for these flags, and if set, a call to a new routine
cpuset_mem_spread_node() returns the node to prefer for the allocation.

The cpuset_mem_spread_node() routine is also simple.  It uses the value of a
per-task rotor cpuset_mem_spread_rotor to select the next node in the current
task's mems_allowed to prefer for the allocation.
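
A small user-space sketch of the flag check and the rotor described above.
It is an approximation under stated assumptions: nodes are bits in a 64-bit
mask, the demo_* names and DEMO_* constants are hypothetical, and the mask
is assumed to be non-empty.

```c
#define DEMO_MAX_NODES      64
#define DEMO_PF_SPREAD_PAGE 0x1u  /* hypothetical flag value */

struct demo_task {
    unsigned int flags;
    unsigned long long mems_allowed;  /* bit n set => node n allowed */
    int rotor;                        /* last node handed out */
    int local_node;                   /* node the task is running on */
};

/* First allowed node strictly after 'after', or DEMO_MAX_NODES if none. */
static int demo_next_node(int after, unsigned long long mask)
{
    for (int n = after + 1; n < DEMO_MAX_NODES; n++)
        if (mask & (1ULL << n))
            return n;
    return DEMO_MAX_NODES;
}

/* Advance the per-task rotor, wrapping to the first allowed node. */
static int demo_mem_spread_node(struct demo_task *t)
{
    int node = demo_next_node(t->rotor, t->mems_allowed);

    if (node == DEMO_MAX_NODES)
        node = demo_next_node(-1, t->mems_allowed);
    t->rotor = node;
    return node;
}

/* Inline-style check: spread only if the per-task flag is set. */
static int demo_page_cache_node(struct demo_task *t)
{
    if (t->flags & DEMO_PF_SPREAD_PAGE)
        return demo_mem_spread_node(t);
    return t->local_node;
}
```

For a task allowed nodes 1, 3 and 4 with the flag set, successive calls to
demo_page_cache_node() cycle through those nodes in order and then wrap;
with the flag clear, the local node is always returned.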

This policy can provide substantial improvements for jobs that need to place
thread local data on the corresponding node, but that need to access large
file system data sets that need to be spread across the several nodes in the
job's cpuset in order to fit.  Without this patch, especially for jobs that
might have one thread reading in the data set, the memory allocation across
the nodes in the job's cpuset can become very uneven.

A couple of Copyright year ranges are updated as well.  And a couple of email
addresses that can be found in the MAINTAINERS file are removed.

Signed-off-by: Paul Jackson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] cpuset use combined atomic_inc_return calls

2006-03-24T15:33:22+00:00

Replace pairs of atomic operation calls with a single call to
atomic_inc_return, saving a few bytes of source and kernel text.
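
As a user-space analogy only, using C11 <stdatomic.h> rather than the
kernel's atomic_t API: the sketch below assumes the replaced pair was an
increment followed by a separate read, which the text above does not spell
out. The demo_* names are hypothetical.

```c
#include <stdatomic.h>

static atomic_int demo_generation;  /* zero-initialized static counter */

/* Two-call form: increment, then read the value back separately. */
static int demo_bump_two_calls(void)
{
    atomic_fetch_add(&demo_generation, 1);
    return atomic_load(&demo_generation);
}

/* One-call form: atomic_fetch_add() returns the old value, so adding 1
 * yields the new value, which is what an "increment and return" does. */
static int demo_bump_one_call(void)
{
    return atomic_fetch_add(&demo_generation, 1) + 1;
}
```

Both helpers return the same value when called from a single thread; the
one-call form is simply shorter.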

Signed-off-by: Paul Jackson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds