linux.git/fs/buffer.c, branch v2.6.27-rc4

fs: rename buffer trylock

2008-08-05T04:56:09+00:00

Like the page lock change, this also requires name change, so convert the
raw test_and_set bitop to a trylock.

Signed-off-by: Nick Piggin 
Signed-off-by: Linus Torvalds

fs/buffer.c: uninline __remove_assoc_queue()

2008-07-30T16:41:46+00:00

Uninline the __remove_assoc_queue() function in fs/buffer.c, called at too
many places and too long to really be inlined.  Size results:

   text	   data	    bss	    dec	    hex	filename
1134606	 118840	 212992	1466438	 166046	vmlinux.old
1134303	 118840	 212992	1466135	 165f17	vmlinux
   -303       0       0    -303    -12F +/-

This patch is part of the Linux Tiny project and has been originally
written by Matt Mackall .

Signed-off-by: Thomas Petazzoni 
Cc: Matt Mackall 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

vfs: pagecache usage optimization for pagesize!=blocksize

2008-07-28T23:30:21+00:00

When we read some part of a file through pagecache, if there is a
pagecache of corresponding index but this page is not uptodate, read IO
is issued and this page will be uptodate.

I think this is good for pagesize == blocksize environment but there is
room for improvement on pagesize != blocksize environment.  Because in
this case a page can have multiple buffers and even if a page is not
uptodate, some buffers can be uptodate.

So I suggest that when all buffers which correspond to a part of a file
that we want to read are uptodate, use this pagecache and copy data from
this pagecache to user buffer even if a page is not uptodate.  This can
reduce read IO and improve system throughput.

I wrote a benchmark program and got result number with this program.

This benchmark do:

  1: mount and open a test file.

  2: create a 512MB file.

  3: close a file and umount.

  4: mount and again open a test file.

  5: pwrite randomly 300000 times on a test file.  offset is aligned
     by IO size(1024bytes).

  6: measure time of preading randomly 100000 times on a test file.

The result was:
	2.6.26
        330 sec

	2.6.26-patched
        226 sec

Arch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GB

On ext3/4, a file is written through buffer/block.  So random read/write
mixed workloads or random read after random write workloads are optimized
with this patch under pagesize != blocksize environment.  This test result
showed this.

The benchmark program is as follows:

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define LEN 1024
#define LOOP 1024*512 /* 512MB */

main(void)
{
	unsigned long i, offset, filesize;
	int fd;
	char buf[LEN];
	time_t t1, t2;

	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount\n");
		exit(1);
	}
	memset(buf, 0, LEN);
	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
	if (fd < 0) {
		perror("cannot open file\n");
		exit(1);
	}
	for (i = 0; i < LOOP; i++)
		write(fd, buf, LEN);
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount\n");
		exit(1);
	}
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount\n");
		exit(1);
	}
	fd = open("/root/test1/testfile", O_RDWR);
	if (fd < 0) {
		perror("cannot open file\n");
		exit(1);
	}

	filesize = LEN * LOOP;
	for (i = 0; i < 300000; i++){
		offset = (random() % filesize) & (~(LEN - 1));
		pwrite(fd, buf, LEN, offset);
	}
	printf("start test\n");
	time(&t1);
	for (i = 0; i < 100000; i++){
		offset = (random() % filesize) & (~(LEN - 1));
		pread(fd, buf, LEN, offset);
	}
	time(&t2);
	printf("%ld sec\n", t2-t1);
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount\n");
		exit(1);
	}
}

Signed-off-by: Hisashi Hifumi 
Cc: Nick Piggin 
Cc: Christoph Hellwig 
Cc: Jan Kara 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Use WARN() in fs/

2008-07-26T19:00:07+00:00

Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.

Signed-off-by: Arjan van de Ven 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

SL*B: drop kmem cache argument from constructor

2008-07-26T19:00:07+00:00

Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan 
Acked-by: Pekka Enberg 
Acked-by: Christoph Lameter 
Cc: Jon Tollefson 
Cc: Nick Piggin 
Cc: Matt Mackall 
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: spinlock tree_lock

2008-07-26T19:00:06+00:00

mapping->tree_lock has no read lockers.  convert the lock from an rwlock
to a spinlock.

Signed-off-by: Nick Piggin 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Hugh Dickins 
Cc: "Paul E. McKenney" 
Reviewed-by: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge branch 'generic-ipi' into generic-ipi-for-linus

2008-07-15T19:55:59+00:00

Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar

vfs: add hooks for ext4's delayed allocation support

2008-07-11T23:27:31+00:00

Export mpage_bio_submit() and __mpage_writepage() for the benefit of
ext4's delayed allocation support.   Also change __block_write_full_page
so that if buffers that have the BH_Delay flag set it will call
get_block() to get the physical block allocated, just as in the
!BH_Mapped case.

Signed-off-by: Alex Tomas 
Signed-off-by: "Theodore Ts'o"

vfs: Move mark_inode_dirty() from under page lock in generic_write_end()

2008-07-11T23:27:31+00:00

There's no need to call mark_inode_dirty() under page lock in
generic_write_end(). It unnecessarily makes hold time of page lock longer
and more importantly it forces locking order of page lock and transaction
start for journaling filesystems.

Signed-off-by: Jan Kara 
Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: "Theodore Ts'o"

Properly notify block layer of sync writes

2008-07-01T07:07:34+00:00

fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.

This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large number of
synchronous IO in the system:

INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald     D ffff810010820978  6712 20558      2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[] kobject_get+0x12/0x17
[] getnstimeofday+0x2f/0x83
[] sync_buffer+0x0/0x3f
[] io_schedule+0x5d/0x9f
[] sync_buffer+0x3b/0x3f
[] __wait_on_bit+0x40/0x6f
[] sync_buffer+0x0/0x3f
[] out_of_line_wait_on_bit+0x6c/0x78
[] wake_bit_function+0x0/0x23
[] sync_dirty_buffer+0x98/0xcb
[] journal_commit_transaction+0x97d/0xcb6
[] lock_timer_base+0x26/0x4b
[] kjournald+0xc1/0x1fb
[] autoremove_wake_function+0x0/0x2e
[] kjournald+0x0/0x1fb
[] kthread+0x47/0x74
[] schedule_tail+0x28/0x5d
[] child_rip+0xa/0x12
[] kthread+0x0/0x74
[] child_rip+0x0/0x12

Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.

Signed-off-by: Jens Axboe