<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/buffer.c, branch v2.6.27-rc4</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>fs: rename buffer trylock</title>
<updated>2008-08-05T04:56:09+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-08-02T10:02:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ca5de404ff036a29b25e9a83f6919c9f606c5841'/>
<id>ca5de404ff036a29b25e9a83f6919c9f606c5841</id>
<content type='text'>
Like the page lock change, this also requires a name change, so convert the
raw test_and_set bitop to a trylock.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Like the page lock change, this also requires a name change, so convert the
raw test_and_set bitop to a trylock.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/buffer.c: uninline __remove_assoc_queue()</title>
<updated>2008-07-30T16:41:46+00:00</updated>
<author>
<name>Thomas Petazzoni</name>
<email>thomas.petazzoni@free-electrons.com</email>
</author>
<published>2008-07-30T05:33:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=dbacefc9c4f6bd365243db379473ab7041656d90'/>
<id>dbacefc9c4f6bd365243db379473ab7041656d90</id>
<content type='text'>
Uninline the __remove_assoc_queue() function in fs/buffer.c, called at too
many places and too long to really be inlined.  Size results:

   text	   data	    bss	    dec	    hex	filename
1134606	 118840	 212992	1466438	 166046	vmlinux.old
1134303	 118840	 212992	1466135	 165f17	vmlinux
   -303       0       0    -303    -12F +/-

This patch is part of the Linux Tiny project and was originally
written by Matt Mackall &lt;mpm@selenic.com&gt;.

Signed-off-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Cc: Matt Mackall &lt;mpm@selenic.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Uninline the __remove_assoc_queue() function in fs/buffer.c, called at too
many places and too long to really be inlined.  Size results:

   text	   data	    bss	    dec	    hex	filename
1134606	 118840	 212992	1466438	 166046	vmlinux.old
1134303	 118840	 212992	1466135	 165f17	vmlinux
   -303       0       0    -303    -12F +/-

This patch is part of the Linux Tiny project and was originally
written by Matt Mackall &lt;mpm@selenic.com&gt;.

Signed-off-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Cc: Matt Mackall &lt;mpm@selenic.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: pagecache usage optimization for pagesize!=blocksize</title>
<updated>2008-07-28T23:30:21+00:00</updated>
<author>
<name>Hisashi Hifumi</name>
<email>hifumi.hisashi@oss.ntt.co.jp</email>
</author>
<published>2008-07-28T22:46:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8ab22b9abb5c55413802e4adc9aa6223324547c3'/>
<id>8ab22b9abb5c55413802e4adc9aa6223324547c3</id>
<content type='text'>
When we read part of a file through the pagecache and a page exists at the
corresponding index but is not uptodate, read IO is issued to bring the
page uptodate.

I think this is fine for pagesize == blocksize environments, but there is
room for improvement when pagesize != blocksize.  In that case a page can
have multiple buffers, and even if the page is not uptodate, some of its
buffers can be uptodate.

So I suggest that when all buffers which correspond to the part of the file
that we want to read are uptodate, we use this page and copy data from it
to the user buffer even if the page itself is not uptodate.  This can
reduce read IO and improve system throughput.

I wrote a benchmark program and measured the results with it.

The benchmark does the following:

  1: mount and open a test file.

  2: create a 512MB file.

  3: close the file and umount.

  4: mount and open the test file again.

  5: pwrite randomly 300000 times on the test file.  The offset is
     aligned to the IO size (1024 bytes).

  6: measure the time taken to pread randomly 100000 times on the test
     file.

The result was:

	2.6.26:         330 sec
	2.6.26-patched: 226 sec

Arch: i386
Filesystem: ext3
Blocksize: 1024 bytes
Memory: 1GB

On ext3/4, files are written through buffers, one per block.  So mixed
random read/write workloads, and random reads that follow random writes,
benefit from this patch when pagesize != blocksize, as the test result
above shows.

The benchmark program is as follows:

#include &lt;stdio.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;
#include &lt;time.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/mount.h&gt;

#define LEN 1024
#define LOOP (1024*512) /* LEN * LOOP = 512MB */

int main(void)
{
	unsigned long i, offset, filesize;
	int fd;
	char buf[LEN];
	time_t t1, t2;

	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) &lt; 0) {
		perror("cannot mount");
		exit(1);
	}
	memset(buf, 0, LEN);
	/* O_CREAT requires a mode argument; 0644 is an arbitrary choice */
	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
	if (fd &lt; 0) {
		perror("cannot open file");
		exit(1);
	}
	/* create the 512MB test file */
	for (i = 0; i &lt; LOOP; i++)
		write(fd, buf, LEN);
	close(fd);
	/* remount, so the test starts with a cold pagecache for this file */
	if (umount("/root/test1/") &lt; 0) {
		perror("cannot umount");
		exit(1);
	}
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) &lt; 0) {
		perror("cannot mount");
		exit(1);
	}
	fd = open("/root/test1/testfile", O_RDWR);
	if (fd &lt; 0) {
		perror("cannot open file");
		exit(1);
	}

	filesize = LEN * LOOP;
	/* random writes at LEN-aligned offsets */
	for (i = 0; i &lt; 300000; i++) {
		offset = (random() % filesize) &amp; (~(LEN - 1));
		pwrite(fd, buf, LEN, offset);
	}
	printf("start test\n");
	time(&amp;t1);
	/* timed phase: random reads at LEN-aligned offsets */
	for (i = 0; i &lt; 100000; i++) {
		offset = (random() % filesize) &amp; (~(LEN - 1));
		pread(fd, buf, LEN, offset);
	}
	time(&amp;t2);
	printf("%ld sec\n", (long)(t2 - t1));
	close(fd);
	if (umount("/root/test1/") &lt; 0) {
		perror("cannot umount");
		exit(1);
	}
	return 0;
}

Signed-off-by: Hisashi Hifumi &lt;hifumi.hisashi@oss.ntt.co.jp&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Jan Kara &lt;jack@ucw.cz&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we read part of a file through the pagecache and a page exists at the
corresponding index but is not uptodate, read IO is issued to bring the
page uptodate.

I think this is fine for pagesize == blocksize environments, but there is
room for improvement when pagesize != blocksize.  In that case a page can
have multiple buffers, and even if the page is not uptodate, some of its
buffers can be uptodate.

So I suggest that when all buffers which correspond to the part of the file
that we want to read are uptodate, we use this page and copy data from it
to the user buffer even if the page itself is not uptodate.  This can
reduce read IO and improve system throughput.

I wrote a benchmark program and measured the results with it.

The benchmark does the following:

  1: mount and open a test file.

  2: create a 512MB file.

  3: close the file and umount.

  4: mount and open the test file again.

  5: pwrite randomly 300000 times on the test file.  The offset is
     aligned to the IO size (1024 bytes).

  6: measure the time taken to pread randomly 100000 times on the test
     file.

The result was:

	2.6.26:         330 sec
	2.6.26-patched: 226 sec

Arch: i386
Filesystem: ext3
Blocksize: 1024 bytes
Memory: 1GB

On ext3/4, files are written through buffers, one per block.  So mixed
random read/write workloads, and random reads that follow random writes,
benefit from this patch when pagesize != blocksize, as the test result
above shows.

The benchmark program is as follows:

#include &lt;stdio.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;
#include &lt;time.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/mount.h&gt;

#define LEN 1024
#define LOOP (1024*512) /* LEN * LOOP = 512MB */

int main(void)
{
	unsigned long i, offset, filesize;
	int fd;
	char buf[LEN];
	time_t t1, t2;

	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) &lt; 0) {
		perror("cannot mount");
		exit(1);
	}
	memset(buf, 0, LEN);
	/* O_CREAT requires a mode argument; 0644 is an arbitrary choice */
	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
	if (fd &lt; 0) {
		perror("cannot open file");
		exit(1);
	}
	/* create the 512MB test file */
	for (i = 0; i &lt; LOOP; i++)
		write(fd, buf, LEN);
	close(fd);
	/* remount, so the test starts with a cold pagecache for this file */
	if (umount("/root/test1/") &lt; 0) {
		perror("cannot umount");
		exit(1);
	}
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) &lt; 0) {
		perror("cannot mount");
		exit(1);
	}
	fd = open("/root/test1/testfile", O_RDWR);
	if (fd &lt; 0) {
		perror("cannot open file");
		exit(1);
	}

	filesize = LEN * LOOP;
	/* random writes at LEN-aligned offsets */
	for (i = 0; i &lt; 300000; i++) {
		offset = (random() % filesize) &amp; (~(LEN - 1));
		pwrite(fd, buf, LEN, offset);
	}
	printf("start test\n");
	time(&amp;t1);
	/* timed phase: random reads at LEN-aligned offsets */
	for (i = 0; i &lt; 100000; i++) {
		offset = (random() % filesize) &amp; (~(LEN - 1));
		pread(fd, buf, LEN, offset);
	}
	time(&amp;t2);
	printf("%ld sec\n", (long)(t2 - t1));
	close(fd);
	if (umount("/root/test1/") &lt; 0) {
		perror("cannot umount");
		exit(1);
	}
	return 0;
}

Signed-off-by: Hisashi Hifumi &lt;hifumi.hisashi@oss.ntt.co.jp&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Jan Kara &lt;jack@ucw.cz&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Use WARN() in fs/</title>
<updated>2008-07-26T19:00:07+00:00</updated>
<author>
<name>Arjan van de Ven</name>
<email>arjan@linux.intel.com</email>
</author>
<published>2008-07-26T02:45:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=5c752ad9f35910ff1912b3f3ae82878178ddc432'/>
<id>5c752ad9f35910ff1912b3f3ae82878178ddc432</id>
<content type='text'>
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.

Signed-off-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.

Signed-off-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>SL*B: drop kmem cache argument from constructor</title>
<updated>2008-07-26T19:00:07+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2008-07-26T02:45:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=51cc50685a4275c6a02653670af9f108a64e01cf'/>
<id>51cc50685a4275c6a02653670af9f108a64e01cf</id>
<content type='text'>
The kmem cache passed to a constructor is only needed for constructors that
are themselves multiplexers.  Nobody uses this "feature", nor does anybody
use the passed kmem cache in a non-trivial way, so pass only a pointer to
the object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is a flag day, yes.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Acked-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Jon Tollefson &lt;kniht@linux.vnet.ibm.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Matt Mackall &lt;mpm@selenic.com&gt;
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The kmem cache passed to a constructor is only needed for constructors that
are themselves multiplexers.  Nobody uses this "feature", nor does anybody
use the passed kmem cache in a non-trivial way, so pass only a pointer to
the object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is a flag day, yes.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Acked-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Jon Tollefson &lt;kniht@linux.vnet.ibm.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Matt Mackall &lt;mpm@selenic.com&gt;
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: spinlock tree_lock</title>
<updated>2008-07-26T19:00:06+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-07-26T02:45:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=19fd6231279be3c3bdd02ed99f9b0eb195978064'/>
<id>19fd6231279be3c3bdd02ed99f9b0eb195978064</id>
<content type='text'>
mapping-&gt;tree_lock has no read lockers.  Convert the lock from an rwlock
to a spinlock.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Reviewed-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
mapping-&gt;tree_lock has no read lockers.  Convert the lock from an rwlock
to a spinlock.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Reviewed-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'generic-ipi' into generic-ipi-for-linus</title>
<updated>2008-07-15T19:55:59+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2008-07-15T19:55:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=1a781a777b2f6ac46523fe92396215762ced624d'/>
<id>1a781a777b2f6ac46523fe92396215762ced624d</id>
<content type='text'>
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: add hooks for ext4's delayed allocation support</title>
<updated>2008-07-11T23:27:31+00:00</updated>
<author>
<name>Alex Tomas</name>
<email>alex@clusterfs.com</email>
</author>
<published>2008-07-11T23:27:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=29a814d2ee0e43c2980f33f91c1311ec06c0aa35'/>
<id>29a814d2ee0e43c2980f33f91c1311ec06c0aa35</id>
<content type='text'>
Export mpage_bio_submit() and __mpage_writepage() for the benefit of
ext4's delayed allocation support.  Also change __block_write_full_page()
so that for buffers that have the BH_Delay flag set it calls
get_block() to get the physical block allocated, just as in the
!BH_Mapped case.

Signed-off-by: Alex Tomas &lt;alex@clusterfs.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Export mpage_bio_submit() and __mpage_writepage() for the benefit of
ext4's delayed allocation support.  Also change __block_write_full_page()
so that for buffers that have the BH_Delay flag set it calls
get_block() to get the physical block allocated, just as in the
!BH_Mapped case.

Signed-off-by: Alex Tomas &lt;alex@clusterfs.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: Move mark_inode_dirty() from under page lock in generic_write_end()</title>
<updated>2008-07-11T23:27:31+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2008-07-11T23:27:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=c7d206b3379f7d6462e778b74f475c470ee3dcaf'/>
<id>c7d206b3379f7d6462e778b74f475c470ee3dcaf</id>
<content type='text'>
There's no need to call mark_inode_dirty() under the page lock in
generic_write_end().  It unnecessarily lengthens the page lock hold time
and, more importantly, it forces a locking order between the page lock and
transaction start for journaling filesystems.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There's no need to call mark_inode_dirty() under the page lock in
generic_write_end().  It unnecessarily lengthens the page lock hold time
and, more importantly, it forces a locking order between the page lock and
transaction start for journaling filesystems.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Properly notify block layer of sync writes</title>
<updated>2008-07-01T07:07:34+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2008-07-01T07:07:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=18ce3751ccd488c78d3827e9f6bf54e6322676fb'/>
<id>18ce3751ccd488c78d3827e9f6bf54e6322676fb</id>
<content type='text'>
fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.

This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large amount of
synchronous IO in the system:

INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald     D ffff810010820978  6712 20558      2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[&lt;ffffffff803ba6f2&gt;] kobject_get+0x12/0x17
[&lt;ffffffff80247537&gt;] getnstimeofday+0x2f/0x83
[&lt;ffffffff8029c1ac&gt;] sync_buffer+0x0/0x3f
[&lt;ffffffff8066d195&gt;] io_schedule+0x5d/0x9f
[&lt;ffffffff8029c1e7&gt;] sync_buffer+0x3b/0x3f
[&lt;ffffffff8066d3f0&gt;] __wait_on_bit+0x40/0x6f
[&lt;ffffffff8029c1ac&gt;] sync_buffer+0x0/0x3f
[&lt;ffffffff8066d48b&gt;] out_of_line_wait_on_bit+0x6c/0x78
[&lt;ffffffff80243909&gt;] wake_bit_function+0x0/0x23
[&lt;ffffffff8029e3ad&gt;] sync_dirty_buffer+0x98/0xcb
[&lt;ffffffff8030056b&gt;] journal_commit_transaction+0x97d/0xcb6
[&lt;ffffffff8023a676&gt;] lock_timer_base+0x26/0x4b
[&lt;ffffffff8030300a&gt;] kjournald+0xc1/0x1fb
[&lt;ffffffff802438db&gt;] autoremove_wake_function+0x0/0x2e
[&lt;ffffffff80302f49&gt;] kjournald+0x0/0x1fb
[&lt;ffffffff802437bb&gt;] kthread+0x47/0x74
[&lt;ffffffff8022de51&gt;] schedule_tail+0x28/0x5d
[&lt;ffffffff8020cac8&gt;] child_rip+0xa/0x12
[&lt;ffffffff80243774&gt;] kthread+0x0/0x74
[&lt;ffffffff8020cabe&gt;] child_rip+0x0/0x12

Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.

This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large amount of
synchronous IO in the system:

INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald     D ffff810010820978  6712 20558      2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[&lt;ffffffff803ba6f2&gt;] kobject_get+0x12/0x17
[&lt;ffffffff80247537&gt;] getnstimeofday+0x2f/0x83
[&lt;ffffffff8029c1ac&gt;] sync_buffer+0x0/0x3f
[&lt;ffffffff8066d195&gt;] io_schedule+0x5d/0x9f
[&lt;ffffffff8029c1e7&gt;] sync_buffer+0x3b/0x3f
[&lt;ffffffff8066d3f0&gt;] __wait_on_bit+0x40/0x6f
[&lt;ffffffff8029c1ac&gt;] sync_buffer+0x0/0x3f
[&lt;ffffffff8066d48b&gt;] out_of_line_wait_on_bit+0x6c/0x78
[&lt;ffffffff80243909&gt;] wake_bit_function+0x0/0x23
[&lt;ffffffff8029e3ad&gt;] sync_dirty_buffer+0x98/0xcb
[&lt;ffffffff8030056b&gt;] journal_commit_transaction+0x97d/0xcb6
[&lt;ffffffff8023a676&gt;] lock_timer_base+0x26/0x4b
[&lt;ffffffff8030300a&gt;] kjournald+0xc1/0x1fb
[&lt;ffffffff802438db&gt;] autoremove_wake_function+0x0/0x2e
[&lt;ffffffff80302f49&gt;] kjournald+0x0/0x1fb
[&lt;ffffffff802437bb&gt;] kthread+0x47/0x74
[&lt;ffffffff8022de51&gt;] schedule_tail+0x28/0x5d
[&lt;ffffffff8020cac8&gt;] child_rip+0xa/0x12
[&lt;ffffffff80243774&gt;] kthread+0x0/0x74
[&lt;ffffffff8020cabe&gt;] child_rip+0x0/0x12

Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
