<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/futex.c, branch v3.15</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>futex: Make lookup_pi_state more robust</title>
<updated>2014-06-05T19:31:07+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-06-03T12:27:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=54a217887a7b658e2650c3feff22756ab80c7339'/>
<id>54a217887a7b658e2650c3feff22756ab80c7339</id>
<content type='text'>
The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex.  We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address.  This can lead to state leakage and worse under some
circumstances.

Handle the cases explicit:

       Waiter | pi_state | pi-&gt;owner | uTID      | uODIED | ?

  [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
  [2]  NULL   | ---      | ---       | &gt;0        | 0/1    | Valid

  [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid

  [4]  Found  | Found    | NULL      | 0         | 1      | Valid
  [5]  Found  | Found    | NULL      | &gt;0        | 1      | Invalid

  [6]  Found  | Found    | task      | 0         | 1      | Valid

  [7]  Found  | Found    | NULL      | Any       | 0      | Invalid

  [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
  [9]  Found  | Found    | task      | 0         | 0      | Invalid
  [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

 [1] Indicates that the kernel can acquire the futex atomically. We
     came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

 [2] Valid, if TID does not belong to a kernel thread. If no matching
     thread is found then it indicates that the owner TID has died.

 [3] Invalid. The waiter is queued on a non PI futex

 [4] Valid state after exit_robust_list(), which sets the user space
     value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

 [5] The user space value got manipulated between exit_robust_list()
     and exit_pi_state_list()

 [6] Valid state after exit_pi_state_list() which sets the new owner in
     the pi_state but cannot access the user space value.

 [7] pi_state-&gt;owner can only be NULL when the OWNER_DIED bit is set.

 [8] Owner and user space value match

 [9] There is no transient state which sets the user space TID to 0
     except exit_robust_list(), but this is indicated by the
     FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
     TID out of sync.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Will Drewry &lt;wad@chromium.org&gt;
Cc: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex.  We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address.  This can lead to state leakage and worse under some
circumstances.

Handle the cases explicit:

       Waiter | pi_state | pi-&gt;owner | uTID      | uODIED | ?

  [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
  [2]  NULL   | ---      | ---       | &gt;0        | 0/1    | Valid

  [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid

  [4]  Found  | Found    | NULL      | 0         | 1      | Valid
  [5]  Found  | Found    | NULL      | &gt;0        | 1      | Invalid

  [6]  Found  | Found    | task      | 0         | 1      | Valid

  [7]  Found  | Found    | NULL      | Any       | 0      | Invalid

  [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
  [9]  Found  | Found    | task      | 0         | 0      | Invalid
  [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

 [1] Indicates that the kernel can acquire the futex atomically. We
     came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

 [2] Valid, if TID does not belong to a kernel thread. If no matching
     thread is found then it indicates that the owner TID has died.

 [3] Invalid. The waiter is queued on a non PI futex

 [4] Valid state after exit_robust_list(), which sets the user space
     value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

 [5] The user space value got manipulated between exit_robust_list()
     and exit_pi_state_list()

 [6] Valid state after exit_pi_state_list() which sets the new owner in
     the pi_state but cannot access the user space value.

 [7] pi_state-&gt;owner can only be NULL when the OWNER_DIED bit is set.

 [8] Owner and user space value match

 [9] There is no transient state which sets the user space TID to 0
     except exit_robust_list(), but this is indicated by the
     FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
     TID out of sync.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Will Drewry &lt;wad@chromium.org&gt;
Cc: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Always cleanup owner tid in unlock_pi</title>
<updated>2014-06-05T19:31:07+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-06-03T12:27:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e'/>
<id>13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e</id>
<content type='text'>
If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex.  So the owner TID of the current owner
(the unlocker) persists.  That's observable inconsistant state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Will Drewry &lt;wad@chromium.org&gt;
Cc: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex.  So the owner TID of the current owner
(the unlocker) persists.  That's observable inconsistant state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Will Drewry &lt;wad@chromium.org&gt;
Cc: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Validate atomic acquisition in futex_lock_pi_atomic()</title>
<updated>2014-06-05T19:31:07+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-06-03T12:27:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270'/>
<id>b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270</id>
<content type='text'>
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.

Verify whether the futex has waiters associated with kernel state.  If
it has, return -EINVAL.  The state is corrupted already, so no point in
cleaning it up.  Subsequent calls will fail as well.  Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
  	restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Will Drewry &lt;wad@chromium.org&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.

Verify whether the futex has waiters associated with kernel state.  If
it has, return -EINVAL.  The state is corrupted already, so no point in
cleaning it up.  Subsequent calls will fail as well.  Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
  	restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Will Drewry &lt;wad@chromium.org&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)</title>
<updated>2014-06-05T19:31:07+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-06-03T12:27:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e9c243a5a6de0be8e584c604d353412584b592f8'/>
<id>e9c243a5a6de0be8e584c604d353412584b592f8</id>
<content type='text'>
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call.  If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
  	different depending on the mapping ]

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry &lt;wad@chromium.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call.  If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
  	different depending on the mapping ]

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry &lt;wad@chromium.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Prevent attaching to kernel threads</title>
<updated>2014-05-19T12:18:49+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-05-12T20:45:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=f0d71b3dcb8332f7971b5f2363632573e6d9486a'/>
<id>f0d71b3dcb8332f7971b5f2363632573e6d9486a</id>
<content type='text'>
We happily allow userspace to declare a random kernel thread to be the
owner of a user space PI futex.

Found while analysing the fallout of Dave Jones syscall fuzzer.

We also should validate the thread group for private futexes and find
some fast way to validate whether the "alleged" owner has RW access on
the file which backs the SHM, but that's a separate issue.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Darren Hart &lt;darren@dvhart.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Clark Williams &lt;williams@redhat.com&gt;
Cc: Paul McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Carlos ODonell &lt;carlos@redhat.com&gt;
Cc: Jakub Jelinek &lt;jakub@redhat.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We happily allow userspace to declare a random kernel thread to be the
owner of a user space PI futex.

Found while analysing the fallout of Dave Jones syscall fuzzer.

We also should validate the thread group for private futexes and find
some fast way to validate whether the "alleged" owner has RW access on
the file which backs the SHM, but that's a separate issue.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Darren Hart &lt;darren@dvhart.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Clark Williams &lt;williams@redhat.com&gt;
Cc: Paul McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Carlos ODonell &lt;carlos@redhat.com&gt;
Cc: Jakub Jelinek &lt;jakub@redhat.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Add another early deadlock detection check</title>
<updated>2014-05-19T12:18:49+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-05-12T20:45:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=866293ee54227584ffcb4a42f69c1f365974ba7f'/>
<id>866293ee54227584ffcb4a42f69c1f365974ba7f</id>
<content type='text'>
Dave Jones trinity syscall fuzzer exposed an issue in the deadlock
detection code of rtmutex:
  http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com

That underlying issue has been fixed with a patch to the rtmutex code,
but the futex code must not call into rtmutex in that case because
    - it can detect that issue early
    - it avoids a different and more complex fixup for backing out

If the user space variable got manipulated to 0x80000000 which means
no lock holder, but the waiters bit set and an active pi_state in the
kernel is found we can figure out the recursive locking issue by
looking at the pi_state owner. If that is the current task, then we
can safely return -EDEADLK.

The check should have been added in commit 59fa62451 (futex: Handle
futex_pi OWNER_DIED take over correctly) already, but I did not see
the above issue caused by user space manipulation back then.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Darren Hart &lt;darren@dvhart.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Clark Williams &lt;williams@redhat.com&gt;
Cc: Paul McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Carlos ODonell &lt;carlos@redhat.com&gt;
Cc: Jakub Jelinek &lt;jakub@redhat.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Dave Jones trinity syscall fuzzer exposed an issue in the deadlock
detection code of rtmutex:
  http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com

That underlying issue has been fixed with a patch to the rtmutex code,
but the futex code must not call into rtmutex in that case because
    - it can detect that issue early
    - it avoids a different and more complex fixup for backing out

If the user space variable got manipulated to 0x80000000 which means
no lock holder, but the waiters bit set and an active pi_state in the
kernel is found we can figure out the recursive locking issue by
looking at the pi_state owner. If that is the current task, then we
can safely return -EDEADLK.

The check should have been added in commit 59fa62451 (futex: Handle
futex_pi OWNER_DIED take over correctly) already, but I did not see
the above issue caused by user space manipulation back then.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Darren Hart &lt;darren@dvhart.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Clark Williams &lt;williams@redhat.com&gt;
Cc: Paul McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Carlos ODonell &lt;carlos@redhat.com&gt;
Cc: Jakub Jelinek &lt;jakub@redhat.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: update documentation for ordering guarantees</title>
<updated>2014-04-13T00:57:51+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2014-04-09T18:55:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d7e8af1afeffb03ab250b91cd70ba8c701f0f2b7'/>
<id>d7e8af1afeffb03ab250b91cd70ba8c701f0f2b7</id>
<content type='text'>
Commits 11d4616bd07f ("futex: revert back to the explicit waiter
counting code") and 69cd9eba3886 ("futex: avoid race between requeue and
wake") changed some of the finer details of how we think about futexes.
One was a late fix and the other a consequence of overlooking the whole
requeuing logic.

The first change caused our documentation to be incorrect, and the
second made us aware that we need to explicitly add more details to it.

Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commits 11d4616bd07f ("futex: revert back to the explicit waiter
counting code") and 69cd9eba3886 ("futex: avoid race between requeue and
wake") changed some of the finer details of how we think about futexes.
One was a late fix and the other a consequence of overlooking the whole
requeuing logic.

The first change caused our documentation to be incorrect, and the
second made us aware that we need to explicitly add more details to it.

Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: avoid race between requeue and wake</title>
<updated>2014-04-09T15:02:12+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-04-08T22:30:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=69cd9eba38867a493a043bb13eb9b33cad5f1a9a'/>
<id>69cd9eba38867a493a043bb13eb9b33cad5f1a9a</id>
<content type='text'>
Jan Stancek reported:
 "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
  occasionally fails, because some threads fail to wake up.

  Testcase creates 5 threads, which are all waiting on same condition.
  Main thread then calls pthread_cond_broadcast() without holding mutex,
  which calls:

      futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)

  This immediately wakes up single thread A, which unlocks mutex and
  tries to wake up another thread:

      futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)

  If thread A manages to call futex_wake() before any waiters are
  requeued for uaddr2, no other thread is woken up"

The ordering constraints for the hash bucket waiter counting are that
the waiter counts have to be incremented _before_ getting the spinlock
(because the spinlock acts as part of the memory barrier), but the
"requeue" operation didn't honor those rules, and nobody had even
thought about that case.

This fairly simple patch just increments the waiter count for the target
hash bucket (hb2) when requeing a futex before taking the locks.  It
then decrements them again after releasing the lock - the code that
actually moves the futex(es) between hash buckets will do the additional
required waiter count housekeeping.

Reported-and-tested-by: Jan Stancek &lt;jstancek@redhat.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Jan Stancek reported:
 "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
  occasionally fails, because some threads fail to wake up.

  Testcase creates 5 threads, which are all waiting on same condition.
  Main thread then calls pthread_cond_broadcast() without holding mutex,
  which calls:

      futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)

  This immediately wakes up single thread A, which unlocks mutex and
  tries to wake up another thread:

      futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)

  If thread A manages to call futex_wake() before any waiters are
  requeued for uaddr2, no other thread is woken up"

The ordering constraints for the hash bucket waiter counting are that
the waiter counts have to be incremented _before_ getting the spinlock
(because the spinlock acts as part of the memory barrier), but the
"requeue" operation didn't honor those rules, and nobody had even
thought about that case.

This fairly simple patch just increments the waiter count for the target
hash bucket (hb2) when requeing a futex before taking the locks.  It
then decrements them again after releasing the lock - the code that
actually moves the futex(es) between hash buckets will do the additional
required waiter count housekeeping.

Reported-and-tested-by: Jan Stancek &lt;jstancek@redhat.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-03-31T17:59:39+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-03-31T17:59:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=462bf234a82ae1ae9d7628f59bc81022591e1348'/>
<id>462bf234a82ae1ae9d7628f59bc81022591e1348</id>
<content type='text'>
Pull core locking updates from Ingo Molnar:
 "The biggest change is the MCS spinlock generalization changes from Tim
  Chen, Peter Zijlstra, Jason Low et al.  There's also lockdep
  fixes/enhancements from Oleg Nesterov, in particular a false negative
  fix related to lockdep_set_novalidate_class() usage"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  locking/mutex: Fix debug checks
  locking/mutexes: Add extra reschedule point
  locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
  locking/mutexes: Unlock the mutex without the wait_lock
  locking/mutexes: Modify the way optimistic spinners are queued
  locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
  locking: Move mcs_spinlock.h into kernel/locking/
  m68k: Skip futex_atomic_cmpxchg_inatomic() test
  futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
  Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
  lockdep: Change lockdep_set_novalidate_class() to use _and_name
  lockdep: Change mark_held_locks() to check hlock-&gt;check instead of lockdep_no_validate
  lockdep: Don't create the wrong dependency on hlock-&gt;check == 0
  lockdep: Make held_lock-&gt;check and "int check" argument bool
  locking/mcs: Allow architecture specific asm files to be used for contended case
  locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
  sched/wait: Suppress Sparse 'variable shadowing' warning
  hung_task/Documentation: Fix hung_task_warnings description
  locking/mcs: Allow architectures to hook in to contended paths
  locking/mcs: Micro-optimize the MCS code, add extra comments
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull core locking updates from Ingo Molnar:
 "The biggest change is the MCS spinlock generalization changes from Tim
  Chen, Peter Zijlstra, Jason Low et al.  There's also lockdep
  fixes/enhancements from Oleg Nesterov, in particular a false negative
  fix related to lockdep_set_novalidate_class() usage"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  locking/mutex: Fix debug checks
  locking/mutexes: Add extra reschedule point
  locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
  locking/mutexes: Unlock the mutex without the wait_lock
  locking/mutexes: Modify the way optimistic spinners are queued
  locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
  locking: Move mcs_spinlock.h into kernel/locking/
  m68k: Skip futex_atomic_cmpxchg_inatomic() test
  futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
  Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
  lockdep: Change lockdep_set_novalidate_class() to use _and_name
  lockdep: Change mark_held_locks() to check hlock-&gt;check instead of lockdep_no_validate
  lockdep: Don't create the wrong dependency on hlock-&gt;check == 0
  lockdep: Make held_lock-&gt;check and "int check" argument bool
  locking/mcs: Allow architecture specific asm files to be used for contended case
  locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
  sched/wait: Suppress Sparse 'variable shadowing' warning
  hung_task/Documentation: Fix hung_task_warnings description
  locking/mcs: Allow architectures to hook in to contended paths
  locking/mcs: Micro-optimize the MCS code, add extra comments
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: revert back to the explicit waiter counting code</title>
<updated>2014-03-21T05:11:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-03-21T05:11:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=11d4616bd07f38d496bd489ed8fad1dc4d928823'/>
<id>11d4616bd07f38d496bd489ed8fad1dc4d928823</id>
<content type='text'>
Srikar Dronamraju reports that commit b0c29f79ecea ("futexes: Avoid
taking the hb-&gt;lock if there's nothing to wake up") causes java threads
getting stuck on futexes when runing specjbb on a power7 numa box.

The cause appears to be that the powerpc spinlocks aren't using the same
ticket lock model that we use on x86 (and other) architectures, which in
turn result in the "spin_is_locked()" test in hb_waiters_pending()
occasionally reporting an unlocked spinlock even when there are pending
waiters.

So this reinstates Davidlohr Bueso's original explicit waiter counting
code, which I had convinced Davidlohr to drop in favor of figuring out
the pending waiters by just using the existing state of the spinlock and
the wait queue.

Reported-and-tested-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Original-code-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Srikar Dronamraju reports that commit b0c29f79ecea ("futexes: Avoid
taking the hb-&gt;lock if there's nothing to wake up") causes java threads
getting stuck on futexes when runing specjbb on a power7 numa box.

The cause appears to be that the powerpc spinlocks aren't using the same
ticket lock model that we use on x86 (and other) architectures, which in
turn result in the "spin_is_locked()" test in hb_waiters_pending()
occasionally reporting an unlocked spinlock even when there are pending
waiters.

So this reinstates Davidlohr Bueso's original explicit waiter counting
code, which I had convinced Davidlohr to drop in favor of figuring out
the pending waiters by just using the existing state of the spinlock and
the wait queue.

Reported-and-tested-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Original-code-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
