<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/infiniband, branch v2.6.22-rc4</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>IB/cm: Fix stale connection detection</title>
<updated>2007-05-29T23:07:09+00:00</updated>
<author>
<name>Sean Hefty</name>
<email>sean.hefty@intel.com</email>
</author>
<published>2007-05-22T00:38:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d998ccce020e2cfcf11c6b57503532930ede2894'/>
<id>d998ccce020e2cfcf11c6b57503532930ede2894</id>
<content type='text'>
The ib_cm can incorrectly detect a stale connection (a new connection
request for a QPN that is already connected) as a duplicate connection
request.  Separate the handling of potential duplicate REQs from stale
connections.

Signed-off-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The ib_cm can incorrectly detect a stale connection (a new connection
request for a QPN that is already connected) as a duplicate connection
request.  Separate the handling of potential duplicate REQs from stale
connections.

Signed-off-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB/cm: Fix performance regression on Mellanox</title>
<updated>2007-05-29T23:07:09+00:00</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@dev.mellanox.co.il</email>
</author>
<published>2007-05-28T11:37:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ec56dc0b7f6c3fec20bbc2e98ff1a06edf2fc9b9'/>
<id>ec56dc0b7f6c3fec20bbc2e98ff1a06edf2fc9b9</id>
<content type='text'>
commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe
performance regression on Mellanox cards, because keeping a QP in the
error state for extended periods of time moves hardware to the slow
path (until the QP is destroyed).  For example, MPI latency goes from
~3 usecs to ~7 usecs.

Fix this by posting a send WR on one of the QPs that are being
flushed, instead of using a separate drain QP that is kept in the
error state.

This fixes bug &lt;https://bugs.openfabrics.org/show_bug.cgi?id=636&gt;,
reported and bisected by Scott Weitzenkamp at Cisco and debugged by
Sasha Mikheev at Voltaire.

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe
performance regression on Mellanox cards, because keeping a QP in the
error state for extended periods of time moves hardware to the slow
path (until the QP is destroyed).  For example, MPI latency goes from
~3 usecs to ~7 usecs.

Fix this by posting a send WR on one of the QPs that are being
flushed, instead of using a separate drain QP that is kept in the
error state.

This fixes bug &lt;https://bugs.openfabrics.org/show_bug.cgi?id=636&gt;,
reported and bisected by Scott Weitzenkamp at Cisco and debugged by
Sasha Mikheev at Voltaire.

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/mthca: Fix handling of send CQE with error for QPs connected to SRQ</title>
<updated>2007-05-29T23:07:09+00:00</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@dev.mellanox.co.il</email>
</author>
<published>2007-05-27T15:06:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8b7e15772a286d0ef8e4f8eca422ce5368b6fa97'/>
<id>8b7e15772a286d0ef8e4f8eca422ce5368b6fa97</id>
<content type='text'>
mthca_free_err_wqe() currently treats both send and receive CQEs
identically if a QP is using an SRQ.  But for Tavor hardware, send
CQEs with error can be chained together even if the RQ is part of SRQ,
so we may miss some CQEs.

Fix by following the WQE chain for all send CQEs even for non-SRQ QPs.

This fixes crashes in IPoIB CM:
&lt;https://bugs.openfabrics.org//show_bug.cgi?id=604&gt;

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
mthca_free_err_wqe() currently treats both send and receive CQEs
identically if a QP is using an SRQ.  But for Tavor hardware, send
CQEs with error can be chained together even if the RQ is part of SRQ,
so we may miss some CQEs.

Fix by following the WQE chain for all send CQEs even for non-SRQ QPs.

This fixes crashes in IPoIB CM:
&lt;https://bugs.openfabrics.org//show_bug.cgi?id=604&gt;

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB/cm: Drain cq in ipoib_cm_dev_stop()</title>
<updated>2007-05-24T21:02:40+00:00</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@dev.mellanox.co.il</email>
</author>
<published>2007-05-24T15:32:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2dfbfc37121d307e1f1d24c2979382cb17b19347'/>
<id>2dfbfc37121d307e1f1d24c2979382cb17b19347</id>
<content type='text'>
Since NAPI polling is disabled while ipoib_cm_dev_stop() is running,
ipoib_cm_dev_stop() must poll the CQ itself in order to see the
packets draining.

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since NAPI polling is disabled while ipoib_cm_dev_stop() is running,
ipoib_cm_dev_stop() must poll the CQ itself in order to see the
packets draining.

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB/cm: Fix timeout check in ipoib_cm_dev_stop()</title>
<updated>2007-05-24T21:02:39+00:00</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@dev.mellanox.co.il</email>
</author>
<published>2007-05-24T21:02:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8fd357a6e3375083f7d321413eb8f6739491f342'/>
<id>8fd357a6e3375083f7d321413eb8f6739491f342</id>
<content type='text'>
time_after() was used backwards, so the timeout occurred immediately.

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
time_after() was used backwards, so the timeout occurred immediately.

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/ehca: Fix number of send WRs reported for new QP</title>
<updated>2007-05-24T21:02:39+00:00</updated>
<author>
<name>Stefan Roscher</name>
<email>stefan.roscher@de.ibm.com</email>
</author>
<published>2007-05-24T14:51:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=65a2c841d68ae3402ea4cad8d00fe4b9b0a5bc80'/>
<id>65a2c841d68ae3402ea4cad8d00fe4b9b0a5bc80</id>
<content type='text'>
Due to a typo, the driver was reporting the wrong number of "actual send
WRs" after ehca_create_qp().

Signed-off-by: Joachim Fenkes &lt;fenkes@de.ibm.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Due to a typo, the driver was reporting the wrong number of "actual send
WRs" after ehca_create_qp().

Signed-off-by: Joachim Fenkes &lt;fenkes@de.ibm.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/mlx4: Initialize send queue entry ownership bits</title>
<updated>2007-05-24T21:02:38+00:00</updated>
<author>
<name>Eli Cohen</name>
<email>eli@mellanox.co.il</email>
</author>
<published>2007-05-24T13:05:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=c0be5fb5f835110652911ea8b88ad78f841e5b45'/>
<id>c0be5fb5f835110652911ea8b88ad78f841e5b45</id>
<content type='text'>
We need to initialize the owner bit of send queue WQEs to hardware 
ownership whenever the QP is modified from reset to init, not just 
when the QP is first allocated.  This avoids having the hardware 
process stale WQEs when the QP is moved to reset but not destroyed and 
then modified to init again. 

Signed-off-by: Eli Cohen &lt;eli@mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We need to initialize the owner bit of send queue WQEs to hardware 
ownership whenever the QP is modified from reset to init, not just 
when the QP is first allocated.  This avoids having the hardware 
process stale WQEs when the QP is moved to reset but not destroyed and 
then modified to init again. 

Signed-off-by: Eli Cohen &lt;eli@mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/mlx4: Don't allocate RQ doorbell if using SRQ</title>
<updated>2007-05-23T22:16:08+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>rolandd@cisco.com</email>
</author>
<published>2007-05-23T22:16:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=02d89b87081f516ad3993637f9b75db0d9786554'/>
<id>02d89b87081f516ad3993637f9b75db0d9786554</id>
<content type='text'>
If a QP is attached to a shared receive queue (SRQ), then it doesn't
have a receive queue (RQ).  So don't allocate an RQ doorbell (or map a
doorbell from userspace for userspace QPs) for that QP.

Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a QP is attached to a shared receive queue (SRQ), then it doesn't
have a receive queue (RQ).  So don't allocate an RQ doorbell (or map a
doorbell from userspace for userspace QPs) for that QP.

Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband</title>
<updated>2007-05-21T23:19:32+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@woody.linux-foundation.org</email>
</author>
<published>2007-05-21T23:19:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8aee74c8ee875448cc6d1cf995c9469eb60ae515'/>
<id>8aee74c8ee875448cc6d1cf995c9469eb60ae515</id>
<content type='text'>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/cm: Improve local id allocation
  IPoIB/cm: Fix SRQ WR leak
  IB/ipoib: Fix typos in error messages
  IB/mlx4: Check if SRQ is full when posting receive
  IB/mlx4: Pass send queue sizes from userspace to kernel
  IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
  mlx4_core: Fix array overrun in dump_dev_cap_flags()
  IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
  IB/mthca: Fix RESET to ERROR transition
  IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
  IB/mthca: Set GRH:HopLimit when building MLX headers
  IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
  IB/mthca: Fix use-after-free on device restart
  IB/ehca: Return proper error code if register_mr fails
  IPoIB: Handle P_Key table reordering
  IB/core: Use start_port() and end_port()
  IB/core: Add helpers for uncached GID and P_Key searches
  IB/ipath: Fix potential deadlock with multicast spinlocks
  IB/core: Free umem when mm is already gone
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/cm: Improve local id allocation
  IPoIB/cm: Fix SRQ WR leak
  IB/ipoib: Fix typos in error messages
  IB/mlx4: Check if SRQ is full when posting receive
  IB/mlx4: Pass send queue sizes from userspace to kernel
  IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
  mlx4_core: Fix array overrun in dump_dev_cap_flags()
  IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
  IB/mthca: Fix RESET to ERROR transition
  IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
  IB/mthca: Set GRH:HopLimit when building MLX headers
  IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
  IB/mthca: Fix use-after-free on device restart
  IB/ehca: Return proper error code if register_mr fails
  IPoIB: Handle P_Key table reordering
  IB/core: Use start_port() and end_port()
  IB/core: Add helpers for uncached GID and P_Key searches
  IB/ipath: Fix potential deadlock with multicast spinlocks
  IB/core: Free umem when mm is already gone
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/cm: Improve local id allocation</title>
<updated>2007-05-21T20:41:29+00:00</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@dev.mellanox.co.il</email>
</author>
<published>2007-05-21T16:06:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=9f81036c54ed1f860d2807c5a6aa4f2b30c21204'/>
<id>9f81036c54ed1f860d2807c5a6aa4f2b30c21204</id>
<content type='text'>
The IB CM uses an idr for local id allocations, with a running counter
as start_id.  This fails to generate distinct ids if

1. An id is constantly created and destroyed
2. A chunk of ids just beyond the current next_id value is occupied

This in turn leads to an increased chance of connection request being
mis-detected as a duplicate, sometimes for several retries, until
next_id gets past the block of allocated ids. This has been observed
in practice.

As a fix, remember the last id allocated and start immediately above it.
This also fixes a problem with the old code, where next_id might
overflow and become negative.

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Acked-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The IB CM uses an idr for local id allocations, with a running counter
as start_id.  This fails to generate distinct ids if

1. An id is constantly created and destroyed
2. A chunk of ids just beyond the current next_id value is occupied

This in turn leads to an increased chance of connection request being
mis-detected as a duplicate, sometimes for several retries, until
next_id gets past the block of allocated ids. This has been observed
in practice.

As a fix, remember the last id allocated and start immediately above it.
This also fixes a problem with the old code, where next_id might
overflow and become negative.

Signed-off-by: Michael S. Tsirkin &lt;mst@dev.mellanox.co.il&gt;
Acked-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
