.TH TORUS\-2QOS 8 "November 10, 2010" "OpenIB" "OpenIB Management"
.
.SH NAME
torus\-2QoS \- Routing engine for OpenSM subnet manager
.
.SH DESCRIPTION
.
Torus-2QoS is a routing algorithm designed for large-scale 2D/3D torus fabrics.
The torus-2QoS routing engine can provide the following functionality on
a 2D/3D torus:
.br
\" roff illiteracy leads to following brain-dead list implementation
\"
.na  \" otherwise line space adjustment can add spaces between dash and text
.in +2m
\[en]
'in +2m
Routing that is free of credit loops.
.in
\[en]
'in +2m
Two levels of Quality of Service (QoS), assuming switches support eight
data VLs and channel adapters support two data VLs.
.in
\[en]
'in +2m
The ability to route around a single failed switch, and/or multiple failed
links, without
.in
.in +2m
\[en]
'in +2
introducing credit loops, or
.in
\[en]
'in +2m
changing path SL values.
.in -4m
\[en]
'in +2m
Very short run times, with good scaling properties as fabric size increases.
.ad
.
.SH UNICAST ROUTING
.
Unicast routing in torus-2QoS is based on Dimension Order Routing (DOR).
It avoids the deadlocks that would otherwise occur in a DOR-routed
torus using the concept of a dateline for each torus dimension.
It encodes into a path SL which datelines the path crosses, as follows:
\f(CR
.P
.nf
    sl = 0;
    for (d = 0; d < torus_dimensions; d++) {
        /* path_crosses_dateline(d) returns 0 or 1 */
        sl |= path_crosses_dateline(d) << d;
    }
.fi
\fR
.P
On a 3D torus this consumes three SL bits, leaving one SL bit unused.
Torus-2QoS uses this SL bit to implement two QoS levels.
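.P
As a minimal sketch of the encoding above, assuming a 3D torus and
using hypothetical helper names (path_sl and sl_qos_level are not part
of OpenSM), the three dateline bits and the QoS bit might be combined
into a 4-bit SL as follows:
\f(CR
.P
.nf
```c
#include <assert.h>

#define TORUS_DIMS 3    /* x, y, z */
#define QOS_BIT    3    /* the SL bit left over on a 3D torus */

/*
 * Hypothetical helper: crosses[d] is nonzero if the path crosses the
 * dateline of dimension d; qos is the QoS level (0 or 1).
 */
unsigned path_sl(const int crosses[TORUS_DIMS], int qos)
{
    unsigned sl = 0;
    int d;

    assert(qos == 0 || qos == 1);
    for (d = 0; d < TORUS_DIMS; d++)
        sl |= (unsigned)(crosses[d] != 0) << d;
    return sl | ((unsigned)qos << QOS_BIT);
}

/* Recover the QoS level from an SL: only the high-order bit matters. */
int sl_qos_level(unsigned sl)
{
    return (sl >> QOS_BIT) & 0x1;
}
```
.fi
\fR
.P
Under these assumptions, a path that crosses the x and z datelines at
QoS level 1 gets SL 13 (binary 1101), and the same path at QoS level 0
gets SL 5.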
.P
Torus-2QoS also makes use of the output port
dependence of switch SL2VL maps to encode into one VL bit the
information encoded in three SL bits.
It computes in which torus coordinate direction each inter-switch link
"points", and writes SL2VL maps for such ports as follows:
\f(CR
.P
.nf
    for (sl = 0; sl < 16; sl++) {
        /* cdir(port) computes which torus coordinate direction
         * a switch port "points" in; returns 0, 1, or 2
         */
        sl2vl(iport, oport, sl) = 0x1 & (sl >> cdir(oport));
    }
.fi
\fR
.P
Thus, on a pristine 3D torus,
\fIi.e.\fR,
in the absence of failed fabric switches,
torus-2QoS consumes eight SL values (SL bits 0-2) and
two VL values (VL bit 0) per QoS level to provide deadlock-free routing.
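.P
The per-port mapping for a pristine torus can be made concrete with a
small sketch.
The function name below is hypothetical, and only the dateline bit
(VL bit 0) is modeled; QoS level and failure handling are left out:
\f(CR
.P
.nf
```c
#include <assert.h>

/*
 * Fill one SL2VL table (indexed by SL) for an inter-switch output
 * port that points in coordinate direction cdir (0 = x, 1 = y, 2 = z).
 * Each entry is the dateline bit that the SL carries for cdir.
 */
void fill_sl2vl_pristine(unsigned char vl[16], int cdir)
{
    unsigned sl;

    assert(cdir >= 0 && cdir <= 2);
    for (sl = 0; sl < 16; sl++)
        vl[sl] = (unsigned char)(0x1 & (sl >> cdir));
}
```
.fi
\fR
.P
For a port pointing in y (cdir = 1), SL 5 (x and z datelines crossed)
maps to VL 0, while SL 2 (y dateline crossed) maps to VL 1.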
.P
Torus-2QoS routes around link failure by "taking the long way around" any
1D ring interrupted by link failure.  For example, consider the 2D 6x5
torus below, where switches are denoted by [+a-zA-Z]:
.
.
\# define macros to start and end ascii art, assuming Roman font.
\# the start macro takes an argument which is the width in ems of
\# the ascii art, and is used to center it.
\#
.de ascii_art
.nop \f(CR
.nr indent_in_ems ((((\\n[.ll] - \\n[.i]) / \\w'm') - \\$1)/2)
.in +\\n[indent_in_ems]m
.nf
..
.de end_ascii_art
.fi
.in
.nop \fR
..
\# end of macro definitions
.
.
.ascii_art 36
       |    |    |    |    |    |
  4  --+----+----+----+----+----+--
       |    |    |    |    |    |
  3  --+----+----+----D----+----+--
       |    |    |    |    |    |
  2  --+----+----I----r----+----+--
       |    |    |    |    |    |
  1  --m----S----n----T----o----p--
       |    |    |    |    |    |
y=0  --+----+----+----+----+----+--
       |    |    |    |    |    |
.sp
     x=0    1    2    3    4    5
.end_ascii_art
.P
For a pristine fabric the path from S to D would be S-n-T-r-D.
In the event that either link S-n or n-T has failed, torus-2QoS would
use the path S-m-p-o-T-r-D.
Note that it can do this without changing the path SL
value; once the 1D ring m-S-n-T-o-p-m has been broken by failure, path
segments using it cannot contribute to deadlock, and the x-direction
dateline (between, say, x=5 and x=0) can be ignored for path segments on
that ring.
.P
One result of this is that torus-2QoS can route around many simultaneous
link failures, as long as no 1D ring is broken into disjoint segments.
For example, if links n-T and T-o have both failed, that ring has been broken
into two disjoint segments, T and o-p-m-S-n.
Torus-2QoS checks for such
issues, reports if they are found, and refuses to route such fabrics.
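.P
The ring behavior described above can be sketched as follows.
This is an illustration only, not the OpenSM implementation: the
function names are hypothetical, each pair of adjacent switches is
assumed to have a single link, and the tie-break toward the + direction
is an arbitrary choice.
\f(CR
.P
.nf
```c
#include <assert.h>

/*
 * A 1D ring of n switches numbered 0..n-1.  link_ok[i] is nonzero if
 * the link between switch i and switch (i + 1) % n is usable.
 */

/* Nonzero if every link from switch a forward (+) to switch b is usable. */
static int forward_intact(const int link_ok[], int n, int a, int b)
{
    int i;

    for (i = a; i != b; i = (i + 1) % n)
        if (!link_ok[i])
            return 0;
    return 1;
}

/*
 * Direction to travel from src to dst: +1, -1, or 0 if dst cannot be
 * reached on this ring (or src == dst).  The shorter way is preferred
 * when it is intact; otherwise the long way around is taken.
 */
int ring_direction(const int link_ok[], int n, int src, int dst)
{
    int fwd, fwd_ok, bwd_ok;

    assert(n > 0 && src >= 0 && src < n && dst >= 0 && dst < n);
    if (src == dst)
        return 0;
    fwd = (dst - src + n) % n;
    fwd_ok = forward_intact(link_ok, n, src, dst);
    bwd_ok = forward_intact(link_ok, n, dst, src);
    if (fwd_ok && (fwd <= n - fwd || !bwd_ok))
        return 1;
    return bwd_ok ? -1 : 0;
}

/* Two or more failed links split a ring into disjoint segments. */
int ring_is_partitioned(const int link_ok[], int n)
{
    int i, failed = 0;

    for (i = 0; i < n; i++)
        failed += !link_ok[i];
    return failed >= 2;
}
```
.fi
\fR
.P
On the example ring m-S-n-T-o-p (x = 0 through 5), failing link S-n
(index 1) makes ring_direction(ok, 6, 1, 3) return -1, which corresponds
to the segment S-m-p-o-T.
Failing links n-T and T-o (indices 2 and 3) makes it return 0, and
ring_is_partitioned() then reports the broken ring.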
.P
Note that in the case where there are multiple parallel links between a
pair of switches, torus-2QoS will allocate routes across such links
in a round-robin fashion, based on ports at the path destination switch that
are active and not used for inter-switch links.
Should a link that is one of several such parallel links fail, routes
are redistributed across the remaining links.
When the last of such a set of parallel links fails, traffic is rerouted
as described above.
.P
Handling a failed switch under DOR requires introducing into a path at
least one turn that would be otherwise "illegal",
\fIi.e.\fR,
not allowed by DOR rules.
Torus-2QoS will introduce such a turn as close as possible to the
failed switch in order to route around it.
.P
In the above example, suppose switch T has failed, and consider the path
from S to D.
Torus-2QoS will produce the path S-n-I-r-D, rather than the
S-n-T-r-D path for a pristine torus, by introducing an early turn at n.
Normal DOR rules will cause traffic arriving at switch I to be forwarded
to switch r; for traffic arriving at I due to the "early" turn at n,
this will generate an "illegal" turn at I.
.P
Torus-2QoS will also use the input port dependence of SL2VL maps to set VL
bit 1 (which would be otherwise unused) for y-x, z-x, and z-y turns,
\fIi.e.\fR,
those turns that are illegal under DOR.
This causes the first hop after any such turn to use a separate set of
VL values, and prevents deadlock in the presence of a single failed switch.
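.P
The turn rule and the resulting VL bits can be sketched as below.
The function names are hypothetical, and the placement of the QoS level
in VL bit 2 is an assumption made here for illustration (it is
consistent with one QoS level using VLs 0-3 and the other VLs 4-7 on
inter-switch links, but which level gets which range is not specified
in this page).
Traffic entering from an endport, which has no input direction, is not
modeled.
\f(CR
.P
.nf
```c
#include <assert.h>

/* Coordinate directions, in the order DOR routes them. */
enum { DIR_X = 0, DIR_Y = 1, DIR_Z = 2 };

/*
 * A turn from direction in_dir to direction out_dir is illegal under
 * DOR when it returns to an earlier dimension: y-x, z-x, or z-y.
 */
int turn_is_illegal(int in_dir, int out_dir)
{
    return out_dir < in_dir;
}

/*
 * Hypothetical VL composition for an inter-switch link:
 *   bit 0: dateline bit for the direction the output port points in
 *   bit 1: set for the first hop after an illegal turn
 *   bit 2: QoS level (assumed mapping of SL bit 3 to VL bit 2)
 */
unsigned isl_vl(unsigned sl, int in_dir, int out_dir)
{
    unsigned vl;

    assert(sl < 16);
    vl = 0x1 & (sl >> out_dir);
    if (turn_is_illegal(in_dir, out_dir))
        vl |= 0x2;
    if (sl & 0x8)
        vl |= 0x4;
    return vl;
}
```
.fi
\fR
.P
In the example above with failed switch T, the hop I-r follows a y-x
turn, so isl_vl(0, DIR_Y, DIR_X) yields VL 2, while the following hop
r-D follows a legal x-y turn and yields VL 0.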
.P
For any given path, only the hops after a turn that is illegal under DOR
can contribute to a credit loop that leads to deadlock.  So in the example
above with failed switch T, the location of the illegal turn at I in the
path from S to D requires that any credit loop caused by that turn must
encircle the failed switch at T.  Thus the second and later hops after the
illegal turn at I (\fIi.e.\fR, hop r-D) cannot contribute to a credit loop
because they cannot be used to construct a loop encircling T.  The hop I-r
uses a separate VL, so it cannot contribute to a credit loop encircling T.
.P
Extending this argument shows that in addition to being capable of routing
around a single switch failure without introducing deadlock, torus-2QoS can
also route around multiple failed switches, provided that they are
adjacent in the last dimension routed by DOR.  For example, consider the
following case on a 6x6 2D torus:
.
.ascii_art 36
       |    |    |    |    |    |
  5  --+----+----+----+----+----+--
       |    |    |    |    |    |
  4  --+----+----+----D----+----+--
       |    |    |    |    |    |
  3  --+----+----I----u----+----+--
       |    |    |    |    |    |
  2  --+----+----q----R----+----+--
       |    |    |    |    |    |
  1  --m----S----n----T----o----p--
       |    |    |    |    |    |
y=0  --+----+----+----+----+----+--
       |    |    |    |    |    |
.sp
     x=0    1    2    3    4    5
.end_ascii_art
.P
Suppose switches T and R have failed, and consider the path from S to D.
Torus-2QoS will generate the path S-n-q-I-u-D, with an illegal turn at
switch I, and with hop I-u using a VL with bit 1 set.
.P
As a further example, consider a case that torus-2QoS cannot route without
deadlock: two failed switches adjacent in a dimension that is not the last
dimension routed by DOR; here the failed switches are O and T:
.
.ascii_art 36
       |    |    |    |    |    |
  5  --+----+----+----+----+----+--
       |    |    |    |    |    |
  4  --+----+----+----+----+----+--
       |    |    |    |    |    |
  3  --+----+----+----+----D----+--
       |    |    |    |    |    |
  2  --+----+----I----q----r----+--
       |    |    |    |    |    |
  1  --m----S----n----O----T----p--
       |    |    |    |    |    |
y=0  --+----+----+----+----+----+--
       |    |    |    |    |    |
.sp
     x=0    1    2    3    4    5
.end_ascii_art
.P
In a pristine fabric, torus-2QoS would generate the path from S to D as
S-n-O-T-r-D.  With failed switches O and T, torus-2QoS will generate the
path S-n-I-q-r-D, with illegal turn at switch I, and with hop I-q using a
VL with bit 1 set.  In contrast to the earlier examples, the second hop
after the illegal turn, q-r, can be used to construct a credit loop
encircling the failed switches.
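.P
The adjacency condition illustrated by the last two examples can be
expressed as a simple check.
This sketch is an assumption about how such a condition might be
tested, not the check torus-2QoS itself performs; the type and function
names are hypothetical.
\f(CR
.P
.nf
```c
#include <assert.h>

#define DIMS      3
#define MAX_RADIX 64

struct coord { int c[DIMS]; };

/*
 * Returns nonzero if the n failed switches differ only in dimension
 * last (the last dimension routed by DOR) and occupy consecutive
 * positions there, allowing for wrap-around on a ring of the given
 * radix.  A single failed switch trivially passes.
 */
int failed_adjacent_in_dim(const struct coord *failed, int n,
                           int last, int radix)
{
    int present[MAX_RADIX] = { 0 };
    int i, d, run_starts = 0;

    assert(n >= 1 && last >= 0 && last < DIMS);
    assert(radix >= 1 && radix <= MAX_RADIX);
    for (i = 0; i < n; i++) {
        for (d = 0; d < DIMS; d++)
            if (d != last && failed[i].c[d] != failed[0].c[d])
                return 0;
        assert(failed[i].c[last] >= 0 && failed[i].c[last] < radix);
        present[failed[i].c[last]] = 1;
    }
    /* Count the positions at which a run of failed switches begins. */
    for (i = 0; i < radix; i++)
        if (present[i] && !present[(i + radix - 1) % radix])
            run_starts++;
    return run_starts <= 1;
}
```
.fi
\fR
.P
With failed switches T (3,1) and R (3,2) on the 6x6 torus and last = 1
(the y dimension), the check succeeds.
With failed switches O (3,1) and T (4,1) it fails, because the two
switches differ in x.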
.
.SH MULTICAST ROUTING
.
Since torus-2QoS uses all four available SL bits, and the three data VL
bits that are typically available in current switches, there is no way
to use SL/VL values to separate multicast traffic from unicast traffic.
Thus, torus-2QoS must generate multicast routing such that credit loops
cannot arise from a combination of multicast and unicast path segments.
.P
It turns out that it is possible to construct spanning trees for multicast
routing that have that property.  For the 2D 6x5 torus example above, here
is the full-fabric spanning tree that torus-2QoS will construct, where "x"
is the root switch and each "+" is a non-root switch:
.
.ascii_art 36
  4    +    +    +    +    +    +
       |    |    |    |    |    |
  3    +    +    +    +    +    +
       |    |    |    |    |    |
  2    +----+----+----x----+----+
       |    |    |    |    |    |
  1    +    +    +    +    +    +
       |    |    |    |    |    |
y=0    +    +    +    +    +    +
.sp
     x=0    1    2    3    4    5
.end_ascii_art
.P
For multicast traffic routed from root to tip, every turn in the above
spanning tree is a legal DOR turn.
.P
For traffic routed from tip to root, and some traffic routed through the
root, turns are not legal DOR turns.  However, to construct a credit loop,
the union of multicast routing on this spanning tree with DOR unicast
routing can only provide 3 of the 4 turns needed for the loop.
.P
In addition, if none of the above spanning tree branches crosses a dateline
used for unicast credit loop avoidance on a torus, and if multicast traffic
is confined to SL 0 or SL 8 (recall that torus-2QoS uses SL bit 3 to
differentiate QoS level), then multicast traffic also cannot contribute to
the "ring" credit loops that are otherwise possible in a torus.
.P
Torus-2QoS uses these ideas to create a master spanning tree.  Every
multicast group spanning tree will be constructed as a subset of the master
tree, with the same root as the master tree.
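.P
For a pristine 2D torus, the shape of the full-fabric tree shown
earlier can be sketched with a parent function.
The names below are hypothetical and the sketch ignores failures; it
only illustrates that every branch runs along y to the root's row and
then along x to the root, without wrapping around the torus.
\f(CR
.P
.nf
```c
#include <assert.h>

struct node { int x, y; };

/*
 * Parent of switch (x, y) in a spanning tree rooted at (rx, ry):
 * step along y toward the root's row first, then along x toward the
 * root.  No step wraps around, so no branch crosses a dateline that
 * sits on a wrap-around link.  The root is its own parent.
 */
struct node tree_parent(int x, int y, int rx, int ry)
{
    struct node p;

    assert(x >= 0 && y >= 0 && rx >= 0 && ry >= 0);
    p.x = x;
    p.y = y;
    if (y != ry)
        p.y += (y < ry) ? 1 : -1;
    else if (x != rx)
        p.x += (x < rx) ? 1 : -1;
    return p;
}

/* Number of tree hops from (x, y) to the root. */
int tree_depth(int x, int y, int rx, int ry)
{
    int dx = x > rx ? x - rx : rx - x;
    int dy = y > ry ? y - ry : ry - y;

    return dx + dy;
}
```
.fi
\fR
.P
With the root at (3,2) on the 6x5 torus, the switch at (0,0) has parent
(0,1) and is five hops from the root.
Read from root to tip, each branch travels in x and then in y, which is
the legal DOR order.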
.P
Such multicast group spanning trees will in general not be optimal for
groups which are a subset of the full fabric. However, this compromise must
be made to enable support for two QoS levels on a torus while preventing
credit loops.
.P
In the presence of link or switch failures that result in a fabric for
which torus-2QoS can generate credit-loop-free unicast routes, it is also
possible to generate a master spanning tree for multicast that retains the
required properties.  For example, consider that same 2D 6x5 torus, with
the link from (2,2) to (3,2) failed.  Torus-2QoS will generate the following
master spanning tree:
.
.ascii_art 36
  4    +    +    +    +    +    +
       |    |    |    |    |    |
  3    +    +    +    +    +    +
       |    |    |    |    |    |
  2  --+----+----+    x----+----+--
       |    |    |    |    |    |
  1    +    +    +    +    +    +
       |    |    |    |    |    |
y=0    +    +    +    +    +    +
.sp
     x=0    1    2    3    4    5
.end_ascii_art
.P
Two things are notable about this master spanning tree.  First, assuming
the x dateline was between x=5 and x=0, this spanning tree has a branch
that crosses the dateline.  However, just as for unicast, crossing a
dateline on a 1D ring (here, the ring for y=2) that is broken by a failure
cannot contribute to a torus credit loop.
.P
Second, this spanning tree is no longer optimal even for multicast groups
that encompass the entire fabric.  That, unfortunately, is a compromise that
must be made to retain the other desirable properties of torus-2QoS routing.
.P
In the event that a single switch fails, torus-2QoS will generate a master
spanning tree that has no "extra" turns by appropriately selecting a root
switch.
In the 2D 6x5 torus example, assume now that the switch at (3,2),
\fIi.e.\fR, the root for a pristine fabric, fails.
Torus-2QoS will generate the
following master spanning tree for that case:
.
.ascii_art 36
                      |
  4    +    +    +    +    +    +
       |    |    |    |    |    |
  3    +    +    +    +    +    +
       |    |    |         |    |
  2    +    +    +         +    +
       |    |    |         |    |
  1    +----+----x----+----+----+
       |    |    |    |    |    |
y=0    +    +    +    +    +    +
                      |
.sp
     x=0    1    2    3    4    5
.end_ascii_art
.P
Assuming the y dateline was between y=4 and y=0, this spanning tree has
a branch that crosses a dateline.  However, again this cannot contribute
to credit loops as it occurs on a 1D ring (the ring for x=3) that is
broken by a failure, as in the above example.
.
.SH TORUS TOPOLOGY DISCOVERY
.
The algorithm used by torus-2QoS to construct the torus topology from
the undirected graph representing the fabric requires that the radix of
each dimension be configured via torus-2QoS.conf.
It also requires that the torus topology be "seeded"; for a 3D torus this
requires configuring four switches that define the three coordinate
directions of the torus.
.P
Given this starting information, the algorithm is to examine the
cube formed by the eight switch locations bounded by the corners
(x,y,z) and (x+1,y+1,z+1).
Based on switches already placed into the torus topology at some of these
locations, the algorithm examines 4-loops of inter-switch links to find the
one that is consistent with a face of the cube of switch locations,
and adds its switches to the discovered topology in the correct locations.
.P
Because the algorithm is based on examining the topology of 4-loops of links,
a torus with one or more radix-4 dimensions requires extra initial
seed configuration.
See torus-2QoS.conf(5) for details.
Torus-2QoS will detect and report when it has insufficient configuration
for a torus with radix-4 dimensions.
.P
In the event the torus is significantly degraded, \fIi.e.\fR, there are
many missing switches or links, it may happen that torus-2QoS is unable
to place into the torus some switches and/or links that were discovered
in the fabric, and will generate a warning in that case.
A similar condition occurs if torus-2QoS is misconfigured, \fIi.e.\fR,
the radix of a torus dimension as configured does not match the radix
of that torus dimension as wired; in that case many switches/links in
the fabric will not be placed into the torus.
.
.SH QUALITY OF SERVICE CONFIGURATION
.
OpenSM will not program switches and channel adapters with
SL2VL maps or VL arbitration configuration unless it is invoked with -Q.
Since torus-2QoS depends on such functionality for correct operation,
always invoke OpenSM with -Q when torus-2QoS is in the list of routing
engines.
.P
Any quality of service configuration method supported by OpenSM will
work with torus-2QoS, subject to the following limitations and
considerations.
.P
For all routing engines supported by OpenSM except torus-2QoS,
there is a one-to-one correspondence between QoS level and SL.
Torus-2QoS can only support two quality of service levels, so only
the high-order bit of any SL value used for unicast QoS configuration
will be honored by torus-2QoS.
.P
For multicast QoS configuration, only SL values 0 and 8 should be used
with torus-2QoS.
.P
Since SL to VL map configuration must be under the complete control of
torus-2QoS, any configuration via qos_sl2vl, qos_swe_sl2vl,
\fIetc.\fR, must and will be ignored, and a warning will be generated.
.P
For inter-switch links, torus-2QoS uses VL values 0-3 to implement one of
its supported QoS levels, and VL values 4-7 to implement the other. For
endport links (CA, router, switch management port), torus-2QoS uses VL
value 0 for one of its supported QoS levels and VL value 1 to implement
the other.  Hard-to-diagnose application issues may arise if traffic is
not delivered fairly across each of these two VL ranges. For
inter-switch links, torus-2QoS will detect and warn if VL arbitration is
configured unfairly across VLs in the range 0-3, and also in the range
4-7. Note that the default OpenSM VL arbitration configuration does
not meet this constraint, so all torus-2QoS users should configure VL
arbitration via qos_ca_vlarb_high, qos_swe_vlarb_high, qos_ca_vlarb_low,
qos_swe_vlarb_low, \fIetc.\fR
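.P
The fairness constraint on inter-switch links can be illustrated with a
small check.
The exact criterion torus-2QoS applies is not described here, so this
sketch assumes that "fair" means equal total arbitration weight for
every VL within a range; the function name is hypothetical.
\f(CR
.P
.nf
```c
#include <assert.h>

/*
 * weight[vl] is the total arbitration weight configured for data VL
 * vl (0..7) on an inter-switch link.  Returns nonzero if all VLs in
 * 0-3 share one weight and all VLs in 4-7 share one weight.  The two
 * ranges may still differ from each other, which is how the two QoS
 * levels would be given different service.
 */
int vlarb_is_fair(const unsigned weight[8])
{
    int vl;

    assert(weight);
    for (vl = 1; vl < 4; vl++)
        if (weight[vl] != weight[0] || weight[vl + 4] != weight[4])
            return 0;
    return 1;
}
```
.fi
\fR
.P
Under this assumption, weights of 64 for each of VLs 0-3 and 4 for each
of VLs 4-7 pass the check, while any configuration that gives one VL in
a range a different weight from its neighbors does not.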
.P
Note that torus-2QoS maps SL values to VL values differently
for inter-switch and endport links.  This is why qos_vlarb_high and
qos_vlarb_low should not be used, as using them may result in
VL arbitration for a QoS level being different across inter-switch
links vs. across endport links.
.
.SH OPERATIONAL CONSIDERATIONS
.
Any routing algorithm for a torus IB fabric must employ path
SL values to avoid credit loops.
As a result, all applications run over such fabrics must perform a
path record query to obtain the correct path SL for connection setup.
Applications that use \fBrdma_cm\fR for connection setup will automatically
meet this requirement.
.P
If a change in fabric topology causes changes in path SL values required
to route without credit loops, in general all applications would need
to repath to avoid message deadlock.  Since torus-2QoS has the ability
to reroute after a single switch failure without changing path SL values,
repathing by running applications is not required when the fabric
is routed with torus-2QoS.
.P
Torus-2QoS can provide unchanging path SL values in the presence of
subnet manager failover provided that all OpenSM instances agree on
dateline location.  See torus-2QoS.conf(5) for details.
.P
Torus-2QoS will detect configurations of failed switches and links
that prevent routing that is free of credit loops, and will
log warnings and refuse to route.  If "no_fallback" was configured in the
list of OpenSM routing engines, then no other routing engine
will attempt to route the fabric.  In that case all paths that
do not transit the failed components will continue to work, and
the subset of paths that are still operational will remain
free of credit loops.
OpenSM will continue to attempt to route the fabric after every sweep
interval, and after any change (such as a link up) in the fabric topology.
When the fabric components are repaired, full functionality will be
restored.
.P
In the event OpenSM was configured to allow some other engine to
route the fabric if torus-2QoS fails, then credit loops and message
deadlock are likely if torus-2QoS had previously routed
the fabric successfully.
Even if the other engine is capable of routing a torus
without credit loops, applications that built connections with
path SL values granted under torus-2QoS will likely experience
message deadlock under routing generated by a different engine,
unless they repath.
.P
To verify that a torus fabric is routed free of credit loops,
use \fBibdmchk\fR to analyze data collected via \fBibdiagnet -vlr\fR.
.
.SH FILES
.TP
.B @OPENSM_CONFIG_DIR@/@OPENSM_CONFIG_FILE@
default OpenSM config file.
.TP
.B @OPENSM_CONFIG_DIR@/@QOS_POLICY_FILE@
default QoS policy config file.
.TP
.B @OPENSM_CONFIG_DIR@/@TORUS2QOS_CONF_FILE@
default torus-2QoS config file.
.
.SH SEE ALSO
.
opensm(8), torus-2QoS.conf(5), ibdiagnet(1), ibdmchk(1), rdma_cm(7).