Blob Blame History Raw
                           OpenSM Release Notes
                          ======================

Version: OpenFabric Enterprise Distribution (OFED) 1.0
Repo:    https://openib.org/svn/gen2/branches/1.0/src/userspace/management/osm
Version: 7992
Date:    Jun 2006

1 Overview
----------
This document describes the contents of the OpenSM OFED 1.0 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administrator,
and runs on top of OpenIB. The OpenSM version for this release
is openib-1.2.1

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB compliance statements
4 Major Bug Fixes
5 Main Verification Flows
6 Qualified software stacks and devices

1.1 New Features

* SA GuidInfoRecord support

* Default for maxsmps changed:
  Control the number of SMP sent in parallel and thus shorten the
  fabric initialization time.

* osmtest/osmt_slvl_vl_arb.c:
  Output file name changed from vl_arbs.txt to qos.txt

* Support new IBTA Errata IsPortInfoCapMaskMatchSupported:
  This new capability of the SA enables matching of individual port
  capability bits dramatically reducing the query size for agents like
  the SRP initiator query for finding SRP targets.

* Honor guid2lid when coming out of standby:
  This change adds an option to the opensm that forces it to honor the
  guid2lid file given when it comes out of Standby state. Currently,
  when opensm comes out of Standby state, it ignores the guid2lid file
  it read, and honors only the lids defined on the ports themselves.

* Add guid to opensm opts
  This enables the port on which to run the SM to be defined through
  the configuration file as well as through the command line.

* PPC support:
  No PPC QA was performed.

1.2 Software Dependencies

OpenSM depends on the installation of either OFED 1.0,
OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD
distribution) or Mellanox VAPI stacks. The qualified driver versions
are provided in Table 2, "Qualified IB Stacks".

1.4 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.

2 Known Issues And Limitations
------------------------------

* No Partition/Pkey policy support:
  OpenSM does not provide means to set partitions.

* No Service / Key associations:
  There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
  Puts the burden of re-registering services, multicast groups, and
  inform-info on the client application (or IB access layer core).

* No "port down" event handling:
  Changing the switch port through which OpenSM connects to the IB
  fabric may cause incorrect operation. Please restart OpenSM whenever
  such a connectivity change is made.

* Changing connections during SM operation:
  Under some conditions the SM can get confused by a change in
  cabling (moving a cable from one switch port to the other) and
  momentarily see this as having the same GUID appear connected
  to two different IB ports. Under some conditions, when the SM fails to
  get the corresponding change event it might mistakenly report this case
  as a "duplicated GUID" case and abort. It is advisable to double-check
  the syslog after each such change in connectivity and restart
  OpenSM if it has exited.

* No QoS support:
  No SL2VL and VLArbitration setting is performed by the SM.

3 Unsupported IB Compliance Statements
--------------------------------------
The following section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.

* C14-22 (Authentication):
  M_Key M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a work-around, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
  the SM shall generate a SubnGetResp if the M_Key matches, or
  silently drop the packet if M_Key does not match.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to 0,
  except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If no permission to forward, the subscription should be removed and
  no further forwarding should occur.

* C14-37.1.2 (Handover):
  Priority should be kept in non-volatile memory.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.11 (Initialization):
  PortInfo:VLHighLimit should match the configured VLArb on the port.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:P_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.13 (Multicast):
  If a Join request using unrealistic parameters is received, return
  ERR_REQ_INVALID.

* o15-0.1.8 (Multicast):
  If a request for creating an MCG with fields that cannot be met,
  return ERR_REQ_INVALID (currently ignoring SL and FlowLabelTclass).

* C15-0.1.11 (SA-Query):
  Query response should use only base LIDs (as the feature has not
  been qualified yet).

* C15-0.1.19 (SA-Query):
  Respond to SubnGetMulti(MultiPathRec)

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.8.7 (SA-Query):
  SubnAdmGetMulti SubnAdmGetMultiResp - Only in case of a MultiPath.

* C15-0.1.13 Services:
  Reject ServiceRecord create, modify or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Major Bug Fixes
-----------------

The following is a list of bugs that were fixed. Note that other less critical
or visible bugs were also fixed.

* Eliminate error on active -> active port state transition
  SM may transition port from armed to active but in the meantime, due
  to passing a data packet with active enable set, the port may
  already have transitioned to active. Active -> active port state
  transition is indicated as an error but it isn't really an error so
  don't indicate error in the osm log.

* Routing not set for the first LID in the last LFT block:
  Fix: osm_switch.c: In osm_switch_get_fwd_tbl_block last block calculation

* Corrupted guid2lid file causes OpenSM exit
  Fix: exit only if exit_on_fatal option is set (the default)

* OpenSM was causing Client-Re-Registration continuously:
  The SM was storing the response PortInfo.ClientReRegstration
  bit and using it during next Set(PortInfo). Fix: clear the bit on
  receive.

* Multicast Query Selectors MTU, rate, and PacketLifeTime were not exact

* Try not to recognize port change as duplicated GUID
  This fix solves the issue of a port move during heavy sweep
  being recognized as a duplicated guid. Fix: If the SM sees what
  seems to be a duplicated guid, but it also received an indication
  for immediately forcing another heavy sweep (for example, as a
  result of receiving trap 128) then it shouldn't issue a duplicated
  guid error (and possibly exit), but should just ignore this and
  continue. This means that only if the SM recognizes such a
  duplication in a stable subnet that it'll issue the error (and
  possibly exit).

* Set PKey table on switch ports not supporting it:
  OpenSM attempts to set pkey table entries on external switch ports
  even if the switch declares a PartitionEnforcementCap of zero.  The
  consequence is ERR 4108. Fix: Observe PartitionEnforcementCap of zero.

* Incorrect MCMemberRecord Get/GetTable in trusted mode:
  This change fixes the retrieval of the MCMember records according to
  Errata MGTWG3280. This fix provide means to obtain all the group
  members by issuing a trusted GetTable query.

* Trusted MCMemberRecord query was not recognized as trusted:
  Trust is checked by comparing the request SM_Key field to the SM
  SM_Key. The bug was in looking up the SM_Key from the response not
  the request.

* Port left in down state after setting MTU or OpVLs on its neighbor:
  In case of a difference between the MTU of two ports, only the port
  with the higher MTU was set to down. Its remote port was written in
  the DB as in the ACTIVE state although its real status was INIT.
  Because of this, the SM didn't try to move the remote port to
  ACTIVE.

* Atomic counters used throughout the code were broken:
  A new implementation has been provided.

* MC Group creation with "less than" MTU ignores the requester MTU:
  When requesting to create an MC group with MTU(rate) selector 1
  (meaning less than rate specified), the MC group is created with
  MTU(rate) requested - 1. This is without checking the MTU(rate) of
  the port requesting the creation of the multicast group. This means
  that if, for example, port with MTU=2 sends a request for MC group
  creation with MTU selector=1 and MTU=5, Opensm will try to create a
  MC group with MTU=4, and fail, since the port capabilities are not
  realizable. Fix: creation of the MC group with MTU(rate) also takes
  into account the MTU(rate) of the port requesting the creation.

* MC Group join does not validate that the joining port's capabilities
  match those of the MC.  Fix: Add verification of endport physical
  capability to join MC group.

* ClientReRegistration not sent to ports discovered after first sweep:
  PortInfo sent with ClientReRegistration bit turned on only during
  the first sweep after becoming Master. This doesn't cover all cases
  where ClientReRegistration should be turned on. Fix: turn on this
  bit also on new ports it discovers (in cases of subnet merging, for
  example).

* segfault during a report on deleted multicast group:
  osm_mcast_mgr.c, executing the line of code:
  osm_mgrp_send_delete_notice( p_mgr->p_subn, p_mgr->p_log, p_mgrp );
  caused segmentation fault since the handle p_mgrp was already
  deleted while the function was called. Fix: inserted the line above
  into the protected section.

* segfault in osm_get_gid_by_mad_addr:
  The affected flows are ports and multicast joins.

* segfault in LID manager:
  Handle NULL p_rem_physp can validly be NULL when the remote SMA is
  not responding (but physical link is up).

* segfault in Up/Down routing engine


5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors and verify OpenSM capability to
  respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back to
  back or single switch configurations. The regression includes
  multiple OpenSM dedicated tests.
* cluster testing - when we run OpenSM to setup a large cluster, perform
  hand-off, reboots and reconnects, verify routing correctness and SA
  responsiveness at the ULP level (IPoIB and SDP).

5.1 osmtest

osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are described by list below.

* Inventory File: Obtain and verify all port info, node info, and path
  records parameters.

* Service Record:
   - Register new service
   - Register another service (with a lease period)
   - Register another service (with service p_key set to zero)
   - Get all services by name
   - Delete the first service
   - Delete the third service.
   - Added bad flows of get/delete  non valid service
   - Add / Get same service with different data
   - Add / Get / Delete by different component  mask values (services
     by Name & Key / Name & Data / Name & Id / Id only )

* Multicast Member Record:
   - Query of existing Groups (IPoIB)
   - BAD Join with insufficient comp mask (o15.0.1.3)
   - Create given MGID=0 (o15.0.1.4)
   - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
   - Create BAD MGID=0xFA. (o15.0.1.6)
   - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
   - New MGID with invalid join state (o15.0.1.9)
   - Retry of existing MGID - See JoinState update (o15.0.1.11)
   - BAD RATE when connecting to existing MGID (o15.0.1.13)
   - Partial JoinState delete request - removing FullMember (o15.0.1.14)
   - Full Delete of a group (o15.0.1.14)
   - Verify Delete by trying to Join deleted group (o15.0.1.14)
   - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* GUIDInfo Record:
   - All GUIDInfoRecords in subnet are obtained

* Event Forwarding: Register for trap forwarding using reports
   - Send a trap and wait for report
   - Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
  disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries, both single and RMPP and
  check for the rate of responses as well as their validity.


5.2 IB Management Simulator OpenSM Test Flows:

The simulator provides ability to simulate the SM handling of virtual
topologies that are not limited to actual lab equipment availability.
OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
regressions use smaller (16 and 128 nodes clusters).

The following test flows are run on the IB management simulator:

* Stability:
  Up to 12 links from the fabric are randomly selected to drop packets
  at drop rates up to 90%. The SM is required to succeed in bringing the
  fabric up. The resulting routing is verified to be correct, too.

* LID Manager:
  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
  zero LID, Duplicated LID, non-aligned (to LMC) LIDs are
  randomly assigned to various nodes and other errors are randomly
  output to the guid2lid cache file. The SM sweep is run 5 times and
  after each iteration a complete verification is made to ensure that all
  LIDs that could possibly be maintained are kept, as well as that all nodes
  were assigned a legal LID range.

* Multicast Routing:
  Nodes randomly join the 0xc000 group and eventually the
  resulting routing is verified for completeness and adherence to
  Up/Down routing rules.

* osmtest:
  The complete osmtest flow as described in the previous table is run on
  the simulated fabrics.

* Stress Test:
  This flow merges fabric, LID and stability issues with continuous
  PathRecord, ServiceRecord and Multicast Join/Leave activity to
  stress the SM/SA during continuous sweeps.

5.3 OpenSM Regression

Using a back-to-back or single switch connection, the following set of
tests is run nightly on the stacks described in table 2. The included
tests are:

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic Topology changes, through randomly
  dropping SMP packets, used to test OpenSM adaptation to an unstable
  network & verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
  handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to all
  possible single component mask. To do that the test examines the
  entire set of records the SA can provide, classifies them by their
  field values and then selects every field (using component mask and a
  value) and verifies that the response matches the expected set of records.
  A random selection using multiple component mask bits is also performed.

5.4 Cluster testing:

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through the IB
interface. The test procedure includes:

* Cluster bringup

* Hand-off between 2 or 3 SM's while performing
  - Node reboots
  - Switch power cycles (disconnecting the SM's)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery


6 Qualification
----------------

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OFED                                     |   1.0
OpenIB Gen2 (IBG2 distribution)          |   1.0
OpenIB Gen1 (IBGD distribution)          |   1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    |   3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Mellanox
Device  |   FW versions
--------|-----------------------------------------------------------
MT43132 |   InfiniScale - fw-43132  5.2.0 (and later)
MT47396 |   InfiniScale III - fw-47396 0.5.0 (and later)
MT23108 |   InfiniHost - fw-23108   3.3.2
MT25204 |   InfiniHost III Lx - fw-25204  1.0.1
MT25208 |   InfiniHost III Ex (InfiniHost Mode) - fw-25208  4.6.2 (and later)
MT25208 |   InfiniHost III Ex (MemFree Mode) - fw-25218  5.0.1 (and later)

QLogic/PathScale
Device  |   Note
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)

Note: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, it does support it as a device on the subnet.