Blob Blame History Raw
                        OpenSM Release Notes 3.0.13
                       =============================

Version: OpenFabrics Enterprise Distribution (OFED) 1.2
Repo:    git://git.openfabrics.org/~ofed_1_2/management.git (release)
         git://git.openfabrics.org/~halr/management.git (development)
Date:    June 2007

1 Overview
----------
This document describes the contents of the OpenSM OFED 1.2 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administration,
and runs on top of OpenIB. The OpenSM version for this release
is openib-3.0.13

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB compliance statements
4 Major Bug Fixes
5 Main Verification Flows
6 Qualified software stacks and devices

1.1 Major New Features

* Routing improvements
  Two additional routing algorithms have been added in addition to
  performance improvements to the existing routing algorithms. The
  two new routing algorithms are FAT tree and LASH. See the
  opensm man page for additional details.

* SA Optional Record support now "virtually" complete
  Includes SA InformInfo improvements and InformInfoRecord support in
  addition to support for the remaining SA optional records
  (MulticastForwardingTableRecord, SwitchInfoRecord). Also, SMInfoRecord
  support was improved to include all SMs found.

* SA database dump/restore
  OpenSM now includes the ability to dump and restore the SA database.
  This allows for all SA registrations (multicast, services, and events)
  to be saved and restored later.

  In verbose mode, OpenSM will dump SA DB (existing multicast groups,
  services and InformInfo) into dump file which named "opensm-sa.dump"
  and located under standard OpenSM dump directory (/var/log by default).

  If option -S is specified and SA DB dump file name is provided, OpenSM
  will try to restore SA database from this file. And if restore is
  successful, OpenSM won't ask for client reregistration at subnet bring-up.

* Modular routing for multicast
  In conjunction was SA database dump/restore, there is the ability to
  dump and load switch lid matrices (min hops tables) which are used
  for multicast route calculation.

* IB router enablement
  OpenSM now supports router ports properly (in terms of PortInfo handling).
  There is also some experimental support for IB routers which is enabled
  via the ROUTER_EXP compile flag. This support includes SA PathRecord and
  MCMemberRecord support for off subnet GIDs.

* Socket support added to console
  OpenSM console now supports remote in addition to local access.
  Remote access is currently via telnet.

1.2 Minor New Features:

* Change output format of DR path from hex to decimal port numbers

* Log rotation
  The OpenSM log can now be rotated while OpenSM is running (without
  stopping and restarting OpenSM). This is accomplished via SIGUSR1.

* Support scope for IPoIB multicast groups in partition config

* Dump filename changed from subnet.lst to osm-subnet.lst
  Default temp directory for non Windows platforms was previously changed
  from /tmp to /var/log.

* Add option for force SDR link speed
  Add option to opensm.opts to force link speed. Currently, only forcing
  to SDR link speed is supported. This option is not supported as a
  command line option.

1.3 Library API Changes

  None

1.4 Software Dependencies

OpenSM depends on the installation of either OFED 1.2, OFED 1.1,
OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD
distribution), or Mellanox VAPI stacks. The qualified driver versions
are provided in Table 2, "Qualified IB Stacks".

1.5 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.

2 Known Issues And Limitations
------------------------------

* No Service / Key associations:
  There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
  Puts the burden of re-registering services, multicast groups, and
  inform-info on the client application (or IB access layer core).

* No "port down" event handling:
  Changing the switch port through which OpenSM connects to the IB
  fabric may cause incorrect operation. Please restart OpenSM whenever
  such a connectivity change is made.

* Changing connections during SM operation:
  Under some conditions the SM can get confused by a change in
  cabling (moving a cable from one switch port to the other) and
  momentarily see this as having the same GUID appear connected
  to two different IB ports. Under some conditions, when the SM fails to
  get the corresponding change event it might mistakenly report this case
  as a "duplicated GUID" case and abort. It is advisable to double-check
  the syslog after each such change in connectivity and restart
  OpenSM if it has exited. The same error ("duplicated GUID") will
  also appear with a loopback plug.

3 Unsupported IB Compliance Statements
--------------------------------------
The following section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.

* C14-22 (Authentication):
  M_Key M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a work-around, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
  the SM shall generate a SubnGetResp if the M_Key matches, or
  silently drop the packet if M_Key does not match.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to 0,
  except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If no permission to forward, the subscription should be removed and
  no further forwarding should occur.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:P_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.8 (Multicast):
  If a request for creating an MCG with fields that cannot be met,
  return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.13 Services:
  Reject ServiceRecord create, modify or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Major Bug Fixes
-----------------

The following is a list of bugs that were fixed. Note that other less critical
or visible bugs were also fixed.

* osm_sminfo_rcv.c: Add SMInfo self query check. OpenSM can query
  itself for SMInfo occassionally due to port moving during subnet
  discovery process. Don't create remote SM entry in this case to
  prevent deadlocks.

* osm_ucast_updn.c: Two similar bugs in up/down routing fixed.
  8-bit integers were used as indexes when scanning subnet, which
  in one case caused OpenSM to crash when ranking "path" is longer
  than 256 switches, and in the other case, caused OpenSM to go into
  an infinite loop when fabric has more than 256 roots.

* osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req,
  handle master GUID port not found properly

* osm_sa_multipath_record.c: In __osm_mpr_rcv_get_path_parms, return
  IB_NOT_FOUND rather than IB_ERROR when can't route to LID from switch

* osm_sa_path_record.c: In __osm_pr_rcv_get_path_parms, return IB_NOT_FOUND
  rather than IB_ERROR when can't route to LID from switch

* osm_vendor_ibumad.c:  In osm_vendor_set_sm, set issmfd to
  -1 on open error

* osm_vendor_ibumad: Termination crash fix
  When OpenSM is terminated umad_receiver thread still running even after
  the structures are destroyed and freed, this causes to random (but easily
  reproducible) crashes. The reason is that osm_vendor_delete() does not
  care about thread termination. This patch adds the receiver thread
  cancellation (by using pthread_cancel() and pthread_join()) and cares to
  keep have all mutexes unlocked upon termination. There is also minor
  termination code consolidation - osm_vendor_port_close() function.

* osm_port_profile.h: Fix reinsertion issue in osm_port_prof_set_ignored_port

* osm_matrix.h: Fix segfault with up/down and root nodes file

* osm_sa_path_record.c: In osm_pr_rcv_process, fix endian of hop_limit

* osm_vendor_ibumad.c: Close umad port in osm_vendor_delete

* osm_sa_(multipath path)_record.c: Fix MultiPathRecord/PathRecord issues
  with using MTU/rate/PktLife explicitly ignoring selectors

  OpenSM just uses the resulting path MTU/rate/pkt-life and fail the
  query even though the selector might be allowing for selecting an
  appropriate value.

  After this fix, the following results are obtained for a case of
  path allowing maximal 2K MTU.

In standard mode:
------------------------------------------------------------
MTU greater than ... 256     (0x01) ->  equal to ....... 2K
MTU less than ...... 256     (0x41) ->  NO PATHS
MTU equal to ....... 256     (0x81) ->  equal to ....... 256
MTU largest possible 256     (0xc1) ->  equal to ....... 2K
MTU greater than ... 512     (0x02) ->  equal to ....... 2K
MTU less than ...... 512     (0x42) ->  equal to ....... 256
MTU equal to ....... 512     (0x82) ->  equal to ....... 512
MTU largest possible 512     (0xc2) ->  equal to ....... 2K
MTU greater than ... 1K      (0x03) ->  equal to ....... 2K
MTU less than ...... 1K      (0x43) ->  equal to ....... 512
MTU equal to ....... 1K      (0x83) ->  equal to ....... 1K
MTU largest possible 1K      (0xc3) ->  equal to ....... 2K
MTU greater than ... 2K      (0x04) ->  NO PATHS
MTU less than ...... 2K      (0x44) ->  equal to ....... 1K
MTU equal to ....... 2K      (0x84) ->  equal to ....... 2K
MTU largest possible 2K      (0xc4) ->  equal to ....... 2K
MTU greater than ... 4K      (0x05) ->  NO PATHS
MTU less than ...... 4K      (0x45) ->  equal to ....... 2K
MTU equal to ....... 4K      (0x85) ->  NO PATHS
MTU largest possible 4K      (0xc5) ->  equal to ....... 2K
============================================================

With enable_quirks (when one of the ends is a Tavor device):
------------------------------------------------------------
MTU greater than ... 256     (0x01) ->  equal to ....... 1K
MTU less than ...... 256     (0x41) ->  NO PATHS
MTU equal to ....... 256     (0x81) ->  equal to ....... 256
MTU largest possible 256     (0xc1) ->  equal to ....... 2K
MTU greater than ... 512     (0x02) ->  equal to ....... 1K
MTU less than ...... 512     (0x42) ->  equal to ....... 256
MTU equal to ....... 512     (0x82) ->  equal to ....... 512
MTU largest possible 512     (0xc2) ->  equal to ....... 2K
MTU greater than ... 1K      (0x03) ->  NO PATHS
MTU less than ...... 1K      (0x43) ->  equal to ....... 512
MTU equal to ....... 1K      (0x83) ->  equal to ....... 1K
MTU largest possible 1K      (0xc3) ->  equal to ....... 2K
MTU greater than ... 2K      (0x04) ->  NO PATHS
MTU less than ...... 2K      (0x44) ->  equal to ....... 1K
MTU equal to ....... 2K      (0x84) ->  equal to ....... 2K
MTU largest possible 2K      (0xc4) ->  equal to ....... 2K
MTU greater than ... 4K      (0x05) ->  NO PATHS
MTU less than ...... 4K      (0x45) ->  equal to ....... 1K
MTU equal to ....... 4K      (0x85) ->  NO PATHS
MTU largest possible 4K      (0xc5) ->  equal to ....... 2K
============================================================

* osm_pkey_rcv.c: rwlock double release fix
  When the port is removed from subnet, but previously requested pkey
  table block is received after this - the lock will be released twice.
  This leads to deadlocks later when other MAD processor will try to
  acquire the same lock.

* osm_sa_informinfo.c: Fix InformInfoRecord searches

* Better SA MCMemberRecord leave locking
  Hold locked multicast group leave request (MCMember Record) processing.
  This prevents kind of race with multicast group join request where
  those requests can be reordered during processing.

* osm_sa_informinfo.c: Conformance changes for subscribe component

* osm_sa_path_record.c: Handle LID 0 as error

* Fix comparing InformInfo records
  1. The received InformInfo struct was modified before dumping it.
  2. The function that compares InformInfo structures was just
     comparing the whole memory allocated for it, including reserved
     fields. Fixed to compare more selectively.

  As for QPN, from the IB spec, table 119 InformInfo:
  QPN : Ignored except when subscribe=0 (an unsubscribe
  request). Queue pair to which Report()s were sent as
  a result of a corresponding subscription. If no
  subscription for this Report() with this QPN exists,
  the request to unsubscribe performs no action and
  produces GetResp() with status indicating an invalid
  field value.

* osm_trap_rcv.c: Reduce repeated trap messages so log doesn't fill
  so quickly

* osm_helper.c: Fix stack smashing detected problem in osm_dump_service_record

* Fix permission on db files directory
  When creating directory for db files (guid2lid) storing create it with
  reasonable permissions (current 777 decimal = octal 01411) and don't do
  it world writable.

* Fix node_desc.description as string usages

5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors and verify OpenSM capability to
  respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back to
  back or single switch configurations. The regression includes
  multiple OpenSM dedicated tests.
* cluster testing - when we run OpenSM to setup a large cluster, perform
  hand-off, reboots and reconnects, verify routing correctness and SA
  responsiveness at the ULP level (IPoIB and SDP).

5.1 osmtest

osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are described by list below.

* Inventory File: Obtain and verify all port info, node info, link and path
  records parameters.

* Service Record:
   - Register new service
   - Register another service (with a lease period)
   - Register another service (with service p_key set to zero)
   - Get all services by name
   - Delete the first service
   - Delete the third service
   - Added bad flows of get/delete  non valid service
   - Add / Get same service with different data
   - Add / Get / Delete by different component  mask values (services
     by Name & Key / Name & Data / Name & Id / Id only )

* Multicast Member Record:
   - Query of existing Groups (IPoIB)
   - BAD Join with insufficient comp mask (o15.0.1.3)
   - Create given MGID=0 (o15.0.1.4)
   - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
   - Create BAD MGID=0xFA. (o15.0.1.6)
   - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
   - New MGID with invalid join state (o15.0.1.9)
   - Retry of existing MGID - See JoinState update (o15.0.1.11)
   - BAD RATE when connecting to existing MGID (o15.0.1.13)
   - Partial JoinState delete request - removing FullMember (o15.0.1.14)
   - Full Delete of a group (o15.0.1.14)
   - Verify Delete by trying to Join deleted group (o15.0.1.14)
   - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* GUIDInfo Record:
   - All GUIDInfoRecords in subnet are obtained

* MultiPathRecord:
   - Perform some compliant and noncompliant MultiPathRecord requests
   - Validation is via status in responses and IB analyzer

* PKeyTableRecord:
  - Perform some compliant and noncompliant PKeyTableRecord queries
  - Validation is via status in responses and IB analyzer

* LinearForwardingTableRecord:
  - Perform some compliant and noncompliant LinearForwardingTableRecord queries
  - Validation is via status in responses and IB analyzer

* Event Forwarding: Register for trap forwarding using reports
   - Send a trap and wait for report
   - Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
  disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries, both single and RMPP and
  check for the rate of responses as well as their validity.


5.2 IB Management Simulator OpenSM Test Flows:

The simulator provides ability to simulate the SM handling of virtual
topologies that are not limited to actual lab equipment availability.
OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
regressions use smaller (16 and 128 nodes clusters).

The following test flows are run on the IB management simulator:

* Stability:
  Up to 12 links from the fabric are randomly selected to drop packets
  at drop rates up to 90%. The SM is required to succeed in bringing the
  fabric up. The resulting routing is verified to be correct as well.

* LID Manager:
  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
  zero LID, Duplicated LID, non-aligned (to LMC) LIDs are
  randomly assigned to various nodes and other errors are randomly
  output to the guid2lid cache file. The SM sweep is run 5 times and
  after each iteration a complete verification is made to ensure that all
  LIDs that could possibly be maintained are kept, as well as that all nodes
  were assigned a legal LID range.

* Multicast Routing:
  Nodes randomly join the 0xc000 group and eventually the
  resulting routing is verified for completeness and adherence to
  Up/Down routing rules.

* osmtest:
  The complete osmtest flow as described in the previous table is run on
  the simulated fabrics.

* Stress Test:
  This flow merges fabric, LID and stability issues with continuous
  PathRecord, ServiceRecord and Multicast Join/Leave activity to
  stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
  were added to the test such both existing and non existing nodes
  perform them in random order.

5.3 OpenSM Regression

Using a back-to-back or single switch connection, the following set of
tests is run nightly on the stacks described in table 2. The included
tests are:

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic Topology changes, through randomly
  dropping SMP packets, used to test OpenSM adaptation to an unstable
  network & verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
  handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to all
  possible single component mask. To do that the test examines the
  entire set of records the SA can provide, classifies them by their
  field values and then selects every field (using component mask and a
  value) and verifies that the response matches the expected set of records.
  A random selection using multiple component mask bits is also performed.

5.4 Cluster testing:

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through the IB
interface. The test procedure includes:

* Cluster bringup

* Hand-off between 2 or 3 SM's while performing:
  - Node reboots
  - Switch power cycles (disconnecting the SM's)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery


6 Qualification
----------------

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OFED                                     |   1.2
OFED                                     |   1.1
OFED                                     |   1.0
OpenIB Gen2 (IBG2 distribution)          |   1.0
OpenIB Gen1 (IBGD distribution)          |   1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    |   3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Mellanox
Device  |   FW versions
--------|-----------------------------------------------------------
MT43132 |   InfiniScale - fw-43132  5.2.0 (and later)
MT47396 |   InfiniScale III - fw-47396 0.5.0 (and later)
MT23108 |   InfiniHost - fw-23108   3.3.2 (and later)
MT25204 |   InfiniHost III Lx - fw-25204  1.0.1i (and later)
MT25208 |   InfiniHost III Ex (InfiniHost Mode) - fw-25208  4.6.2 (and later)
MT25208 |   InfiniHost III Ex (MemFree Mode) - fw-25218  5.0.1 (and later)

QLogic/PathScale
Device  |   Note
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)

Note: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, it does support it as a device on the subnet.