OpenSM Release Notes
======================

Release: IBG2
Repo: https://openib.org/svn/trunk/contrib/mellanox/gen2/src/userspace/management/osm
Version: 4956
Date: Jan 2006

1 Overview
----------
This document describes the contents of the OpenSM IBG2 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administrator,
and runs on top of OpenIB.

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB compliancy statements
4 Major Bug Fixes
5 Main Verification Flows
6 Qualified software stacks and devices

1.1 New Features

* New libraries created during installation:
  - libopensm - contains the interface to the logging and MAD pool
    mechanisms.
  - libosmcomp - contains the interface to the complib utilities.
  - libosmvendor - contains the interface for sending/receiving MADs
    through the SMI or GSI over the IBG2 driver.

* Changed the build mechanism to use autotools.

* Changed the directory structure of the OpenSM code according to libs:
  - osm/libvendor - for vendor specific files.
  - osm/complib - for complib specific files.
  - osm/opensm - for opensm core files.
  - osm/include

* Semi-static LID assignment: OpenSM uses a cache file for storing all
  LID assignments so that, even after a reboot, the LIDs do not
  change. The static LID assignment is built on top of a new
  "persistency" layer that abstracts the actual database from its
  usage. The implemented database is based on files stored under
  /var/cache/osm (this location can be overridden via the environment
  variable OSM_CACHE_DIR). Other implementations could use LDAP, for
  example. Note that a standby SM ignores its previously assigned LIDs
  when it becomes the master, and the previous master's LID settings
  are used.

* Irresponsive Port Handling: A port that does not respond to SM
  queries will be queried again on future light or heavy sweeps and,
  if it then responds, it will be set up immediately. Previously such
  a port was queried only upon a heavy sweep.

* Leaf Switch Port HOQ: A different maximal head-of-queue lifetime is
  assigned to switch ports connected to HCAs so that a bad chipset
  or defective hardware will not cause back pressure on the fabric.

* OSM_TMP_DIR: This is a new environment variable controlling the
  directory where the subnet.lst, osm.fdbs and osm.mcfdbs files are
  created. The default is still /tmp.

* Configuration Options cache file: OpenSM was enhanced to provide a
  means to modify all its internal configuration options, including
  the ones that previously were only available under osmsh. The new
  file is located under the cache directory and is named
  opensm.opts. To automatically create this file OpenSM supports a new
  flag: `-c'. The file is generated with the current set of options
  used by OpenSM.

* Previously, under extreme load conditions, when OpenSM was
  overloaded with SA queries and the incoming message queue also
  grew, message response times were delayed beyond what was
  expected. This new version of OpenSM has been enhanced so that,
  in such a case, new incoming SA queries are returned with a
  RESOURCE_BUSY status (per the InfiniBand Architecture
  Specification).

* Kill -HUP: If the OpenSM process (ps -efww | grep opensm.bin) gets a
  SIGHUP (sent by kill -HUP), it will start a heavy sweep as if a trap
  was received or a change in topology was observed by the SM.
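
The file locations and the SIGHUP trigger described in the feature
list above can be sketched in a few lines of Python. This is only an
illustration, not OpenSM code: the function names are hypothetical,
and the sketch assumes that an unset variable falls back to the
default quoted in these notes (how OpenSM treats an empty value is
not covered here).

```python
import os
import signal

# Defaults quoted from the feature list above.
DEFAULT_CACHE_DIR = "/var/cache/osm"
DEFAULT_TMP_DIR = "/tmp"


def cache_dir(env=os.environ):
    """Directory holding the LID cache files and opensm.opts."""
    return env.get("OSM_CACHE_DIR", DEFAULT_CACHE_DIR)


def tmp_dir(env=os.environ):
    """Directory where the dump files are created."""
    return env.get("OSM_TMP_DIR", DEFAULT_TMP_DIR)


def options_file(env=os.environ):
    """Path of the options file, located under the cache directory."""
    return os.path.join(cache_dir(env), "opensm.opts")


def dump_files(env=os.environ):
    """Paths of the three dump files named in the OSM_TMP_DIR item."""
    names = ("subnet.lst", "osm.fdbs", "osm.mcfdbs")
    return [os.path.join(tmp_dir(env), name) for name in names]


def request_heavy_sweep(pid):
    """Send SIGHUP to a running OpenSM process (same as kill -HUP).

    The caller is assumed to have found the pid already, for example
    with the ps command quoted above.
    """
    os.kill(pid, signal.SIGHUP)
```

Passing a plain dictionary instead of os.environ makes it easy to see
which path a given OSM_CACHE_DIR or OSM_TMP_DIR setting would select.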

1.2 Software Dependencies

OpenSM depends on the installation of either OpenIB gen2 (e.g. IBG2
distribution), OpenIB gen1 (e.g. IBGD distribution) or Mellanox VAPI
stacks. The qualified driver versions are provided in Table 2,
"Qualified IB Stacks".

1.4 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions are
listed in Table 3.

2 Known Issues And Limitations
------------------------------

* No Partition/Pkey policy support:
  OpenSM does not provide means to set partitions.

* IB "trusted" concept is unsupported:
  Queries that should be classified according to the trustworthiness of
  their sources will not be handled correctly.

* No Service / Key associations:
  There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
  This puts the burden of re-registering services, multicast groups,
  and inform-info on the client application (or IB access layer core).

* NPTL problem under Red Hat 9.0, Red Hat AS 3.0:
  There are some bugs (pthread conditional wait missing events)
  with thread handling when using the dynamic Native POSIX Thread
  Library (/lib/tls) of the Red Hat 9.0 and Red Hat AS 3.0 OSs. To
  overcome that, the OpenSM installation places wrapper scripts named
  opensm and osmtest in the /usr/bin directory, which preload the
  standard libc and libpthread before invoking the executables. If
  using the osm package, a similar workaround is possible by putting
  the LD_PRELOAD setting in the .tclshrc file, for example:
  set env(LD_PRELOAD) "/lib/libc.so.6:/lib/libpthread.so.0"

* InformInfo failure over IBMGT:
  OpenSM might not respect a valid InformInfo unsubscribe request when
  running over Mellanox's IBMGT user level MAD interface (not on
  IBGD). This will be fixed in the next release.

* No "port down" event handling:
  Changing the switch port through which OpenSM connects to the IB
  fabric may cause incorrect operation. Please restart OpenSM whenever
  such a connectivity change is made.
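
The wrapper scripts mentioned in the NPTL item are not reproduced in
these notes. As a rough sketch of the same preload idea, the Python
below builds an environment with LD_PRELOAD set to the two libraries
quoted in the .tclshrc example. The function name is hypothetical and
the library paths are assumed to match the target system.

```python
import os

# Library paths quoted from the .tclshrc example in the NPTL item.
PRELOAD_LIBS = ("/lib/libc.so.6", "/lib/libpthread.so.0")


def preload_env(base_env=None):
    """Return a copy of the environment with LD_PRELOAD set.

    The original environment is left untouched, so only the child
    process started with the returned mapping sees the preload.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = ":".join(PRELOAD_LIBS)
    return env
```

The returned mapping can be passed as the env argument of
subprocess.run when starting the real executable.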

3 Unsupported IB Compliancy Statements
--------------------------------------
The following section lists all the IB compliancy statements which
OpenSM does not support. Please refer to the IB specification for
detailed information on each compliancy statement.

* C14-22 (Authentication):
  M_Key, M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a work-around, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
  the SM shall generate a SubnGetResp if the M_Key matches, or
  silently drop the packet if the M_Key does not match.

* C15-0.1.23.1 (Authentication):
  PortInfoRecords shall always be provided with the M_Key component
  set to 0, except in the case of a trusted request, in which case the
  actual M_Key component contents shall be provided.

* C15-0.1.23.2 (Authentication):
  P_KeyTableRecords and ServiceAssociationRecords shall only be
  provided in responses to trusted requests.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to
  0, except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If no permission to forward, the subscription should be removed and
  no further forwarding should occur.

* C14-37.1.2 (Handover):
  Priority should be kept in non-volatile memory.

* C14-38.1.1 (Handover):
  Support AttributeModifier values in SubnSet(SMInfo). If the state
  transition requested is invalid - return with status code 7.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.11 (Initialization):
  PortInfo:VLHighLimit should match the configured VLArb on the port.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:P_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.13 (Multicast):
  If a Join request using unrealistic parameters is received, return
  ERR_REQ_INVALID.

* o15-0.1.8 (Multicast):
  If a request to create an MCG has fields that cannot be met,
  return ERR_REQ_INVALID (currently ignoring SL and FlowLabelTclass).

* C15-0.1.11 (SA-Query):
  Query response should use only base LIDs (as the feature has not
  been qualified yet).

* C15-0.1.19 (SA-Query):
  Respond to SubnGetMulti(MultiPathRec)

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.8.7 (SA-Query):
  SubnAdmGetMulti SubnAdmGetMultiResp - Only in case of a MultiPath.

* C15-X.Y.Z.W (SA-Query):
  SubnAdmGet/GetTable GUIDInfo - support GUIDInfo setting/retrieval.

* C15-0.1.13 (Services):
  Reject ServiceRecord create, modify or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Major Bug Fixes
-----------------

The following bugs were fixed. Note that other less critical
or less visible bugs were also fixed.

* PortInfo query was not matching on several fields. These fields
  were added to the comparison function.

* OpenSM would crash during exit flow if run with the "-o" flag. A fix
  to the complib global timer destruction sequence solves this problem.

* OpenSM does not complete the sweep if the driver fails to send a
  MAD. The counting of outstanding MADs for which the SM awaits a
  response was enhanced to take this case into account.

* OpenSM was not compliant to the spec statement: C14.62.1.1 Table 183
  p870 l34: ".., the SM shall ensure that one of the P_KeyTable
  entries in every node contains either the value 0xFFFF (the default
  P_Key, full membership) or the value 0x7FFF (the default P_Key,
  partial membership)." OpenSM now adds a 0xffff entry to the P_Key
  table of any port that has neither a 0xffff nor a 0x7fff entry.
  Switch ports are ignored.

* If the SA is queried with IB_PIR_COMPMASK_BASELID and base_lid of 0,
  the SA was incorrectly returning all the ports. Fix: do not ignore a
  base lid of 0 as a query criterion.

* When provided a PathRecord query with num_paths = 0 the SM should
  assume num_paths = 1. Fix: in the PathRecord query code.

* PathRecord query returned a deleted multicast group's info. Fix:
  Added a check for multicast group state to avoid such cases.

* LinkRecord query provided wrong results. Fix: in query code.

* PathRecord did not honor PacketLifeTime component. Fix: Added the
  check for packet lifetime matching.

* Multicast and other registrations were happening all the time on
  the cluster. Fix: OpenSM no longer sends false
  "client-re-registration" messages (in PortInfo).

* On some heavy load cases OpenSM would consume 100% CPU time. Fix: an
  endless loop in the timer implementation, which would happen under
  rare heavy CPU load cases, was fixed.

* OpenSM hangs during LID assignment phase. Fix: The condition that
  caused the hang was fixed.

* OpenSM core dump in the middle of a sweep. Fix: A memory range
  overflow write was found by valgrind and fixed.

* OpenSM core dump as a result of a PathRecord query with no results.
  Fix: A memory free on non-allocated memory was fixed.

* OpenSM sweep algorithm confused by a timing race. Fix: A significant
  race condition in the SM sweep algorithm was found and fixed.

* OpenSM deadlock due to out of order SMInfo and NodeInfo MADs
  received. Fix: A fix in lock ordering resolves this issue.

* TrapRepress sent even if not a master. Fix: in trap receiver.
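
The default P_Key fix in the list above comes down to a simple table
check, sketched here in Python. This is an illustration only, not the
OpenSM implementation: the function name, the use of 0 to mark an
unused slot, the choice of the first free slot, and the error raised
for a full table are all assumptions of the sketch.

```python
FULL_DEFAULT_PKEY = 0xFFFF     # default P_Key, full membership
PARTIAL_DEFAULT_PKEY = 0x7FFF  # default P_Key, partial membership


def ensure_default_pkey(table):
    """Return a copy of a port's P_Key table with a default entry.

    table is a list of 16-bit P_Key values. A value of 0 is treated
    as an unused slot (assumption of this sketch).
    """
    table = list(table)
    # Nothing to do if either default P_Key is already present.
    if FULL_DEFAULT_PKEY in table or PARTIAL_DEFAULT_PKEY in table:
        return table
    # Otherwise place the full-membership default in the first free slot.
    for index, value in enumerate(table):
        if value == 0:
            table[index] = FULL_DEFAULT_PKEY
            return table
    raise ValueError("no free P_Key table slot")
```

Per the fix description, such a check would be applied to the ports
of non-switch nodes only, since switch ports are ignored.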

5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a standalone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors and verify OpenSM's capability to
  respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on a back to
  back or single switch configuration. The regression includes
  multiple OpenSM dedicated tests.
* cluster testing - when we run OpenSM to set up a large cluster,
  perform handoff, reboots and reconnects, and verify routing
  correctness and SA responsiveness at the ULP level (IPoIB and SDP).

5.1 osmtest

OsmTest is the main automated verification tool used for OpenSM
testing. Its verification flows are described in the list below.

* Inventory File: Obtain and verify all port info, node info, and path
  record parameters.

* Service Record:
  - Register new service
  - Register another service (with a lease period)
  - Register another service (with service p_key set to zero)
  - Get all services by name
  - Delete the first service
  - Delete the third service
  - Bad flows of get/delete of a non-valid service
  - Add / Get same service with different data
  - Add / Get / Delete by different component mask values (services
    by Name & Key / Name & Data / Name & Id / Id only)

* Multicast Member Record:
  - Query of existing Groups (IPoIB)
  - BAD Join with insufficient comp mask (o15.0.1.3)
  - Create given MGID=0 (o15.0.1.4)
  - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
  - Create BAD MGID=0xFA. (o15.0.1.6)
  - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
  - New MGID with invalid join state (o15.0.1.9)
  - Retry of existing MGID - See JoinState update (o15.0.1.11)
  - BAD RATE when connecting to existing MGID (o15.0.1.13)
  - Partial JoinState delete request - removing FullMember (o15.0.1.14)
  - Full Delete of a group (o15.0.1.14)
  - Verify Delete by trying to Join deleted group (o15.0.1.14)
  - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* Event Forwarding: Register for trap forwarding using reports
  - Send a trap and wait for report
  - Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
  disconnect/connect ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries both single and RMPP and
  check for the rate of responses as well as their validity.

5.2 IB Management Simulator OpenSM Test Flows:

The simulator provides the ability to simulate the SM handling of
virtual topologies that are not limited to actual lab equipment
availability. OpenSM was simulated to bring up clusters of up to
10,000 nodes. Daily regressions use smaller clusters (16 and 128
nodes).

The following test flows run on the IB management simulator:

* Stability:
  Up to 12 links from the fabric are randomly selected to drop packets
  at drop rates of up to 90%. The SM is required to succeed in
  bringing the fabric up. The resulting routing is verified to be
  correct too.

* LID Manager:
  Using LMC = 2 the fabric is initialized with LIDs. Faults like
  zero LID, duplicated LID, and non-aligned (to LMC) LIDs are
  randomly assigned to various nodes and other errors are randomly
  output to the guid2lid cache file. The SM sweep is run 5 times and
  after each iteration a complete verification is made to ensure all
  LIDs that could possibly be maintained are kept, and that all nodes
  were assigned a legal LID range.

* Multicast Routing:
  Nodes randomly join the 0xc000 group and eventually the
  resulting routing is verified for completeness and adherence to
  Up/Down routing rules.

* OsmTest:
  The complete osmtest flow as described in Section 5.1 is run on
  the simulated fabrics.
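
The three LID faults named in the LID Manager flow (zero LID,
duplicated LID, LID not aligned to the LMC) can be detected with a
check like the Python sketch below. It is not the simulator's actual
verification code: the function name and the port-name keys are
hypothetical, and the sketch only relies on each port owning 2^LMC
consecutive LIDs starting at its base LID.

```python
def check_lid_assignment(base_lids, lmc):
    """Return a list of (port, problem) pairs for a LID assignment.

    base_lids maps a port name to its base LID; lmc is the LID mask
    control value, so each port owns 2**lmc consecutive LIDs.
    """
    span = 1 << lmc
    problems = []
    owners = {}  # base LID -> first port seen with it
    for port, base in sorted(base_lids.items()):
        if base == 0:
            problems.append((port, "zero LID"))
            continue
        if base % span != 0:
            problems.append((port, "not aligned to LMC"))
            continue
        # Aligned ranges of equal size overlap only if they are equal.
        if base in owners:
            problems.append((port, "duplicates " + owners[base]))
            continue
        owners[base] = port
    return problems
```

With LMC = 2, for example, base LIDs 4 and 8 are legal, while 6 is
flagged as not aligned and a second port using 8 is flagged as a
duplicate.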

5.3 OpenSM Regression

Using a back to back or single switch connection, the following set
of tests is run nightly on the stacks described in Table 2. The
included tests are:

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic topology changes, made by randomly
  dropping SMP packets, are used to test OpenSM's adaptation to an
  unstable network and to verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that
  it handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to all
  possible single component masks. To do that the test examines the
  entire set of records the SA can provide, classifies them by their
  field values and then selects every field (using a component mask
  and a value) and verifies that the response matches the expected
  set of records. A random selection using multiple component mask
  bits is also performed.
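
The single component mask check of the SA Query Test can be outlined
as follows. This Python sketch is not the regression test itself: it
assumes records are plain dictionaries with identical keys and that a
hypothetical query(field, value) callable stands in for an SA query
with one component mask bit set.

```python
def _canonical(records):
    """Order-independent form of a list of record dictionaries."""
    return sorted(tuple(sorted(rec.items())) for rec in records)


def check_single_component_queries(records, query):
    """Return the (field, value) pairs whose response was wrong.

    records is the full set of records the SA can provide;
    query(field, value) returns the records the SA reports for a
    query selecting on that single field.
    """
    failures = []
    fields = sorted(records[0]) if records else []
    for field in fields:
        # Classify the records by the values this field takes.
        for value in sorted({rec[field] for rec in records}):
            expected = [rec for rec in records if rec[field] == value]
            if _canonical(query(field, value)) != _canonical(expected):
                failures.append((field, value))
    return failures
```

A responder that ignores the selected value and returns every record,
as in the base_lid of 0 bug listed in Section 4, fails this check for
each value that does not match all records.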

5.4 Cluster testing:

Cluster testing is usually run before a distribution release. It
involves a real hardware setup of 16 to 32 nodes (or more if a beta
site is available). Each test is validated by running an all-to-all
ping through the IB interface. The test procedure includes:

* Cluster bringup

* Handoff between 2 or 3 SMs while performing
  - Node reboots
  - Switch power cycles (disconnecting the SMs)

* Irresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery


6 Qualification
----------------

Table 2 - Qualified IB Stacks
=============================

Stack                                   | Version
----------------------------------------|--------------------------
VAPI (Mellanox InfiniBand HCA Driver)   | 3.2 and later
OpenIB Gen1 (IBGD distribution)         | 1.8.0
OpenIB Gen2 (IBG2 distribution)         | 1.0

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Device  | FW versions
--------|-----------------------------------------------------------
MT43132 | InfiniScale - fw-43132 5.2.0 (and later)
MT47396 | InfiniScale III - fw-47396 0.5.0 (and later)
MT23108 | InfiniHost - fw-23108 3.3.2
MT25204 | InfiniHost III Lx - fw-25204 1.0.1
MT25208 | InfiniHost III Ex (InfiniHost Mode) - fw-25208 4.6.2 (and later)
MT25208 | InfiniHost III Ex (MemFree Mode) - fw-25218 5.0.1 (and later)

HCAs from other vendors have not yet been verified, but eHCA is known
to be discovered and configured correctly.