OpenSM Release Notes 3.2
=============================

Version: OpenSM 3.2.x
Repo: git://git.openfabrics.org/~sashak/management.git
Date: Dec 2008

1 Overview
----------
This document describes the contents of the OpenSM 3.2 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administration,
and runs on top of OpenIB. The OpenSM version for this release
is opensm-3.2.5.

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB Compliance Statements
4 Bug Fixes
5 Main Verification Flows
6 Qualified Software Stacks and Devices

1.1 Major New Features

* Cached Routing
  OpenSM provides an optional unicast routing cache (enabled by '-A' or
  '--ucast_cache' options). When enabled, the unicast routing cache prevents
  routing recalculation (which is a heavy task in a large cluster) when
  no topology change was detected during the heavy sweep, or when
  the topology change does not require new routing calculation, e.g. when
  one or more CAs/RTRs/leaf switches go down, or one or more of these
  nodes come back after being down.

* Routing Chaining
  Routing chaining is the ability to configure the order in which routing
  algorithms are applied in opensm, e.g. '-R ftree,updn,minhop' - try
  using ftree routing. If ftree fails, try updn. If updn fails, try
  minhop.

* IPv6 Solicited Node Multicast addresses consolidation
  When this mode is used (enabled with the --consolidate_ipv6_snm_req
  option), OpenSM will map all IPv6 Solicited Node Multicast address join
  requests into a single Multicast group with address ff10:601b::1:ff00:0.
  In this way limited MLID space is saved. This IBA noncompliant feature is
  very useful with large clusters (more than about 1024 nodes).

* OpenSM sweep state machine rework
  The huge and buggy OpenSM sweep state machine was fully rewritten in a
  safer and more effective synchronous manner.

* Multi lid routing balancing for updn/minhop routing algorithms
  When LMC > 0 is used, OpenSM will generate routing paths via different
  switches and, when possible, different chassis.

* Preserve base lid routes when LMC > 0
  When LMC > 0 is used, OpenSM will preserve routing paths for base lids
  as they would be with LMC = 0. In this way traffic on each LID level is
  not affected by LMC changes.

* Ordered routing paths balancing
  This adds the ability to predefine the port order in which routing path
  balancing is performed by OpenSM. It can improve performance
  dramatically (40-50%) for applications with a known communication
  pattern. Activated with the --guid_routing_order_file command line option.

* Unified OpenSM configuration
  There is now a "conventional" config file instead of the hidden option
  cache file (opensm.opts). OpenSM will find this file in a default location
  (consult the man page for the exact value), or the file name can be
  specified with the '-F' command line option. There is also an option
  ('-c') to generate a config file template.

* Query remote SMs during light sweep
  Master OpenSM will query remote standby SMs periodically to catch their
  possible state changes and react accordingly (as required by IBA spec).

* Predefined port ids for Up/Down algorithm
  This is useful as an Up/Down fine tuning tool - the algorithm will use
  predefined port IDs instead of GUIDs for its decision about direction.
  Activated with the --ids_guid_file command line option.

* Improved plugin API version 2.
  OpenSM now gives plugins access to all data structures.
  This makes it possible to implement powerful multi purpose plugins. All
  OpenSM header files are installed now and specific configuration/build
  options are exported via the generated osm_config.h header file.

* Many code improvements, optimizations and cleanups

* Automatic daily snapshots generation.
  This is not a "feature", but simplifies the access to recent OpenSM
  bits.

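The routing chaining item above ('-R ftree,updn,minhop') can be pictured as a
simple fallback loop. The following Python sketch is only an illustration and
not OpenSM code: the route_with_chain helper and the stand-in engine functions
are hypothetical, and each engine is assumed to report success or failure with
a boolean.

```python
def route_with_chain(chain, engines, fabric):
    """Try each routing engine named in `chain` until one succeeds.

    `chain` is a comma-separated string such as "ftree,updn,minhop".
    `engines` maps an engine name to a callable returning True on success.
    Returns the name of the engine that succeeded, or None if all failed.
    """
    for name in chain.split(","):
        name = name.strip()
        engine = engines.get(name)
        if engine is None:
            continue  # unknown engine name: simply skipped in this sketch
        if engine(fabric):
            return name
    return None


# Hypothetical stand-ins: ftree and updn fail on this fabric, minhop works.
engines = {
    "ftree": lambda fabric: fabric.get("is_fat_tree", False),
    "updn": lambda fabric: bool(fabric.get("roots")),
    "minhop": lambda fabric: True,
}

used = route_with_chain("ftree,updn,minhop", engines, {"roots": []})
print(used)
```

Here the first two engines decline the fabric, so the last entry in the chain
is the one that ends up being used.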

1.2 Minor New Features:

* Cleanup cl_qlock_pool memory allocator - speedup memory allocations

* Support for configurable (via OSM_UMAD_MAX_PENDING environment variable)
  size of pending MADs pool.

* Set packet life time to subnet timeout option rather than default

* Enforce routing paths rebalancing on switch reconnection

* In Up/Down routing algorithm compare GUID values in host byte order

* Add 'switchbalance' and 'lidbalance' commands for OpenSM console

* Respond to new trap 144 node description update flag

* Add '--connect_roots' command line option. This preserves connectivity
  between root nodes in Up/Down routing algorithm

* Setting SL in the IPoIB MCast groups in accordance with QoS policy

* Dump auto detected root node guids in Up/Down routing algorithm

* Unify OpenSM dumpers code

* Unify various guid files parsers - add generic nodenamemap style parser

* When root node guids were provided in file update the list on each
  Up/Down run

* During ./configure show values of configuration dirs and files

* Make prefix routes config file name configurable

* Add a Performance Manager HOWTO to the docs and the dist

* Support separate SA and SM keys as clarified in IBA 1.2.1

* Remove AM_MAINTAINER_MODE in ./configure

* Make vendor type OSM_VENDOR_INTF_OPENIB (libibumad) to be default

* Build osm_perfmgr_db.* content only when PerfMgr is enabled.

* Move PerfMgr event_db_dump_file to common OpenSM dump dir

* Allow space separated strings as values in OpenSM config

* Support for multiple event plugins

* Add '--version' command line option

* Add '--create-config <file-name>' command line option

* Speedup and simplify logging code

* Speedup multicast processing in SA DB

* In log messages convert unicast LIDs from hex to decimal format and
  GIDs from hex to IPv6 address format

* Handle all possible ports in "ignore-guids" file

* Add 'reroute' console command

* Remove many install-exec-hook from Makefiles

* Some cleanups in LASH routing algorithm code

* In Makefiles remove -rpath and explicit -lpthread, -ldl from LDFLAGS
  (move to configurator)

* Install all OpenSM header files

* Improve locking in SM Info receiver

* Add new OSM_EVENT_ID_SUBNET_UP event for plugins

* Redo lex and yacc files generation in conventional way

* Add a missing Node Description check on light sweep.

* Move vendor specific compilation defines from command to generated
  config.h file

* Provide useful error message when log file opening fails

* Add generated osm_config.h file with OpenSM specific defines

* Display port number in decimal in log messages

* Replace osm_vendor_select.h by generated osm_config.h

* Unify options listing in OpenSM usage message

* LFT buffers handling simplification

* Add 'dump_conf' console command

* OpenSM performs sweep on SIGCONT (coming out of suspend).

* When our SM is in Standby state and its priority is increased
  (via console command), notify master SM by sending Trap 144.

* When entering standby state (after discovery) notify master SM
  with Trap 144.

* Support more PortInfo:CapabilityMask bits

* When babbling port policy is on, disable the port with the least hop
  count.

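One of the items above changes how LIDs and GIDs appear in log messages
(decimal LIDs, GIDs in IPv6 address format). A small Python sketch of the two
conversions follows. It assumes a GID is available as 16 raw bytes; format_lid
and format_gid are hypothetical helper names, the example port GUID is made
up, and the exact text OpenSM writes to its log may differ.

```python
import ipaddress


def format_lid(lid_hex):
    """Convert a unicast LID given as a hex string (e.g. "0x1a") to decimal text."""
    return str(int(lid_hex, 16))


def format_gid(gid_bytes):
    """Render a 16-byte GID using IPv6 address notation."""
    if len(gid_bytes) != 16:
        raise ValueError("a GID is 128 bits (16 bytes)")
    return str(ipaddress.IPv6Address(gid_bytes))


# Example GID: subnet prefix fe80::/64 followed by a made-up port GUID.
gid = bytes.fromhex("fe80000000000000" "0002c90300001234")

print(format_lid("0x1a"))
print(format_gid(gid))
```

The IPv6 form is shorter than 32 hex digits because runs of zero groups are
compressed, which is the main readability gain for GIDs in logs.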

1.3 Library API Changes

None

1.4 Software Dependencies

OpenSM depends on the installation of either OFED 1.x, OpenIB gen2 (e.g.
IBG2 distribution), OpenIB gen1 (e.g. IBGD distribution), or Mellanox
VAPI stacks. The qualified driver versions are provided in Table 2,
"Qualified IB Stacks".

Also, building the QoS manager policy file parser requires flex and either
bison or byacc to be installed.

1.5 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.

2 Known Issues And Limitations
------------------------------

* No Service / Key associations:
  There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
  Puts the burden of re-registering services, multicast groups, and
  inform-info on the client application (or IB access layer core).

3 Unsupported IB Compliance Statements
--------------------------------------
The following section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.

* C14-22 (Authentication):
  M_Key, M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a work-around, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
  the SM shall generate a SubnGetResp if the M_Key matches, or
  silently drop the packet if M_Key does not match.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to 0,
  except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If no permission to forward, the subscription should be removed and
  no further forwarding should occur.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:M_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.8 (Multicast):
  If a request to create an MCG has fields that cannot be met,
  return ERR_REQ_INVALID (OpenSM currently ignores SL and FlowLabelTClass).

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.13 (Services):
  Reject ServiceRecord create, modify or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Bug Fixes
-----------

4.1 Major Bug Fixes

* Set SA attribute offset to 0 when no records are returned

* Send trap 64 only after new ports are in ACTIVE state.

* Fix in sending client reregistration bit

* Fix default OpenSM SM (and SA) Key byte order

* Fix in sending Multicast groups creation/deletion notification (Traps
  66, 67)

* Don't start up automatically on SuSE based systems

4.2 Other Bug Fixes

* opensm/osm_console.c: fix seg fault when running "portstatus ca" in
  the console

* opensm: fix potential core dumps where osm_node_get_physp_ptr can
  return NULL

* opensm/osm_mcast_mgr: limit spanning tree creation recursion to value
  of max hops (64)

* opensm: switch LFTs incremental update fix

* opensm/osm_state_mgr.c: fix segmentation fault

* opensm: eliminate some potential NULL pointer dereferences

* opensm/osm_console.c: fix guid parsing

* opensm: fix off by 1 issue with max_lid and max_multicast_lid_ho

* opensm: fix potentially wrong port_guid initialization

* opensm/configure.in: fix wrong HAVE_DEFAULT_OPENSM_CONFIG_FILE define
  generation

* opensm: fix snprintf() usage

* opensm/osm_sa_lft_record: validate LFT block number

* opensm/osm_sa_lft_record: pass block parameter in host byte order

* opensm/include/Makefile.am: don't duplicate header files in EXTRA_DIST

* opensm/osm_sa_class_port_info.c: fix over bound array access

* osmtest/osmt_service.c: fix over bound array access

* osmtest: fix qpn encoding in osmtest_informinfo_request()

* opensm/osm_vendor_mlx_sa.c: handling attribute offset of 0

* opensm: fix segfault corner case when osm_console_init fails

* opensm/console: close console socket on cleanup path

* opensm/osm_ucast_lash: fix buffer overflow

* opensm: fix broken IPv6 SNM consolidation code

* opensm/osm_sa_lft_record.c: fix block number encoding byte order

* opensm/osm_sa: fix memory leak in SA responder

* opensm/osm_mcast_mgr: fix memory leak

* opensm: fix qos config parsing bugs

* opensm/osm_mcast_tbl.c: fix sending invalid MF block due to max mlid
  overflow

* opensm: log_max_size config parameter in MB

* opensm/osm_ucast_lash: fix extra memory allocations

* opensm: fix race in main OpenSM flow

* opensm/ftree: fix GUID check against cn_guid_file

* opensm/ftree: save LFT buffers memory allocations

* opensm/osm_sa_link_record.c: prevent potential endless recursion

* opensm: remove SM from sm_guid_tbl when IsSM port capability flag is
  not set

* opensm: fix QoS config bug

* opensm: don't reassign zeroed params from config file

* Other less critical or visible bugs were also fixed.

5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors and verify OpenSM capability to
  respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back to
  back or single switch configurations. The regression includes
  multiple OpenSM dedicated tests.
* cluster testing - when we run OpenSM to set up a large cluster, perform
  hand-off, reboots and reconnects, verify routing correctness and SA
  responsiveness at the ULP level (IPoIB and SDP).

5.1 osmtest

osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are listed below.

* Inventory File: Obtain and verify all port info, node info, link and path
  records parameters.

* Service Record:
  - Register new service
  - Register another service (with a lease period)
  - Register another service (with service p_key set to zero)
  - Get all services by name
  - Delete the first service
  - Delete the third service
  - Bad flows of get/delete of a non-valid service
  - Add / Get same service with different data
  - Add / Get / Delete by different component mask values (services
    by Name & Key / Name & Data / Name & Id / Id only )

* Multicast Member Record:
  - Query of existing Groups (IPoIB)
  - BAD Join with insufficient comp mask (o15.0.1.3)
  - Create given MGID=0 (o15.0.1.4)
  - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
  - Create BAD MGID=0xFA. (o15.0.1.6)
  - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
  - New MGID with invalid join state (o15.0.1.9)
  - Retry of existing MGID - See JoinState update (o15.0.1.11)
  - BAD RATE when connecting to existing MGID (o15.0.1.13)
  - Partial JoinState delete request - removing FullMember (o15.0.1.14)
  - Full Delete of a group (o15.0.1.14)
  - Verify Delete by trying to Join deleted group (o15.0.1.14)
  - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* GUIDInfo Record:
  - All GUIDInfoRecords in subnet are obtained

* MultiPathRecord:
  - Perform some compliant and noncompliant MultiPathRecord requests
  - Validation is via status in responses and IB analyzer

* PKeyTableRecord:
  - Perform some compliant and noncompliant PKeyTableRecord queries
  - Validation is via status in responses and IB analyzer

* LinearForwardingTableRecord:
  - Perform some compliant and noncompliant LinearForwardingTableRecord queries
  - Validation is via status in responses and IB analyzer

* Event Forwarding: Register for trap forwarding using reports
  - Send a trap and wait for report
  - Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
  disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries, both single and RMPP and
  check for the rate of responses as well as their validity.

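Several of the Multicast Member Record flows listed above feed deliberately
bad MGIDs to the SA. As a rough illustration of the most basic check involved,
the Python sketch below only tests that the first byte of a 128-bit MGID is
0xFF, which is what marks a GID as multicast. The helper name looks_like_mgid
is hypothetical, the zero padding of the 0xFA value is an assumption, and real
validation also covers further fields (flags, scope and so on) that are not
modeled here.

```python
def looks_like_mgid(mgid):
    """Return True if a 16-byte GID starts with the multicast prefix byte 0xFF."""
    if len(mgid) != 16:
        raise ValueError("a GID is 128 bits (16 bytes)")
    return mgid[0] == 0xFF


# MGID taken from the flow list: 0xFF12A01C,FE800000,00000000,12345678
good = bytes.fromhex("FF12A01C" "FE800000" "00000000" "12345678")
# A value starting with 0xFA, as in the "BAD MGID=0xFA" flow (zero padded here).
bad = bytes.fromhex("FA" + "00" * 15)

print(looks_like_mgid(good), looks_like_mgid(bad))
```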

5.2 IB Management Simulator OpenSM Test Flows:

The simulator provides the ability to simulate the SM handling of virtual
topologies that are not limited to actual lab equipment availability.
OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
regressions use smaller clusters (16 and 128 nodes).

The following test flows are run on the IB management simulator:

* Stability:
  Up to 12 links from the fabric are randomly selected to drop packets
  at drop rates up to 90%. The SM is required to succeed in bringing the
  fabric up. The resulting routing is verified to be correct as well.

* LID Manager:
  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
  zero LID, Duplicated LID, non-aligned (to LMC) LIDs are
  randomly assigned to various nodes and other errors are randomly
  output to the guid2lid cache file. The SM sweep is run 5 times and
  after each iteration a complete verification is made to ensure that all
  LIDs that could possibly be maintained are kept, as well as that all nodes
  were assigned a legal LID range.

* Multicast Routing:
  Nodes randomly join the 0xc000 group and eventually the
  resulting routing is verified for completeness and adherence to
  Up/Down routing rules.

* osmtest:
  The complete osmtest flow as described in section 5.1 is run on
  the simulated fabrics.

* Stress Test:
  This flow merges fabric, LID and stability issues with continuous
  PathRecord, ServiceRecord and Multicast Join/Leave activity to
  stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
  were added to the test such that both existing and non-existing nodes
  perform them in random order.

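The LID Manager flow above injects zero, duplicated and non-aligned LIDs with
LMC = 2. The Python sketch below illustrates what those three fault classes
mean, assuming the usual rule that a port with a given LMC answers to 2**LMC
consecutive LIDs starting at a base LID aligned to that size. All function
names and port labels are hypothetical; this is not the simulator's code.

```python
def lid_range(base_lid, lmc):
    """Return the LIDs covered by a port with the given base LID and LMC."""
    count = 1 << lmc  # a port answers to 2**LMC consecutive LIDs
    return list(range(base_lid, base_lid + count))


def is_valid_base_lid(base_lid, lmc):
    """Reject a zero LID and a base LID that is not aligned to 2**LMC."""
    if base_lid == 0:
        return False
    return base_lid % (1 << lmc) == 0


def find_duplicates(assignments, lmc):
    """Return the set of LIDs claimed by more than one port.

    `assignments` maps a port label (hypothetical) to its base LID.
    """
    seen, dups = set(), set()
    for base in assignments.values():
        for lid in lid_range(base, lmc):
            (dups if lid in seen else seen).add(lid)
    return dups


print(lid_range(8, 2))
print(is_valid_base_lid(6, 2))
print(find_duplicates({"port_a": 8, "port_b": 8, "port_c": 12}, 2))
```

With LMC = 2, base LID 8 covers LIDs 8 through 11, base LID 6 is misaligned,
and two ports sharing base LID 8 collide on that whole range.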

5.3 OpenSM Regression

Using a back-to-back or single switch connection, the following set of
tests is run nightly on the stacks described in Table 2. The included
tests are:

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic Topology changes, through randomly
  dropping SMP packets, used to test OpenSM adaptation to an unstable
  network & verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
  handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to every
  possible single component mask. To do that the test examines the
  entire set of records the SA can provide, classifies them by their
  field values and then selects every field (using component mask and a
  value) and verifies that the response matches the expected set of records.
  A random selection using multiple component mask bits is also performed.

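The SA Query Test described above can be sketched as follows: take the full
record set, and for every field and every observed value issue a query with
only that field's component mask bit set, then compare the response with the
records expected to match. This Python sketch is an illustration only. The
field names, the bit assignments in FIELD_BITS and the query callable are
hypothetical stand-ins for real SA attributes and a real SA query.

```python
# Hypothetical field-to-bit assignment for an illustrative record type.
FIELD_BITS = {"lid": 0x1, "port_num": 0x2, "mtu": 0x4}


def sa_filter(records, comp_mask, template):
    """Return records matching `template` on every field selected by `comp_mask`."""
    selected = [f for f, bit in FIELD_BITS.items() if comp_mask & bit]
    return [r for r in records if all(r[f] == template[f] for f in selected)]


def check_single_field_queries(records, query):
    """Query each field/value pair alone and compare with the expected set.

    `query(comp_mask, template)` stands in for a real SA query.
    Returns a list of (field, value) pairs whose response did not match.
    """
    failures = []
    for field, bit in FIELD_BITS.items():
        for value in {r[field] for r in records}:
            expected = [r for r in records if r[field] == value]
            got = query(bit, {field: value})
            if got != expected:
                failures.append((field, value))
    return failures


records = [
    {"lid": 1, "port_num": 1, "mtu": 4},
    {"lid": 2, "port_num": 1, "mtu": 4},
    {"lid": 3, "port_num": 2, "mtu": 2},
]

# The reference filter plays the role of the SA here, so no failures are expected.
failures = check_single_field_queries(
    records, lambda mask, tmpl: sa_filter(records, mask, tmpl)
)
print(failures)
```

The multi-bit variant mentioned in the text would pass a mask with several
bits set, for example 0x2 | 0x4 in this sketch, together with a template
holding a value for each selected field.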

5.4 Cluster testing:

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through the IB
interface. The test procedure includes:

* Cluster bringup

* Hand-off between 2 or 3 SM's while performing:
  - Node reboots
  - Switch power cycles (disconnecting the SM's)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery


6 Qualified Software Stacks and Devices
---------------------------------------

OpenSM Compatibility
--------------------
Note that OpenSM version 3.2.1 and earlier used a value of 1 in host
byte order for the default SM_Key, so there is a compatibility issue
with these earlier versions of OpenSM when the 3.2.2 or later version
is running on a little endian machine. This affects SM handover as well
as SA queries (saquery tool in infiniband-diags).

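The compatibility note above comes down to how the value 1 is laid out in
memory. The Python sketch below shows the two byte layouts of a 64-bit key
with value 1 and what a reader gets when little-endian bytes are interpreted
in network (big-endian) order. It assumes a 64-bit key and is only an
illustration of the byte order mismatch, not OpenSM's key handling.

```python
import struct

SM_KEY = 1

# Value 1 stored as a 64-bit integer in little-endian host byte order.
host_order_bytes = struct.pack("<Q", SM_KEY)
# The same value in network (big-endian) byte order.
network_order_bytes = struct.pack(">Q", SM_KEY)

# What a peer sees if it reads the little-endian bytes as a network-order value.
(misread,) = struct.unpack(">Q", host_order_bytes)

print(host_order_bytes.hex())
print(network_order_bytes.hex())
print(hex(misread))
```

On a big-endian host the two layouts coincide, which is consistent with the
note singling out little endian machines.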

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OFED                                     | 1.4
OFED                                     | 1.3
OFED                                     | 1.2
OFED                                     | 1.1
OFED                                     | 1.0
OpenIB Gen2 (IBG2 distribution)          | 1.0
OpenIB Gen1 (IBGD distribution)          | 1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    | 3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Mellanox
Device                              | FW versions
------------------------------------|-------------------------------
InfiniScale                         | fw-43132 5.2.000 (and later)
InfiniScale III                     | fw-47396 0.5.000 (and later)
InfiniScale IV                      | fw-48436 7.1.000 (and later)
InfiniHost                          | fw-23108 3.5.000 (and later)
InfiniHost III Lx                   | fw-25204 1.2.000 (and later)
InfiniHost III Ex (InfiniHost Mode) | fw-25208 4.8.200 (and later)
InfiniHost III Ex (MemFree Mode)    | fw-25218 5.3.000 (and later)
ConnectX IB                         | fw-25408 2.3.000 (and later)

QLogic/PathScale
Device  | Note
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)
iPath   | QLE7240
iPath   | QLE7280

Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, it does support it as a device on the subnet.

Note 2: QoS firmware and Mellanox devices

HCAs: QoS supported by ConnectX. QoS-enabled FW release is 2_5_000 and
later.

Switches: QoS supported by InfiniScale III
Any InfiniScale III FW that is supported by OpenSM supports QoS.