                        OpenSM Release Notes 3.2
                       =============================

Version: OpenSM 3.2.x
Repo:    git://git.openfabrics.org/~sashak/management.git
Date:    Dec 2008

1 Overview
----------
This document describes the contents of the OpenSM 3.2 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administration,
and runs on top of OpenIB. The OpenSM version for this release
is opensm-3.2.5.

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB Compliance Statements
4 Bug Fixes
5 Main Verification Flows
6 Qualified Software Stacks and Devices

1.1 Major New Features

* Cached Routing
  OpenSM provides an optional unicast routing cache (enabled by the '-A'
  or '--ucast_cache' option). When enabled, the unicast routing cache
  prevents routing recalculation (a heavy task in a large cluster) when
  no topology change was detected during the heavy sweep, or when the
  topology change does not require a new routing calculation, e.g. when
  one or more CAs/RTRs/leaf switches go down, or when one or more of
  these nodes come back after being down.

* Routing Chaining
  Routing chaining is the ability to configure the order in which routing
  algorithms are applied in OpenSM. For example, '-R ftree,updn,minhop'
  means: try ftree routing first; if ftree fails, try updn; if updn
  fails, try minhop.

* IPv6 Solicited Node Multicast addresses consolidation
  When this mode is used (enabled with the --consolidate_ipv6_snm_req
  option), OpenSM maps all IPv6 Solicited Node Multicast address join
  requests into a single multicast group with address
  ff10:601b::1:ff00:0. This saves the limited MLID space. This IBA
  noncompliant feature is very useful with large clusters (more than
  about 1024 nodes).

* OpenSM sweep state machine rework
  The huge and buggy OpenSM sweep state machine was fully rewritten in a
  safer and more effective synchronous manner.

* Multi lid routing balancing for updn/minhop routing algorithms
  When LMC > 0 is used, OpenSM generates routing paths via different
  switches and, when possible, different chassis.

* Preserve base lid routes when LMC > 0
  When LMC > 0 is used, OpenSM preserves the routing paths for base lids
  as they would be with LMC = 0. In this way traffic on each LID level
  is not affected by LMC changes.

* Ordered routing paths balancing
  This adds the ability to predefine the port order in which routing
  path balancing is performed by OpenSM. It can improve performance
  dramatically (40-50%) for applications with a known communication
  pattern. Activated with the --guid_routing_order_file command line
  option.

* Unified OpenSM configuration
  There is now a "conventional" config file instead of the hidden option
  cache file (opensm.opts). OpenSM looks for it in a default location
  (consult the man page for the exact value), or the file name can be
  specified with the '-F' command line option. There is also an option
  ('-c') to generate a config file template.

* Query remote SMs during light sweep
  The master OpenSM queries remote standby SMs periodically to catch
  their possible state changes and react accordingly (as required by
  the IBA spec).

* Predefined port ids for Up/Down algorithm
  This is useful as an Up/Down fine-tuning tool: the algorithm uses
  predefined port IDs instead of GUIDs for its decision about direction.
  Activated with the --ids_guid_file command line option.

* Improved plugin API version 2
  OpenSM now gives plugins access to all data structures. This makes it
  possible to implement powerful multi-purpose plugins. All OpenSM
  header files are now installed, and specific configuration/build
  options are exported via the generated osm_config.h header file.

* Many code improvements, optimizations and cleanups

* Automatic daily snapshots generation
  This is not a "feature", but it simplifies access to recent OpenSM
  bits.
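
The fallback order used by routing chaining ('-R ftree,updn,minhop' in
the feature list above) can be pictured with a short Python sketch. This
is an illustration only, not OpenSM code: the three engine functions,
their success rules and the topology dictionary are hypothetical
stand-ins invented for the example.

```python
# Illustrative sketch only. The engines below are hypothetical stand-ins;
# each returns True when it could route the fabric and False otherwise.

def ftree(topology):
    # Assumed rule for this sketch: fat-tree routing needs a fat-tree fabric.
    return topology.get("is_fat_tree", False)


def updn(topology):
    # Assumed rule for this sketch: Up/Down needs at least one root switch.
    return bool(topology.get("roots"))


def minhop(topology):
    # In this sketch min-hop always succeeds.
    return True


ENGINES = {"ftree": ftree, "updn": updn, "minhop": minhop}


def route_with_chain(chain, topology):
    """Try each engine named in a comma-separated chain, in order.

    Returns the name of the first engine that succeeds, or None when
    every engine in the chain fails.
    """
    for name in (part.strip() for part in chain.split(",")):
        if ENGINES[name](topology):
            return name
    return None


if __name__ == "__main__":
    fabric = {"is_fat_tree": False, "roots": []}
    print(route_with_chain("ftree,updn,minhop", fabric))  # prints "minhop"
```

The point of the chain is that a later, more general engine is only
reached when every earlier engine has reported failure.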

1.2 Minor New Features

* Cleanup cl_qlock_pool memory allocator - speedup memory allocations

* Support for configurable (via OSM_UMAD_MAX_PENDING environment variable)
  size of pending MADs pool.

* Set packet life time to subnet timeout option rather than default

* Enforce routing paths rebalancing on switch reconnection

* In Up/Down routing algorithm compare GUID values in host byte order

* Add 'switchbalance' and 'lidbalance' commands for OpenSM console

* Respond to new trap 144 node description update flag

* Add '--connect_roots' command line option. This preserves connectivity
  between root nodes in Up/Down routing algorithm

* Setting SL in the IPoIB MCast groups in accordance with QoS policy

* Dump auto detected root node guids in Up/Down routing algorithm

* Unify OpenSM dumpers code

* Unify various guid files parsers - add generic nodenamemap style parser

* When root node guids were provided in file update the list on each
  Up/Down run

* During ./configure show values of configuration dirs and files

* Make prefix routes config file name configurable

* Add a Performance Manager HOWTO to the docs and the dist

* Support separate SA and SM keys as clarified in IBA 1.2.1

* Remove AM_MAINTAINER_MODE in ./configure

* Make vendor type OSM_VENDOR_INTF_OPENIB (libibumad) to be default

* Build osm_perfmgr_db.* content only when PerfMgr is enabled.

* Move PerfMgr event_db_dump_file to common OpenSM dump dir

* Allow space separated strings as values in OpenSM config

* Support for multiple event plugins

* Add '--version' command line option

* Add '--create-config <file-name>' command line option

* Speedup and simplify logging code

* Speedup multicast processing in SA DB

* In log messages convert unicast LIDs from hex to decimal format and
  GIDs from hex to IPv6 address format

* Handle all possible ports in "ignore-guids" file

* Add 'reroute' console command

* Remove many install-exec-hook from Makefiles

* Some cleanups in LASH routing algorithm code

* In Makefiles remove -rpath and explicit -lpthread, -ldl from LDFLAGS
  (move to configurator)

* Install all OpenSM header files

* Improve locking in SM Info receiver

* Add new OSM_EVENT_ID_SUBNET_UP event for plugins

* Redo lex and yacc files generation in conventional way

* Add a missing Node Description check on light sweep.

* Move vendor specific compilation defines from command line to
  generated config.h file

* Provide useful error message when log file opening fails

* Add generated osm_config.h file with OpenSM specific defines

* Display port number in decimal in log messages

* Replace osm_vendor_select.h by generated osm_config.h

* Unify options listing in OpenSM usage message

* LFT buffers handling simplification

* Add 'dump_conf' console command

* OpenSM performs sweep on SIGCONT (coming out of suspend).

* When our SM is in Standby state and its priority is increased
  (via console command), notify master SM by sending Trap 144.

* When entering standby state (after discovery) notify master SM
  with Trap 144.

* Support more PortInfo:CapabilityMask bits

* When babbling port policy is on, disable the port with the least hop
  count.
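
One of the items above changes log messages to show unicast LIDs in
decimal and GIDs in IPv6 address notation. The Python sketch below shows
what that conversion amounts to. It is only an illustration: the helper
names are hypothetical, the sample LID and GID are made up, and the
actual OpenSM log layout is not reproduced here.

```python
import ipaddress


def format_lid(lid):
    """Render a unicast LID as a decimal string instead of hex."""
    return str(lid)


def format_gid(gid_bytes):
    """Render a 16-byte GID in IPv6 address notation."""
    if len(gid_bytes) != 16:
        raise ValueError("a GID is exactly 16 bytes long")
    return str(ipaddress.IPv6Address(gid_bytes))


if __name__ == "__main__":
    # Made-up sample values for the sketch.
    sample_lid = 0x1A
    sample_gid = bytes.fromhex("fe800000000000000002c90300000001")
    print(format_lid(sample_lid))  # prints "26"
    print(format_gid(sample_gid))  # prints "fe80::2:c903:0:1"
```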

1.3 Library API Changes

  None

1.4 Software Dependencies

OpenSM depends on the installation of either OFED 1.x, OpenIB gen2 (e.g.
IBG2 distribution), OpenIB gen1 (e.g. IBGD distribution), or Mellanox
VAPI stacks. The qualified driver versions are provided in Table 2,
"Qualified IB Stacks".

Also, building of QoS manager policy file parser requires flex, and either
bison or byacc installed.

1.5 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.

2 Known Issues And Limitations
------------------------------

* No Service / Key associations:
  There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
  Puts the burden of re-registering services, multicast groups, and
  inform-info on the client application (or IB access layer core).

3 Unsupported IB Compliance Statements
--------------------------------------
This section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.

* C14-22 (Authentication):
  M_Key, M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a work-around, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
  the SM shall generate a SubnGetResp if the M_Key matches, or
  silently drop the packet if M_Key does not match.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to 0,
  except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If no permission to forward, the subscription should be removed and
  no further forwarding should occur.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:M_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.8 (Multicast):
  If a request to create an MCG has fields that cannot be met,
  return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.13 (Services):
  Reject ServiceRecord create, modify or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Bug Fixes
-----------

4.1 Major Bug Fixes

* Set SA attribute offset to 0 when no records are returned

* Send trap 64 only after new ports are in ACTIVE state.

* Fix in sending client reregistration bit

* Fix default OpenSM SM (and SA) Key byte order

* Fix in sending Multicast groups creation/deletion notification (Traps
  66,67)

* Don't startup automatically on SuSE based systems

4.2 Other Bug Fixes

* opensm/osm_console.c: fix seg fault when running "portstatus ca" in
  the console

* opensm: fix potential core dumps where osm_node_get_physp_ptr can
  return NULL

* opensm/osm_mcast_mgr: limit spanning tree creation recursion to value
  of max hops (64)

* opensm: switch LFTs incremental update fix

* opensm/osm_state_mgr.c: fix segmentation fault

* opensm: eliminate some potential NULL pointer dereferences

* opensm/osm_console.c: fix guid parsing

* opensm: fix off by 1 issue with max_lid and max_multicast_lid_ho

* opensm: fix potentially wrong port_guid initialization

* opensm/configure.in: fix wrong HAVE_DEFAULT_OPENSM_CONFIG_FILE define
  generation

* opensm: fix snprintf() usage

* opensm/osm_sa_lft_record: validate LFT block number

* opensm/osm_sa_lft_record: pass block parameter in host byte order

* opensm/include/Makefile.am: don't duplicate header files in EXTRA_DIST

* opensm/osm_sa_class_port_info.c: fix over bound array access

* osmtest/osmt_service.c: fix over bound array access

* osmtest: fix qpn encoding in osmtest_informinfo_request()

* opensm/osm_vendor_mlx_sa.c: handling attribute offset of 0

* opensm: fix segfault corner case when osm_console_init fails

* opensm/console: close console socket on cleanup path

* opensm/osm_ucast_lash: fix buffer overflow

* opensm: fix broken IPv6 SNM consolidation code

* opensm/osm_sa_lft_record.c: fix block number encoding byte order

* opensm/osm_sa: fix memory leak in SA responder

* opensm/osm_mcast_mgr: fix memory leak

* opensm: fix qos config parsing bugs

* opensm/osm_mcast_tbl.c: fix sending invalid MF block due to max mlid
  overflow

* opensm: log_max_size config parameter in MB

* opensm/osm_ucast_lash: fix extra memory allocations

* opensm: fix race in main OpenSM flow

* opensm/ftree: fix GUID check against cn_guid_file

* opensm/ftree: save LFT buffers memory allocations

* opensm/osm_sa_link_record.c: prevent potential endless recursion

* opensm: remove SM from sm_guid_tbl when IsSM port capability flag is
  not set

* opensm: fix QoS config bug

* opensm: don't reassign zeroed params from config file

* Other less critical or visible bugs were also fixed.
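
Since one of the fixes above touches the IPv6 SNM consolidation code,
the following Python sketch may help picture what the consolidation
(section 1.1) does to MLID usage. The consolidated address is the one
quoted in these notes. The matching rule is an assumption made for the
sketch: a join request is treated as an SNM request when its MGID
equals the consolidated address after the scope nibble, the P_Key field
and the low 24 bits are masked out. The real OpenSM matching code may
differ.

```python
import ipaddress

# Consolidated group address quoted in these release notes.
CONSOLIDATED_MGID = ipaddress.IPv6Address("ff10:601b::1:ff00:0")

# Assumption for this sketch: ignore the scope nibble, the 16-bit P_Key
# field and the low 24 bits of the MGID when looking for SNM requests.
SNM_MASK = int(ipaddress.IPv6Address("fff0:ffff:0:ffff:ffff:ffff:ff00:0"))


def consolidate(mgid):
    """Map an SNM-style MGID to the consolidated group, else keep it."""
    addr = ipaddress.IPv6Address(mgid)
    if (int(addr) & SNM_MASK) == int(CONSOLIDATED_MGID):
        return CONSOLIDATED_MGID
    return addr


def groups_needed(join_requests, consolidate_snm):
    """Count distinct multicast groups (one MLID each) for the requests."""
    if consolidate_snm:
        return len({consolidate(m) for m in join_requests})
    return len({ipaddress.IPv6Address(m) for m in join_requests})


if __name__ == "__main__":
    # Made-up join requests: three SNM groups plus one unrelated group.
    joins = [
        "ff12:601b:ffff::1:ff12:3456",
        "ff12:601b:ffff::1:ffab:cdef",
        "ff12:601b:ffff::1:ff00:1",
        "ff12:401b:ffff::ffff:ffff",
    ]
    print(groups_needed(joins, consolidate_snm=False))  # prints 4
    print(groups_needed(joins, consolidate_snm=True))   # prints 2
```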

5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors and verify OpenSM capability to
  respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back to
  back or single switch configurations. The regression includes
  multiple OpenSM dedicated tests.
* cluster testing - where we run OpenSM to set up a large cluster, perform
  hand-off, reboots and reconnects, verify routing correctness and SA
  responsiveness at the ULP level (IPoIB and SDP).

5.1 osmtest

osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are described in the list below.

* Inventory File: Obtain and verify all port info, node info, link and path
  records parameters.

* Service Record:
   - Register new service
   - Register another service (with a lease period)
   - Register another service (with service p_key set to zero)
   - Get all services by name
   - Delete the first service
   - Delete the third service
   - Bad flows: get/delete of a non-valid service
   - Add / Get same service with different data
   - Add / Get / Delete by different component mask values (services
     by Name & Key / Name & Data / Name & Id / Id only)

* Multicast Member Record:
   - Query of existing Groups (IPoIB)
   - BAD Join with insufficient comp mask (o15.0.1.3)
   - Create given MGID=0 (o15.0.1.4)
   - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
   - Create BAD MGID=0xFA. (o15.0.1.6)
   - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
   - New MGID with invalid join state (o15.0.1.9)
   - Retry of existing MGID - See JoinState update (o15.0.1.11)
   - BAD RATE when connecting to existing MGID (o15.0.1.13)
   - Partial JoinState delete request - removing FullMember (o15.0.1.14)
   - Full Delete of a group (o15.0.1.14)
   - Verify Delete by trying to Join deleted group (o15.0.1.14)
   - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* GUIDInfo Record:
   - All GUIDInfoRecords in subnet are obtained

* MultiPathRecord:
   - Perform some compliant and noncompliant MultiPathRecord requests
   - Validation is via status in responses and IB analyzer

* PKeyTableRecord:
   - Perform some compliant and noncompliant PKeyTableRecord queries
   - Validation is via status in responses and IB analyzer

* LinearForwardingTableRecord:
   - Perform some compliant and noncompliant LinearForwardingTableRecord queries
   - Validation is via status in responses and IB analyzer

* Event Forwarding: Register for trap forwarding using reports
   - Send a trap and wait for report
   - Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
  disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries, both single and RMPP and
  check for the rate of responses as well as their validity.

5.2 IB Management Simulator OpenSM Test Flows

The simulator provides the ability to simulate the SM handling of virtual
topologies that are not limited to actual lab equipment availability.
OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
regressions use smaller clusters (16 and 128 nodes).

The following test flows are run on the IB management simulator:

* Stability:
  Up to 12 links from the fabric are randomly selected to drop packets
  at drop rates up to 90%. The SM is required to succeed in bringing the
  fabric up. The resulting routing is verified to be correct as well.

* LID Manager:
  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
  zero LID, duplicated LID and non-aligned (to LMC) LIDs are
  randomly assigned to various nodes and other errors are randomly
  output to the guid2lid cache file. The SM sweep is run 5 times and
  after each iteration a complete verification is made to ensure that all
  LIDs that could possibly be maintained are kept, as well as that all nodes
  were assigned a legal LID range.

* Multicast Routing:
  Nodes randomly join the 0xc000 group and eventually the
  resulting routing is verified for completeness and adherence to
  Up/Down routing rules.

* osmtest:
  The complete osmtest flow as described in section 5.1 is run on
  the simulated fabrics.

* Stress Test:
  This flow merges fabric, LID and stability issues with continuous
  PathRecord, ServiceRecord and Multicast Join/Leave activity to
  stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
  were added to the test such that both existing and non-existing nodes
  perform them in random order.
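
The LID faults injected by the LID Manager flow above (zero LID,
duplicated LID, LID not aligned to LMC) can be illustrated with a small
Python sketch. It relies on the IBA rule that a port with a given LMC
owns 2^LMC consecutive LIDs starting at a base LID whose low LMC bits
are zero. The function names and the port-name dictionary are
hypothetical and are not taken from the simulator or from OpenSM.

```python
def lid_range(base_lid, lmc):
    """Return the LIDs owned by a port with this base LID and LMC."""
    count = 1 << lmc
    if base_lid == 0:
        raise ValueError("zero LID")
    if base_lid % count != 0:
        raise ValueError("base LID is not aligned to LMC")
    return list(range(base_lid, base_lid + count))


def classify_faults(base_lids, lmc):
    """Label each port's base LID with a fault name, or None if legal.

    base_lids maps a (hypothetical) port name to its base LID.
    """
    step = 1 << lmc
    seen = set()
    result = {}
    for port, lid in base_lids.items():
        if lid == 0:
            result[port] = "zero LID"
        elif lid % step != 0:
            result[port] = "non-aligned LID"
        elif lid in seen:
            result[port] = "duplicated LID"
        else:
            result[port] = None
            seen.add(lid)
    return result


if __name__ == "__main__":
    print(lid_range(4, 2))  # prints [4, 5, 6, 7]
    print(classify_faults({"a": 4, "b": 0, "c": 6, "d": 4}, 2))
```

With LMC = 2, as used in the flow, every legal base LID is a multiple
of 4, which is why a base LID of 6 counts as non-aligned.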

5.3 OpenSM Regression

Using a back-to-back or single switch connection, the following set of
tests is run nightly on the stacks described in Table 2. The included
tests are:

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic Topology changes, through randomly
  dropping SMP packets, used to test OpenSM adaptation to an unstable
  network & verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
  handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to every
  possible single-bit component mask. To do that the test examines the
  entire set of records the SA can provide, classifies them by their
  field values and then selects every field (using component mask and a
  value) and verifies that the response matches the expected set of records.
  A random selection using multiple component mask bits is also performed.
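
The idea behind the SA Query Test can be sketched in Python. This is a
toy model under stated assumptions, not the real test: the record
fields, their bit positions and the sa_query() function (which stands in
for a real SA) are all hypothetical.

```python
# Hypothetical record fields and component mask bits for this sketch.
FIELD_BITS = {"lid": 1 << 0, "port_num": 1 << 1, "mtu": 1 << 2}


def sa_query(records, comp_mask, template):
    """Stand-in for an SA: return the records that match the template
    on every field whose bit is set in comp_mask."""
    selected = [f for f, bit in FIELD_BITS.items() if comp_mask & bit]
    return [r for r in records if all(r[f] == template[f] for f in selected)]


def check_single_component_masks(records):
    """For every field and every value seen in that field, check that a
    single-bit query returns exactly the records holding that value."""
    for field, bit in FIELD_BITS.items():
        for value in {r[field] for r in records}:
            expected = [r for r in records if r[field] == value]
            if sa_query(records, bit, {field: value}) != expected:
                return False
    return True


if __name__ == "__main__":
    # Made-up records for the sketch.
    table = [
        {"lid": 1, "port_num": 1, "mtu": 4},
        {"lid": 2, "port_num": 1, "mtu": 4},
        {"lid": 3, "port_num": 2, "mtu": 5},
    ]
    print(check_single_component_masks(table))  # prints True
```

In the real test the expected set comes from the previously collected
records while the response comes from the SA under test, so a mismatch
points at the SA rather than at the model.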

5.4 Cluster testing

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through the IB
interface. The test procedure includes:

* Cluster bringup

* Hand-off between 2 or 3 SMs while performing:
  - Node reboots
  - Switch power cycles (disconnecting the SMs)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery

6 Qualified Software Stacks and Devices
---------------------------------------

OpenSM Compatibility
--------------------
Note that OpenSM version 3.2.1 and earlier used a value of 1 in host
byte order for the default SM_Key, so there is a compatibility issue
with these earlier versions of OpenSM when the 3.2.2 or later version
is running on a little endian machine. This affects SM handover as well
as SA queries (saquery tool in infiniband-diags).
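
The byte order issue in the compatibility note above can be reproduced
with a few lines of Python. The sketch assumes that a peer decodes the
64-bit SM_Key as a network (big endian) value, which the note implies
but does not state. The function names are hypothetical.

```python
import struct

DEFAULT_SM_KEY = 1  # default SM_Key value quoted in the note above


def put_on_wire_host_order(value, host_byteorder):
    """Bytes produced when a 64-bit key is copied out in host byte order."""
    fmt = "<Q" if host_byteorder == "little" else ">Q"
    return struct.pack(fmt, value)


def read_from_wire(raw):
    """Decode the key as a network (big endian) 64-bit value.

    Assumption for this sketch: this is how the peer interprets it.
    """
    return struct.unpack(">Q", raw)[0]


if __name__ == "__main__":
    for host in ("little", "big"):
        raw = put_on_wire_host_order(DEFAULT_SM_KEY, host)
        print(host, hex(read_from_wire(raw)))
    # prints "little 0x100000000000000" and then "big 0x1"
```

On a little endian host the old default therefore looks like
0x0100000000000000 to a peer that expects the value 1, so the keys do
not match.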

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OFED                                     |   1.4
OFED                                     |   1.3
OFED                                     |   1.2
OFED                                     |   1.1
OFED                                     |   1.0
OpenIB Gen2 (IBG2 distribution)          |   1.0
OpenIB Gen1 (IBGD distribution)          |   1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    |   3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Mellanox
Device                              |   FW versions
------------------------------------|-------------------------------
InfiniScale                         | fw-43132  5.2.000 (and later)
InfiniScale III                     | fw-47396  0.5.000 (and later)
InfiniScale IV                      | fw-48436  7.1.000 (and later)
InfiniHost                          | fw-23108  3.5.000 (and later)
InfiniHost III Lx                   | fw-25204  1.2.000 (and later)
InfiniHost III Ex (InfiniHost Mode) | fw-25208  4.8.200 (and later)
InfiniHost III Ex (MemFree Mode)    | fw-25218  5.3.000 (and later)
ConnectX IB                         | fw-25408  2.3.000 (and later)

QLogic/PathScale
Device  |   Note
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)
iPath   | QLE7240
iPath   | QLE7280

Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, it does support it as a device on the subnet.

Note 2: QoS firmware and Mellanox devices

HCAs: QoS supported by ConnectX. QoS-enabled FW release is 2_5_000 and
later.

Switches: QoS supported by InfiniScale III
Any InfiniScale III FW that is supported by OpenSM supports QoS.