OpenSM Performance manager HOWTO
================================

Introduction
============

OpenSM now includes a performance manager which collects port counters from
the subnet and stores them internally in OpenSM.

Some of the features of the performance manager are:

    1) Collect port data and error counters per the v1.2.1 spec and store
       them in 64 bit internal counts.
    2) Automatic reset of counters when they reach approximately 3/4 full.
       (While not guaranteeing that counts will not be missed, this does
       keep counts incrementing as best as possible given the current spec
       limitations.)
    3) Basic warnings in the OpenSM log on "critical" errors like symbol
       errors.
    4) Automatically detects "outside" resets of counters and adjusts to
       continue collecting data.
    5) Can be run when OpenSM is in standby or inactive states in addition
       to master state.

Known issues are:

    1) Data counters will be lost on high data rate links.  Sweeping the
       fabric fast enough for even a DDR link is not practical.
    2) Default partition support only.


Setup and Usage
===============

Using the Performance Manager consists of 3 steps:

    1) compiling in support for the perfmgr (Optionally: the console
       socket as well)
    2) enabling the perfmgr and console in opensm.conf
    3) retrieving data which has been collected.
       3a) using console to "dump data"
       3b) using a plugin module to store the data to your own "database"

Step 1: Compile in support for the Performance Manager
------------------------------------------------------

At this time, it is really best to enable the console socket option as well.
OpenSM can be run in an "interactive" mode.  But with the console socket
option turned on one can also make a connection to a running OpenSM.  By
default, only the "loopback" console is enabled; the "socket" console is a
compile time option.  Regardless, please be aware of your network security
configuration, as the commands presented in the console can affect the
operation of your subnet.
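The counter handling described in the Introduction (features 2 and 4) can be
illustrated with a small sketch.  This is not OpenSM's implementation: the
class name PortCounter, the 32 bit hardware counter width and the exact 3/4
threshold are assumptions chosen for illustration.  It keeps a 64 bit running
total from successive raw readings, asks for a reset once the raw value passes
the threshold, and treats a reading lower than the previous one as an
"outside" reset.

```python
# Illustrative sketch only; names and widths are assumptions, not OpenSM code.

COUNTER_BITS = 32                         # assumed hardware counter width
COUNTER_MAX = (1 << COUNTER_BITS) - 1     # value at which the counter saturates
RESET_THRESHOLD = COUNTER_MAX // 4 * 3    # "approximately 3/4 full"
MASK_64 = (1 << 64) - 1                   # keep the running total in 64 bits


class PortCounter:
    """Accumulate a narrow, saturating counter into a 64 bit total."""

    def __init__(self):
        self.total = 0       # 64 bit running total
        self.last_raw = 0    # raw value seen on the previous sweep

    def update(self, raw):
        """Fold in one raw reading.

        Returns True when the caller should reset the hardware counter
        (and then call note_reset()).
        """
        if raw < self.last_raw:
            # The counter went backwards, so somebody else reset it.
            # Everything now in the counter was counted since that reset.
            delta = raw
        else:
            delta = raw - self.last_raw
        self.total = (self.total + delta) & MASK_64
        self.last_raw = raw
        return raw >= RESET_THRESHOLD

    def note_reset(self):
        """Record that the hardware counter was just reset to zero."""
        self.last_raw = 0


if __name__ == "__main__":
    counter = PortCounter()
    for reading in (1000, 1500, 200):     # 200 < 1500: outside reset
        counter.update(reading)
    print(counter.total)
```

An outside reset is only visible here when the new reading is lower than the
previous one.  Counts that arrive between the last sweep and the reset are
lost, which matches the "not guaranteeing that counts will not be missed"
caveat above.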
Step 2: Enable the perfmgr and console in opensm.conf
-----------------------------------------------------

Turning the Performance Manager on is pretty easy, set the following options
in the opensm.conf config file.  (Default location is
/usr/local/etc/opensm/opensm.conf)

    # Turn it all on
    perfmgr TRUE

    # redirection enable
    perfmgr_redir TRUE

    # sweep time in seconds
    perfmgr_sweep_time_s 180

    # Max outstanding queries
    perfmgr_max_outstanding_queries 500

    # Ignore CAs on sweep
    perfmgr_ignore_cas FALSE

    # Remove missing nodes from DB
    perfmgr_rm_nodes TRUE

    # Log error counters to opensm.log
    perfmgr_log_errors TRUE

    # Query PerfMgt Get(ClassPortInfo) for extended capabilities
    # Extended capabilities include 64 bit extended counters
    # and transmit wait support
    perfmgr_query_cpi TRUE

    # Log xmit_wait errors
    perfmgr_xmit_wait_log FALSE

    # If logging xmit_wait's; set threshold
    perfmgr_xmit_wait_threshold 65535

    # Dump file to dump the events to
    event_db_dump_file /var/log/opensm_port_counters.log

Also, enable the console socket and configure the port for it to listen to if
desired.

    # console [off|local|loopback|socket]
    console socket

    # Telnet port for console (default 10000)
    console_port 10000

"local" is only useful if you run OpenSM in the foreground.

Step 3: Retrieve data which has been collected
----------------------------------------------

Step 3a: Using console dump function
------------------------------------

The console command "perfmgr dump_counters" will dump counters to the file
specified in the opensm.conf file.
In the example above, this is "/var/log/opensm_port_counters.log".

Example output is below:

    "SW1 wopr ISR9024D (MLX4 FW)" 0x8f10400411f56 port 1 (Since Mon May 12 13:27:14 2008)
         symbol_err_cnt       : 0
         link_err_recover     : 0
         link_downed          : 0
         rcv_err              : 0
         rcv_rem_phys_err     : 0
         rcv_switch_relay_err : 2
         xmit_discards        : 0
         xmit_constraint_err  : 0
         rcv_constraint_err   : 0
         link_integrity_err   : 0
         buf_overrun_err      : 0
         vl15_dropped         : 0
         xmit_data            : 470435
         rcv_data             : 405956
         xmit_pkts            : 8954
         rcv_pkts             : 6900
         unicast_xmit_pkts    : 0
         unicast_rcv_pkts     : 0
         multicast_xmit_pkts  : 0
         multicast_rcv_pkts   : 0

Step 3b: Using a plugin module
------------------------------

If you want a more automated method of retrieving the data, OpenSM provides a
plugin interface to extend OpenSM.  The header file is osm_event_plugin.h.
The functions you register with this interface will be called when data is
collected.  You can then use that data as appropriate.

An example plugin can be configured at compile time using the
"--enable-default-event-plugin" option on the configure line.  This plugin is
very simple.  It logs "events" received from the performance manager to a log
file.  Using it directly is not recommended; rather, use it as a template to
create your own plugin.
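
As a lighter-weight alternative to writing a plugin, the dump file from
Step 3a can be post-processed by a script.  The sketch below parses text in
the layout of the example output and lists the non-zero error counters.  The
function names, the regular expressions and the rule that names ending in
"_data" or "_pkts" are data counters are all assumptions made for this
illustration; the exact whitespace of a real dump file may differ, so treat
it as a starting point only.

```python
import re

# Header line: "<node name>" <guid> port <n> (Since ...)
HEADER_RE = re.compile(
    r'^\s*"(?P<name>.*)"\s+(?P<guid>0x[0-9a-fA-F]+)\s+port\s+(?P<port>\d+)'
)
# Counter line: <counter_name> : <value>
COUNTER_RE = re.compile(r'^\s*(?P<key>\w+)\s*:\s*(?P<value>\d+)\s*$')

# Assumption: counters with these suffixes carry traffic, not errors.
DATA_SUFFIXES = ("_data", "_pkts")

SAMPLE = '''\
"SW1 wopr ISR9024D (MLX4 FW)" 0x8f10400411f56 port 1 (Since Mon May 12 13:27:14 2008)
     symbol_err_cnt       : 0
     rcv_switch_relay_err : 2
     xmit_data            : 470435
     rcv_pkts             : 6900
'''


def parse_dump(text):
    """Return {(guid, port): {"name": str, "counters": {name: int}}}."""
    ports = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {"name": header.group("name"), "counters": {}}
            key = (int(header.group("guid"), 16), int(header.group("port")))
            ports[key] = current
            continue
        counter = COUNTER_RE.match(line)
        if counter and current is not None:
            current["counters"][counter.group("key")] = int(counter.group("value"))
    return ports


def nonzero_errors(ports):
    """Yield (guid, port, counter, value) for every non-zero error counter."""
    for (guid, port), entry in ports.items():
        for name, value in entry["counters"].items():
            if value and not name.endswith(DATA_SUFFIXES):
                yield guid, port, name, value


if __name__ == "__main__":
    for guid, port, name, value in nonzero_errors(parse_dump(SAMPLE)):
        print(f"{guid:#x} port {port}: {name} = {value}")
```

To use it on real data, read the file named by event_db_dump_file and pass
its contents to parse_dump() in place of SAMPLE.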