Blame doc/performance-manager-HOWTO.txt

Packit 13e616
OpenSM Performance manager HOWTO
================================

Introduction
============

OpenSM now includes a performance manager which collects port counters from
the subnet and stores them internally in OpenSM.

Some of the features of the performance manager are:

	1) Collects port data and error counters per the v1.2.1 spec and
	   stores them in 64-bit internal counts.
	2) Automatically resets counters when they reach approximately 3/4
	   full.  (While this does not guarantee that no counts are missed,
	   it keeps counts incrementing as well as possible given the
	   current spec limitations.)
	3) Logs basic warnings in the OpenSM log on "critical" errors such as
	   symbol errors.
	4) Automatically detects "outside" resets of counters and adjusts to
	   continue collecting data.
	5) Can be run when OpenSM is in standby or inactive states, in
	   addition to master state.
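
Features 1, 2 and 4 above can be illustrated with a small sketch.  This is
not OpenSM's code; the class and method names are hypothetical, and a 32-bit
hardware counter is assumed.  It shows the general idea: add the change since
the previous sweep to a 64-bit total, treat a reading that goes backwards as
an "outside" reset, and ask for a reset once the raw value is about 3/4 full.

```python
from dataclasses import dataclass

# Assumption: the hardware counter being read is 32 bits wide.
COUNTER_MAX_32 = 2**32 - 1
RESET_THRESHOLD = COUNTER_MAX_32 * 3 // 4   # "approximately 3/4 full"
MASK_64 = 2**64 - 1


@dataclass
class PortCounter:
    """Hypothetical per-counter state kept between sweeps."""
    total: int = 0      # 64-bit running total
    last_raw: int = 0   # raw hardware value seen on the previous sweep

    def update(self, raw: int) -> bool:
        """Fold one raw reading into the total.

        Returns True when the hardware counter should be reset.
        """
        if raw >= self.last_raw:
            delta = raw - self.last_raw
        else:
            # The reading went backwards, so something else reset the
            # counter.  Counts made between the last sweep and that reset
            # are lost; everything since the reset is in `raw`.
            delta = raw
        self.total = (self.total + delta) & MASK_64
        self.last_raw = raw
        return raw >= RESET_THRESHOLD

    def mark_reset(self) -> None:
        """Call after the hardware counter has been cleared."""
        self.last_raw = 0


if __name__ == "__main__":
    c = PortCounter()
    for reading in (100, 250, 40):      # 40 simulates an outside reset
        c.update(reading)
    print(c.total)
```

Note that the sketch cannot recover counts that happened between the last
sweep and an outside reset, which matches the caveat in feature 2.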

Known issues are:

	1) Data counts will be lost on high data rate links, because the
	   counters can fill up between sweeps.  Sweeping the fabric fast
	   enough for even a DDR link is not practical.
	2) Only the default partition is supported.
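
To put a rough number on the first known issue, the calculation below
estimates how long a data counter takes to fill.  It assumes a 32-bit
PortXmitData/PortRcvData counter that counts in units of 4 octets, and a
fully loaded 4x DDR link carrying 16 Gbit/s of data.  The function name is
made up for this example.

```python
COUNTER_BITS = 32      # assumption: non-extended 32-bit data counter
OCTETS_PER_COUNT = 4   # data counters count in units of 4 octets


def seconds_to_fill(data_rate_gbps: float, fraction: float = 1.0) -> float:
    """Seconds for the counter to reach `fraction` of its range
    on a link that is saturated at `data_rate_gbps` Gbit/s."""
    octets_per_second = data_rate_gbps * 1e9 / 8
    counts_per_second = octets_per_second / OCTETS_PER_COUNT
    return fraction * (2 ** COUNTER_BITS) / counts_per_second


if __name__ == "__main__":
    # 4x DDR: 20 Gbit/s signalling, 16 Gbit/s of data after 8b/10b encoding
    print(round(seconds_to_fill(16.0), 2))         # about 8.59 seconds
    print(round(seconds_to_fill(16.0, 0.75), 2))   # about 6.44 seconds
```

Under these assumptions a busy link fills the counter in under ten seconds,
so a sweep interval measured in minutes cannot keep up.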


Setup and Usage
===============

Using the Performance Manager consists of three steps:

	1) Compiling in support for the perfmgr (and, optionally, the console
	   socket)
	2) Enabling the perfmgr and console in opensm.conf
	3) Retrieving the data which has been collected, either by
	   3a) using the console to "dump data", or
	   3b) using a plugin module to store the data to your own
	       "database"

Step 1: Compile in support for the Performance Manager
------------------------------------------------------

It is best to enable the console socket option as well.  OpenSM can be run
in an "interactive" mode, but with the console socket option turned on one
can also connect to an OpenSM that is already running.  By default, only the
"loopback" console is enabled; "socket" is a compile time option.
Regardless, please be aware of your network security configuration, as the
commands presented in the console can affect the operation of your subnet.


Step 2: Enable the perfmgr and console in opensm.conf
-----------------------------------------------------

To turn the Performance Manager on, set the following options in the
opensm.conf config file.  (The default location is
/usr/local/etc/opensm/opensm.conf)

	# Turn it all on
	perfmgr TRUE

	# redirection enable
	perfmgr_redir TRUE

	# sweep time in seconds
	perfmgr_sweep_time_s 180

	# Max outstanding queries
	perfmgr_max_outstanding_queries 500

	# Ignore CAs on sweep
	perfmgr_ignore_cas FALSE

	# Remove missing nodes from DB
	perfmgr_rm_nodes TRUE

	# Log error counters to opensm.log
	perfmgr_log_errors TRUE

	# Query PerfMgt Get(ClassPortInfo) for extended capabilities
	# Extended capabilities include 64 bit extended counters
	# and transmit wait support
	perfmgr_query_cpi TRUE

	# Log xmit_wait errors
	perfmgr_xmit_wait_log FALSE

	# If logging xmit_wait's; set threshold
	perfmgr_xmit_wait_threshold 65535

	# Dump file to dump the events to
	event_db_dump_file /var/log/opensm_port_counters.log
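
If you want to sanity check a config file before restarting OpenSM, a small
script along these lines may help.  It is only a sketch: the function names
are made up, and it assumes the simple "option value" layout with "#"
comments shown above.  OpenSM's own parser may accept things this sketch
does not.

```python
def parse_opensm_conf(text: str) -> dict:
    """Return {option: value} for every 'option value' line in `text`.

    Assumption: anything after '#' is a comment and blank lines are ignored.
    """
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


def perfmgr_problems(options: dict) -> list:
    """List settings that would keep the steps in this HOWTO from working."""
    problems = []
    if options.get("perfmgr", "").upper() != "TRUE":
        problems.append("perfmgr is not set to TRUE")
    if not options.get("event_db_dump_file"):
        problems.append("event_db_dump_file is not set")
    return problems


if __name__ == "__main__":
    # Default location given in this HOWTO; adjust for your install.
    with open("/usr/local/etc/opensm/opensm.conf") as f:
        for problem in perfmgr_problems(parse_opensm_conf(f.read())):
            print(problem)
```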

Also, if desired, enable the console socket and configure the port it
listens on.

	# console [off|local|loopback|socket]
	console socket

	# Telnet port for console (default 10000)
	console_port 10000

Note that "local" is only useful if you run OpenSM in the foreground.


Step 3: Retrieve data which has been collected
----------------------------------------------

Step 3a: Using console dump function
------------------------------------

The console command "perfmgr dump_counters" dumps the counters to the file
specified by event_db_dump_file in opensm.conf.  In the example above, that
file is "/var/log/opensm_port_counters.log".

Example output is below:

<snip>
"SW1 wopr ISR9024D (MLX4 FW)" 0x8f10400411f56 port 1 (Since Mon May 12 13:27:14 2008)
     symbol_err_cnt       : 0
     link_err_recover     : 0
     link_downed          : 0
     rcv_err              : 0
     rcv_rem_phys_err     : 0
     rcv_switch_relay_err : 2
     xmit_discards        : 0
     xmit_constraint_err  : 0
     rcv_constraint_err   : 0
     link_integrity_err   : 0
     buf_overrun_err      : 0
     vl15_dropped         : 0
     xmit_data            : 470435
     rcv_data             : 405956
     xmit_pkts            : 8954
     rcv_pkts             : 6900
     unicast_xmit_pkts    : 0
     unicast_rcv_pkts     : 0
     multicast_xmit_pkts  : 0
     multicast_rcv_pkts   : 0
</snip>
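
The dump file is plain text, so it is easy to post-process.  The sketch
below reads the format shown in the example output above.  The layout is
inferred from that one sample only, so other OpenSM versions may need
different patterns; the function name is hypothetical.

```python
import re

# Header: "<node description>" <guid> port <n> (Since <date>)
HEADER_RE = re.compile(
    r'^"(?P<name>.*)"\s+(?P<guid>0x[0-9a-fA-F]+)\s+port\s+(?P<port>\d+)'
)
# Counter line: indented "name : value"
COUNTER_RE = re.compile(r'^\s+(?P<key>\w+)\s*:\s*(?P<value>\d+)\s*$')


def parse_dump(text: str) -> list:
    """Return one dict per port found in the dump text."""
    ports = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {
                "name": header["name"],
                "guid": int(header["guid"], 16),
                "port": int(header["port"]),
                "counters": {},
            }
            ports.append(current)
            continue
        counter = COUNTER_RE.match(line)
        if counter and current is not None:
            current["counters"][counter["key"]] = int(counter["value"])
    return ports


if __name__ == "__main__":
    with open("/var/log/opensm_port_counters.log") as f:
        for p in parse_dump(f.read()):
            # Report only ports that have a non-zero symbol error count.
            if p["counters"].get("symbol_err_cnt", 0):
                print(p["name"], p["port"], p["counters"]["symbol_err_cnt"])
```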


Step 3b: Using a plugin module
------------------------------

If you want a more automated method of retrieving the data, OpenSM provides
a plugin interface to extend OpenSM.  The header file is osm_event_plugin.h.
The functions you register with this interface will be called when data is
collected.  You can then use that data as appropriate.

An example plugin can be configured at compile time using the
"--enable-default-event-plugin" option on the configure line.  This plugin is
very simple: it logs "events" received from the performance manager to a log
file.  I don't recommend using it directly; rather, use it as a template to
create your own plugin.