OpenSM Performance Manager HOWTO
================================

Introduction
============

OpenSM now includes a performance manager that collects port counters from
the subnet and stores them internally in OpenSM.

Some of the features of the performance manager are:

1) Collection of port data and error counters per the InfiniBand v1.2.1
   spec, stored in 64-bit internal counters.
2) Automatic reset of counters when they reach approximately 3/4 full.
   (While this does not guarantee that no counts are missed, it keeps
   counts incrementing as well as possible given the current spec
   limitations.)
3) Basic warnings in the OpenSM log on "critical" errors such as symbol
   errors.
4) Automatic detection of "outside" resets of counters, with adjustment to
   continue collecting data.
5) Ability to run while OpenSM is in the standby or inactive state, in
   addition to the master state.

Known issues are:

1) Data counts will be lost on high-data-rate links. Sweeping the
   fabric fast enough for even a DDR link is not practical.
2) Only the default partition is supported.
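
To see why known issue 1 is hard to avoid, here is a back-of-the-envelope
sketch in Python. It rests on assumptions that are not stated in this HOWTO:
that the classic port data counters are 32 bits wide and count in units of
4 octets, and that a 4x DDR link moves roughly 16 Gbit/s of data after
encoding overhead. seconds_until() is just an illustrative helper name.

```python
# Rough estimate of how long a 32-bit port data counter lasts at line rate.
# Assumptions (not from this HOWTO): the counter is 32 bits wide, counts in
# units of 4 octets, and a 4x DDR link carries about 16 Gbit/s of data.

COUNTER_BITS = 32
OCTETS_PER_COUNT = 4


def seconds_until(fraction_full, data_rate_gbit_s):
    """Seconds of line-rate traffic before the counter reaches fraction_full."""
    counts_per_second = data_rate_gbit_s * 1e9 / 8 / OCTETS_PER_COUNT
    return fraction_full * (2 ** COUNTER_BITS) / counts_per_second


ddr_4x_gbit_s = 16.0  # assumed usable data rate of a 4x DDR link
print(f"3/4 full after {seconds_until(0.75, ddr_4x_gbit_s):.1f} s")
print(f"full after {seconds_until(1.0, ddr_4x_gbit_s):.1f} s")
```

Under those assumptions a fully loaded link reaches the 3/4 mark within a
few seconds, far sooner than the 180 second sweep time used in the example
configuration in Step 2.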


Setup and Usage
===============

Using the Performance Manager consists of 3 steps:

1) Compile in support for the perfmgr (optionally, the console socket
   as well).
2) Enable the perfmgr and console in opensm.conf.
3) Retrieve the data which has been collected, either by:
   3a) using the console to "dump data", or
   3b) using a plugin module to store the data in your own "database".

Step 1: Compile in support for the Performance Manager
------------------------------------------------------

At this time, it is best to enable the console socket option as well.
OpenSM can be run in an "interactive" mode, but with the console socket
option turned on one can also connect to a running OpenSM. By default, only
the "loopback" console is enabled; "socket" is a compile-time option.
Regardless, be aware of your network security configuration, as the
commands available in the console can affect the operation of your subnet.


Step 2: Enable the perfmgr and console in opensm.conf
-----------------------------------------------------

Turning the Performance Manager on is straightforward: set the following
options in the opensm.conf config file. (The default location is
/usr/local/etc/opensm/opensm.conf.)

   # Turn it all on
   perfmgr TRUE

   # redirection enable
   perfmgr_redir TRUE

   # sweep time in seconds
   perfmgr_sweep_time_s 180

   # Max outstanding queries
   perfmgr_max_outstanding_queries 500

   # Ignore CAs on sweep
   perfmgr_ignore_cas FALSE

   # Remove missing nodes from DB
   perfmgr_rm_nodes TRUE

   # Log error counters to opensm.log
   perfmgr_log_errors TRUE

   # Query PerfMgt Get(ClassPortInfo) for extended capabilities
   # Extended capabilities include 64 bit extended counters
   # and transmit wait support
   perfmgr_query_cpi TRUE

   # Log xmit_wait errors
   perfmgr_xmit_wait_log FALSE

   # If logging xmit_wait's; set threshold
   perfmgr_xmit_wait_threshold 65535

   # Dump file to dump the events to
   event_db_dump_file /var/log/opensm_port_counters.log

If desired, also enable the console socket and configure the port it
listens on.

   # console [off|local|loopback|socket]
   console socket

   # Telnet port for console (default 10000)
   console_port 10000

"local" is only useful if you run OpenSM in the foreground.
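
A quick way to double check what a given opensm.conf actually sets is to
parse it. The Python sketch below assumes the simple "name value" layout
with "#" comments shown in the fragments above. parse_opensm_conf() and
perfmgr_summary() are hypothetical helpers written for this example; they
are not part of OpenSM.

```python
# Options from this step that do not start with "perfmgr".
PERFMGR_RELATED = ("console", "console_port", "event_db_dump_file")


def parse_opensm_conf(text):
    """Return {option: value} from opensm.conf-style 'name value' lines."""
    options = {}
    for raw in text.splitlines():
        # Assumption: '#' always starts a comment and never appears in a value.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1] if len(parts) > 1 else ""
    return options


def perfmgr_summary(options):
    """Return (perfmgr enabled?, the options discussed in this step)."""
    summary = {
        name: value
        for name, value in options.items()
        if name.startswith("perfmgr") or name in PERFMGR_RELATED
    }
    enabled = options.get("perfmgr", "").upper() == "TRUE"
    return enabled, summary


sample = """\
# Turn it all on
perfmgr TRUE
perfmgr_sweep_time_s 180
console socket
console_port 10000
"""
enabled, summary = perfmgr_summary(parse_opensm_conf(sample))
print("perfmgr enabled:", enabled)
for name, value in sorted(summary.items()):
    print(f"{name} = {value}")
```

For a real check, pass in the contents of your own opensm.conf instead of
the sample string.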


Step 3: Retrieve data which has been collected
----------------------------------------------

Step 3a: Using console dump function
------------------------------------

The console command "perfmgr dump_counters" dumps the counters to the file
named by event_db_dump_file in opensm.conf. In the example above, that is
"/var/log/opensm_port_counters.log".
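
If you would rather trigger the dump from a script than from a telnet
session, a few lines of Python may be enough. This is only a sketch: it
assumes the console socket accepts newline-terminated commands over plain
TCP on the configured port (10000 in the example configuration), and
send_console_command() is a made-up helper name, not an OpenSM tool.

```python
import socket


def send_console_command(command, host="127.0.0.1", port=10000, timeout=2.0):
    """Send one newline-terminated command; return any text received back.

    Reading stops when the peer closes the connection or when nothing
    arrives within `timeout` seconds.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        try:
            while True:
                data = sock.recv(4096)
                if not data:
                    break  # peer closed the connection
                chunks.append(data)
        except socket.timeout:
            pass  # no further output within the timeout
    return b"".join(chunks).decode("ascii", errors="replace")


# Example call (needs a running OpenSM with the console socket enabled):
# print(send_console_command("perfmgr dump_counters"))
```

The same caution applies here as for interactive use: anything that can
reach the console port can issue commands that affect the subnet.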

Example output is below:

<snip>
"SW1 wopr ISR9024D (MLX4 FW)" 0x8f10400411f56 port 1 (Since Mon May 12 13:27:14 2008)
     symbol_err_cnt       : 0
     link_err_recover     : 0
     link_downed          : 0
     rcv_err              : 0
     rcv_rem_phys_err     : 0
     rcv_switch_relay_err : 2
     xmit_discards        : 0
     xmit_constraint_err  : 0
     rcv_constraint_err   : 0
     link_integrity_err   : 0
     buf_overrun_err      : 0
     vl15_dropped         : 0
     xmit_data            : 470435
     rcv_data             : 405956
     xmit_pkts            : 8954
     rcv_pkts             : 6900
     unicast_xmit_pkts    : 0
     unicast_rcv_pkts     : 0
     multicast_xmit_pkts  : 0
     multicast_rcv_pkts   : 0
</snip>
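
The dump is plain text, so it is easy to post-process. The Python sketch
below parses the layout seen in the example output: a header line with a
quoted node name, a GUID and a port number, followed by "name : value"
lines. The regular expressions are inferred from that single sample, so
treat them as assumptions; parse_counter_dump() and nonzero_errors() are
illustrative names.

```python
import re

# Header line: "node name" 0xGUID port N (Since ...)
HEADER_RE = re.compile(
    r'^"(?P<name>.*)"\s+(?P<guid>0x[0-9a-fA-F]+)\s+port\s+(?P<port>\d+)'
)
# Counter line: name : value
COUNTER_RE = re.compile(r"^(?P<key>\w+)\s*:\s*(?P<value>\d+)$")


def parse_counter_dump(text):
    """Return one dict per port with its name, GUID, port and counters."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            records.append({
                "name": header["name"],
                "guid": int(header["guid"], 16),
                "port": int(header["port"]),
                "counters": {},
            })
            continue
        counter = COUNTER_RE.match(line)
        if counter and records:
            records[-1]["counters"][counter["key"]] = int(counter["value"])
    return records


def nonzero_errors(record):
    """Nonzero counters, leaving out the data and packet counters."""
    return {
        key: value
        for key, value in record["counters"].items()
        if value and not key.endswith(("_data", "_pkts"))
    }


sample = '''"SW1 wopr ISR9024D (MLX4 FW)" 0x8f10400411f56 port 1 (Since Mon May 12 13:27:14 2008)
symbol_err_cnt : 0
rcv_switch_relay_err : 2
xmit_data : 470435
xmit_pkts : 8954
'''
for rec in parse_counter_dump(sample):
    print(rec["name"], "port", rec["port"], nonzero_errors(rec))
```

To use it on a real dump, read the file named by event_db_dump_file and
pass its contents to parse_counter_dump().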


Step 3b: Using a plugin module
------------------------------

If you want a more automated method of retrieving the data, OpenSM provides
a plugin interface for extending OpenSM. The header file is
osm_event_plugin.h. The functions you register with this interface will be
called when data is collected. You can then use that data as appropriate.

An example plugin can be configured at compile time using the
"--enable-default-event-plugin" option on the configure line. This plugin is
very simple: it logs "events" received from the performance manager to a log
file. Using it directly is not recommended; rather, use it as a template
for creating your own plugin.