ITBusiness.ca

Maintaining the health of SAN configurations

Today, high-end SAN configurations involve multiple hosts, each with multiple host bus adapters (HBAs); multiple fabrics, each consisting of multiple fibre channel switches; and multiple storage subsystems with multiple fibre channel ports.

While these multiple components provide the ability to create highly redundant configurations with multiple data paths between the hosts and the storage, it is very important to keep track of the end-to-end availability of these paths and to ensure that the logical unit numbers (LUNs) configured on the storage subsystems are balanced across these multiple paths.

The redundancy of paths between hosts and storage allows for continued functionality even if an individual component fails.

To validate the redundancy of a SAN configuration, it is imperative to confirm that the number of end-to-end paths and the balance of LUNs are in accordance with the SAN design.

It is also very important to confirm this before and after performing regular maintenance of components of the configuration.
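The path-count part of this validation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `check_path_counts`, the `(lun, hba, controller_port)` tuple layout, and the sample data are all hypothetical, and gathering the observed paths from the host is platform specific and not shown.

```python
from collections import Counter

def check_path_counts(observed_paths, expected_per_lun):
    """Return {lun: (observed, expected)} for LUNs whose path count differs from the design.

    observed_paths: iterable of (lun, hba, controller_port) tuples (hypothetical layout).
    expected_per_lun: dict mapping each LUN to the path count the SAN design calls for.
    """
    counts = Counter(lun for lun, _hba, _port in observed_paths)
    problems = {}
    for lun, expected in expected_per_lun.items():
        seen = counts.get(lun, 0)
        if seen != expected:
            problems[lun] = (seen, expected)
    return problems

# Hypothetical sample data: lun2 has lost one of its two designed paths.
observed = [
    ("lun1", "hba0", "ctrlA_p1"),
    ("lun1", "hba1", "ctrlB_p1"),
    ("lun2", "hba0", "ctrlA_p1"),
]
design = {"lun1": 2, "lun2": 2}
print(check_path_counts(observed, design))  # {'lun2': (1, 2)}
```

Running the same check before and after a maintenance window and comparing the two results is one simple way to spot a path that did not come back.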

Because of the Asymmetric LUN Presentation architecture of the HSV110 and HSG80 controllers (the storage array controllers for the EVA and EMA storage subsystems, respectively), a LUN configured on these controllers is accessible for I/O from (that is, online to) only one controller of the pair at a time.

For balanced performance, half of the LUNs should be online to each controller of the pair.
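As a rough illustration of this rule, the following Python sketch counts how many LUNs are online to each controller and flags an uneven split. The function name `is_balanced`, the controller labels "A" and "B", the device names, and the tolerance of one LUN are assumptions made for the example, not part of any controller or host tool.

```python
from collections import Counter

def is_balanced(online_controller, controllers=("A", "B"), tolerance=1):
    """Return True if the two controllers own roughly the same number of LUNs.

    online_controller: dict mapping a disk device to the controller it is online to.
    tolerance: allowed difference in LUN count (1 covers an odd number of LUNs).
    """
    counts = Counter(online_controller.values())
    first, second = (counts.get(c, 0) for c in controllers)
    return abs(first - second) <= tolerance

# Hypothetical sample: three LUNs on controller A, one on controller B.
sample = {"dsk1": "A", "dsk2": "A", "dsk3": "A", "dsk4": "B"}
print(is_balanced(sample))  # False
```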

There are various mechanisms for defining the preferred path (or the preferred controller) on which a LUN comes online. Please refer to the EVA and EMA documentation for more details concerning path preference mechanisms.

While it is important to give directives to the controllers for the path preference of each LUN (thereby reaching a LUN balance), it is also important to confirm from the host that the LUN balance is being respected.

Refer to appendix 4 for a script called show_lun_balance that uses various utilities available on Tru64 UNIX and prints the controller WWID (Fibre Channel World Wide Identifier), followed by Top/Bot for HSG80 and A/B for HSV110, and the disk devices that are online to that controller from the pair.

For a healthy configuration this output should match the directives given to the controllers during LUN configuration and should stay that way in the absence of any path outages.
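The comparison described here, observed ownership against the configured preference, amounts to a dictionary comparison. The Python sketch below assumes both views have already been collected into dictionaries keyed by disk device; the function name `misplaced_luns` and the sample values are hypothetical, and the sketch does not parse the output of any real utility.

```python
def misplaced_luns(preferred, observed):
    """List the disk devices that are not online to their preferred controller.

    preferred: dict mapping a disk device to the controller chosen at LUN configuration.
    observed: dict mapping a disk device to the controller it is currently online to.
    A device missing from `observed` is reported as well.
    """
    return sorted(
        device
        for device, controller in preferred.items()
        if observed.get(device) != controller
    )

# Hypothetical sample: dsk3 has moved to the partner controller.
preferred = {"dsk1": "A", "dsk2": "B", "dsk3": "A", "dsk4": "B"}
observed = {"dsk1": "A", "dsk2": "B", "dsk3": "B", "dsk4": "B"}
print(misplaced_luns(preferred, observed))  # ['dsk3']
```

An empty list means the host's view matches the directives given to the controllers.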

In case of an imbalance, it is possible to flip the LUN from a controller to its partner in the pair. Refer to appendix 5 for a script called flip_ctrl_disk that takes a list of disk devices and attempts to flip the corresponding LUN.

If LUNs keep getting out of balance and do not stay on a controller despite flipping, this is a cause for concern and indicates that a detailed analysis of the SAN configuration is needed.

While providing an exhaustive list of troubleshooting guidelines is not within the scope of this document, the following checks often lead to the root cause of the issue:

• Log into the fibre channel switches and check that: the fabric is correctly formed; every device port is logged in correctly; the port speed for every port is set or negotiated correctly; if zoning is present, the intended zoning configuration and the actual zoning configuration match; and there are no excessive errors on any of the fabric ports.

• Confirm from each HSV110 and/or HSG80 configuration that: the LUNs are correctly presented to all the hosts; all of the host HBA ports are defined; and the host functionality mode is appropriately set.

All of these checks provide fundamental information about the health of the SAN configuration and should be performed on a regular basis (for example, once a day) as well as before and after any SAN configuration changes.
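One of the switch checks listed above, comparing the intended zoning with the actual zoning, lends itself to automation. The Python sketch below assumes each zoning configuration has already been exported into a dictionary of zone name to member set; the function name `zoning_diff`, the zone names, and the member identifiers are hypothetical, and extracting the data from a switch is vendor specific and not shown.

```python
def zoning_diff(intended, actual):
    """Report zones whose membership differs between the intended and actual configuration.

    intended, actual: dicts mapping a zone name to a set of member identifiers.
    Returns {zone: {"missing": [...], "unexpected": [...]}} for mismatched zones only.
    """
    diff = {}
    for zone in sorted(set(intended) | set(actual)):
        want = intended.get(zone, set())
        have = actual.get(zone, set())
        if want != have:
            diff[zone] = {
                "missing": sorted(want - have),      # designed but absent on the switch
                "unexpected": sorted(have - want),   # on the switch but not in the design
            }
    return diff

# Hypothetical sample: one HBA port is missing from zone_host1 on the switch.
intended = {"zone_host1": {"hba0", "hba1", "ctrlA_p1"}}
actual = {"zone_host1": {"hba0", "ctrlA_p1"}}
print(zoning_diff(intended, actual))
# {'zone_host1': {'missing': ['hba1'], 'unexpected': []}}
```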
