Saturday, March 3, 2012

Cluster XL HA - Going ACTIVE/ACTIVE - BOTH ACTIVE

HA Cluster XL - Going ACTIVE/ACTIVE
Model : Power-1/UTM-1/Secure Platform
Things to check to keep the cluster members identical and to avoid an Active/Active situation and an outage:
1. Check the sync cable. It should be either a crossover cable (I have seen a converter used on a straight cable to make it crossover, and this works) or a straight cable through a dedicated switch/VLAN. THIS IS THE MOST IMPORTANT check.
2. Check the ClusterXL CCP mode (broadcast/multicast). It must be the same on both members.
Verify:
[Expert@gehfgmuswaudc31]# cat $FWDIR/boot/ha_boot.conf
ha_installed 1
ccp_mode broadcast
[Expert@gehfgmuswaudc31]#
In the example above the mode is broadcast, even though the default is multicast. With Cisco gear I have found that multicast traffic gets a low priority and the packets are eventually dropped on busy networks.
To set broadcast mode: "cphaconf set_ccp broadcast"
To set multicast mode: "cphaconf set_ccp multicast"
If you see error logs in Tracker about an interface flapping, set the mode to broadcast.
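To check point 2 without eyeballing two terminals, you can copy ha_boot.conf from each member to one machine and compare the ccp_mode entries. The following is only a minimal sketch: the function names and the saved file names are made up for illustration, and it assumes the file holds "key value" pairs as in the cat output.

```shell
#!/bin/sh
# Hypothetical helpers (not Check Point tools): compare the CCP mode
# recorded in two saved copies of ha_boot.conf.

# Print the word that follows "ccp_mode" in the given file.
get_ccp_mode() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "ccp_mode") { print $(i + 1); exit } }' "$1"
}

# Return 0 if both files name the same non-empty CCP mode, 1 otherwise.
same_ccp_mode() {
    a=$(get_ccp_mode "$1")
    b=$(get_ccp_mode "$2")
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "OK: both members use $a"
        return 0
    fi
    echo "MISMATCH: '$a' vs '$b'"
    return 1
}
```

Usage, assuming the two copies were saved as member1_ha_boot.conf and member2_ha_boot.conf (hypothetical names): same_ccp_mode member1_ha_boot.conf member2_ha_boot.conf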
3. Check the values in $FWDIR/boot/modules/fwkern.conf. They should be the same on all members.
Verify:
[Expert@gehfgmuswaudc31]# cat /opt/CPsuite-R70/fw1/boot/modules/fwkern.conf
fwha_mac_magic=0x1f
fwha_mac_forward_magic=0x20
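The same idea works for point 3: copy fwkern.conf from each member to one place and diff them. This is a rough sketch, not a Check Point utility. The function name compare_fwkern is made up, and it assumes one name=value setting per line, with line order and blank lines not mattering.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved copies of fwkern.conf.
# Blank lines are dropped and the rest is sorted, so only real
# differences in settings are reported.
compare_fwkern() {
    t1=$(mktemp) && t2=$(mktemp) || return 2
    grep -v '^[[:space:]]*$' "$1" | sort > "$t1"
    grep -v '^[[:space:]]*$' "$2" | sort > "$t2"
    if diff "$t1" "$t2"; then
        echo "OK: kernel parameters match"
        rc=0
    else
        echo "MISMATCH: settings differ (see diff output above)"
        rc=1
    fi
    rm -f "$t1" "$t2"
    return $rc
}
```

Usage, with hypothetical file names: compare_fwkern member1_fwkern.conf member2_fwkern.conf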
5. Disable all interfaces that are not used.
You can disable them either in the WebUI or from the CLI as follows:
ifconfig <interface_name> down
ifconfig --save
A reboot may be required after this.
In some cases I have observed that you need to explicitly tell ClusterXL about the unused interfaces (sk30060). I have never used this myself; I have always disabled the interfaces.
i.e., stop the services, declare the unused interfaces in the file below, then start the services again:
cpstop
$FWDIR/conf/discntd.if
cpstart
To get the interface names, use the command: fw getifs
6. Check whether CoreXL is disabled or enabled, and that the setting is the same on all boxes. CoreXL can be enabled on boxes that have at least 4 cores, and licenses must be in place for that many cores.
- So it is better to disable it if it is not used on all boxes.
7. If the above steps do not resolve the reported behavior, open a ticket with Check Point with the following info:
A) make sure the cluster is enabled on both members
B) make sure the problem is replicated
C) collect at the same time CPinfo file from both members
D) collect at the same time CPinfo file from MGMT server
E) run the following debug on both members for 5 minutes
# fw ctl debug 0
# fw ctl debug -buf 32000
# fw ctl debug -m cluster + conf if pnote stat
# fw ctl kdebug -T -f 1>> /var/log/debug.txt 2>> /var/log/debug.txt
let the debug run for 5 minutes
press CTRL+C
# fw ctl debug 0
Collect /var/log/debug.txt from each member
