Monday, April 6, 2015

QRadar High Availability (HA) considerations and tips

This post gives some quick points on QRadar High Availability (HA).

    1.    HA Overview
        -    Uses Primary and Secondary HA hosts
        -    Uses Virtual IPs
        -    Network connectivity is tested via heartbeat (pings) to all managed hosts
        -    HA can be configured for either a console or a managed host
        -    Both devices must have the same versions of the software
        -    Both devices must support the same DSM, scanner and protocol RPMs
        -    Uses data synchronization or shared external storage
        -    Consistency is maintained locally by using Distributed Replicated Block Device (DRBD)
        -    If using external storage, data consistency is maintained through iSCSI or Fibre Channel
        -    Data is synchronized in real time
        -    Note: Asset profiler can impact DRBD speed
        -    "/store" partition on the primary is automatically replicated to the secondary host
        -    Ensure a minimum of 1 Gbps between primary and secondary HA hosts
        -    Initial synchronization can take more than 24 hours
            This may be an understatement. I've seen initial synchronization take upwards of 72 hours.
        -    Secondary host goes into "standby" after synchronization
        -    Primary HA host's status becomes "offline" when restored from a failover
        -    Primary needs to be placed "online" before it becomes active
        -    Disk replication is enabled while primary is "offline"
        -    Disk synchronization after a failover is faster
        -    Basically uses deltas
        -    When the primary host is restored, only the data collected by the secondary during the period the primary was unavailable is synchronized
        -    Replacing or reformatting the disk on the primary can result in longer synchronization time in the event of a failback
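
To get a feel for why initial synchronization runs past 24 hours, here is a back-of-the-envelope sketch in Python. It is not a QRadar tool. The function name and the `efficiency` factor (the share of the link that replication actually gets) are my own assumptions, and the 0.5 default is only an illustrative placeholder, not a measured value.

```python
def estimate_sync_hours(store_gib: float, link_gbps: float = 1.0,
                        efficiency: float = 0.5) -> float:
    """Rough estimate of the hours needed to copy /store once over the HA link.

    store_gib   -- size of the primary's /store partition in GiB
    link_gbps   -- nominal link speed between the HA hosts in Gbps
    efficiency  -- ASSUMED fraction of the link usable for replication
    """
    if store_gib <= 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")
    bits_to_copy = store_gib * 1024 ** 3 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_copy / usable_bits_per_second / 3600


# A hypothetical 6 TiB /store over the minimum 1 Gbps link
print(f"{estimate_sync_hours(6 * 1024):.1f} hours")
```

With those assumed numbers the estimate already lands above a full day, before accounting for any load on the hosts.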
       
       
   
        IP Considerations
            -    Uses Virtual IPs
            -    Needs 3 IP addresses - VIP, Primary and Secondary
            -    The IP address initially configured on the primary host is automatically made the cluster VIP
            -    A new IP will need to be assigned to the primary once HA configuration is started
            -    Primary host can act as a standby for secondary
            -    VIP is used by a host that has a status of active
            -    All IPs must be in the same subnet
            -    Latency must be less than 2 ms for traffic crossing the WAN
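
The addressing rules above (three distinct addresses, all in one subnet) are easy to check before starting the wizard. A minimal sketch using Python's standard `ipaddress` module follows; the function name and the example addresses (taken from the documentation ranges) are hypothetical and have nothing to do with QRadar itself.

```python
import ipaddress


def check_ha_addresses(vip: str, primary: str, secondary: str, subnet: str) -> list:
    """Return a list of problems with the planned HA addresses (empty if none)."""
    network = ipaddress.ip_network(subnet, strict=False)
    addresses = {
        "VIP": ipaddress.ip_address(vip),
        "Primary": ipaddress.ip_address(primary),
        "Secondary": ipaddress.ip_address(secondary),
    }
    problems = []
    # HA needs three separate addresses
    if len(set(addresses.values())) != 3:
        problems.append("VIP, primary and secondary must be three distinct addresses")
    # All of them must sit in the same subnet
    for role, address in addresses.items():
        if address not in network:
            problems.append(f"{role} {address} is outside {network}")
    return problems


# Hypothetical plan: the old primary address becomes the VIP
print(check_ha_addresses("192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.0/24"))
```

An empty list means the plan satisfies both rules. The latency requirement still has to be measured on the real network.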
           
           
        HA Wizard
            -    Used to configure Primary, Secondary and cluster VIP
            -    Verifies the secondary has a valid HA activation key
            -    Verifies the secondary is not part of an existing HA cluster
            -    Verifies software version is the same on both devices
            -    Verifies external storage (if configured) on primary and then secondary
            -    Verifies both support the same DSM, scanner and protocol RPMs
           
           
        Failover scenarios
            -    Power supply failure
            -    Network failure (detected by connectivity tests)
            -    OS malfunction that delays or stops heartbeat tests
            -    RAID failure
            -    Manual failover
            -    Management interface failure on the primary host
           
            -    Primary does not automatically take back its role as primary after a failover.
            -    Secondary keeps the active role while the primary acts as standby
            -    Primary must be switched to "active" to take over its role
           
            -    No failover for software errors or disk capacity issues
            -    If both primary and secondary are unable to ping a managed host, no failover occurs
            -    If primary cannot but secondary can ping a managed host, failover occurs
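
The two ping rules above boil down to one condition. The sketch below only restates that rule for a single managed host in Python; the function name is made up, and it says nothing about how QRadar implements the test internally.

```python
def should_fail_over(primary_can_ping: bool, secondary_can_ping: bool) -> bool:
    """Connectivity rule from the notes above, for one managed host.

    Failover happens only when the primary has lost the managed host
    while the secondary can still reach it.
    """
    return (not primary_can_ping) and secondary_can_ping


# Print the full truth table for the rule
for primary_ok in (True, False):
    for secondary_ok in (True, False):
        decision = "failover" if should_fail_over(primary_ok, secondary_ok) else "no failover"
        print(f"primary={primary_ok!s:<5} secondary={secondary_ok!s:<5} -> {decision}")
```

When neither host can reach the managed host, the problem is most likely outside the HA pair, so switching roles would not help.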
           
           
       
       
        HA Failover event sequence
            -    File systems are mounted
            -    Management interface alias is created, e.g. eth0 is aliased as eth0:0
            -    VIP is assigned to the alias
            -    QRadar services are started
            -    Secondary connects to console and downloads configuration files
           
           
        Tips for manual synchronization
            -    Ensure primary and secondary hosts are sync'd
            -    Secondary must be in standby
            -    Secondary to offline and power off the primary
            -    DO NOT MANUALLY FORCE FAILOVER DURING PATCHES OR SOFTWARE UPGRADES
                     
       
    2.    HA Planning
        -    File systems on both devices must match - ext3, etc.
        -    Secondary's "/store" partition must be equal to or greater than the primary's
        -    Both devices should have the same number of interfaces
        -    Both must use the same management interface
        -    Only 1 VIP
        -    Port 7789 is needed for Distributed Replicated Block Device (DRBD)
        -    DRBD traffic is bidirectional
        -    Disk replication ensures software updates are applied to the secondary
        -    Ensure the host has a valid activation key
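
Several of the planning points are simple comparisons between the two hosts, so they can be written down as a pre-flight checklist. The sketch below is a hypothetical helper in Python: the `HostFacts` record and its field names are my own, and you would have to fill them in by hand or from your own inventory. It does not query QRadar.

```python
from dataclasses import dataclass


@dataclass
class HostFacts:
    """Facts about one HA host, gathered manually (hypothetical structure)."""
    filesystem: str            # e.g. "ext3"
    store_gib: float           # size of the /store partition in GiB
    interface_count: int
    management_interface: str  # e.g. "eth0"
    software_version: str


def planning_problems(primary: HostFacts, secondary: HostFacts) -> list:
    """Compare two hosts against the planning points listed above."""
    problems = []
    if primary.filesystem != secondary.filesystem:
        problems.append("file systems do not match")
    if secondary.store_gib < primary.store_gib:
        problems.append("secondary /store is smaller than the primary's")
    if primary.interface_count != secondary.interface_count:
        problems.append("number of interfaces differs")
    if primary.management_interface != secondary.management_interface:
        problems.append("management interfaces differ")
    if primary.software_version != secondary.software_version:
        problems.append("software versions differ")
    return problems


# Made-up values for illustration only
primary = HostFacts("ext3", 4096, 4, "eth0", "7.2.4")
secondary = HostFacts("ext3", 2048, 4, "eth0", "7.2.4")
print(planning_problems(primary, secondary))
```

The activation key and port 7789 reachability are left out on purpose, since checking those depends on the environment.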
       
   
    3.    HA Management
        -    Use the System and License Management window to:
            -    Monitor HA
            -    Force failover
            -    Disconnect cluster
            -    Modify cluster settings
            -    Modify heartbeat interval
            -    Place the device in "offline" mode before maintenance



For further information, please see the relevant IBM QRadar High Availability Guide available here.

8 comments:

  1. how to remove ha cluster with ha_setup.sh?

  2. ha_setup.sh -uninstall from the primary

  3. This article is very informative and very well to the point.

  4. This comment has been removed by the author.

  5. Hello Nik,

    Thanks for the post.

    I am trying to install Qradar HA on Console.

    While running the Wizard Which option should I choose on primary x Software Install or x High Availability.

    And Which option should i choose on step 2 of Wizard Normal Install or HA recovery?

    Kindly help.

    Replies
    1. Can I see a screenshot? Send me a mail at my email address.
