Monday, April 6, 2015

QRadar High Availability (HA) considerations and tips

In this post I just want to give some quick points on QRadar High Availability (HA).

    1.    HA Overview
        -    Uses Primary and Secondary HA hosts
        -    Uses Virtual IPs
        -    Network connectivity is tested via heartbeat (pings) to all managed hosts
        -    HA can be configured for either a console or a managed host
        -    Both devices must have the same versions of the software
        -    Both devices must support the same DSM, scanner and protocol RPMs
        -    Uses data synchronization or shared external storage
        -    Consistency is maintained locally by using Distributed Replicated Block Device (DRBD)
        -    If using external storage data consistency is maintained through iSCSI or Fibre Channel
        -    Data is synchronized in real time
        -    Note: Asset profiler can impact DRBD speed
        -    "/store" partition on the primary is automatically replicated to the secondary host
        -    Ensure a minimum of 1 Gbps of bandwidth between primary and secondary HA hosts
        -    Initial synchronization can take more than 24 hours
            This may be an understatement. I've seen initial synchronization take upwards of 72 hours.
        -    Secondary host goes into "standby" after synchronization
        -    Primary HA host's status becomes "offline" when restored from a failover
        -    Primary needs to be placed "online" before it becomes active
        -    Disk replication is enabled while primary is "offline"
        -    Post-failover disk synchronization is faster
        -    Basically uses deltas
        -    When the primary host is restored, only the data collected by the secondary during the period the primary was unavailable is synchronized
        -    Replacing or reformatting the disk on the primary can result in longer synchronization time in the event of a failback
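
To get a feel for why the initial synchronization runs so long while a post-failover delta is quick, a back-of-the-envelope estimate helps. The sketch below is not a QRadar tool: the function name, the 6 TB "/store" size, the 50 GB delta and the 50% link efficiency are all assumptions picked for illustration.

```python
def estimate_sync_hours(data_bytes, link_bps, efficiency=0.5):
    """Rough hours needed to replicate data_bytes over a link of link_bps bits/s.

    efficiency is an ASSUMED fraction of the link that replication can really
    use (protocol overhead, competing traffic, disk speed). It is not a
    QRadar or DRBD setting.
    """
    if link_bps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_bps must be > 0 and efficiency in (0, 1]")
    seconds = (data_bytes * 8) / (link_bps * efficiency)
    return seconds / 3600


TB = 10**12
GB = 10**9

# Hypothetical full "/store" of 6 TB over the minimum 1 Gbps link
full_sync = estimate_sync_hours(6 * TB, 1e9)
# Hypothetical 50 GB collected by the secondary while the primary was down
delta_sync = estimate_sync_hours(50 * GB, 1e9)

print(f"initial sync: {full_sync:.1f} h, delta sync: {delta_sync:.2f} h")
```

With these made-up numbers the full copy lands at roughly 27 hours while the delta finishes in well under an hour, which is in line with the "more than 24 hours" guidance above. Real figures depend heavily on disk speed and on how busy the appliance is.
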
        IP Considerations
            -    Uses Virtual IPs
            -    Needs 3 IP addresses - VIP, Primary and Secondary
            -    The IP address initially configured on the primary host is automatically made the cluster VIP
            -    A new IP will need to be assigned to the primary once HA configuration is started
            -    Primary host can act as a standby for secondary
            -    VIP is used by a host that has a status of active
            -    All IPs must be in the same subnet
            -    Latency must be less than 2 ms for traffic crossing the WAN
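
Before starting the HA wizard it can be handy to sanity-check the three addresses. This is a minimal sketch using Python's standard ipaddress module; the function name and the example addresses (taken from the documentation-only ranges) are assumptions, and it checks only the "three addresses, same subnet" rules from the list above.

```python
import ipaddress


def check_ha_addresses(vip, primary, secondary, subnet):
    """Return a list of problems; an empty list means the addresses look usable."""
    network = ipaddress.ip_network(subnet)
    named = {"VIP": vip, "primary": primary, "secondary": secondary}
    problems = []

    # Every address must sit inside the same subnet
    for name, addr in named.items():
        if ipaddress.ip_address(addr) not in network:
            problems.append(f"{name} address {addr} is outside {network}")

    # HA needs three distinct addresses
    if len({ipaddress.ip_address(a) for a in named.values()}) != 3:
        problems.append("VIP, primary and secondary must be three distinct addresses")

    return problems


# Hypothetical addresses for illustration only
for problem in check_ha_addresses("192.0.2.10", "192.0.2.11", "198.51.100.12", "192.0.2.0/24"):
    print(problem)
```

Remember that the address originally configured on the primary becomes the VIP, so the "primary" value here is the new address you plan to assign to it.
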
        HA Wizard
            -    Used to configure Primary, Secondary and cluster VIP
            -    Verifies the secondary has a valid HA activation key
            -    Verifies the secondary is not part of an existing HA cluster
            -    Verifies software version is the same on both devices
            -    Verifies external storage (if configured) on primary and then secondary
            -    Verifies both support the same DSM, scanner and protocol RPMs
        Failover scenarios
            -    Power supply failure
            -    Network failure (detected by connectivity tests)
            -    OS malfunction that delays or stops heartbeat tests
            -    RAID failure
            -    Manual failover
            -    Management interface failure on the primary host
            -    Primary does not automatically take back its role as primary after a failover
            -    Secondary stays as primary while primary acts as standby
            -    Primary must be switched to "active" to take over its role
            -    No failover for software errors or disk capacity issues
            -    If both primary and secondary are unable to ping a managed host, no failover occurs
            -    If primary cannot but secondary can ping a managed host, failover occurs
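
The last two connectivity rules are easy to mix up, so here is a small sketch of just that decision. It models only the ping comparison described above, not QRadar's actual implementation; the function name, the dictionaries and the host names are hypothetical.

```python
def should_fail_over(primary_can_ping, secondary_can_ping):
    """Apply the connectivity rule: fail over only if some managed host is
    unreachable from the primary but still reachable from the secondary.

    Each argument maps a managed host name to True/False (ping succeeded).
    """
    for host, primary_ok in primary_can_ping.items():
        secondary_ok = secondary_can_ping.get(host, False)
        if not primary_ok and secondary_ok:
            return True
    return False


# Both HA hosts lose sight of ep1: the problem is likely ep1 itself, no failover
print(should_fail_over({"ep1": False, "fp1": True}, {"ep1": False, "fp1": True}))

# Only the primary loses sight of ep1: the problem is likely the primary, failover
print(should_fail_over({"ep1": False, "fp1": True}, {"ep1": True, "fp1": True}))
```

The idea is that a host neither HA node can reach says nothing about the health of the primary, so swapping roles would not help.
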
        HA Failover event sequence
            -    File systems are mounted
            -    Management interface alias is created (for example, eth0:0 as an alias of eth0)
            -    VIP is assigned to the alias
            -    QRadar services are started
            -    Secondary connects to console and downloads configuration files
        Tips for manual synchronization
            -    Ensure primary and secondary hosts are sync'd
            -    Secondary must be in standby
            -    Set the secondary to offline and power off the primary
    2.    HA Planning
        -    File systems on both devices must match - ext3, etc.
        -    Secondary's "/store" partition must be equal to or greater than the primary's
        -    Both devices should have the same number of interfaces
        -    Both must use the same management interface
        -    Only 1 VIP
        -    Port 7789 is needed for Distributed Replicated Block Device (DRBD)
        -    DRBD traffic is bidirectional
        -    Disk replication ensures software updates are applied to the secondary
        -    Ensure the host has a valid activation key
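
Several of the planning points above boil down to comparing the two hosts side by side. The sketch below shows one way to do that once you have collected the facts by hand or with your own scripts. The HostFacts structure, its field names and the sample values are all hypothetical; nothing here queries QRadar.

```python
from dataclasses import dataclass


@dataclass
class HostFacts:
    """Facts about one HA host (hypothetical structure, filled in by you)."""
    filesystem: str            # e.g. "ext3"
    store_bytes: int           # size of the "/store" partition
    interface_count: int
    management_interface: str  # e.g. "eth0"


def planning_problems(primary, secondary):
    """Compare two hosts against the planning points listed above."""
    problems = []
    if primary.filesystem != secondary.filesystem:
        problems.append("file systems differ")
    if secondary.store_bytes < primary.store_bytes:
        problems.append("secondary /store is smaller than primary /store")
    if primary.interface_count != secondary.interface_count:
        problems.append("interface counts differ")
    if primary.management_interface != secondary.management_interface:
        problems.append("management interfaces differ")
    return problems


# Made-up values: the secondary has a smaller "/store" than the primary
primary = HostFacts("ext3", 6 * 10**12, 4, "eth0")
secondary = HostFacts("ext3", 4 * 10**12, 4, "eth0")
for problem in planning_problems(primary, secondary):
    print(problem)
```

An empty result does not guarantee the HA wizard will pass, since it also verifies the activation key, software version and RPMs.
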
    3.    HA Management
        -    Uses System and License management window to:
            -    Monitor HA
            -    Force failover
            -    Disconnect cluster
            -    Modify cluster settings
            -    Modify heartbeat interval
            -    Place the device in "offline" mode before maintenance

For further information, please see the relevant IBM QRadar High Availability Guide.