24x7 Scheduler Java Edition - High Availability and Fail-over

Overview

The 24x7 Scheduler is an enterprise-class scheduling system designed to handle thousands of jobs. It can be configured as a High Availability system.

What Is High Availability?

High Availability (HA) means access to data, jobs, and applications whenever needed and with an acceptable level of performance. HA deals with the service aspect of the "scheduling system" as an unbroken whole, as perceived by its end users. In this context, reliability (of hardware and software components) and performance (response time, throughput, tpm, etc.) are parts of system availability.

Availability is the proportion of time a system is productive and is usually expressed as a percentage. On the spectrum of system availability, HA systems usually fall between 99.9% and 100% availability.
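
To make these percentages concrete, an availability figure can be converted into the downtime it allows per year. The following is a minimal sketch of that arithmetic; the function name and the 365-day year are assumptions made for illustration and are not part of 24x7 Scheduler.

```python
# Illustrative arithmetic only; not part of 24x7 Scheduler.
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year (8760 hours)


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year allowed by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% availability allows {downtime_hours_per_year(pct):.2f} hours of downtime per year")
```

Under these assumptions, 99.9% availability corresponds to roughly 8.76 hours of downtime per year.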

24x7 Scheduler supports two fail-over levels:

1. Job fail-over can be implemented using built-in job processing redundancy features.
2. Scheduling system fail-over can be implemented using clustering solutions.

Job Fail-over

24x7 Scheduler can run both local and remote jobs using remote agents deployed on other computers. Each job can be assigned to a primary agent and a list of backup agents. If the primary agent is not available, the job is automatically submitted to the first available or least busy backup agent.
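
The selection rule described above can be sketched as follows. This is only an illustration of the idea, not the scheduler's actual implementation or API: the Agent class, its fields, the pick_agent function, and the least_busy switch are all hypothetical names, and "least busy" is assumed here to mean the fewest running jobs.

```python
# Hypothetical sketch of primary/backup agent selection; not the 24x7 Scheduler API.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Agent:
    name: str
    available: bool
    running_jobs: int = 0  # assumed measure of how busy the agent is


def pick_agent(primary: Agent, backups: Sequence[Agent], least_busy: bool = False) -> Optional[Agent]:
    """Return the agent that should run the job, or None if no agent is available."""
    if primary.available:
        return primary
    # The primary is down: consider only the backups that are currently reachable.
    candidates = [agent for agent in backups if agent.available]
    if not candidates:
        return None
    if least_busy:
        # Choose the available backup with the fewest running jobs.
        return min(candidates, key=lambda agent: agent.running_jobs)
    # Otherwise take the first available backup in the configured order.
    return candidates[0]
```

For example, with an unavailable primary and backups listed as B1 (down), B2 (up, 5 jobs), and B3 (up, 1 job), the first-available rule selects B2, while the least-busy rule selects B3.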

Scheduler Fail-over

Several solutions can be used for system fail-over. We recommend the Linux High Availability Project (Linux HA), an inexpensive clustering solution based on the heartbeat method that is easy to install and configure. For more information about Linux HA, please read http://linux-ha.org/download/GettingStarted.html

The Linux HA heartbeat system (in an active/passive configuration) can be set up to monitor TCP port 1097, which is the default port used by 24x7 Scheduler for TCP requests. Each node in a cluster needs two network cards: one for the regular network and another for the private network. The nodes can then be linked by a Cat-5 crossover cable. Using a private network excludes all common network dependencies (such as routers, switches, and disconnects) and avoids false fail-overs caused by failures of other components.
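
Before wiring the port into a cluster monitor, it can be useful to confirm by hand that the scheduler answers on its TCP port. The sketch below is a generic TCP connect test written with the Python standard library; it is not part of Linux HA or 24x7 Scheduler, and the function name, the host name in the usage note, and the 2-second timeout are assumptions. Only the port number 1097 comes from the text above.

```python
# Generic TCP reachability probe; not part of Linux HA or 24x7 Scheduler.
import socket

DEFAULT_SCHEDULER_PORT = 1097  # default 24x7 Scheduler TCP port, per the text above


def scheduler_port_is_open(host: str, port: int = DEFAULT_SCHEDULER_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A call such as scheduler_port_is_open("scheduler-node-1"), where the host name is a placeholder, returns False when the port cannot be reached. Note that a successful connection shows only that something is listening on the port, not that the scheduler is processing jobs correctly.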

IMPORTANT: The Linux High Availability Project method can be used on other Unix systems, including Sun Solaris, HP-UX, and others. It is not specific to the Linux operating system!