A business continuity plan ensures that the business-critical resources and systems an organization needs to survive and succeed remain available. Fault tolerance, high availability, and redundancy are system attributes essential to a successful business continuity plan. Before procuring and deploying any system, understand which types of fault tolerance, redundancy, and high availability your organization requires, how your business continuity plan defines them, and whether the system supports them.
Fault tolerance refers to systems, software, or networks designed to keep operating when an adverse event, such as a network connection outage or hard drive failure, occurs. It is an asset attribute (property) that is highly desirable for disaster recovery and business continuity planning. For example, a storage system configured as a spanned JBOD (Just a Bunch Of Disks) volume has no redundancy, so a single drive failure loses the data on that drive and can make the entire volume unreadable. A storage system configured as a RAID 6 array rather than a JBOD volume, however, has a fault-tolerant property: it can lose up to two disks without losing either the array or any data.
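The JBOD and RAID 6 comparison above can be expressed as a small calculation. The following Python sketch is illustrative only: the table and function names are hypothetical, and the capacity formula assumes equal-sized drives, with RAID 6 reserving two drives' worth of space for parity.

```python
# Drive failures each layout can absorb before data is lost.
# JBOD has no redundancy; RAID 6 keeps two independent parity blocks per stripe.
TOLERATED_FAILURES = {"jbod": 0, "raid6": 2}


def survives(layout: str, failed_drives: int) -> bool:
    """Return True if the volume still holds all of its data."""
    return failed_drives <= TOLERATED_FAILURES[layout]


def usable_capacity_tb(layout: str, drives: int, drive_tb: float) -> float:
    """Usable space, assuming equal-sized drives (an assumption of this sketch)."""
    parity_drives = TOLERATED_FAILURES[layout]
    return (drives - parity_drives) * drive_tb


if __name__ == "__main__":
    for layout in ("jbod", "raid6"):
        for failed in (1, 2, 3):
            print(layout, "with", failed, "failed drive(s):", survives(layout, failed))
```

The trade-off is visible in the capacity function: under these assumptions, six 4 TB drives give 24 TB as JBOD but 16 TB as RAID 6. The difference is the cost of the fault-tolerant property.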
The purpose of fault-tolerant features is high availability. High availability describes how much of the time a system is up and delivering the service it provides. For example, a single web server offers little fault tolerance: if it goes offline because of a power outage or DoS attack, the service goes offline with it. However, a cluster of three web servers, each powered from a different power grid and each running a RAID 6 storage array, provides a high degree of fault tolerance as a system. Each web server is still susceptible to a power outage, but the system as a whole can keep serving HTTP requests if one or two of the web servers fail. To the user accessing the web pages, the system is “highly available” because the pages appear to be always online. To the web server administrator, the system is “highly fault tolerant” because it can absorb up to two separate power outages in two separate grids and still remain available.
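A rough calculation shows how much a three-server cluster like the one above can improve availability. This Python sketch rests on two loudly stated assumptions: the 99% per-server availability is a made-up illustrative figure, and the servers are assumed to fail independently of one another, which separate power grids make more plausible but do not guarantee. The function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def cluster_availability(server_availability: float, servers: int) -> float:
    """Probability that at least one server is up, assuming independent failures."""
    all_down = (1.0 - server_availability) ** servers
    return 1.0 - all_down


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year during which the service is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.99  # hypothetical per-server availability
    cluster = cluster_availability(single, servers=3)
    print("single server downtime (h/yr):", downtime_hours_per_year(single))
    print("three-server cluster downtime (h/yr):", downtime_hours_per_year(cluster))
```

Under these assumptions, a single server is down for roughly 87.6 hours a year, while the cluster is unavailable only when all three servers are down at once, which works out to well under a minute a year. Real figures will be worse wherever failures are correlated, for example through a shared network link or a common software fault.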
Redundancy is a type of fault tolerance that helps deliver high availability. It applies to both business continuity planning and disaster recovery. In business continuity planning, redundancy countermeasures mitigate the risks the plan identifies, provided they are both applicable to the system and justifiable through cost analysis. In disaster recovery, redundancy makes a system more fault tolerant (a property of the system) and so delivers high availability (the goal of both fault tolerance and a business continuity plan). It does this by providing a recovery path, whether standby, clustered (as in the web server example above), or another failover technique, that ensures recovery when an incident occurs.
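The standby recovery path described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production failover mechanism: the `primary` and `standby` functions are hypothetical stand-ins for real services, and an actual deployment would normally add health checks, timeouts, and alerting.

```python
from typing import Callable, Optional, Sequence


def call_with_failover(backends: Sequence[Callable[[], str]]) -> str:
    """Try each redundant backend in order and return the first successful reply."""
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            return backend()
        except ConnectionError as exc:
            # This backend is unavailable; fall through to the next one.
            last_error = exc
    raise RuntimeError("all redundant backends failed") from last_error


def primary() -> str:
    """Hypothetical primary service, simulated here as being offline."""
    raise ConnectionError("primary is down")


def standby() -> str:
    """Hypothetical standby service that takes over when the primary fails."""
    return "served by standby"


if __name__ == "__main__":
    print(call_with_failover([primary, standby]))
```

The caller never needs to know which backend answered, which is what makes the service appear available from the user's side even though one component has failed.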
Contact us for more information about how fault tolerant, highly available, and redundant systems can ensure the business continuity of your organization.