Improving Fault Tolerance in Cloud

… ability of the system to perform its function even in the presence of failures," fault tolerance is a critical component of effective cloud computing systems (Jhawar & Piuri, 2013, p. 3). To improve fault tolerance, a system must be designed to anticipate errors, which is why fault tolerance can be considered a form of risk management. Cloud systems with solid fault tolerance can usually continue functioning at reduced capacity so that the critical system components defined by core stakeholders or end users keep running (Amin, Sethi & Singh, 2015).
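Running at reduced capacity can be illustrated with a minimal sketch. It assumes that stakeholders have assigned each component a priority and that each component has a known capacity cost; the `Component` fields, the service names, and the greedy selection rule are hypothetical illustrations, not a method from the cited sources.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    priority: int  # hypothetical: lower number = more critical to stakeholders
    cost: int      # hypothetical: capacity units the component needs to run


def plan_degraded_mode(components: list[Component], capacity: int) -> list[str]:
    """Keep the most critical components that still fit into the surviving capacity."""
    kept = []
    remaining = capacity
    # Visit components from most to least critical; skip any that no longer fit.
    for comp in sorted(components, key=lambda c: c.priority):
        if comp.cost <= remaining:
            kept.append(comp.name)
            remaining -= comp.cost
    return kept


services = [
    Component("payments", priority=0, cost=4),
    Component("login", priority=1, cost=2),
    Component("recommendations", priority=5, cost=3),
    Component("analytics", priority=9, cost=2),
]

# Full capacity (11 units) runs everything; after a failure leaves 6 units,
# only the two most critical services keep running.
print(plan_degraded_mode(services, capacity=11))
print(plan_degraded_mode(services, capacity=6))  # ['payments', 'login']
```

Because the loop skips a component that does not fit and keeps going, a cheap low-priority service can still run. With 8 units, for example, `analytics` is kept while `recommendations` is dropped. A stricter policy could stop at the first component that does not fit.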

Because each cloud system is constructed differently, improvements to its fault tolerance architecture must reflect the specific features and needs of the individual system. Four main types of cloud inform the design of fault-tolerant systems: public, private, hybrid, and community clouds. Each type offers services loosely grouped into three categories: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Depending on the services and the type of cloud in question, fault tolerance may be reactive, proactive, or both.

Moreover, how fault tolerance is designed, and how its effectiveness is measured, varies with the system's requirements for response time, throughput, and other measures, including usability and even cost-effectiveness (Amin, Sethi & Singh, 2015). The types of faults a system may encounter should also inform its design: network, physical, media, processor, process, and service-expiry faults are all possible, and each can be classified as intermittent, permanent, or transient (Saikia & Devi, 2014).
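To make the intermittent, permanent, and transient categories concrete, the following sketch classifies a fault from a time-ordered series of health checks. The thresholds and the burst-counting rule are a hypothetical heuristic chosen for illustration; Saikia and Devi (2014) name the categories but do not prescribe this procedure.

```python
from enum import Enum


class Persistence(Enum):
    NONE = "no fault"
    TRANSIENT = "transient"
    INTERMITTENT = "intermittent"
    PERMANENT = "permanent"


def classify(failed_checks: list[bool], permanent_after: int = 3) -> Persistence:
    """Classify a fault from a time-ordered list of health checks (True = check failed).

    Hypothetical heuristic, not taken from the cited sources:
    - failing on each of the last `permanent_after` checks -> permanent
    - otherwise, a single burst of failures                -> transient
    - otherwise, several separate bursts of failures       -> intermittent
    """
    if not any(failed_checks):
        return Persistence.NONE
    tail = failed_checks[-permanent_after:]
    if len(tail) == permanent_after and all(tail):
        return Persistence.PERMANENT
    # A burst starts wherever a failed check follows a passing one (or the first check).
    bursts = sum(
        1
        for i, failed in enumerate(failed_checks)
        if failed and (i == 0 or not failed_checks[i - 1])
    )
    return Persistence.TRANSIENT if bursts == 1 else Persistence.INTERMITTENT


print(classify([False, True, False, False]).value)  # transient
print(classify([True, False, True, False, True, False]).value)  # intermittent
print(classify([False, True, True, True]).value)  # permanent
```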

Improving fault tolerance in the cloud requires an understanding of the system's specifications and components. Reactive fault tolerance can be used when failures are relatively regular and predictable. Reactive techniques include job migration, in which a failed task is reassigned to a predetermined backup machine using HAProxy or other proven methods (Bala & Chana, 2012). Checkpointing is an effective reactive technique that is particularly useful for "long running and big applications" (Saikia & Devi, 2014, p. 4).

Bilal et al. (2015) note that fault-tolerant systems can use roll-back or roll-forward recovery after an error: roll-back returns the system to a state saved before the error, while roll-forward moves it ahead to a new error-free state. Checkpoint mechanisms, although not always cost-effective, can be particularly useful fault tolerance methods. They take several forms, including checkpoint placement schemes, which insert checkpoints at strategic points in the system so that tasks can perform well and still run to completion when failures occur (Bilal et al., 2015).
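Job migration, the first reactive technique described above, can be sketched as a failover loop: the task runs on the primary host and, only after that host fails, is moved to the next predetermined backup. In this minimal sketch a "host" is a plain Python function; it does not model HAProxy or any real load balancer, and the host and job names are hypothetical.

```python
from typing import Callable, Sequence

# A "host" is modelled as a callable that runs a job and returns a result or raises.
Host = Callable[[str], str]


class AllHostsFailed(Exception):
    """Raised when the primary and every predetermined backup have failed."""


def run_with_migration(job: str, hosts: Sequence[tuple[str, Host]]) -> tuple[str, str]:
    """Run `job` on the first host; on failure, migrate it to the next backup in order.

    Returns (host_name, result).
    """
    errors = []
    for name, host in hosts:
        try:
            return name, host(job)
        except Exception as exc:  # reactive: act only after the failure is observed
            errors.append(f"{name}: {exc}")
    raise AllHostsFailed("; ".join(errors))


def crashed_primary(job: str) -> str:
    raise ConnectionError("node unreachable")


def healthy_backup(job: str) -> str:
    return f"{job} done"


hosts = [("primary", crashed_primary), ("backup-1", healthy_backup)]
print(run_with_migration("render-report", hosts))  # ('backup-1', 'render-report done')
```

The ordered `hosts` list stands in for the predetermined backup assignment; a production system would also need health checks and a way to move the job's state, which this sketch leaves out.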

With checkpoints, long and complex tasks do not need to be restarted from the beginning after an error or failure, because the system can revert to a specific point in the workflow. Effective fault tolerance anticipates and responds to both software and hardware failures. Dynamic fault tolerance strategies have been shown to be useful in large-scale cloud systems with complex architectures (Bilal et al., 2015, p. 8). Self-healing, replication, and safety-bag checks are other fault tolerance techniques used in cloud systems (Saikia & Devi, 2014).
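A minimal checkpoint/restart sketch shows why a failed task need not start over. The job here is a stand-in (summing a list), the JSON checkpoint file and the `fail_at` fault-injection parameter are hypothetical, and checkpoints are simply taken every `every` items rather than by any of the placement schemes Bilal et al. (2015) discuss.

```python
import json
import os
import tempfile


def run_with_checkpoints(items, checkpoint_path, every=2, fail_at=None):
    """Sum `items`, checkpointing progress so that a crash does not force a full restart.

    `fail_at` is a hypothetical fault-injection hook: the run crashes on reaching that index.
    """
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # Roll back to the last saved state instead of starting from item 0.
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_index"], len(items)):
        if i == fail_at:
            raise RuntimeError(f"simulated crash at item {i}")
        state["total"] += items[i]
        state["next_index"] = i + 1
        if state["next_index"] % every == 0:
            # Write to a temporary file, then swap it in, so a crash mid-write
            # leaves the previous checkpoint intact.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, checkpoint_path)
    return state["total"]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "job.ckpt")
    data = list(range(1, 11))
    try:
        run_with_checkpoints(data, path, fail_at=5)
    except RuntimeError as err:
        print(err)  # simulated crash at item 5
    # The rerun resumes at index 4 (the last checkpoint), so only indices 4 to 9 are processed.
    print(run_with_checkpoints(data, path))  # 55
```

Work done after the last checkpoint (index 4 here) is lost and repeated, which is the trade-off behind choosing how often to checkpoint.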

All effective and reliable fault tolerance methods are flexible and address both proactive and reactive needs. To predict errors, proactive techniques such as software rejuvenation and preemptive migration can be combined with reactive techniques to create a robust system that responds to changes in workflow (Bala & Chana, 2012). To ensure maximum reliability and accessibility for end users, a service layer can be integrated into an existing cloud computing architecture. With a service layer, integrated fault tolerance techniques can be applied regardless of the specific configuration of the system and the software hosted on it.
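Preemptive migration, one of the proactive techniques named above, acts before a failure rather than after it. The sketch below assumes a hypothetical per-node health signal (for example, temperature) where higher values mean a node is closer to failing. The moving-average rule, the threshold, and the node names are illustrative assumptions, not a prediction method from Bala and Chana (2012).

```python
from statistics import mean


def plan_preemptive_migrations(
    readings: dict[str, list[float]], threshold: float, window: int = 3
) -> dict[str, str]:
    """Map each at-risk node to a healthy node that should take over its workload.

    `readings` holds a hypothetical, non-empty health signal per node (for example,
    temperature in degrees C) where higher values mean the node is closer to failing.
    A node counts as at risk when the mean of its last `window` readings exceeds `threshold`.
    """
    recent = {node: mean(values[-window:]) for node, values in readings.items()}
    at_risk = [node for node, score in recent.items() if score > threshold]
    # Healthy targets, lowest (healthiest) score first.
    healthy = [
        node for score, node in sorted((s, n) for n, s in recent.items() if s <= threshold)
    ]
    if not healthy:
        return {}  # nowhere safe to migrate to; a real system would raise an alert instead
    # Spread at-risk workloads across healthy nodes round-robin, healthiest first.
    return {node: healthy[i % len(healthy)] for i, node in enumerate(at_risk)}


readings = {
    "node-a": [61.0, 64.0, 70.0, 78.0, 85.0],  # rising trend, recent mean about 77.7
    "node-b": [55.0, 56.0, 54.0, 55.0],        # recent mean 55.0
    "node-c": [60.0, 62.0, 61.0],              # recent mean 61.0
}

# node-a is moved to the healthiest node, node-b, before it fails.
print(plan_preemptive_migrations(readings, threshold=75.0))  # {'node-a': 'node-b'}
```

Averaging over a short window keeps a single noisy reading from triggering a migration; the trade-off is that a sudden spike is detected a little later.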

Furthermore, clouds that interact with one another (or interfaces between public and private clouds) …

Cite This Paper
"Improving Fault Tolerance In Cloud" (2016, July 03) Retrieved April 21, 2026, from
https://www.paperdue.com/essay/improving-fault-tolerance-in-cloud-2161605