This paper examines computer clustering β the linking of multiple computers, storage devices, and interconnections to form a single integrated system β and its major applications in parallel processing, batch processing, load balancing, and high availability. Beginning with the origins of clustering in the 1980s and the landmark Beowulf cluster of 1994, the paper traces how commodity server clusters displaced expensive supercomputers and mainframes. It also addresses key implementation challenges, including achieving transparency, reducing network latency, and resolving the split-brain problem through resource-based fencing and STOMITH techniques. The paper concludes by situating cluster computing within the broader evolution toward grid computing.
Computer clustering involves the use of multiple computers β typically personal computers (PCs) or UNIX workstations β along with multiple storage devices and redundant interconnections, to form what appears to users as a single integrated system. Clustering has been available since the 1980s, when it was used in Digital Equipment Corp's VMS systems. Today, virtually all leading hardware and software companies, including Microsoft, Sun Microsystems, Hewlett-Packard, and IBM, offer clustering technology.
This paper describes why and how clustering is commonly used for parallel processing, batch processing, load balancing, and high availability. Despite some challenges β such as achieving transparency, mitigating network latency, and resolving the split-brain problem β clustering has proven to be a significant success for bringing scale and availability to computing applications. Hungry for even more efficient resource use, IT departments are now turning their attention to the next evolution of clustering: grid computing.
Parallel processing is the processing of program instructions by dividing them among multiple processors with the objective of running a program in less time. It is normally applied to rendering and high-computation-based applications. Rather than using expensive specialized supercomputers, implementers have begun using large clusters of small, commodity servers. Each server runs its own operating system, takes on a number of jobs, processes them, and sends the output to the primary system (Shah, 1999). Clusters provide the ability to handle a large task in small pieces β or large numbers of small tasks across an entire cluster β making a system both more affordable and more scalable.
The first PC cluster to be described in scientific literature was named Beowulf and was developed in 1994 at the NASA Goddard Space Flight Center. Beowulf initially consisted of sixteen PCs, standard Ethernet, and a modified version of Linux, and achieved seventy million floating-point operations per second. For only $40,000 in hardware, Beowulf had produced the processing power of a small supercomputer that would have cost approximately $400,000 at that time. By 1996, researchers had achieved one billion floating-point operations per second at a cost of less than $50,000. Later, in 1999, the University of New Mexico clustered 512 Intel Pentium III processors to create the 80th-fastest supercomputing system in the world, with a performance of 237 gigaflops.
Just as clustering has reduced the importance of supercomputers for parallel processing, clusters are making the mainframe less relevant for batch applications. A batch job is a program assigned to a computer to run without further user interaction. Common batch-oriented applications include data mining, 3-D rendering, and engineering simulations. Before clustering, batch applications were typically the domain of mainframes, which involved high costs of ownership. Now, with clusters and a scheduler, large batch jobs can easily be processed on a less expensive cluster.
Load balancing is the distribution of the amount of work a computer must do between two or more computers, so that more work gets done in the same amount of time and all users are served faster. For load balancing purposes, computers are used together in such a way that traffic and load on the network are distributed across the computers in the cluster (D'Souza, 2001). Load balancing is commonly used in applications where the load on the system cannot be predicted and varies over time.
One common example is web servers, where two or more servers are configured so that when one server becomes overburdened with requests, those requests are passed on to other servers in the cluster, thus evening out the workload. In a business network for Internet applications, a cluster β often called a Web farm β might provide services such as centralized access control, file access, printer sharing, and backup for workstation users. The servers may run individual operating systems or a shared operating system, and can provide load balancing when there are many server requests. A web page request is sent to a "manager" server that determines which of several identical or similar web servers should handle the request, allowing traffic to be processed more quickly.
"Achieving continuous uptime through redundancy"
"Key technical obstacles in cluster implementation"
"Fencing, STONITH, and quorum-based solutions"
"From cluster computing toward distributed grid systems"
You’re 32% through this paper. Sign up to read the remaining 4 sections.
Sign Up Now — Instant Access Already a member? Log inAlways verify citation format against your institution’s current style guide requirements.