
Introduction to Grids and Clusters

Cluster and grid computing put a new face on a problem that is as old as computing itself: how to get another computer to do your work. Whether the answer is vanilla Remote Procedure Calls (RPC), DCOM, CORBA or, more recently, web services, the problem remains important.

In the last few years there has been renewed vigour in the field on the back of Internet-based success and the continued reign of Moore’s Law.

This portal is intended to provide you with a one-stop shop for information about clusters and grids, to let you solicit opinions, and to let you share your experiences. The value of sharing your hard-won experience lies in the fact that most of our audience are not cluster administrators but have a job that only peripherally depends on clusters.

First, a little admin: what is the difference between grids and clusters? We see a cluster as a more or less homogeneous group of locally connected computers under your control. A grid need not be homogeneous, and its degree of distribution can be more extreme.

To the outside world, a "supercomputer" appears to be a single system. In fact, it's a cluster of computers that share a local area network and have the ability to work together on a single problem as a team. Many businesses used to consider supercomputing beyond the reach of their budgets, but new Linux applications have made high-performance clusters more affordable than ever. These days, the promise of low-cost supercomputing is one of the main reasons many businesses choose Linux over other operating systems.

Clusters

From Wikipedia: A computer cluster is a group of loosely coupled computers that work together closely so that in many respects it can be viewed as though it were a single computer. Clusters are commonly (but not always) connected through fast local area networks. Clusters are usually deployed to improve speed and/or reliability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or reliability.

High-availability (HA) clusters

High-availability clusters are implemented primarily for the purpose of improving the availability of services which the cluster provides. They operate by having redundant nodes which are then used to provide service when system components fail. The most common size for an HA cluster is two nodes, since that's the minimum required to provide redundancy. HA cluster implementations attempt to manage the redundancy inherent in a cluster to eliminate single points of failure. There are many commercial implementations of High-Availability clusters for many operating systems. The Linux-HA project is one commonly used free software HA package for the Linux OS.
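
The failover idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Node class, the healthy flag and the pick_active function are hypothetical names invented here, and the flag stands in for whatever heartbeat check a real HA package would perform.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical cluster node; `healthy` stands in for a real heartbeat check."""
    name: str
    healthy: bool = True


def pick_active(nodes):
    """Return the first healthy node in priority order, or None if every node has failed."""
    for node in nodes:
        if node.healthy:
            return node
    return None


# A two-node cluster: the minimum needed for redundancy.
primary = Node("node-a")
standby = Node("node-b")
cluster = [primary, standby]

active_before = pick_active(cluster).name  # the primary serves while it is healthy
primary.healthy = False                    # simulate a component failure
active_after = pick_active(cluster).name   # the standby now provides the service
```

With both nodes healthy the service stays on node-a; once node-a is marked as failed, the same call selects node-b. A real implementation would also have to detect the failure reliably and move shared resources such as IP addresses and storage, which this sketch leaves out.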

Load balancing clusters

Load balancing clusters operate by having all workload come through one or more load-balancing front ends, which then distribute it to a collection of back-end servers. Although they are implemented primarily for improved performance, they commonly include high-availability features as well. Such a cluster of computers is sometimes referred to as a server farm. Many commercial load balancers are available. Workload managers such as Moab Cluster Suite and the Maui Cluster Scheduler do a related job, scheduling work across the nodes of a cluster. The Linux Virtual Server project provides one commonly used free software package for the Linux OS.
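
As a rough illustration of what a load-balancing front end does, the sketch below hands requests to back-end servers in round-robin order and skips any server marked as down. Round-robin is just one possible policy, chosen here for simplicity; the RoundRobinBalancer class and the server names are hypothetical and do not reflect how any particular product works.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical front end that rotates requests over back-end servers."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()                  # back ends currently considered failed
        self._ring = cycle(self.backends)  # endless rotation over the back ends

    def mark_down(self, backend):
        """Record that a back end has failed so it is skipped from now on."""
        self.down.add(backend)

    def route(self, request):
        """Return the back end chosen for this request (the request content is ignored here)."""
        # Try each back end at most once per call so we never loop forever.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend not in self.down:
                return backend
        raise RuntimeError("no back-end servers available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
first_pass = [balancer.route(f"req-{i}") for i in range(4)]
balancer.mark_down("web2")  # simulate a failed back end
second_pass = [balancer.route(f"req-{i}") for i in range(4)]
```

Skipping failed servers is the simplest form of the high-availability behaviour mentioned above: after web2 is marked down, requests are shared between web1 and web3 only.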

High-performance (HPC) clusters

High-performance clusters are implemented primarily to provide increased performance by splitting a computational task across many different nodes in the cluster, and are most commonly used in scientific computing. One of the more popular HPC implementations is a cluster with nodes running Linux as the OS and free software to implement the parallelism. This configuration is often referred to as a Beowulf cluster. Such clusters commonly run custom programs which have been designed to exploit the parallelism available on HPC clusters. Many such programs use libraries such as MPI, which are specially designed for writing scientific applications for HPC computers.
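
The pattern of splitting a task, computing partial results and combining them can be sketched on a single machine. The example below is not MPI: it uses a thread pool from the Python standard library as a stand-in for cluster nodes, and the function names (split, partial_sum_of_squares, parallel_sum_of_squares) are hypothetical. On a real HPC cluster each chunk would typically be sent to a different node.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done by one worker: sum the squares of its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Scatter chunks to workers, then combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


total = parallel_sum_of_squares(list(range(1000)))
```

The result is the same as computing the sum in one pass; only the division of labour changes. The work here is small and threads share one machine, so the sketch shows the structure of the computation rather than any speed-up.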