Distributed computing is the process of aggregating the power of several computing entities to collaboratively run a single computational task in a transparent and coherent way, so that they appear as a single, centralized system.

Introduction

In computer science, a distributed system is an application that consists of components running on different computers concurrently. These components must be able to communicate and be designed to operate separately. As stated by Andrew S. Tanenbaum, "Distributed systems need radically different software than centralized systems do." The software he was discussing must be implemented in some form of distributed programming language.

Distributed computing is attractive in part because interactive use leaves most computers idle most of the time. The process that carries out the distributed work on a machine normally devoted to other tasks is usually designed to run at low priority, using only computing power that would otherwise be wasted.
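As a rough illustration of the low-priority idea, the sketch below asks the operating system to schedule the current process behind normal interactive work. It assumes a Unix-like system, where Python exposes `os.nice`; the helper name `lower_priority` is hypothetical and not taken from any particular distributed computing client.

```python
import os


def lower_priority(increment: int = 10):
    """Ask the OS to schedule this process behind normal interactive work.

    Returns the new niceness value, or None where os.nice is unavailable
    (it is a Unix-only call).
    """
    if not hasattr(os, "nice"):
        return None
    # A higher niceness means a lower scheduling priority; the OS caps the
    # value (typically at 19 on Linux).
    return os.nice(increment)
```

A worker would typically call `lower_priority()` once at start-up, before beginning its computation.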

However, having a low-priority process constantly running prevents operating system power management routines from putting the processor into a low-power mode, resulting in increased electricity consumption. For some recent high-speed CPUs, the difference can be on the order of tens of watts for a single system.

Distributed computing is also an active area of computer science research with abundant literature. The technology is considered promising enough that numerous US Department of Energy labs are linked to share computational resources.

Goal

There are many different types of distributed computing systems and many challenges to overcome in successfully architecting one. The main goal of a distributed computing system is to connect users and resources in a transparent, open, and scalable way. Ideally this arrangement is drastically more fault tolerant and more powerful than many combinations of stand-alone computer systems.

Examples

An example of a distributed system is the World Wide Web. As you read a web page, you are using the distributed system that comprises the site. As you browse the web, the web browser running on your own computer communicates with different web servers that provide web pages. Your browser may also use a proxy server to access the content stored on web servers faster and more securely. To find these servers, it uses the distributed domain name system. Your web browser communicates with all of these servers over the Internet, via a system of routers which are themselves part of a large distributed routing system.

Openness

Openness is the property of a distributed system that measures the extent to which it offers a standardized interface that allows it to be extended and scaled. A system that easily allows more computing entities to be plugged into it and more features to be added to it has a clear advantage over a perfectly closed and self-contained system.

Scalability

Main article: Scalable

A scalable system is one that can easily be altered to accommodate changes in the number of users, resources and computing entities assigned to it. Scalability can be measured in three different dimensions:

Load scalability — A distributed system should make it easy for us to expand and contract its resource pool to accommodate heavier or lighter loads.
Geographic scalability — A geographically scalable system is one that maintains its usefulness and usability, regardless of how far apart its users or resources are.
Administrative scalability — No matter how many different organizations need to share a single distributed system, it should still be easy to use and manage.

Some loss of performance may occur in a system that allows itself to scale in one or more of these dimensions.
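Load scalability can be made concrete with a small sizing rule. The sketch below is only an illustration under assumed parameters: the function name `target_pool_size`, the per-worker capacity and the bounds are all hypothetical, and a real system would also account for the cost of adding and removing workers.

```python
import math


def target_pool_size(pending_tasks, tasks_per_worker, min_workers=1, max_workers=64):
    """Return how many workers the pool should hold for the current load."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    needed = math.ceil(pending_tasks / tasks_per_worker)
    # Clamp so the pool never empties completely or grows without bound.
    return max(min_workers, min(max_workers, needed))
```

With these defaults, a backlog of 95 tasks at 10 tasks per worker expands the pool to 10 workers, and an empty queue contracts it to the minimum of 1.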

Multiprocessor systems

A multiprocessor system is a computer that has more than one CPU, either on its motherboard or on a single die. If the operating system is built to take advantage of this, it can run different processes, or different threads belonging to the same process, on different CPUs.

Over the years, many different multiprocessing options have been explored for use in distributed computing. CPUs can be connected by bus or switch networks, and can use shared memory, their own private RAM, or a hybrid of the two.

Multiprocessor systems are now commercially available to end users, and mainstream operating systems such as Windows and Linux have built-in support for them. Additionally, some Intel CPUs employ a technology called Hyper-Threading that allows more than one thread to run on the same CPU at once.

Multicomputer systems

A multicomputer system is a system made up of several independent computers interconnected by a telecommunications network.

Multicomputer systems can be homogeneous or heterogeneous. A homogeneous distributed system is one where all CPUs are similar and are connected by a single type of network. Such systems are often used for parallel computing, a kind of distributed computing in which each computer works on a different part of a single problem.
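The idea of each computer working on a different part of one problem can be sketched as a scatter and gather step. In the example below, threads stand in for the separate computers, and the function names `split` and `parallel_sum_of_squares` are hypothetical; in a real multicomputer each chunk would be sent over the network rather than handed to a local thread.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data, workers=4):
    """Scatter chunks to workers, then gather and combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: sum(x * x for x in chunk), chunks))
    return sum(partials)
```

Calling `parallel_sum_of_squares(list(range(10)))` gives the same result as computing the sum on a single machine; only the division of labour differs.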

In contrast, a heterogeneous distributed system can be made up of all sorts of different computers, possibly with vastly differing memory sizes, processing power and even basic underlying architecture. Such systems are in widespread use today, with many companies adopting this architecture because of the speed with which hardware becomes obsolete and the cost of upgrading a whole system at once.
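One practical consequence of differing underlying architectures is that machines may not agree on how numbers are laid out in memory, so messages are usually exchanged in an agreed wire format. The sketch below uses Python's standard `struct` module with network byte order; the message layout and the function names are assumptions made for illustration.

```python
import struct

# Hypothetical message layout: a 32-bit task id followed by a 64-bit float.
# "!" selects network (big-endian) byte order with standard sizes, so the
# bytes are the same whatever the sender's native architecture is.
WIRE_FORMAT = "!Id"


def encode_result(task_id, value):
    """Pack a result into 12 bytes that any receiver can decode."""
    return struct.pack(WIRE_FORMAT, task_id, value)


def decode_result(payload):
    """Unpack a 12-byte payload back into (task_id, value)."""
    return struct.unpack(WIRE_FORMAT, payload)
```

Because both sides name the byte order explicitly, a little-endian and a big-endian machine decode the same payload to the same values.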

Architecture

Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of several loosely coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system.

Client-server — Smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
3-tier architecture — Three-tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are three-tier.
N-tier architecture — N-tier typically refers to web applications that forward their requests on to other enterprise services. This type of application is the one most responsible for the success of application servers.
Tightly coupled (clustered) — Typically refers to a set of highly integrated machines that run the same process in parallel, subdividing the task into parts that each machine computes individually and that are then put back together to form the final result.
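The client-server pattern from the list above can be sketched in a few lines. This is a minimal illustration, not a production design: `socket.socketpair()` stands in for a real network connection, the server runs in a thread of the same process, and the `STOCK` data and all function names are hypothetical.

```python
import socket
import threading

# Hypothetical data held by the server.
STOCK = {"widget": 12, "gadget": 0}


def serve_one_request(conn):
    """Server side: read one item name, reply with the raw stock count."""
    with conn:
        item = conn.recv(1024).decode("utf-8")
        count = STOCK.get(item, 0)
        conn.sendall(str(count).encode("utf-8"))


def client_lookup(conn, item):
    """Client side: ask the server for data, then format it for the user."""
    with conn:
        conn.sendall(item.encode("utf-8"))
        count = int(conn.recv(1024).decode("utf-8"))
    return f"{item}: {count} in stock" if count else f"{item}: out of stock"


def demo(item):
    # socketpair() gives two connected sockets in one process, standing in
    # for a network connection between two machines.
    server_end, client_end = socket.socketpair()
    server = threading.Thread(target=serve_one_request, args=(server_end,))
    server.start()
    try:
        return client_lookup(client_end, item)
    finally:
        server.join()
```

The server returns only raw data, while the formatting logic lives in the client. In a three-tier design that formatting step would move to a middle tier, leaving the client stateless.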

Parallelism

Main article: Parallel computing

Computing taxonomies

The types of distributed computers are based on Flynn's taxonomy of systems: single instruction, single data (SISD), multiple instruction, single data (MISD), single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD). Other taxonomies and architectures are available at Computer architecture and in Category:Computer architecture.
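The four categories follow mechanically from two counts: how many instruction streams and how many data streams are active at once. The helper below, with the hypothetical name `flynn_class`, is a small sketch of that mapping.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Name the Flynn category for the given numbers of concurrent streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"
```

For example, a group of independent computers, each running its own program on its own data, has multiple instruction streams and multiple data streams, so `flynn_class(4, 4)` returns `"MIMD"`.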

Computer clusters

Main article: Cluster computing

A cluster consists of multiple stand-alone machines acting in parallel across a local high-speed network. Distributed computing differs from cluster computing in that computers in a distributed computing environment are typically not exclusively running "group" tasks, whereas clustered computers are usually much more tightly coupled. The difference makes distributed computing attractive because, when properly configured, it can use computational resources that would otherwise go unused. It can also make available computing resources that would otherwise be impossible to obtain.

The Second Life grid is a heterogeneous multicomputer and so are most Beowulf clusters.

Grid computing

Main article: Grid computing

A grid consists of multiple computers sharing information over the Internet. Most grids use idle time on many thousands of computers throughout the world. Such arrangements permit handling of data that would otherwise require the power of expensive supercomputers or would be impossible to analyze at all.

Distributed computing projects also often involve competition with other distributed systems. This competition may be for prestige, or it may be a means of enticing users to donate processing power to a specific project. For example, stat races measure which project has computed the most distributed work over the past day or week. This has proved so important in practice that virtually all distributed computing projects offer online statistical analyses of their performance, updated at least daily if not in real time.

See List of distributed computing projects for more information on specific projects.