Google Cluster Architecture

L. A. Barroso, J. Dean, and U. Hölzle, "Web Search for a Planet: The Google Cluster Architecture," IEEE Micro, Mar.-Apr. 2003, pp. 22-28.

The central idea of this IEEE Micro article is that Google has designed a search architecture that is energy efficient, reliable, and cost effective enough to let the company provide superior service. The article begins by pointing out that every request to a search engine requires complex computation. A single Google query reads hundreds of megabytes of data and consumes tens of billions of CPU cycles. With thousands of such requests arriving every second, Google's infrastructure is comparable in size to a supercomputer installation. Energy efficiency and price-performance ratio are the most important factors in its design. The design also relies on the fact that the workload is easy to parallelize: different queries can run on different processors, and the overall index is partitioned so that a single query can use multiple processors.

Google's architecture provides reliability in software rather than in server-class hardware, so its computing clusters can be built from many commodity PCs. The design is tailored for "best aggregate request throughput" rather than peak server response time; response times are managed by parallelizing individual requests. At the software level, reliability is achieved by replicating services across many different machines and automatically detecting and handling failures. Thus, a reliable computing infrastructure is fashioned from clusters of unreliable commodity PCs.
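The idea of replicating a service and routing around failed machines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ReplicaPool` class, the machine names, and the use of `ConnectionError` to signal a dead machine are all hypothetical and do not come from the article.

```python
from typing import Callable, Dict


class ReplicaPool:
    """Hypothetical pool of identical service replicas on unreliable PCs."""

    def __init__(self, replicas: Dict[str, Callable[[str], str]]) -> None:
        self.replicas = replicas
        self.healthy = set(replicas)  # names of replicas believed to be up

    def call(self, request: str) -> str:
        """Try healthy replicas in name order until one answers."""
        for name in sorted(self.healthy):
            try:
                return self.replicas[name](request)
            except ConnectionError:
                # Failure detected: stop sending traffic to this replica.
                self.healthy.discard(name)
        raise RuntimeError("no healthy replica left")


def broken_machine(request: str) -> str:
    raise ConnectionError("machine down")


def working_machine(request: str) -> str:
    return "ok:" + request


pool = ReplicaPool({"pc-a": broken_machine, "pc-b": working_machine})
print(pool.call("query"))
```

The caller never sees the failure of `pc-a`; the request is simply answered by the next replica, which is the sense in which unreliable parts can add up to a reliable service.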

When a user queries Google, the user's browser first performs a Domain Name System (DNS) lookup to map the Google domain name to an IP address. Multiple clusters are distributed worldwide, each with enough capacity to handle query traffic. A DNS-based load-balancing system selects a cluster, taking into account the user's geographic proximity and each cluster's available capacity, which minimizes the round-trip time for the user's query. The user's browser then sends a Hypertext Transfer Protocol (HTTP) request to that cluster, which processes the query locally. Each cluster has its own load balancer, which distributes requests across the available Google Web Servers (GWSs). The GWS that receives a request coordinates the query execution and formats the results as Hypertext Markup Language (HTML).
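The two routing decisions described above (pick a cluster, then pick a web server inside it) can be sketched as follows. This is a minimal sketch, not Google's method: the `Cluster` fields, the distance and capacity numbers, and the round-robin policy inside a cluster are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Cluster:
    name: str
    distance_km: float      # hypothetical proximity measure to the user
    free_capacity: int      # hypothetical number of queries it can still take
    web_servers: List[str]  # hypothetical GWS machine names


def pick_cluster(clusters: List[Cluster]) -> Cluster:
    """Choose the closest cluster that still has spare capacity."""
    candidates = [c for c in clusters if c.free_capacity > 0]
    if not candidates:
        raise RuntimeError("no cluster has spare capacity")
    return min(candidates, key=lambda c: c.distance_km)


def round_robin(servers: List[str]) -> Iterator[str]:
    """Assumed in-cluster policy: hand out web servers in rotation."""
    return itertools.cycle(servers)


clusters = [
    Cluster("near-but-full", 10.0, 0, ["gws-1"]),
    Cluster("regional", 500.0, 20, ["gws-2", "gws-3"]),
    Cluster("overseas", 9000.0, 50, ["gws-4"]),
]
chosen = pick_cluster(clusters)
servers = round_robin(chosen.web_servers)
print(chosen.name, next(servers), next(servers), next(servers))
```

Here the nearest cluster is skipped because it has no spare capacity, so the request goes to the next closest one and is then rotated across that cluster's web servers.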

There are two major phases to query execution. In the first phase, index servers consult an inverted index, comprising many terabytes of data, that maps each query word to a matching list of documents, called the hit list. The index servers then intersect the hit lists of the individual query words to determine the set of relevant documents, and they compute a relevance score for each document. The relevance score determines the order of results on the output page. The huge amount of data makes the search challenging, so the index is divided into shards that are searched in parallel. The final result of the first phase is an ordered list of document identifiers.

In the second phase of query execution, this list of identifiers is used to produce a query-specific summary for each document. Document servers fetch each document and extract its title and a summary built around the query keywords. As in the first phase, the documents are randomly distributed into smaller shards, multiple server replicas handle each shard, and requests are routed through a load balancer. Google stores dozens of copies of the Web across its clusters to ensure adequate replication. When both phases are complete, a GWS generates the HTML for the output page and returns it to the user's browser.
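The first phase can be illustrated with a toy inverted index. The sketch below maps each word to the set of documents containing it, intersects those sets for a multi-word query, and orders the matches by score. The scoring rule (count of query-word occurrences) is an assumption chosen for simplicity; the article does not describe Google's ranking function, and the sample documents are invented.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each word to the set of document identifiers that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for docid, text in docs.items():
        for word in text.lower().split():
            index[word].add(docid)
    return index


def search(index: Dict[str, Set[int]], docs: Dict[int, str], query: str) -> List[int]:
    """Return identifiers of documents matching every query word, best first."""
    words = query.lower().split()
    if not words:
        return []
    # One hit list per query word; an unknown word gives an empty hit list.
    hit_lists = [index.get(word, set()) for word in words]
    matches = set.intersection(*hit_lists)

    def score(docid: int) -> int:
        # Assumed toy relevance score: total occurrences of the query words.
        tokens = docs[docid].lower().split()
        return sum(tokens.count(word) for word in words)

    # Highest score first; ties are broken by the smaller identifier.
    return sorted(matches, key=lambda docid: (-score(docid), docid))


docs = {
    1: "google cluster architecture",
    2: "cluster of commodity pcs in a cluster",
    3: "web search",
}
index = build_inverted_index(docs)
print(search(index, docs, "cluster"))
print(search(index, docs, "google cluster"))
```

The output of `search` is an ordered list of document identifiers, which is the kind of result the second phase would take as its input.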

Parallelizing the search over many machines reduces the time needed to answer a query. Most accesses to the index are read-only. Updates are infrequent, and when they occur, queries are diverted away from the replica being updated. The main point is that the inherent parallelism of the workload is aggressively exploited. A lookup of matching documents in one large index is transformed into many lookups in smaller index shards, whose results are then merged. The query stream is divided too, across the replicas of each shard. Capacity is increased by adding machines to each pool. The total computation is divided across...
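The shard-and-merge pattern can be sketched with Python's standard library. In this illustration each shard is a small dictionary of documents, a thread pool stands in for the separate machines, and `heapq.merge` combines the per-shard results, which are already sorted. The shard contents, the word-count score, and the function names are assumptions made for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Shard = Dict[int, str]      # docid -> document text
Hit = Tuple[int, int]       # (score, docid)


def rank_key(hit: Hit) -> Tuple[int, int]:
    """Order hits by descending score, then ascending docid."""
    return (-hit[0], hit[1])


def search_shard(shard: Shard, word: str) -> List[Hit]:
    """Search one shard and return its hits, best first."""
    hits = []
    for docid, text in shard.items():
        count = text.lower().split().count(word.lower())
        if count:
            hits.append((count, docid))
    return sorted(hits, key=rank_key)


def search_all(shards: List[Shard], word: str, top_k: int = 3) -> List[Hit]:
    """Search every shard in parallel, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, word), shards))
    merged = heapq.merge(*partials, key=rank_key)
    return list(merged)[:top_k]


shards = [
    {1: "cluster cluster cluster", 2: "web"},
    {3: "cluster"},
    {4: "cluster cluster"},
]
print(search_all(shards, "cluster"))
```

Because each shard returns an already sorted list, the merge step is cheap compared with the searches themselves, and adding shards (or machines per shard) spreads the work further without changing the merge logic.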
