Borg (Google's Cluster Management System)

A Borg (Google's Cluster Management System) is a cluster management system developed by Google to efficiently manage resources across its massive server farms.

Context:
- It can be considered a predecessor to Kubernetes, with its design principles and operational insights significantly influencing Kubernetes' and also Docker's development.
- It can (typically) manage clusters of tens of thousands of machines, running hundreds of thousands of jobs from many different applications.
- It can (often) optimize resource utilization through features like admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation.
- It can range from supporting high-availability applications with runtime features that minimize fault-recovery time to scheduling policies aimed at reducing the risk of correlated failures.
- It can simplify life for its users by providing a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.
- It operates with very high reliability and supports applications that require similar levels of availability.
- ...
Example(s):
- A large Google data center that utilizes Borg to balance and manage computational loads efficiently (Google Research).
- A scenario where Borg handles multiple applications across different clusters, ensuring optimized resource allocation without direct user involvement in the low-level details of resource management (High Scalability).
- Borg Cluster Scheduling System,
- Borg Container Orchestration System.
- ...
Counter-Example(s):
- Apache Mesos,
- Docker Swarm,
- Kubernetes.
See: Borg Cell, Borglet, borgcfg, Distributed Computing System, OS-level Virtualization Platform, Container Orchestration System, Cloud Computing System, Cloud Orchestration System, Scheduling Algorithm, Virtual Machine Hypervisor.

References

2024a

(Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Borg_(cluster_manager) Retrieved:2024-5-12.
- Borg is a cluster manager used by Google. It led to widespread use of similar approaches, such as Docker and Kubernetes.

2024b

(ChatGPT-OpenAI, 2024) ⇒ Prompt: Summarize the concept Google's Borg in 1- 2 paragraphs. ChatGPT (model:gpt-4-turbo-preview). Retrieved: 2024-05-12.
- QUOTE: Google's Borg is an internal cluster management system that the tech giant uses to orchestrate and manage its vast array of services across its data centers. Borg enables Google to efficiently allocate resources, manage workloads, handle failures, and schedule tasks across thousands of machines, ensuring that Google's services are highly available and scalable. It acts as the brain behind Google's operations, automating the deployment and operation of applications, managing containers, and optimizing the use of hardware resources. Introduced over a decade ago, Borg has been a pioneering force in the evolution of distributed systems and has influenced the design of several cloud orchestration systems that followed, including Kubernetes, an open-source container-orchestration system that was inspired by Borg's principles and designed by Google to automate application deployment, scaling, and management across a cluster of machines.
  Borg's architecture is designed to hide the complexities of hardware management from developers, allowing them to focus on writing code without worrying about the underlying infrastructure. It provides high-level abstractions for jobs and services, manages task dependencies, and offers features for monitoring and maintaining the health of applications. Borg also implements sophisticated scheduling algorithms to optimize resource utilization and ensure fairness among users, along with mechanisms for fault tolerance and recovery. Through the years, Borg has not only supported Google's exponential growth but also contributed to the development of large-scale computing methodologies and technologies, significantly influencing the industry's approach to distributed computing and cloud services.

2018

(Souptik, 2018) ⇒ Souptik Sen. (2018). "Large-scale cluster management at Google with Borg: Review.". In: Google Blog post.
- QUOTE: Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines. Borg provides three main benefits: it (1) hides the details of resource management and failure handling so its users can focus on application development instead; (2) operates with very high reliability and availability, and supports applications that do the same; and (3) lets us run workloads across tens of thousands of machines effectively.

2015

(Verma et al., 2015) ⇒ Abhishek Verma, Luis Pedrosa, Madhukar R. Korupolu, David Oppenheimer, Eric Tune, and John Wilkes. (2015). "Large-scale cluster management at Google with Borg.". In: Proceedings of the European Conference on Computer Systems (EuroSys), ACM, Bordeaux, France.
- QUOTE: Borg's users are Google developers and system administrators (site reliability engineers or SREs) that run Google's applications and services. Users submit their work to Borg in the form of jobs, each of which consists of one or more tasks that all run the same program (binary). Each job runs in one Borg cell, a set of machines that are managed as a unit. The remainder of this section describes the main features exposed in the user view of Borg.
  
  Figure 1: The high-level architecture of Borg. Only a tiny fraction of the thousands of worker nodes are shown.

Borg (Google's Cluster Management System)

References

2024a

2024b

2018

2015

Navigation menu

Search