The Argo design supports hierarchical views on the entire exascale system. The global view enables Argo to combine live performance data, active control interfaces, and machine-learning techniques to dynamically manage power across the entire system, respond to faults, or tune application performance. Only with a whole-system perspective can power budget goals be reached and cascading failures halted to avoid a system crash. At the other end of the spectrum is the local view. For scalability, compute nodes must have a measure of autonomy to manage and optimize massive intranode parallelism, schedule low-latency messages on embedded network adapters, and adapt to new memory technologies. Bringing together these multiple perspectives, and the corresponding software components operating within our hierarchical view, is our strategy for addressing the four key exascale challenges: power, parallelism, memory hierarchy, and resilience.
The Argo Architecture
The Argo architecture is divided into four parts: (1) the Node Operating System (NodeOS), (2) the Argo runtime, (3) the GlobalOS (Argo Crew), and (4) the Global Information Bus (GIB).
The Node Operating System (NodeOS)
The NodeOS is the operating system running on each node of an Argo machine. It is based on the Linux kernel, tuned and extended for HPC use on future architectures. In particular, we leverage the control groups interface and extend it to provide lightweight A-containers. The compute containers within the A-containers provide exclusive access to hardware resources. To limit OS noise on the node, system services are restricted to a small dedicated share of cores and memory nodes. Additionally, the NodeOS provides custom memory and scheduling policies, as well as specialized interfaces for parallel runtimes.
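To make the core-partitioning idea concrete, the following is a minimal sketch of how a node's cores might be split between system services and compute containers. It is not the NodeOS implementation: `NodePartition`, `partition_cores`, and `cpuset_string` are hypothetical names, and the choice to reserve the lowest-numbered cores for system services is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePartition:
    """Hypothetical split of a node's cores (not an Argo data structure)."""
    system_cores: frozenset
    compute_cores: frozenset


def partition_cores(total_cores: int, system_share: int) -> NodePartition:
    """Reserve the first `system_share` cores for system services.

    Assumption: the remaining cores are handed to compute containers,
    which then have exclusive use of them.
    """
    if not 0 < system_share < total_cores:
        raise ValueError("system_share must leave at least one compute core")
    cores = range(total_cores)
    return NodePartition(
        system_cores=frozenset(cores[:system_share]),
        compute_cores=frozenset(cores[system_share:]),
    )


def cpuset_string(cores) -> str:
    """Format a core set as a comma-separated list of core ids."""
    return ",".join(str(c) for c in sorted(cores))


partition = partition_cores(total_cores=16, system_share=2)
print("system services:", cpuset_string(partition.system_cores))
print("compute container:", cpuset_string(partition.compute_cores))
```

Keeping the two sets disjoint is what confines OS noise to the small system share; a comma-separated core list of this kind is one of the formats that Linux cpusets accept.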
The Argo Runtime
Argobots is the runtime component of Argo. Argobots implements a low-level threading and tasking framework entirely in user-space, giving users total control over their resource utilization, and provides data movement infrastructure and tasking libraries for massively concurrent systems. The Argobots runtime provides the Argobots interface for higher-level applications and libraries.
The Argo Runtime team has extended existing MPI implementations such as MPICH to be Argo-aware, in order to support MPI-based applications.
The GlobalOS (Argo Crew)
The Argo Crew is a part of the GlobalOS. It is a collection of lightweight services interacting with the Global Information Bus, the NodeOS, and vendor-provided hardware interfaces to create the command and control infrastructure of Argo.
The infrastructure is hierarchical: it is directly responsible for managing enclaves and distributes its control along the enclave hierarchy, with each enclave having at least one master node dedicated to it. The Argo Crew currently comprises three key components:
- Jason – which orchestrates compute node provisioning by providing the glue between a job scheduler allocating nodes to a user job and the GlobalOS configuring those nodes as an enclave. It is built on top of OpenStack and Ansible.
- Echion – which functions as a distributed tracker of node and enclave state, built on top of the Argo key-value store.
- Argus – which handles describing and launching HPC applications across enclaves and on top of the NodeOS containers.
At its core, this infrastructure aims to distribute control across the enclave tree so that configurable actions and policies can be controlled by masters down the hierarchy while privileged actions trickle down from the root nodes. At the same time, failures will be handled as exceptions going up this enclave tree, letting the master closest to a failure try to
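The upward propagation of failures can be sketched as a walk from an enclave toward the root of the tree. This is only an illustration under stated assumptions: the `Enclave` class, `report_failure`, the failure kinds, and the enclave names are all hypothetical, and the sketch assumes that the master closest to a failure gets the first chance to handle it before the failure is escalated to its parent.

```python
class Enclave:
    """Hypothetical enclave node; names are illustrative, not the Argo Crew API."""

    def __init__(self, name, parent=None, can_handle=()):
        self.name = name
        self.parent = parent
        # Failure kinds this enclave's master is assumed able to resolve.
        self.can_handle = set(can_handle)

    def report_failure(self, kind):
        """Walk up the tree until some master can handle the failure.

        Returns the name of the enclave whose master handled it.
        """
        enclave = self
        while enclave is not None:
            if kind in enclave.can_handle:
                return enclave.name
            enclave = enclave.parent  # escalate to the parent enclave
        raise RuntimeError(f"unhandled failure: {kind}")


root = Enclave("root", can_handle={"node_loss", "power_cap"})
job = Enclave("job-42", parent=root, can_handle={"task_crash"})
sub = Enclave("job-42/sub-0", parent=job)

print(sub.report_failure("task_crash"))  # handled by the job-level master
print(sub.report_failure("node_loss"))   # escalates up to the root
```

In this toy model a failure that no master can handle raises an error at the root; a real system would need a defined policy for that case.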
The Global Information Bus (GIB)
The Global Information Bus is a scalable infrastructure with a focus on providing common interfaces for communication and management of system data across all layers of the Argo stack. The GIB consists of:
- Backplane for Event and Control Notification (BEACON) (developed within the Argo project)
- Exascale Performance and Observability Backplane (EXPOSÉ) (developed within the Argo project)
- Point-to-point communication framework (third-party)
- Key-value store (being developed by the HOBBES Xstack project)
The BEACON backplane
The Backplane for Event and Control Notification (BEACON) provides an infrastructure and interfaces for gathering event data, based on which components can take appropriate action. BEACON is a lightweight framework that provides interfaces for sharing event information, as well as other supplementary services, in the Argo system.
The main idea of BEACON, based on publish-subscribe frameworks, encompasses backplane end-points (also called BEEPs) – across node, enclave, and system levels – that are responsible for detecting and generating information (including, but not limited to, faults), which will then be propagated via BEACON throughout the system. Other BEEPs, which subscribe to this information, can generate appropriate response actions, if needed. However, actual response actions are initiated and performed by the various entities themselves, and not by the BEACON framework. We expect this approach to provide a comprehensive method for detecting, disseminating, and handling information on a system-wide basis.
The BEACON API provides interfaces for BEEPs to connect to and disconnect from BEACON, publish an event, set boundaries that control the flow of information across enclaves through scoping, and subscribe to events using synchronous and asynchronous methods.
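The operations listed above (connect, disconnect, publish, scoping, and the two subscription styles) can be illustrated with a toy in-process publish-subscribe bus. This is a sketch and not the BEACON API: every class, method, and parameter name below is hypothetical. It also assumes that "asynchronous" means callback delivery, that "synchronous" means the subscriber polls a queue, and that scoping is a choice between enclave-local and system-wide delivery.

```python
from collections import defaultdict, deque


class Backplane:
    """Toy in-process publish-subscribe bus. All names are hypothetical."""

    def __init__(self):
        self._callbacks = defaultdict(list)  # topic -> [(beep, callback)]
        self._queues = defaultdict(list)     # topic -> [(beep, queue)]

    def connect(self, name, enclave):
        return Beep(self, name, enclave)

    def publish(self, sender, topic, payload, scope="system"):
        """Deliver to subscribers; scope='enclave' keeps the event local."""
        def visible(beep):
            return scope == "system" or beep.enclave == sender.enclave

        for beep, callback in self._callbacks[topic]:
            if visible(beep):
                callback(topic, payload)
        for beep, queue in self._queues[topic]:
            if visible(beep):
                queue.append((topic, payload))


class Beep:
    """Backplane end-point attached to one enclave."""

    def __init__(self, bus, name, enclave):
        self.bus, self.name, self.enclave = bus, name, enclave
        self.inbox = deque()

    def publish(self, topic, payload, scope="system"):
        self.bus.publish(self, topic, payload, scope)

    def subscribe_async(self, topic, callback):
        """Callback style: invoked when a matching event is published."""
        self.bus._callbacks[topic].append((self, callback))

    def subscribe_sync(self, topic):
        """Polling style: events queue up until poll() is called."""
        self.bus._queues[topic].append((self, self.inbox))

    def poll(self):
        return self.inbox.popleft() if self.inbox else None

    def disconnect(self):
        """Drop every subscription held by this end-point."""
        for table in (self.bus._callbacks, self.bus._queues):
            for topic in table:
                table[topic] = [e for e in table[topic] if e[0] is not self]


bus = Backplane()
sensor = bus.connect("sensor", enclave="A")
local = bus.connect("local-monitor", enclave="A")
remote = bus.connect("remote-monitor", enclave="B")

seen = []
local.subscribe_async("fault", lambda topic, payload: seen.append(("local", payload)))
remote.subscribe_sync("fault")

sensor.publish("fault", "ecc-error", scope="enclave")  # stays inside enclave A
sensor.publish("fault", "node-down")                   # delivered system-wide
first = remote.poll()
print(seen, first)
```

Note that the bus only delivers events; consistent with the description above, any response to a fault would be carried out by the subscribing component itself, here represented by the callback and by whatever code consumes `poll()`.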