Disruptive new technologies—such as stacked memory, embedded network controllers, and massively parallel low-power cores—are being integrated into future CPUs. Furthermore, changing workflows and programming environments are making new demands on the low-level system software. Power and resiliency are beginning to constrain system size. As noted by various DOE workshops and reports as well as the Exascale Computing Project (ECP), today’s operating system and runtime (OS/R) software cannot be incrementally extended and grown into an exascale solution for these issues. A new approach is required.
For more than a decade, Argonne and LLNL have led the development of improved OS/R components for production high-performance computing (HPC) systems. The goal of the Argo project is to improve or augment existing OS/R components for use in production HPC systems, providing portable, open source software that improves performance and scalability and provides increased functionality to exascale applications.
We focus on four areas of the OS/R stack where the need is most urgent:
- Support for hierarchical memory,
- Dynamic management of power and CPU clock speed to meet performance targets,
- Containers for managing resources within a node, and
- Internode interfaces for collectively managing resources across groups of nodes.