Deploy an Auto-Scaling HPC Cluster with Slurm

45 minutes, 7 credits

GSP690

Google Cloud Self-Paced Labs

Overview

Introduction

Welcome to the Google Qwiklab for running a Slurm cluster on Google Cloud! By the end of this lab you should have a solid understanding of how to provision and operate an auto-scaling Slurm cluster.

Google Cloud teamed up with SchedMD to release a set of tools that make it easier to launch the Slurm workload manager on Compute Engine, and to expand your existing cluster dynamically when you need extra resources. This integration was built by the experts at SchedMD in accordance with Slurm best practices.

If you're planning on using the Slurm on Google Cloud integrations, or if you have any questions, please consider joining our Google Cloud & Slurm Community Discussion Group!

About Slurm

Basic architectural diagram of a stand-alone Slurm Cluster in Google Cloud.

Slurm is one of the leading workload managers for HPC clusters around the world. It provides an open-source, fault-tolerant, and highly scalable workload management and job scheduling system for small and large Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions:

  1. It allocates exclusive or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.

  2. It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes.

  3. It arbitrates contention for resources by managing a queue of pending work.
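The three functions above can be pictured with a toy model. The sketch below is not Slurm's actual scheduling algorithm; the `Job` and `ToyScheduler` names are hypothetical, and it assumes a simple first-in, first-out queue in which a job starts as soon as enough nodes are free.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    nodes_needed: int


@dataclass
class ToyScheduler:
    """Toy model of a workload manager; not Slurm's actual algorithm."""
    free_nodes: int
    running: dict = field(default_factory=dict)    # job name -> nodes held
    pending: deque = field(default_factory=deque)  # FIFO queue of waiting jobs

    def submit(self, job: Job) -> None:
        # Function 3: work that cannot start yet waits in a queue.
        self.pending.append(job)
        self._schedule()

    def finish(self, name: str) -> None:
        # Release the nodes held by a completed job, then retry the queue.
        self.free_nodes += self.running.pop(name)
        self._schedule()

    def _schedule(self) -> None:
        # Functions 1 and 2: allocate nodes to the job at the head of the
        # queue and mark it as running, while enough nodes are free.
        while self.pending and self.pending[0].nodes_needed <= self.free_nodes:
            job = self.pending.popleft()
            self.free_nodes -= job.nodes_needed
            self.running[job.name] = job.nodes_needed


sched = ToyScheduler(free_nodes=4)
sched.submit(Job("a", 3))   # starts immediately, leaving 1 free node
sched.submit(Job("b", 2))   # waits: only 1 node is free
sched.finish("a")           # frees 3 nodes, so "b" can start
```

A real workload manager also handles priorities, time limits, and node failures; the sketch only illustrates allocation, execution tracking, and queueing.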

Objectives

In this lab, you will learn how to:

  • Use Google Cloud's Deployment Manager service.
  • Run a job using Slurm.
  • Query cluster information and monitor running jobs in Slurm.
  • Autoscale nodes to accommodate specific job parameters and requirements.
  • Find help with Slurm.
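To give a feel for what running a job involves, here is a minimal sketch that renders the text of a Slurm batch script. The helper name `render_batch_script`, the job name, and the node count are illustrative assumptions, not values taken from this lab; the `#SBATCH` directives used (`--job-name`, `--nodes`, `--output`) are standard `sbatch` options.

```python
def render_batch_script(job_name: str, nodes: int, command: str) -> str:
    """Return the text of a minimal Slurm batch script (illustrative only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --output={job_name}-%j.out",  # %j expands to the job ID
        "",
        f"srun {command}",  # run the command on the allocated nodes
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example values: a two-node job that prints each hostname.
script = render_batch_script("hostname_test", nodes=2, command="hostname")
print(script)
```

A script like this would typically be saved to a file and submitted with `sbatch`, after which `squeue` shows the job queue and `sinfo` shows the state of the cluster's nodes.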

Score (out of 100)

  • Create a Deployment Manager deployment: 50 points
  • Run a Slurm job: 25 points
  • Scale a Slurm cluster: 25 points