The College of Engineering operates a high-performance computing (HPC) cluster supported by the engineering community. Our goal is to provide a resource that is available to all engineering faculty and students. The college supports the cluster by providing rack space, disk space, system administration and configuration, and standard software. It also provides CPU and GPU systems for general research and class use. Faculty who purchase systems to add to the cluster get the benefit of reserved resources along with access to the rest of the cluster, and COE IT maintains these resources so faculty are freed from system administration duties.
Beginning Fall 2023, non-COE faculty and students can request COE HPC accounts! Students not enrolled in engineering courses must be sponsored by a faculty member in their college or department.
All new HPC users must attend an Intro to HPC training session with the COE HPC Manager, Robert Yelle. Non-COE users will receive a sign-up link in the confirmation email sent after submitting the HPC access request form.
As a new HPC user, you may wish to subscribe to the Cluster mailing list to receive news and status updates regarding the cluster. You may also check for news or the status of the cluster here.
There are two ways to get on the campus network:
Connect to the OSU VPN. This is the recommended method. Once you are on the OSU VPN, you can connect directly to the CoE HPC cluster using your SSH client, or to the HPC portal using your web browser.
Connect to one of the CoE gateway servers. With this method, you first connect to a CoE gateway host (access.engr.oregonstate.edu) via SSH. If you are using a Mac or a Linux computer, you can launch a terminal window and use the ssh command, e.g.:
ssh myONID@access.engr.oregonstate.edu
where myONID is your ONID (OSU Network ID). If you are using Windows, you need to run an SSH client such as MobaXterm or PuTTY, then open an SSH session to access.engr.oregonstate.edu.
If you are on campus, or are connected to the OSU VPN or to a COE gateway host as described in Step 3, then you may connect directly to one of the cluster login or submit nodes via SSH, or to the HPC portal, using your ONID credentials. If you are using a Mac or a Linux host such as one of the flip servers, then from a terminal window or shell prompt you can SSH directly to one of the three submit hosts (submit-a, submit-b, submit-c) as follows:
ssh myONID@submit.hpc.engr.oregonstate.edu
If you are connecting from a Windows computer, you need to run an SSH client such as MobaXterm or PuTTY and open an SSH session to one of the submit nodes (e.g. submit.hpc.engr.oregonstate.edu). If you regularly connect through a gateway server, see the SSH configuration sketch below.
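If you use the gateway method rather than the VPN, an OpenSSH client configuration entry can chain the gateway hop and the submit host for you. The following is a minimal sketch (Mac, Linux, or Windows OpenSSH); the host aliases coe-gateway and hpc-submit are arbitrary names chosen for this example, and myONID should be replaced with your own ONID:

# ~/.ssh/config -- example entries only; aliases are illustrative
Host coe-gateway
    HostName access.engr.oregonstate.edu
    User myONID

Host hpc-submit
    HostName submit.hpc.engr.oregonstate.edu
    User myONID
    ProxyJump coe-gateway

With this in place, a single command such as ssh hpc-submit connects through the gateway to a submit host.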
To access the HPC Portal, launch a web browser and enter this URL:
https://submit.hpc.engr.oregonstate.edu
Note that the submit nodes are not for running long or large jobs or calculations; they serve as a gateway to the rest of the cluster. From a submit node, you may request compute resources (e.g. CPUs, RAM, and GPUs) from available compute nodes, either via an interactive session or by submitting one or more batch jobs. See the Slurm section below (Step 5) for how to reserve resources and run jobs on the cluster.
Once you are connected to a submit host, you can reserve and use cluster resources. Be advised that direct SSH access to a cluster compute node is not required, and is not permitted unless you have been granted access to that node through Slurm. Slurm Workload Manager is the batch-queue system used to gain access to and run jobs on the COE HPC cluster, including the Nvidia DGX systems and other compute nodes. To use Slurm, you must be assigned to a Slurm account corresponding to your department, class, or research group, which is done by enabling your HPC account in TEACH (see Step 2).
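Once your HPC account is enabled, you can check from a submit host which Slurm accounts and partitions you have access to. These are standard Slurm commands; the account and partition names you see will depend on your department, class, or research group:

# list the Slurm associations for your user
sacctmgr show associations user=$USER format=Cluster,Account,User,Partition

# summarize the available partitions
sinfo -s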
For quick, interactive shell access to a compute node (e.g. using bash), do:
srun --pty bash
Note: if you get the error "srun: command not found", see FAQ #B.2 for how to resolve it.
If you want interactive access to a GPU and prefer the tcsh shell over bash, do:
srun --gres=gpu:1 --pty tcsh
To confirm that you have access to one or more GPUs:
nvidia-smi
Many more options are available, including reserving multiple cores, multiple compute nodes, more memory, and additional time, as well as requesting other pools of resources. In addition, jobs may be submitted in batch mode instead of interactive mode. Check out the Slurm HOWTO for more examples and information on using Slurm on the COE cluster.
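As an illustration of batch mode, the sketch below shows a minimal Slurm batch script. The job name, resource amounts, and output file are placeholders to adjust for your own work; the GPU line is optional and the defaults for your Slurm account may differ:

#!/bin/bash
#SBATCH --job-name=myjob          # example job name
#SBATCH --time=1:00:00            # wall-clock limit (1 hour)
#SBATCH --cpus-per-task=4         # CPU cores
#SBATCH --mem=8G                  # memory
#SBATCH --gres=gpu:1              # optional: request one GPU
#SBATCH --output=myjob_%j.out     # output file (%j expands to the job ID)

# commands to run on the compute node
hostname
nvidia-smi

Save this as, for example, myjob.sh, submit it with sbatch myjob.sh, and check its status with squeue -u $USER.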
If you have completed the above steps, you should now have a functioning COE HPC environment. However, if you would like to get the most out of the cluster, here are some helpful hints to optimize your experience.