ASC Unity Cluster
Quick Introduction
go.osu.edu/unitycompute
Keith Stewart ASCTECH
What is the ASC Unity Cluster?
The Unity cluster is a local high-performance computing
(HPC) environment maintained by Arts and Sciences
Technology Services (ASCTech). Unity is designed to
accommodate researchers with their intense computational
and storage needs.
Why would I need to use it?
Calculation takes too long on your laptop/desktop
Calculation has too many runs
Data is too large (disk/memory) for your computer
Keeps your computer free to do daily tasks
Licensed software or specific version needed
Who can use it?
Any Arts and Sciences affiliated customer
Undergrad/Grad/Post Doc
Faculty
Staff
Sponsored Guest Accounts
* Must be in the Unity-Users group. Request access through a support
request: http://go.osu.edu/unitysupportticket
What runs in the cluster?
Any executable that runs on RHEL7/8. This includes, but
is not limited to:
CUDA, OpenACC, OpenCL
OpenMP, MPI
Matlab, Mathematica
R, Python, C, C++, Fortran, Perl, Lua, Julia, etc...
Spark/Hadoop
Machine Learning (TensorFlow, Caffe)
Containers!
Any compiled software that runs for a finite time
What does not run in the cluster?
Service-based applications that should run on a VM or
separate hardware
Apache (web service)
MySQL/Postgres (databases)
Any software that is a service that must be running
permanently.
Many of these should be a VM unless they are computationally
intensive.
Are there limits?
Jobs are limited to a walltime of 2 weeks (extension?)
Maximum of 1000 jobs submitted at a time
There is a limit of 30 running jobs (# of nodes).
Login/Head node has a 20 min compute limit
Home directories have a 100GB limit (project space
available upon request)
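A quick way to avoid a rejected submission is to check a requested walltime against the 2 week limit before calling sbatch. The sketch below is illustrative only: the function names are made up, and it handles just two of Slurm's time formats (HH:MM:SS and D-HH:MM:SS).

```shell
#!/bin/bash
# Convert a Slurm-style walltime (HH:MM:SS or D-HH:MM:SS) to seconds.
# Hypothetical helper; other Slurm time formats are not handled.
walltime_to_seconds() {
    local spec="$1" days=0 h m s
    if [[ "$spec" == *-* ]]; then
        days="${spec%%-*}"   # text before the first "-"
        spec="${spec#*-}"    # text after the first "-"
    fi
    IFS=: read -r h m s <<< "$spec"
    # 10# forces base 10 so values such as "08" are not parsed as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

MAX_WALLTIME_SECONDS=$(( 14 * 86400 ))   # 2 weeks, per the limit above

# Succeeds (exit status 0) when the request fits within the limit.
within_walltime_limit() {
    [ "$(walltime_to_seconds "$1")" -le "$MAX_WALLTIME_SECONDS" ]
}

within_walltime_limit "02:00:00" && echo "02:00:00 fits"
within_walltime_limit "15-00:00:00" || echo "15-00:00:00 exceeds the limit"
```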
Unity Hardware
~107 nodes (29 Shared nodes, ~4000 cores)
Heterogeneous cluster with mixed architectures
16 NVIDIA GPUs
512GB -> 16GB RAM (private 1.5TB and 1.0TB)
56 cores -> 16 cores
OSC resources
Pitzer 29,344 cores on 646 nodes (164 GPUs)
Owens 23,392 cores on 824 nodes (160 GPUs)
How does it work?
Accessing Unity off-campus
You must be on a campus network or tunnel in
The jump host or VPN will tunnel to ASC networks
See http://go.osu.edu/jumpunity/
ssh -J jump.asc.ohio-state.edu unity.asc.ohio-state.edu
OR
ssh -J jump.asc.ohio-state.edu:2200 unity.asc.ohio-state.edu
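To avoid retyping the -J option, the same hop can be stored in the OpenSSH client configuration with the ProxyJump keyword. This is a minimal sketch: the "unity" alias, the helper function, and the your_username placeholder are assumptions for illustration, not part of the Unity documentation.

```shell
#!/bin/bash
# Print an OpenSSH client config block that routes Unity logins through the
# jump host. The "unity" alias and the username argument are illustrative.
unity_ssh_config() {
    local user="$1"
    cat <<EOF
Host unity
    HostName unity.asc.ohio-state.edu
    User ${user}
    ProxyJump ${user}@jump.asc.ohio-state.edu
EOF
}

# Preview the block; append it to ~/.ssh/config once it looks right:
#   unity_ssh_config your_username >> ~/.ssh/config
unity_ssh_config your_username
```

With the block in place, `ssh unity` should behave like the first command above. For the port 2200 variant, the ProxyJump value would end in `:2200`.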
Logging in via SSH
PuTTY/WSL on Windows, or a terminal on Linux and Mac OS X
SLURM (BATCH manager)
SLURM controls where/when you can run
sinteractive   start an interactive shell
sbatch         submit batch jobs
scancel        delete your job
squeue         show the status of the queue
seff           see the efficiency of a job
Interactive shell
Default values: 1 hour walltime, 1 core, 3GB memory
bash> sinteractive
Batch script
sbatch reads the job options from a script file
bash> sbatch mycalc.sbatch
Sample mycalc.sbatch script
#!/bin/bash
#SBATCH --job-name=mycalc-run42
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=10g
#SBATCH --output=mycalc-run42_%j.log
#SBATCH --mail-type=ALL
#SBATCH --mail-user=[email protected]
module load matlab
matlab -nodisplay -nosplash < matlab-bench.m
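When a calculation needs many runs, a Slurm job array can submit them from one script. The variant below is a sketch, not an ASCTech-provided template: the job name, the 1-100 range, and the input_N.dat naming scheme are hypothetical, and the %30 throttle is chosen only to mirror the 30 running jobs limit mentioned earlier.

```shell
#!/bin/bash
#SBATCH --job-name=mycalc-array
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=10g
#SBATCH --array=1-100%30
#SBATCH --output=mycalc-array_%A_%a.log

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 1 so the
# script can also be dry-run outside the scheduler.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: one input file per array task.
input_file="input_${task_id}.dat"
echo "Task ${task_id} would process ${input_file}"
```

In the output pattern, %A expands to the array's job ID and %a to the task index, so each task writes its own log.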
Troubleshooting
Make note of the Job ID
Check your output and error files
Try your job interactively
Submit ticket request via web or email to ASCTech
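The first two steps can be scripted. sbatch normally reports "Submitted batch job <id>", and %j in the sample script's --output line expands to that ID. The sketch below uses a canned message with a made-up job ID (12345); the helper names are hypothetical.

```shell
#!/bin/bash
# Pull the job ID out of sbatch's usual "Submitted batch job <id>" message.
extract_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# %j in --output expands to the job ID, so the sample script's log is:
log_for_job() {
    echo "mycalc-run42_${1}.log"
}

# Example using a canned sbatch message (12345 is a made-up job ID).
job_id=$(echo "Submitted batch job 12345" | extract_job_id)
echo "Job ID:   $job_id"
echo "Log file: $(log_for_job "$job_id")"

# On the cluster, the real flow would look like:
#   job_id=$(sbatch mycalc.sbatch | extract_job_id)
#   tail "$(log_for_job "$job_id")"    # check output and errors
#   seff "$job_id"                     # efficiency once the job finishes
```

Including the job ID and the log file name in a ticket gives ASCTech the details needed to find the job.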
Modules (lmod)
Lmod is used to manage multiple versions of software
and their dependencies.
module avail (list available modules)
module spider cuda (search for a module)
module load gnu/6.1.0 (load module)
module list (list loaded modules)
Contact info:
Unity Website
https://go.osu.edu/unitycompute
Direct ticket
http://go.osu.edu/unitysupportticket