Performetrica HPC – Getting Started & System Overview

Accessing Performetrica HPC

From inside Performetrica / local network

ssh <username>@login.superbilgisayar.tr

From outside (internet)

ssh <username>@login.superbilgisayar.tr

If VPN or firewall restrictions apply, follow the Performetrica access policy.

Cluster Overview

Performetrica HPC is a CPU-based high-performance computing cluster designed for:

Scientific computing
Parallel workloads (MPI / OpenMP)
Simulation, modeling, and data processing

Inspecting Cluster Resources

To list nodes and partitions:

sinfo --Node --long

To check detailed node info:

scontrol show node <nodename>
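
The per-node listing can also be condensed into one line per partition. The following is a minimal sketch, assuming sinfo is asked for partition, CPUs per node, and memory per node in MB (format fields %P, %c, %m). The function name summarize_nodes is a hypothetical helper, not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: summarize "partition cpus memory_mb" lines read from stdin.
summarize_nodes() {
  awk '{
         sub(/\*$/, "", $1)     # sinfo marks the default partition with "*"
         nodes[$1]++; cpus[$1] += $2; mem[$1] += $3
       }
       END {
         for (p in nodes)
           printf "%s nodes=%d cpus=%d mem_gb=%d\n", p, nodes[p], cpus[p], mem[p] / 1024
       }' | sort
}

# On the cluster, feed it live data (one line per node and partition).
if command -v sinfo >/dev/null 2>&1; then
  sinfo --Node --noheader --format="%P %c %m" | summarize_nodes
fi
```

A node that belongs to several partitions is counted once in each of them, so the totals should be read per partition rather than summed across the cluster.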

Understanding CPU Capabilities

Performance depends heavily on CPU features.

Example CPU capabilities (Intel Xeon class CPUs):

AVX / AVX2 / AVX-512
FMA (Fused Multiply-Add)
SIMD vectorization
NUMA architecture

You can inspect CPU flags:

lscpu

or:

cat /proc/cpuinfo
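
To check for specific instruction sets without reading the whole flag list, something like the following may help. It is a sketch, assuming an x86 Linux node where /proc/cpuinfo has a "flags" line. has_cpu_flag is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a space-separated flag list contains the given flag.
has_cpu_flag() {
  local flag="$1" flags="$2"
  case " $flags " in
    *" $flag "*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Read the flag list of the first CPU entry.
cpu_flags=""
if [ -r /proc/cpuinfo ]; then
  cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
fi

for f in avx avx2 avx512f fma; do
  if has_cpu_flag "$f" "$cpu_flags"; then
    echo "$f: yes"
  else
    echo "$f: no"
  fi
done
```

Matching whole words matters here: a plain substring search for "avx" would also match "avx2" and "avx512f".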

Why this matters

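AVX / AVX2 / AVX-512 → wider vector registers, so vectorized code does more arithmetic per instruction
NUMA → memory attached to another socket is slower to reach, so memory locality is critical
Cache hierarchy → how well the working set fits in cache affects scaling across cores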

Performance Optimization Guidelines

1. Match workload to architecture

Use OpenMP for shared memory scaling
Use MPI for distributed scaling

2. CPU Binding (Critical)

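Avoid CPU migration by binding each task to its cores. --cpu-bind is an srun option, so apply it when launching the job step inside the batch script:

srun --cpu-bind=cores ./my_application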
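
To check that binding took effect, compare the cores a process may run on with the cores that were requested. This is a minimal sketch, assuming a Linux node where /proc/self/status exposes Cpus_allowed_list. count_cpus_in_list is a hypothetical helper. srun also accepts --cpu-bind=verbose,cores, which prints the applied mask.

```shell
#!/bin/bash
# Hypothetical helper: count the CPUs in a kernel-style list such as "0-3,8-11".
count_cpus_in_list() {
  local list="$1" total=0 part lo hi
  local IFS=','
  for part in $list; do
    lo=${part%-*}      # "4-7" -> 4, "5" -> 5
    hi=${part#*-}      # "4-7" -> 7, "5" -> 5
    total=$(( total + hi - lo + 1 ))
  done
  echo "$total"
}

# The affinity mask is inherited, so this reflects the cores available to the job step.
if [ -r /proc/self/status ]; then
  allowed=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
  echo "allowed CPUs: $allowed ($(count_cpus_in_list "$allowed") cores)"
fi
```

Inside a bound job step the reported count should match SLURM_CPUS_PER_TASK. A count equal to the whole node suggests that no binding was applied.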

3. NUMA Awareness

Keep threads on same socket if possible
Avoid cross-socket memory traffic
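
For OpenMP codes, thread placement can be requested through the standard OpenMP affinity variables. The settings below are one reasonable starting point, not a site recommendation. The best values depend on the application and on how the cores are numbered on the node.

```shell
#!/bin/bash
# Standard OpenMP affinity variables (OpenMP 4.0 and later).
export OMP_PLACES=cores       # one place per physical core
export OMP_PROC_BIND=close    # keep threads on neighbouring places

# Show the NUMA layout; lscpu prints lines such as "NUMA node0 CPU(s): ...".
if command -v lscpu >/dev/null 2>&1; then
  lscpu | grep -i 'numa' || echo "no NUMA information reported"
fi
```

With OMP_PROC_BIND=close, threads fill adjacent cores first, which usually keeps a small team on one socket. OMP_PROC_BIND=spread is the alternative when the goal is to use the memory bandwidth of every socket.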

4. Thread Configuration

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
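
Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested, so a defensive variant with a default can be useful. The fallback value of 1 is an assumption chosen here for safety, not a cluster policy.

```shell
#!/bin/bash
# Fall back to a single thread if --cpus-per-task was not requested,
# so the run does not oversubscribe cores by accident.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with OMP_NUM_THREADS=$OMP_NUM_THREADS"
```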

5. Memory Optimization

Request memory close to what the job actually uses
Avoid over-allocation (increases queue time)
Compare requested and peak memory after the job finishes:

sacct -j <jobid> --format=JobID,ReqMem,MaxRSS,Elapsed
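
The sacct values carry unit suffixes (for example 2048K or 32G), which makes them awkward to compare by eye. The sketch below converts them to MiB. to_mib is a hypothetical helper, and treating a value without a suffix as bytes is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: convert a sacct size such as "2048K", "1.5G" or "32Gn" to MiB.
to_mib() {
  awk -v v="$1" 'BEGIN {
    sub(/[nc]$/, "", v)                    # older Slurm appends n (per node) or c (per CPU)
    n = v + 0                              # numeric prefix
    u = toupper(substr(v, length(v)))      # unit suffix
    if      (u == "K") n /= 1024
    else if (u == "G") n *= 1024
    else if (u == "T") n *= 1024 * 1024
    else if (u != "M") n /= 1024 * 1024    # assumption: no suffix means bytes
    printf "%d\n", n
  }'
}

# Usage on the cluster: pass a job ID as the first argument.
if command -v sacct >/dev/null 2>&1 && [ -n "${1:-}" ]; then
  sacct -j "$1" --noheader --parsable2 --format=JobID,ReqMem,MaxRSS |
  while IFS='|' read -r id req rss; do
    [ -n "$req" ] && last_req=$req         # step lines may leave ReqMem empty
    [ -n "$rss" ] || continue              # only job steps report MaxRSS
    echo "$id: peak $(to_mib "$rss") MiB, requested $(to_mib "$last_req") MiB"
  done
fi
```

If the peak is far below the request, the next submission can ask for less memory with some headroom.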

6. I/O Considerations

Use node-local /tmp for temporary and intermediate files when possible
Avoid heavy I/O on shared storage
Batch writes instead of frequent small writes
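
These points can be combined into a staging pattern: copy the input to local scratch, run there, and copy the results back in one pass. The following is a sketch under assumptions. run_staged is a hypothetical helper, and the scratch location defaults to /tmp unless TMPDIR is set.

```shell
#!/bin/bash
# Hypothetical helper: run a command in node-local scratch, then copy results back once.
# Usage: run_staged <input_file> <result_dir> <command...>
run_staged() {
  local input="$1" result_dir="$2"
  shift 2
  local scratch status=0
  scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || return 1

  cp "$input" "$scratch/" || { rm -rf "$scratch"; return 1; }
  ( cd "$scratch" && "$@" ) || status=$?    # intermediate I/O stays on local disk

  # One bulk copy to shared storage instead of many small writes.
  mkdir -p "$result_dir" && cp -r "$scratch"/. "$result_dir"/
  rm -rf "$scratch"
  return "$status"
}
```

Because the command runs inside the scratch directory, the program should be given by absolute path, for example run_staged input.dat "$PWD/results" "$PWD/my_application" input.dat (file names here are placeholders).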

Typical Workload Types

Workload Type	Recommended Slurm Settings
Serial	--ntasks=1, --cpus-per-task=1
OpenMP	--ntasks=1, --cpus-per-task=N (N threads)
MPI	--ntasks=N (N ranks)
Hybrid	--ntasks=N, --cpus-per-task=M (N ranks × M threads)
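
As an illustration of the hybrid row, a job script might look like the sketch below. The rank and thread counts are arbitrary, my_hybrid_app is a hypothetical binary, and the partition name is taken from the example elsewhere in this guide.

```shell
#!/bin/bash
#SBATCH --job-name=hybrid-test
#SBATCH --partition=defq
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00

# 4 MPI ranks with 8 OpenMP threads each. Defaults allow a dry run outside Slurm.
ntasks="${SLURM_NTASKS:-1}"
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"
echo "Layout: $ntasks ranks x $threads threads = $(( ntasks * threads )) cores"

if command -v srun >/dev/null 2>&1; then
  # Since Slurm 22.05, srun does not inherit --cpus-per-task from sbatch, so pass it again.
  srun --cpus-per-task="$threads" --cpu-bind=cores ./my_hybrid_app
else
  echo "srun not found: submit this script with sbatch on the cluster" >&2
fi
```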

Example CPU Job

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=defq
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=02:00:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_application
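
Submitting and tracking the job could then look like the following sketch. It assumes the script above was saved as job.sh (a placeholder file name). parse_jobid is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from "sbatch --parsable" output,
# which is either "jobid" or "jobid;cluster".
parse_jobid() {
  echo "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
  jobid=$(parse_jobid "$(sbatch --parsable job.sh)")
  echo "Submitted job $jobid"
  squeue -j "$jobid"      # state while the job is pending or running
  # After completion: sacct -j "$jobid" --format=JobID,State,Elapsed,MaxRSS
fi
```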

Key Takeaways

Match Slurm parameters (ntasks, cpus-per-task, mem, time) to the workload type
Respect NUMA and cache locality
Optimize total time to solution (queue wait + execution), not execution time alone

References

https://slurm.schedmd.com/
https://hpc-wiki.info/