Performetrica HPC – Getting Started & System Overview
Accessing Performetrica HPC
From inside Performetrica / local network
ssh <username>@login.superbilgisayar.tr
From outside (internet)
ssh <username>@login.superbilgisayar.tr
If VPN or firewall restrictions apply, follow Performetrica access policy.
Cluster Overview
Performetrica HPC is a CPU-based high performance computing cluster designed for:
- Scientific computing
- Parallel workloads (MPI / OpenMP)
- Simulation, modeling, and data processing
Inspecting Cluster Resources
To list nodes and partitions:
sinfo --long --Node "%#N %.6D %#P %6t"
To check detailed node info:
scontrol show node <nodename>
Understanding CPU Capabilities
Performance depends heavily on CPU features.
Example CPU capabilities (Intel Xeon class CPUs):
- AVX / AVX2 / AVX-512
- FMA (Fused Multiply-Add)
- SIMD vectorization
- NUMA architecture
You can inspect CPU flags:
lscpu
or:
cat /proc/cpuinfo
Why this matters
- AVX/AVX512 → faster vector math
- NUMA → memory locality is critical
- Cache hierarchy → affects scaling
Performance Optimization Guidelines
1. Match workload to architecture
- Use OpenMP for shared memory scaling
- Use MPI for distributed scaling
2. CPU Binding (Critical)
Avoid CPU migration:
#SBATCH --cpu-bind=cores
3. NUMA Awareness
- Keep threads on same socket if possible
- Avoid cross-socket memory traffic
4. Thread Configuration
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
5. Memory Optimization
- Request realistic memory
- Avoid over-allocation (increases queue time)
- Monitor usage with:
sacct -j <jobid> --format=MaxRSS
6. I/O Considerations
- Use local
/tmpwhen possible - Avoid heavy I/O on shared storage
- Batch writes instead of frequent small writes
Typical Workload Types
| Workload Type | Recommended Slurm Settings |
|---|---|
| Serial | ntasks=1, cpus-per-task=1 |
| OpenMP | ntasks=1, cpus-per-task=N |
| MPI | ntasks=N |
| Hybrid | ntasks + cpus-per-task |
Example CPU Job
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=defq
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=02:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_application
Key Takeaways
- Use correct Slurm parameters
- Respect NUMA and cache locality
- Optimize total runtime (queue + execution)
References
- https://slurm.schedmd.com/
- https://hpc-wiki.info/