3rd Gen AMD EPYC™ processors raise the bar once more for workload performance, with up to 19% more instructions per clock (IPC)1. No matter the job, you can drive faster time to results, provide more and better data for decisions, and achieve better business outcomes. With our leadership approach, the world’s highest performance server CPU, AMD EPYC 7763,2 and AMD Infinity Architecture deliver innovation right out of the box: up to 32MB of L3 cache per core, synchronized fabric and memory clock speeds designed for improved performance, plus hardware and virtual security features to help safeguard your business.
Join CEO Dr. Lisa Su, CTO Mark Papermaster, Senior VP and GM of Datacenter and Embedded Solutions Business Group, Forrest Norrod, Senior VP and GM of Server Business Unit, Dan McNamara, and appearances by industry-leading data center strategic partners and customers in this digital launch of the 3rd Gen AMD EPYC™ Processors.
00:00 – Intro
01:00 – Introducing 3rd Gen AMD EPYC
07:48 – “Zen 3” Architecture for Data Center
15:24 – 3rd Gen AMD EPYC Portfolio & Performance
20:44 – HPC Performance Leadership & Exascale Computing
25:44 – Powering the Most Important Cloud Services
35:54 – Accelerating Enterprise Workloads
40:42 – AMD EPYC Solution Ecosystem
49:22 – Conclusion
Step 1: Turn off swap
Turn off swap to prevent accidental swapping. Do note that disabling swap without sufficient memory can have undesired effects.
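A minimal sketch of this step, assuming a Linux system where active swap areas are listed in /proc/swaps and the commands are run as root. The helper names swap_status and disable_swap are illustrative only; swapoff -a is the standard util-linux command.

```shell
# Print "on" if the given swaps file lists any swap area, otherwise "off".
# /proc/swaps starts with a header line, so real entries begin on line 2.
swap_status() {
    swaps_file="${1:-/proc/swaps}"
    if awk 'NR > 1 && NF > 0 { found = 1 } END { exit !found }' "$swaps_file"; then
        echo "on"
    else
        echo "off"
    fi
}

# Disable every active swap area until the next reboot (requires root).
disable_swap() {
    swapoff -a
}

# Typical use:
#   free -h          # first confirm there is enough free memory
#   disable_swap
#   swap_status      # expected to print "off" afterwards
```

swapoff -a only lasts until the next reboot. Swap entries in /etc/fstab would also need to be commented out to keep swap off permanently.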
Step 2: Turn off NUMA balancing
NUMA balancing can have undesired effects, and since ranks and memory can be bound explicitly in HPC, this setting is not needed.
echo 0 > /proc/sys/kernel/numa_balancing
Step 3: Disable ASLR
ASLR (Address Space Layout Randomization) is a security feature used to prevent the exploitation of memory vulnerabilities.
echo 0 > /proc/sys/kernel/randomize_va_space
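The two echo commands take effect immediately but are lost on reboot. The sketch below shows one way to keep them, assuming a distribution that reads drop-in files from /etc/sysctl.d/ at boot. The helper names and the file name 90-hpc-tuning.conf are hypothetical.

```shell
# Print sysctl settings matching the two echo commands:
#   kernel.numa_balancing     <-> /proc/sys/kernel/numa_balancing
#   kernel.randomize_va_space <-> /proc/sys/kernel/randomize_va_space
hpc_sysctl_conf() {
    printf '%s\n' \
        'kernel.numa_balancing = 0' \
        'kernel.randomize_va_space = 0'
}

# Read one /proc/sys value by its sysctl key.
# The root directory can be overridden as the second argument (useful for testing).
read_proc_sys() {
    key_path=$(printf '%s' "$1" | tr '.' '/')
    cat "${2:-/proc/sys}/$key_path"
}

# Typical use (requires root; the file name is a hypothetical example):
#   hpc_sysctl_conf > /etc/sysctl.d/90-hpc-tuning.conf
#   sysctl --system                       # reload all sysctl configuration files
#   read_proc_sys kernel.numa_balancing   # expected to print 0 afterwards
```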
Step 4: Set CPU governor to performance and disable cc6
Setting the CPU governor to performance ensures maximum performance at all times. Disabling cc6 ensures that deeper CPU sleep states are not entered.
cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
cpupower idle-set -d 2
Idlestate 2 disabled on CPU 0
Idlestate 2 disabled on CPU 1
Idlestate 2 disabled on CPU 2
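Idle state index 2 is cc6 only if the system lists it in that position, so it is worth checking before and after. This is a rough sketch, assuming the kernel exposes the usual cpufreq and cpuidle entries under /sys/devices/system/cpu. The helper names are illustrative.

```shell
# Print each distinct cpufreq governor in use with the number of CPUs using it,
# one per line, e.g. "2 performance".
# The sysfs CPU directory can be overridden as the first argument.
governor_summary() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    cat "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor | sort | uniq -c | awk '{ print $1, $2 }'
}

# Print "index name disable-flag" for every idle state of cpu0, so the index
# passed to "cpupower idle-set -d" can be checked against the state name.
idle_states() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for state_dir in "$cpu_root"/cpu0/cpuidle/state[0-9]*; do
        if [ -d "$state_dir" ]; then
            echo "${state_dir##*state} $(cat "$state_dir/name") $(cat "$state_dir/disable")"
        fi
    done
}

# Typical use:
#   idle_states        # find the index of the deepest state before disabling it
#   governor_summary   # a single line ending in "performance" is expected
```

cpupower idle-info prints similar information in a longer form.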
Selected explanations of settings (see the document for the full explanation)
1. Simultaneous Multi-Threading (SMT) or HyperThreading (HT)
In HPC workloads, SMT is usually turned off.
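To confirm from the running system whether SMT is on, something like the following may help. It assumes a kernel that provides /sys/devices/system/cpu/smt/active; the helper name smt_status is illustrative.

```shell
# Print "enabled" or "disabled" from the kernel's SMT status file
# (the file contains 1 when SMT is active and 0 when it is not).
smt_status() {
    smt_file="${1:-/sys/devices/system/cpu/smt/active}"
    if [ "$(cat "$smt_file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Typical use:
#   smt_status
#   lscpu | grep 'Thread(s) per core'                # 1 when SMT is off
#   echo off > /sys/devices/system/cpu/smt/control   # runtime alternative to the BIOS switch (root)
```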
2. x2APIC
This option helps the operating system handle interrupts more efficiently in high core count configurations. It is recommended to enable this option, and it must be enabled when using more than 255 threads.
3. NUMA Per Socket (NPS)
In many HPC applications, ranks and memory can be pinned to cores and NUMA nodes. The recommended value is NPS4. However, if the workload is not NUMA aware or suffers when the NUMA complexity increases, we can experiment with NPS1.
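The NPS setting can be checked from Linux by counting NUMA nodes: NPS4 should give four nodes per socket and NPS1 one. The sketch below assumes numactl is installed and that nodes appear under /sys/devices/system/node. The helper names and the ./app program are hypothetical.

```shell
# Count the NUMA nodes the kernel exposes.
# The sysfs node directory can be overridden as the first argument.
numa_node_count() {
    node_root="${1:-/sys/devices/system/node}"
    count=0
    for node_dir in "$node_root"/node[0-9]*; do
        if [ -d "$node_dir" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Build (but do not run) a numactl command line that pins a program's
# CPUs and memory allocations to a single NUMA node.
bind_to_node() {
    node="$1"
    shift
    echo "numactl --cpunodebind=$node --membind=$node $*"
}

# Typical use:
#   numa_node_count               # compare with sockets x NPS value
#   numactl --hardware            # full node, CPU and memory layout
#   eval "$(bind_to_node 0 ./app)"
```

MPI launchers usually have their own binding options, so this manual form is mainly useful for single-process tests.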
4. Memory Frequency, Infinity Fabric Frequency, and coupled vs. uncoupled mode
The memory clock and the Infinity Fabric clock can run at synchronous frequencies (coupled mode) or at asynchronous frequencies (uncoupled mode).
If the memory is clocked lower than 2933 MT/s, the memory and fabric will run in coupled mode, which has the lowest memory latency.
If the memory is clocked at 3200 MT/s, the memory and fabric clocks will run in asynchronous (uncoupled) mode, which has higher bandwidth but increased memory latency.
Make sure APBDIS is set to 1 and the fixed SOC P-state is set to P0.
5. Preferred IO
Preferred IO allows one PCIe device in the system to be configured in a preferred mode. This device gets preferential treatment on the Infinity Fabric.
6. Determinism Slider
It is recommended to choose the Power option. In this mode, the CPUs in the system perform at the maximum capability of each silicon device. Due to natural variation in the manufacturing process, performance may vary between CPUs, but it will never fall below that of “Performance Determinism mode”.