Ways to speed up your program

This is an interesting writeup on various ways to speed up your application, and a useful read if you are getting into HPC for the first time. The author, Ivica Bogosavljević, suggests several approaches:

  • Distributing workload to multiple CPU cores
  • Distributing workload to accelerators
  • Usage of vectorization capabilities of your CPU
  • Optimizing for the memory subsystem
  • Optimizing for the CPU’s branch prediction unit
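
As a rough illustration of the first item only (distributing workload to multiple CPU cores), here is a minimal Python sketch. It is not taken from the writeup: the prime-counting task and the names split_range, count_primes_in and count_primes_parallel are hypothetical, chosen just to show the pattern of splitting a range into chunks and handing each chunk to a separate worker process.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, nearly equal chunks."""
    size, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def is_prime(n):
    """Trial division; deliberately simple, CPU-bound work."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes_in(bounds):
    """Count primes in the half-open range given as a (lo, hi) tuple."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def count_primes_parallel(start, stop, workers=None):
    """Count primes in [start, stop) using one process per chunk."""
    workers = workers or os.cpu_count() or 1
    chunks = split_range(start, stop, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each chunk is processed in its own worker process.
        return sum(pool.map(count_primes_in, chunks))
```

Call count_primes_parallel(0, 100_000) from under an `if __name__ == "__main__":` guard, which process-based pools need on platforms that start fresh interpreters for workers. Processes rather than threads are used here because the work is CPU-bound; whether the split pays off depends on the chunks being large enough to outweigh the cost of starting workers.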


Great Tools for AMD EPYC

  1. AMD EPYC™ Processor Selector Tool with Kit Configurator
    Compare your current CPU with AMD EPYC™ CPUs on price, cores, and performance, then build out your ideal server.
  2. AMD EPYC™ Server Virtualization TCO Estimation Tool
    See the potential value AMD EPYC™ CPUs may deliver for your datacenter. Input your VM requirements and environment factors such as power and real estate cost, select your virtualization license, and more. Compare your current x86-based server solution to a solution powered by AMD EPYC™ processors.
  3. AMD EPYC™ Bare Metal TCO Estimation Tool
    Discover the potential value that AMD EPYC™ CPUs can deliver for your bare metal server environment. Compare by server count, performance, or total budget. Then select your filter, your processor comparisons, and system memory requirements. Choose a 3-, 4-, or 5-year time frame for your AMD EPYC™ Bare Metal TCO estimation.
  4. AMD Cloud Cost Advisor
    Discover the potential value AMD EPYC™ CPUs bring to the cloud with the latest cost analysis tool. AMD Cloud Cost Advisor provides real-time insights into estimated cost savings when switching to cloud instances powered by AMD within the same cloud service provider.

IBM claimed to have made the world’s first 2nm chip

IBM claims that the 2nm chip could achieve 45 percent higher performance, or 75 percent lower energy use, than “today’s most advanced 7nm node chips.”

Darío Gil, SVP and director of IBM Research.

A single human hair spans a whopping 50,000-75,000 nanometers. A human red blood cell is 6,000-8,000nm. Covid-19 is 50-140nm… To build nodes 3nm and below requires extremely expensive and sensitive equipment, and a rethink on how nodes are laid out – hence the different metrics now used to measure smaller nodes.

DatacenterDynamics (https://www.datacenterdynamics.com/en/news/ibm-claims-to-have-made-the-worlds-first-2nm-chip/)


  1. IBM claims to have made the world’s first 2nm chip (DatacenterDynamics)
  2. IBM Creates First 2nm Chip (AnandTech)
  3. IBM 2nm chip breakthrough claims more power with less energy (BBC)

Product Brief with AMD EPYC 7003

3rd Gen AMD EPYC™ processors raise the bar once more for workload performance, with up to 19% more instructions per clock (IPC). No matter the job, you can drive faster time to results, provide more and better data for decisions, and achieve better business outcomes. With our leadership approach, the world’s highest performance server CPU, AMD EPYC 7763, and AMD Infinity Architecture deliver innovation: up to 32MB of L3 cache per core, synchronized fabric and memory clock speeds designed for improved performance, plus hardware and virtual security features to help safeguard your business, right out of the box.

AMD EPYC Product Brief: In-depth details about your new AMD EPYC 7003 Series Processors (pathfactory.com)


Introducing 3rd Gen AMD Processors for the Modern Data Centre

Join CEO Dr. Lisa Su; CTO Mark Papermaster; Forrest Norrod, Senior VP and GM of the Datacenter and Embedded Solutions Business Group; and Dan McNamara, Senior VP and GM of the Server Business Unit, along with appearances by industry-leading data center strategic partners and customers, in this digital launch of the 3rd Gen AMD EPYC™ Processors.


00:00 – Intro
01:00 – Introducing 3rd Gen AMD EPYC
07:48 – “Zen 3” Architecture for Data Center
15:24 – 3rd Gen AMD EPYC Portfolio & Performance
20:44 – HPC Performance Leadership & Exascale Computing
25:44 – Powering the Most Important Cloud Services
35:54 – Accelerating Enterprise Workloads
40:42 – AMD EPYC Solution Ecosystem
49:22 – Conclusion

Analyzing Memory and Threading Correctness for GPU-Offloaded Code

Modern workloads are diverse—and so are architectures. No single architecture is best for every workload. Maximizing performance takes a mix of scalar, vector, matrix, and spatial architectures deployed in CPU, GPU, FPGA, and other future accelerators. Heterogeneity adds complexity that can be difficult to debug. This article introduces the new features of Intel® Inspector that support the analysis of code that’s offloaded to accelerators.

For more information: Analyzing Memory and Threading Correctness for GPU-Offloaded Code

Hewlett Packard Enterprise selected to build new supercomputer for the National Supercomputing Centre Singapore

The next generation national supercomputer for Singapore will be a green, warm water-cooled system – one of the first known deployments of such a system in a tropical environment. When operational the supercomputer is expected to provide an aggregate of up to 10 PFLOPS of raw compute power and is eight times more powerful than the current ASPIRE1 supercomputer. ASPIRE1, which was commissioned in 2016, has been running at near full capacity in support of local advanced research that requires high-end computing resources. The new system is the first in a series of supercomputers that will be deployed in phases from now till 2025 to expand and upgrade Singapore’s high-performance computing (HPC) capabilities for the research community here.

– National Supercomputing Centre Singapore –

For more information,