Leveraging IBM Power Systems and AIX for High-Performance Matrix Multiplication
Submitted Apr 18, 2025
Type of submission:
30 mins talk
Topic of your submission:
Performance engineering
I am submitting for:
Rootconf Annual Conference 2025
{My talk/session in 2-3 paragraphs.}
At the Rootconf 2025 annual conference, I want to talk about Matrix Multiplication Assist (MMA) on IBM’s Power platform and how we integrate it into the open-source world and the AIX Toolbox for the enterprise server platform. My submission is for the platform engineering track on building infrastructure.
The first part of my talk covers IBM Power processors from the Power10 generation onwards, which include a feature called MMA (Matrix Multiplication Assist) that accelerates high-performance computing tasks such as vector operations, matrix algebra, and AI inference. Instead of going through conventional software interfaces and operations, applications can offload these tasks directly to the hardware via MMA and get low-latency, high-bandwidth computational acceleration. I will explain why MMA is helpful for workloads dominated by dense matrix multiplication, such as scientific simulations and AI/ML models.
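To make the offload concrete, below is a minimal sketch of a single MMA operation using the GCC/IBM Open XL built-ins described in the Redpaper linked under the references: one 4x4 single-precision outer-product accumulate, the building block a GEMM kernel repeats once per K step. The array contents, names, and build flags are illustrative assumptions, not code from a production kernel.

```c
/* Minimal sketch (assumed build: gcc -O2 -mcpu=power10 mma_demo.c):
 * one 4x4 fp32 rank-1 update through the MMA accumulator registers. */
#include <altivec.h>
#include <stdio.h>

typedef __vector unsigned char vec_t;      /* generic 16-byte VSX vector */

int main(void)
{
    float a[4] = {1, 2, 3, 4};             /* one column of A */
    float b[4] = {10, 20, 30, 40};         /* one row of B    */
    float c[4][4];                         /* 4x4 result tile */

    __vector_quad acc;                     /* 512-bit MMA accumulator */
    __builtin_mma_xxsetaccz(&acc);         /* zero the accumulator    */

    vec_t va = vec_xl(0, (unsigned char *)a);
    vec_t vb = vec_xl(0, (unsigned char *)b);

    /* acc += a * b^T : a full GEMM kernel issues one of these per K step. */
    __builtin_mma_xvf32gerpp(&acc, va, vb);

    __builtin_mma_disassemble_acc(c, &acc);  /* spill accumulator to memory */

    for (int i = 0; i < 4; i++)
        printf("%6.1f %6.1f %6.1f %6.1f\n",
               c[i][0], c[i][1], c[i][2], c[i][3]);
    return 0;
}
```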
The second part of my talk is about AIX (Advanced Interactive eXecutive), IBM’s UNIX-based enterprise OS on Power systems, which can leverage MMA when the application stack or libraries (such as OpenBLAS or AI frameworks) are compiled to take advantage of the Power10 instruction set and the underlying accelerator features. I want to show how a system designed to run OLTP (online transaction processing) can now also run transaction inferencing in the same LPAR, which is a real-world benefit. AIX’s stable, performance-oriented kernel allows tight integration with hardware capabilities, and IBM Power Systems provides Enhanced Performance Tools and Accelerator Libraries that expose MMA features to user applications in a configurable way. This combination makes AIX on Power10 a strong platform for industries that need reliable, accelerated performance for compute-heavy workloads. I will also present one of our experiments, in which prompt processing reached approximately 75 tokens per second with eight or sixteen threads using CPU + OpenBLAS + MMA on AIX running on Power servers. This benefits workloads such as entity extraction, which require matrix multiplications during embedding and message passing.
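Most applications never touch those built-ins directly; they link against an OpenBLAS built for POWER10, and the MMA dispatch happens inside the library. The sketch below shows that path with a plain cblas_sgemm call; the matrix sizes, thread count, and build command are illustrative assumptions and are not the configuration of the experiment mentioned above.

```c
/* Minimal sketch (assumed build on AIX: gcc -O2 sgemm_demo.c -lopenblas,
 * run with OPENBLAS_NUM_THREADS=8 or 16): an ordinary BLAS call that an
 * MMA-enabled OpenBLAS routes to its POWER10 kernels. */
#include <cblas.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int m = 512, n = 512, k = 512;   /* illustrative sizes */
    float *a = malloc(sizeof(float) * m * k);
    float *b = malloc(sizeof(float) * k * n);
    float *c = calloc((size_t)m * n, sizeof(float));

    for (int i = 0; i < m * k; i++) a[i] = 1.0f;
    for (int i = 0; i < k * n; i++) b[i] = 2.0f;

    /* C = 1.0 * A * B + 0.0 * C; the caller is unchanged whether the
     * backend uses scalar, VSX, or MMA kernels. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0f, a, k, b, n, 0.0f, c, n);

    printf("c[0][0] = %.1f (expected %.1f)\n", c[0], 2.0f * k);
    free(a); free(b); free(c);
    return 0;
}
```

Because the acceleration is picked up at the library level, an LPAR that already runs OLTP applications can take on inferencing work without changes to the applications’ core logic.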
The final part of my talk is about open-source packages where MMA code is available for Power systems. MMA acceleration on Power has the potential to enhance inference engines (the software components that execute AI models) by speeding up compute-intensive matrix operations such as dot products and convolutions. These tasks can be delegated to the MMA units on AIX using inference engines such as llama.cpp, ONNX Runtime, or other AI/deep learning frameworks like PyTorch via OpenBLAS, once they have been ported and optimized for the Power architecture. Because of the quicker inference times, lower CPU load, and improved throughput, I will conclude with how AIX on Power is a platform for contemporary AI-driven applications in addition to being a dependable operating system for conventional enterprise workloads.
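When assembling such a stack, it is useful to confirm at run time which OpenBLAS kernel set the inference engine will actually use. The sketch below calls OpenBLAS’s public query functions; the exact strings returned (for example, whether the core name reports POWER10 on an MMA-enabled build) depend on how the library was built, so treat the expected output as an assumption to verify on your own system.

```c
/* Minimal sketch (assumed build: gcc -O2 check_core.c -lopenblas):
 * report which OpenBLAS configuration, kernel core, and thread count
 * an inference engine linked against this library would inherit. */
#include <cblas.h>
#include <stdio.h>

int main(void)
{
    printf("OpenBLAS config : %s\n", openblas_get_config());
    printf("Kernel core     : %s\n", openblas_get_corename());
    printf("Threads         : %d\n", openblas_get_num_threads());
    /* On a Power10 LPAR with an MMA-enabled OpenBLAS we would expect the
     * core name to mention POWER10 (an assumption to verify locally). */
    return 0;
}
```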
Slide and Reference Links:
My slides for the presentation:
https://docs.google.com/presentation/d/1IhjjPt9fNwvDnBFyPShL6hCqCcltEAIb/edit?slide=id.p1#slide=id.p1
About Matrix Multiplication Assist: https://www.redbooks.ibm.com/redpapers/pdfs/redp5612.pdf
About AIX Toolbox: https://www.ibm.com/support/pages/aix-toolbox-open-source-software-downloads-alpha
{1-2 takeaways from the session.}
- Boosted Performance for Transaction Inferencing: By offloading compute-intensive operations to MMA units, inference engines operating on AIX may greatly increase the speed and efficiency of AI model inference while freeing up CPU resources for other workloads.
- Hardware-Accelerated Matrix Math: By providing direct, low-latency access to specialized accelerator hardware, MMA on IBM Power Systems makes it possible to execute matrix operations incredibly quickly, making it perfect for AI, ML, and high-performance computing jobs.
- Seamless Integration of Code to the Open-Source World for AIX: AIX OS can leverage MMA through optimized libraries and tools in the open-source world, enabling enterprise applications to benefit from hardware acceleration without rewriting core logic.
{The audience segment the talk/session is going to be beneficial for.}
- AI/ML Practitioners & Data Scientists
• Faster matrix calculations and reduced latency are advantageous for those executing inference workloads or deploying models in production.
• MMA-optimized frameworks can shorten the time-to-result for AI applications such as predictive maintenance, image recognition, and fraud detection.
- Infrastructure Engineers
• Organisations and individuals running mission-critical workloads can now integrate AI/ML workloads on the Power platform.
• It enables a hybrid setup where traditional applications and AI inference can coexist efficiently on the same platform in AIX.
- HPC & Scientific Computing Developers
• Those working on simulation, modelling, or large-scale numerical analysis benefit from MMA’s matrix acceleration, improving computation times.
• Industries like finance, healthcare, and engineering that require precision and performance can leverage MMA through optimized libraries on the AIX operating system.
{My bio}
I am Aditya Kamath, a software engineer at IBM India Software Development Labs, specializing in open-source development. My work focuses on building robust systems, toolchains, and platforms capable of handling enterprise-level workloads.
My key contributions in the open-source world have been to the Meson, CMake, and FAISS communities, and I am currently contributing to PyTorch. I am also the maintainer of the GDB debugger on AIX. I enjoy working at the intersection of system software, performance tuning, and open-source collaboration.
LinkedIn: https://www.linkedin.com/in/aditya-kamath-574732169/
GitHub: https://github.com/KamathForAIX