Optimizing BLAS Level 3 with Machine Learning

ADSALA dynamically selects optimal thread configurations for BLAS operations, achieving 1.5× to 3.0× speedups across multi-core systems.

[Figure: Performance comparison of Default, ADSALA, OpenBLAS, and MKL on an Intel Xeon Gold 6248R (3.0 GHz); ADSALA averages a 2.1× speedup over the default configuration.]

Research Overview

Adaptive Thread Configuration for BLAS

Machine learning-driven optimization of Basic Linear Algebra Subprograms (BLAS) Level 3 operations

Performance Gains

Achieves 1.5× to 3.0× speedups compared to traditional maximum-thread approaches across various hardware platforms.

ML-Powered

Machine learning models are trained during installation to predict the optimal thread count for each operation on your specific hardware.

Comprehensive Coverage

Expanded ADSALA library includes all single- and double-precision BLAS Level 3 operations with adaptive optimization.

Performance

Benchmark Results

Speedup Comparison

DGEMM 2.8×
DSYMM 2.1×
DTRMM 1.7×

Speedup factors compared to default maximum-thread configuration across different BLAS Level 3 operations.

System Compatibility

Intel Xeon

Gold/Platinum series processors

AMD EPYC

Rome/Milan series processors

Consumer CPUs

Intel Core i7/i9, AMD Ryzen

Performance Across Matrix Sizes

ADSALA shows consistent performance improvements across varying matrix dimensions, with particularly strong gains for medium-sized matrices (512×512 to 2048×2048).

Implementation

How ADSALA Works

Installation Process

  1. Download and compile the ADSALA library

  2. Training phase executes benchmarks to profile system performance

  3. Machine learning models are trained on the collected data

  4. Optimized library is ready for production use

Runtime Operation

  • BLAS Call

    Application calls a BLAS Level 3 routine

  • Model Prediction

    ADSALA predicts optimal thread count based on operation parameters

  • Execution

    Operation executes with optimized thread configuration
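The runtime flow above can be sketched as follows. This is a minimal illustration, not the library's actual API: the function and table names are assumptions, and the trained model is approximated by a simple size-bucket lookup, whereas ADSALA uses learned models over the full operation parameters.

```python
# Hypothetical sketch of ADSALA-style runtime thread selection.
# The "model" here is a lookup keyed on operation type and a
# matrix-size bucket; the real library predicts from a trained model.
THREAD_MODEL = {
    ("dgemm", "small"): 2,
    ("dgemm", "medium"): 8,
    ("dgemm", "large"): 16,
}

def size_bucket(m, n, k):
    """Bucket the problem by total work m*n*k (illustrative cutoffs)."""
    work = m * n * k
    if work < 256**3:
        return "small"
    if work < 2048**3:
        return "medium"
    return "large"

def predict_threads(op, m, n, k, max_threads=16):
    """Predict a thread count for one BLAS Level 3 call."""
    bucket = size_bucket(m, n, k)
    return min(THREAD_MODEL.get((op, bucket), max_threads), max_threads)
```

The key point is that the decision happens per call: a small DGEMM may run on 2 threads while a large one uses all cores, rather than every call defaulting to the maximum.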

adsala_install.sh
# Installation and training process
./configure --prefix=/usr/local/adsala
make
make train   # Executes benchmark suite
make install # Installs optimized library

# Training output example
[ADSALA] Training on Intel Xeon Gold 6248R
[ADSALA] Benchmarking DGEMM... 2.4× potential
[ADSALA] Benchmarking DSYMM... 1.9× potential
[ADSALA] Generated optimization models
[ADSALA] Installation complete
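The `make train` phase above amounts to timing each operation at several thread counts and recording which is fastest. A sketch of that loop, with a synthetic cost function standing in for a real timed DGEMM run (the overhead constants are illustrative, not measured):

```python
# Sketch of the training/profiling phase: for each problem size,
# "time" the operation at candidate thread counts and keep the fastest.

def synthetic_runtime(n, threads):
    """Stand-in for a measured DGEMM wall time: ideal parallel speedup
    plus a per-thread overhead, so the optimum is not always max threads."""
    flops = 2 * n**3
    return flops / (threads * 1e9) + 1e-4 * threads

def best_thread_count(n, candidates=(1, 2, 4, 8, 16)):
    """Pick the thread count with the lowest observed runtime."""
    times = {t: synthetic_runtime(n, t) for t in candidates}
    return min(times, key=times.get)
```

Because the overhead term grows with the thread count, small problems favor few threads while large ones favor many; the recorded (size, best-thread-count) pairs become the training data for the models.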

Technical Details

Machine Learning Models

Random Forest and Gradient Boosted Decision Trees trained on operation parameters (matrix dimensions, operation type) and hardware counters.

Features Used

Matrix dimensions, operation type, memory hierarchy characteristics, core utilization patterns.
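One plausible encoding of those features as a numeric vector is shown below. The feature names and scalings are assumptions for illustration; the source only states that dimensions, operation type, memory-hierarchy characteristics, and core utilization are used.

```python
# Hypothetical feature vector for one BLAS Level 3 call.
import math

OP_CODES = {"dgemm": 0, "dsymm": 1, "dtrmm": 2}

def make_features(op, m, n, k, l2_bytes=1 << 20, cores=16):
    """Encode a call as [op, log2 dims, cache-pressure proxy, cores]."""
    # Working set of A (m*k), B (k*n), C (m*n) in double precision.
    working_set = 8 * (m * k + k * n + m * n)
    return [
        OP_CODES[op],
        math.log2(m), math.log2(n), math.log2(k),
        working_set / l2_bytes,   # how far the data overflows L2
        cores,
    ]
```

Log-scaled dimensions and a cache-pressure ratio keep the feature ranges comparable across problem sizes, which tree-based models like the ones described above handle well.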

Overhead

Less than 1% runtime overhead for model prediction, with substantial gains in execution time.

Get ADSALA

Download and Installation

Install via package manager

ADSALA is available through popular package managers for easy installation.

sudo add-apt-repository ppa:adsala/optimized-blas
sudo apt-get update
sudo apt-get install libadsala

Source Code

Compile from source for maximum customization

GitHub Repository

Contribute or report issues

Documentation

Full API reference and usage examples

Research Paper

Cite Our Work

The complete technical details and evaluation of ADSALA are available in our peer-reviewed paper.

Abstract

We present ADSALA, an adaptive BLAS Level 3 library that uses machine learning to dynamically select the optimal number of threads for each operation. Our approach achieves 1.5× to 3.0× speedups compared to the traditional maximum-thread configuration across various multi-core systems. The library automatically trains models during installation that capture the performance characteristics of the target hardware, then uses these models at runtime to predict the best thread configuration for each BLAS operation.

Published: June 2023