ADSALA dynamically selects optimal thread configurations for BLAS operations, achieving 1.5× to 3.0× speedups across multi-core systems.
Performance comparison on Intel Xeon Gold 6248R (3.0 GHz): average 2.1× speedup over the default configuration.
Adaptive Thread Configuration for BLAS
Machine learning-driven optimization of Basic Linear Algebra Subprograms (BLAS) Level 3 operations
Achieves 1.5× to 3.0× speedups compared to traditional maximum-thread approaches across various hardware platforms.
Machine learning models trained during installation to predict optimal thread counts for each operation on your specific hardware.
Expanded ADSALA library includes all single- and double-precision BLAS Level 3 operations with adaptive optimization.
Benchmark Results
Speedup factors compared to default maximum-thread configuration across different BLAS Level 3 operations.
Intel Xeon Gold/Platinum series processors
AMD EPYC Rome/Milan series processors
Desktop: Intel Core i7/i9 and AMD Ryzen processors
ADSALA shows consistent performance improvements across varying matrix dimensions, with particularly strong gains for medium-sized matrices (512×512 to 2048×2048).
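A scaling trend like this can be reproduced with a short timing sweep. The sketch below times square double-precision matrix multiplies through NumPy (whose backend BLAS executes `dgemm`); it is an illustrative harness, not ADSALA's actual benchmark suite, and the sizes are chosen only to bracket the medium range mentioned above.

```python
import time
import numpy as np

def time_dgemm(n, repeats=3):
    """Best-of-repeats wall time for an n-by-n double-precision
    matrix multiply, which NumPy dispatches to BLAS dgemm."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    return best

# Sweep sizes around the range where ADSALA reports the largest gains.
for n in (256, 512, 1024, 2048):
    print(f"{n}x{n}: {time_dgemm(n) * 1e3:.1f} ms")
```

Running the same sweep under different fixed thread counts (for example by setting `OMP_NUM_THREADS` before launch) exposes the per-size optimum that ADSALA's models learn.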
How ADSALA Works
Download and compile ADSALA library
Training phase executes benchmarks to profile system performance
Machine learning models are trained on the collected data
Optimized library is ready for production use
Application calls a BLAS Level 3 routine
ADSALA predicts optimal thread count based on operation parameters
Operation executes with optimized thread configuration
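The three runtime steps above can be sketched as a single decision before each call: pack the operation parameters, ask the model for a thread count, and apply it. The FLOP-count heuristic and thresholds below are invented stand-ins for ADSALA's trained models, not the library's real API.

```python
# Illustrative sketch of ADSALA's runtime path. The heuristic stands in
# for the per-operation model trained at install time.

def predict_threads(op, m, n, k, max_threads=16):
    """Predict a thread count for a Level 3 call from its dimensions.

    Thresholds here are invented for illustration; the real library
    consults models fitted to the target hardware.
    """
    flops = 2 * m * n * k  # approximate work in a GEMM-like operation
    if flops < 1e7:        # small problem: threading overhead dominates
        return 1
    if flops < 1e9:        # medium problem: partial parallelism pays off
        return max(1, max_threads // 2)
    return max_threads     # large problem: saturate the machine

# A wrapped BLAS call would set this count before executing,
# e.g. via omp_set_num_threads() inside the C library.
print(predict_threads("dgemm", 1024, 1024, 1024))
```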
# Installation and training process
./configure --prefix=/usr/local/adsala
make
make train # Executes benchmark suite
make install # Installs optimized library
# Training output example
[ADSALA] Training on Intel Xeon Gold 6248R
[ADSALA] Benchmarking DGEMM... 2.4× potential
[ADSALA] Benchmarking DSYMM... 1.9× potential
[ADSALA] Generated optimization models
[ADSALA] Installation complete
Random Forest and Gradient Boosted Decision Trees trained on operation parameters (matrix dimensions, operation type) and hardware counters.
Matrix dimensions, operation type, memory hierarchy characteristics, core utilization patterns.
Less than 1% runtime overhead for model prediction, with substantial gains in execution time.
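The prediction step itself is cheap: a feature vector is assembled from the call's parameters and pushed through a small tree ensemble. The hand-rolled decision stumps below stand in for the Random Forest / Gradient Boosted models; the feature layout, operation codes, and thresholds are all invented for illustration.

```python
# Hand-rolled stumps standing in for ADSALA's trained tree ensemble.
# Assumed feature layout for illustration: [m, n, k, op_code].

def features(op, m, n, k):
    """Pack one BLAS call's parameters into a feature vector."""
    op_code = {"dgemm": 0, "dsymm": 1, "dtrmm": 2}.get(op, 3)
    return [m, n, k, op_code]

# Each "tree" is (feature_index, threshold, value_below, value_at_or_above).
FOREST = [
    (0, 512, 4, 16),   # small m -> fewer threads
    (2, 256, 2, 16),   # small k -> fewer threads
    (1, 512, 4, 16),   # small n -> fewer threads
]

def predict(x):
    """Average the per-tree votes, as a forest would."""
    votes = [(lo if x[f] < t else hi) for f, t, lo, hi in FOREST]
    return round(sum(votes) / len(votes))

print(predict(features("dgemm", 2048, 2048, 2048)))  # -> 16
print(predict(features("dgemm", 128, 128, 128)))     # -> 3
```

A lookup of this shape is a handful of comparisons per call, which is consistent with the sub-1% overhead figure above.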
Download and Installation
ADSALA is available through popular package managers for easy installation.
sudo add-apt-repository ppa:adsala/optimized-blas
sudo apt-get update
sudo apt-get install libadsala
Compile from source for maximum customization
Contribute or report issues
Full API reference and usage examples
Cite Our Work
The complete technical details and evaluation of ADSALA are available in our peer-reviewed paper.
We present ADSALA, an adaptive BLAS Level 3 library that uses machine learning to dynamically select the optimal number of threads for each operation. Our approach achieves 1.5× to 3.0× speedups compared to the traditional maximum-thread configuration across various multi-core systems. The library automatically trains models during installation that capture the performance characteristics of the target hardware, then uses these models at runtime to predict the best thread configuration for each BLAS operation.