Overview
About ArrayFire

ArrayFire is a high-performance library for parallel computing with an easy-to-use API. It enables users to write scientific computing code that is portable across CUDA, oneAPI, OpenCL, and CPU devices. This project provides Python bindings for the ArrayFire library.
Installing ArrayFire
- Install ArrayFire using a binary installer for Windows, macOS, or Linux, or build it from source.
Using ArrayFire
The array object is beautifully simple.
Array-based notation effectively expresses computational algorithms in readable math-resembling notation. Expertise in parallel programming is not required to use ArrayFire.
A few lines of ArrayFire code accomplish what can otherwise take hundreds of complicated lines in CUDA, oneAPI, or OpenCL kernels.
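As a rough illustration of the array-based style (using NumPy here, since ArrayFire's Python API follows the same conventions and ArrayFire itself may not be installed), a single array expression replaces an explicit element-wise loop:

```python
import numpy as np

# Explicit loop version: one scalar at a time
def saxpy_loop(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# Array-notation version: one line that reads like the math
def saxpy_array(a, x, y):
    return a * x + y

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(saxpy_array(2.0, x, y))  # same result as the loop version
```

With ArrayFire the same expression additionally runs on the GPU or other accelerator without any change to the code.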
Support for multiple domains
ArrayFire contains hundreds of functions across various domains including:
- Vector Algorithms
- Image Processing
- Computer Vision
- Signal Processing
- Linear Algebra
- Statistics
- and more

Each function is hand-tuned by ArrayFire developers with low-level optimizations.
Support for various data types and sizes
ArrayFire operates on common data shapes and sizes, including vectors, matrices, volumes, and N-dimensional arrays.
It supports common data types, including single and double precision floating point values, complex numbers, booleans, and 32-bit signed and unsigned integers.
Extending ArrayFire
ArrayFire can be used as a stand-alone application or integrated with existing CUDA, oneAPI, or OpenCL code.
ArrayFire supports a comprehensive list of devices, including x86 and ARM CPUs as well as CUDA, oneAPI, and OpenCL accelerators.
Each ArrayFire installation comes with:
- a CUDA backend (named 'libafcuda') for NVIDIA GPUs
- a oneAPI backend (named 'libafoneapi') for oneAPI devices
- an OpenCL backend (named 'libafopencl') for OpenCL devices
- a CPU backend (named 'libafcpu') to fall back to when no CUDA, oneAPI, or OpenCL devices are available
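The fallback order above can be sketched as a small pure-Python helper (hypothetical, for illustration only; the real selection happens inside the ArrayFire loader):

```python
# Hypothetical sketch of the backend fallback order described above.
# The actual ArrayFire library performs this selection natively at load time.
BACKEND_PRIORITY = ["cuda", "oneapi", "opencl", "cpu"]

def pick_backend(available):
    """Return the highest-priority backend present on the system."""
    for backend in BACKEND_PRIORITY:
        if backend in available:
            return backend
    raise RuntimeError("no ArrayFire backend available")

print(pick_backend({"opencl", "cpu"}))  # prefers opencl over cpu
```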
Vectorized and Batched Operations
ArrayFire supports batched operations on N-dimensional arrays. Batched operations in ArrayFire run in parallel, ensuring optimal usage of CUDA, oneAPI, or OpenCL devices.
Best performance with ArrayFire is achieved using vectorization techniques.
ArrayFire can also execute loop iterations in parallel with the gfor function.
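As an analogue of what batching replaces (shown with NumPy, since the idea carries over to ArrayFire's batched operations), many independent small operations collapse into one vectorized call instead of a Python-level loop:

```python
import numpy as np

# Batched matrix multiply: 8 independent 3x3 products in one call,
# analogous to how ArrayFire batches an operation over an extra dimension.
rng = np.random.default_rng(0)
a = rng.random((8, 3, 3))
b = rng.random((8, 3, 3))
c = np.matmul(a, b)  # one vectorized call instead of a loop

# Loop version for comparison: this is what vectorization replaces
c_loop = np.stack([a[i] @ b[i] for i in range(8)])
assert np.allclose(c, c_loop)
```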
Just in Time compilation
ArrayFire performs run-time analysis of code to increase arithmetic intensity and memory throughput, while avoiding unnecessary temporary allocations. Its internal JIT compiler fuses operations into larger kernels at run time to make these optimizations possible.
Read more about how the ArrayFire JIT can improve the performance of your application.
Simple Example
Here is an example of ArrayFire code that performs a Monte Carlo estimation of pi.

import arrayfire as af

# Monte Carlo estimation of pi
def calc_pi_device(samples) -> float:
    # Simple, array-based API
    # Generate uniformly distributed random numbers
    x = af.randu(samples)
    y = af.randu(samples)
    # Supports Just-In-Time compilation:
    # the following line generates a single kernel
    within_unit_circle = (x * x + y * y) < 1
    # Intuitive function names
    return 4 * af.count(within_unit_circle) / samples
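For reference, a pure-Python analogue of the same estimator (using the standard library's random module, so it runs on any machine without ArrayFire) shows the expected behavior:

```python
import random

# Pure-Python analogue of calc_pi_device, for checking the math anywhere.
def calc_pi_host(samples: int) -> float:
    random.seed(0)  # fixed seed so the demonstration is deterministic
    inside = sum(
        1 for _ in range(samples)
        if random.random() ** 2 + random.random() ** 2 < 1
    )
    return 4 * inside / samples

print(calc_pi_host(100_000))  # approaches pi as samples grows
```

The device version performs the same computation, but the random generation, the distance test, and the reduction all run in parallel on the selected backend.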
Product Support
Free Community Options
- ArrayFire Mailing List (recommended)
Contact Us
If you need to contact us, visit our Contact Us page.
- Engineering: technical@arrayfire.com
- Sales: sales@arrayfire.com
Citations and Acknowledgements
If you redistribute ArrayFire, please follow the terms established in the license. If you wish to cite ArrayFire in an academic publication, please use the following reference: