Overview
About ArrayFire

ArrayFire is a high-performance library for parallel computing with an easy-to-use API. It enables users to write scientific computing code that is portable across CUDA, oneAPI, OpenCL, and CPU devices. This project provides Python bindings for the ArrayFire library.
Installing ArrayFire
- Install ArrayFire using a binary installer for Windows, macOS, or Linux, or build it from source.
Using ArrayFire
The array object is beautifully simple.
Array-based notation effectively expresses computational algorithms in readable math-resembling notation. Expertise in parallel programming is not required to use ArrayFire.
A few lines of ArrayFire code accomplish what can otherwise take hundreds of complicated lines in CUDA, oneAPI, or OpenCL kernels.
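As a rough illustration of the array-based style (using NumPy here, since ArrayFire's Python API follows the same conventions and ArrayFire itself may not be installed), a single array expression replaces an explicit element-wise loop:

```python
import numpy as np

# Explicit loop version: one scalar at a time
def saxpy_loop(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# Array-notation version: one line that reads like the math
def saxpy_array(a, x, y):
    return a * x + y

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(saxpy_array(2.0, x, y))  # same result as the loop version
```

With ArrayFire the same expression additionally runs on the GPU or other accelerator without any change to the code.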
Support for multiple domains
ArrayFire contains hundreds of functions across various domains including:
- Vector Algorithms
- Image Processing
- Computer Vision
- Signal Processing
- Linear Algebra
- Statistics
- and more

Each function is hand-tuned by ArrayFire developers with low-level optimizations.
Support for various data types and sizes
ArrayFire operates on common data shapes and sizes, including vectors, matrices, volumes, and N-dimensional arrays.
It supports common data types, including single and double precision floating point values, complex numbers, booleans, and 32-bit signed and unsigned integers.
Extending ArrayFire
ArrayFire can be used as a stand-alone application or integrated with existing CUDA, oneAPI, or OpenCL code.
ArrayFire supports a comprehensive list of devices, including x86 and ARM CPUs as well as CUDA, oneAPI, and OpenCL accelerators.
Each ArrayFire installation comes with:
- a CUDA backend (named 'libafcuda') for NVIDIA GPUs
- a oneAPI backend (named 'libafoneapi') for oneAPI devices
- an OpenCL backend (named 'libafopencl') for OpenCL devices
- a CPU backend (named 'libafcpu') to fall back to when no CUDA, oneAPI, or OpenCL devices are available
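The fallback order above can be sketched as a small pure-Python helper (hypothetical, for illustration only; the real selection happens inside the ArrayFire loader):

```python
# Hypothetical sketch of the backend fallback order described above.
# The actual ArrayFire library performs this selection natively at load time.
BACKEND_PRIORITY = ["cuda", "oneapi", "opencl", "cpu"]

def pick_backend(available):
    """Return the highest-priority backend present on the system."""
    for backend in BACKEND_PRIORITY:
        if backend in available:
            return backend
    raise RuntimeError("no ArrayFire backend available")

print(pick_backend({"opencl", "cpu"}))  # prefers opencl over cpu
```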
Vectorized and Batched Operations
ArrayFire supports batched operations on N-dimensional arrays. Batched operations in ArrayFire run in parallel, ensuring optimal usage of CUDA, oneAPI, or OpenCL devices.
Best performance with ArrayFire is achieved using vectorization techniques.
ArrayFire can also execute loop iterations in parallel with the gfor function.
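As an analogue of what batching replaces (shown with NumPy, since the idea carries over to ArrayFire's batched operations), many independent small operations collapse into one vectorized call instead of a Python-level loop:

```python
import numpy as np

# Batched matrix multiply: 8 independent 3x3 products in one call,
# analogous to how ArrayFire batches an operation over an extra dimension.
rng = np.random.default_rng(0)
a = rng.random((8, 3, 3))
b = rng.random((8, 3, 3))
c = np.matmul(a, b)  # one vectorized call instead of a loop

# Loop version for comparison: this is what vectorization replaces
c_loop = np.stack([a[i] @ b[i] for i in range(8)])
assert np.allclose(c, c_loop)
```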
Just in Time compilation
ArrayFire performs run-time analysis of code to increase arithmetic intensity and memory throughput, while avoiding unnecessary temporary allocations. Its internal JIT compiler fuses operations into larger kernels at run time to make these optimizations possible.
Read more about how the ArrayFire JIT can improve the performance of your application.
Simple Example
Here is an example of ArrayFire code that performs a Monte Carlo estimation of pi.

import arrayfire as af

# Monte Carlo estimation of pi
def calc_pi_device(samples) -> float:
    # Simple, array-based API
    # Generate uniformly distributed random numbers
    x = af.randu(samples)
    y = af.randu(samples)
    # Supports Just-In-Time compilation:
    # the following line generates a single kernel
    within_unit_circle = (x * x + y * y) < 1
    # Intuitive function names
    return 4 * af.count(within_unit_circle) / samples
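For reference, a pure-Python analogue of the same estimator (using the standard library's random module, so it runs on any machine without ArrayFire) shows the expected behavior:

```python
import random

# Pure-Python analogue of calc_pi_device, for checking the math anywhere.
def calc_pi_host(samples: int) -> float:
    random.seed(0)  # fixed seed so the demonstration is deterministic
    inside = sum(
        1 for _ in range(samples)
        if random.random() ** 2 + random.random() ** 2 < 1
    )
    return 4 * inside / samples

print(calc_pi_host(100_000))  # approaches pi as samples grows
```

The device version performs the same computation, but the random generation, the distance test, and the reduction all run in parallel on the selected backend.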
Product Support
Free Community Options
- ArrayFire Mailing List (recommended)
Contact Us
If you need to contact us, visit our Contact Us page.
- Engineering: technical@arrayfire.com
- Sales: sales@arrayfire.com
Citations and Acknowledgements
If you redistribute ArrayFire, please follow the terms established in the license. If you wish to cite ArrayFire in an academic publication, please use the following reference: