Getting Started

ArrayFire is a high-performance software library for parallel computing with an easy-to-use API. ArrayFire abstracts away many of the details of programming parallel architectures by providing a high-level container object, the array, that represents data stored on a CPU, GPU, FPGA, or other type of accelerator. This abstraction permits developers to write massively parallel applications in a high-level language without worrying about the low-level optimizations that are frequently required to achieve high throughput on most parallel architectures.

ArrayFire provides one generic container object, the array, on which functions and mathematical operations are performed. The `array` can represent one of many different basic data types:

- f32 real single-precision (`float`)
- c32 complex single-precision (`cfloat`)
- f64 real double-precision (`double`)
- c64 complex double-precision (`cdouble`)
- f16 real half-precision (`half_float::half`)
- b8 8-bit boolean values (`bool`)
- s32 32-bit signed integer (`int`)
- u32 32-bit unsigned integer (`unsigned`)
- u8 8-bit unsigned values (`unsigned char`)
- s64 64-bit signed integer (`intl`)
- u64 64-bit unsigned integer (`uintl`)
- s16 16-bit signed integer (`short`)
- u16 16-bit unsigned integer (`unsigned short`)

Most of these data types are supported on all modern GPUs; however, some older devices may lack support for double precision arrays. In this case, a runtime error will be generated when the array is constructed.

If not specified otherwise, `array`s are created as single-precision floating point numbers (`f32`).

ArrayFire arrays represent memory stored on the device. As such, creation and population of an array will consume memory on the device which cannot be freed until the `array` object goes out of scope. As device memory allocation can be expensive, ArrayFire also includes a memory manager which will re-use device memory whenever possible.
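The idea behind re-using memory can be pictured with a small host-side sketch. This is not ArrayFire's memory manager: the `CachingPool` class and its methods are hypothetical names chosen for illustration, and `std::malloc` merely stands in for an expensive device allocation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical caching pool: released buffers are kept and handed out again
// for later requests of the same size instead of being returned to the system.
class CachingPool {
public:
    CachingPool() = default;
    CachingPool(const CachingPool &) = delete;
    CachingPool &operator=(const CachingPool &) = delete;

    // Free every buffer that is still cached when the pool is destroyed.
    ~CachingPool() {
        for (auto &entry : free_lists_)
            for (void *ptr : entry.second) std::free(ptr);
    }

    // Return a cached buffer of exactly `bytes` bytes if one is available;
    // otherwise fall back to a fresh (expensive) allocation.
    void *acquire(std::size_t bytes) {
        std::vector<void *> &cached = free_lists_[bytes];
        if (!cached.empty()) {
            void *ptr = cached.back();
            cached.pop_back();
            return ptr;
        }
        ++fresh_allocations_;
        return std::malloc(bytes);
    }

    // Keep the buffer for later reuse rather than freeing it immediately.
    void release(void *ptr, std::size_t bytes) {
        free_lists_[bytes].push_back(ptr);
    }

    // Number of requests that could not be served from the cache.
    std::size_t fresh_allocations() const { return fresh_allocations_; }

private:
    std::unordered_map<std::size_t, std::vector<void *>> free_lists_;
    std::size_t fresh_allocations_ = 0;
};
```

In this sketch, acquiring 1024 bytes, releasing them, and acquiring 1024 bytes again returns the same pointer and counts only one fresh allocation, which is the saving a caching memory manager aims for.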

Arrays can be created using one of the array constructors. Below we show how to create 1D, 2D, and 3D arrays with uninitialized values:

    // Arrays may be created using the array constructor and dimensioned
    // as 1D, 2D, 3D; however, the values in these arrays will be undefined
    array undefined_1D(100); // 1D array with 100 elements
    array undefined_2D(10, 100); // 2D array of size 10 x 100
    array undefined_3D(10, 10, 10); // 3D array of size 10 x 10 x 10

However, uninitialized memory is likely not useful in your application. ArrayFire provides several convenient functions for creating arrays that contain pre-populated values, including constants, uniformly distributed random numbers, normally distributed random numbers, and the identity matrix:

    // Generate an array of size three filled with zeros.
    // If no data type is specified, ArrayFire defaults to f32.
    // The constant function generates the data on the device.
    array zeros = constant(0, 3);
    // Generate a 1x4 array of uniformly distributed [0,1] random numbers
    // The randu function generates the data on the device.
    array rand1 = randu(1, 4);
    // Generate a 2x2 array (or matrix, if you prefer) of random numbers
    // sampled from a normal distribution.
    // The randn function generates data on the device.
    array rand2 = randn(2, 2);
    // Generate a 3x3 identity matrix. The data is generated on the device.
    array iden = identity(3, 3);
    // Lastly, create a 2x1 array (column vector) of uniformly distributed
    // single-precision complex numbers (c32 data type):
    array randcplx = randu(2, 1, c32);

A complete list of ArrayFire functions that automatically generate data on the device may be found on the functions to create arrays page. As stated above, the default data type for arrays is f32 (a 32-bit floating point number) unless specified otherwise.

ArrayFire `array`s may also be populated from data found on the host. For example:

    // Create a six-element array on the host
    float hA[] = {0, 1, 2, 3, 4, 5};
    // Which can be copied into an ArrayFire Array using the pointer copy
    // constructor. Here we copy the data into a 2x3 matrix:
    array A(2, 3, hA);
    // ArrayFire provides a convenience function for printing array
    // objects in case you wish to see how the data is stored:
    af_print(A);
    // This technique can also be used to populate an array with complex
    // data (stored in {{real, imaginary}, {real, imaginary}, ... } format
    // as found in C's complex.h and C++'s <complex>).
    // Below we create a 3x1 column vector of complex data values:
    array dB(3, 1, (cfloat *)hA); // 3x1 column vector of complex numbers
    af_print(dB);
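To make the resulting layout concrete, here is a minimal sketch in plain C++ (no ArrayFire calls). It assumes column-major storage, which is how ArrayFire lays out arrays, and the interleaved `{real, imaginary}` format described in the comments above. The helper names `at_column_major` and `as_complex` are hypothetical.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Element (row, col) of a column-major matrix with `rows` rows:
// consecutive values in memory run down a column first.
float at_column_major(const float *data, std::size_t rows,
                      std::size_t row, std::size_t col) {
    return data[row + col * rows];
}

// Pair up consecutive floats as (real, imaginary), i.e. the interleaved
// layout used by C's complex.h and C++'s <complex>.
std::vector<std::complex<float>> as_complex(const float *data, std::size_t count) {
    std::vector<std::complex<float>> out;
    out.reserve(count / 2);
    for (std::size_t i = 0; i + 1 < count; i += 2)
        out.emplace_back(data[i], data[i + 1]);
    return out;
}
```

With `hA = {0, 1, 2, 3, 4, 5}` viewed as a 2x3 matrix, the first column holds 0 and 1, the second 2 and 3, and the third 4 and 5. Viewed as complex data, the same six floats become the three values (0, 1), (2, 3), and (4, 5).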

ArrayFire also supports array initialization from memory already on the GPU. For example, with CUDA one can populate an `array` directly using a call to `cudaMemcpy`:

    // Create an array on the host and copy it into a 2x3 ArrayFire array
    float host_ptr[] = {0, 1, 2, 3, 4, 5};
    array a(2, 3, host_ptr);
    // Create a CUDA device pointer, populate it with data from the host
    float *device_ptr;
    cudaMalloc((void **)&device_ptr, 6 * sizeof(float));
    cudaMemcpy(device_ptr, host_ptr, 6 * sizeof(float), cudaMemcpyHostToDevice);
    // Convert the CUDA-allocated device memory into an ArrayFire array:
    array b(2, 3, device_ptr, afDevice); // Note: afDevice (default: afHost)
    // Note that ArrayFire takes ownership over `device_ptr`, so memory will
    // be freed when `b` is destructed. Do not call cudaFree(device_ptr)!

Similar functionality exists for OpenCL too. If you wish to intermingle ArrayFire with CUDA or OpenCL code, we suggest you consult the CUDA interoperability or OpenCL interoperability pages for detailed instructions.

ArrayFire provides several functions for inspecting arrays, including functions to print their contents, query their dimensions, and determine other properties.

The af_print function can be used to print arrays that have already been generated or any expression involving arrays:

    // Generate two arrays
    array a = randu(2, 2);
    array b = constant(1, 2, 1);
    // Print them to the console using af_print
    af_print(a);
    af_print(b);
    // Print the results of an expression involving arrays:
    af_print(a.col(0) + b + .4);

The dimensions of an array may be determined using either a dim4 object or by accessing the dimensions directly using the dims() and numdims() functions:

    // Create a 4x5x2 array of uniformly distributed random numbers
    array a = randu(4, 5, 2);
    // Determine the number of dimensions using the numdims() function:
    printf("numdims(a) %d\n", a.numdims()); // 3
    // We can also find the size of the individual dimensions using either
    // the `dims` function:
    printf("dims = [%lld %lld]\n", a.dims(0), a.dims(1)); // 4,5
    // Or the elements of a dim4 object:
    dim4 dims = a.dims();
    printf("dims = [%lld %lld]\n", dims[0], dims[1]); // 4,5

In addition to dimensions, arrays also carry several properties including methods to determine the underlying type and size (in bytes). You can even determine whether the array is empty, real/complex, a row/column, or a scalar or a vector:

    // Get the type stored in the array. This will be one of the many
    // `af_dtype`s presented above:
    printf("underlying type: %d\n", a.type());
    // Arrays also have several convenience functions to determine if
    // an array contains complex or real values:
    printf("is complex? %d is real? %d\n", a.iscomplex(), a.isreal());
    // if it is a column or row vector
    printf("is vector? %d column? %d row? %d\n", a.isvector(), a.iscolumn(),
           a.isrow());
    // and whether or not the array is empty and how much memory it takes on
    // the device:
    printf("empty? %d total elements: %lld bytes: %zu\n", a.isempty(),
           a.elements(), a.bytes());
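The relationship between dimensions, element count, and byte size can be sketched in plain C++. This is an illustration rather than ArrayFire code: `Shape` is a hypothetical stand-in for a four-entry dimension object, the sketch assumes that unused trailing dimensions are stored as 1, and it ignores empty arrays.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a four-dimensional shape; unused trailing
// dimensions are stored as 1.
using Shape = std::array<long long, 4>;

// Number of dimensions, ignoring trailing dimensions of size 1.
unsigned count_dims(const Shape &shape) {
    unsigned n = 4;
    while (n > 1 && shape[n - 1] == 1) --n;
    return n;
}

// Total element count: the product of all four extents.
long long count_elements(const Shape &shape) {
    long long total = 1;
    for (long long extent : shape) total *= extent;
    return total;
}

// Storage in bytes for a given element size (4 bytes for f32).
std::size_t count_bytes(const Shape &shape, std::size_t element_size) {
    return static_cast<std::size_t>(count_elements(shape)) * element_size;
}
```

For the 4x5x2 array above, the shape is `{4, 5, 2, 1}`, which gives 3 dimensions, 40 elements, and 160 bytes when each element is a 4-byte `float`.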

For further information on these capabilities, we suggest you consult the full documentation on the array.

ArrayFire features an intelligent Just-In-Time (JIT) compilation engine that converts expressions using arrays into the smallest number of CUDA/OpenCL kernels. For most operations on arrays, ArrayFire functions like a vector library. That means that an element-wise operation, like `c[i] = a[i] + b[i]` in C, would be written more concisely without indexing, like `c = a + b`. When there are multiple expressions involving arrays, ArrayFire's JIT engine will merge them together. This "kernel fusion" technology not only decreases the number of kernel calls, but, more importantly, avoids extraneous global memory operations. Our JIT functionality extends across C/C++ function boundaries and only ends when a non-JIT function is encountered or a synchronization operation is explicitly called by the code.
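The effect of kernel fusion can be pictured with ordinary loops on the host. The sketch below is plain C++ and does not use ArrayFire. The function names `evaluate_unfused` and `evaluate_fused` and the expression `sqrt(2 * (a + b))` are assumptions chosen for illustration. The unfused version makes one pass over memory per operation and writes full-size temporaries, whereas the fused version computes each result element in a single pass.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused: every operation is its own pass and writes a full temporary.
// Both inputs are assumed to have the same length.
std::vector<float> evaluate_unfused(const std::vector<float> &a,
                                    const std::vector<float> &b) {
    const std::size_t n = a.size();
    std::vector<float> sum(n), scaled(n), result(n);
    for (std::size_t i = 0; i < n; ++i) sum[i] = a[i] + b[i];            // pass 1
    for (std::size_t i = 0; i < n; ++i) scaled[i] = 2.0f * sum[i];       // pass 2
    for (std::size_t i = 0; i < n; ++i) result[i] = std::sqrt(scaled[i]); // pass 3
    return result;
}

// Fused: one pass over the inputs, no intermediate arrays.
std::vector<float> evaluate_fused(const std::vector<float> &a,
                                  const std::vector<float> &b) {
    const std::size_t n = a.size();
    std::vector<float> result(n);
    for (std::size_t i = 0; i < n; ++i)
        result[i] = std::sqrt(2.0f * (a[i] + b[i]));
    return result;
}
```

Both functions return the same values. The fused form reads each input once and writes each output once, which is the kind of memory traffic saving the paragraph above attributes to the JIT engine.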

ArrayFire provides hundreds of functions for element-wise operations. All of the standard operators (e.g. +,-,*,/) are supported as are most transcendental functions (sin, cos, log, sqrt, etc.). Here are a few examples:

    array R = randu(3, 3);
    af_print(constant(1, 3, 3) + complex(sin(R))); // will be c32
    // rescale complex values to unit circle
    array a = randn(5, c32);
    af_print(a / abs(a));
    // calculate L2 norm of vectors
    array X = randn(3, 4);
    af_print(sqrt(sum(pow(X, 2)))); // norm of every column vector
    af_print(sqrt(sum(pow(X, 2), 0))); // same as above
    af_print(sqrt(sum(pow(X, 2), 1))); // norm of every row vector
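For readers who want to see what the reductions in the norm example compute, here is a minimal reference sketch in plain C++. It assumes a column-major matrix. The helper names `column_norms` and `row_norms` are hypothetical and not part of ArrayFire. Summing along dimension 0 collapses the rows and yields one value per column, and summing along dimension 1 yields one value per row.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// L2 norm of each column of a rows x cols matrix in column-major order.
std::vector<float> column_norms(const std::vector<float> &data,
                                std::size_t rows, std::size_t cols) {
    std::vector<float> norms(cols, 0.0f);
    for (std::size_t c = 0; c < cols; ++c) {
        float acc = 0.0f;
        for (std::size_t r = 0; r < rows; ++r) {
            const float v = data[r + c * rows];
            acc += v * v;
        }
        norms[c] = std::sqrt(acc);
    }
    return norms;
}

// L2 norm of each row of the same column-major matrix.
std::vector<float> row_norms(const std::vector<float> &data,
                             std::size_t rows, std::size_t cols) {
    std::vector<float> norms(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            const float v = data[r + c * rows];
            acc += v * v;
        }
        norms[r] = std::sqrt(acc);
    }
    return norms;
}
```

For the 2x2 matrix with columns (3, 4) and (6, 8), `column_norms` returns 5 and 10.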

To see the complete list of functions please consult the documentation on mathematical, linear algebra, signal processing, and statistics.

ArrayFire contains several platform-independent constants, like Pi, NaN, and Inf. If ArrayFire does not have a constant you need, you can create your own using the af::constant array constructor.

Constants can be used in all of ArrayFire's functions. Below we demonstrate their use in element selection and a mathematical expression:

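    // Replace every element greater than 0.5 with NaN
    array A = randu(5, 5);
    A(where(A > .5)) = NaN;
    // Estimate Pi: the fraction of random points in the unit square that
    // fall inside the quarter circle of radius 1 approaches Pi / 4
    array x = randu(10e6), y = randu(10e6);
    double pi_est = 4 * sum<float>(hypot(x, y) < 1) / 10e6;
    printf("estimation error: %g\n", fabs(Pi - pi_est));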
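The Pi estimate above relies on a simple geometric argument: points drawn uniformly from the unit square land inside the quarter circle of radius 1 with probability Pi / 4, so four times the observed fraction approximates Pi. A serial reference version in plain C++ (no ArrayFire) might look like the following sketch. The function name `estimate_pi` and the choice of `std::mt19937` as the generator are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Monte Carlo estimate of Pi from `samples` random points in the unit square.
// `samples` must be greater than zero.
double estimate_pi(std::size_t samples, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> unit(0.0f, 1.0f);
    std::size_t inside = 0;
    for (std::size_t i = 0; i < samples; ++i) {
        const float x = unit(gen);
        const float y = unit(gen);
        // Count the point if it lies inside the quarter circle.
        if (std::hypot(x, y) < 1.0f) ++inside;
    }
    return 4.0 * static_cast<double>(inside) / static_cast<double>(samples);
}
```

The ArrayFire version performs the same comparison and count, but evaluates all samples in parallel on the device instead of looping over them one at a time.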

Please note that our constants may, at times, conflict with macro definitions in standard header files. When this occurs, please refer to our constants using the `af::` namespace.

Like all functions in ArrayFire, indexing is also executed in parallel on the OpenCL/CUDA devices. Because of this, indexing becomes part of a JIT operation and is accomplished using parentheses instead of square brackets (i.e. as `A(0)` instead of `A[0]`). To index `af::array`s you may use one or a combination of the following functions:

- integer scalars
- seq() representing a linear sequence
- end representing the last element of a dimension
- span representing the entire dimension
- row(i) or col(i) specifying a single row/column
- rows(first,last) or cols(first,last) specifying a span of rows or columns

Please see the indexing page for several examples of how to use these functions.

Memory in `af::array`s may be accessed using the host() and device() functions. The `host` function *copies* the data from the device and makes it available in a C-style array on the host. As such, it is up to the developer to manage any memory returned by `host`. The `device` function returns a pointer/reference to device memory for interoperability with external CUDA/OpenCL kernels. As this memory belongs to ArrayFire, the programmer should not attempt to free/deallocate the pointer. For example, here is how we can interact with both OpenCL and CUDA:

    // Create an array consisting of 3 random numbers
    array a = randu(3, f32);
    // Copy an array on the device to the host:
    float *host_a = a.host<float>();
    // access the host data as a normal array
    printf("host_a[2] = %g\n", host_a[2]); // last element
    // and free memory using freeHost:
    freeHost(host_a);
    // Get access to the device memory for a CUDA kernel
    float *d_cuda = a.device<float>(); // no need to free this
    float value;
    cudaMemcpy(&value, d_cuda + 2, sizeof(float), cudaMemcpyDeviceToHost);
    printf("d_cuda[2] = %g\n", value);
    a.unlock(); // unlock to allow garbage collection if necessary
    // Because OpenCL uses references rather than pointers, accessing memory
    // is similar, but has a somewhat clunky syntax. For the C-API
    cl_mem d_opencl = (cl_mem)a.device<float>();
    // for the C++ API, you can just wrap this object into a cl::Buffer
    // after calling clRetainMemObject.

ArrayFire also provides several helper functions for creating `af::array`s from OpenCL `cl_mem` references and `cl::Buffer` objects. See the `include/af/opencl.h` file for further information.

Lastly, if you want only the first value from an `af::array`, you can get it using the scalar() function:

    array a = randu(3);
    float val = a.scalar<float>();
    printf("scalar value: %g\n", val);

In addition to supporting standard mathematical functions, arrays that contain integer data types also support bitwise operators including and, or, and shift:

    int h_A[] = {1, 1, 0, 0, 4, 0, 0, 2, 0};
    int h_B[] = {1, 0, 1, 0, 1, 0, 1, 1, 1};
    array A = array(3, 3, h_A), B = array(3, 3, h_B);
    af_print(A);
    af_print(B);
    array A_and_B = A & B;
    af_print(A_and_B);
    array A_or_B = A | B;
    af_print(A_or_B);
    array A_xor_B = A ^ B;
    af_print(A_xor_B);
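As a cross-check on what the bitwise operators produce, the same element-wise operations can be written in plain C++ without ArrayFire. This is a minimal sketch: the alias `Int9` and the helpers `elementwise`, `bit_and`, `bit_or`, and `bit_xor` are hypothetical names, and the arrays are treated as flat nine-element buffers because the operators act on each element independently.

```cpp
#include <array>
#include <cstddef>
#include <functional>

using Int9 = std::array<int, 9>;

// Apply a binary operation to corresponding elements of two arrays.
template <typename Op>
Int9 elementwise(const Int9 &a, const Int9 &b, Op op) {
    Int9 out{};
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = op(a[i], b[i]);
    return out;
}

Int9 bit_and(const Int9 &a, const Int9 &b) {
    return elementwise(a, b, std::bit_and<int>());
}

Int9 bit_or(const Int9 &a, const Int9 &b) {
    return elementwise(a, b, std::bit_or<int>());
}

Int9 bit_xor(const Int9 &a, const Int9 &b) {
    return elementwise(a, b, std::bit_xor<int>());
}
```

Applied to the host data above, the AND result is `{1, 0, 0, 0, 0, 0, 0, 0, 0}`, the OR result is `{1, 1, 1, 0, 5, 0, 1, 3, 1}`, and the XOR result is `{0, 1, 1, 0, 5, 0, 1, 3, 1}`.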

The ArrayFire API is wrapped into a unified C/C++ header. To use the library simply include the `arrayfire.h` header file and start coding!

The following sample uses the C API:

    #include <arrayfire.h>
    #include <stdio.h>

    // Generate random data and sum and print the result
    int main(void)
    {
        // generate random values
        af_array a;
        int n_dims = 1;
        dim_t dims[] = {10000};
        af_randu(&a, n_dims, dims, f32);

        // sum all the values
        double result;
        af_sum_all(&result, 0, a);
        printf("sum: %g\n", result);

        return 0;
    }


The same program written with the C++ API:

    #include <arrayfire.h>
    #include <cstdio>

    // Generate random data, sum and print the result.
    int main(void)
    {
        // Generate 10,000 random values
        af::array a = af::randu(10000);

        // Sum the values and copy the result to the CPU:
        double sum = af::sum<float>(a);
        printf("sum: %g\n", sum);

        return 0;
    }


Now that you have a general introduction to ArrayFire, where do you go from here? In particular, you might find these documents useful:

- Building an ArrayFire program on Linux
- Building an ArrayFire program on Windows
- Timing ArrayFire code

- Google Groups: https://groups.google.com/forum/#!forum/arrayfire-users
- ArrayFire Services: Consulting | Support | Training
- ArrayFire Blogs: http://arrayfire.com/blog/
- Email: technical@arrayfire.com