Introduction to Vectorization
Programmers and data scientists want to take advantage of fast, parallel computational devices. Writing vectorized code is necessary to get the best performance out of the current generation of parallel hardware and scientific computing software. However, writing vectorized code may not be immediately intuitive. ArrayFire provides many ways to vectorize a given code segment. In this tutorial, we present several methods to vectorize code using ArrayFire and discuss the benefits and drawbacks of each method.
Generic/Default Vectorization
By its very nature, ArrayFire is a vectorized library. Most functions operate on arrays as a whole, that is, on all elements in parallel. Wherever possible, existing vectorized functions should be used as opposed to manually indexing into arrays. For example, consider the following code:
import arrayfire as af
# Create an ArrayFire array 'a' using af.range()
a = af.range(10) # Creates an array [0, 1, 2, ..., 9]
# Loop through the elements of 'a' and increment each element by 1
for i in range(a.dims()[0]):
    a[i] = a[i] + 1  # Increment each element by 1
# Print the modified array 'a'
print("Modified array 'a':")
print(a)
Although completely valid, the code is very inefficient, as it launches one kernel per element, with each kernel operating on a single datum. Instead, the developer should have used ArrayFire’s overload of the + operator:
import arrayfire as af
# a = [0, 1, 2, ..., 9]
a = af.range(10)
# a = [1, 2, 3, ..., 10]
a = a + 1
This code will result in a single kernel that operates on all 10 elements of a in parallel.
Most ArrayFire functions are vectorized. A small subset of these includes:
| Operator Category | Functions |
|---|---|
| Arithmetic operations | +, -, *, /, %, >>, << |
| Logical operations | &&, \|\| (or), <, >, ==, !=, etc. |
| Numeric functions | abs(), floor(), round(), min(), max(), etc. |
| Complex operations | real(), imag(), conj(), etc. |
| Exponential and logarithmic functions | exp(), log(), expm1(), log1p(), etc. |
| Trigonometric functions | sin(), cos(), tan(), etc. |
| Hyperbolic functions | sinh(), cosh(), tanh(), etc. |
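To illustrate, several of the functions above can be combined in a single expression that is evaluated element-wise across the whole array. The following is a minimal sketch (the array x and the particular functions chosen are illustrative, assuming the classic arrayfire-python API with positional dimensions):
import arrayfire as af
# Sketch assuming the classic arrayfire-python API
# 5x5 matrix of uniform random values
x = af.randu(5, 5)
# All of these operate element-wise on every entry of x in parallel
y = af.sin(x) + af.cos(x)   # trigonometric functions
z = af.log1p(af.abs(y))     # numeric and logarithmic functions
mask = z > 0.5              # comparison produces an element-wise boolean array
print(mask)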
In addition to element-wise operations, many other functions are also vectorized in ArrayFire. Notice that even functions that perform some form of aggregation (e.g. sum() or min()), signal processing (like convolve()), and image processing (e.g. rotate()) support vectorization across different columns or images.
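For instance, a reduction such as sum() or min() can produce one result per column or per row in a single call by passing a dimension argument. The following is a minimal sketch, assuming the classic arrayfire-python API where reductions accept a dim keyword:
import arrayfire as af
# Sketch assuming the classic arrayfire-python API
# 4x3 matrix of random values
a = af.randu(4, 3)
# Sum along dimension 0: one result per column, computed in a single call
col_sums = af.sum(a, dim=0)   # shape: 1 x 3
# Minimum along dimension 1: one result per row
row_mins = af.min(a, dim=1)   # shape: 4 x 1
print(col_sums)
print(row_mins)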
For the signal processing case, if we have NUM images of size WIDTH by HEIGHT, one can convolve each image in a vectorized fashion as follows:
import arrayfire as af
# Define the filter coefficients as a list
g_coef = [1, 2, 1, 2, 4, 2, 1, 2, 1]
# Convert the coefficients list to an ArrayFire array and scale it
filter = (1.0 / 16.0) * af.Array(g_coef, (3, 3))  # 3x3 blur kernel
# Generate a random signal array of dimensions WIDTH x HEIGHT x NUM
WIDTH = 100
HEIGHT = 100
NUM = 3
signal = af.randu(WIDTH, HEIGHT, NUM)
# Perform 2D convolution of signal with filter
conv = af.convolve2(signal, filter)
# Print the result if needed
print("Convolution result:")
print(conv)
Similarly, one can rotate 100 images by 45 degrees in a single call using code like the following:
import arrayfire as af
import math
# Define dimensions
WIDTH = 256
HEIGHT = 256
NUM_IMAGES = 100
# Generate an array of 100 WIDTH x HEIGHT images of random numbers
imgs = af.randu(WIDTH, HEIGHT, NUM_IMAGES)
# Rotate all of the images in a single call (45 degrees, expressed in radians)
rot_imgs = af.rotate(imgs, math.pi / 4)
# Print the dimensions of rot_imgs to verify the result
print("Shape of rotated images:", rot_imgs.dims())
# Optionally, display or further process `rot_imgs` as needed
Although most functions in ArrayFire support vectorization, some do not, most notably the linear algebra functions. Even though they are not vectorized, linear algebra operations still execute in parallel on your hardware.
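For example, to invert every 2D slice of a three-dimensional array one still loops over the slices; each individual call nevertheless runs in parallel on the device. The following is a sketch under the assumption that af.inverse() operates on a single 2D matrix at a time (classic arrayfire-python API):
import arrayfire as af
# Sketch assuming af.inverse() handles one 2D matrix per call
# A stack of 10 small matrices (4x4x10)
mats = af.randu(4, 4, 10)
inverses = af.constant(0, 4, 4, 10)
# The inversion is not vectorized over the third dimension,
# so each slice is processed in its own call
for i in range(mats.dims()[2]):
    inverses[:, :, i] = af.inverse(mats[:, :, i])
print(inverses)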
Using the built-in vectorized operations should be the first and preferred method of vectorizing any code written with ArrayFire.
Batching
Consider the following example. Here we create a filter that we would like to apply to each of the weight vectors (the columns of a weight matrix). The naive solution would be to use a for-loop, as we have seen previously:
import arrayfire as af
# Create the filter and weight vectors
filter = af.randu((1, 5))
weights = af.randu((5, 5))
# Apply the filter to each column using a for-loop
filtered_weights = af.constant(0, (5, 5))
for i in range(weights.shape[1]):
    filtered_weights[:, i] = af.matmul(filter, weights[:, i])
# Print the filtered weights array
print("Filtered weights:")
print(filtered_weights)
However, as we have discussed above, this solution will be very inefficient. One may be tempted to implement a vectorized solution as follows:
import arrayfire as af
# Create the filter and weight vectors
filter = af.randu((1, 5))   # Shape: 1x5
weights = af.randu((5, 5))  # Shape: 5x5
# Attempt to multiply the filter and the weights element-wise
filtered_weights = filter * weights  # Incorrect: dimensions do not match
However, the dimensions of filter and weights do not match, so ArrayFire will generate a runtime error.
The correct solution is to use batched operations with matmul. By tiling the filter along the third dimension, the matrix multiplication is performed over the first two dimensions and batched over the third to obtain the filtered weights:
import arrayfire as af
# Create the filter and weight vectors
filter = af.randu((1, 5)) # Shape: 1x5
batched_filter = af.tile(filter, (1, 1, 5)) # batch on the third dimension
weights = af.randu((5, 5)) # Shape: 5x5
# Leverage matmul batching
filtered_weights = af.matmul(batched_filter, weights) # shape 1x5x5
filtered_weights = af.moddims(filtered_weights, (5, 5)) # reshape to 2d 5x5
# Print the filtered weights array
print("Filtered weights:")
print(filtered_weights)
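Tiling also pairs with plain element-wise operations. The following related sketch (not the matmul approach above; the goal of scaling each column of weights by its filter coefficient is an illustrative assumption, using the same tuple-shaped arguments as the snippets in this section) tiles the filter to the full 5x5 shape so a single element-wise multiply covers all columns:
import arrayfire as af
# Sketch of element-wise batching via tiling (illustrative)
filter = af.randu((1, 5))   # Shape: 1x5
weights = af.randu((5, 5))  # Shape: 5x5
# Repeat the 1x5 filter 5 times along the first dimension -> 5x5
tiled_filter = af.tile(filter, (5, 1))
# Element-wise multiply: column j of weights is scaled by filter[0, j]
scaled_weights = tiled_filter * weights
print(scaled_weights)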
Advanced Vectorization
We have seen the different methods ArrayFire provides to vectorize our code. Tying them all together is a slightly more involved process that needs to consider data dimensionality and layout, memory usage, nesting order, etc. An excellent example and discussion of these factors can be found in our blog post How to write vectorized code. (Ignore the GFOR section, as it is not applicable to Python.)
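To tie these methods together, here is a minimal end-to-end sketch; the mean-absolute-difference task, array names, and sizes are illustrative assumptions, written against the classic positional-dimension arrayfire-python API. It combines tiling, element-wise arithmetic, and vectorized reductions with no per-image loop:
import arrayfire as af
# Sketch assuming the classic positional-dimension arrayfire-python API
WIDTH, HEIGHT, NUM = 64, 64, 10
# A stack of NUM images and a single reference image
imgs = af.randu(WIDTH, HEIGHT, NUM)
reference = af.randu(WIDTH, HEIGHT)
# Tile the reference along the third dimension so it matches the stack
ref_tiled = af.tile(reference, 1, 1, NUM)
# Element-wise absolute difference for all images at once
diff = af.abs(imgs - ref_tiled)
# Reduce over the first two dimensions: one score per image (1x1xNUM)
scores = af.sum(af.sum(diff, dim=0), dim=1)
print(scores)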