Although ArrayFire is quite extensive, there remain many cases in which you may want to write custom kernels in CUDA or OpenCL. For example, you may wish to add ArrayFire to an existing code base to increase your productivity, or you may need to supplement ArrayFire's functionality with your own custom implementation of specific algorithms.
ArrayFire manages its own memory, runs within its own CUDA stream, and creates custom IDs for devices. As such, most of the interoperability functions focus on reducing potential synchronization conflicts between ArrayFire and CUDA.
It is fairly straightforward to interface ArrayFire with your own custom CUDA code. ArrayFire provides several functions to ease this process including:
Function | Purpose |
---|---|
af::array(...) | Construct an ArrayFire Array from device memory |
af::array.device() | Obtain a pointer to the device memory (implies lock() ) |
af::array.lock() | Removes ArrayFire's control of a device memory pointer |
af::array.unlock() | Restores ArrayFire's control over a device memory pointer |
af::getDevice() | Gets the current ArrayFire device ID |
af::setDevice() | Switches ArrayFire to the specified device |
afcu::getNativeId() | Converts an ArrayFire device ID to a CUDA device ID |
afcu::setNativeId() | Switches ArrayFire to the specified CUDA device ID |
afcu::getStream() | Get the current CUDA stream used by ArrayFire |
Below we provide two worked examples on how ArrayFire can be integrated into new and existing projects.
By default, ArrayFire manages its own memory and operates in its own CUDA stream. Thus there is a slight amount of bookkeeping that needs to be done in order to integrate your custom CUDA kernel.
If your kernels can share the ArrayFire CUDA stream, you should:
af::eval()
array::device()
nvcc
, linking with the afcuda
library.Notice that since ArrayFire and your kernels are sharing the same CUDA stream, there is no need to perform any synchronization operations as operations within a stream are executed in order.
This process is best illustrated with a fully worked example:
If your kernels needs to operate in their own CUDA stream, the process is essentially identical, except you need to instruct ArrayFire to complete its computations using the af::sync() function prior to launching your own kernel and ensure your kernels are complete using cudaDeviceSynchronize()
(or similar) commands prior to returning control of the memory to ArrayFire:
af::eval()
cudaDeviceSyncronize()
or similar commands.nvcc
, linking with the afcuda
library.Adding ArrayFire to an existing CUDA application is slightly more involved and can be somewhat tricky due to several optimizations we implement. The most important are as follows:
We will discuss the implications of these items below. To add ArrayFire to existing code you need to:
arrayfire.h
and af/cuda.h
in your source file-lafcuda
flags.To create the af::array objects, you should use one of the following constructors with src=afDevice
:
NOTE: With all of these constructors, ArrayFire's memory manager automatically assumes responsibility for any memory provided to it. Thus ArrayFire could free or reuse the memory at any later time. If this behavior is not desired, you may call array::unlock()
and manage the memory yourself. However, if you do so, please be cautious not to free memory when ArrayFire might be using it!
The seven steps above are best illustrated using a fully-worked example:
If you are using multiple devices with ArrayFire and CUDA kernels, there is one "gotcha" of which you should be aware. ArrayFire implements its own internal order of compute devices, thus a CUDA device ID may not be the same as an ArrayFire device ID. Thus when switching between devices it is important that you use our interoperability functions to get/set the correct device IDs. Below is a quick listing of the various functions needed to switch between devices along with some disambiguation as to the device identifiers used with each function:
Function | ID Type | Purpose |
---|---|---|
cudaGetDevice() | CUDA | Gets the current CUDA device ID |
cudaSetDevice() | CUDA | Sets the current CUDA device |
af::getDevice() | AF | Gets the current ArrayFire device ID |
af::setDevice() | AF | Sets the current ArrayFire device |
afcu::getNativeId() | AF -> CUDA | Convert an ArrayFire device ID to a CUDA device ID |
afcu::setNativeId() | CUDA -> AF | Set the current ArrayFire device from a CUDA ID |