Interoperability with OpenCL

Although ArrayFire is quite extensive, there remain many cases in which you may want to write custom kernels in CUDA or OpenCL. For example, you may wish to add ArrayFire to an existing code base to increase your productivity, or you may need to supplement ArrayFire's functionality with your own custom implementation of specific algorithms.

ArrayFire manages its own context, queue, memory, and creates custom IDs for devices. As such, most of the interoperability functions focus on reducing potential synchronization conflicts between ArrayFire and OpenCL.

Basics

It is fairly straightforward to interface ArrayFire with your own custom code. ArrayFire provides several functions to ease this process including:

FunctionPurpose
Array::new_from_device_ptrConstruct an ArrayFire Array from cl_mem
Array::device_ptrObtain a pointer to the device memory (implies lock)
Array::lockRemoves ArrayFire's control of a device memory pointer
Array::unlockRestores ArrayFire's control over a device memory pointer
get_platformGet ArrayFire's current cl_platform
get_deviceGets the current ArrayFire device ID
get_device_idGet ArrayFire's current cl_device_id
set_device_idSet ArrayFire's device from a cl_device_id
set_deviceSwitches ArrayFire to the specified device
get_contextGet ArrayFire's current cl_context
get_queueGet ArrayFire's current cl_command_queue
get_device_typeGet the current DeviceType

Note that the pointer returned by Array::device_ptr should be cast to cl_mem before using it with OpenCL opaque types. The pointer is a cl_mem internally that is force casted to pointer type by ArrayFire before returning the value to caller.

Additionally, the OpenCL backend permits the programmer to add and remove custom devices from the ArrayFire device manager. These permit you to attach ArrayFire directly to the OpenCL queue used by other portions of your application.

FunctionPurpose
add_device_contextAdd a new device to ArrayFire's device manager
set_device_contextSet ArrayFire's device from cl_device_id & cl_context
delete_device_contextRemove a device from ArrayFire's device manager

Below we provide two worked examples on how ArrayFire can be integrated into new and existing projects.

Adding custom OpenCL kernels to an existing ArrayFire application

By default, ArrayFire manages its own context, queue, memory, and creates custom IDs for devices. Thus there is some bookkeeping that needs to be done to integrate your custom OpenCL kernel.

If your kernels can share operate in the same queue as ArrayFire, you should:

  1. Obtain the OpenCL context, device, and queue used by ArrayFire
  2. Obtain cl_mem references to Array objects
  3. Load, build, and use your kernels
  4. Return control of Array memory to ArrayFire

Note, ArrayFire uses an in-order queue, thus when ArrayFire and your kernels are operating in the same queue, there is no need to perform any synchronization operations.

This process is best illustrated with a fully worked example:

//! A trivial example. Copied from ocl-core crate repository.
use af_opencl_interop as afcl;
use arrayfire as af;

use ocl_core::{ArgVal, Event};

use std::ffi::CString;

fn main() {
    af::info();
    let dims = af::dim4!(8);
    let af_buffer = af::constant(0f32, dims.clone());
    af::af_print!("af_buffer", af_buffer);

    let src = r#"
        __kernel void add(__global float* buffer, float scalar) {
            buffer[get_global_id(0)] += scalar;
        }
    "#;

    let af_did = afcl::get_device_id();
    let af_ctx = afcl::get_context(false);
    let af_que = afcl::get_queue(false);

    let _devid = unsafe { ocl_core::DeviceId::from_raw(af_did) };
    let contx = unsafe { ocl_core::Context::from_raw_copied_ptr(af_ctx) };
    let queue = unsafe { ocl_core::CommandQueue::from_raw_copied_ptr(af_que) };

    // Define which platform and device(s) to use. Create a context,
    // queue, and program then define some dims..
    let src_cstring = CString::new(src).unwrap();
    let program = ocl_core::create_program_with_source(&contx, &[src_cstring]).unwrap();
    ocl_core::build_program(
        &program,
        None::<&[()]>,
        &CString::new("").unwrap(),
        None,
        None,
    )
    .unwrap();

    // Fetch cl_mem from ArrayFire Array
    let ptr = unsafe { af_buffer.device_ptr() };
    let buffer = unsafe { ocl_core::Mem::from_raw_copied_ptr(ptr) };

    // Create a kernel with arguments matching those in the source above:
    let kernel = ocl_core::create_kernel(&program, "add").unwrap();
    ocl_core::set_kernel_arg(&kernel, 0, ArgVal::mem(&buffer)).unwrap();
    ocl_core::set_kernel_arg(&kernel, 1, ArgVal::scalar(&10.0f32)).unwrap();

    let ocl_dims: [usize; 3] = [dims[0] as usize, dims[1] as usize, dims[2] as usize];
    unsafe {
        ocl_core::enqueue_kernel(
            &queue,
            &kernel,
            1,
            None,
            &ocl_dims,
            None,
            None::<Event>,
            None::<&mut Event>,
        )
        .unwrap();
    }
    ocl_core::finish(&queue).unwrap();
    af_buffer.unlock(); //Give back control of cl_mem to ArrayFire memory manager

    af::af_print!("af_buffer after running Custom Kernel on it", af_buffer);
}

If your kernels needs to operate in their own OpenCL queue, the process is essentially identical, except you need to instruct ArrayFire to complete its computations using the sync function prior to launching your own kernel and ensure your kernels are complete using clFinish (or similar) commands prior to returning control of the memory to ArrayFire:

  1. Obtain the OpenCL context, device, and queue used by ArrayFire
  2. Obtain cl_mem references to Array objects
  3. Instruct ArrayFire to finish operations using sync
  4. Load, build, and use your kernels
  5. Instruct OpenCL to finish operations using clFinish() or similar commands.
  6. Return control of Array memory to ArrayFire

Adding ArrayFire to an existing OpenCL application

Adding ArrayFire to an existing application is slightly more involved and can be somewhat tricky due to several optimizations we implement. The most important are as follows:

  • ArrayFire assumes control of all memory provided to it.
  • ArrayFire does not (in general) support in-place memory transactions.

We will discuss the implications of these items below. To add ArrayFire to existing code you need to:

  1. Instruct OpenCL to complete its operations using clFinish (or similar)
  2. Instruct ArrayFire to use the user-created OpenCL Context
  3. Create ArrayFire arrays from OpenCL memory objects
  4. Perform ArrayFire operations on the Arrays
  5. Instruct ArrayFire to finish operations using sync
  6. Obtain cl_mem references for important memory
  7. Continue your OpenCL application

ArrayFire's memory manager automatically assumes responsibility for any memory provided to it. If you are creating an array from another RAII style object, you should retain it to ensure your memory is not deallocated if your RAII object were to go out of scope.

If you do not wish for ArrayFire to manage your memory, you may call the Array::unlock function and manage the memory yourself; however, if you do so, please be cautious not to call clReleaseMemObj on a cl_mem when ArrayFire might be using it!

Given below is a fully working example:

//! A trivial example. Copied from ocl-core crate repository.
use af_opencl_interop as afcl;
use arrayfire as af;

use ocl_core::{ContextProperties, Event};

fn main() {
    // Choose platform & device(s) to use. Create a context, queue,
    let platform_id = ocl_core::default_platform().unwrap();
    let device_ids = ocl_core::get_device_ids(&platform_id, None, None).unwrap();
    let device_id = device_ids[0];
    let context_properties = ContextProperties::new().platform(platform_id);
    let context =
        ocl_core::create_context(Some(&context_properties), &[device_id], None, None).unwrap();
    let queue = ocl_core::create_command_queue(&context, &device_id, None).unwrap();
    let dims = [8, 1, 1];

    // Create a `Buffer`:
    let mut vec = vec![0.0f32; dims[0]];
    let buffer = unsafe {
        ocl_core::create_buffer(
            &context,
            ocl_core::MEM_READ_WRITE | ocl_core::MEM_COPY_HOST_PTR,
            dims[0],
            Some(&vec),
        )
        .unwrap()
    };
    ocl_core::finish(&queue).unwrap(); //sync up before switching to arrayfire

    // Add custom device, context and associated queue to ArrayFire
    afcl::add_device_context(device_id.as_raw(), context.as_ptr(), queue.as_ptr());
    afcl::set_device_context(device_id.as_raw(), context.as_ptr());
    af::info();

    let mut af_buffer = af::Array::new_from_device_ptr(
        buffer.as_ptr() as *mut f32,
        af::Dim4::new(&[dims[0] as u64, 1, 1, 1]),
    );

    af::af_print!("GPU Buffer before modification:", af_buffer);

    af_buffer = af_buffer + 10f32;

    af::sync(af::get_device());
    unsafe {
        let ptr = af_buffer.device_ptr();
        let obuf = ocl_core::Mem::from_raw_copied_ptr(ptr);

        // Read results from the device into a vector:
        ocl_core::enqueue_read_buffer(
            &queue,
            &obuf,
            true,
            0,
            &mut vec,
            None::<Event>,
            None::<&mut Event>,
        )
        .unwrap();
    }
    println!("GPU buffer on host after ArrayFire operation: {:?}", vec);

    // Remove device from ArrayFire management towards Application Exit
    af::set_device(0); // Cannot pop when in Use, hence switch to another device
    afcl::delete_device_context(device_id.as_raw(), context.as_ptr());
}