Interoperability with OpenCL
Although ArrayFire is quite extensive, there remain many cases in which you may want to write custom kernels in CUDA or OpenCL. For example, you may wish to add ArrayFire to an existing code base to increase your productivity, or you may need to supplement ArrayFire's functionality with your own custom implementation of specific algorithms.
ArrayFire manages its own context, queue, memory, and creates custom IDs for devices. As such, most of the interoperability functions focus on reducing potential synchronization conflicts between ArrayFire and OpenCL.
Basics
It is fairly straightforward to interface ArrayFire with your own custom code. ArrayFire provides several functions to ease this process including:
Function | Purpose |
---|---|
Array::new_from_device_ptr | Construct an ArrayFire Array from cl_mem |
Array::device_ptr | Obtain a pointer to the device memory (implies lock ) |
Array::lock | Removes ArrayFire's control of a device memory pointer |
Array::unlock | Restores ArrayFire's control over a device memory pointer |
get_platform | Get ArrayFire's current cl_platform |
get_device | Gets the current ArrayFire device ID |
get_device_id | Get ArrayFire's current cl_device_id |
set_device_id | Set ArrayFire's device from a cl_device_id |
set_device | Switches ArrayFire to the specified device |
get_context | Get ArrayFire's current cl_context |
get_queue | Get ArrayFire's current cl_command_queue |
get_device_type | Get the current DeviceType |
Note that the pointer returned by Array::device_ptr should be cast to cl_mem
before using
it with OpenCL opaque types. The pointer is a cl_mem
internally that is force casted to pointer
type by ArrayFire before returning the value to caller.
Additionally, the OpenCL backend permits the programmer to add and remove custom devices from the ArrayFire device manager. These permit you to attach ArrayFire directly to the OpenCL queue used by other portions of your application.
Function | Purpose |
---|---|
add_device_context | Add a new device to ArrayFire's device manager |
set_device_context | Set ArrayFire's device from cl_device_id & cl_context |
delete_device_context | Remove a device from ArrayFire's device manager |
Below we provide two worked examples on how ArrayFire can be integrated into new and existing projects.
Adding custom OpenCL kernels to an existing ArrayFire application
By default, ArrayFire manages its own context, queue, memory, and creates custom IDs for devices. Thus there is some bookkeeping that needs to be done to integrate your custom OpenCL kernel.
If your kernels can share operate in the same queue as ArrayFire, you should:
- Obtain the OpenCL context, device, and queue used by ArrayFire
- Obtain cl_mem references to Array objects
- Load, build, and use your kernels
- Return control of Array memory to ArrayFire
Note, ArrayFire uses an in-order queue, thus when ArrayFire and your kernels are operating in the same queue, there is no need to perform any synchronization operations.
This process is best illustrated with a fully worked example:
//! A trivial example. Copied from ocl-core crate repository. use af_opencl_interop as afcl; use arrayfire as af; use ocl_core::{ArgVal, Event}; use std::ffi::CString; fn main() { af::info(); let dims = af::dim4!(8); let af_buffer = af::constant(0f32, dims.clone()); af::af_print!("af_buffer", af_buffer); let src = r#" __kernel void add(__global float* buffer, float scalar) { buffer[get_global_id(0)] += scalar; } "#; let af_did = afcl::get_device_id(); let af_ctx = afcl::get_context(false); let af_que = afcl::get_queue(false); let _devid = unsafe { ocl_core::DeviceId::from_raw(af_did) }; let contx = unsafe { ocl_core::Context::from_raw_copied_ptr(af_ctx) }; let queue = unsafe { ocl_core::CommandQueue::from_raw_copied_ptr(af_que) }; // Define which platform and device(s) to use. Create a context, // queue, and program then define some dims.. let src_cstring = CString::new(src).unwrap(); let program = ocl_core::create_program_with_source(&contx, &[src_cstring]).unwrap(); ocl_core::build_program( &program, None::<&[()]>, &CString::new("").unwrap(), None, None, ) .unwrap(); // Fetch cl_mem from ArrayFire Array let ptr = unsafe { af_buffer.device_ptr() }; let buffer = unsafe { ocl_core::Mem::from_raw_copied_ptr(ptr) }; // Create a kernel with arguments matching those in the source above: let kernel = ocl_core::create_kernel(&program, "add").unwrap(); ocl_core::set_kernel_arg(&kernel, 0, ArgVal::mem(&buffer)).unwrap(); ocl_core::set_kernel_arg(&kernel, 1, ArgVal::scalar(&10.0f32)).unwrap(); let ocl_dims: [usize; 3] = [dims[0] as usize, dims[1] as usize, dims[2] as usize]; unsafe { ocl_core::enqueue_kernel( &queue, &kernel, 1, None, &ocl_dims, None, None::<Event>, None::<&mut Event>, ) .unwrap(); } ocl_core::finish(&queue).unwrap(); af_buffer.unlock(); //Give back control of cl_mem to ArrayFire memory manager af::af_print!("af_buffer after running Custom Kernel on it", af_buffer); }
If your kernels needs to operate in their own OpenCL queue, the process is essentially identical,
except you need to instruct ArrayFire to complete its computations using the sync function
prior to launching your own kernel and ensure your kernels are complete using clFinish
(or similar) commands prior to returning control of the memory to ArrayFire:
- Obtain the OpenCL context, device, and queue used by ArrayFire
- Obtain cl_mem references to Array objects
- Instruct ArrayFire to finish operations using sync
- Load, build, and use your kernels
- Instruct OpenCL to finish operations using clFinish() or similar commands.
- Return control of Array memory to ArrayFire
Adding ArrayFire to an existing OpenCL application
Adding ArrayFire to an existing application is slightly more involved and can be somewhat tricky due to several optimizations we implement. The most important are as follows:
- ArrayFire assumes control of all memory provided to it.
- ArrayFire does not (in general) support in-place memory transactions.
We will discuss the implications of these items below. To add ArrayFire to existing code you need to:
- Instruct OpenCL to complete its operations using clFinish (or similar)
- Instruct ArrayFire to use the user-created OpenCL Context
- Create ArrayFire arrays from OpenCL memory objects
- Perform ArrayFire operations on the Arrays
- Instruct ArrayFire to finish operations using sync
- Obtain cl_mem references for important memory
- Continue your OpenCL application
ArrayFire's memory manager automatically assumes responsibility for any memory provided to it. If you are creating an array from another RAII style object, you should retain it to ensure your memory is not deallocated if your RAII object were to go out of scope.
If you do not wish for ArrayFire to manage your memory, you may call the Array::unlock function and manage the memory yourself; however, if you do so, please be cautious not to call
clReleaseMemObj
on acl_mem
when ArrayFire might be using it!
Given below is a fully working example:
//! A trivial example. Copied from ocl-core crate repository. use af_opencl_interop as afcl; use arrayfire as af; use ocl_core::{ContextProperties, Event}; fn main() { // Choose platform & device(s) to use. Create a context, queue, let platform_id = ocl_core::default_platform().unwrap(); let device_ids = ocl_core::get_device_ids(&platform_id, None, None).unwrap(); let device_id = device_ids[0]; let context_properties = ContextProperties::new().platform(platform_id); let context = ocl_core::create_context(Some(&context_properties), &[device_id], None, None).unwrap(); let queue = ocl_core::create_command_queue(&context, &device_id, None).unwrap(); let dims = [8, 1, 1]; // Create a `Buffer`: let mut vec = vec![0.0f32; dims[0]]; let buffer = unsafe { ocl_core::create_buffer( &context, ocl_core::MEM_READ_WRITE | ocl_core::MEM_COPY_HOST_PTR, dims[0], Some(&vec), ) .unwrap() }; ocl_core::finish(&queue).unwrap(); //sync up before switching to arrayfire // Add custom device, context and associated queue to ArrayFire afcl::add_device_context(device_id.as_raw(), context.as_ptr(), queue.as_ptr()); afcl::set_device_context(device_id.as_raw(), context.as_ptr()); af::info(); let mut af_buffer = af::Array::new_from_device_ptr( buffer.as_ptr() as *mut f32, af::Dim4::new(&[dims[0] as u64, 1, 1, 1]), ); af::af_print!("GPU Buffer before modification:", af_buffer); af_buffer = af_buffer + 10f32; af::sync(af::get_device()); unsafe { let ptr = af_buffer.device_ptr(); let obuf = ocl_core::Mem::from_raw_copied_ptr(ptr); // Read results from the device into a vector: ocl_core::enqueue_read_buffer( &queue, &obuf, true, 0, &mut vec, None::<Event>, None::<&mut Event>, ) .unwrap(); } println!("GPU buffer on host after ArrayFire operation: {:?}", vec); // Remove device from ArrayFire management towards Application Exit af::set_device(0); // Cannot pop when in Use, hence switch to another device afcl::delete_device_context(device_id.as_raw(), context.as_ptr()); }