Crates.io docs.rs GitHub

Blaze

Note
Blaze is still very much unstable and in early development. At this stage, methods and traits may be changed without deprecation period, backwards compatibility isn't guaranteed, and unexpected bugs may arise.

Blaze is a Rust crate that has the goal of rustifying and simplifying as much as possible the OpenCL experience.

This goal is achieved with the introduction of various abstractions, most of them with syntax simillar to a Rust counterpart.

Why Blaze?

OpenCL is an open-source cross-platform API for high-performance parallel computing, enabling the use of GPU's and other hardware accelerators to perform highly parallel computations.

Historically, this API has been used in C/C++ code, which has lead to the usual C and C++ problems (segfaults, memory leaks, etc).

Blaze fixes this by wrapping the OpenCL API in a Rust-friendly interface, and providing a simpler and safer, yet equally powerful, way to use it.

Design philosphy

The design philosophy of Blaze is based in the following principles.

Rustified experience

The main goal of Blaze is to provide a Rustified experience of OpenCL. This is achieved by wrapping the OpenCL API inside Rust types that offer safety guarantees provided by the Rust compiler, such as automatic release of memory through the Drop trait, and thread safety via Send and Sync.

Another major part of the Rust experience is the use of zero-cost abstractions, which also means that Blaze is written in such a way that the Rust compiler can maximally optimize the resulting code, such with the use of NonNull pointers.

Openness

Another major goal of Blaze is to be as open as possible. This translates into a series of rules that all developers must follow when contributing to the Blaze project.

Trust the user, with caution

In my opinion, one of the main reasons programming languges end up being cluttered messes is a lack of trust in the user.

To clarify, user here refers to the developers using our library.

A great example of this is Java, where getters and setters galore. This level of distrust amongst develepers is already too ingrained in the Java community to do anything about it, but it doesn't have to be this way with Rust.

No sealed traits

One of the most infuriating experiences I've had as a Rust developer is dealing with sealed traits. Sealed traits are acompliched with the following technique.

#![allow(unused)]
fn main() {
mod sealed {
    pub trait Sealed {}
}

pub trait SelaedTrait: sealed::Sealed {}
}

This way, the Sealed trait is public, but cannot be accessed outside of the crate, making SealedTrait only implementable inside the crate.

Whilst this may seem like a good idea in some ocations (for example, Blaze's SvmPointer trait), it shows a great level of distrust towards the user by the developer, and clashes with the open-source philosophy of this project, and the Rust project in general.

If a contributor finds itself in a position where a sealed trait could make sense, it must abstain from implementing it. Instead, an unsafe trait should be implemented, with a 'Safety' section in it's documentation detailing when it can be implemented safely.

Let's take the example of the SvmPointer trait. Whilst it might seem like a good idea at first, sealing it would make it impossible for downstream crates and programs to implement this trait for their own custom data type that may utilize SVM pointers (for example, a reference counted SVM pointer).

Sealed trait example 👎

#![allow(unused)]
fn main() {
mod sealed {
    pub trait Sealed {}
}

pub trait Primitive: sealed::Sealed {}

impl sealed::Sealed for i32 {}
impl Primitive for i32 {}
}

Unsafe trait example 👍

#![allow(unused)]
fn main() {
/// # Safety
/// This trait must only be implemented on primitive types
pub unsafe trait Primitive {}

unsafe impl Primitive for i32 {}
}

Raw types

In Blaze, the raw types are used to represent an underlying OpenCL objects plainly. They offer no extra functionality and have the following signature.

#[repr(transparent)]
pub struct RawType (NonNull<c_void>);

Raw types implement Drop and, since they're reference counted by OpenCL itself, they also implement Clone.

Currently, this are the existing raw types:

NameOpenCL typeOpenCL Version
RawPlatformcl_platform_idAll
RawDevicecl_device_idAll
RawContextcl_contextAll
RawCommandQueuecl_command_queueAll
RawProgramcl_programAll
RawKernelcl_kernelAll
RawEventcl_eventAll
RawMemObjectcl_memAll
RawBuffercl_memAll
RawPipecl_mem2.0 or higher

Note that since raw types are transparent wrappers of NonNull<c_void>, RawType and Option<RawType> have the same size, alongside other optimizations of NonNull performed by the compiler.

Errors

Blaze provides three types to describe errors: ErrorKind, ErrorCode and Error.
ErrorType is an enum that maps to the OpenCL error codes, whilst Error also contains an optional description and (in debug mode) a backtrace, and has athe following signature:

pub struct Error {
    pub ty: ErrorCode,
    pub desc: Option<Arc<dyn Display>>,
    #[cfg(debug_assertions)]
    pub backtrace: Arc<Backtrace>
}

The Backtrace is provided in debug mode to facilitate the debugging of errors, allowing to find their source more quickly.

ErrorKind and ErrorCode

Most raw OpenCL errors will be converted to ErrorKind automatically, but there are instances where the given error code is not recognized as an ErrorKind. For such cases, ErrorCode exists.

pub enum ErrorCode {
    Kind (ErrorKind),
    Unknown (i32)
}

Contexts

A Blaze context is the owner of a single OpenCL context and one or more OpenCL command queues, all of them associated to the context.

It's task is to manage the distribution of command queues amongst the various enqueue functions, maximizing performance by distributing the work amongst them.

The simplified signature of the Context trait is the following:

pub trait Context {
    fn as_raw (&self) -> &RawContext;
    fn queues (&self) -> &[RawCommandQueue];
    fn next_queue (&self) -> &RawCommandQueue;
}

The queues method returns a list with all the command queues owned by the context.
The next_queue method returns the next command queue to be used in an enqueue function.

Simple Context

The SimpleContext type is the most basic implementation of a context. It contains a single command queue, so no complicated logic is required for it's use, since next_queue will always return the same queue.

Global Context

Inspired by Rust's Allocator syntax, Global is a ZST that will be treated as the default context for most operations requiring of an OpenCL context or command queue.

Like with the Allocator API, you can specify a global context with the #[global_context] macro.

use blaze_rs::prelude::*;

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

#[test]
fn with_global () -> Result<()> {
    // Initialize two buffers
    let buffer : Buffer<i32> = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::READ_ONLY, false)?;
    let buffer2 : Buffer<i32> = Buffer::new(&[5, 4, 3, 2, 1], MemAccess::WRITE_ONLY, false)?;

    // Read the full contents of both buffers
    let read = buffer.read_blocking(.., None)?;
    let read2 = buffer2.read_blocking(.., Some(core::slice::from_ref(&read)))?;

    assert_eq!(read.as_slice(), &[5, 4, 3, 2, 1]);
    assert_eq!(read2.as_slice(), &[1, 2, 3, 4, 5]);
    Ok(())
}

#[test]
fn without_global () -> Result<()> {
    // Initialize a context.
    let ctx = SimpleContext::default()?;
    
    // Initialize two buffers
    let buffer : Buffer<i32, &SimpleContext> = Buffer::new_in(&ctx, &[1, 2, 3, 4, 5], MemAccess::READ_ONLY, false)?;
    let buffer2 : Buffer<i32, &SimpleContext> = Buffer::new_in(&ctx, &[5, 4, 3, 2, 1], MemAccess::WRITE_ONLY, false)?;

    // Read the full contents of both buffers
    let read = buffer.read_blocking(.., None)?;
    let read2 = buffer2.read_blocking(.., Some(core::slice::from_ref(&read)))?;

    assert_eq!(read.as_slice(), &[5, 4, 3, 2, 1]);
    assert_eq!(read2.as_slice(), &[1, 2, 3, 4, 5]);
    Ok(())
}

Note that unlike with the Allocator API, no default global context is set, so you'll need to specify one explicitly if you want to use it.

Buffers

The Buffer type is a wrapper arround a RawBuffer that provides extra functionality and safety guarantees. It has the following signature:

pub struct Buffer<T: Copy, C: Context = Global> {
    inner: RawBuffer,
    ctx: C,
    phtm: PhantomData<T>
}

Example

use std::ptr::NonNull;
use blaze_rs::{prelude::*, context::SimpleContext, buffer::BufferRange};

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

fn with_buffer () -> Result<()> {
    let values = [1, 2, 3, 4, 5];
    let buffer = Buffer::new(&values, MemAccess::READ_ONLY, false)?;

    let read: Vec<i32> = buffer.read_blocking(.., None)?;
    assert_eq!(values.as_slice(), read.as_slice());

    Ok(())
}

fn without_buffer () -> Result<()> {
    let values = [1, 2, 3, 4, 5];

    let buffer = RawBuffer::new(
        values.len() * core::mem::size_of::<i32>(), 
        MemFlags::new(MemAccess::READ_ONLY, HostPtr::COPY), 
        NonNull::new(values.as_ptr() as *mut _)
    )?;
    
    let mut read = Vec::<i32>::with_capacity(values.len());
    unsafe {
        let evt : RawEvent = buffer.read_to_ptr(BufferRange::from_parts::<i32>(0, 5)?, read.as_mut_ptr().cast(), None)?;
        let _ : () = evt.join_by_ref()?;
        read.set_len(values.len());
    }

    assert_eq!(values.as_slice(), read.as_slice());
    Ok(())
}

Rectangular Buffer

Note

  • Rectangular buffers require OpenCL 1.1 or higher.
  • Currently, only 2 dimensional rect buffers are implemented.

Rectangular buffers allow the use of buffers as 2D or 3D arrays, with the same safety guarantees provided by Buffer. They have the following signature:

pub struct BufferRect2D<T: Copy, C: Context = Global> {
    inner: Buffer<T, C>,
    width: NonZeroUsize,
    height: NonZeroUsize
}

Also, a host implementation of rectangular buffers exists, faciltating the use of rectangular buffers on the host.

pub struct Rect2D<T, A: Allocator = Global> {
    ptr: NonNull<T>,
    width: NonZeroUsize,
    height: NonZeroUsize,
    alloc: A
}

Note that the Global here refers to Rust's Global allocator.

Example

use blaze_rs::{prelude::*, context::SimpleContext};

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

#[test]
fn main () -> Result<()> {
    /*
        [
            1, 2, 3,
            4, 5, 6,
            7, 8, 9,
        ]
    */
    let buffer = BufferRect2D::new(&[1, 2, 3, 4, 5, 6, 7, 8, 9], 3, MemAccess::READ_ONLY, false)?;
    let evt = buffer.read((1.., 1..), EMPTY)?;
    let segment = evt.wait()?;

    /*
        [
            5, 6,
            8, 9
        ]
    */
    assert_eq!(segment.as_slice(), &[5, 6, 8, 9]);
    Ok(())
}

Mapping

OpenCL provides a feature on buffers (and other memory objects) named mapping. With this feature, a region of device memory is mapped to hast memory, where it can be more efficiently accessed. Blaze offers support for this feature through the use of the MapBufferGuard and MapBufferMutGuard, which act simillarly to a RwLockReadGuard and a RwLockWriteGuard.

Examples

use std::ops::Deref;
use blaze_rs::prelude::*;

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

fn main () -> Result<()> {
    let buffer = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::default(), false)?;
    let map = buffer.map_blocking(.., None)?;

    assert_eq!(map.deref(), &[1, 2, 3, 4, 5]);
    Ok(())
}
use std::ops::Deref;
use blaze_rs::prelude::*;

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

fn main () -> Result<()> {
    let mut buffer = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::default(), false)?;
    let map = buffer.map_blocking(.., None)?;
    let mut mut_map = buffer.map_mut_blocking(.., None)?; // compile error: cannot borrow `buffer` as mutable because it is also borrowed as immutable

    assert_eq!(map.deref(), &[1, 2, 3, 4, 5]);
    Ok(())
}

Note that when maping mutably, the OpenCL mapping is done as a read-write mapping, not a write-only map.

Slices

Note slices require OpenCL 1.1 or higher.

Buffer slices act the same way as Rust slices, allowing access to a specified region of a buffer.

Examples

use std::ops::Deref;
use blaze_rs::prelude::*;

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

fn main () -> Result<()> {
    let buffer = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::default(), false)?;
    let slice = buffer.slice(..)?;

    assert_eq!(slice.deref(), &buffer);
    Ok(())
}
use std::ops::Deref;
use blaze_rs::prelude::*;

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

fn main () -> Result<()> {
    let mut buffer = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::default(), false)?;
    let slice = buffer.slice(None)?;
    let mut mut_slice = buffer.slice_mut(None)?; // compile error: cannot borrow `buffer` as mutable because it is also borrowed as immutable

    assert_eq!(slice, slice_mut);
    Ok(())
}

Note that when maping mutably, the OpenCL mapping is done as a read-write mapping, not a write-only map.

Events

Blaze events are considered a mixture of a Rust Future and JoinHandle. Their signature is the following:

use std::sync::mpsc::Sender;

pub struct Event<C> {
    inner: RawEvent,
    consumer: C,
    #[cfg(not(feature = "cl1_1"))]
    send: Sender<EventCallback>,
    #[cfg(feature = "cl1_1")]
    send: PhantomData<Sender<()>>,
}

Blaze events contain their underlying RawEvent alongside a Consumer.

Consumers

An event's Consumer is the responsable to perform the necessary underlying operations when the event has completed, with the ability to return a value.

pub trait Consumer<'a>: 'a {
    type Output;
    
    unsafe fn consume (self) -> Result<Self::Output>;
}

Event Callbacks

OpenCL event callbacks are supported from OpenCL 1.1 onwards. In Blaze, when using OpenCL 1.0, every time you pass a new callback to an Event (with on_complete, for example) that callback will be sent to a diferent thread, which will execute it when appropiate.

Callback handling threads are spawned for every thread from which you send a callback. This means that if, for example, you call on_complete on 10 different threads, 10 new threads will be spawned to handle the callbacks spawned on each thread, but if you call on_complete two times on one thread and once in a differen thread, only 2 new threads will be spawned.

These new threads will complete execution whenever their recievers are disconnected and they have no more listeners to handle.

Asynchronous event

Events can be joined asynchronously with the join_async method and the EventWait type.

use std::time::Duration;
use blaze_rs::{event::{EventWait, consumer::Noop}, prelude::*};

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

#[tokio::main] async fn main () -> Result<()> {
let flag = FlagEvent::new()?;
let sub: EventWait<Noop> = flag
    .subscribe()
    .into_event()
    .join_async()?;

let handle = tokio::spawn(async move {
    tokio::time::sleep(Duration::from_secs(2)).await;
    flag.try_mark(None)?;
    return Ok();
});

tokio::try_join!(handle, sub)?;
Ok(()) }

Note that join_async requires the futures feature.

Flag events

Note Flag events require OpenCL 1.1 or higher

Flag events allow the creation of events that complete whenever the user marks them. Flag events are marked via the try_mark method, which returns true if the event was successfully marked, and false if the event was already marked.

use blaze_rs::prelude::*;

#[global_context]
static CTX : SimpleContext = SimpleContext::default();

let flag = FlagEvent::new()?;
assert_eq!(flag.status(), Ok(EventStatus::Submitted));
assert_eq!(flag.try_mark(None), Ok(true));
assert_eq!(flag.status(), Ok(EventStatus::Complete));
assert_eq!(flag.try_mark(None), Ok(false));

Ok::<_, Error>()

Abortable event

Note Abortable events require OpenCL 1.1 or higher

Abortable events allow you to abort events before they're completed. When an abortable event is created, a new flag event is created, alongside a host-side flag.

If the event completes before it's aborted, the flag event is marked with it's result and the host-side flag is marked as not-aborted. The abortable event will return Ok(Some(_)) if it succedded, and Err(_) if it didn't.

If the event is aborted before completion, the flag event is marked without an error and the host-side flag is marked as aborted. The abortable event will return Ok(None).

use blaze_rs::{event::{AbortHandle, consumer::{Noop, AbortableEvent}}, prelude::*};

#[global_context]
static CTX : SimpleContext = SimpleContext::default();

#[test]
fn aborted () -> Result<()> {
    let flag = FlagEvent::new()?;
    let (event, handle): (AbortableEvent<Noop>, AbortHandle) = flag
        .subscribe()
        .into_event()
        .abortable()?;
    
    assert_eq!(handle.try_abort(), Ok(true)); // Abort the event
    assert_eq!(event.join(), Ok(None));
    Ok(())
}

#[test]
fn not_aborted () -> Result<()> {
    let flag = FlagEvent::new()?;
    let (event, handle): (AbortableEvent<Noop>, AbortHandle) = flag
        .subscribe()
        .into_event()
        .abortable()?;
    
    assert_eq!(flag.try_mark(None), Ok(true)); // Complete the event
    assert_eq!(event.join(), Ok(Some(())));
    assert_eq!(handle.try_abort(), Ok(false));
    Ok(())
}

Scopes

Inspired by Rust's scopes, Blaze scopes allow you to use events with non-'static lifetimes.

use blaze_rs::{buffer, prelude::*};

#[global_context]
static CTX : SimpleContext = SimpleContext::default();

let buffer: Buffer<i32> = buffer![1, 2, 3, 4, 5]?;

let [left, right]: [Vec<i32>; 2] = scope(|s| {
    let left = buffer.read(s, ..2, None)?;
    let right = buffer.read(s, 2.., None)?;
    return Event::join_all_sized_blocking([left, right]);
})?;

assert_eq!(left.as_slice(), &[1, 2]);
assert_eq!(right.as_slice(), &[3, 4, 5]);
Ok::<_, Error>(())

Asynchronous scopes

With the scope_async macro, you can create asynchronous scopes. These async scopes return a Future that completes when all the events spawned inside the scope have completed.

use blaze_rs::{buffer, scope_async, prelude::*};

#[tokio::main] async fn main () -> Result<()> {
let buffer = buffer![1, 2, 3, 4, 5]?;

let (left, right) = scope_async!(|s| async {
    let left = buffer.read(s, ..2, None)?.join_async()?;
    let right = buffer.read(s, ..2, None)?.join_async()?;
    return tokio::try_join!(left, right);
}).await?;

assert_eq!(left, vec![1, 2]);
assert_eq!(right, vec![3, 4, 5]);
Ok::<_, Error>(()) }

Unlike it's blocking counterpart, scope_async does not ensure that all events inside the future will be ran. Rather, if the future is dropped before completion, it's destructor will block the current thread until every already-started event has completed, and discarting the remaining uninitialized events.

use blaze_rs::{buffer, scope_async};
use futures::{task::*, future::*};

#[tokio::main] async fn main () -> Result<()> {
let buffer = buffer![1, 2, 3, 4, 5]?;

let mut scope = Box::pin(scope_async!(|s| async {
    let left = buffer
        .read(s, ..2, None)?
        .inspect(|_| println!("Left done!"))
        .join_async()?
        .await;

    let right = buffer
        .read(s, ..2, None)?
        .inspect(|_| println!("Right done!"))
        .join_async()?
        .await;

    return Ok((left, right));
}));

let mut ctx = std::task::Context::from_waker(noop_waker_ref());
let _ = scope.poll_unpin(&mut ctx)?;
drop(scope); // prints "Left done!", doesn't print "Right done!"
Ok::<_, Error>(()) }

Program

To ease the safe use of OpenCL programs and kernels, Blaze provides the #[blaze] macro. The blaze macro will turn pseudo-normal Rust extern syntax into a struct that will hold a program and it's various kernels, providing a safe API to call the kernels.

Example

use blaze_rs::prelude::*;
use core::mem::*;

#[blaze(MatrixOps)]
#[link = include_str!("matrixops.cl")]
extern "C" {
    #[link_name = "mul"]
    fn matrix_mul (k: u32, lhs: *const f32, rhs: *const f32, out: *mut MaybeUninit<f32>);
}

Expands to

use blaze_rs::prelude::*;
use core::mem::*;

struct MatrixOps<C: ::blaze::context::Context = ::blaze::context::Global> {
    __blaze_inner__: ::blaze::core::RawProgram,
    __blaze_ctx__: C,
    matrix_mul: ::std::sync::Mutex<::blaze::core::RawKernel>,
}

impl MatrixOps<::blaze::context::Global> {
    #[inline(always)]
    fn new<'a>(options: impl Into<Option<&'a str>>) -> ::blaze::core::Result<Self> {
        Self::new_in(::blaze::context::Global, options)
    }
}

impl<C: ::blaze::context::Context> MatrixOps<C> {
    fn new_in<'a>(ctx: C, options: impl Into<Option<&'a str>>) -> ::blaze::core::Result<Self> {
        let __blaze_ctx__ = ctx;
        let (__blaze_inner__, __blaze_kernels__) =
            ::blaze::core::RawProgram::from_source_in(&__blaze_ctx__, include_str!("matrixops.cl"), options)?;
        let mut matrix_mul = None;
        for __blaze_kernel__ in __blaze_kernels__.into_iter() {
            match __blaze_kernel__.name()?.as_str() {
                "mul" => matrix_mul = unsafe { Some(__blaze_kernel__.clone()) },
                __other => {
                    return Err(::blaze::core::Error::new(
                        ::blaze::core::ErrorType::InvalidKernel,
                        {
                            let res = ::alloc::fmt::format(::core::fmt::Arguments::new_v1(
                                &["unknown kernel \'", "\'"],
                                &[::core::fmt::ArgumentV1::new_display(&__other)],
                            ));
                            res
                        },
                    ))
                }
            }
        }
        let matrix_mul = match matrix_mul {
            Some(__x) => ::std::sync::Mutex::new(__x),
            None => {
                return Err(::blaze::core::Error::new(
                    ::blaze::core::ErrorType::InvalidKernel,
                    "kernel \'matrix_mul\' not found",
                ))
            }
        };
        Ok(Self {
            __blaze_inner__,
            __blaze_ctx__,
            matrix_mul,
        })
    }
}

impl<C: ::blaze::context::Context> ::std::ops::Deref for MatrixOps<C> {
    type Target = ::blaze::core::RawProgram;
    #[inline(always)]
    fn deref(&self) -> &Self::Target {
        &self.__blaze_inner__
    }
}

struct MatrixMul<LHS, RHS, OUT> {
    __blaze_inner__: ::blaze::event::RawEvent,
    lhs: LHS,
    rhs: RHS,
    out: OUT,
}

impl<C: ::blaze::context::Context> MatrixOps<C> {
    unsafe fn matrix_mul<
        LHS: ::core::ops::Deref,
        RHS: ::core::ops::Deref,
        OUT: ::core::ops::DerefMut,
        const N: usize,
    >(
        &self,
        k: u32,
        lhs: LHS,
        rhs: RHS,
        out: OUT,
        global_work_dims: [usize; N],
        local_work_dims: impl Into<Option<[usize; N]>>,
        wait: impl Into<::blaze::event::WaitList>,
    ) -> ::blaze::core::Result<MatrixMul<LHS, RHS, OUT>>
    where
        <LHS as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<f32>,
        <RHS as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<f32>,
        <OUT as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<MaybeUninit<f32>>,
    {
        let mut wait = wait.into();
        let mut __blaze_kernel__ = match self.matrix_mul.lock() {
            Ok(x) => x,
            Err(e) => e.into_inner()
        };
        __blaze_kernel__.set_argument(0u32, &k)?;
        ::blaze::buffer::KernelPointer::set_arg(
            ::core::ops::Deref::deref(&lhs),
            &mut __blaze_kernel__,
            &mut wait,
            1u32,
        )?;
        ::blaze::buffer::KernelPointer::set_arg(
            ::core::ops::Deref::deref(&rhs),
            &mut __blaze_kernel__,
            &mut wait,
            2u32,
        )?;
        ::blaze::buffer::KernelPointer::set_arg(
            ::core::ops::Deref::deref(&out),
            &mut __blaze_kernel__,
            &mut wait,
            3u32,
        )?;
        let __blaze_inner__ = __blaze_kernel__.enqueue_with_context(
            &self.__blaze_ctx__,
            global_work_dims,
            local_work_dims,
            wait,
        )?;
        drop(__blaze_kernel__);
        ::blaze::buffer::KernelPointer::complete(
            ::core::ops::Deref::deref(&lhs),
            &__blaze_inner__,
        )?;
        ::blaze::buffer::KernelPointer::complete(
            ::core::ops::Deref::deref(&rhs),
            &__blaze_inner__,
        )?;
        ::blaze::buffer::KernelPointer::complete(
            ::core::ops::Deref::deref(&out),
            &__blaze_inner__,
        )?;
        Ok(MatrixMul {
            __blaze_inner__,
            lhs,
            rhs,
            out,
        })
    }
}

impl<LHS: ::core::ops::Deref, RHS: ::core::ops::Deref, OUT: ::core::ops::DerefMut>
    ::blaze::event::Event for MatrixMul<LHS, RHS, OUT>
where
    <LHS as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<f32>,
    <RHS as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<f32>,
    <OUT as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<MaybeUninit<f32>>,
{
    type Output = (LHS, RHS, OUT);
    #[inline(always)]
    fn as_raw(&self) -> &::blaze::event::RawEvent {
        &self.__blaze_inner__
    }
    #[inline(always)]
    fn consume(
        self,
        err: Option<::blaze::prelude::Error>,
    ) -> ::blaze::prelude::Result<Self::Output> {
        if let Some(err) = err {
            return Err(err);
        };
        Ok((self.lhs, self.rhs, self.out))
    }
}

Shared Virtual Memory

All functionality related to SVM requires the svm features.

Blaze implements support for shared virtual memory (also known as SVM) through Rust's Allocator API.

SVM allows the host and device portions of an OpenCL application to seamlessly share pointers and complex pointer-containing data-structures. Moreover, as described in this article, SVM is more than just about shared address space. It also defines memory model consistency guarantees for SVM allocations. This enables the host and the kernel sides to interact with each other using atomics for synchronization, like two distinct cores in a CPU.

Intel has a great article about SVM here

Coarse v. Fine grained

OpenCL's SVM implementation consist's of two diferent variant: Coarse grained and fine grained memory. With coarse grained memory, the SVM user must indicate to OpenCL the synchronization points between host and device code, whilst with fine-grained memory, this process is done automatically by OpenCL.

But with Blaze, you don't have to think about this implementation details, since it will automatically detect the context's capabilities, and use fine-grained allocations whenever possible. In situiations where fine-grained allocations are unavailable, Blaze will indicate the synchronization points to OpenCL automatically whenever a pointer is passed as a kernel argument.

Note that coarse-grained synchronization points are only set automatically for kernels generated with the #[blaze] macro

Atomics

Note that SVM atomic are only available when the context devices support them, and can onloy be used in fine grained allocations.

OpenCL supports the use of atomics through SVM pointers. However, the following would not result in a correct SVM atomic implementation.

use blaze_rs::{prelude::*, svm::*};
use std::sync::atomic::AtomicU32;

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

#[blaze(BadAtomic)]
#[link = ...]
extern "C" {
    fn use_atomic (value: *const AtomicU32);
}

fn main () -> Result<()> {
    let program = BadAtomic::new(None)?;
    let pointer = SvmBox::new_in(AtomicU32::default(), Svm::try_default()?);

    let result = unsafe {
        program.use_atomic(&pointer)
    };

    Ok(())
}

This implementation is incorrect because, when using atomics, OpenCL must be notified that the SVM allocation needs support for atomics. The correct implementation would be the following.

use blaze_rs::{prelude::*, svm::*};
use std::sync::atomic::AtomicU32;

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

#[blaze(GoodAtomic)]
#[link = ...]
extern "C" {
    fn use_atomic (value: *const AtomicU32);
}

fn main () -> Result<()> {
    let program = GoodAtomic::new(None)?;
    let pointer = SvmAtomicU32::new(&[0]);

    let result = unsafe {
        program.use_atomic(&pointer)
    };

    Ok(())
}

Currently supported atomics

NameOpenCL typeRust atomicOpenCL features
SvmAtomicI32atomic_intAtomicI32None
SvmAtomicU32atomic_uintAtomicU32None
SvmAtomicI64atomic_longAtomicI32cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics
SvmAtomicU64atomic_ulongAtomicU32cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics
SvmAtomicIsizeatomic_ptrdiff_tAtomicIsizecl_khr_int64_base_atomics and cl_khr_int64_extended_atomics (on 64-bit devices)
SvmAtomicUsizeatomic_size_tAtomicUsizecl_khr_int64_base_atomics and cl_khr_int64_extended_atomics (on 64-bit devices)

Features

OpenCL versions

When enabling an OpenCL feature, all previous OpenCL verion features are enabled automatically. There is no default OpenCL version, so you'll have to set one when importing Blaze.

FeatureOpenCL Version
cl1_11.1
cl1_21.2
cl22.0
cl2_12.1
cl2_22.2
cl33.0

Strict

When the strict feature is enabled, Blaze will not check for OpenCL support for the specified version at runtime, increasing perfomance. When disabled, Blaze will dynamically check the OpenCL version at runtime (when needed), and make adjustments to ensure the maximum compatiblity possible.

SVM

The svm feature enables all functionality related to shared virtual memory.

See Shared Virtual Memory

Futures

The futures feature currently only enables support for wait_async on Events. In the future, more functionality might be added to this feature.

Examples

Matrix multiplication

Rust code

use std::mem::MaybeUninit;
use blaze_rs::prelude::*;

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

#[blaze(MatrixOps)]
#[link = include_str!("matrixops.cl")]
extern "C" {
    #[link_name = "mul"]
    fn matrix_mul (k: u32, lhs: *const f32, rhs: *const f32, out: *mut MaybeUninit<f32>);
}

fn main () -> Result<()> {
    let ops = MatrixOps::new(None)?;

    let lhs = BufferRect2D::<f32>::new(&[1.,2.,4.,5.,7.,8.], 2, MemAccess::READ_ONLY, false)?; // 3 x 2
    let rhs = BufferRect2D::<f32>::new(&[1.,2.,3.,4.,5.,6.], 3, MemAccess::READ_ONLY, false)?; // 2 x 3
    let mut result = BufferRect2D::<f32>:new_uninit(3, 3, MemAccess::WRITE_ONLY, false)?; // 3 x 3

    let evt = unsafe { ops.matrix_mul(2, &lhs, &rhs, &mut result, [3, 3], None, WaitList::EMPTY)? };
    evt.wait()?;

    let result = unsafe { result.assume_init() };
    println!("{:?}", result);
    
    Ok(())
}

OpenCL C Code

kernel void mul (const uint k, __constant float* lhs, __constant float* rhs, __global float* out) {
    const uint width = get_global_size(1);
    const uint x = get_global_id(0);
    const uint y = get_global_id(1);

    float sum = 0.0f;
    for (uint i = 0; i < k; ++i) {
        sum = fma(lhs[y * k + i], rhs[i * width + x], sum);
    }

    out[y * width + x] = sum;
}

Random Number Generator

This code requires OpenCL 3.0 or higher.

Rust code

use std::{time::{SystemTime}, mem::MaybeUninit};
use once_cell::sync::Lazy;
use blaze_rs::prelude::*;

#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();

static CODE : Lazy<String> = Lazy::new(|| {
    let nanos = SystemTime::now().duration_since(SystemTime::UNIX_EPOCH).unwrap();
    format!("#define TIME {}l\n{}", nanos.as_nanos(), include_str!("rng.cl"))
});

#[blaze(Rng)]
#[link = Lazy::force(&CODE)]
pub extern "C" {
    fn next_ints (n: u32, out: *mut MaybeUninit<u32>);
}

#[test]
fn main () -> Result<()> {
    let rng = Rng::new(None)?;
    let mut random = Buffer::<u32>::new_uninit(5, MemAccess::WRITE_ONLY, false)?;
    
    let random = unsafe {
        let _ = rng.next_ints(5, &mut random, [5], None, EMPTY)?.wait()?;
        random.assume_init()  
    };

    println!("{random:?}");
    Ok(())
}

OpenCL C code

#define MUTPILIER 0x5DEECE66Dl
#define ADDEND 0xBl
#define MASK() ((1l << 48) - 1)

global atomic_ulong SEED = ATOMIC_VAR_INIT((8682522807148012l * 1181783497276652981L) ^ TIME);

kernel void next_ints (const uint n, global uint* out) {
    const uint ID = get_global_id(0);
    const uint SIZE = get_global_size(1);
    ulong oldseed, nextseed;

    for (uint i = ID; i < n; i += SIZE) {
        do {
            oldseed = atomic_load(&SEED);
            nextseed = (oldseed * MUTPILIER + ADDEND) & MASK();
        } while (!atomic_compare_exchange_strong(&SEED, &oldseed, nextseed));
        out[i] = nextseed >> 16;
    }
}

Contributions

If you wish to contribute to the Blaze project, here are a couple of areas in which help would be much apreciated. Don't shy away from contributing to the project in a way not mentioned here, this are mere examples.

Documentation

One of the most important contributions would bo those to the documentation. Since this Blaze started as a solo-project, some (perhaps much) of it's documentation isn't up to standard. Don't be afraid to make a pull request on the dev branch if you want to contribute to the documentation. Also feel free to contact a maintainer, or any other contributor, if you have any questions.

Bug fixes

Another major type of contribution is bug fixes. If you find a bug, please report it with a new issue on the GitHub repository.

New features

The Blaze project is always excited about the prospect of addding new features and improving existsing ones. If you have any ideas for new features, please feel free to create a new issue on the GitHub repository.

Warning
When cloning the Blaze repo, you might have to clone the docs submodule manually. To do so, run the following commands.

git clone https://github.com/Aandreba/blaze.git
git submodule update --recursive --init
make check

Policy

If you wish to contribute to the Blaze project, you mus follow this rules. Non-conformity with any of this policies can lead to a ban from the project.

No power structure

In the Blaze project, there is no king. This doesn't mean that the project doesn't have a leader (it does), rather that no opinion is above another only by merit of the person making it. Any opinion must be argumented, and cannot be taken at face value, regardless of the person making it. There is only a small exception to this rule. If the argument being made is about aesthetics, or two compiting arguments have proven to hold simillar value, the maintainers decision will prevail over any other.

Politeness

This policy is easy and simple. When addressing another contributor, you should do so politely. This means that any insult or mean-spirited comment towards another individual, or grup of individuals, will not be tolerated.

No politics

This is also a simple policy. No topics, other than those directly involving the project, should be discussed in issues or PRs. A violation of this policy will result in the perpetrators being immediatelly banned.