Blaze
Note
Blaze is still very much unstable and in early development. At this stage, methods and traits may be changed without deprecation period, backwards compatibility isn't guaranteed, and unexpected bugs may arise.
Blaze is a Rust crate that has the goal of rustifying and simplifying as much as possible the OpenCL experience.
This goal is achieved with the introduction of various abstractions, most of them with syntax simillar to a Rust counterpart.
Links
Why Blaze?
OpenCL is an open-source cross-platform API for high-performance parallel computing, enabling the use of GPU's and other hardware accelerators to perform highly parallel computations.
Historically, this API has been used in C/C++ code, which has lead to the usual C and C++ problems (segfaults, memory leaks, etc).
Blaze fixes this by wrapping the OpenCL API in a Rust-friendly interface, and providing a simpler and safer, yet equally powerful, way to use it.
Design philosphy
The design philosophy of Blaze is based in the following principles.
Rustified experience
The main goal of Blaze is to provide a Rustified experience of OpenCL.
This is achieved by wrapping the OpenCL API inside Rust types that offer safety guarantees provided by the Rust compiler, such as automatic release of memory through the Drop
trait, and thread safety via Send
and Sync
.
Another major part of the Rust experience is the use of zero-cost abstractions, which also means that Blaze is written in such a way that the Rust compiler can maximally optimize the resulting code, such with the use of NonNull
pointers.
Openness
Another major goal of Blaze is to be as open as possible. This translates into a series of rules that all developers must follow when contributing to the Blaze project.
Trust the user, with caution
In my opinion, one of the main reasons programming languges end up being cluttered messes is a lack of trust in the user.
To clarify, user here refers to the developers using our library.
A great example of this is Java, where getters and setters galore. This level of distrust amongst develepers is already too ingrained in the Java community to do anything about it, but it doesn't have to be this way with Rust.
No sealed traits
One of the most infuriating experiences I've had as a Rust developer is dealing with sealed traits. Sealed traits are acompliched with the following technique.
#![allow(unused)] fn main() { mod sealed { pub trait Sealed {} } pub trait SelaedTrait: sealed::Sealed {} }
This way, the Sealed
trait is public, but cannot be accessed outside of the crate, making SealedTrait
only implementable inside the crate.
Whilst this may seem like a good idea in some ocations (for example, Blaze's SvmPointer
trait), it shows a great level of distrust towards the user by the developer, and clashes with the open-source philosophy of this project, and the Rust project in general.
If a contributor finds itself in a position where a sealed trait could make sense, it must abstain from implementing it. Instead, an unsafe trait
should be implemented, with a 'Safety' section in it's documentation detailing when it can be implemented safely.
Let's take the example of the SvmPointer
trait. Whilst it might seem like a good idea at first, sealing it would make it impossible for downstream crates and programs to implement this trait for their own custom data type that may utilize SVM pointers (for example, a reference counted SVM pointer).
Sealed trait example 👎
#![allow(unused)] fn main() { mod sealed { pub trait Sealed {} } pub trait Primitive: sealed::Sealed {} impl sealed::Sealed for i32 {} impl Primitive for i32 {} }
Unsafe trait example 👍
#![allow(unused)] fn main() { /// # Safety /// This trait must only be implemented on primitive types pub unsafe trait Primitive {} unsafe impl Primitive for i32 {} }
Raw types
In Blaze, the raw types are used to represent an underlying OpenCL objects plainly. They offer no extra functionality and have the following signature.
#[repr(transparent)]
pub struct RawType (NonNull<c_void>);
Raw types implement Drop
and, since they're reference counted by OpenCL itself, they also implement Clone
.
Currently, this are the existing raw types:
Name | OpenCL type | OpenCL Version |
---|---|---|
RawPlatform | cl_platform_id | All |
RawDevice | cl_device_id | All |
RawContext | cl_context | All |
RawCommandQueue | cl_command_queue | All |
RawProgram | cl_program | All |
RawKernel | cl_kernel | All |
RawEvent | cl_event | All |
RawMemObject | cl_mem | All |
RawBuffer | cl_mem | All |
RawPipe | cl_mem | 2.0 or higher |
Note that since raw types are transparent wrappers of
NonNull<c_void>
,RawType
andOption<RawType>
have the same size, alongside other optimizations ofNonNull
performed by the compiler.
Errors
Blaze provides three types to describe errors: ErrorKind
, ErrorCode
and Error
.
ErrorType
is an enum that maps to the OpenCL error codes, whilst Error
also contains an optional description and (in debug mode) a backtrace, and has athe following signature:
pub struct Error {
pub ty: ErrorCode,
pub desc: Option<Arc<dyn Display>>,
#[cfg(debug_assertions)]
pub backtrace: Arc<Backtrace>
}
The Backtrace
is provided in debug mode to facilitate the debugging of errors, allowing to find their source more quickly.
ErrorKind and ErrorCode
Most raw OpenCL errors will be converted to ErrorKind
automatically, but there are instances where the given error code is not recognized as an ErrorKind
. For such cases, ErrorCode
exists.
pub enum ErrorCode {
Kind (ErrorKind),
Unknown (i32)
}
Contexts
A Blaze context is the owner of a single OpenCL context and one or more OpenCL command queues, all of them associated to the context.
It's task is to manage the distribution of command queues amongst the various enqueue functions, maximizing performance by distributing the work amongst them.
The simplified signature of the Context
trait is the following:
pub trait Context {
fn as_raw (&self) -> &RawContext;
fn queues (&self) -> &[RawCommandQueue];
fn next_queue (&self) -> &RawCommandQueue;
}
The queues
method returns a list with all the command queues owned by the context.
The next_queue
method returns the next command queue to be used in an enqueue function.
Simple Context
The SimpleContext
type is the most basic implementation of a context. It contains a single command queue, so no complicated logic is required for it's use, since next_queue
will always return the same queue.
Global Context
Inspired by Rust's Allocator syntax, Global
is a ZST that will be treated as the default context for most operations requiring of an OpenCL context or command queue.
Like with the Allocator API, you can specify a global context with the #[global_context]
macro.
use blaze_rs::prelude::*;
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
#[test]
fn with_global () -> Result<()> {
// Initialize two buffers
let buffer : Buffer<i32> = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::READ_ONLY, false)?;
let buffer2 : Buffer<i32> = Buffer::new(&[5, 4, 3, 2, 1], MemAccess::WRITE_ONLY, false)?;
// Read the full contents of both buffers
let read = buffer.read_blocking(.., None)?;
let read2 = buffer2.read_blocking(.., Some(core::slice::from_ref(&read)))?;
assert_eq!(read.as_slice(), &[5, 4, 3, 2, 1]);
assert_eq!(read2.as_slice(), &[1, 2, 3, 4, 5]);
Ok(())
}
#[test]
fn without_global () -> Result<()> {
// Initialize a context.
let ctx = SimpleContext::default()?;
// Initialize two buffers
let buffer : Buffer<i32, &SimpleContext> = Buffer::new_in(&ctx, &[1, 2, 3, 4, 5], MemAccess::READ_ONLY, false)?;
let buffer2 : Buffer<i32, &SimpleContext> = Buffer::new_in(&ctx, &[5, 4, 3, 2, 1], MemAccess::WRITE_ONLY, false)?;
// Read the full contents of both buffers
let read = buffer.read_blocking(.., None)?;
let read2 = buffer2.read_blocking(.., Some(core::slice::from_ref(&read)))?;
assert_eq!(read.as_slice(), &[5, 4, 3, 2, 1]);
assert_eq!(read2.as_slice(), &[1, 2, 3, 4, 5]);
Ok(())
}
Note that unlike with the
Allocator
API, no default global context is set, so you'll need to specify one explicitly if you want to use it.
Buffers
The Buffer
type is a wrapper arround a RawBuffer
that provides extra functionality and safety guarantees. It has the following signature:
pub struct Buffer<T: Copy, C: Context = Global> {
inner: RawBuffer,
ctx: C,
phtm: PhantomData<T>
}
Example
use std::ptr::NonNull;
use blaze_rs::{prelude::*, context::SimpleContext, buffer::BufferRange};
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
fn with_buffer () -> Result<()> {
let values = [1, 2, 3, 4, 5];
let buffer = Buffer::new(&values, MemAccess::READ_ONLY, false)?;
let read: Vec<i32> = buffer.read_blocking(.., None)?;
assert_eq!(values.as_slice(), read.as_slice());
Ok(())
}
fn without_buffer () -> Result<()> {
let values = [1, 2, 3, 4, 5];
let buffer = RawBuffer::new(
values.len() * core::mem::size_of::<i32>(),
MemFlags::new(MemAccess::READ_ONLY, HostPtr::COPY),
NonNull::new(values.as_ptr() as *mut _)
)?;
let mut read = Vec::<i32>::with_capacity(values.len());
unsafe {
let evt : RawEvent = buffer.read_to_ptr(BufferRange::from_parts::<i32>(0, 5)?, read.as_mut_ptr().cast(), None)?;
let _ : () = evt.join_by_ref()?;
read.set_len(values.len());
}
assert_eq!(values.as_slice(), read.as_slice());
Ok(())
}
Rectangular Buffer
Note
- Rectangular buffers require OpenCL 1.1 or higher.
- Currently, only 2 dimensional rect buffers are implemented.
Rectangular buffers allow the use of buffers as 2D or 3D arrays, with the same safety guarantees provided by Buffer
. They have the following signature:
pub struct BufferRect2D<T: Copy, C: Context = Global> {
inner: Buffer<T, C>,
width: NonZeroUsize,
height: NonZeroUsize
}
Also, a host implementation of rectangular buffers exists, faciltating the use of rectangular buffers on the host.
pub struct Rect2D<T, A: Allocator = Global> {
ptr: NonNull<T>,
width: NonZeroUsize,
height: NonZeroUsize,
alloc: A
}
Note that the
Global
here refers to Rust'sGlobal
allocator.
Example
use blaze_rs::{prelude::*, context::SimpleContext};
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
#[test]
fn main () -> Result<()> {
/*
[
1, 2, 3,
4, 5, 6,
7, 8, 9,
]
*/
let buffer = BufferRect2D::new(&[1, 2, 3, 4, 5, 6, 7, 8, 9], 3, MemAccess::READ_ONLY, false)?;
let evt = buffer.read((1.., 1..), EMPTY)?;
let segment = evt.wait()?;
/*
[
5, 6,
8, 9
]
*/
assert_eq!(segment.as_slice(), &[5, 6, 8, 9]);
Ok(())
}
Mapping
OpenCL provides a feature on buffers (and other memory objects) named mapping. With this feature, a region of device memory is mapped to hast memory, where it can be more efficiently accessed.
Blaze offers support for this feature through the use of the MapBufferGuard
and MapBufferMutGuard
, which act simillarly to a RwLockReadGuard
and a RwLockWriteGuard
.
Examples
use std::ops::Deref;
use blaze_rs::prelude::*;
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
fn main () -> Result<()> {
let buffer = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::default(), false)?;
let map = buffer.map_blocking(.., None)?;
assert_eq!(map.deref(), &[1, 2, 3, 4, 5]);
Ok(())
}
use std::ops::Deref;
use blaze_rs::prelude::*;
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
fn main () -> Result<()> {
let mut buffer = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::default(), false)?;
let map = buffer.map_blocking(.., None)?;
let mut mut_map = buffer.map_mut_blocking(.., None)?; // compile error: cannot borrow `buffer` as mutable because it is also borrowed as immutable
assert_eq!(map.deref(), &[1, 2, 3, 4, 5]);
Ok(())
}
Note that when maping mutably, the OpenCL mapping is done as a read-write mapping, not a write-only map.
Slices
Note slices require OpenCL 1.1 or higher.
Buffer slices act the same way as Rust slices, allowing access to a specified region of a buffer.
Examples
use std::ops::Deref;
use blaze_rs::prelude::*;
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
fn main () -> Result<()> {
let buffer = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::default(), false)?;
let slice = buffer.slice(..)?;
assert_eq!(slice.deref(), &buffer);
Ok(())
}
use std::ops::Deref;
use blaze_rs::prelude::*;
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
fn main () -> Result<()> {
let mut buffer = Buffer::new(&[1, 2, 3, 4, 5], MemAccess::default(), false)?;
let slice = buffer.slice(None)?;
let mut mut_slice = buffer.slice_mut(None)?; // compile error: cannot borrow `buffer` as mutable because it is also borrowed as immutable
assert_eq!(slice, slice_mut);
Ok(())
}
Note that when maping mutably, the OpenCL mapping is done as a read-write mapping, not a write-only map.
Events
Blaze events are considered a mixture of a Rust Future
and JoinHandle
. Their signature is the following:
use std::sync::mpsc::Sender;
pub struct Event<C> {
inner: RawEvent,
consumer: C,
#[cfg(not(feature = "cl1_1"))]
send: Sender<EventCallback>,
#[cfg(feature = "cl1_1")]
send: PhantomData<Sender<()>>,
}
Blaze events contain their underlying RawEvent
alongside a Consumer
.
Consumers
An event's Consumer
is the responsable to perform the necessary underlying operations when the event has completed, with the ability to return a value.
pub trait Consumer<'a>: 'a {
type Output;
unsafe fn consume (self) -> Result<Self::Output>;
}
Event Callbacks
OpenCL event callbacks are supported from OpenCL 1.1 onwards. In Blaze, when using OpenCL 1.0, every time you pass a new callback to an Event
(with on_complete
, for example) that callback will be sent to a diferent thread, which will execute it when appropiate.
Callback handling threads are spawned for every thread from which you send a callback. This means that if, for example, you call on_complete
on 10 different threads, 10 new threads will be spawned to handle the callbacks spawned on each thread, but if you call on_complete
two times on one thread and once in a differen thread, only 2 new threads will be spawned.
These new threads will complete execution whenever their recievers are disconnected and they have no more listeners to handle.
Asynchronous event
Events can be joined asynchronously with the join_async
method and the EventWait
type.
use std::time::Duration;
use blaze_rs::{event::{EventWait, consumer::Noop}, prelude::*};
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
#[tokio::main] async fn main () -> Result<()> {
let flag = FlagEvent::new()?;
let sub: EventWait<Noop> = flag
.subscribe()
.into_event()
.join_async()?;
let handle = tokio::spawn(async move {
tokio::time::sleep(Duration::from_secs(2)).await;
flag.try_mark(None)?;
return Ok();
});
tokio::try_join!(handle, sub)?;
Ok(()) }
Note that
join_async
requires thefutures
feature.
Flag events
Note Flag events require OpenCL 1.1 or higher
Flag events allow the creation of events that complete whenever the user marks them.
Flag events are marked via the try_mark
method, which returns true
if the event was successfully marked, and false
if the event was already marked.
use blaze_rs::prelude::*;
#[global_context]
static CTX : SimpleContext = SimpleContext::default();
let flag = FlagEvent::new()?;
assert_eq!(flag.status(), Ok(EventStatus::Submitted));
assert_eq!(flag.try_mark(None), Ok(true));
assert_eq!(flag.status(), Ok(EventStatus::Complete));
assert_eq!(flag.try_mark(None), Ok(false));
Ok::<_, Error>()
Abortable event
Note Abortable events require OpenCL 1.1 or higher
Abortable events allow you to abort events before they're completed. When an abortable event is created, a new flag event is created, alongside a host-side flag.
If the event completes before it's aborted, the flag event is marked with it's result and the host-side flag is marked as not-aborted. The abortable event will return Ok(Some(_))
if it succedded, and Err(_)
if it didn't.
If the event is aborted before completion, the flag event is marked without an error and the host-side flag is marked as aborted. The abortable event will return Ok(None)
.
use blaze_rs::{event::{AbortHandle, consumer::{Noop, AbortableEvent}}, prelude::*};
#[global_context]
static CTX : SimpleContext = SimpleContext::default();
#[test]
fn aborted () -> Result<()> {
let flag = FlagEvent::new()?;
let (event, handle): (AbortableEvent<Noop>, AbortHandle) = flag
.subscribe()
.into_event()
.abortable()?;
assert_eq!(handle.try_abort(), Ok(true)); // Abort the event
assert_eq!(event.join(), Ok(None));
Ok(())
}
#[test]
fn not_aborted () -> Result<()> {
let flag = FlagEvent::new()?;
let (event, handle): (AbortableEvent<Noop>, AbortHandle) = flag
.subscribe()
.into_event()
.abortable()?;
assert_eq!(flag.try_mark(None), Ok(true)); // Complete the event
assert_eq!(event.join(), Ok(Some(())));
assert_eq!(handle.try_abort(), Ok(false));
Ok(())
}
Scopes
Inspired by Rust's scopes, Blaze scopes allow you to use events with non-'static
lifetimes.
use blaze_rs::{buffer, prelude::*};
#[global_context]
static CTX : SimpleContext = SimpleContext::default();
let buffer: Buffer<i32> = buffer![1, 2, 3, 4, 5]?;
let [left, right]: [Vec<i32>; 2] = scope(|s| {
let left = buffer.read(s, ..2, None)?;
let right = buffer.read(s, 2.., None)?;
return Event::join_all_sized_blocking([left, right]);
})?;
assert_eq!(left.as_slice(), &[1, 2]);
assert_eq!(right.as_slice(), &[3, 4, 5]);
Ok::<_, Error>(())
Asynchronous scopes
With the scope_async
macro, you can create asynchronous scopes. These async scopes return a Future
that completes when all the events spawned inside the scope have completed.
use blaze_rs::{buffer, scope_async, prelude::*};
#[tokio::main] async fn main () -> Result<()> {
let buffer = buffer![1, 2, 3, 4, 5]?;
let (left, right) = scope_async!(|s| async {
let left = buffer.read(s, ..2, None)?.join_async()?;
let right = buffer.read(s, ..2, None)?.join_async()?;
return tokio::try_join!(left, right);
}).await?;
assert_eq!(left, vec![1, 2]);
assert_eq!(right, vec![3, 4, 5]);
Ok::<_, Error>(()) }
Unlike it's blocking counterpart, scope_async
does not ensure that all events inside the future will be ran. Rather, if the future is dropped before completion, it's destructor will block the current thread until every already-started event has completed, and discarting the remaining uninitialized events.
use blaze_rs::{buffer, scope_async};
use futures::{task::*, future::*};
#[tokio::main] async fn main () -> Result<()> {
let buffer = buffer![1, 2, 3, 4, 5]?;
let mut scope = Box::pin(scope_async!(|s| async {
let left = buffer
.read(s, ..2, None)?
.inspect(|_| println!("Left done!"))
.join_async()?
.await;
let right = buffer
.read(s, ..2, None)?
.inspect(|_| println!("Right done!"))
.join_async()?
.await;
return Ok((left, right));
}));
let mut ctx = std::task::Context::from_waker(noop_waker_ref());
let _ = scope.poll_unpin(&mut ctx)?;
drop(scope); // prints "Left done!", doesn't print "Right done!"
Ok::<_, Error>(()) }
Program
To ease the safe use of OpenCL programs and kernels, Blaze provides the #[blaze]
macro. The blaze macro will turn pseudo-normal Rust extern syntax into a struct that will hold a program and it's various kernels, providing a safe API to call the kernels.
Example
use blaze_rs::prelude::*;
use core::mem::*;
#[blaze(MatrixOps)]
#[link = include_str!("matrixops.cl")]
extern "C" {
#[link_name = "mul"]
fn matrix_mul (k: u32, lhs: *const f32, rhs: *const f32, out: *mut MaybeUninit<f32>);
}
Expands to
use blaze_rs::prelude::*;
use core::mem::*;
struct MatrixOps<C: ::blaze::context::Context = ::blaze::context::Global> {
__blaze_inner__: ::blaze::core::RawProgram,
__blaze_ctx__: C,
matrix_mul: ::std::sync::Mutex<::blaze::core::RawKernel>,
}
impl MatrixOps<::blaze::context::Global> {
#[inline(always)]
fn new<'a>(options: impl Into<Option<&'a str>>) -> ::blaze::core::Result<Self> {
Self::new_in(::blaze::context::Global, options)
}
}
impl<C: ::blaze::context::Context> MatrixOps<C> {
fn new_in<'a>(ctx: C, options: impl Into<Option<&'a str>>) -> ::blaze::core::Result<Self> {
let __blaze_ctx__ = ctx;
let (__blaze_inner__, __blaze_kernels__) =
::blaze::core::RawProgram::from_source_in(&__blaze_ctx__, include_str!("matrixops.cl"), options)?;
let mut matrix_mul = None;
for __blaze_kernel__ in __blaze_kernels__.into_iter() {
match __blaze_kernel__.name()?.as_str() {
"mul" => matrix_mul = unsafe { Some(__blaze_kernel__.clone()) },
__other => {
return Err(::blaze::core::Error::new(
::blaze::core::ErrorType::InvalidKernel,
{
let res = ::alloc::fmt::format(::core::fmt::Arguments::new_v1(
&["unknown kernel \'", "\'"],
&[::core::fmt::ArgumentV1::new_display(&__other)],
));
res
},
))
}
}
}
let matrix_mul = match matrix_mul {
Some(__x) => ::std::sync::Mutex::new(__x),
None => {
return Err(::blaze::core::Error::new(
::blaze::core::ErrorType::InvalidKernel,
"kernel \'matrix_mul\' not found",
))
}
};
Ok(Self {
__blaze_inner__,
__blaze_ctx__,
matrix_mul,
})
}
}
impl<C: ::blaze::context::Context> ::std::ops::Deref for MatrixOps<C> {
type Target = ::blaze::core::RawProgram;
#[inline(always)]
fn deref(&self) -> &Self::Target {
&self.__blaze_inner__
}
}
struct MatrixMul<LHS, RHS, OUT> {
__blaze_inner__: ::blaze::event::RawEvent,
lhs: LHS,
rhs: RHS,
out: OUT,
}
impl<C: ::blaze::context::Context> MatrixOps<C> {
unsafe fn matrix_mul<
LHS: ::core::ops::Deref,
RHS: ::core::ops::Deref,
OUT: ::core::ops::DerefMut,
const N: usize,
>(
&self,
k: u32,
lhs: LHS,
rhs: RHS,
out: OUT,
global_work_dims: [usize; N],
local_work_dims: impl Into<Option<[usize; N]>>,
wait: impl Into<::blaze::event::WaitList>,
) -> ::blaze::core::Result<MatrixMul<LHS, RHS, OUT>>
where
<LHS as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<f32>,
<RHS as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<f32>,
<OUT as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<MaybeUninit<f32>>,
{
let mut wait = wait.into();
let mut __blaze_kernel__ = match self.matrix_mul.lock() {
Ok(x) => x,
Err(e) => e.into_inner()
};
__blaze_kernel__.set_argument(0u32, &k)?;
::blaze::buffer::KernelPointer::set_arg(
::core::ops::Deref::deref(&lhs),
&mut __blaze_kernel__,
&mut wait,
1u32,
)?;
::blaze::buffer::KernelPointer::set_arg(
::core::ops::Deref::deref(&rhs),
&mut __blaze_kernel__,
&mut wait,
2u32,
)?;
::blaze::buffer::KernelPointer::set_arg(
::core::ops::Deref::deref(&out),
&mut __blaze_kernel__,
&mut wait,
3u32,
)?;
let __blaze_inner__ = __blaze_kernel__.enqueue_with_context(
&self.__blaze_ctx__,
global_work_dims,
local_work_dims,
wait,
)?;
drop(__blaze_kernel__);
::blaze::buffer::KernelPointer::complete(
::core::ops::Deref::deref(&lhs),
&__blaze_inner__,
)?;
::blaze::buffer::KernelPointer::complete(
::core::ops::Deref::deref(&rhs),
&__blaze_inner__,
)?;
::blaze::buffer::KernelPointer::complete(
::core::ops::Deref::deref(&out),
&__blaze_inner__,
)?;
Ok(MatrixMul {
__blaze_inner__,
lhs,
rhs,
out,
})
}
}
impl<LHS: ::core::ops::Deref, RHS: ::core::ops::Deref, OUT: ::core::ops::DerefMut>
::blaze::event::Event for MatrixMul<LHS, RHS, OUT>
where
<LHS as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<f32>,
<RHS as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<f32>,
<OUT as ::core::ops::Deref>::Target: ::blaze::buffer::KernelPointer<MaybeUninit<f32>>,
{
type Output = (LHS, RHS, OUT);
#[inline(always)]
fn as_raw(&self) -> &::blaze::event::RawEvent {
&self.__blaze_inner__
}
#[inline(always)]
fn consume(
self,
err: Option<::blaze::prelude::Error>,
) -> ::blaze::prelude::Result<Self::Output> {
if let Some(err) = err {
return Err(err);
};
Ok((self.lhs, self.rhs, self.out))
}
}
Shared Virtual Memory
All functionality related to SVM requires the
svm
features.
Blaze implements support for shared virtual memory (also known as SVM) through Rust's Allocator API.
SVM allows the host and device portions of an OpenCL application to seamlessly share pointers and complex pointer-containing data-structures. Moreover, as described in this article, SVM is more than just about shared address space. It also defines memory model consistency guarantees for SVM allocations. This enables the host and the kernel sides to interact with each other using atomics for synchronization, like two distinct cores in a CPU.
Intel has a great article about SVM here
Coarse v. Fine grained
OpenCL's SVM implementation consist's of two diferent variant: Coarse grained and fine grained memory. With coarse grained memory, the SVM user must indicate to OpenCL the synchronization points between host and device code, whilst with fine-grained memory, this process is done automatically by OpenCL.
But with Blaze, you don't have to think about this implementation details, since it will automatically detect the context's capabilities, and use fine-grained allocations whenever possible. In situiations where fine-grained allocations are unavailable, Blaze will indicate the synchronization points to OpenCL automatically whenever a pointer is passed as a kernel argument.
Note that coarse-grained synchronization points are only set automatically for kernels generated with the
#[blaze]
macro
Atomics
Note that SVM atomic are only available when the context devices support them, and can onloy be used in fine grained allocations.
OpenCL supports the use of atomics through SVM pointers. However, the following would not result in a correct SVM atomic implementation.
use blaze_rs::{prelude::*, svm::*};
use std::sync::atomic::AtomicU32;
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
#[blaze(BadAtomic)]
#[link = ...]
extern "C" {
fn use_atomic (value: *const AtomicU32);
}
fn main () -> Result<()> {
let program = BadAtomic::new(None)?;
let pointer = SvmBox::new_in(AtomicU32::default(), Svm::try_default()?);
let result = unsafe {
program.use_atomic(&pointer)
};
Ok(())
}
This implementation is incorrect because, when using atomics, OpenCL must be notified that the SVM allocation needs support for atomics. The correct implementation would be the following.
use blaze_rs::{prelude::*, svm::*};
use std::sync::atomic::AtomicU32;
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
#[blaze(GoodAtomic)]
#[link = ...]
extern "C" {
fn use_atomic (value: *const AtomicU32);
}
fn main () -> Result<()> {
let program = GoodAtomic::new(None)?;
let pointer = SvmAtomicU32::new(&[0]);
let result = unsafe {
program.use_atomic(&pointer)
};
Ok(())
}
Currently supported atomics
Name | OpenCL type | Rust atomic | OpenCL features |
---|---|---|---|
SvmAtomicI32 | atomic_int | AtomicI32 | None |
SvmAtomicU32 | atomic_uint | AtomicU32 | None |
SvmAtomicI64 | atomic_long | AtomicI32 | cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics |
SvmAtomicU64 | atomic_ulong | AtomicU32 | cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics |
SvmAtomicIsize | atomic_ptrdiff_t | AtomicIsize | cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics (on 64-bit devices) |
SvmAtomicUsize | atomic_size_t | AtomicUsize | cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics (on 64-bit devices) |
Features
OpenCL versions
When enabling an OpenCL feature, all previous OpenCL verion features are enabled automatically. There is no default OpenCL version, so you'll have to set one when importing Blaze.
Feature | OpenCL Version |
---|---|
cl1_1 | 1.1 |
cl1_2 | 1.2 |
cl2 | 2.0 |
cl2_1 | 2.1 |
cl2_2 | 2.2 |
cl3 | 3.0 |
Strict
When the strict
feature is enabled, Blaze will not check for OpenCL support for the specified version at runtime, increasing perfomance.
When disabled, Blaze will dynamically check the OpenCL version at runtime (when needed), and make adjustments to ensure the maximum compatiblity possible.
SVM
The svm
feature enables all functionality related to shared virtual memory.
Futures
The futures
feature currently only enables support for wait_async
on Event
s. In the future, more functionality might be added to this feature.
Examples
Matrix multiplication
Rust code
use std::mem::MaybeUninit;
use blaze_rs::prelude::*;
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
#[blaze(MatrixOps)]
#[link = include_str!("matrixops.cl")]
extern "C" {
#[link_name = "mul"]
fn matrix_mul (k: u32, lhs: *const f32, rhs: *const f32, out: *mut MaybeUninit<f32>);
}
fn main () -> Result<()> {
let ops = MatrixOps::new(None)?;
let lhs = BufferRect2D::<f32>::new(&[1.,2.,4.,5.,7.,8.], 2, MemAccess::READ_ONLY, false)?; // 3 x 2
let rhs = BufferRect2D::<f32>::new(&[1.,2.,3.,4.,5.,6.], 3, MemAccess::READ_ONLY, false)?; // 2 x 3
let mut result = BufferRect2D::<f32>:new_uninit(3, 3, MemAccess::WRITE_ONLY, false)?; // 3 x 3
let evt = unsafe { ops.matrix_mul(2, &lhs, &rhs, &mut result, [3, 3], None, WaitList::EMPTY)? };
evt.wait()?;
let result = unsafe { result.assume_init() };
println!("{:?}", result);
Ok(())
}
OpenCL C Code
kernel void mul (const uint k, __constant float* lhs, __constant float* rhs, __global float* out) {
const uint width = get_global_size(1);
const uint x = get_global_id(0);
const uint y = get_global_id(1);
float sum = 0.0f;
for (uint i = 0; i < k; ++i) {
sum = fma(lhs[y * k + i], rhs[i * width + x], sum);
}
out[y * width + x] = sum;
}
Random Number Generator
This code requires OpenCL 3.0 or higher.
Rust code
use std::{time::{SystemTime}, mem::MaybeUninit};
use once_cell::sync::Lazy;
use blaze_rs::prelude::*;
#[global_context]
static CONTEXT : SimpleContext = SimpleContext::default();
static CODE : Lazy<String> = Lazy::new(|| {
let nanos = SystemTime::now().duration_since(SystemTime::UNIX_EPOCH).unwrap();
format!("#define TIME {}l\n{}", nanos.as_nanos(), include_str!("rng.cl"))
});
#[blaze(Rng)]
#[link = Lazy::force(&CODE)]
pub extern "C" {
fn next_ints (n: u32, out: *mut MaybeUninit<u32>);
}
#[test]
fn main () -> Result<()> {
let rng = Rng::new(None)?;
let mut random = Buffer::<u32>::new_uninit(5, MemAccess::WRITE_ONLY, false)?;
let random = unsafe {
let _ = rng.next_ints(5, &mut random, [5], None, EMPTY)?.wait()?;
random.assume_init()
};
println!("{random:?}");
Ok(())
}
OpenCL C code
#define MUTPILIER 0x5DEECE66Dl
#define ADDEND 0xBl
#define MASK() ((1l << 48) - 1)
global atomic_ulong SEED = ATOMIC_VAR_INIT((8682522807148012l * 1181783497276652981L) ^ TIME);
kernel void next_ints (const uint n, global uint* out) {
const uint ID = get_global_id(0);
const uint SIZE = get_global_size(1);
ulong oldseed, nextseed;
for (uint i = ID; i < n; i += SIZE) {
do {
oldseed = atomic_load(&SEED);
nextseed = (oldseed * MUTPILIER + ADDEND) & MASK();
} while (!atomic_compare_exchange_strong(&SEED, &oldseed, nextseed));
out[i] = nextseed >> 16;
}
}
Contributions
If you wish to contribute to the Blaze project, here are a couple of areas in which help would be much apreciated. Don't shy away from contributing to the project in a way not mentioned here, this are mere examples.
Documentation
One of the most important contributions would bo those to the documentation. Since this Blaze started as a solo-project, some (perhaps much) of it's documentation isn't up to standard. Don't be afraid to make a pull request on the dev branch if you want to contribute to the documentation. Also feel free to contact a maintainer, or any other contributor, if you have any questions.
Bug fixes
Another major type of contribution is bug fixes. If you find a bug, please report it with a new issue on the GitHub repository.
New features
The Blaze project is always excited about the prospect of addding new features and improving existsing ones. If you have any ideas for new features, please feel free to create a new issue on the GitHub repository.
Warning
When cloning the Blaze repo, you might have to clone thedocs
submodule manually. To do so, run the following commands.git clone https://github.com/Aandreba/blaze.git git submodule update --recursive --init make check
Policy
If you wish to contribute to the Blaze project, you mus follow this rules. Non-conformity with any of this policies can lead to a ban from the project.
No power structure
In the Blaze project, there is no king. This doesn't mean that the project doesn't have a leader (it does), rather that no opinion is above another only by merit of the person making it. Any opinion must be argumented, and cannot be taken at face value, regardless of the person making it. There is only a small exception to this rule. If the argument being made is about aesthetics, or two compiting arguments have proven to hold simillar value, the maintainers decision will prevail over any other.
Politeness
This policy is easy and simple. When addressing another contributor, you should do so politely. This means that any insult or mean-spirited comment towards another individual, or grup of individuals, will not be tolerated.
No politics
This is also a simple policy. No topics, other than those directly involving the project, should be discussed in issues or PRs. A violation of this policy will result in the perpetrators being immediatelly banned.