Skip to main content

Basics of Compute Shaders in WL 1

Β· 4 min read
Kirill Vasin

A GPU is also a great tool for general-purpose computations. There are a few ways to couple it with Wolfram Language:

  • CUDALink
  • OpenCLLink
  • LibraryLink

The first option requires hardware from Nvidia, which is a massive drawback considering there are many other processors on the market capable of crunching numbers. The last option allows integrating any dynamic library written in C/Rust (probably there are other bindings as well) into our Kernel, but the amount of effort required to write a general-purpose GPU library and deal with cross-platform issues is quite cumbersome and defeats the whole purpose of using WL here.

We will go with the most cross-platform and hardware-agnostic solution: OpenCL πŸš…

Download original notebook
Needs["OpenCLLink`"]
If[!OpenCLQ[], Alert["Report the police, it doest not work"] // FrontSubmit]
True

We assume you already know the OpenCL programming language and its execution model. There are plenty of resources available online.

Drawing on Canvas​

The most visual way of seeing the work of your tiny GPU kernels is to ask them to generate data and interpret it as an image. Let us define a simple kernel that fills an entire screen.

test.cl

__kernel void render(
  __global uchar4* output, 
  const int width, 
  const int height
) {
    unsigned int work_item_id = get_global_id(0);
    
    unsigned int x_coord = work_item_id % width;
    unsigned int y_coord = work_item_id / width;

    float3 finalcolor = (float3)(0.0f, 0.5f, 0.2f);

    //clamp to 8bits for each channel
    
    uchar4 rgba;
    rgba.x=(uchar)(finalcolor.x*255.0);
    rgba.y=(uchar)(finalcolor.y*255.0);
    rgba.z=(uchar)(finalcolor.z*255.0);
    rgba.w=255;

    output[work_item_id] = rgba;
}

A side note

If you run the cell above, it creates a file named test.cl

We added a minor optimization: the resulting color will be normalized to the range of the UnsignedInteger8 typed array in Wolfram Language.

Now let us compile it.

render = OpenCLFunctionLoad[File["test.cl"], 
  "render", {{"UnsignedByte[4]", _, "Output"}, _Integer, _Integer}, 256, "ShellOutputFunction"->Print];
" "

Here we specified the local group size to be 256, but at this stage, it does not matter for us. Just make sure that it does not exceed your CL device limitations.

Cases[OpenCLInformation[], Rule["Maximum Work Group Size", r_] :> r, Infinity]
{256}

Allocate memory for our generated image. Let it be small for now, 128x128.

image = OpenCLMemoryLoad[Table[{0, 0, 0, 0}, {i, 128}, {j, 128}], "UnsignedByte[4]"];

A side note

Do not forget to unload this memory if you no longer need it.

To execute our OpenCL function, we also need to specify the grid size (aka how many kernels we need to run). Since in our case this is a bitmap image and we are running 1 kernel per pixel the number will 128^2 according the resolution we picked for an image

render[image, 128, 128, 128 128];
image // OpenCLMemoryGet;
NumericArray[%, "UnsignedInteger8"];
Image[%, "Byte", Magnification->3]
(*VB[*)(FrontEndRef["fe0488c4-e676-4a8a-9e38-79950149d078"])(*,*)(*"1:eJxTTMoPSmNkYGAoZgESHvk5KRCeEJBwK8rPK3HNS3GtSE0uLUlMykkNVgEKp6UamFhYJJvoppqZm+maJFok6lqmGlvomltamhoYmlimGJhbAACA9RUH"*)(*]VB*)

It is important to note that you should always try to utilize NumericArray when working with raster graphics, since Wolfram's native List is quite slow for this task. The second important thing is to specify the format explicitly in Image as well. There is a 1:1 correspondence:


OpenCLNumericArrayImage
UnsignedByteUnsignedInteger8Byte
FloatReal32Real

Let us try something more interesting.

Here is a very primitive example of what is called Signed-Distance-Fields (SDF). We set the "geometry" using a parametric equation and calculate the color of every pixel based on the distance to vertices or any other control points of our defined geometry. For example, to draw a disk we need only the equation of a circle with two parameters:

color=R2βˆ’βˆ£rβˆ’rc∣2\text{color} = R^2 - |\mathbf{r} - \mathbf{r}_c|^2

where rc\mathbf{r}_c is the coordinates of the center and RR is the radius. After that, we should clip the resulting color to the range of 0.0 and 1.0 to avoid any artifacts.

test.cl

float clip(float val, const float min, const float max) {
  if (val > max) return max;
  if (val < min) return min;
  return val;
}

__kernel void render(
  __global uchar4* output, 
  const int width, 
  const int height
) {
    unsigned int work_item_id = get_global_id(0);
    
    unsigned int x_coord = work_item_id % width;
    unsigned int y_coord = work_item_id / width;

    //normalize screen coordinates
    float fx = (float)x_coord / (float)width; 
	float fy = (float)y_coord / (float)height;

    float3 finalcolor = (float3)(fy * 0.1f, fy * 0.3f, 0.3f); // a nice background

    float dist2 = (fx - 0.5f)*(fx - 0.5f) + (fy - 0.5f)*(fy - 0.5f);
    finalcolor.x += clip(10.0f*(0.4f*0.4f - dist2), 0.0f, 1.0f);

    //clamp to 8bits for each channel
    
    uchar4 rgba;
    rgba.x=(uchar)(finalcolor.x*255.0);
    rgba.y=(uchar)(finalcolor.y*255.0);
    rgba.z=(uchar)(finalcolor.z*255.0);
    rgba.w=255;

    output[work_item_id] = rgba;
}

Reload

render = OpenCLFunctionLoad[File["test.cl"], 
  "render", {{"UnsignedByte[4]", _, "Output"}, _Integer, _Integer}, 256, "ShellOutputFunction"->Print];

render[image, 128, 128, 128 128];
image // OpenCLMemoryGet;
NumericArray[%, "UnsignedInteger8"];
Image[%, "Byte", Magnification->3]
" "
(*VB[*)(FrontEndRef["402e057f-1586-4e2d-8c5a-504dc17741f1"])(*,*)(*"1:eJxTTMoPSmNkYGAoZgESHvk5KRCeEJBwK8rPK3HNS3GtSE0uLUlMykkNVgEKmxgYpRqYmqfpGppamOmapBql6FokmybqmhqYpCQbmpubGKYZAgB3XxUP"*)(*]VB*)

Because our toy SDF function provides a squared distance, it creates a smooth transition on the edges before it starts to clip to 1.0.

Controls​

We can hook up some external controls to change the parameters of our disk. The simplest one will be the radius.

test.cl

float clip(float val, const float min, const float max) {
  if (val > max) return max;
  if (val < min) return min;
  return val;
}

__kernel void render(
  __global uchar4* output, 
  const int width, 
  const int height,
  const float radius //<---- HERE
) {
    unsigned int work_item_id = get_global_id(0);
    
    unsigned int x_coord = work_item_id % width;
    unsigned int y_coord = work_item_id / width;

    //normalize screen coordinates
    float fx = (float)x_coord / (float)width; 
	float fy = (float)y_coord / (float)height;

    float3 finalcolor = (float3)(fy * 0.1f, fy * 0.3f, 0.3f); // a nice background

    float dist2 = (fx - 0.5f)*(fx - 0.5f) + (fy - 0.5f)*(fy - 0.5f);
    finalcolor.x += clip(10.0f*(radius*radius - dist2), 0.0f, 1.0f); //<---- HERE

    //clamp to 8bits for each channel
    
    uchar4 rgba;
    rgba.x=(uchar)(finalcolor.x*255.0);
    rgba.y=(uchar)(finalcolor.y*255.0);
    rgba.z=(uchar)(finalcolor.z*255.0);
    rgba.w=255;

    output[work_item_id] = rgba;
}

Now recompile it once.

render = OpenCLFunctionLoad[File["test.cl"], 
  "render", {{"UnsignedByte[4]", _, "Output"}, _Integer, _Integer, "Float"}, 256, "ShellOutputFunction"->Print];
" "

To render dynamically, we provide an intermediate symbol to which an Image listener will be assigned.

Module[{buffer},
  {
    Image[buffer // Offload, "Byte", Magnification->3],

    EventHandler[InputRange[0., 1., 0.1, "Label"->"R"], Function[radius,
      render[image, 128, 128, radius, 128 128];
      buffer = NumericArray[image // OpenCLMemoryGet, "UnsignedInteger8"];
    ]] // EventFire (*BB[*)(* fire once to initialize buffer *)(*,*)(*"1:eJxTTMoPSmNhYGAo5gcSAUX5ZZkpqSn+BSWZ+XnFaYwgCS4g4Zyfm5uaV+KUXxEMUqxsbm6exgSSBPGCSnNSg9mAjOCSosy8dLBYSFFpKpoKkDkeqYkpEFXBILO1sCgJSczMQVYCAOFrJEU="*)(*]BB*)
  } // Row 
]
(*GB[*){{(*VB[*)(FrontEndRef["0b6d244d-9e67-4236-90f8-133aa756151a"])(*,*)(*"1:eJxTTMoPSmNkYGAoZgESHvk5KRCeEJBwK8rPK3HNS3GtSE0uLUlMykkNVgEKGySZpRiZmKToWqaameuaGBmb6VoapFnoGhobJyaam5oZmhomAgB4zxTh"*)(*]VB*)(*|*),(*|*)(*VB[*)(EventObject[<|"Id" -> "e86d4b59-0c98-45ff-af58-e0d00db0df09", "Initial" -> 0.5, "View" -> "d4225ab7-34a1-4615-8131-947dbbfc6796"|>])(*,*)(*"1:eJxTTMoPSmNkYGAoZgESHvk5KRCeEJBwK8rPK3HNS3GtSE0uLUlMykkNVgEKp5gYGZkmJpnrGpskGuqamBma6loYGhvqWpqYpyQlpSWbmVuaAQB3ohUV"*)(*]VB*)}(*||*),(*||*)}(*]GB*)

When you drag the slider, it reruns the OpenCL kernel and provides a new bitmap image to be displayed on the screen.

In the next part we will go deeper and try to take advantage of local groups, memory and do animations.