An intro to WebGPU (with Emscripten)

I was intrigued when I first heard about WebGPU as a graphics API. The examples on the official website impressed me with their performance and showed the potential of the API. I wanted to try it out myself, but I didn't have a lot of experience with graphics programming: I knew some OpenGL and was eager to learn a more modern API. Similarly, I saw the potential of WebGPU for creating high-performance graphics applications on the web. I decided to dive in and learn more.

In this article, I will share my experience learning WebGPU with Emscripten: what WebGPU is, how it works, and how you can use it in your web applications. I will also show how to set up a development environment with Emscripten and how to create a simple graphics application.

What is WebGPU?

WebGPU is a new graphics API being developed by the GPU for the Web Community Group. It is designed to provide a low-level, high-performance graphics API for the web and is intended to be the successor to WebGL. WebGPU is more modern, efficient, and flexible than WebGL, and it aims to deliver better performance and more features for web graphics applications.

It is important to note that WebGPU is still in development and is not yet supported by all browsers. Some browsers already ship it, however, and it is expected to be widely supported in the future. WebGPU is designed to work well with modern graphics hardware and to take advantage of the capabilities of modern GPUs, and it is expected to be used for a wide range of web graphics applications, including games, simulations, visualizations, and more.

How does WebGPU work?

WebGPU takes after more modern graphics APIs like Vulkan and Metal, which gives you much more control over the GPU and allows you to write more efficient graphics code. It also means that WebGPU has a similar, albeit simpler, setup requirement to get started. Before we can draw anything to the screen, we need to query the API for some device resources:

  • Instance: This is the entry point to the WebGPU API. It is used to request the adapter and to create the surface we present to.
  • Adapter: This represents the physical device that the browser is running on. This could be a discrete GPU, an integrated GPU, or even a software renderer. We can use the adapter to query the capabilities of the GPU and to create the logical device.
  • Device: This is the logical device that you will use to interact with the GPU. This is where you create all your resources, like buffers, textures, and shaders.
  • Queue: This is the command queue that you will use to submit commands to the GPU. This is where you send all your draw calls and other work.

In the following example, we will query these resources. Keep in mind that, due to the nature of the callbacks, the code is easier to read from bottom to top: each callback is only invoked once the requested resource becomes available.

Renderer::DeviceResources deviceResources{};

// Create the instance.
deviceResources.instance = wgpu::CreateInstance(nullptr);
    
// Callback for when the adapter is queried.
auto adapterCallback = [](WGPURequestAdapterStatus status, WGPUAdapter adapter, char const* message, void* userData) 
{
    // Check query success.
    if (status != WGPURequestAdapterStatus_Success)
    {
        std::cout << "Failed requesting adapter: " << message << std::endl;
        return;
    }
    
    // Receive our device resources through the user data pointer.
    Renderer::DeviceResources* deviceResources{ reinterpret_cast<Renderer::DeviceResources*>(userData) };
    
    // Set the adapter.
    deviceResources->adapter = wgpu::Adapter(adapter);
    
    // Callback for when the device is queried.
    auto deviceCallback = [](WGPURequestDeviceStatus status, WGPUDevice deviceHandle, char const* message, void* userData)
    {
        // Check query success.
        if (status != WGPURequestDeviceStatus_Success)
        {
            std::cout << "Failed requesting device: " << message << std::endl;
            return;
        }
        
        // Receive our device resources through the user data pointer.
        Renderer::DeviceResources* deviceResources{ reinterpret_cast<Renderer::DeviceResources*>(userData) };

        // Set the device and queue.
        deviceResources->device = wgpu::Device(deviceHandle);
        deviceResources->queue = deviceResources->device.GetQueue();
    };
    
    // Describe the device.
    wgpu::DeviceDescriptor deviceDesc{};
    deviceDesc.label = "Device";
    deviceDesc.requiredFeatureCount = 0;
    deviceDesc.requiredLimits = nullptr;
    deviceDesc.nextInChain = nullptr;
    deviceDesc.defaultQueue.nextInChain = nullptr;
    deviceDesc.defaultQueue.label = "Default queue";
    
    // Request the device.
    deviceResources->adapter.RequestDevice(&deviceDesc, deviceCallback, deviceResources);
};

// Describe the options for the adapter.
wgpu::RequestAdapterOptions options{};
options.powerPreference = wgpu::PowerPreference::HighPerformance;

// Request the adapter.
deviceResources.instance.RequestAdapter(&options, adapterCallback, &deviceResources);

Perhaps in the future, it will be easier to use C++ coroutines to make the code more readable, but for now, we have to work with callbacks.

Once these resources are queried, you can get started on using them to set up your graphics pipeline and start drawing to the screen.

First, we need to set up our render target; this is where we will draw our graphics. For that, we create a swap chain, which is used to present our graphics to the screen. The swap chain is a collection of images that the GPU renders to and that the browser presents to the screen. We can create one by calling CreateSwapChain on the device and passing in the surface that we want to present to. The surface wraps the canvas element we want to draw to, and we can create it by calling CreateSurface on the instance with a selector for the canvas element.

// Set up the descriptor for the canvas surface.
wgpu::SurfaceDescriptorFromCanvasHTMLSelector canvasDesc{};
canvasDesc.sType = wgpu::SType::SurfaceDescriptorFromCanvasHTMLSelector;
canvasDesc.selector = "canvas";

wgpu::SurfaceDescriptor surfDesc{};
surfDesc.nextInChain = reinterpret_cast<wgpu::ChainedStruct*>(&canvasDesc);

// Use the instance to create the surface.
wgpu::Surface surface = _instance.CreateSurface(&surfDesc);

// Describe our swapchain, using the preferred format of the surface.
wgpu::SwapChainDescriptor swapDesc{};
swapDesc.label = "Swapchain";
swapDesc.usage = wgpu::TextureUsage::RenderAttachment;
swapDesc.format = _swapChainFormat = surface.GetPreferredFormat(_adapter);
swapDesc.width = _width;
swapDesc.height = _height;
swapDesc.presentMode = wgpu::PresentMode::Fifo;

_swapChain = _device.CreateSwapChain(surface, &swapDesc);
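
Before we move on to the pipeline, it is worth seeing how a frame is driven. Below is a minimal sketch under my own assumptions: it uses the _device, _queue, and _swapChain members from above plus a Frame function of my own naming. With Emscripten, the browser drives the loop through emscripten_set_main_loop, and the WebGPU bindings are enabled with the -sUSE_WEBGPU=1 linker flag.

#include <emscripten.h>

// A minimal per-frame function (sketch; assumes _device, _queue and _swapChain).
void Frame()
{
    // Acquire this frame's texture view; this is what you would set as the
    // view of your render pass color attachment.
    wgpu::TextureView backBuffer = _swapChain.GetCurrentTextureView();

    // Record the commands for this frame.
    wgpu::CommandEncoder encoder = _device.CreateCommandEncoder();
    // ... begin render passes and record draw calls here ...

    // Finish recording and submit the work to the GPU.
    wgpu::CommandBuffer commands = encoder.Finish();
    _queue.Submit(1, &commands);

    // No explicit present call is needed: the browser presents the canvas
    // automatically once the frame callback returns.
}

int main()
{
    // ... query the device resources and create the swap chain as above ...

    // Let the browser drive the loop (fps = 0 means use requestAnimationFrame).
    emscripten_set_main_loop(Frame, 0, true);
}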

And with that, we have the necessary resources to start drawing to the screen. We can now start creating our graphics pipeline.

Rendering pipeline

The rendering pipeline in WebGPU is similar to other modern graphics APIs like Vulkan and Metal. You need to create a pipeline layout, a render pipeline, and a render pass to draw to the screen. The pipeline layout defines the layout of the resources used in the pipeline, like the uniforms, textures, and samplers. The render pipeline defines the shaders, the vertex layout, and the render state used to draw the graphics. The render pass defines the attachments that are drawn to, like the color and depth attachments.

For this, we will need the following resources (a minimal creation sketch follows the list):

  • wgpu::RenderPipeline: This object determines the state of how we draw to the screen. It contains the render target, the shaders, the resources that are bound, the layout of our vertices, our topology, and more.
  • wgpu::PipelineLayout: The pipeline layout is used to define the layout of the resources that are used in the pipeline, like the uniforms, textures, and samplers. This is a required object for creating a render pipeline.
  • wgpu::BindGroupLayout: The bind group layouts and bind groups are closely related. The bind group layout defines the base requirements for a bind group: the visibility in the shader stages, the minimal byte size, the binding unit, and the type of the resource. A bind group layout is made up of bind group layout entries (which is quite a mouthful), but that is just a composition relation. Finally, the bind group layout is required for creating the pipeline layout, and thus the render pipeline, making it a necessary initialization step for your render pass.
  • wgpu::BindGroup: The bind group has a similar structure to the bind group layout: it is composed of several bind group entries. However, the entries now carry the actual data that the shader expects, so here we pass the texture views, buffers, or samplers. We can also change bind groups before we invoke a draw call, meaning we can keep the same pipeline state while changing the data that is passed to the shader. This is useful, for example, when you have meshes with different materials and vertex data: the pipeline state stays the same, and only the bind group changes.
  • wgpu::ShaderModule: The shader module is the object that describes compiled shader source code. You can use these during the creation of the render pipeline.
  • wgpu::VertexState: This describes the vertex stage for your render pipeline. Here you pass the shader module for your vertex shader, the layout of your vertices, and you can choose the name of the entry function for your vertex shader. Meaning that you can repurpose the same shader module for different stages in your pipeline.
  • wgpu::FragmentState: This describes the fragment stage for your render pipeline. Here you pass the shader module for your fragment shader, and you can choose the name of the entry function for your fragment shader. It also requires you to pass the target formats for your render pass. This is the format of the color attachment that you will be rendering to.
  • wgpu::Buffer: This is a piece of data located on the GPU, which can be accessed through uniforms in shaders. For example, it can hold your instance data, like transforms and materials. It is a very versatile object that can be used for numerous different things. If you want to use it in a shader, it first has to be set up through a bind group layout and bind group, so the render pipeline is aware of it.

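To make these resources concrete, here is a minimal sketch of creating a shader module and a render pipeline that would fit the skybox shader shown in the next section. The names shaderSource, pipelineLayout, and _swapChainFormat are assumptions standing in for your own setup, and the descriptor fields follow the webgpu_cpp headers I used; the API is still evolving, so check the version that ships with your Emscripten.

// Compile the WGSL source into a shader module.
// (shaderSource is assumed to hold the WGSL code as a null-terminated string.)
wgpu::ShaderModuleWGSLDescriptor wgslDesc{};
wgslDesc.code = shaderSource;

wgpu::ShaderModuleDescriptor smDesc{};
smDesc.nextInChain = &wgslDesc;
wgpu::ShaderModule shaderModule = _device.CreateShaderModule(&smDesc);

// Describe the vertex layout: a single vec3<f32> position at @location(0).
wgpu::VertexAttribute positionAttribute{};
positionAttribute.format = wgpu::VertexFormat::Float32x3;
positionAttribute.offset = 0;
positionAttribute.shaderLocation = 0;

wgpu::VertexBufferLayout vertexLayout{};
vertexLayout.arrayStride = 3 * sizeof(float);
vertexLayout.attributeCount = 1;
vertexLayout.attributes = &positionAttribute;

// The fragment stage writes to the swap chain format we queried earlier.
wgpu::ColorTargetState colorTarget{};
colorTarget.format = _swapChainFormat;

wgpu::FragmentState fragmentState{};
fragmentState.module = shaderModule;
fragmentState.entryPoint = "fs_main";
fragmentState.targetCount = 1;
fragmentState.targets = &colorTarget;

// Tie everything together in the render pipeline descriptor.
// (pipelineLayout is assumed to be created from your bind group layouts.)
wgpu::RenderPipelineDescriptor rpDesc{};
rpDesc.label = "Render pipeline";
rpDesc.layout = pipelineLayout;
rpDesc.vertex.module = shaderModule;
rpDesc.vertex.entryPoint = "vs_main";
rpDesc.vertex.bufferCount = 1;
rpDesc.vertex.buffers = &vertexLayout;
rpDesc.primitive.topology = wgpu::PrimitiveTopology::TriangleList;
rpDesc.fragment = &fragmentState;

wgpu::RenderPipeline pipeline = _device.CreateRenderPipeline(&rpDesc);

Note how the same shaderModule is passed to both the vertex and fragment stages, with the entry point names picking which function to use.
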
Shaders

WebGPU makes use of its own shader language: WGSL. There are some similarities with GLSL, but it's mostly a new language. WGSL is very reminiscent of Rust, and experience with Rust will make it easier to understand. Here is an example of a simple shader in WGSL:

struct VertexIn 
{
    @location(0) aPos: vec3<f32>,
}

struct VertexOut 
{
    @builtin(position) vPos: vec4<f32>,
    @location(0) vUv: vec3<f32>,
}

struct Common 
{
    proj: mat4x4f,
    view: mat4x4f,
    vp: mat4x4f,

    lightDirection: vec3<f32>,
    time: f32,

    lightColor: vec3<f32>,
    normalMapStrength: f32,

    cameraPosition: vec3<f32>,
}

struct Instance 
{
    model: mat4x4f,
    exposure: f32
}

@group(0) @binding(0) var<uniform> u_common: Common;
@group(1) @binding(0) var<uniform> u_instance: Instance;
@group(1) @binding(1) var cubemapSampler: sampler;
@group(1) @binding(2) var skyboxMap: texture_cube<f32>;

@vertex
fn vs_main(input: VertexIn) -> VertexOut
{
    var out: VertexOut;
    let mvp = u_common.vp * u_instance.model;
    out.vPos = (mvp * vec4<f32>(input.aPos, 1.0)).xyzw;
    out.vUv = input.aPos;

    return out;
}

@fragment
fn fs_main(in: VertexOut) -> @location(0) vec4<f32>
{
    var color = pow(textureSample(skyboxMap, cubemapSampler, in.vUv).rgb, vec3<f32>(2.2));
    color = vec3<f32>(1.0) - exp(-color * u_instance.exposure);

    return vec4<f32>(color, 1.0);
}

This shader is used to draw a skybox in our scene. Let's go over a few things in this shader:

  • We declare the entry points for our shaders through the @vertex and @fragment attributes, and we keep both in the same source file. The entry point names can then be referenced in the render pipeline, letting us reuse the same shader module for different stages.
  • We define input and output structs for our shaders. The input struct describes the vertex attributes coming into the vertex shader; the output struct passes data from the vertex shader to the fragment shader.
  • We define the uniforms that are used in our shaders. These are the variables that are passed to the shader from the CPU. We use the @group and @binding attributes to define the binding points for the uniforms. This is used to bind the uniforms to the pipeline layout. Here you can start to see the complete life cycle of the bind groups and how they relate to uniforms in shaders.

A final overview of bind groups

Personally, I found the bind groups to be the most confusing part of WebGPU. But once you understand the life cycle of the bind groups, it becomes considerably easier to work with. Here is a final overview of the life cycle of the bind groups:

  1. Bind group layout entries: These are used to compose a complete bind group layout. They define the type of data that is expected, its visibility in the different shader stages, and the binding unit. They are the bare essentials for declaring the resources in the render pipeline, and as such, they must be defined when the render pipeline is created.
  2. Bind group layouts: These are made up of bind group layout entries. At this stage, they don't have a bind group ID yet; that ID is set when the bind group is bound to the pipeline before a draw call, meaning you have more freedom in how you organize these. (This ID is the one defined in the shader example above: @group().)
  3. Bind group entries: These relate directly to the bind group layout entries, but now we provide the actual data that the shader expects, like the texture view, the buffer, or the sampler. With this data, we can create a bind group.
  4. Bind groups: These are made up of bind group entries and are created against a bind group layout, so the two should have the same number of entries; a bind group that doesn't match its layout makes no sense. This object is bound to the pipeline before making a draw call. For example, you would have a different bind group per mesh object, while the render pipeline has one bind group layout that all those bind groups are based on.
  5. Finally, we get to our draw call. Before making the actual call, we first bind the bind group to the pipeline. This is where we set the bind group ID that is also defined in the shader. This requires some coordination between your shader and your draw call.
  6. Then, when the shader is executed, the bound groups are matched with the @group() and @binding() attributes in the shader, and the data can be used in the shader stages described in the bind group layout entries.

Below is an example that I used to provide uniform data for an instance in my application. It makes use of one feature I didn't talk about yet: the dynamic offset. A dynamic offset lets you rebind the same buffer at a different offset for each draw, which is useful when you draw multiple instances and want to provide different data for each one. It is enabled by setting hasDynamicOffset to true in the bind group layout entry. (How the instance buffer itself is created is shown after the example.)

void Initialize()
{
    // Define the bind group layout entry for the instance data.
    std::array<wgpu::BindGroupLayoutEntry, 1> instanceBGLayoutEntry{};
    instanceBGLayoutEntry[0].binding = 0;
    instanceBGLayoutEntry[0].visibility = wgpu::ShaderStage::Vertex;
    instanceBGLayoutEntry[0].buffer.type = wgpu::BufferBindingType::Uniform;
    instanceBGLayoutEntry[0].buffer.minBindingSize = sizeof(Instance);
    instanceBGLayoutEntry[0].buffer.hasDynamicOffset = true;
    
    // Create the bind group layout for the instance data.
    wgpu::BindGroupLayoutDescriptor bgLayoutDesc{};
    bgLayoutDesc.label = "Instance binding group layout";
    bgLayoutDesc.entryCount = instanceBGLayoutEntry.size();
    bgLayoutDesc.entries = instanceBGLayoutEntry.data();
    _instanceBindGroupLayout = _renderer.Device().CreateBindGroupLayout(&bgLayoutDesc);
    
    // Define the bind group entry for the instance data.
    std::array<wgpu::BindGroupEntry, 1> bgEntry{};
    bgEntry[0].binding = 0;
    bgEntry[0].buffer = _instanceBuffer;
    bgEntry[0].size = sizeof(Instance);
    
    // Create the bind group for the instance data.
    wgpu::BindGroupDescriptor bgDesc{};
    bgDesc.label = "Instance bind group";
    bgDesc.layout = _instanceBindGroupLayout;
    bgDesc.entryCount = bgEntry.size();
    bgDesc.entries = bgEntry.data();
    _instanceBindGroup = _renderer.Device().CreateBindGroup(&bgDesc);
       
    // Create the pipeline layout for the render pipeline.
    // We use three different bind group layouts.
    wgpu::PipelineLayoutDescriptor layoutDesc{};
    layoutDesc.label = "Default pipeline layout";
    std::array<wgpu::BindGroupLayout, 3> bindGroupLayouts{ _renderer.CommonBindGroupLayout(), _instanceBindGroupLayout, _pbrBindGroupLayout};
    layoutDesc.bindGroupLayoutCount = bindGroupLayouts.size();
    layoutDesc.bindGroupLayouts = bindGroupLayouts.data();
    wgpu::PipelineLayout pipelineLayout = _renderer.Device().CreatePipelineLayout(&layoutDesc);
    
    // Create the render pipeline for the application.
    // (I've skipped the details for filling out the descriptor for the render pipeline.)
    _pipeline = _renderer.Device().CreateRenderPipeline(&rpDesc);
}

void Render(const wgpu::CommandEncoder& encoder)
{
    wgpu::RenderPassColorAttachment colorDesc{};
    colorDesc.view = _renderTarget;
    colorDesc.loadOp = wgpu::LoadOp::Load;
    colorDesc.storeOp = wgpu::StoreOp::Store; // Store the results so they can be presented.
    
    wgpu::RenderPassDescriptor renderPass{};
    renderPass.label = "Main render pass";
    renderPass.colorAttachmentCount = 1;
    renderPass.colorAttachments = &colorDesc;
    renderPass.depthStencilAttachment = &_renderer.DepthStencilAttachment();
    
    // Set up our render pass encoder.
    wgpu::RenderPassEncoder pass = encoder.BeginRenderPass(&renderPass);
    
    // Set the pipeline for the render pass.
    pass.SetPipeline(_pipeline);
    
    // Iterate over the drawings we have queued up.
    uint32_t i{ 0 };
    while (!_drawings.empty())
    {
        // Get the next drawing.
        auto [mesh, transform] = _drawings.front();
        _drawings.pop();
    
        // Describe the instance data (mainly the transformation matrix.).
        Instance instance;
        instance.model = _renderer.BuildSRT(transform);
        instance.transInvModel = glm::mat4{ glm::mat3{ glm::transpose(glm::inverse(instance.model)) } };
    
        // Determine the dynamic offset for the instance data.
        uint32_t dynamicOffset{ i * _uniformStride };
        
        // Write the instance data to the buffer.
        _renderer.Queue().WriteBuffer(_instanceBuffer, dynamicOffset, &instance, sizeof(instance)); // TODO: write entire buffer once.
    
        // Set the indices and vertices for the mesh.
        pass.SetVertexBuffer(0, mesh.vertBuf, 0, wgpu::kWholeSize);
        pass.SetIndexBuffer(mesh.indexBuf, mesh.indexFormat, 0, wgpu::kWholeSize);
        
        // Set the bind groups for the render pass.
        pass.SetBindGroup(0, _renderer.CommonBindGroup(), 0, nullptr);
        pass.SetBindGroup(1, _instanceBindGroup, 1, &dynamicOffset); // Note the use of the dynamic offset.
        pass.SetBindGroup(2, mesh.bindGroup, 0, nullptr);
    
        // Perform the draw call.
        pass.DrawIndexed(mesh.indexCount, 1, 0, 0, 0);
    
        ++i;
    }
    
    // End the pass.
    pass.End();
}
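
For completeness, here is how the _instanceBuffer and _uniformStride used above might be set up. This is a sketch under my own assumptions (a fixed MAX_INSTANCES constant, names of my choosing); the important detail is that each dynamic offset must be a multiple of the device's minUniformBufferOffsetAlignment limit (commonly 256 bytes), so the per-instance stride is rounded up to that alignment.

// Query the alignment requirement for dynamic uniform offsets.
wgpu::SupportedLimits supported{};
_renderer.Device().GetLimits(&supported);
uint32_t alignment = supported.limits.minUniformBufferOffsetAlignment;

// Round the instance size up to the required alignment.
_uniformStride = (sizeof(Instance) + alignment - 1) / alignment * alignment;

// Create one uniform buffer large enough for all instances.
// (MAX_INSTANCES is an assumed application-defined constant.)
wgpu::BufferDescriptor bufferDesc{};
bufferDesc.label = "Instance buffer";
bufferDesc.usage = wgpu::BufferUsage::Uniform | wgpu::BufferUsage::CopyDst;
bufferDesc.size = MAX_INSTANCES * _uniformStride;
_instanceBuffer = _renderer.Device().CreateBuffer(&bufferDesc);

With this stride, the dynamicOffset = i * _uniformStride computed in the render loop always lands on a valid offset.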

Conclusion

While I enjoyed my brief experiments with WebGPU, I found it quite troublesome at times. The entire setup required for the bind groups makes it difficult to change uniforms in shaders or add new ones. This makes iterating frustrating, since making tiny changes becomes quite an architectural challenge. However, understanding the entire process was quite gratifying, and in the end, I can see how you could use this to make graphics applications.

I did thoroughly enjoy making use of a modern graphics API as opposed to OpenGL. Having actual objects and a nice C++ API to work with was a breath of fresh air. I also found the shaders quite pleasant to write, and the WGSL language was straightforward to understand. The learning process was very rewarding: there isn't much documentation available yet, so you have to figure out plenty of things yourself.

On the other hand, I found the development pipeline more trouble than it's worth. Using Emscripten to compile to WASM and making use of WebGPU was a cool journey, and I found it liberating to discover it's actually possible. But debugging with Emscripten is lackluster. I had to use a Visual Studio extension to compile with Emscripten, but breakpoints didn't work, so I found myself using std::cout to debug my code, which is a step back from the modern debugging tools we have available. Debugging WebGPU itself was also slightly annoying; while there are some Chrome extensions that let you take frame captures, the tooling is still not there yet.

Perhaps this was the result of using an API with C++ that was not intended for it. From what I saw, this works better with Rust, so perhaps the experience is better there.

As a final note, I would recommend WebGPU to anyone who is interested in modern graphics programming. It is a great way to learn about modern graphics APIs and to create high-performance graphics applications on the web. I would also recommend using Rust with WebGPU, as it seems to be a better fit for the API. I hope that WebGPU will be widely supported in the future, and I am excited to see what kind of graphics applications will be created with it.

I hope this article has been helpful and informative. I will leave the fruits of my labor at the bottom of the page for you to look at and interact with (note that it is likely only supported on Chrome browsers).