Multi Your Threading #10: Nice GPU you have there, drawing this webpage....

Here I am, copypasting this disclaimer once again!

 

This is chapter 10 in a series I’ve been doing on multithreading. Each chapter (generally) is based on me reimplementing the same code using different languages/technologies/models and talking about them both in isolation as well as in comparison to one another. While it’s not necessary that you’ve been following, I do assume you’re familiar with parallelisation in general and I’ll make the occasional references to other implementations. You can find them here if you’d like to check them out:

 

  1. C++ STL Threads and Tasks
  2. OpenMP and MPI
  3. OpenCL and CUDA
  4. AMD HIP and HC C++
  5. SYCL and Vulkan
  6. Algorithm overhaul with updated benchmarks for previous implementations
  7. Intel TBB and ISPC
  8. STL Threads for the browser with emscripten
  9. OpenGL through both compute and fragment shaders

Also: All the code here is from ToyBrot. If you prefer to see the whole picture or browse on your own as you read, feel free to clone the repo and open it up on your favourite code editor (maybe on a second screen if you have one). I personally am a big fan of Atom

So, in Multi Your Threading #8 we used emscripten to do the unholy act that was compiling our C++ multithreaded code and somehow making a browser run it. Then in #9 we took a step back for a quick stop in the land of the living and had a look at using Ye Olde Open Graphics Library to give us yet another way to access our GPUs. Now it is time we do the unspeakable and combine these two bits of tech. Place your GPU’s tender cores upon the altar of a darker tomorrow. Your browser shall feast on it!

All of the GL, and then some

So, I mentioned before that the reason why I implemented OpenGL is because I needed it to access the GPU through the browser. That is true only in some ways. There are, currently, two options and neither of these options is exactly OpenGL… kind of… Opening up Khronos’ current active standards page, this is what we get:

So, when we say OpenGL… do you mean OpenGL OpenGL, or OpenGL, the family? Because a lot of those are related in various ways. Of interest to us, in addition to just plain old OpenGL, are two others: OpenGL ES and WebGL.

WebGL is what we’re REALLY interested in. WebGL, currently in version 2.0, is essentially the OpenGL API that’s made for browsers. The great advantage of WebGL from the user side is that the support is built into the browser, so you don’t need to install a separate plugin or whatnot. You have a modern browser, there you go. For the developer, from Khronos:

“WebGL is a cross-platform, royalty-free web standard for a low-level 3D graphics API based on OpenGL ES, exposed to ECMAScript via the HTML5 Canvas element. Developers familiar with OpenGL ES 2.0 will recognize WebGL as a Shader-based API using GLSL, with constructs that are semantically similar to those of the underlying OpenGL ES API. It stays very close to the OpenGL ES specification, with some concessions made for what developers expect out of memory-managed languages such as JavaScript. WebGL 1.0 exposes the OpenGL ES 2.0 feature set; WebGL 2.0 exposes the OpenGL ES 3.0 API.”

And THIS is why OpenGL ES becomes important to us. What is OpenGL ES? “OpenGL® ES is a royalty-free, cross-platform API for rendering advanced 2D and 3D graphics on embedded and mobile systems – including consoles, phones, appliances and vehicles. It consists of a well-defined subset of desktop OpenGL suitable for low-power devices, and provides a flexible and powerful interface between software and graphics acceleration hardware.”

So… OpenGL is this very handy, very powerful graphics API for talking with GPUs. To help provide OpenGL for systems with stricter limitations, there is OpenGL ES, which is a restricted subset of OpenGL. And then, built on top of this restricted set, there is WebGL, for browsers. WebGL 1.0 mirrors OpenGL ES 2.0 and WebGL 2.0 mirrors OpenGL ES 3.0. This is why the fragment shader in the previous chapter was targeted at shader language version 300 ES.

All right, so, WebGL 2.0, we have it, just code like you’re coding to OpenGL ES 3.0, emscripten will do the magic, we’re golden right? But why are we talking about the fragment shader, then? Well, again from Khronos:

“OpenGL ES 3.1 provides the most desired features of desktop OpenGL 4.4 in a form suitable for mobile devices,” said Tom Olson, chair of the OpenGL ES working group and Director of Graphics Research at ARM. “It provides developers with the ability to use cutting-edge graphics techniques on devices that are shipping today.”

 

Key features of the OpenGL ES 3.1 specification include:

  • Compute shaders – applications can use the GPU to perform general computing tasks, tightly coupled with graphics rendering. Compute shaders are written in the GLSL ES shading language, and can share data with the graphics pipeline;

wtfdidijustread

So… yeah… you can’t Compute Shader in WebGL. Not until they release a new version of the standard AND, this being the most important bit, browsers implement it. So we’re stuck in the good old days of just tricking your GPU into thinking it’s doing graphics until things move forward. When “things move forward”, though, we might not even be using WebGL for this, actually…

There is a new standard coming out, not from Khronos, but from W3C, the standards org for the interwebs. This standard is currently known as WebGPU and it’s aimed at targeting GPUs at a lower level, so more like the Vulkan style. Of note in this context is that Emscripten DOES have some support for WebGPU, however it’s a bit like “headers exist but I couldn’t find anything in the docs right now”. Additionally, browser support is super iffy. Right now on my machine, the only browser I’ve managed to run some WebGPU samples in was Firefox Nightly (not even dev). And this is all part and parcel of being a very nascent standard. However, WebGPU DOES include compute shaders so I think this could open some massive opportunities for web applications. Due to all of those problems, sadly, it’s not something I can use and, thus, I will not look into it too deeply at the moment.

Which brings us full circle: The option here is WebGL 2.0, which means our code to be built by emscripten needs to conform to OpenGL ES 3.0 (with some additional restrictions) and, thus, cannot use compute shaders… well, let’s go then
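To make that chain of versions a bit more concrete, here is a minimal sketch of the constraints as code. None of this is in toyBrot: `EsVersion`, `esVersionForWebGL`, `hasComputeShaders` and `glslVersionDirective` are hypothetical names I’m making up for illustration, and the only facts baked in are the ones quoted above (WebGL 1.0 mirrors ES 2.0, WebGL 2.0 mirrors ES 3.0, compute shaders arrive with ES 3.1) plus the matching GLSL ES `#version` lines.

```cpp
#include <string>

// Hypothetical helper type (not part of toyBrot): an OpenGL ES version.
struct EsVersion
{
    int major;
    int minor;
};

// WebGL 1.0 mirrors OpenGL ES 2.0, WebGL 2.0 mirrors OpenGL ES 3.0.
constexpr EsVersion esVersionForWebGL(int webglMajor)
{
    return webglMajor >= 2 ? EsVersion{3, 0} : EsVersion{2, 0};
}

// Compute shaders were introduced in OpenGL ES 3.1.
constexpr bool hasComputeShaders(EsVersion v)
{
    return v.major > 3 || (v.major == 3 && v.minor >= 1);
}

// The "#version" line a shader for this ES version starts with.
// ES 2.0 uses GLSL ES 1.00; ES 3.x uses "#version 3x0 es".
inline std::string glslVersionDirective(EsVersion v)
{
    if (v.major < 3)
    {
        return "#version 100";
    }
    return "#version 3" + std::to_string(v.minor) + "0 es";
}
```

Asking `hasComputeShaders(esVersionForWebGL(2))` gives you `false`, which is the whole problem in one line, and `glslVersionDirective(esVersionForWebGL(2))` gives the `#version 300 es` the fragment shader needs.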

Building a board so your GPU can surf the web

So, the good thing from having done both the CPU emscripten implementation and the OpenGL fragment shader stuff is that really, for the most part, I’ve got all of the working bits sorted at this point. So what we need to do is basically marry the main.cpp from our CPU emscripten implementation with the FracGen.cpp from the OpenGL-frag implementation and then just sort everything around that. Easy right?!?!?

One of the first decisions this allows us to make is that, since the heavy lifting is all going to be moved to the GPU, we can actually drop multithreading from toyBrot here, on the CPU side. The advantage is that we can then avoid the issues that arose from that because of the spectre mitigations (and Apple just not implementing stuff, though Safari actually ALSO doesn’t implement WebGL 2 because… of course). This actually means that our GPU based implementation should (famous last words) “just run” on more browser/device combinations than our CPU-only version.

In order to do this, I have split up my CMake Options for emscripten related to multithreading. I’ve touched on this briefly before but this requires some specific stuff both for the preprocessor as well as for emscripten.

if(TB_NUM_THREADS EQUAL 0)
    message("Web Assembly requires a predefined number of threads")
    message("Defaulting to 120")
    set(TB_NUM_THREADS "120" CACHE STRING "Number of threads to use (0 for auto)" FORCE)
endif()

set(TB_EMS_THREADS  -sUSE_PTHREADS=1 
                    -sPTHREAD_POOL_SIZE=${TB_NUM_THREADS} 
                    -sALLOW_BLOCKING_ON_MAIN_THREAD=1 )

set(TB_EMS_OPTIONS  -sEXIT_RUNTIME=1
                    -sDISABLE_EXCEPTION_CATCHING=0
                    -sTOTAL_STACK=100mb
                    -sINITIAL_MEMORY=700mb
                    -sASSERTIONS=1
                    
                    #this is for the display
                    -sUSE_SDL=2
                    
                    #this is for image download
                    -sEXTRA_EXPORTED_RUNTIME_METHODS=['FS']
                    --bind
                    -sUSE_LIBPNG=1
                    
                    #this is for loading files
                    --use-preload-plugins )

set(TB_EMS_DEFINES "TOYBROT_MAX_THREADS=${TB_NUM_THREADS}")

The CMakeLists specific for the WebGL project uses these variables:

project(rmWebGL LANGUAGES CXX)

set(rmWebGL_SRCS    "main.cpp"
                    "FracGen.cpp" )

set(rmWebGL_HDRS    "FracGen.hpp"
                    "${TB_COMMON_SRC_DIR}/Vec.hxx"
                    "${TB_COMMON_SRC_DIR}/dataTypes.hxx"
                    "${TB_COMMON_SRC_DIR}/defines.hpp" )

set(rmWebGL_GLSRCS "FracGen.frag" )

set(rmWebGL_JSSRCS "${TB_COMMON_SRC_DIR}/download.js")

set(rmWebGL_HTML "${TB_COMMON_SRC_DIR}/toybrot.html" )

if(TB_ALTERNATIVE_SHELL_FILE)
    set(rmWebGL_EMCCEXTRA --shell-file ${rmWebGL_HTML})
endif()

list(APPEND rmWebGL_EMCCEXTRA 
                    --extern-pre-js ${rmWebGL_JSSRCS})

add_executable(${PROJECT_NAME}    ${rmWebGL_SRCS} ${rmWebGL_HDRS} )
add_executable(${PROJECT_NAME}-mt ${rmWebGL_SRCS} ${rmWebGL_HDRS} )

add_custom_target(${PROJECT_NAME}_EXTRA_SRCS SOURCES    ${rmWebGL_GLSRCS}
                                                        ${rmWebGL_JSSRCS} 
                                                        ${rmWebGL_HTML} )

target_compile_definitions(${PROJECT_NAME} PRIVATE  ${TB_EMS_DEFINES} 
                                                    "TB_SINGLETHREAD"
                                                    "TB_WEBGL_COMPAT"
                                                    "TB_OPENGL"
                                                    "TOYBROT_ENABLE_PNG"
                                                    "TOYBROT_ENABLE_GUI"
                                                    # dereference the variable: $<BOOL:TB_STDFS_FOUND> tests the literal name, which is always true
                                                    $<$<BOOL:${TB_STDFS_FOUND}>:"${TB_FS_DEFINE}"> )
                                                    
target_compile_definitions(${PROJECT_NAME}-mt PRIVATE   ${TB_EMS_DEFINES}
                                                        "TB_WEBGL_COMPAT"
                                                        "TB_OPENGL"
                                                        "TOYBROT_ENABLE_GUI"
                                                        # dereference the variable: $<BOOL:TB_STDFS_FOUND> tests the literal name, which is always true
                                                        $<$<BOOL:${TB_STDFS_FOUND}>:"${TB_FS_DEFINE}"> )

target_compile_options(${PROJECT_NAME} PRIVATE  --preload-file ${rmWebGL_GLSRCS}
                                                ${TB_EMS_OPTIONS})
                                                
target_compile_options(${PROJECT_NAME}-mt PRIVATE   "-sOFFSCREENCANVAS_SUPPORT=1" 
                                                    --preload-file ${rmWebGL_GLSRCS}
                                                    ${TB_EMS_OPTIONS}
                                                    ${TB_EMS_THREADS} )

target_link_options(${PROJECT_NAME} PRIVATE     "-sMAX_WEBGL_VERSION=2"
                                                "-sMIN_WEBGL_VERSION=2" 
                                                ${rmWebGL_EMCCEXTRA}
                                                --preload-file ${rmWebGL_GLSRCS}
                                                ${TB_EMS_OPTIONS})
                                                
target_link_options(${PROJECT_NAME}-mt PRIVATE  "-sOFFSCREENCANVAS_SUPPORT=1"
                                                "-sMAX_WEBGL_VERSION=2"
                                                "-sMIN_WEBGL_VERSION=2"
                                                ${rmWebGL_EMCCEXTRA}
                                                --preload-file ${rmWebGL_GLSRCS}
                                                ${TB_EMS_OPTIONS}
                                                ${TB_EMS_THREADS})

target_sources(${PROJECT_NAME} PRIVATE  "${TB_COMMON_SRC_DIR}/FracGenWindow.cpp"
                                        "${TB_COMMON_SRC_DIR}/FracGenWindow.hpp"
                                        "${TB_COMMON_SRC_DIR}/pngWriter.cpp"
                                        "${TB_COMMON_SRC_DIR}/pngWriter.hpp" )
                                        
target_sources(${PROJECT_NAME}-mt PRIVATE "${TB_COMMON_SRC_DIR}/FracGenWindow.cpp"
                                          "${TB_COMMON_SRC_DIR}/FracGenWindow.hpp")


target_include_directories(${PROJECT_NAME}  PRIVATE
                                            ${CMAKE_CURRENT_SOURCE_DIR}
                                            ${TB_COMMON_SRC_DIR}
                                            ${TB_EMS_INCLUDES} )

target_include_directories(${PROJECT_NAME}-mt  PRIVATE
                                            ${CMAKE_CURRENT_SOURCE_DIR}
                                            ${TB_COMMON_SRC_DIR}
                                            ${TB_EMS_INCLUDES} )


add_custom_command( TARGET ${PROJECT_NAME} PRE_BUILD
                    COMMAND ${CMAKE_COMMAND} -E copy ${rmWebGL_GLSRCS} ${CMAKE_CURRENT_BINARY_DIR}/
                    COMMAND ${CMAKE_COMMAND} -E copy ${rmWebGL_JSSRCS} ${CMAKE_CURRENT_BINARY_DIR}/
                    DEPENDS ${rmWebGL_GLSRCS} ${rmWebGL_JSSRCS}
                    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
                    
add_custom_command( TARGET ${PROJECT_NAME}-mt PRE_BUILD
                    COMMAND ${CMAKE_COMMAND} -E copy ${rmWebGL_GLSRCS} ${CMAKE_CURRENT_BINARY_DIR}/
                    COMMAND ${CMAKE_COMMAND} -E copy ${rmWebGL_JSSRCS} ${CMAKE_CURRENT_BINARY_DIR}/
                    DEPENDS ${rmWebGL_GLSRCS} ${rmWebGL_JSSRCS}
                    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})

if(NOT WIN32)
    if(CMAKE_CXX_COMPILER MATCHES clang)
        target_compile_options(${PROJECT_NAME} PRIVATE "-fcxx-exceptions")
    else()
        target_compile_options(${PROJECT_NAME} PRIVATE "-fexceptions")
    endif()
endif()

install(TARGETS ${PROJECT_NAME} RUNTIME DESTINATION bin)
install(TARGETS ${PROJECT_NAME}-mt RUNTIME DESTINATION bin)
install(FILES ${rmWebGL_GLSRCS} DESTINATION bin)

So there is a lot happening here, and most of it comes down to how these files are either used in several places in the build or (like the main.cpp of the emscripten projects) written as if they were. This project and the CPU version for emscripten COULD, for example, just be a check in the OpenGL-frag and the STD::Threads projects: “if building with emscripten, build these in this manner”. I’ve mentioned this before, but a different project of mine, Warp Drive, a game engine built on top of SDL2, DOES account for the possibility of the very same code base being built for either regular desktop or webassembly with emscripten. For toyBrot though, there are enough shenanigans as it is, so it’s simpler to split the project, even if there is some needless code replication.

The very first interesting thing that we have in this CMakeLists is that, in addition to the fragment shader itself, there’s also a javascript file and an html file. The HTML file is what emscripten calls a “shell file”. I didn’t get into this in chapter 8 because there was already too much new, but now that we (hopefully) have a bit more familiarity with this whole thing, we can look a bit deeper. The shell file is just the html where emscripten pastes the call to the javascript it generates. It also defines the “Module” object which the generated javascript talks to. You can also talk with this object from C++ using emscripten APIs. The only reason I have an alternative version here is that, for deploying in this blog, integrated in wordpress, I needed emscripten to look for an html canvas element with a name other than “canvas”, which is the default, because between wordpress and elementor (which I use to build my pages), that’s already being used. Right now this also requires you to tweak a line of code in the SDL2 port for emscripten and rebuild it. The process is trivial: you delete the binary, tweak the line, save the file, and the next time emcc needs it, it just rebuilds. But it HAS to be done, for reasons outlined here.

The Javascript file is useful if you want to, say, have additional javascript code that goes into the stuff generated by emscripten, around the part where it actually calls the webassembly for your application. In here, “--extern-pre-js” means that it gets put before the application and that I don’t want this to be optimised or anything like that, just to make sure it won’t get mangled. And this JS is where I have my download function, which I copypasted from people who know how to do this and then tweaked to work around some issues, such as a multithreading related crash.

function offerFileAsDownload(filename, mime) {
  mime = mime || "application/octet-stream";

  let content = Module.FS.readFile(filename);
  console.log(`Offering download of "${filename}", with ${content.length} bytes...`);

  var a = document.createElement('a');
  a.download = filename;
  a.href = URL.createObjectURL(new Blob([content], {type: mime}));
  a.style.display = 'none';

  document.body.appendChild(a);
  a.click();
  setTimeout(() => {
    document.body.removeChild(a);
    URL.revokeObjectURL(a.href);
  }, 2000);
}

if (typeof window != "undefined") {
    window.offerFileAsDownload = offerFileAsDownload
}

What happens here that’s interesting is that while running in a browser, your code does not have access to the file system. So we can’t save our png to disk, nor can we read our shaders from disk. What emscripten does to work around this is use a virtual filesystem. Besides including this javascript file, there are three options on the main CMakeLists related to this:

 

-sEXTRA_EXPORTED_RUNTIME_METHODS=['FS']
--bind
-sUSE_LIBPNG=1

The export is so that we can use functions related to emscripten’s FileSystem subsystem externally, like calling it from the javascript side. The --bind refers to enabling Embind, an API to connect JavaScript and C++, enabling you to call one from the other. In our case, we want to call a Javascript function (this download function) from the C++ side. And, finally, -sUSE_LIBPNG=1 is, I hope, self explanatory. It downloads the libpng port, builds it and links it to the project.

Moving on, there IS also a --preload-file option that is relevant. You pass a file or folder as an argument here and it gets packaged in a way your program can access it. But you MAY also need to make sure you have --use-preload-plugins, which I found out when Warp Drive just couldn’t load textures at all. The rest of the file is just setting up some defines we’ll use and making sure all sources are there and that helper files are copied to the build folder where they’re needed… Once you hit build, your build folder looks a bit like this.

The selected files are the actual output from emscripten. 

  • rmWebGL.data
    • This is your preloaded files
  • rmWebGL.html
    • This is the generated HTML file, if you asked for one
  • rmWebGL.js
    • This is the javascript that has your additional code and loads your web assembly code
  • rmWebGL.wasm
    • Hmmmm…. no idea what this could be
  • rmWebGL.worker.js (not present in THIS one)
    • If you are using threads, this is the JS that gets run by the spawned JS workers
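
On the C++ side, the nice part about the preloaded .data bundle is that the files in it are reached through plain standard-library file I/O, which emscripten maps onto its virtual filesystem. As a rough sketch of what loading the shader source can look like (the `readTextFile` name is hypothetical, this is not the loader toyBrot actually uses):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not toyBrot's actual loader).
// Returns the whole file as a string, or an empty string if the file
// could not be opened. Under emscripten the path is resolved inside the
// virtual filesystem, e.g. a file packaged with --preload-file.
inline std::string readTextFile(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
    {
        return std::string{};
    }
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}
```

The same function compiles and behaves the same natively, so something like `readTextFile("FracGen.frag")` can be shared between a desktop build and a browser build, as long as the file was packaged under that path.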

 

I DO recommend you always generate the html. There is a nice looking shell file and a minimal one in emscripten’s source files. You can use the minimal one, and having it helps you know what you need to code and/or extract from it to put in your webpage. This is how I integrated my emscripten generated stuff into a “real” page here in The Great Refactoring, rather than linking to the emscripten one.

So… if that’s all the build stuff and MOST of the C++ is the same, what is different?

When source files have identity crises

Back in chapter 8, I went through how in emscripten you need to use their hooks and callbacks to run your application loop, so I’m not going to go through THAT bit again. What I am going to show this time is how this changed the PNG export side. The very first thing is that, since we can’t just save to disk, I have to offer the file for the user to download instead. This is what the function in the download.js file does. But this is how we get to it:

//webGL/main.cpp

void writePNG(void*)
{
    if (exportWriter == nullptr)
    {
        return;
    }
    else
    {
        #if defined (__EMSCRIPTEN__) && defined(TOYBROT_ENABLE_PNG)
            exportWriter->Write();
        #endif
    }
}

.
.
.

void mainLoop(void* args)
{
    //This is what gets set up as mainLoop through emscripten
    if(mainWindow == nullptr)
    {
        std::cout << "no window" <<std::endl;
        return;
    }
    struct context *ctx = reinterpret_cast<context*>(args);
    #ifndef __EMSCRIPTEN__
    while(!*(ctx->exit))
    {
    #endif
        if(*(ctx->redraw))
        {
            generate(ctx);
            mainWindow->paint();

            #ifdef TOYBROT_ENABLE_PNG
            if(pngExport)
            {
                #ifdef __EMSCRIPTEN__
                    emscripten_push_main_loop_blocker(&writePNG, nullptr);
                #else
                    exportWriter->Write();
                #endif
            }
            #endif
        }
        mainWindow->paint();

    #ifndef __EMSCRIPTEN__
    }
    #else
    if(*(ctx->exit))
    {
        //emscripten_push_main_loop_blocker(&cleanup, nullptr);
    }
    #endif
}

Here we start to see some of those preprocessor shenanigans in play and we also see a new function, emscripten_push_main_loop_blocker. What this function does is run the function you pass to it and pause the main loop until it returns. This is important here because we don’t want the program to exit or the image data to get mangled before we can finish saving the png and offering the download. On the pngWriter side…
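
If the “blocker” idea feels a bit abstract, here is a toy model of it in plain C++. To be loud about it: this is NOT emscripten’s implementation and `ToyMainLoop` is a name I made up. It only assumes the behaviour described above, that queued blockers run in order and the regular loop body doesn’t run again until the queue is empty.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Toy model only, NOT emscripten's code. It mimics the idea behind
// emscripten_push_main_loop_blocker: pending blockers run first, in
// order, and the loop body is skipped while any blocker is queued.
class ToyMainLoop
{
public:
    explicit ToyMainLoop(std::function<void()> body)
        : body_{std::move(body)}
    {}

    // Stand-in for pushing a main loop blocker.
    void pushBlocker(std::function<void()> blocker)
    {
        blockers_.push_back(std::move(blocker));
    }

    // One "frame": run the oldest pending blocker if there is one,
    // otherwise run the regular loop body.
    void tick()
    {
        if (!blockers_.empty())
        {
            auto blocker = std::move(blockers_.front());
            blockers_.pop_front();
            blocker();
            return;
        }
        body_();
    }

private:
    std::function<void()> body_;
    std::deque<std::function<void()>> blockers_;
};
```

In this model, if the loop body pushes a “save the png” blocker, the next tick runs the save and nothing else, and only the tick after that goes back to drawing. Back in toyBrot, the blocker that gets queued is writePNG, which hands over to the pngWriter: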

//common/pngWriter.cpp

#include "pngWriter.hpp"

#include <string>
#include <cstdio>
#include <iostream>
#include <fstream>
#include <thread>
#include <future>
#include <limits>

#ifdef __EMSCRIPTEN__
    #include <emscripten.h>
#endif

.
.
.

bool pngWriter::Write()
{
    if(fractal == nullptr)
    {
        return false;
    }
    // Write image data

    auto convertPixel = [&](size_t idx)
            {
                #ifndef TB_SINGLETHREAD
                    size_t numTasks = std::thread::hardware_concurrency();
                #else
                    constexpr const size_t numTasks = 1;
                #endif
                for(size_t i = idx; i < this->cam->ScreenWidth()*this->cam->ScreenHeight(); i+= numTasks)
                {
                    size_t row = i / cam->ScreenWidth();
                    size_t col = i % cam->ScreenWidth();
                    png_save_uint_16( reinterpret_cast<png_bytep>(&outData[row][col].r)
                                    , (*fractal)[row*cam->ScreenWidth() + col].R() 
                                            * std::numeric_limits<uint16_t>::max());
                    png_save_uint_16(reinterpret_cast<png_bytep>(&outData[row][col].g)
                                    , (*fractal)[row*cam->ScreenWidth() + col].G()
                                            * std::numeric_limits<uint16_t>::max());
                    png_save_uint_16(reinterpret_cast<png_bytep>(&outData[row][col].b)
                                    , (*fractal)[row*cam->ScreenWidth() + col].B()
                                            * std::numeric_limits<uint16_t>::max());
                    png_save_uint_16(reinterpret_cast<png_bytep>(&outData[row][col].a)
                                    , (*fractal)[row*cam->ScreenWidth() + col].A()
                                            * std::numeric_limits<uint16_t>::max());
                }
            };
    #ifndef TB_SINGLETHREAD
        std::vector<std::future<void>> tasks (std::thread::hardware_concurrency());
        // "classic" for loop because we need to know the idx here
        for(size_t idx = 0; idx < std::thread::hardware_concurrency(); idx++)
        {
            tasks[idx] = std::async(convertPixel, idx);
        }
        for( auto& t : tasks)
        {
            t.get();
        }
    #else

        convertPixel(0);

    #endif
    return Write(outData);
}

bool pngWriter::Write(const pngData2d& rows)
{
    // Write image data
    // iterate by const reference to avoid copying every row
    for (const auto& row : rows)
    {
       png_write_row(writePtr, reinterpret_cast<png_const_bytep>(row.data()) );
    }

    // End write
    png_write_end(writePtr, nullptr);
    //check for error here
    fclose(output);

    #ifdef __EMSCRIPTEN__
        std::string callScript ("window.offerFileAsDownload(\"" + filename +"\", \"image/png\")");
        emscripten_run_script(callScript.c_str());
    #else
        std::cout << "Wrote "<< filename << std::endl;
    #endif
    return true;
}

So here we see two things. First, the Write() function with no parameters has some shenanigans to work around the possibility of running either multi or single threaded. Second, in the other function we get the invocation of the download function. This is done through the emscripten_run_script function, which, as one would expect, runs a bit of inlined javascript; in our case, a call to the window.offerFileAsDownload function. For something this limited, it’s pretty simple.
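
One thing the string concatenation above quietly assumes is that the filename has no quotes or backslashes in it. A slightly more defensive take, as a sketch: `jsQuote`, `makeDownloadCall` and `offerDownload` are hypothetical names, not toyBrot functions, and the quoting only handles double quotes and backslashes (not newlines or other control characters). The only emscripten call used is emscripten_run_script, same as in the original.

```cpp
#include <string>

#ifdef __EMSCRIPTEN__
    #include <emscripten.h>
#endif

// Hypothetical helper: wrap text in double quotes for use as a JavaScript
// string literal, escaping embedded double quotes and backslashes only.
inline std::string jsQuote(const std::string& text)
{
    std::string out = "\"";
    for (char c : text)
    {
        if (c == '"' || c == '\\')
        {
            out += '\\';
        }
        out += c;
    }
    out += '"';
    return out;
}

// Hypothetical helper: builds the snippet of javascript that calls the
// offerFileAsDownload function from download.js.
inline std::string makeDownloadCall(const std::string& filename, const std::string& mime)
{
    return "window.offerFileAsDownload(" + jsQuote(filename) + ", " + jsQuote(mime) + ")";
}

// Hypothetical wrapper: runs the call in the browser, does nothing natively.
inline void offerDownload(const std::string& filename, const std::string& mime)
{
#ifdef __EMSCRIPTEN__
    emscripten_run_script(makeDownloadCall(filename, mime).c_str());
#else
    (void)filename;
    (void)mime;
#endif
}
```

For a well-behaved name like toybrot.png this produces exactly the same script string as the original code, so it’s purely a guard against odd filenames.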

Other than this, the other place where I have to do a bunch of code switching is the window code. It MAY need to do OpenGL manually instead of falling back to whatever SDL does, and it MIGHT need to contend with single vs multithreaded because, as with the pngWriter, I do the image conversion multithreaded (because why not?).
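
The trick that makes the single vs multithreaded switch cheap is the strided loop: worker idx out of N handles elements idx, idx + N, idx + 2N and so on, so “single threaded” is just N = 1 with idx = 0. Here is a stripped-down sketch of that, using plain floats instead of toyBrot’s pixel type. The names `toByte` and `convertStrided` are hypothetical, and the clamping is my addition for the sketch, not something the original conversion does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: map a colour channel in [0, 1] to a byte.
// Out-of-range input is clamped (an addition for this sketch).
inline std::uint8_t toByte(float channel)
{
    const float clamped = std::min(1.0f, std::max(0.0f, channel));
    return static_cast<std::uint8_t>(clamped * 255.0f);
}

// Hypothetical sketch of the strided split: worker "idx" out of
// "totalWorkers" converts elements idx, idx + totalWorkers, ...
// Different workers never touch the same element, so no locking is needed.
inline void convertStrided( const std::vector<float>& src
                          , std::vector<std::uint8_t>& dst
                          , std::size_t totalWorkers
                          , std::size_t idx)
{
    if (totalWorkers == 0)
    {
        return;
    }
    const std::size_t count = std::min(src.size(), dst.size());
    for (std::size_t i = idx; i < count; i += totalWorkers)
    {
        dst[i] = toByte(src[i]);
    }
}
```

Calling `convertStrided(src, dst, 1, 0)` does the whole buffer on the calling thread. For the threaded version you’d hand each idx from 0 to totalWorkers - 1 to its own std::thread or std::async task, the way the toyBrot code does.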

There IS one thing of note though, when we’re creating the window:

FracGenWindow::FracGenWindow( CameraPtr c
                            , std::string& flavourDesc
                            , std::shared_ptr<bool> redrawFlag
                            , std::shared_ptr<bool> exitFlag)
    : cam{c}
    , colourDepth{32}
    , drawRect{false}
    , HL{0,0,0,0}
    , mouseX{0}
    , mouseY{0}
    , redrawRequired{redrawFlag}
    , exitNow{exitFlag}
{


    SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS);

    //#ifdef HAVE_OPENGLES
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
    //#endif

    SDL_GL_SetAttribute( SDL_GL_DOUBLEBUFFER, 1 );
    SDL_GL_SetAttribute( SDL_GL_ACCELERATED_VISUAL, 1 );

    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
#ifdef TB_WEBGL_COMPAT
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
#else
    //We want 3.1 ES for compute shaders in the regular OpenGL project
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 1);
#endif

    std::string title = flavourDesc + " Toybrot Mandelbox";
    mainwindow = SDL_CreateWindow(title.c_str(),
                              SDL_WINDOWPOS_UNDEFINED,
                              SDL_WINDOWPOS_UNDEFINED,
                              cam->ScreenWidth(), cam->ScreenHeight(),
                              SDL_WINDOW_SHOWN | SDL_WINDOW_OPENGL);
#ifdef TB_OPENGL

    #ifdef TB_WEBGL_COMPAT
        ubyteTex.resize(cam->ScreenWidth()*cam->ScreenHeight());
    #endif
    ctx = nullptr;
    initGL();
#else

    render = SDL_CreateRenderer(mainwindow, -1, 0);
    //SDL_RenderClear(render);

    screen = SurfPtr(SDL_CreateRGBSurface(  0, cam->ScreenWidth(), cam->ScreenHeight(), colourDepth,
                                            0xFF000000,
                                            0x00FF0000,
                                            0x0000FF00,
                                            0x000000FF));

    texture = SDL_CreateTexture(    render,
                                    SDL_PIXELFORMAT_RGBA8888,
                                    SDL_TEXTUREACCESS_STREAMING,
                                    cam->ScreenWidth(), cam->ScreenHeight());

    surface = SurfPtr(SDL_CreateRGBSurface(0, cam->ScreenWidth(), cam->ScreenHeight(), colourDepth,
                                        0xFF000000,
                                        0x00FF0000,
                                        0x0000FF00,
                                        0x000000FF));

    highlight = SurfUnq(SDL_CreateRGBSurface(0,cam->ScreenWidth(), cam->ScreenHeight(),colourDepth,
                                             0xFF000000,
                                             0x00FF0000,
                                             0x0000FF00,
                                             0x000000FF));

    SDL_SetSurfaceBlendMode(highlight.get(), SDL_BLENDMODE_BLEND);
    void* pix = highlight->pixels;
    for(int i = 0; i < surface->h; i++)
    {
        for(int j = 0; j< surface->w; j++)
        {

           auto p = reinterpret_cast<uint32_t*>(pix) +
                    (i * highlight->w)
                    + j;
            *p = SDL_MapRGB(surface->format, 255u, 255u, 255u);
        }
    }
    SDL_SetSurfaceAlphaMod(highlight.get(), 128u);
#endif
}

This is where we need to make sure we’re asking for “the right type” of OpenGL context, and SDL helps us here. If we’re deploying to WebGL, we need OpenGL ES 3.0 so we can target WebGL2; if not (and this file is common to all toyBrot implementations), we want at least ES 3.1 because of compute shaders.

Here we can also see that if we’re not going to use OpenGL ourselves, I just initialise a bunch of SDL constructs and have SDL’s internals do the work. If we are, as I mentioned in chapter 9, this doesn’t work and we need to do the drawing ourselves (I know because I tried real hard to work around this). So instead we call a separate initGL function. This function initialises the buffers for a couple of triangles which’ll cover the screen, then defines and compiles the minimal shaders to draw a texture on them… You can check the entire file here if you want but it’s not particularly interesting.

This SDL particularity also means that I need to account for different execution paths for most things, with hilarious and depressing results. Some of this I have brought upon myself by making this too general; there’s not A LOT of reason why I couldn’t just assume it’s OpenGL rendering and be done with it. But some of these paths might come into play. For example, when I send a fractal to be drawn, the image data needs to be copied and converted to the adequate format which, here, is actually different depending on what’s happening underneath:

void FracGenWindow::updateFractal()
{

    if(fractal == nullptr)
    {
        return;
    }
    if(fractal->size() != static_cast<size_t>(cam->ScreenWidth() * cam->ScreenHeight()) )
    {
        std::cout << "Fractal and ScreenSize mismatch!" << std::endl;
        exit(2);
    }
#ifdef TB_OPENGL
    #ifdef TB_WEBGL_COMPAT
        #ifndef TB_SINGLETHREAD
            size_t num_threads = std::min(8ul, toyBrot::MaxThreads > 0 ? toyBrot::MaxThreads 
                                                                       : std::thread::hardware_concurrency() );
            std::vector<std::thread> tasks (num_threads);
            // "classic" for loop because we need to know the idx here
            for(size_t idx = 0; idx < num_threads; idx++)
            {
                tasks[idx] = std::thread([this, idx, num_threads](){this->convertPixels(num_threads, idx);});
            }


            //Wait here and block until all threads are done

            for( auto& t : tasks)
            {
                if(t.joinable())
                {
                    t.join();
                }
            }

        #else
            convertPixels(1, 0);
        #endif
    #endif

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, glTex);
#ifdef TB_WEBGL_COMPAT
    glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA8
                , cam->ScreenWidth(), cam->ScreenHeight()
                , 0, GL_RGBA, GL_UNSIGNED_BYTE, ubyteTex.data());
#else
    glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA32F
                , cam->ScreenWidth(), cam->ScreenHeight()
                , 0, GL_RGBA, GL_FLOAT, fractal->data());
#endif
    checkGlError("updateFractal");

#else


    /*
     * multi-threading the conversion to the SDL type
     * using similar logic/mechanisms to the STD::THREADS
     * version of the generation itself
     */
    size_t num_threads = std::min(8ul, toyBrot::MaxThreads > 0 ? toyBrot::MaxThreads 
                                                               : std::thread::hardware_concurrency() );
    std::vector<std::thread> tasks (num_threads);
    SDL_LockSurface(surface.get());
    // "classic" for loop because we need to know the idx here
    for(size_t idx = 0; idx < num_threads; idx++)
    {
        tasks[idx] = std::thread([this, idx, num_threads](){this->convertPixels(num_threads, idx);});
    }


    //Wait here and block until all threads are done

    for( auto& t : tasks)
    {
        if(t.joinable())
        {
            t.join();
        }
    }

    SDL_UnlockSurface(surface.get());
#endif
}

void FracGenWindow::convertPixels(size_t total_threads, size_t idx)
{
#if !defined(TB_OPENGL) || defined(TB_WEBGL_COMPAT)
    size_t surfLength = cam->ScreenWidth() * cam->ScreenHeight();
    for(size_t i = idx; i < surfLength; i += total_threads)
    {
        Vec4<uint8_t> colour8( static_cast<uint8_t>( (*fractal)[i].R() * std::numeric_limits<uint8_t>::max()),
                               static_cast<uint8_t>( (*fractal)[i].G() * std::numeric_limits<uint8_t>::max()),
                               static_cast<uint8_t>( (*fractal)[i].B() * std::numeric_limits<uint8_t>::max()),
                               static_cast<uint8_t>( (*fractal)[i].A() * std::numeric_limits<uint8_t>::max()) );

    #ifdef TB_OPENGL
        ubyteTex[i] = *reinterpret_cast<std::array<uint8_t,4>*>(&colour8);
    #else
        reinterpret_cast<uint32_t*>(surface->pixels)[i] = SDL_MapRGBA( surface->format
                                                                     , colour8.R()
                                                                     , colour8.G()
                                                                     , colour8.B()
                                                                     , colour8.A());
    #endif
    }    
#endif
}

As you can see, this can get a bit silly. “If it’s SDL, then call the SDL function, but if it’s not, then do it manually but make sure to check if we are spawning one or more threads for conversion and also if it’s for WebGL we are using a different texture format so make sure you’re converting to the correct format as well….”
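Stripped of all the preprocessor branching, the core of that conversion is pretty small. The sketch below is a simplified stand-in rather than the toyBrot code: PixelF and PixelU8 are hypothetical replacements for Vec4<float> and Vec4<uint8_t>, and toByte, convertStrided and convertAll are names I made up here. It keeps the same strided split across threads (worker idx takes every total_threads-th pixel) and, as an extra assumption on my part, clamps each channel to [0, 1] before scaling, since casting an out-of-range float to uint8_t is not well defined.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-ins for toyBrot's Vec4<float> and Vec4<uint8_t>
using PixelF  = std::array<float, 4>;
using PixelU8 = std::array<std::uint8_t, 4>;

// Clamp to [0, 1] before scaling so out-of-range values saturate
inline std::uint8_t toByte(float channel)
{
    const float clamped = std::min(1.0f, std::max(0.0f, channel));
    return static_cast<std::uint8_t>(clamped * 255.0f); // truncates, like the original cast
}

// Worker idx handles pixels idx, idx + totalThreads, idx + 2 * totalThreads, ...
void convertStrided(const std::vector<PixelF>& src, std::vector<PixelU8>& dst,
                    std::size_t totalThreads, std::size_t idx)
{
    for (std::size_t i = idx; i < src.size(); i += totalThreads)
    {
        for (std::size_t c = 0; c < 4; ++c)
        {
            dst[i][c] = toByte(src[i][c]);
        }
    }
}

// Spawn the workers, then block until every one of them is done
std::vector<PixelU8> convertAll(const std::vector<PixelF>& src, std::size_t numThreads)
{
    assert(numThreads > 0);
    std::vector<PixelU8> dst(src.size());
    std::vector<std::thread> workers;
    workers.reserve(numThreads);
    for (std::size_t idx = 0; idx < numThreads; ++idx)
    {
        workers.emplace_back([&src, &dst, numThreads, idx]()
                             { convertStrided(src, dst, numThreads, idx); });
    }
    for (auto& w : workers)
    {
        w.join();
    }
    return dst;
}
```

Each worker only ever writes to its own set of indices, so no locking is needed. Calling convertAll(pixels, 1) gives you the single-threaded path, which is roughly what the TB_SINGLETHREAD branch above boils down to.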

 

A more focused project is unlikely to run into a situation this bad, but be aware that some form of this might need to happen depending on the different situations you’re handling.

If you’re on a phone, please read the next section BEFORE clicking this link

Juggle all these things so that they fall in the correct place and, there you go: Compute on your GPU through WebGL in a browser

The Verdict: Is this hassle even worth it?

So, same as with the CPU-based implementation, we need to get some new numbers for our GPU implementation because of different dimensions so…. let’s get a baseline:

All right, now, for some browser numbers:

So these numbers are, in general, a bit slower than the desktop numbers but they’re really good and a MASSIVE improvement over the 10-12 seconds on the CPU implementation. I once again had trouble with Konqueror, though: this time it did run the calculation and got a similar result, but did not actually display the fractal for whatever reason, unfortunately.

On the phone, though, the situation was a bit more disappointing. I have three browsers on my phone: Chrome, Edge and Firefox. Now, both Chrome and Edge (the other Chrome) not only didn’t work, they crashed the entire phone. Firefox, on the other hand, did work and the time, though not looking super impressive at first, is a great improvement compared to the almost 2 whole minutes the CPU version took on Chrome:

So on the desktop, this whole trouble is definitely worth it, but on mobile you’re probably better off (I mean, you’re always BETTER off doing so) making an actual application where you can make use of something like OpenCL. If Chrome and Chrome Jr had worked I’d say this would still be a win but… welcome to a developer’s nightmare. You don’t know what the user is running, what sort of hardware it is, what browser they’re using, or at which version, and any part of that chain could have some fatal flaw that breaks your code. And even though I am very surprised that my code broke Chrome spectacularly enough that it crashed my phone, the risk of just having a more normal “well, doesn’t run”, or of it trying to run and the page hanging or whatnot… is a real one.

Despite all that, it’s impossible not to circle back to “being able to deliver heterogeneous compute software through a webpage is bonkers”. This is worlds ahead of any conception of a browser from like 20 years ago, and this is something to be celebrated on its own, I believe.

Like with “regular” emscripten, you run into a bunch of problems not directly related to your code, or maybe if they are you can’t tell properly, given the amount of moving parts or the fact that when you try to run it just resets your phone so you can’t really debug.

This kind of power, despite all the hoops you have to jump through, is really something though and worth considering. I personally am very happy with the kind of numbers… almost every mainstream browser I had at hand got. For me this is still a good tool to have under my belt, and once WebGPU is a bit more mature and I can build for and test it I’m 100% revisiting this, because just having compute shaders would get rid of A LOT of the pain. Even a hypothetical new version of WebGL, say a 2.1, that had those would already be a massive improvement. IF browsers implement them.

It’s hard to make a hard recommendation either way because of all of this. It’s a lot of pain to code and might not give you QUITE as much reach as you want, but you already get so much, and you can wrangle a lot of power from whatever you code. I know I’m using WebGL in the context of my Game Engine, Warp Drive, and while there isn’t any compute in there YET, knowing myself I’m pretty sure I’ll eventually try to sneak some in. So I definitely think there are some projects where this is a valid option. It falls on you to consider if your project is one, because it’s a tricky road to travel.

A final thing to consider is that Emscripten DOES support WebGL 1, which I did not touch at all. This COULD be relevant, for example, if you’re not willing to sacrifice iOS Safari and call it a day.

What's next?

So, in terms of Multi Your Threading there is nothing on the immediate horizon, though there ARE some things I would like to revisit and check out, like Intel’s SYCL stuff which, I found out the morning of writing this, can be built to run on nVidia GPUs so… hmmm. I’ve also toyed with the idea of making an actual interactive toyBrot, more like a real application proof of concept, perhaps using Qt for WebAssembly. I like this idea as well but for now this is lower priority to me.

Instead, I’ve picked up some Warp Drive development again and there are a couple things I want to try and make with it. This interest was sparked in great part by emscripten. With me already having a web server and having the possibility of just deploying things through that, it’s quite exciting. While I was coding it though (more accurately, while I was coding Warp Drive instead of writing for this blog which is what I should’ve been doing instead) I did run into an OpenGL problem I somehow hadn’t bumped into. OpenGL really hates multithreading, which Warp Drive uses (though at a basic level). So I had to work around that and I DID cobble a solution together which I find quite interesting, so this is what the next post here is going to be about.

The code for that is done, tbh. Kind of like I only started writing about emscripten after having both the CPU and the GPU implementation done, I also kind of ran away from blogging by playing with Warp Drive instead. So at the time I was writing the previous post, the code for this post and the next one was already done and, right now, I have forbidden myself from coding until I get all the writing done, even if I’m spacing the posts a little bit. So, onward we go, see you next mission!
