Multi your Threading #8: One doesn't simply WASM into Mordor

First things first, the same old warning, copypasted even


This is chapter 8 in a series I’ve been doing on multithreading. Each chapter (generally) is based on me reimplementing the same code using different languages/technologies/models and talking about them both in isolation as well as in comparison to one another. While it’s not necessary that you’ve been following, I do assume you’re familiar with parallelisation in general and I’ll make the occasional references to other implementations. You can find them here if you’d like to check them out:


  1. C++ STL Threads and Tasks
  2. OpenMP and MPI
  3. OpenCL and CUDA
  4. AMD HIP and HC C++
  5. SYCL and Vulkan
  6. Algorithm overhaul with updated benchmarks for previous implementations
  7. Intel TBB and ISPC

Also: All the code here is from ToyBrot. If you prefer to see the whole picture or browse on your own as you read, feel free to clone the repo and open it up on your favourite code editor (maybe on a second screen if you have one). I personally am a big fan of Atom

So, looking at all those links… toyBrot has come a long way. Lots of different technology to explore and play with. If you have the development SDKs for them and can build software yourself. If you’re a regular old developer though, you know more and more stuff is just being moved to the browser, that cursed land of pain and sorrow. And even if you wanted to be part of the fun, then you have to start using some language you’re full of discrimination against…. or would you?

Intermediary Representations are pretty powerful, huh?

So before we move on, I need to know that you, as a reader, are clear on a few concepts. The most important ones are the difference between compiled and interpreted languages, what an Intermediary Representation is, and how it relates to stuff like C++ and Assembly.

I had a section here about this but… if you’ve read my other posts you know I can go on a bit, and so I did. So I’ve moved all of that stuff to a bonus post. If you are familiar with these things, feel free to skip it. If you’re not, you may want to have a check before you move forward. I’ll leave a link here just in case. It’s okay, I’ll wait for you

One day we'll all be as cool as Hackerman. Until then, we can dream

Now that we’re all on the same page, let’s go!


A while back I remember hearing some stuff about a thing called WebAssembly. WASM is what it sounds like, it’s an Intermediary Representation for browser engines. The existence of this opens up some possibilities.

I mean, if there is an IR you can target, you can compile “real code” to it. And… you know… if you’ve read that bonus post where I talk about how every modern compiler is basically just Clang because the LLVM IR is so powerful… well… what if Clang but for WebAssembly? Could that be a thing? This is obviously rhetorical, you know it’s not only real but the reason I’m here. Let’s take a look at Emscripten

Knee deep in the web with Emscripten

So “Emscripten is a complete compiler toolchain to WebAssembly, using LLVM, with a special focus on speed, size, and the Web platform.” In a way I’m even quite a bit late to this train, a quick google and you’ll find a bunch of demos with it. The idea is exciting. The dream is that you could acquire it, then just build your code with THEIR Clang (because of course it’s a Clang) and just load it from the web. Does it achieve that dream? Kind of, actually.

The minimum scenario for Emscripten is quite trivial. For a better look I recommend checking out the tutorial series on John Sharp’s blog; I’ve linked the “hello world” stage here. For a minimal project, you don’t even need to touch your code, just call the emscripten toolchain and let it take you away. But that’s for a minimal project.

I’m not, at least in this post, going to go into too much detail about how to set up emscripten. But the gist is: you acquire the SDK, use it to request a version of the toolchain and “activate” it, and there’s a handy script that sets up the environment variables.


With that set up, you not only have the compiler itself emcc/em++, but also wrappers for configure, make and cmake! In my case I even skipped a step because I just got whatever emscripten is on Arch Linux’s repos, since I don’t need to worry about specific versions at this point.

And that gets you a long way. I think before I even get into my pain points (and there was plenty of pain), I feel it’s important to reinforce how well thought out a lot of this is. EMCC itself is a helpful compiler in many ways, particularly in how it handles compile flags. It’ll do the clang thing where even if you jumble up the link and compile flags it’ll discard the wrong ones at each step, no biggie… Honestly great experience there.

Something that I also was hoping would be in my favour is that my favourite IDE remains Qt Creator and Qt HAS a “Qt for WebAssembly” thing. It has its own intricacies and whatnot but it means that Qt Creator has SOME notion of a compiler for WebAssembly, and I HAVE used that notion to help me set up a “System Emscripten” kit on mine. What was NOT automatic was that, because of the way Qt Creator handles child processes, I could NOT get it to run the wrappers properly. What I had to do instead was unwrap them and set up a couple of CMake variables myself, as well as a whole bunch of necessary environment variables for the kit. Not counting the stuff on the CMakeLists itself. If you want to have a go at it, here’s what I had to add:

# Environment

# Cmake Configuration


If you’re using a regular Emscripten install you are likely going to have to tweak all the paths but this SHOULD get your compiler running (famous last words).

Finally, you’re also going to need an actual web server (kind of) to test your code. If you’re just running your webpages locally some of the resource requests get jumbled up. Emscripten does provide an “emrun” script to help with this. The Great Refactoring itself runs on a separate computer in my house on top of Apache so I was set for deployment. For dev though, I decided to install and run nginx on my dev machine. It’s much, much easier to configure and get going than Apache, at the very least for the kind of minimal setup I needed just to serve localhost/ pages. A lot of people have used the python webserver too, it’s up to you what works. I can recommend the nginx route if you don’t have anything else serving web on your dev machine. It was quite trivial to get up.

With that out of the way and compiling some Hello World, it was time to get into the meat of it and start facing the real issues, of which there were quite a few.

Fitting in on a webpage

As straightforward as your hello world stuff can be, once you’re building anything that has any complexity, you start running into some real world type problems arising from the fact your program now runs within the confines of a browser’s WASM and Javascript engines. This also means that for a lot of stuff, your code needs to talk to those entities. Emscripten helps you here in a variety of ways.

Issue number one is that your browser needs to be able to control your application’s execution, so you can’t just do a regular infinite loop as you’d do for a reactive application. To that end, emscripten provides a wrapper function that will hook your main loop to the browser’s runtime. If you had some regular code, the first thing you need to do is move the body of your loop into a function you can hand over to that wrapper. If your code can be OPTIONALLY built with emscripten (which might be the case even if you’re just targeting desktop for development), this also means you need to do some preprocessor and build system fiddling. For toyBrot, this is how this happens:

# CMakeLists.txt


    set(TB_WASM TRUE)

# The regular checks for compilers and libraries



    message("emscripten detected, building for Web Assembly")
    message("(this is a different set of projects)")

    # Please see common/toybrot.html for info on this option
    # you probably don't need to worry about it

    # Emscripten requires some specific tweaks

    set(CMAKE_C_FLAGS_DEBUG "-g4 -sDEMANGLE_SUPPORT=1 -sSAFE_HEAP=1 --source-map-base http://localhost/")
    set(CMAKE_CXX_FLAGS_DEBUG "-g4 -sDEMANGLE_SUPPORT=1 -sSAFE_HEAP=1 --source-map-base http://localhost/")


        message("Web Assembly requires a predefined number of threads")
        message("Defaulting to 120")
        set(TB_NUM_THREADS "120" CACHE STRING "Number of threads to use (0 for auto)" FORCE)

    # PROXY_TO_PTHREAD should(?) be used but getting "screen is undefined"
                        -sALLOW_BLOCKING_ON_MAIN_THREAD=1 )
                        -sASSERTIONS=1 )

    set(TB_EMS_ROOT "/usr/lib" CACHE PATH "Path to your Emscripten installation")
    set(TB_EMS_PORTS_INCLUDE "$ENV{HOME}/.emscripten_cache/wasm/include" 
                            CACHE PATH "Path to your Emscripten ports cache installation")
    # Emscripten should find these but your IDE might not, so this is convenience
    # I know Qt Creator doesn't, at the very least
    set(TB_EMS_INCLUDES "${TB_EMS_ROOT}/emscripten/system/include/libc"
                        "${TB_EMS_PORTS_INCLUDE}" )


Straight away, from the options and the comments you can have a glimpse at some of the problems and challenges involving building with emscripten. We’ll come back to them later. For now, let’s see some C++


struct context
    bool benching;
    bool clockReset;
    std::stringstream* stream;
    std::shared_ptr<bool> redraw;
    std::shared_ptr<bool> exit;
    pngWriter* exportWriter = nullptr;
    FracGen* cpuFrac = nullptr;


void mainLoop(void* args)
    if(mainWindow == nullptr)
        std::cout << "no window" <<std::endl;
    struct context *ctx = reinterpret_cast<context*>(args);
    #ifndef __EMSCRIPTEN__
            //Do the thing
            #ifdef TOYBROT_ENABLE_PNG
                #ifdef __EMSCRIPTEN__
                    emscripten_push_main_loop_blocker(&writePNG, nullptr);


    #ifndef __EMSCRIPTEN__

int runProgram(bool benching) noexcept
    context ctx

    if(mainWindow != nullptr)
        #ifndef __EMSCRIPTEN__
            std::thread eventCapture(
                [&mainWindow, exit, benching]()
                    while( !*exit )
           std::cout << "Generating fractal" << std::endl;
           emscripten_set_main_loop_arg(mainLoop, &ctx, 0, 1);

        #ifndef __EMSCRIPTEN__


How do you like #ifdefs ?
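Stripped of everything toyBrot-specific, the pattern boils down to something like the sketch below. This is NOT toyBrot's actual code: LoopState, tick and runLoop are made-up names for illustration, and the "work" is just a counter. The only real API calls are emscripten_set_main_loop_arg and emscripten_cancel_main_loop. The same source builds for desktop and for the browser, and only the part that owns the loop changes.

```cpp
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Hypothetical stand-in for a context struct handed to the loop body
struct LoopState
{
    int framesLeft = 3; // iterations we still want to run
    int framesDone = 0; // iterations that have already run
};

// One iteration of the main loop. The void(void*) signature is what
// emscripten_set_main_loop_arg expects for its callback.
void tick(void* args)
{
    LoopState* state = static_cast<LoopState*>(args);
    if (state->framesLeft == 0)
    {
#ifdef __EMSCRIPTEN__
        // Nothing left to do: ask the browser to stop calling us
        emscripten_cancel_main_loop();
#endif
        return;
    }
    --state->framesLeft;
    ++state->framesDone;
}

void runLoop(LoopState& state)
{
#ifdef __EMSCRIPTEN__
    // fps = 0 lets the browser pace the calls,
    // simulate_infinite_loop = 1 means this call does not return normally
    emscripten_set_main_loop_arg(tick, &state, 0, 1);
#else
    // On desktop we own the loop ourselves
    while (state.framesLeft > 0)
    {
        tick(&state);
    }
#endif
}
```

On a desktop build, calling runLoop on a default LoopState simply runs tick three times and returns. In the browser, the exact same tick gets called by the runtime instead, which is the whole point of the exercise.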

So, there’s not really a lot of avoiding this stuff and this is not even the worst one (spoiler warning, that’s going to come up in Multi Your Threading #10). As annoying as this is, though, this is a very localised change. The implementation I chose to build with emscripten was STD::Threads. Now, because I was unsure of the changes I’d need to make on both the C++ and the CMake side, I’ve elected to split the code here, but really, it could be safely merged; mostly the CMake would be a bit scary, I guess. Since then, for example, I’ve adapted my toy game engine, Warp Drive, to support building with emscripten. It doesn’t do the whole dependency juggling this project does but it IS a much more massive one: a quick select and right click gives me 136 headers and sources. And it was super easy to do, especially having the experience from toyBrot. Checking the commit, the code changes themselves were in 8 sources/headers from the engine, most having to do with OpenGL changes, not emscripten itself.

I’ve opened up Meld and compared the actual FracGen.cpp for the desktop and emscripten versions of STD::Threads and…

This is it, the only thing that DID change is that now I am accounting for a thread limit that is communicated through a compile define. This is great news, because it means that if you’re writing or converting something so that it builds for WASM, chances are your application’s logical core won’t suffer, at least not a lot. At the very least if it is self-contained like toyBrot. If your application needs to open files, sockets or talk to the external environment, you’re likely going to need some more help. I’ll touch a bit more on this, again, in Chapter 10, where it’s emscripten once more and I DO need to open files.
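To give an idea of what "a thread limit communicated through a compile define" can look like, here is a minimal sketch. It assumes the define carries the same name as the TB_NUM_THREADS CMake option and keeps its "0 for auto" convention; pickThreadCount is a hypothetical helper and not a function from toyBrot.

```cpp
#include <cstddef>
#include <thread>

// Assumption: the build system passes something like -DTB_NUM_THREADS=120.
// If it doesn't, fall back to 0, which we treat as "auto".
#ifndef TB_NUM_THREADS
#define TB_NUM_THREADS 0
#endif

std::size_t pickThreadCount()
{
    constexpr std::size_t requested = TB_NUM_THREADS;
    if (requested > 0)
    {
        // A fixed limit was baked in at compile time, respect it
        return requested;
    }
    // hardware_concurrency() is allowed to return 0 when it can't tell
    const unsigned int detected = std::thread::hardware_concurrency();
    return detected > 0 ? detected : 1;
}
```

With this shape, the fractal generation code only ever asks pickThreadCount() how many workers to spawn and never needs to know whether it is running in a browser.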

Something I’ve skipped over is the display. I’ve used SDL2 for toyBrot’s output, which I know and enjoy from my experience with Warp Drive, actually, and… it just works, it’s magical.

Since you need to build with all these new restrictions AND for a different platform, dependency management becomes a bit more complicated but emscripten actually provides a bunch. On my compile options I have

                     -sUSE_LIBPNG=1 )

These refer to the emscripten ports system, where the emscripten project provides source code for some libraries which have been tweaked to integrate in a browser environment through emscripten, along with all of the things they use. Lucky for me (and for you if you ever want to play around with emscripten), SDL is there. The code for the main display still uses an SDL_CreateWindow call, just the same. These are a godsend and they can take a lot of work out of your hands. You can also ask EMCC what ports are available and what the flags for each are. Right now, what I get is

$ emcc --show-ports 
Available ports:
   Boost headers v1.70.0 (USE_BOOST_HEADERS=1; Boost license)
   zlib (USE_ZLIB=1; zlib license)
   freetype (USE_FREETYPE=1; freetype license)
   SDL2 (USE_SDL=2; zlib license)
   SDL2_mixer (USE_SDL_MIXER=2; zlib license)
   SDL2_ttf (USE_SDL_TTF=2; zlib license)
   bzip2 (USE_BZIP2=1; BSD license)
   icu (USE_ICU=1; Unicode License)
   SDL2_gfx (zlib license)
   libpng (USE_LIBPNG=1; zlib license)
   mpg123 (USE_MPG123=1; zlib license)
   SDL2_image (USE_SDL_IMAGE=2; zlib license)
   ogg (USE_OGG=1; zlib license)
   vorbis (USE_VORBIS=1; zlib license)
   regal (USE_REGAL=1; Regal license)
   harfbuzz (USE_HARFBUZZ=1; MIT license)
   libjpeg (USE_LIBJPEG=1; BSD license)
   bullet (USE_BULLET=1; zlib license)
  SDL2_net (zlib license)

So a bunch of helpful stuff. Anything else you need, you’ll have to add to your build. With all this set, we should be good to go, right? Well…

Finally, something you very likely REALLY want is that line setting the CMAKE_EXECUTABLE_SUFFIX. Normally emscripten outputs your WASM compiled code and some Javascript that loads it. But if you want a quick demo you can “run”, you probably want it to output the full html page, which is what that change does.

Browsers are not your friends

I would like to offer a salute and a commiseration beer at this moment to every web developer in the world. You deserve it.

Browsers suck and the internet is held together with duct tape and prayers. It’s only when you start dealing with it that you realise just how complicated things can be for “””no reason”””. Really, part of it is that as more and more things get moved to be web apps, we start demanding more and more from browsers and web standards. Let’s cool down for a bit and think of what I’m attempting

I want to compile a C++ program that uses multithreading to calculate a fractal through raymarching. The way I want to do that is to compile it to an IR that I can serve on the web so people can run this through A browser on whatever their machine is.

Remember Java applets? Being here right now is our punishment for letting Microsoft kill them because it was jealous of Java Enterprise and having Oracle buy Sun instead of just having it be actual open source as it should always have been. Because this is exactly the problem they were trying to solve. And this problem is much more complicated than the stuff browsers were initially made for. What this means for us is that we bump into a lot of limitations and things that need to be circumvented

You may have picked up on a couple things in the CMakeLists. First, that I am specifying a thread pool size there for emscripten. This is required; otherwise your software can have a bad time spawning and joining threads. This was also part of why I chose the std::threads implementation instead of my favoured task-based one. I had some weird behaviour when I was trying that route. Maybe I’ll come back to it at some point but for now, my recommendation is that if you’re going to spawn a lot of them, use threads. Otherwise, I am still oversubscribing as usual and that still works.
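For reference, "a fixed number of plain threads, oversubscribed" can be sketched as below. This is an illustration and not toyBrot's FracGen: fillSquares is a made-up function, the work is a trivial i * i, and the interleaved split of indices is just one possible way to divide the job. When building with emscripten, the number you pass in should not exceed the pool size you requested at link time (Emscripten's setting for that is PTHREAD_POOL_SIZE).

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Fills out[i] with i * i using numThreads workers.
// Worker t handles indices t, t + numThreads, t + 2 * numThreads, ...
// so no two workers ever write to the same element.
void fillSquares(std::vector<std::size_t>& out, std::size_t numThreads)
{
    if (numThreads == 0)
    {
        numThreads = 1;
    }

    std::vector<std::thread> workers;
    workers.reserve(numThreads);

    for (std::size_t t = 0; t < numThreads; ++t)
    {
        workers.emplace_back([&out, t, numThreads]()
        {
            for (std::size_t i = t; i < out.size(); i += numThreads)
            {
                out[i] = i * i;
            }
        });
    }

    // Wait for every worker before the caller touches the results
    for (std::thread& worker : workers)
    {
        worker.join();
    }
}
```

Asking for more workers than there are cores (say, 16 on a quad-core) is the oversubscription part: the scheduler sorts it out, and the code itself does not change.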

The second thing that you might have noticed is that I am going super overkill on the memory for an application that just generates a png and done. And this baffled me as well. I believe this is related to how the underlying Javascript workers get spawned from the compiled code but the program kept crashing until I pushed it. For now I have just “made it big enough so it shuts up and called it a day” but this would not be a production-ready approach to this and maybe it’s a thing I’ll revisit once I have some time.

So now we’re good right? Well, so, we’re still not good, for two more reasons, both related to the fact that we want to multithread in a browser. The first one is this:

A table with the implementation status of several Web Assembly features in different browsers

That table is from and look, if it’s not Apple making everyone’s life worse yet again. I only found this today writing this, I’m very lucky in that the only Apple device I have in my house runs OpenSUSE instead because Apple wants me to throw it in the trash. So I never had to deal with the fact that Safari doesn’t implement WASM threads at all. But again, if you have some ACTUAL software you need to deploy this is a problem because people still insist on the mistake of buying Apple when their hatred for their consumers is greater than that of game publishers. (Edge is not in the table but Edge is just Chromium)

So we have the opportunity to ignore Apple, this is great news for everyone right? Well, kind of… because Web Assembly threading is currently restricted in Firefox and all mobile browsers as part of Spectre mitigations. Websites arbitrarily running WebAssembly on your machine through browsers is a dangerous access door to your machine when you have this low level control. Even without spectre, you can easily have some crypto mining WASM sneak in. So if you want to run Web Assembly stuff you need two things:

The first thing is that you need to set up your web server with some increased security policies. This makes the server much more strict with content it’s serving from different sources. I had to do this for The Great Refactoring and it immediately refused loading all of my gitlab snippets, which is how I used to embed code in the page. So to write this post I had to rework all previous ones to replace the missing elements with the new code display widget I had to look for. The blog looks much better for it, for sure, especially since it prompted me to actually configure the CSS stuff properly, but it was very much not a drag and drop thing.

The second thing is that you MIGHT need to enable experimental features on your browser, and they might not be present altogether. I’ve just tried running toyBrot on my phone right now: Chrome works because I have set up an experimental flag, whereas both Firefox and Edge straight up refused to run it. Edge doesn’t even have the flag because it’s an older Chrome. So if you want to build with emscripten, multithreading is quite dodgy, at least right now.

If you’re on the desktop, though, you can have a look and see it DOES work, which is already magical

Integrating it into the blog also had its own challenges and required me to work backwards from emscripten’s provided “shell file”, which is an HTML skeleton. The compiled script (and other files) rely on a separate script you need to integrate in your page which defines a “Module” object which the compiled binary refers to for things like where to draw, where to look for files, how to output….

The Verdict: Should we even bother?

So, I can’t really compare my current results with the stuff I had before for two reasons. The first is that to make it a bit more browser friendly, this one generates a 1024×1024 fractal, compared to the 1820×980 of the regular version. The second reason is that I’ve overclocked my CPU a while back (honestly, it’s really disappointing that it was running stock for so long because I didn’t want to tweak after a couple crashes showed my previous OC to be not quite 100% stable).

So let’s get some numbers we can use. I also made sure to recompile toyBrot WITHOUT the -march=znver1 flag. Currently all the optimisation on the emscripten side is -O3 and I think this is fairer

All right, we’ve got our baseline. Let’s run some WASM


Chrome actually beat the native build. So did Konqueror, by the way, but I couldn’t put it on the screenshot because it really didn’t like the code I put in to offer the download of the generated image. This is insane. It’s also not what I expected from my initial tests, so I’m guessing those initial impressions had been skewed by either dev tools being open or debug builds (hard to tell at this point).

As much as toyBrot is a round cow in perfect vacuum type of software, getting performance that’s comparable to native on a browser is bonkers. But here we can also see that your mileage may vary depending on your browser. When you’re dealing with this kind of technology, this is always a problem. The browser is a massive moving part in your system, and one you have little control over. Additionally, you have no idea what this browser is even running on. Could be a smart TV for all you know. I mentioned this works on my phone if I’m running Chrome and these are the sort of numbers I get on a Huawei Nova 3i. Not as exciting as a 1920X, but it does run

With the caveat of the threading support being somewhat in a weird limbo right now, this is really exciting. You CAN port your applications to the web and, though with some finicky bits to get through, have them running on a browser where they’re immediately accessible. This kind of reach is amazing if you’re coming from the C++ world of things, and that Emscripten makes it possible is still mind boggling to me. It took me a lot of effort to get through this. As someone who never had to deal with web development, every new corner I had to turn, I’d bump into a whole new set of things which not only had I never seen, I wasn’t even sure I was understanding correctly. But having this knowledge now and having this up, I definitely feel it was worth it, so much that, as I mentioned before, it made me make my game engine emscripten compatible too

A shiny prototype does not production make, though

Integrating your things properly in a “real webpage” is hard work. And I’ve only mentioned it in passing but you’re going to have a bunch of additional work. Take this demo that I’ve linked here, right now, for example: it is still not ready to go out as “production code”. My integration into my WordPress page still has some stuff that I’ve lifted from the emscripten shell file and could do with being rethought for a less “tech demo” environment. I also added some functionality to download the generated PNG, same as you can save your file in regular toyBrot. But in a more realistic scenario, I would use emscripten to listen to events from the page, so I could add a couple buttons to it not only to save the image but also to start running. Since I don’t have one of those, I can’t really put it into a webpage that’s not JUST THAT: as soon as it loads it’s going to freeze the page raymarching the fractal.
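As a rough idea of what "listening to the page" could look like from the C++ side, here is a minimal sketch. It is not something toyBrot does today: requestStart and consumeStartRequest are hypothetical names. The only emscripten piece is the EMSCRIPTEN_KEEPALIVE macro, which keeps a function exported so the page's Javascript can reach it (as Module._requestStart(), for instance from a button's onclick).

```cpp
#include <atomic>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#else
// Desktop builds don't have the macro, so turn it into a no-op
#define EMSCRIPTEN_KEEPALIVE
#endif

namespace
{
    // Set by the page, read by the main loop
    std::atomic<bool> startRequested{false};
}

extern "C"
{
    // Exported with C linkage so the name is not mangled
    // and the page can call it directly
    EMSCRIPTEN_KEEPALIVE void requestStart()
    {
        startRequested.store(true);
    }
}

// Meant to be polled from the main loop: returns true exactly once
// per request and clears the flag in the same step
bool consumeStartRequest()
{
    return startRequested.exchange(false);
}
```

The main loop would then idle until consumeStartRequest() returns true and only kick off the raymarching at that point, so loading the page no longer means freezing it.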

Finally, a quick glance at my phone screenshot reveals that I never uncoupled the “texture” used to calculate the fractal and the one used to draw on the screen properly. Because of this, I just specify the size of the canvas toyBrot draws on and it’s always 1024×1024, it can’t respond to a smaller or bigger screen, probably looks kinda sad in a 4K screen (might check on my TV once the post is up).
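One possible way to do that uncoupling, sketched under the assumption that the fractal keeps being generated at a fixed size and only the drawing adapts: compute a destination rectangle that fits the texture inside whatever canvas is available while keeping its aspect ratio. Rect and fitInside are hypothetical names; with SDL, the result could be used as the destination rectangle of a render copy.

```cpp
// Plain rectangle in canvas pixels
struct Rect
{
    int x;
    int y;
    int w;
    int h;
};

// Largest rectangle with the texture's aspect ratio that fits inside
// the canvas, centred on it. Returns an empty Rect for invalid sizes.
Rect fitInside(int texW, int texH, int canvasW, int canvasH)
{
    if (texW <= 0 || texH <= 0 || canvasW <= 0 || canvasH <= 0)
    {
        return Rect{0, 0, 0, 0};
    }

    Rect result{0, 0, canvasW, canvasH};

    // Compare aspect ratios by cross-multiplying, which avoids floats:
    // canvasW / canvasH > texW / texH  <=>  canvasW * texH > texW * canvasH
    const long long lhs = static_cast<long long>(canvasW) * texH;
    const long long rhs = static_cast<long long>(texW) * canvasH;

    if (lhs > rhs)
    {
        // Canvas is relatively wider than the texture: height is the limit
        result.w = static_cast<int>(rhs / texH);
    }
    else
    {
        // Canvas is relatively taller (or same shape): width is the limit
        result.h = static_cast<int>(lhs / texW);
    }

    result.x = (canvasW - result.w) / 2;
    result.y = (canvasH - result.h) / 2;
    return result;
}
```

For the 1024×1024 texture on a 3840×2160 screen this gives a 2160×2160 square centred horizontally, and on a 360×640 portrait phone screen a 360×360 square centred vertically.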

A LOT of the difficulties in working with emscripten do not come from using emscripten. They come from the fact that you’re targeting browsers. And between the limitations of browsers and the whole web environment you’re buying into, there’s a whole new world of things that require your consideration and care. This to me is the greatest challenge when it comes to emscripten. It even includes the multithreading thing because it’s a browser-level limitation

You kept mentioning a Multi your Threading #10, what's that about?

Well, following the story of toyBrot, there is really only one logical place to go to now. We have a CPU implementation that works but… what about the GPU? Could you access your GPU from the web?

The short answer to that question is: “Yes, you can”. But to do that we can’t actually use any of the things we’ve used so far (to my knowledge). Can’t CUDA from the browser and WebCL never got any traction, which is a real shame. So this web train needs to make an additional stop before we can get there. We need to resort to OpenGL.

With this in mind, Multi Your Threading #9 will be about implementing toyBrot in OpenGL. And then #10 will be about taking that code, and the things we’ve looked at here and running arbitrary code on your GPU. It works, it can very much be done and the whole code side of things is good to go already, including the deployed wasm. But we’ll need to go through quite a few things to get there, so let’s take our time.