Well, it’s been… a while… since my last Multi Your Threading entry. The last one was probably one of my favourites, if nothing else just because of how powerful a tool I feel ISPC is to have under my belt now. And really, until the day SYCL is omnipresent, TBB is also pretty great. For this one, though, we are breaking into new territory: something I had until now mostly managed to avoid in my professional life, but that is increasingly important. We’re taking toyBrot to the internets!
But before diving right into it, I want to make sure that you, as a reader, are clear on a few different concepts. If you’re a C++ veteran this will probably be old news to you, so feel free to skip it. Originally this was going to be just a section in the main Multi Your Threading entry, but I realised I was going on quite a bit and it’s now been promoted to a bonus post. As such, without further ado, let’s get into it.
Compiling has the advantage of generating the best code for the computer to run. You can tell the compiler to do complex things, you can manage the build in different ways… This one piece of software could be built with CLANG, but have some key bits built with ISPC for that extra performance, and still offload some stuff to the GPU with HIP or CUDA. Compiling is what makes these complicated kinds of toolchains possible. The compiler can also do a lot of safety checks and performance optimisations under the hood. Sometimes the “best” way to code something makes for code that is really awkward or difficult to understand, so a good compiler can take some of the stuff you write in a hopefully readable manner, and cheat under the hood here and there.
Compiled languages, though, DO have some disadvantages. Chief among them is the fact that if you want to tweak anything in your software, you need to build it again. Depending on the project this can be awkward. You ALSO need to explicitly know what you’re targeting when you compile things. CLANG, for example, can build for a whole bunch of platforms. Heck, you can have this one project where it builds for your x86_64 CPU, but is also compiling some HIP code for both AMD ROCm stuff and nVidia CUDA, which you’ll use some black magic to dynamically pick from. But you NEED to know which one is being used where. Code built for one of those will absolutely not run on the others. This is the same issue I mentioned when first talking about GPGPU and when going over ISPC. Compiler output is assembly, and assembly is specific to a platform; it’s the most basic set of instructions that gets exposed to programmers. When you’re expecting your code to run in a browser, you have no idea what that browser is running on. It could be Windows, it could be Linux, it could be a phone, it could be a Playstation. All bets are off, so what are you compiling for?
This is where interpreted languages come in. Instead of the source code running through a compiler and a binary coming out the other side ready to go, interpreted language code is interpreted on the fly. This means that whenever you’re writing code for an interpreted language, you’re targeting the interpreter. A lot of the time, the source code of these languages is referred to as scripts, and usually there isn’t a binary; it just runs straight from the source files. This is the case for languages such as Python (kinda, it does have a compiled format) and Ruby, but also the likes of HTML and Javascript, for which the most used interpreters are part of browsers.
The main issue with an interpreter is that it can never have as deep an understanding of your software as a compiler toolchain does. Compiler toolchains can go over your code in great detail, seeing how everything comes together, what can be optimised… An interpreter doesn’t have that sort of time or resources. It needs to quickly read what’s in the code and give an output now, because the user is waiting. In addition to this restriction, interpreted languages are also designed to be much more flexible and permissive. A lot of the time, they’ll be dealing with different code that comes from different sources, all working together. Just to render this webpage, your browser needs to deal with the HTML that comes out of the WordPress editor I use. But that also has a bunch of PHP and Javascript for functionality, plus all the styling with CSS. It’s a big mess from a bunch of different places, a true nightmare, and the interpreter desperately wants to have stuff happen. If that wasn’t rough enough, HTTP stands for Hyper TEXT Transfer Protocol. Generally, everything on the web is a text string at some point; good luck making sense of it.
This is a problem that ends up intertwining with the High and Low Level Language paradigms. Programmers want their code to be in a high level language, something close to human language, so they can understand the code. But human language is very complicated for computers to make sense of; they want something simpler, “add what’s in this address to what’s in the other address then shift the result by 4 bits”, which doesn’t make a lot of sense to most people. So computers need lower level languages, which are closer to the concepts that govern how they are built, and if you’re a programmer who cares about performance, you want to be able to dip into that when you need to.
There is not a lot to say about High Level and Low Level languages. The higher level the language, the closer it is to human language; the lower, the closer it is to the computer’s 1s and 0s. It’s more of a shorthand than a strict classification. Generally speaking, Assembly is the lowest level language; from there you move on to things like C, then C++, then Java and C#, up to the likes of Python and Javascript.
Of interest and relevance to us is that the lower level a language is, the more it needs to concern itself with very intrinsic machine problems. If you’re coding in C or C++, “what size (in memory) is a pointer?” is a relevant question more often than you might realise. When, for example, I was porting my fractal generating code to Vulkan, I had a lot of trouble because of very low level problems. Namely, memory alignment. In GLSL (the shading language standard that both Vulkan and OpenGL rely on), a vector of 3 floats ALIGNS the same as a vector of 4 floats, but it’s not the same SIZE. This causes all sorts of unexpected behaviour if you don’t know about it and are finding out about it the hard way.
This bites you right in the behind when you’re copying memory back and forth and things are just not getting to the other side intact. And it IS a very low level problem: if you’re always living in the world of Java, Python, etc… there’s a good chance you’ll never bump into anything like it.
// KillMeNow.glsl
// std140 rules: a vec3 aligns to 16 bytes but is only 12 bytes in size

struct sameSize     // 32 bytes: v2 has to start at offset 16 anyway
{
    vec3 v1;
    vec3 v2;
};

struct sameSize2    // 32 bytes
{
    vec4 v1;
    vec3 v2;
};

struct sameSize3    // 32 bytes: f1 slots into the padding right after v1
{
    vec3 v1;
    float f1;
    vec3 v2;
};

struct NOTsameSize  // 48 bytes: v2 can't start at offset 20, so it jumps to 32
{
    vec4 v1;
    float f1;
    vec3 v2;
};

struct sameSize4    // 32 bytes... I think: f1 fits right after v2
{
    vec4 v1;
    vec3 v2;
    float f1;
};
So, going back to the web, that necessity to be portable throws a massive wrench into the general way compiled software works, and into a lot of what makes it possible to optimise it. C++ is famously strict with typing, for one. On the web, you’re basically always converting to and from strings and hoping for the best. This problem is not new, and a solution to deploying compiled code to unknown, arbitrary platforms was developed by Sun Microsystems way back, when they were pushing their new programming language/environment: Java.
Java worked around this problem by making a different type of interpreter, one you still needed to compile for. See, Java had this “code once, run anywhere” goal. But it cared a lot about performance and, crucially, A LOT about code safety, so they decided to make an interpreter that would read pre-compiled binaries. The Java compiler compiles your code not to the regular assembly languages, but to Java’s own assembly, which is called Java Bytecode. When you want to run this code, you load it up with the Java Virtual Machine, which is an interpreter, but one taking low level code, which it can work with much better. The JVM itself is compiled for each platform, this is the “Java runtime”, and it interprets the bytecode and translates it to the assembly for which it was compiled so things can run. Code once, run anywhere (that has a Java Virtual Machine).
Java Bytecode is an example of what is known as an Intermediate Representation. IRs are essentially “assembly code for imaginary machines”. Whereas the Assembly for x86_64 or armv8 is meant to run on that hardware, an IR is assembly code that is meant to be interpreted. And while I’m pretty sure that Java did not invent this idea, they’ve had a lot of widespread success using it in this particular manner; nothing that I know of, at least, had used it to this “code once, run anywhere” end before.
Now, there are some issues with the JVM. Famously, it’s quite the memory hog (C++ people also tend to complain a lot that it’s super slow, but to me that hasn’t been a fair criticism for like, 15 years now) and it doesn’t allow you as much fine control as you’d like in some situations. But the idea is good and it got adopted in some other important contexts. Back when I talked about Vulkan, I mentioned that though you code your Vulkan shaders in GLSL, Vulkan actually doesn’t use GLSL; the shader is compiled to an IR. In Vulkan’s case, it’s called SPIR-V: Khronos’ Standard Portable Intermediate Representation. More relevant to us, though, is the LLVM IR.
According to Wikipedia, the LLVM project started in the year 2000 at the University of Illinois. LLVM originally stood for “Low Level Virtual Machine”, but so much has been built on top of and around it that the name is no longer treated as an acronym. Along with the definition and implementation of the virtual machine itself, LLVM has its own IR, and a set of tools built to interpret and optimise this IR.
This enables LLVM to work as a compiler backend. If you write a compiler frontend that emits that IR, it can be plugged into all of those tools, and they work together for all sorts of low level shenanigans like optimisation, vectorisation, etc… And this happened; this is where CLANG comes in. And CLANG is pretty good, I myself use it almost exclusively, but it also showed the flexibility of LLVM as a compiler backend. So it happened again… several times over.
Having control of the IR and being able to pass it through a bunch of preexisting tools is so powerful and so flexible that basically everyone started using it. Really, step one in writing a compiler these days seems to be forking CLANG, and then you start working. It’s a testament to how powerful this approach is. The LLVM project has other stuff under its wing too: there’s their debugger, LLDB; their linker, LLD; their own implementation of the C++ standard library, libc++…
So I hope you’re not tired of seeing compilers which are just CLANG forks, because there are bound to be more and more.
As I mentioned before, this is a quickie just to get people up to speed with a few key concepts which might be foreign if they’re just landing on C++ now. I hope this wasn’t too boring. As for me, it’s time to go back to writing about my adventures with another LLVM-derived project.