Cartomancer and the threat of half-hearted features

Or how to find every trap by falling directly into them

01/Dec/2023

I always SAY that Qt and C++ are my main thing but, from an outsider’s perspective, I guess I never really put my money where my mouth is. I mean, I use Qt Creator and all, but none of my personal projects actually use Qt itself. Well, the opportunity presented itself and things changed.

I’ve recently written a C++ library for parsing conditional text. You know, things like:

[myVarhere|This string if true|This one if false]

Though this can be useful in more generic scenarios, the most immediate use case here is games. The initial release is already publicly available on GitLab (though, at the time of writing, there’s still some cleanup and a few important features I very much want to get done for a 1.1 in the near future).

Alongside this library I built a demo application with Qt. Likewise, it’s already up: the code lives in the same repository and I’m hosting binary releases here, so you can already check it out if you want. There are packages for Windows and Linux. Sorry to the Mac folks, but I don’t have a Mac to work on.

The journey getting here, though, wasn’t painless, and that’s what this post is about: what the idea behind the project was, how I decided to go about it, which rough edges I ran face-first into and what’s next for these two intertwined projects.

You know the drill: get cosy, strap in and grab your beverage of choice. Let’s see what’s in the cards for our future.

Finding a wheel to reinvent

So the story of Foreteller starts with an impromptu conversation. I have this friend who takes commissions writing smut for erotic games. One day, he’d written a passage he particularly enjoyed and randomly shared it with me. It was all good fun, but as I read the raw text I was just looking at it and going: “Holy moly, that’s A LOT of different variables”, to which he kind of boasted: “yeah, not a lot of people go as far as I do, supporting all of the different scenarios we can have this thoroughly, because, as you can see, it gets pretty wild”.

The passage he shared with me had 18 different variables. Nine of those were plain substitution stuff, a <character name here> sort of deal, and the others were actual conditions which need to be checked and will branch the text into different options.

The developer instinct was just kicking in… “Dang, wouldn’t it be great if SOMEONE could make some sort of tool which enabled people writing this kind of thing to check their work and see if the different paths are behaving as intended and whatnot”.

We just can’t help it. Can’t see a wheel lying on the ground without immediately thinking about how you would go about reinventing it.

From there on, things proceeded according to protocol. I spent a handful of days performatively saying that “it would be interesting, but I’m just too busy with other stuff to do it myself, you know”. Later, as is customary, I thought maybe I could get something done in a week or two and, a month of yelling later, I was putting it up on this here website.

Bloody programmers, I tell you…

And it was quite the adventure, with a surprising number of ups and downs plus a lot of new stuff learnt along the way. So let’s start at the logical place… the end.

"Who you are and who you are becoming"

So my idea started with Foreteller. I wanted an application that would parse the text and let the user fiddle with its conditions to check the output. Straight away I decided I was going to make the application in Qt. Qt is simply great; it was a no-brainer decision for the most part. I wanted to host this as a WebAssembly application and already knew Qt supports WASM export through Emscripten. The pieces were all there.

Qt and WebAssembly introduce licensing considerations, though. Most of Qt is available under the LGPL, but “Qt for WebAssembly” is one of the components under the full-fat GPL. The TL;DR here is that the full GPL requires your code to be free and open, and any code derived from it ALSO needs to be released under the GPL.

That’s not a problem for this application itself, but I DID want the core functionality to be reusable without the restrictions of the GPL. Say, for example, I later build this into a Godot plugin: I don’t want that to force the GPL onto potential users of said plugin.

The obvious answer here is splitting the core functionality off into its own thing and having the application use that instead. This is where Cartomancer comes in: a library in pure C++, so no extraneous dependencies get brought in. All of the core functionality lives in the library, and the application accesses it and “translates” it to Qt types for the GUI.

Sounds simple enough. We have a plan.

Every journey has a first step

The first thing to do was getting some sort of prototype out, to see if I could actually get things working how I wanted them to. I already had a general idea of how I wanted the main application window to look, which was a good start: just a big old panel with your text and some controls to one side where you could do your tweaking.

The tricky part is the text processing itself. Text like this is ultimately structured as a tree:

[condition| You have some text which depends on a condition| But this text [inner|can have conditional text inside it|Nah, it's fine, don't worry] ]

Each string has one condition but… for your topmost-level string, that condition is always true. And maybe different strings depend on the same condition… Perhaps most crucially, though, one big question was: “What about non-ASCII characters?” With all of that on the back burner, I actually got the initial versions sorted out rather quickly. Pretty soon I was identifying each string, making sure they were in a tree, splitting off conditions from text and then storing those conditions in a list. This initial stage was pretty smooth.
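To make the tree idea a bit more concrete, here’s a minimal sketch of how text like the example above could be parsed into a tree, rendered for a given set of true conditions, and have its conditions collected into a list. This is not Cartomancer’s actual code: the Part and Text types, Parser, render and collect_conditions are all hypothetical, the parser works on plain bytes (so it sidesteps the non-ASCII question entirely) and it assumes every condition has exactly two branches with no escaping of the delimiters.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical tree types: a Text is a list of Parts (literal or conditional).
struct Part;
using Text = std::vector<Part>;

struct Part {
    bool is_conditional = false;
    std::string literal;  // used when !is_conditional
    std::string cond;     // condition name, used when is_conditional
    Text if_true;         // shown when the condition holds
    Text if_false;        // shown otherwise
};

// Minimal recursive-descent parser for "[name|true text|false text]".
// '[', '|' and ']' are plain ASCII delimiters; there is no escaping.
class Parser {
public:
    explicit Parser(std::string_view src) : src_(src) {}
    Text parse() { return parse_sequence(false); }

private:
    std::string_view src_;
    std::size_t pos_ = 0;

    // Collects literal text and nested conditionals; inside a branch, '|' or ']' ends it.
    Text parse_sequence(bool nested) {
        Text parts;
        std::string buffer;
        while (pos_ < src_.size()) {
            char c = src_[pos_];
            if (c == '[') {
                flush(parts, buffer);
                parts.push_back(parse_conditional());
            } else if (nested && (c == '|' || c == ']')) {
                break;
            } else {
                buffer += c;
                ++pos_;
            }
        }
        flush(parts, buffer);
        return parts;
    }

    // Parses one "[name|...|...]" starting at its '['.
    Part parse_conditional() {
        ++pos_;  // skip '['
        std::size_t bar = src_.find('|', pos_);
        if (bar == std::string_view::npos) throw std::runtime_error("missing '|'");
        Part p;
        p.is_conditional = true;
        p.cond = std::string(src_.substr(pos_, bar - pos_));
        pos_ = bar + 1;
        p.if_true = parse_sequence(true);
        expect('|');
        p.if_false = parse_sequence(true);
        expect(']');
        return p;
    }

    void expect(char c) {
        if (pos_ >= src_.size() || src_[pos_] != c)
            throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos_;
    }

    static void flush(Text& parts, std::string& buffer) {
        if (buffer.empty()) return;
        Part p;
        p.literal = buffer;
        parts.push_back(p);
        buffer.clear();
    }
};

// Renders the text for the given set of conditions that are true.
std::string render(const Text& text, const std::set<std::string>& true_conds) {
    std::string out;
    for (const Part& p : text) {
        if (!p.is_conditional) out += p.literal;
        else out += render(true_conds.count(p.cond) ? p.if_true : p.if_false, true_conds);
    }
    return out;
}

// Lists every condition name in document order.
void collect_conditions(const Text& text, std::vector<std::string>& out) {
    for (const Part& p : text) {
        if (!p.is_conditional) continue;
        out.push_back(p.cond);
        collect_conditions(p.if_true, out);
        collect_conditions(p.if_false, out);
    }
}
```

For example, parsing [a|A[b|B|b]|x]! and rendering it with only a set to true gives Ab!, while rendering it with nothing set gives x!.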

All of these early screenshots were taken from my work using the text my friend provided. I’ve censored them here not so much because of all the lewd content (though this blog isn’t tagged as NSFW anywhere that I can remember) but because this was commission work and we were unsure if it was okay for me to publicise the in-game event being created like this.

Afterwards I wrote my own Lorem Ipsum and had some stuff I could show. Unfortunately, this is all I have from these very early stages.

[Screenshot of an application in development, with coloured, blurred-out text] First off, split the text.

[Screenshot of an application in development, with coloured, blurred-out text and a list to the left] Store the conditions to build our list.

[Screenshot of an application in development, with coloured, blurred-out text and a list to the left with different coloured squares] And classify the conditions according to their type.

At this point, I already knew I absolutely did not want to limit this application to only working with ASCII characters. Heck, I was even already making use of non-ASCII characters myself.

Unicode has a special “Object Replacement Character” (U+FFFC), which is meant as a placeholder for <insert image/video/file here>. And I was like “yes, that is what I want: a marker for where a child condition is supposed to go”. But 0xFFFC is a little bit larger than 0xFF, the biggest value a single byte can hold.

Oh, Unicode!

See, I’d never had to deal with this sort of thing directly in raw C++ before. I was blessed: I had avoided a lot of Unicode woes by being able to say “let QString handle it, I don’t care”. This time, though, it was inevitable. But I saw that C++ had made some advancements on that front and, to explain those advancements, I need to talk a bit about a couple of things.

Text characters are just numbers

It’s really unfortunate for everyone in the world that a lot of early programming advancements were made by English speakers. The problem this presents is that English is the babiest version of written language. Without even going into the whole aspect where “written English” is mostly suggestive nonsense (Through? Trough? Though? Thought?), the problem this actually slid us into is that English has one of the most limited sets of written characters on the planet. You don’t even need to go far: just look at any Romance language and you’ll see a bunch of accents and even new punctuation.

¿Qué pasa, baboso? (Roughly: “What’s up, dummy?”)

Now, inside the computer, everything is just numbers. We just have different logic for what the numbers mean. In the case of text characters, we have tables: “number 65 in the table is the character that looks like A”. And in a lot of programming environments you want to be economical, especially with stuff like text.

So people looked at how many characters they needed and said, yeah, 7 bits is enough. 128 values (0 to 127) is way more than enough.
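As a tiny illustration of the 7-bit limit, checking whether a byte string is pure ASCII boils down to checking that no byte goes above 127. This is a sketch of my own; is_ascii is a hypothetical helper, not something from the standard library.

```cpp
#include <string_view>

// ASCII only defines codes 0 through 127, i.e. values that fit in 7 bits.
// Returns true if every byte of the input is in that range.
bool is_ascii(std::string_view text) {
    for (char c : text) {
        // Cast through unsigned char: plain char may be signed, so bytes
        // above 127 could otherwise show up as negative numbers.
        if (static_cast<unsigned char>(c) > 0x7F) return false;
    }
    return true;
}
```

So "Hello" passes, while the UTF-8 bytes for "¿Qué?" do not.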

Welcome to hell.

This was the start of a trend of people repeatedly realising “oh, crap… this is nowhere near enough, let’s add some more”. ASCII itself got extended to eight bits. Powers of 2 are really convenient when your low-level stuff is all binary, so this was a given. But that was also not enough once people realised that, y’know, Japanese and Chinese people also have to write on computers.

So it turns out that getting characters on the screen that are supposed to be text is a WAY more complicated task than it may seem at first.

What is the measure of a char

When you’re programming, the size of your variable in memory is a really important thing. Text is represented as strings, which ultimately are sequences of characters. So, what is a char? If you’re tooting from a default-config Mastodon instance and have only 500 characters to complain with, how big is that in memory?

I mentioned before that powers of 2 are convenient in computing. Since 8 was the smallest one that could contain the ASCII set, it ended up as the de facto standard length for a character. One byte, easy and convenient. And this is generally what people expect when they say char: a type that is 8 bits wide, which is usually true.

As I mentioned earlier, though, this was not enough. So, easy answer: double it up. And so was born the wchar (wchar_t in C and C++), a “wide character”, usually 16 bits in length. Plenty to write all those foreign scribbles.

Or so people thought.

But it turned out that was ALSO nowhere near enough, so now the new wide char was more or less as useless as the previous one. I mean, it IS better, but you’re also just one unlucky day away from □□□ □ all over your □□□□. So now what? Chars need to be 32 bits wide? 32 bits IS plenty, actually: Unicode code points only go up to U+10FFFF, which needs 21 bits. But that also means a lot of tricky things code-wise, because there’s a lot of code that RELIES on char being that small type, and also on wchar being 16 bits wide. And this stuff is baked into OSes too. So do we add an ultra-wide char now? Do we just convert everything and immediately quadruple the storage size of a bunch of text by padding each character with zeroes?

And it gets worse.

You MAY have noticed that when discussing the size of a char I used some very non-committal language there. That’s because the number of bits in a char is implementation-defined, actually. That’s right, a char doesn’t HAVE to be exactly 8 bits, not as far as C or C++ are concerned, at least; the standards only guarantee at least 8. wchar? EVEN WORSE STILL. Windows calls it “UNICODE” (that’s the macro that switches its API over to wide characters), which is just a non-sequitur, to add even more confusion. I mentioned people relying on it being 16 bits wide, and it is… on Windows. With GCC or Clang on Linux, it’s 32 bits.

This is a nightmare. On Windows, you can’t count on one wide character holding an entire “character on the screen”: anything beyond the first 65,536 code points takes two of them. But if you’re not aware of this and just code away on Linux, you’re going to have a bad time once people build your code on Windows, because things may suddenly, and seemingly randomly, just break.
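Here’s what that looks like in practice, using char16_t (16 bits, like wchar_t on Windows) next to char32_t. The emoji U+1F600 sits beyond the first 65,536 code points, so in UTF-16 it needs two units, a surrogate pair. The high_surrogate and low_surrogate helpers are a hypothetical sketch of the standard UTF-16 formula.

```cpp
#include <string>

// U+1F600 does not fit in a single 16-bit unit, so UTF-16 needs two.
const std::u16string smile16 = u"\U0001F600";
const std::u32string smile32 = U"\U0001F600";

// Standard UTF-16 split for a code point above U+FFFF: subtract 0x10000,
// then put the top 10 bits in the high surrogate and the low 10 in the low one.
constexpr char16_t high_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}
constexpr char16_t low_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}
```

One “character on the screen”, two 16-bit units: index into smile16 naively and you get half an emoji.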

Though the specifics here are particularly gnarly, the overall problem of “what if someone builds this on a system that has a different size for <type>” is not a new thing in C and C++. In fact, C11 and C++11 already introduced dedicated char16_t and char32_t types. With those you have a guarantee about the minimum size of your representation.
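For reference, here’s a sketch of what the standard does and doesn’t pin down, written as compile-time checks. The wchar_bits constant is just for illustration; its value depends on the platform (typically 16 with MSVC on Windows and 32 with GCC or Clang on Linux).

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// sizeof(char) is 1 by definition, but a "byte" is CHAR_BIT bits,
// which is only guaranteed to be at least 8.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits, not exactly 8");

// char16_t and char32_t have the same size as the "least" integer types,
// so they hold at least 16 and 32 bits respectively.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "");
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t), "");

// wchar_t has no such guarantee; this just records what this platform uses.
constexpr std::size_t wchar_bits = sizeof(wchar_t) * CHAR_BIT;
```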

With all that in mind, and since this isn’t software where I have to be super concerned about memory (these are small texts on PCs, we’re not hurting for RAM here), I decided I’d just use std::u32string and call it a day. This way a lot of code operations become easier and, for the interface, I just need to convert to and from QString, which is trivial.

Strings: C++'s Achilles' heel

So that idea worked just fine… until it didn’t.

C++, at the best of times, is pretty limited in the STL when it comes to strings. A lot of things you’d take for granted coming from any higher-level language are just not in C++’s standard library. C++ lacks even a string::split or string::tokenize type of function; you have to go through the motions and do it yourself. And this gets annoying when you want to, say, trim a string, which you also need to do manually.
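For illustration, here’s roughly what “do it yourself” looks like for split and trim on plain std::string. These are hypothetical helpers of mine, not Cartomancer’s, and trim only knows about ASCII whitespace, which is exactly the kind of limitation that comes up next.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Splits on a single delimiter character, keeping empty fields.
std::vector<std::string> split(std::string_view text, char delim) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        std::size_t end = text.find(delim, start);
        // When end is npos, substr just takes the rest of the string.
        fields.emplace_back(text.substr(start, end - start));
        if (end == std::string_view::npos) break;
        start = end + 1;
    }
    return fields;
}

// Strips ASCII whitespace from both ends.
std::string trim(std::string_view text) {
    const char* ws = " \t\n\r\f\v";
    std::size_t first = text.find_first_not_of(ws);
    if (first == std::string_view::npos) return {};
    std::size_t last = text.find_last_not_of(ws);
    return std::string(text.substr(first, last - first + 1));
}
```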

But this problem goes one level deeper too. The little support C++ HAS for string operations is mostly defined only for the basic classic string, std::basic_string<char>. Most of it IS also implemented for wide strings or has alternatives for wchar_t (like isspace and iswspace). But I was on std::basic_string<char32_t>, which essentially meant I was on my own. At least, unless I opened the can of worms that is C++’s locale library, something I wasn’t keen on adding to the list of things I had to go through.

What this meant for me was that converting to and from numbers, checking whether certain characters are whitespace and so forth would need to be implemented by hand. And writing an isspace32-style function that compared against every whitespace character I could find, hoping I didn’t forget one, was a failure point I wasn’t keen on introducing into the application.
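To show why that felt fragile, here’s a sketch of what those hand-rolled helpers could look like for char32_t. Both isspace32 and to_int are hypothetical names. The whitespace list is the set of code points with the Unicode White_Space property, typed in by hand, and nothing will warn you if one goes missing. For comparison, std::stoi only has overloads for std::string and std::wstring.

```cpp
#include <optional>
#include <string>

// Hand-maintained list of Unicode White_Space code points.
bool isspace32(char32_t c) {
    if (c >= 0x0009 && c <= 0x000D) return true;  // tab, LF, VT, FF, CR
    if (c >= 0x2000 && c <= 0x200A) return true;  // en quad .. hair space
    switch (c) {
        case 0x0020: case 0x0085: case 0x00A0: case 0x1680:
        case 0x2028: case 0x2029: case 0x202F: case 0x205F: case 0x3000:
            return true;
        default:
            return false;
    }
}

// Parses an optional '-' followed by ASCII digits only.
// Overflow is not checked, to keep the sketch short.
std::optional<long> to_int(const std::u32string& s) {
    std::size_t i = 0;
    bool negative = false;
    if (!s.empty() && s[0] == U'-') { negative = true; i = 1; }
    if (i == s.size()) return std::nullopt;  // empty or just "-"
    long value = 0;
    for (; i < s.size(); ++i) {
        if (s[i] < U'0' || s[i] > U'9') return std::nullopt;
        value = value * 10 + static_cast<long>(s[i] - U'0');
    }
    return negative ? -value : value;
}
```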

This forced me back to the drawing board and, as will come as no surprise to those who have run face-first into this wall before, the solution was:

Unicode Transformation Format

At one point, people realised that the idea of “one character on the screen is also one character in your string”, while lovely and convenient, came with serious complications and drawbacks. Worse still, it doesn’t even fully make sense once you get down into the weeds. A character as defined by Unicode may be represented by different sequences of “code points” (the numbers Unicode assigns), and those are really supposed to mean the same thing. Think of an A followed by a combining ´: that pair needs to be equivalent to the single precomposed Á. Plus, you have this soup of computing standards around, which makes it non-trivial to just declare, say, “one character is 32 bits for everyone now”. It has hardware and OS implications…
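A quick sketch of the Á example with std::u32string. The two strings below are canonically equivalent according to Unicode, but as code point sequences they differ, so a plain comparison says they’re not equal. Treating them as the same requires normalization (such as NFC), which the C++ standard library doesn’t provide.

```cpp
#include <string>

// Precomposed: U+00C1 LATIN CAPITAL LETTER A WITH ACUTE, one code point.
const std::u32string precomposed = U"\u00C1";

// Decomposed: "A" followed by U+0301 COMBINING ACUTE ACCENT, two code points.
const std::u32string decomposed = U"A\u0301";
```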

So, instead, encodings were defined that represent Unicode values as sequences of smaller units. This is the “Unicode Transformation Format”. The most common one uses 8-bit units, because that’s the most common char size: UTF-8.
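The encoding itself is just bit-shuffling. Here’s a minimal encoder sketch of my own (encode_utf8 is a hypothetical helper, not part of Cartomancer or the standard library). For example, the Object Replacement Character U+FFFC comes out as the three bytes EF BF BC.

```cpp
#include <string>

// Encodes one code point as UTF-8: 1 byte up to U+007F, 2 up to U+07FF,
// 3 up to U+FFFF, 4 up to U+10FFFF. The first byte's high bits say how
// many bytes follow; every continuation byte looks like 10xxxxxx.
// Surrogates and values above U+10FFFF are not rejected in this sketch.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```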

There are a lot of arguments for UTF-8 and, given my own troubles, after reading a full-blown manifesto on it I was convinced that it’s the least painful option. It’s not without its troubles, though, especially when it comes to coding. Take, for example, that Object Replacement Character I mentioned way back. You can express it as either:

char32_t ORC32 = 0xFFFC;
OR
std::string ORC8 = u8"\xEF\xBF\xBC";

So instead of having one character I can just compare against, my one “character” is now a 3-char-long string, which makes a lot of compare and find loops more fiddly to write. But it DOES solve the representation problem, and it means you still have all the base facilities at your disposal. I could finally move on.
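To show what “more fiddly” means in practice, here’s a sketch that splits UTF-8 text on the 3-byte marker and counts code points instead of bytes. split_on_orc and count_code_points are hypothetical helpers of mine, and they assume the input is valid UTF-8.

```cpp
#include <string>
#include <string_view>
#include <vector>

// U+FFFC encoded as UTF-8: three bytes instead of one char.
constexpr std::string_view kOrc = "\xEF\xBF\xBC";

// Splits text at each marker. Searching for the whole byte sequence is safe
// because, in valid UTF-8, an encoded character can only match at a
// character boundary, never in the middle of another character.
std::vector<std::string> split_on_orc(std::string_view text) {
    std::vector<std::string> pieces;
    std::size_t start = 0;
    while (true) {
        std::size_t hit = text.find(kOrc, start);
        pieces.emplace_back(text.substr(start, hit - start));
        if (hit == std::string_view::npos) break;
        start = hit + kOrc.size();
    }
    return pieces;
}

// size() counts bytes; this counts code points by skipping every
// continuation byte (bit pattern 10xxxxxx).
std::size_t count_code_points(std::string_view text) {
    std::size_t n = 0;
    for (char c : text)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
    return n;
}
```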

You have activated C++20's trap card

I’m still catching up with the latest and greatest modern C++ standards and, while coding Cartomancer, I came across the “spaceship operator” <=>, something I very quickly fell in love with. If this is new to you, here’s the gist:

A <=> B returns a value that compares less than 0 if A is smaller, greater than 0 if A is greater, and equal to 0 if they’re “equal”. The compiler knows this and can rewrite the relational operators (<, >, <=, >=) to use it so, straight off the bat, you only need to write one operator (plus ==, which is only generated for you if you default <=>). But there’s a second crucial detail: the type it returns is not just a number. It’s a special comparison category type and, depending on which one it is, you can tell what the results mean. If you return std::strong_ordering, all relationships are well defined, like for regular numbers. But if it’s std::weak_ordering, then a result of “equal” means more that they’re somehow equivalent. This could be the case if they’re, say, rectangles with different sizes that you’re “ordering” by area: A(2,2) and B(4,1) would be EQUIVALENT, but not equal.

This kind of information layering is, honestly, one of the things I love about C++ and, when I had to write an ordering so I could put my conditions in a set, I decided I wanted to write one of these. But this is very new, so I had to bump my C++ standard to 20. No biggie, it’s supported in both current Clang and GCC.

All that talk about the more well-defined string and char types is all good in principle. You WANT your types to be tightly defined but, conspicuously, there was no char type meant specifically for UTF-8. Well, C++20 fixed that with char8_t and, by extension, std::u8string, which exists SPECIFICALLY as an explicit representation of a UTF-8 string. So it DOES make sense that writing std::string ORC8 = u8"\xEF\xBF\xBC" would be straight up an error. That’s the wrong type of string, right?

In a vacuum, sure. But remember how I mentioned that whatever facilities C++ DOES have for strings, it has just for the basic std::string? WELL…

In what turned out to be a massive screw-up, C++20 kinda forced everyone into a type that’s, for all intents and purposes, only half-functional. The assignment above refuses to compile but, if you change to the correct string type, you can’t convert to and from numbers, check for spaces, etc.

This was so bad that compilers had to roll out a flag to explicitly disable the new char type (-fno-char8_t on GCC and Clang) and… yeah. That was my solution. Strings aren’t just C++’s Achilles’ heel; they’re BOTH heels. As I was writing this I started to think “dang, this post is just Unicode now, huh?” but, honestly, that’s kind of how a lot of the work here felt too: massive amounts of time spent just making sure I could process some chars without things randomly exploding. And it wasn’t the only rug pull either.
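For completeness, an alternative to the compiler flag is converting explicitly at the boundary. This is a sketch, not what Foreteller does, and to_std_string is a hypothetical helper. It relies on the __cpp_char8_t feature-test macro, which is only defined when char8_t is enabled, so the same code compiles as C++17 and as C++20 with or without -fno-char8_t.

```cpp
#include <string>
#include <string_view>

#if defined(__cpp_char8_t)
// C++20 with char8_t: u8"..." is an array of char8_t, so copy the
// UTF-8 code units into a std::string one by one.
inline std::string to_std_string(std::u8string_view s) {
    return std::string(s.begin(), s.end());
}
#else
// C++17, or C++20 with char8_t disabled: u8"..." is already plain char.
inline std::string to_std_string(std::string_view s) {
    return std::string(s);
}
#endif

// The Object Replacement Character, usable as a std::string in either mode.
inline const std::string kOrc8 = to_std_string(u8"\xEF\xBF\xBC");
```

The cost is an explicit call at every place a u8 literal meets std::string, which is more noise than a single compiler flag but keeps the standard’s default behaviour intact.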

Kick them when they're down

After a LOT of pain, reworking the core of the application, ironing out bugs here and there and refactoring a bunch of stuff, I had it. Things were working. All I had to do was build and upload. I was already expecting this to be fiddly (Emscripten IS rather finicky to handle), but there was nothing unexpected and… after a lot of polishing and whatnot, I DID it! It was live.

Except

Wouldn’t it be hilarious if my application about editing and parsing text…. could not receive any text input? Like, if the input fields simply refused to work?

Well, it would, actually

They’re not ENTIRELY non-functional: you CAN paste into them. They just don’t take any raw keyboard input. You’d think “text fields do not work” would be a pretty high-priority bug but, shows what I know, I guess: this problem is long known. And I was explicitly using an LTS version. At the time, I found other reports like it for even older versions.

So here I was, with my application ready but basic functionality pulled out from under me at the last second. Great. Lucky for me, I’m no stranger to packaging Qt applications…

It just really sucks to have to maintain native packages now, especially since I currently don’t have any CI set up. But there was a silver lining: this is what finally made me dive into AppImage packaging, and I must say I’m generally happy with it.

Qt made my life harder even then, though. I wanted to use the same version of Qt for Windows and Linux, but the online installer was not running properly in the Linux containers. openSUSE, which I’m using as the build container base, was packaging Qt 6.4, not 6.5. But on Windows I DO need 6.5, because windeployqt FINALLY got integrated into Qt’s CMake scripts, so the two platforms are on different versions now. What a mess, honestly.

I even tried just building Qt from source for the container, but the build would crash in an internal dependency, and at that point I just couldn’t put more time into it.

Was it worth it in the end?

Well, you tell me. Both Cartomancer and Foreteller are live and have an official home here on this very website.

As for me, the whole Unicode stuff was pretty exhausting and what I kind of thought was going to be a big old Qt victory lap just left me feeling honestly a bit betrayed. Having something this basic not working was a real slap in the face. And I’m kinda wary of Qt in some ways.

I’m not really a web developer, but the possibility of using Emscripten to deploy Qt applications to browsers is amazing… in theory. QML really is, as far as I can tell, the gold standard for application GUIs: it makes it super easy to build really good-looking, nice-to-use applications. And if, unlike me, you don’t specifically WANT your application core kept separate, then Qt itself offers you SO much as a framework.

But this is kind of a bad joke, and I guess the dream of Qt on the web is dead if Qt cares so little about it that their LTS version has something as basic as text input just non-functional. It sucks for me because, yeah, I was hoping to make more of these in the future but… not with Qt.

This doesn’t at all mean that either Cartomancer or Foreteller is abandonware, though. I still have improvements and polishing I want to do on both fronts. Expect a 1.1 of each in the near future.

If you decide to check either out and they’re useful to you, or if there’s a feature you’d like to see, feel not only free but encouraged to let me know; the easiest place to find me is on Mastodon. It would be really cool if this DOES turn out to be useful to people.

I also didn’t talk much about how Cartomancer works on the inside and why I made the decisions I did. If that’s something you’d like me to talk about, also let me know. I WILL write about it, it’s just that this post is already long as it is. But between <=> and std::variant, Cartomancer is doing some new and interesting things under the hood.

Coming next on The Great Refactoring

This whole deal with TEXT INPUT (seriously, I still can’t believe it) not working on Qt for WebAssembly was an even bigger driver than the GPL in making me consider alternatives. And so I have.

Between releasing Foreteller 1.0 and writing this post, I HAVE come back to examine Godot as an alternative for stuff I’d like to deploy on the web. I’ve had some good initial impressions from the little mini games I’ve made, but I hadn’t had much experience making stuff more in the direction of a regular application. That’s changing now, and that’s what the next post is going to be about! How Qt hurt my feefees and I went straight off to kiss Godot on the mouth instead. If you want to know how that’s been turning out, I’m going to post that one a week after this goes up, so stay tuned!
