Math on objects

Andrew Hussie, author of the online webcomic Homestuck, stated that he uses computer science as a "source of raw material for absurd kinds of exploitation". The more I learned in my degree and my engineering career, the more I understood early Homestuck. I think one quality of a "Homestuck-like" is a story where you compile all the weird knowledge you've accumulated and turn it into lore for characters to muddle through. Writers tend to gravitate toward what they know, but Hussie did it blatantly, on-purpose, and with such flagrant mockery that what he knew of computer science was either not enough or too boring for him to continue with into the later chapters of the comic. After all, it's a story, not a textbook.

This left a great opportunity for me to pick up the baton.

Alchemy as presented

In Homestuck, the main characters each have a "sylladex", which stores items in "captchalogue cards", and can be stored or retrieved as if they were items in a video game inventory. Sylladexes (sylladices?) follow rules determed by the character's "fetch modus", which are initially based on data structures, such as a stack, binary tree, or hash map. For example, a sylladex using the stack fetch modus can only retrieve the item in the uppermost full captchalogue card from the stack.

The characters play a video game called SBURB, which can drop objects into people's rooms and retroactively affect reality by sending meteors to the player's home planet. The entry sequence of the game, which teleports the player's house into the game's pocket dimension, uses three machines and a captchalogue card to create a crystalline object that activates the teleportation. After they're in, the player can flip over captchalogue cards with items in them to get the item's "captcha code", and then use a fourth machine to take empty captchalogue cards, punch them with the codes given, and replicate the item for a set price in "grist", one of the game's currencies.

All that to get to the fun part. Of course, it's skipped over after the first couple of times.

At some point, the characters figure out that you can overlay two cards to only show the punched holes shared by both, or punch two codes into the same card and get all the punched holes from either. John Egbert, our main point of view character, uses his rudimentary computer science knowledge to call these operations &&, or "AND", and ||, or "OR", and writes a helpful guide for the reader, or anyone who's still on GameFAQs.com after the apocalypse.

We're not sure how && and || are actually used in the comic. There's only one time the same combination was used with both operations.

Dead Things in Amber && Smuppet = Foam Mutant Smuppet Encased in Amber

Dead Things in Amber || Smuppet = Amber Mutant Smuppet Abomination

The common explanation given is that && does whatever the opposite of || does, and they both preserve the aesthetic of one item with the functionality of the other. One's a lewd amber puppet, the other's a lewd puppet in amber. This is the closest we'll get to a real answer.

I'm not buying it. It's nondeterministic and doesn't fit with some existing captchalogue codes in the comic. Even worse is the theory that the || operation takes the qualities of the first item and the aesthetics of the second, which isn't even commutative! It's a throwaway joke in comparison to the sheer magnitude of everything else, but a central theme of the comic is "taking the bit too far", so that's what I'm going to do.

Alchemy as theorized

I'm not putting TeX support on my blog, so you'll have to use your imagination, or KaTeX.

Alchemy consists of a set of four operations,

$$

C(x): \mathbb{O} \rightarrow \mathbb{H}^8 \\

AND(x, y): \mathbb{H}^8 \times \mathbb{H}^8 \rightarrow \mathbb{H}^8 \\

OR(x, y): \mathbb{H}^8 \times \mathbb{H}^8 \rightarrow \mathbb{H}^8 \\

A(x): \mathbb{H}^8 \rightarrow \mathbb{O}

$$

where $\mathbb{H}$ is the set of 64 characters that are valid for a captcha code, and $\mathbb{O}$ is the set of every object in existence. Trivially, A, the alchemize function, cannot be an inverse of C, the captchalogue function, because the cardinality ("size") of $\mathbb{H}^8$ is 2 to the power of 48 captcha codes, while the cardinality of $\mathbb{O}$ is probably much larger than 2^48. John says as much in the story:

first of all, with all the hole slots, there are 48 bits in total, which means there are almost 300 trillion possible codes. and 300 trillion sounds huge! but when you consider it is supposed to account for ALL CONCEIVABLE ITEMS, including all the wacky combinations of stuff, it suddenly doesn't seem that big!

Alchemy's operating as a hash function with some homomorphic properties. A hash function is a function that takes a value of any size and returns a value of fixed size, with the guarantee that you won't have two inputs return the same output outside of a very small chance (and I mean very), and that given a hashed value, you won't be able to find an original word that gives you that hashed value without (doing something close to) trying every possible word. In the real world, programs use these to store passwords, and if a password's hashed value matches the hash stored, they let you in. They don't know your real password, and it'll be really hard for them to find it. You already know it, so it's easy for you to log in.

Homomorphic is a word usually used for normal code-making (encryption) and not hashing; it means that there are functions that you can do on the encrypted values that correspond to some function on the normal value. For example, with some homomorphic encryption schemes, you can do some operation that "adds" or "multiplies" the two secret numbers together without knowing what either one is, or what their result will be. You can then decrypt it and get the proper result.

AND and OR clearly have some conceptual effect on the items they represent, and C crunches down the object space, potentially infinite, into only around 300 trillion codes. However, the point of a hash is that it's impossible to get every preimage (original input) of a hash (output), and highly improbable to get even one. A rainbow table is a table that maps hash codes to possible pre-images; they're terabytes in size, and they're used to crack common passwords in real life. They're made by taking the hash of a bunch of common words and phrases, assuming that people would make their passwords one of these words or phrases. I imagine the game stores a rainbow table of every item created, looks the code up in the rainbow table, and return the stored object. If the user inputs random garbage, it outputs a random garbage object by combining random objects stored in its database. This happened once in the comic, which is how John eventually got the rocket pack.

There's one object in the story that we know the captcha code of that proves extraordinarily useful for futher theorizing and implementation. The Perfectly Generic Object, code 00000000, has no qualities whatsoever. It is represented by a green cube in the drawings, but for all intents and purposes, is treated as an object with no traits.

AND and OR operate in the same way their corresponding binary operations work. This means that anything AND 00000000 = 00000000, and anything OR 00000000 is that same thing. This was the impetus for the final theory that I put into the implementation of the bot:

  • AND (&&) takes two object codes and returns a code for an object that has only the shared qualities of both predecessor objects.
  • OR (||) takes two object codes and returns a code for an object that shares all the qualities of both predecessor objects.

OR takes the entire Venn diagram, AND takes only the middle. Just like the actual Boolean operations!

Because 300 trillion is a lot less than infinity, there will inevitably be collisions, or two objects with the same code. This doesn't happen in the comic, but John says as much:

this leads me to believe that not every combination of item has a viable duplicate. but this is kinda obvious anyway, since there are many combinations of punch cards that will produce either a blank card (with AND) or a totally punched card (with OR). so there are lots of dud combinations out there, and many that will just lead to the same pattern. like for instance a gun and an atom bomb could make some sort of ULTIMATE DEATH RAY, but for that matter a shoe horn and a potted plant could lead to exactly the same pattern!!!!! so weird.

If you keep using AND between random objects, you'll end up with 00000000, the code for a perfectly generic object. Fittingly enough, you're taking the intersection of random sets of traits, which will probably be empty eventually. Similarly, with OR, you'll end up with the code !!!!!!!!, some hypothetical object with every trait.

Alchemy, specifically the OR function, functions similarly to, but not exactly like, a Bloom filter. In the simplest terms I can manage, it's a way to quickly check if you've probably seen an item before. ORing objects together makes a new object with all the traits of the previous objects, and since the hash space isn't that large, traits may overlap significantly. In a filter, m is the total number of bits, and k is the number of bits that get checked and flipped for each object. For alchemy's "filter", m = 48, since there are 6 bits per character * 8 characters = 48 bits to flip. k doesn't have a fixed value, since different "base objects" could have different numbers of bits flipped, so it's not a one-to-one comparison, but fairly close.

Alchemy as implemented

Past implementations of alchemy include the Overseer Project, where any combination of items was met with a web form declaring "oops! You alchemized a combination we didn't implement yet. Put a suggestion in here and we'll get to it eventually". The number of possible "oops"es expanded quadratically because every new item created 2n new interactions, where n is the total number of items. If you don't know math, that means "pretty fast". Thankfully, nowadays we have giant blocks of linear algebra that form plausible bullshit generators that will gladly do this for you on demand, so there's no need to wait for the admin of an unmaintained web game to put in your latest item combo.

Yes, that means AI.

Last November, I began work on the Paradox Engine, a Homestuck classpecting bot. My friend already added alchemy to her own classpecting bot (in fact, her alchemy came first, before her classpecting), but I wanted to build it from scratch with my own ideas. I implemented alchemy from the bottom to the top. Start with the root operations, build functionality on top of those root functions, continue until we have a usable bot.

The root functions for the alchemy codes themsleves are fairly simple. There's a JavaScript library out there that does core alchemy operations in 100 lines; I did it in less than 40 lines in Python. I had some fun with types so they're "fully type-safe" now, with a special type for alchemy codes.

Next was the item data structure, which needs to save the item's name, code, and description. I ended up adding the components that built the item as well, because with just the name and description, the items quickly devolve into "The Destroyinator: It destroys stuff really hard", losing all flavor of the original components. The humor in Homestuck and Hiveswap is very "snarky late 2000s gamer"; if the budget were higher and more eyes were on it, some of the item descriptions would probably degrade into Joss Whedon-approved "he's right behind me, isn't he" lines. In Hiveswap Act 1, you can use any object to interact with any other object or any item in the scenery. Someone had to write all those lines. On my first run-through with generating descriptions, I told the prompt to sound like a 20-year-old unpaid game dev intern whose most fun part of the job is writing all these lines. It actually nailed the tone. Now that's prompt engineering.

Alchemization in the comic is a very collaborative process. The characters regularly share codes and items with each other across universes, because any code put into one Alchemiter is usable on any of them. It's an instant transfer of information and utility for an eventually marginal price in grist, and also a useful plot device. The alchemization feature for this bot should also be collaborative in the same fashion; seeing items other people made and building off of them is fun, while attempting to combine items and getting a different result from someone who just did the same thing you did feels weird. This is a common flaw in a lot of AI apps; they're random plausible bullshit generators. They're not consistent, you need to force that upon the system for a good user experience.

It took me a while to settle on the final architecture, but the central alchemy function takes in two item names or codes, searches for an item in a locally-stored SQLite database by that name or code, and if it finds it, keeps it on hand for the next part. If it doesn't, then it makes a placeholder item with a random code and plausible description based on the name. Even the lowly needle gets a description:

For stitching wounds or making new ones. Haystack not included.

Once we have two item objects, we store them in the database so they can be reused later (if not already taken from the database), and combine the two into a new object using the rules described above. The component lists get added together with the operation in the middle, so we have a direct trace of every item used to get to this point, and this final item, with its name, description, code, and components, is stored in the database and sent to the user.

Development lessons

I spent too many hours trying to Dockerize this bot because I thought it'd make it easier to update; it turns out it's the same difficulty. I put in better logging, and realized I was a much better dev than last November. My friend asked for more details because she was interested in how other people get to the same destination, and I think that's an important part of development; so much of enterprise software is taking existing building blocks off the shelf and ruminating over system design before you get it approved by your higher-ups. In projects like these, there's room for creativity in architecture and implementation, for better or worse.

There are a couple main takeaways from this, including "AI can be used to have fun with other people in a collaborative and creative rather than passive and extractive manner", or "there are cathedrals everywhere for those with the benzodiazepine to see", but I'm brought back to that quote I put in the credits of the Paradox Engine on a whim.

"Real paradise lies eternally in the person who dreams of it. Why don't you venture forth in search of your own utopia?"

A very beautiful way to say "if you want something done, do it yourself". Only seen in some sourceless image hacked together with an 8-bit Spiderman, and before that, a JRPG about Aztecs or something. Every night I work on this bot, I end sleepless and metaphorically breathless; I have a road ahead of me I know I can take, and I know what lies beyond it. I'll reread these posts in the morning like I read my old code only a couple months after I wrote it, with some mix of amusement and disgust. The beauty of the internet is there's always an opportunity for revision; I can change a couple words and hit a couple buttons, and my typos and bugs are fixed.

You can see the code at https://github.com/muridaerattus/paradox-engine. Have fun.