Welcome to Story Space

Story space is the special place where all the stories live.

Sep 19, 2024

Story space is the infinite Platonic realm of all possible stories. This isn't a new idea and certainly doesn’t originate with me.1 In 1941, Argentine author Jorge Luis Borges wrote about a version in his surreal short story The Library of Babel.

Mathematician Rudy Rucker described the underlying principle in great detail in his 1987 popular math book, Mind Tools. (I wrote about his approach back in 2013. See: L26 and L27 and Beyond.)

The short version is this:

Given some encoding scheme, any text can be seen as a single unique number (one huge beyond imagination). So, every book is a point on the infinitely long number line.

A simple example: the word “Hello”. If we use a basic system of a=1, b=2, and so on up to z=26, then we can encode the letters as 08-05-12-12-15. We use leading zeros in single-digit numbers to make them two-digit. Note that the highest possible number is 26, so all letters easily encode to double-digit numbers.

Now mash it all together into a single number: 805,121,215 (we can throw away the leading zero). This is the point on the number line for “hello”.2
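For readers who like to see the arithmetic run, here is a minimal Python sketch of the a=1 to z=26 encoding. The function name `encode_word` is made up for illustration, and the sketch assumes the input contains only the 26 basic Latin letters.

```python
def encode_word(word):
    """Encode a-z letters as two-digit codes (a=01 ... z=26), joined into one integer."""
    codes = []
    for ch in word.lower():
        if not "a" <= ch <= "z":
            raise ValueError(f"only the letters a-z are supported, got {ch!r}")
        # ord(ch) - ord('a') is 0 for 'a', so add 1 to get a=1 ... z=26.
        codes.append(f"{ord(ch) - ord('a') + 1:02d}")
    # int() drops the leading zero, just as in the text.
    return int("".join(codes))


print(encode_word("Hello"))  # 805121215
```

Converting the joined digit string with `int()` is what throws away the leading zero.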

We can also start with a number and decode it. What do you think the number 7,151,504,022,505 might be? It breaks out to 07-15-15-04-02-25-05, and from those numbers we can work out the text: “goodbye”.

The same process can be applied to any text, from words to epic tomes, magazines, blog posts, or chat messages. Any text can be encoded as a single (generally very large) number. I’ll leave it for the reader to decode 23,251,804,191,325,200,805.

And every number decodes to some text, although most numbers (nearly all of them) decode to gibberish. For instance, 71,423,010,401 (a number I just generated randomly) decodes to 07-14-23-01-04-01, which decodes to “gnwada” — which actually isn’t that bad. Pronounceable, at least.3
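Decoding can be sketched the same way. This is only an illustration under the simple a=1 to z=26 scheme; the name `decode_number` is hypothetical, and showing unassigned codes (00 and 27 through 99) as "?" is an assumption made here so that every number decodes to something.

```python
def decode_number(number):
    """Split a number into two-digit codes and map 01..26 back to a..z."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # restore the leading zero that was thrown away
    letters = []
    for i in range(0, len(digits), 2):
        code = int(digits[i:i + 2])
        # Codes outside 1..26 have no letter in this scheme; show a placeholder.
        letters.append(chr(ord("a") + code - 1) if 1 <= code <= 26 else "?")
    return "".join(letters)


print(decode_number(71423010401))  # gnwada
print(decode_number(805121215))    # hello
```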

A brief detour for some technical details. I’ll keep this as simple as possible.4

As described above, the scheme lacks much. It doesn’t handle spaces between words, let alone punctuation. We’d like uppercase and lowercase letters, too. And the ten digits 0 through 9. (And maybe some control characters, like [Tab] and [Enter].) We can just add them all to the double-digit scheme, giving them numbers as needed. So long as we need no more than 99 symbols (codes 01 through 99), the scheme works fine.
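One way such an extended two-digit scheme might look, as a sketch: the symbol table below and its ordering are arbitrary choices made for illustration (not a scheme taken from the post), and the names `SYMBOLS`, `encode_extended`, and `decode_extended` are hypothetical. Lowercase letters keep codes 01 through 26, so "hello" still lands on the same number as before.

```python
import string

# Hypothetical symbol table; the order is an arbitrary choice for illustration.
SYMBOLS = (string.ascii_lowercase + string.ascii_uppercase + string.digits
           + " .,;:!?'\"-()" + "\t\n")
assert len(SYMBOLS) <= 99, "two digits only give us codes 01 through 99"

CODE = {ch: i for i, ch in enumerate(SYMBOLS, start=1)}


def encode_extended(text):
    """Join the two-digit code of every character into one integer."""
    return int("".join(f"{CODE[ch]:02d}" for ch in text))


def decode_extended(number):
    """Invert encode_extended; reject codes that have no symbol."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits
    chars = []
    for i in range(0, len(digits), 2):
        code = int(digits[i:i + 2])
        if not 1 <= code <= len(SYMBOLS):
            raise ValueError(f"code {code:02d} is not assigned")
        chars.append(SYMBOLS[code - 1])
    return "".join(chars)


number = encode_extended("Hello, World!")
print(number)
print(decode_extended(number))  # Hello, World!
```

This is fine for short strings. For long texts, the trip through decimal strings becomes the bottleneck.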

But a proper Library of Babel should have books in all languages, and that makes for a much larger character set (well over a hundred thousand characters). Fortunately, a scheme already exists. Unicode is a global standard for encoding text from all languages into the numbers used by computers. Exactly what we want.5

And we can use it directly. That is, each byte of Unicode text is a digit6 of the number that represents that text. So, the string of bytes for any Unicode-encoded text is the binary number of that text.
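In Python, treating the bytes as digits of one big number takes only a few lines. This is a sketch that assumes UTF-8 and puts the first byte in the most significant position; the function names are made up for illustration.

```python
def text_to_number(text):
    """Read the UTF-8 bytes of the text as one big base-256 integer."""
    return int.from_bytes(text.encode("utf-8"), byteorder="big")


def number_to_text(number):
    """Turn the integer back into bytes and decode them as UTF-8."""
    length = (number.bit_length() + 7) // 8  # bytes needed to hold the number
    return number.to_bytes(length, byteorder="big").decode("utf-8")


print(text_to_number("Hi"))    # 18537  (72 * 256 + 105)
print(text_to_number("é"))     # 50089  (two UTF-8 bytes: 195 and 169)
print(number_to_text(18537))   # Hi
```

Two caveats on this sketch: `number_to_text` raises `UnicodeDecodeError` when the bytes are not valid UTF-8, and any leading zero bytes are lost, just as the leading zero was in the two-digit scheme.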

This makes it easy to see why most text numbers are gibberish. Imagine a text file made of random bytes. Given Unicode, the letters wouldn't generally be from the Latin alphabet. Unicode includes a wide variety of graphics symbols (including all the emojis7), so many of the characters in a random file wouldn’t be letters in any language.
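One way to poke at this is to generate random bytes and ask how often they even form valid UTF-8, never mind readable text. The sketch below uses a fixed seed so it is repeatable; the trial count, blob length, and function names are arbitrary choices for illustration.

```python
import random


def is_valid_utf8(data):
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def count_valid(trials=1000, length=32, seed=0):
    """Count how many random byte strings happen to be valid UTF-8."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        blob = bytes(rng.randrange(256) for _ in range(length))
        hits += is_valid_utf8(blob)
    return hits


print(count_valid(), "of 1000 random 32-byte strings are valid UTF-8")
```

Roughly half of all byte values cannot stand alone as a character in UTF-8, so the chance that 32 random bytes in a row all fit together is tiny. Expect a count at or very near zero.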

The numbers here are large beyond imagination. As an example, I found a PDF of The Library of Babel online; it’s 131,594 bytes long. That's a binary number over one-million bits long.8

As a decimal number, it has 316,910 digits. (Starting with 85112… and ending with …02496.) Human imagination begins to fail with numbers with more than half-a-dozen digits. Book numbers are far beyond our imagination.
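The size arithmetic can be checked without building the number itself. This sketch uses logarithms: a number that is n bits long has at most floor(n × log10(2)) + 1 decimal digits. (Recent Python versions also cap integer-to-string conversion at a few thousand digits by default, which is another reason to estimate rather than print.) The variable names are illustrative.

```python
import math

size_bytes = 131_594           # size of the PDF mentioned in the text
bits = size_bytes * 8          # 8 bits per byte

# Upper bound on the decimal digit count of a number with that many bits.
decimal_digits = math.floor(bits * math.log10(2)) + 1

print(f"{bits:,} bits")                # 1,052,752 bits
print(f"{decimal_digits:,} digits")    # 316,910 digits
```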

Even short text ends up with hefty numbers here. For example, “Wyrd Smythe” encodes to the number 105,750,062,794,582,700,860,926,053.

In the Borges story, there is a library that contains every possible book. For every number, there is a book in the library. And, indeed, nearly all of them are just gibberish. The equivalent of files of random bytes.

Many, though, are close to existing books — varying in anything from all possible single typos to all possible sweeping editorial changes. Yet still close enough to be considered a version rather than a new and separate work. And of course, all existing books are in this library, both in their as-printed versions with typos as well as the perfect version the author intended. Along with the better and worse versions the author could have written (or wrote and threw away). Plus, there are translations of every book into every language.9

An infinite number of lucid original works exist, too. All possible ones. Correct stories that haven't been written yet — some great, some not. (Or which may never be written. Or never written by humans.)

The microcosm version is the transcendental number pi. Its digits are widely believed (though it has never been proven) to contain every possible finite string of digits, in which case the same logic applies. Every number means every book. (See: Here Today; Pi Tomorrow and Happy Pi Day!) So, if that belief holds, good old pi has in its digits all the stories, the full Babel Library.10
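For the curious, here is a small sketch that hunts for a digit string in the first few thousand digits of pi. It generates the digits with Gibbons' unbounded spigot algorithm, a technique brought in here purely for illustration; `find_in_pi` and its `max_digits` cutoff are hypothetical names and choices. Short patterns turn up quickly. A whole book's number would sit unimaginably far out, if it is there at all.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough information yet: fold in the next term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def find_in_pi(pattern, max_digits=2000):
    """Index of pattern within the first max_digits digits of pi, or -1."""
    digits = "".join(str(d) for d in islice(pi_digits(), max_digits))
    return digits.find(pattern)


print("".join(str(d) for d in islice(pi_digits(), 10)))  # 3141592653
print(find_in_pi("0805"))  # where the code for "he" first appears, or -1
```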

This post is in story space, too. It’s a unique number on that infinite number line. If I go back and fix a typo, its story number changes.11

Imagining stories on an infinite number line is a bit too linear for my taste, and the points tend to be widely separated with vast stretches of random gibberish dividing them. It’s hard to find readable books in Borges’s library. As mathematically accurate and appealing as this vision is, I prefer something more… friendly.

I like to visualize story space as a landscape (of sunny days, clear air, rolling hills, babbling brooks, and lots of different trees). Regions reflect genres, but without hard (or sometimes even visible) boundaries. In fact, because so many genres overlap and blend, the landscape topology is decidedly multi-dimensional with assorted wormholes, portals, and oddly overlapping lands. Tesseracts are ordinary toys in this landscape!

The idea is to ignore all the story numbers that are gibberish — not stories at all — and imagine all the valid points distributed thematically across a landscape. Every point on this landscape is a valid story (possibly a dull or mundane one).

So, metaphorically speaking, authors search the story space landscape for unique points on which to build their creations.

The question here is whether a story space neighborhood can become over-crowded. Mathematically, there always is an infinite number of points between any two given points, no matter how close those points are. Mathematically there is always room.12

But can a neighborhood become too well explored, too known? I read science fiction because I see the region of regular fiction as too well explored. For thousands of years.

Even to my jaded eyes, though, authors manage to find interesting plots of land in the most crowded of neighborhoods. Our exploration of story numbers showed that the space of possible stories is unimaginably large and, with no limit on length, infinite (countably so, since every story is a whole number). In which case, there will always be new stories to find and tell.

But the metaphor of a crowded well-known story neighborhood remains with me. From time to time, people ask “Is genre X over?” A similar question has long been asked about music, “Is rock dead?” I think the answer in both cases is “No, not dead, but maybe kinda tired.”13

Neighborhoods do get a bit crowded and well known over time. That said, there can be comfort in familiarity. Nothing like a favorite old pair of shoes. And sometimes the comfort of the crowd is a good thing.

In conclusion, I have no conclusion. Stories and life are ongoing. I’m just putting one more point on the story number line.

Until next time…

1. In some sense it does go all the way back to Plato and perfect forms.

2. To make the illustration as simple as possible, I skipped a lot of details, but it suffices to show the text-number relationship.

3. One is more likely to get “nkshdbz” or “fxxegrq”.

4. But no simpler.

5. Specifically, since we’re dealing with bytes, Unicode in its UTF-8 form.

6. In base 256!

7. Though, because these are multi-byte sequences in UTF-8, it’s highly unlikely random bytes would be the right contiguous bytes to amount to any emoji.

8. 131,594 bytes × 8 bits per byte = 1,052,752 bits.

9. Leaving one to ponder the number for the Klingon translation of Hamlet.

10. My favorite part of Carl Sagan’s Contact (1985) is the part about what they find in pi.

11. Because the most significant digits are at the beginning and the least significant at the end, changes early in the story result in a bigger change to the story number. If, say, you’d mistakenly typed a comma at the very end and changed it to a period, that would be about the smallest possible numeric change. In the Unicode scheme, comma=44, and period=46, so the story number would increase by only 2.

12. At the Hilbert Hotel. You can check out any time you like, but you can never leave.

13. “It’s just resting!”

Logos con Carne
