6 Comments
Joseph Rahi:

This is fantastic! I really like the idea of doing it from "blanked" pixels/characters. That was part of my initial thought process too: seeing complexity as how much the picture "makes sense" (is compressible) as we uncover it. It's about how "integral" it is.

>On the left, the file is just zeros or spaces (or any single character). On the right, each byte is entirely random. Neither extreme has any meaning.

I'm not sure this is correct. The non-randomised text should be compressible to something that would appear much closer to randomness but would still carry the same meaning (once decompressed, anyway), so the degree of randomness quite possibly has no connection at all to how much meaning there might be within it. In fact, I suspect that under an ideal compression system, all messages would appear as if they were random. What do you think?

Something interesting I noticed with the pictures is that it's much easier to see the image as it emerges from randomness than as it emerges from all black. You might not expect that, since both have the same percentage of the original image visible. If anything, I'd have guessed that emerging from all black would be easier, since your brain doesn't have to pick out signal from random noise. Perhaps the randomness gives the brain's pattern-recognition system more to latch onto?

You might be interested to hear that I've since experimented a little with LLM-based text compression, which integrates some degree of the semantic structure into the compression system. Hypothetically, if the LLM were a perfect model of the language, it should be very close to perfect compression for real-world, meaningful texts. Weirdly, the final steps of removing the random noise actually make much less difference to its compressibility for the LLM-based compression, whereas with the normal compression method the last steps make the biggest difference. I suppose the LLM is picking up the error-correcting structures within language? I should probably try to write something up about it.

Wyrd Smythe:

Cool, thanks, I'm glad you liked it. After mucking around with this for weeks, it only occurred to me a couple of weeks ago to try emerging from blankness as opposed to submerging into randomness. (This series looks to run for quite a few posts, so I'll be getting into all sorts of gory details.)

> "The non-randomised text should be compressible to something which would appear much closer to randomness, ..."

Not quite sure what you mean here. In the paragraph in question, I was talking about the whole spectrum from a file consisting of only spaces (the left side of the "Meaningful Complexity" graph) to one containing random bytes (the right side). The former compresses to a tiny fraction of its size, whereas the latter doesn't compress at all.
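If you want to see those endpoints for yourself, here's a minimal sketch (not my actual tooling) comparing zlib compression ratios for the three cases; it compresses its own source as a stand-in for a real text file:

```python
# Sketch: compression ratio across the spectrum, from one repeated
# character to ordinary text to pure random bytes. Standard library
# only; the "text" sample is this script's own source, standing in
# for a real text file like Hamlet.
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)

blank = b" " * 100_000              # left edge: a single repeated character
text = open(__file__, "rb").read()  # middle: structured, meaningful text
noise = os.urandom(100_000)         # right edge: fully random bytes

print(f"blank: {ratio(blank):.3f}")  # a tiny fraction of its original size
print(f"text:  {ratio(text):.3f}")   # partial compression, typical of text
print(f"noise: {ratio(noise):.3f}")  # ~1.0: essentially incompressible
```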

The file in the middle, the undistorted text, compresses to the general average for text files (which varies considerably with the content). I intend to explore this in much more detail in future posts, but for now I'll note that the last chart in the post, for the text file of Hamlet, uses a distortion technique that preserves the file structure, so the right side of its compression curve doesn't indicate zero compression. It does when I use a different distortion filter, one that alters all chars to random 8-bit values so that 100% noise yields a completely random file. Here's an example:

https://substack.com/profile/195807185-wyrd-smythe/note/c-129108721

I think the way an image emerges from a solid color versus randomness may depend on the image size (and possibly even the monitor — my examples look slightly different on my laptop monitor versus the attached big-screen monitor). Even monitor brightness may affect them.

I posted some examples in Notes for you to consider:

https://substack.com/profile/195807185-wyrd-smythe/note/c-129109583

https://substack.com/profile/195807185-wyrd-smythe/note/c-129110313

To my eyes, the clearest one is the one emerging from black. I think size plays a big role. I've noticed that in the thumbnails, the random ones do seem to stand out more, while thumbnails of the black and white versions look almost entirely black or white.

I'd be interested in reading about what you're doing with LLM-based compression. Prima facie, I'd think an LLM's stochastic nature would make the compression lossy?

Joseph Rahi:

What I meant was that the apparent "randomness" of a file doesn't preclude it from being meaningful. A compressed file should appear less structured because the redundant structure has been removed/compressed away, but it would be just as meaningful (once decompressed).

Those examples are interesting. You may well be right about it depending on image size and perhaps monitor. I think it might also be that different ways of distorting the image have different levels of clarity for us at different levels of distortion, so perhaps it's easier with random pixels at 50%, but tougher at 90% distortion?

The LLM-based compression is actually lossless. It uses arithmetic coding (https://en.wikipedia.org/wiki/Arithmetic_coding), driven by the probability distribution the LLM produces for predicting the next token at each step, which is (generally) deterministic.
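The gist, as a minimal sketch: a toy fixed distribution stands in for the LLM's next-token probabilities, and exact fractions stand in for the finite-precision arithmetic a real coder uses. Everything here (the names, the predict() interface) is illustrative, not my actual code:

```python
# Sketch of arithmetic coding driven by a predictive model. Each symbol
# narrows the interval [low, high) in proportion to its predicted
# probability; the decoder mirrors the process exactly, so the scheme
# is lossless as long as the model is deterministic.
from fractions import Fraction

def predict(context):
    # Stand-in for an LLM: a fixed next-symbol distribution. A real
    # model would condition on `context` (the symbols seen so far).
    return {"a": Fraction(5, 8), "b": Fraction(2, 8), "<eos>": Fraction(1, 8)}

def encode(symbols):
    low, high, context = Fraction(0), Fraction(1), []
    for s in symbols:
        span, cum = high - low, Fraction(0)
        for sym, p in predict(context).items():
            if sym == s:
                low, high = low + span * cum, low + span * (cum + p)
                break
            cum += p
        context.append(s)
    return (low + high) / 2  # any number inside the final interval

def decode(code):
    symbols, context = [], []
    low, high = Fraction(0), Fraction(1)
    while True:
        span, cum = high - low, Fraction(0)
        for sym, p in predict(context).items():
            nlow, nhigh = low + span * cum, low + span * (cum + p)
            if nlow <= code < nhigh:
                if sym == "<eos>":
                    return symbols
                symbols.append(sym)
                context.append(sym)
                low, high = nlow, nhigh
                break
            cum += p

message = list("aababa") + ["<eos>"]
assert decode(encode(message)) == list("aababa")
```

The better the model predicts the true next symbol, the less the interval shrinks at each step, which is why a strong LLM should get close to the entropy of real text.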

Wyrd Smythe:

> "What I meant was that the apparent "randomness" of a file doesn't preclude it from being meaningful."

Ah, yes, very true. A canonical case is the digits of pi. Effectively random but altering any one of them changes the value to not-pi. Encryption is another example. Good encryption cranks out data with the highest possible entropy. Again, effectively random but altering even a single bit corrupts the message.

Which reminds me, from your last reply I forgot to respond to:

> "I think under an ideal compression system, all messages would appear as if they were random."

Yes. Compression likewise puts out very high-entropy data that looks close to random. You generally cannot compress a compressed file. (Though compressed files differ from pi and encryption in having embedded non-random header and "dictionary" information. PNG files are likewise apparently high-entropy except for their headers and info areas. That small bit of non-random data allows them to *sometimes* be compressed. BTW: JPG files are a whole other thing. They encode images using the discrete cosine transform, a visual cousin of the Fourier transform. Almost a kind of magic.)
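Easy to check with zlib; the sample text here is just a placeholder:

```python
# Sketch: compressing already-compressed data. The second pass has
# almost nothing left to squeeze out, because the first pass's output
# is high-entropy apart from its small header.
import zlib

text = b"To be, or not to be, that is the question. " * 500
once = zlib.compress(text, 9)
twice = zlib.compress(once, 9)
print(len(text), len(once), len(twice))
# Typical result: the first pass shrinks the repetitive text to a tiny
# fraction; the second pass barely changes it (and can even grow it
# slightly, since the compressed stream adds its own header).
```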

> "I think it might also be that different ways of distorting the image have different levels of clarity for us at different levels of distortion, ..."

I think the distortion method does have a lot to do with it. The modes where I fade all pixels to black or white generally seem clearer than the modes that replace pixels (several modes: random color, random grayscale, black, gray, white). For instance, the fade-to-white seems clearer than the white pixel mode at higher "noise" levels. It also seems clearer than the matching fade-to-black mode, yet the black pixel mode seems clearer than its matching white pixel mode. So, *adding* white pixels visually "erases" the picture more than adding black pixels, but *fading* to white seems (to me) to erase it less than fading to black. Go figure.
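For the curious, the two families of modes boil down to something like this sketch (a numpy stand-in with a synthetic gradient instead of a real image; my actual program has more modes and differs in detail):

```python
# Sketch: "replace" distortion (overwrite a fraction of pixels) versus
# "fade" distortion (blend every pixel toward a target shade).
import numpy as np

rng = np.random.default_rng(0)
# Stand-in grayscale image: a 64x64 gradient.
img = np.linspace(0, 255, 64 * 64).reshape(64, 64).astype(np.uint8)

def replace_pixels(img, noise, fill="random"):
    """Overwrite a `noise` fraction of pixels with the given fill."""
    out = img.copy()
    mask = rng.random(img.shape) < noise
    if fill == "random":
        out[mask] = rng.integers(0, 256, mask.sum(), dtype=np.uint8)
    else:
        out[mask] = {"black": 0, "gray": 128, "white": 255}[fill]
    return out

def fade(img, noise, toward="black"):
    """Blend the whole image toward black (0) or white (255)."""
    target = 0.0 if toward == "black" else 255.0
    return ((1 - noise) * img + noise * target).astype(np.uint8)

half_replaced = replace_pixels(img, 0.5, "white")  # white pixel mode at 50%
half_faded = fade(img, 0.5, "white")               # fade-to-white at 50%
```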

Lois Thomson Bowersock:

Wow Wyrd! The world needs people like you because there are people like me. Bentley looks like a true K9 angel! Keep up the good work and I'll do likewise. Happy Fourth to you and your family.

Wyrd Smythe:

As they say, it takes all types to make a world. Have a fun weekend, but don't blow anything up!
