I was about to look for my squirrels, and I saw this come through.
I adjusted my settings. Thank you for this.
Looking forward to reading more from you, Wyrd.
Enjoy your evening.
From what I understand, existing subscribers should have been added automagically to the new Newsletter. Did you find yourself already subscribed?
You enjoy your evening, too, Neela!
It should, but from my experience it doesn't always work. I also have several sections and I had to refresh in February.
I did not ..
I had to select again. 😂
Thank you so much Wyrd. 🙏
Yeah, Substack’s software can be a bit iffy sometimes. I try to remember to occasionally go through all my subs and check what I’m subscribed to, but it gets to be a chore as the number of subs grows.
Good reminder, thanks, it’s time I rolled up my sleeves and went through them again. (Just checked yours — I’m down for all your Newsletters.)
I did that exercise a couple months ago… It is a pain in the butt but necessary.
Happy Wednesday Wyrd…
Happy Wednesday to you, too, Neela.
Mine could definitely be happier. The smoke alarms installed last August, which have been erroring out (3 of the 4) and driving me crazy, were installed by a company that went out of business. The company that inherited their phone number won’t honor any warranties. I can see why that other company went under (they were awful, and I regret ever hearing of them), but it screws me over on what to do about the units. 🤬
Oh no, that sounds maddening, Wyrd...
Constant false alarms and no one to take responsibility? I’d be furious too. I hope you’re able to find a workaround soon (do you have earplugs?)
I've checked and I'm subscribed :)
I look forward to hearing what you've found about ESC. Is it looking promising or have you found some flaws 😅? BTW, I think I forgot to say in the post, but you should be able to see the code I used in github, linked from the apps in the top right of the page.
It looks like you might be on to something. I've been playing around with text files, in part because it seems a bit easier to say whether a text file has "meaning" (as difficult as that is to define). I wrote some code that preserves the structure of a text file (the lines, spaces, and punctuation) but can randomize the words in various ways. Then I compare how the randomized versions compress relative to the source text.
In one mode, the words are randomized as to where they appear. All the same words but shuffled around. In other modes, I keep the words in place but alter one or more characters while preserving their case (and replacing any digits with different digits). The latter mode I created just last night and don't have any results yet.
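For anyone who wants to try the shuffle mode, here is a rough sketch of how it could work. It is not the actual code used for these experiments; the `WORD` pattern and the `shuffle_words` name are assumptions made for illustration, and the definition of a "word" (letters, digits, straight apostrophes) is a guess.

```python
import random
import re

# Assumption: a "word" is a run of ASCII letters, digits, or straight apostrophes.
# Everything else (spaces, newlines, punctuation) counts as structure.
WORD = re.compile(r"[A-Za-z0-9']+")

def shuffle_words(text, seed=0):
    """Return text with the same words in shuffled positions.

    The characters between words are left exactly where they were,
    so line breaks, spacing, and punctuation are preserved.
    """
    rng = random.Random(seed)
    words = WORD.findall(text)
    rng.shuffle(words)
    replacements = iter(words)
    # Each match is replaced by the next word from the shuffled list.
    return WORD.sub(lambda m: next(replacements), text)

sample = "To be, or not to be: that is the question.\nWhether 'tis nobler..."
print(shuffle_words(sample, seed=1))
```

Because the output contains exactly the same words and separators, it has the same byte size as the input, which keeps the compression comparison fair.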
As I'll get into when I post about it, I've been working with two source texts, "Hamlet" and Neal Stephenson's "In the Beginning Was the Command Line". In the case of "Hamlet" I'm also treating the character names indicating who speaks which line separately from the other words. Scrambling or randomizing separately. It makes for some amusing outputs.
In "Hamlet", the source compresses from 188,602 bytes to 69,785 bytes (37.00% of the original size). Two word-scrambled versions compress to 41.57% and 41.83%, which surprised me a bit. They have exactly the same byte size and word *content* as the source, but the words are scrambled. The compressed sizes aren't hugely different, but the scrambled versions are consistently a bit larger. I suspect something about word order matters, perhaps because of capitalized words appearing in the middle of sentences?
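The percentages here are compressed size divided by original size. A minimal sketch of that measurement follows, assuming zlib as the compressor (the compressor actually used isn't named in this thread, so the numbers won't match). The toy text and the `compressed_fraction` helper are made up for illustration.

```python
import random
import zlib

def compressed_fraction(data: bytes, level: int = 9) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data, level)) / len(data)

# Sanity check on the Hamlet figures quoted above.
hamlet_fraction = 69_785 / 188_602
print(f"{hamlet_fraction:.2%}")  # 37.00%

# Toy comparison: a repetitive text versus the same words in shuffled order.
phrase = "the quick brown fox jumps over the lazy dog"
ordered = " ".join([phrase] * 200)
words = ordered.split(" ")
random.Random(0).shuffle(words)
shuffled = " ".join(words)

ordered_frac = compressed_fraction(ordered.encode("utf-8"))
shuffled_frac = compressed_fraction(shuffled.encode("utf-8"))
print(f"ordered:  {ordered_frac:.2%}")
print(f"shuffled: {shuffled_frac:.2%}")
```

The toy text is far more repetitive than a real play, so the gap between ordered and shuffled is exaggerated, but the direction of the effect is the same as in the Hamlet numbers.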
I haven't analyzed the altered word versions, but they do compress to higher sizes, I suspect because identical words in the text are no longer identical after random letters have been changed.
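The letter-altering mode could look something like the sketch below. Again, this is a guess at the approach rather than the real code: `alter_word`, `alter_text`, and the `WORD` pattern are hypothetical names, and it assumes plain ASCII letters and digits.

```python
import random
import re
import string

# Assumption: same notion of a "word" as a run of letters, digits, or apostrophes.
WORD = re.compile(r"[A-Za-z0-9']+")

def alter_word(word, k, rng):
    """Replace up to k characters of word with different ones of the same kind.

    Uppercase stays uppercase, lowercase stays lowercase, digits stay digits.
    Apostrophes are never touched.
    """
    positions = [i for i, ch in enumerate(word) if ch.isascii() and ch.isalnum()]
    chars = list(word)
    for i in rng.sample(positions, min(k, len(positions))):
        ch = chars[i]
        if ch.isdigit():
            pool = string.digits
        elif ch.isupper():
            pool = string.ascii_uppercase
        else:
            pool = string.ascii_lowercase
        # Exclude the original character so the position really changes.
        chars[i] = rng.choice([c for c in pool if c != ch])
    return "".join(chars)

def alter_text(text, k=1, seed=0):
    """Alter k characters in every word, leaving words in place."""
    rng = random.Random(seed)
    return WORD.sub(lambda m: alter_word(m.group(), k, rng), text)

sample = "Enter HAMLET, Act 3 Scene 1. To be, or not to be."
print(alter_text(sample, k=1, seed=42))
```

Each occurrence of a word gets its own random change, so the two copies of "be" in the sample will usually come out different, which fits the suspicion that repeated words stop being identical.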
It's interesting, and I'll write more in detail in the upcoming post. (Maybe this week but more likely next.)
Sounds interesting! It makes sense that the word ordering would enable more compression. Lots of words appear together in the same order, e.g. in your comment you used "source file" 3 times, so a compressor could benefit from encoding those letters together. And the larger the text, the more occasions there are for spotting repeated phrases like that.
Have you measured the EF for them? It should, I think, be a fair bit higher for the correctly ordered texts. If not, the current ESC formula fails, since the randomised ones are compressed closer to 50%, so their AC x SS will be higher. This is a bit of a lingering concern I have.
Good point about word sequences. It would be interesting to see what the compression dictionary for the compressed versions actually is. The large file size factor is why I picked those two texts. The Stephenson text is 213 Kbytes compared to the 188 Kbytes of "Hamlet". Reasonably comparable.
I haven't gotten around to calculating EF yet. I have to figure out how to accomplish it with the text files. As it stands, I have only the original and a single randomized version. The word-letter randomizing does give me a chance to generate multiple files with 1 letter randomized, 2 letters, etc. I may also create an algorithm to add progressive amounts of noise as with the images. Haven't decided whether to try to preserve structure in any way or just let it rip. Maybe both.
The week is filling up, so I may not get back to this until next week.
It's a lovely idea. I think it will definitely come through. Sometimes it’s hard managing multiple sections of your account but if you have time and you have the right resources to put in it, it will come true Wyrd.
I saw your comment on Hans Morgenstern's post. I love that I decided to come to your page, and I found this one extremely interesting.
I'm Ral, lovely meeting you sir. I’m glad you’re happy and doing amazing, beautifully retired 🌝
Hi Ral, nice to meet you, welcome to my blog. 😊