I have another compression technique using [spoiler alert!] folded data structures. I'll explain what I mean by that shortly.
For this technique I won't bore you with a back-history of how I got involved in data compression, nor will I lay claim to impressive compression ratios, because this is, for now, a little more theoretical.
It's important to remember that when it comes to data compression, we don't really care about the data itself. The data itself can go in the trashcan because it's probably too big - after all, that's why we're looking to compress it, right?! All we need to remember is how to reconstruct the data once we're done forgetting about it. That bit is really important. So important, in fact, it deserves a line of its own.
Forget the data; just remember how to reconstruct it.
So, with that in mind, what do I mean by folded data structures?
Imagine you have your data in one nice long strip. Easy enough.
Now imagine you start wrapping that data into rows, essentially like word-wrap on your fave text editor. Also easy.
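The wrapping step is easy to sketch in code. Here's a minimal Python version; the `wrap` function name and the width of 4 are my own choices, since the post doesn't fix a row width:

```python
# Toy sketch: wrap a flat strip of data into fixed-width rows,
# like word-wrap in a text editor. Width 4 is an arbitrary assumption.

def wrap(strip, width):
    """Split a flat sequence into rows of the given width."""
    return [strip[i:i + width] for i in range(0, len(strip), width)]

strip = list("AABBAABBAABB")
rows = wrap(strip, 4)
# Stacked vertically, identical values now line up in columns:
# AABB
# AABB
# AABB
```

With the right width, values that were far apart in the strip sit directly above one another - which is exactly what the folding in the next steps tries to exploit.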
Okay, so now imagine you had that data printed out on a piece of paper. Also not difficult to imagine.
Hold tight folks, here's where it starts to get a little more tricky.
Now look for a way of folding that piece of paper such that you align common values and drive a skewer through the paper to hold all those matched values in one place.
Let's further imagine you tried folding that piece of paper many different ways until you landed on the optimal folds to get the maximum continuous strip of values on your skewer.
Remember the value you skewered and the folding pattern. That's your first folded data item.
You can forget about that chunk of data now, so go ahead and erase all the data you just skewered from your piece of paper.
Go ahead and repeat this process of finding the optimal fold; remember the value and the folding pattern. That's your second folded data item. Forget the data you just skewered from the page.
Do it again. That's your third. Keep repeating this until you have skewered all your data.
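The skewer-and-erase loop above can be sketched as a loose stand-in in Python. Real paper folds aren't modelled here; each "folding pattern" is simply the set of positions a value occupies, which is enough to reconstruct the strip. The names (`skewer_all`, `unfold`, folded items as `(value, positions)` pairs) are all my own:

```python
# Loose stand-in for the skewer-and-erase loop: repeatedly pick the
# value with the longest "skewer" (most remaining occurrences), record
# it with its positions, and erase it, until nothing is left.

def skewer_all(strip):
    """Reduce a strip to a list of (value, positions) folded items."""
    remaining = {i: v for i, v in enumerate(strip)}
    items = []
    while remaining:
        counts = {}
        for v in remaining.values():
            counts[v] = counts.get(v, 0) + 1
        best = max(counts, key=counts.get)       # longest skewer wins
        positions = sorted(i for i, v in remaining.items() if v == best)
        items.append((best, positions))
        for i in positions:                      # erase the skewered data
            del remaining[i]
    return items

def unfold(items, length):
    """Reconstruct the original strip from the folded items."""
    out = [None] * length
    for value, positions in items:
        for i in positions:
            out[i] = value
    return out

data = list("ABABCCABAB")
items = skewer_all(data)
assert unfold(items, len(data)) == data   # round-trips losslessly
```

Note this sketch stores raw position lists, so it won't actually shrink anything - the compression in the real idea would have to come from folding patterns being much cheaper to describe than the positions they cover.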
You will find that as you reduce the data by skewering it off the page, your folding patterns become easier to spot and less complex, so the process gets faster the further you work through the page.
Once you've got all your folded data items, simply concatenate them and pop them in a small file.
Easy!
All you've had to remember is the skewered value and the folding pattern, which can then be unfolded all the way back out to the continuous stream of data that you started with.
So what this technique effectively does is allow you to do run-length encoding across a landscape of data that, as a continuous strip, could not make good use of long runs, and is therefore sub-optimal. By folding (or crinkling) the data in the right way, you maximise your run lengths and get highly effective compression.
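To illustrate that run-length point: the bytes below give no usable runs as a flat strip, but long runs once wrapped into rows and read column-wise - a crude stand-in for one particular "fold". The `rle` helper and the width of 4 are my own assumptions:

```python
# Basic run-length encoding plus one crude "fold": wrap into rows of
# `width`, then read the grid column by column instead of row by row.

def rle(seq):
    """Run-length encode a sequence into (value, run_length) pairs."""
    out = []
    for v in seq:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

strip = "ABCDABCDABCD"
print(len(rle(strip)))            # 12 runs: RLE gains nothing here

width = 4
columns = [strip[i::width] for i in range(width)]
folded = "".join(columns)         # "AAABBBCCCDDD"
print(len(rle(folded)))           # 4 runs: far friendlier to RLE
```

The decompressor only needs the run-length pairs plus the fold description (here, just the width) to reverse the transposition and recover the original strip.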
Hypothesis:
- Compression will be computationally intensive and slow, whereas decompression will be fast.
- Splicing byte values to enable layering of data prior to folding would improve compression and performance.
- The compression method could be iterative to produce progressively smaller content.
Concept (c) 2019 letsbuildathing.com