Fractal Image Compression

Just for the fun of it, I spent some time implementing a fractal image compression scheme in JavaScript. Because this is really just an experiment and in no way intended for everyday use, it can only deal with square images of power-of-two dimensions. But besides this (admittedly big) limitation, it actually works.

Play with it

I have set up a separate page where you can play with the compression. If you have trouble figuring out how to use it or what it’s about at all, please read the rest of this post as it might help to get started.

How it works

The mathematically inclined might want to refer to the Wikipedia article linked above on why fractal compression works at all, here I’ll only outline how I implemented it. As already said, we start with the square power-of-two image we want to find a fractal representation for. This image is subdivided into non overlapping domains of let’s say 16px² (one reason for the square & POT restriction is to make this possible without ugly corner cases). We also perform another subdivision of the image into ranges. The only real requirement for those ranges is that they must be strictly larger than the domains. In this implementation, the image is subdivided using a quadtree scheme, starting with the whole image and going down to ranges which are twice the size of the domains. These get scaled down to the domain size (using bilinear interpolation).

Now we have a rather big pool of range images, and the real work starts: We do an exhaustive search for range images which cover our domains as good as possible. We use the mean squared error to determine the similarity here. To improve the quality of our matches, two additional operations are performed on the ranges:

  • The range images may be transformed to match the domains. The algorithm tries all eight symmetries of the square and chooses the best. This is the main reason for the “only squares” restriction.
  • Also, we apply a brightness/contrast adjustment to our source (range) image. The nice thing about this step is, that by doing a linear regression analysis we get both, the optimal coefficients and the resulting error by looping over the pixel values only once.
  • The heart of the encoding algorithm works only on gray-scale images. To support color images, their color information is converted from RGB to YUV color space, and the three resulting grey-scale channels are encoded in terms of each other.

Using this scheme, the pixel colors of every domain block are reduced to a block like this (in JSON notation):

{
   "pos": [3, 3, 2],
   "channel": 0,
   "transform": 6,
   "brightness": 80.87387183475751,
   "contrast": -0.837734338904282,
}

The meaning of these properties is as follows:

  • pos defines the position of the range block in the source image, where an empty list means the whole source image is used for this domain block, [0] would mean the upper left quarter is used, [0,3] means the lower right quarter of the upper left quarter is used, and so on.
  • channel is only really used for color images, and specifies the source channel where to extract from
  • transform is in the range [0..7] and specifies which of the eight transformations to apply to this source image and finally
  • brightness and contrast specify the value adjustment for the source image

You’re still reading… just start playing with the damn thing and if you care just post a questing in the comments section below.


About this entry