Tobias Worledge, Fall 2024
This project colorizes images by aligning their red, green, and blue images. Edge detection creates well-defined features to align on, and image pyramids keep the alignment efficient. As an additional feature, the project also implements auto-cropping to remove the artifacts that remain after alignment.
How do we find common features to align three images on? With the human eye, we can usually pick out a primary feature within any image and match it across all three colored images. Fortunately, we can achieve something similar with an algorithm. We first pre-process each image by measuring the difference between each pixel and its neighbors. This results in a new image where bright pixels (high values) mark pixels that are distinct from at least some of their neighbors. This is great because it captures features across the entire image that are likely shared with the other colored images. To keep differing intensities between the red, green, and blue images from affecting the comparison, I normalize each edge detection image by the max pixel value of that image. Below is a visualization of edge detection on the lady.tif image.
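In code, the edge detection step might look like the following minimal sketch. The function name `edge_map` is hypothetical, and the exact neighborhood is an assumption: here each pixel is compared to its four direct neighbors using absolute differences, which may differ from what the project actually uses. The normalization by the max pixel value follows the description above.

```python
import numpy as np

def edge_map(img: np.ndarray) -> np.ndarray:
    """Sum of absolute differences to the 4 direct neighbors, scaled to [0, 1].

    Assumption: a 4-neighbor absolute difference; the writeup does not
    say which neighbors or which difference measure are used.
    """
    img = img.astype(np.float64)
    # Replicate the border so every pixel has four neighbors.
    padded = np.pad(img, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    diff = (
        np.abs(center - padded[:-2, 1:-1])    # neighbor above
        + np.abs(center - padded[2:, 1:-1])   # neighbor below
        + np.abs(center - padded[1:-1, :-2])  # neighbor to the left
        + np.abs(center - padded[1:-1, 2:])   # neighbor to the right
    )
    # Normalize by the max value so the three colored images are comparable.
    peak = diff.max()
    return diff / peak if peak > 0 else diff
```

A flat region maps to 0 and the strongest edge in each image maps to 1, regardless of how bright that particular colored image is overall.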
Now that we have isolated shared features across our red, green, and blue images, we can implement the image alignment. The simplest approach is to test all possible alignments of the green and blue images against the red image within ±15 pixels on both the x and y axes. This produces excellent results for the first 3 .jpg images, which is great! Unfortunately, it does not work well on a .tif file that has ~100 times more pixels (3000x3000 vs 300x300). It seems that the alignments needed for the .tif images are much larger than just ±15 pixels. To solve this, we can use image pyramids: we create downsampled versions of the images and align those first. This gives a rough alignment that we then refine on the full-sized images. This approach produces a substantial speedup over our original alignment method and allows us to search across much larger alignment offsets for .tif images.
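The two alignment strategies can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the names `best_shift` and `pyramid_align` are hypothetical, the score is a sum of squared differences (the writeup does not name its metric), downsampling simply keeps every second pixel without blurring, and `np.roll` wraps pixels around the image edge instead of discarding them.

```python
import numpy as np

def best_shift(ref, img, center=(0, 0), radius=15):
    """Try every (dy, dx) within +-radius of center; return the best one.

    Assumption: the score is the sum of squared differences (lower is better).
    """
    best, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            # np.roll wraps around the edges, which is a simplification.
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            score = np.sum((ref - shifted) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def pyramid_align(ref, img, min_size=64, radius=15, refine=2):
    """Coarse-to-fine search: align a half-size copy first, then refine."""
    if min(ref.shape) <= min_size:
        # Coarsest level: the image is small, so a full search is cheap.
        return best_shift(ref, img, radius=radius)
    # Naive 2x downsampling (every second pixel, no blur).
    coarse = pyramid_align(ref[::2, ::2], img[::2, ::2], min_size, radius, refine)
    # A shift of s at half resolution is roughly 2*s at this resolution.
    center = (2 * coarse[0], 2 * coarse[1])
    return best_shift(ref, img, center=center, radius=refine)
```

Because each level doubles the offset found at the level below it, a ±15 pixel search at the coarsest level covers a far larger range at full resolution, while the full-sized images only ever see a small refinement window.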
I noticed that the resulting aligned images were great! Unfortunately,
there were some pretty noticeable borders that were produced by shifting
each of the red, green, and blue pictures around. To fix this, I wanted to
build an intelligent cropping algorithm. I first converted each pixel to a
form of grayscale by taking the sum of each pixel's RGB channels, each
scaled by 1/3 (I originally tried the human-vision-oriented scaling, but
that produced worse results since it emphasized certain colors). I decided
to use a similar edge detection algorithm, but one that compares each
pixel only to its horizontal or only to its vertical neighbors, and that
reaches pixels farther away rather than just adjacent ones. For example,
for ~300x300 images, I measure the difference between a given pixel and
pixels horizontal from it, up to 10 pixels away. This was because I
noticed that the artifacts on the borders were always either vertical or
horizontal rectangles and roughly uniform color. By measuring the
similarity of pixels to their distant neighbors, I could produce a metric
of the average horizontal/vertical pixel similarity of an image. Then, I
could attempt different cropping dimensions and choose the one that
minimized the average pixel similarity. This produced good results and
managed to remove the vast majority of the artifacts on the borders of the
images.
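A rough sketch of this cropping idea is below. Several details are assumptions made for illustration: the names `directional_similarity` and `auto_crop` are hypothetical, pixel values are assumed to lie in [0, 1], similarity is taken as one minus the mean absolute difference, the horizontal and vertical scores are simply averaged, and each side is searched one after another up to a caller-chosen `max_crop` instead of trying every combination of crop dimensions.

```python
import numpy as np

def directional_similarity(gray, axis, max_offset=10):
    """Similarity of each pixel to the next max_offset pixels along axis.

    Assumes values in [0, 1]; 1.0 means identical to all compared pixels.
    """
    gray = gray.astype(np.float64)
    n = gray.shape[axis]
    pad = [(0, 0), (0, 0)]
    pad[axis] = (0, max_offset)
    # Reflect at the far edge so edge pixels are still compared to real pixels.
    padded = np.pad(gray, pad, mode="reflect")
    total = np.zeros_like(gray)
    for k in range(1, max_offset + 1):
        shifted = np.take(padded, np.arange(k, k + n), axis=axis)
        total += np.abs(gray - shifted)
    return 1.0 - total / max_offset

def auto_crop(rgb, max_crop, max_offset=10):
    """Crop each side by the amount that minimizes the mean similarity."""
    gray = rgb.mean(axis=2)  # equal 1/3 weight per channel
    sim = 0.5 * (directional_similarity(gray, 0, max_offset)
                 + directional_similarity(gray, 1, max_offset))
    h, w = sim.shape
    crops = range(max_crop + 1)
    # Simplification: search the sides one at a time, not jointly.
    top = min(crops, key=lambda t: sim[t:, :].mean())
    bottom = min(crops, key=lambda b: sim[top:h - b, :].mean())
    left = min(crops, key=lambda l: sim[top:h - bottom, l:].mean())
    right = min(crops, key=lambda r: sim[top:h - bottom, left:w - r].mean())
    return rgb[top:h - bottom, left:w - right], (top, bottom, left, right)
```

Uniform border strips score close to 1.0, so removing them lowers the average, while cropping into detailed image content changes the average very little. A large uniform region next to the border would score just as high as an artifact, so a sketch like this can also crop too much.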
The original aligned images can be seen on the left
side below. The auto-cropped images can be seen on the right side. It
seems to work pretty well and I think it's pretty cool!
I've noticed that there are some cases where individual parts of images
look unaligned by a few pixels. This was initially concerning, but after
thinking about it, I realized that this was likely due to the subject
moving slightly while the pictures were taken. I believe this because
the remainder of the image looks well-aligned.
There are a few cases where the cropping algorithm slightly
overcrops the image, which is unfortunate. I believe this is due to a
semi-uniform-colored object near the border of the image, such as in the
Monastery picture.