September 2023
Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a pioneer in color photography who foresaw its future potential as early as 1907. He obtained special permission from the Russian Tsar to travel across the vast Russian Empire and capture color photographs of various subjects, including a rare color portrait of Leo Tolstoy. Prokudin-Gorskii meticulously photographed a wide range of subjects, such as people, buildings, landscapes, railroads, and bridges, resulting in thousands of color pictures. His innovative approach involved recording three exposures of each scene on glass plates using red, green, and blue filters, despite the lack of immediate means to print color photographs. He envisioned the use of special projectors in educational settings throughout Russia for children to learn about their country through these images. Unfortunately, his plans never came to fruition as he left Russia in 1918 after the revolution and never returned. Fortunately, his RGB glass plate negatives, documenting the final years of the Russian Empire, survived and were acquired by the Library of Congress in 1948. Recently, the Library of Congress has digitized these negatives, making them accessible online to the public.
We were given digitized Prokudin-Gorskii glass plate images, and our task was to use image processing techniques to produce a color image with as few visual artifacts as possible.
First, I extracted the three color channel images from each glass plate scan. The next step was to align them into a single RGB color image. As a baseline, I simply stacked the channels on top of each other with no alignment; the results of this naive stacking are shown below.
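The extraction comes down to splitting the plate into vertical thirds. Here is a minimal sketch of that step plus the naive stack, assuming the scan orders the exposures blue, green, red from top to bottom and using scikit-image for I/O; the filename is just an example.

```python
import numpy as np
import skimage.io as skio

# Load a digitized glass plate scan (filename is only an example).
plate = skio.imread("cathedral.jpg").astype(np.float64) / 255.0

# The plate stacks the three exposures vertically: blue, then green, then red.
height = plate.shape[0] // 3
b = plate[:height]
g = plate[height:2 * height]
r = plate[2 * height:3 * height]

# Naive colorization: stack the channels with no alignment at all.
naive_rgb = np.dstack([r, g, b])
skio.imsave("naive_cathedral.jpg", (naive_rgb * 255).astype(np.uint8))
```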
My first alignment method was an exhaustive search over a window of shifts, choosing the shift that produced the best score over the whole image.
I tested two scoring functions: the squared L2 distance, also known as the Sum of Squared Differences (SSD), and normalized cross-correlation (NCC). There is one important distinction between them: with SSD you want the smallest difference, whereas with NCC you want the largest correlation.
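Concretely, the two scores can be computed as in the sketch below, where a and b are two channel images of the same shape (the NCC here subtracts the mean before normalizing, which is one common variant):

```python
import numpy as np

def ssd(a, b):
    # Sum of Squared Differences: lower means a better match.
    return np.sum((a - b) ** 2)

def ncc(a, b):
    # Normalized cross-correlation: higher means a better match.
    # Center each channel and scale it to unit norm before the dot product.
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))
```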
I chose a search window of ±16 pixels, so I had to try every combination of shifts in [-16, 16] along both the x-axis and the y-axis. That is 33 × 33 = 1,089 candidate shifts, and each candidate requires comparing every pixel. This works fine for the smaller images, which in our dataset were the JPG-format images (around 350x350, or 122,500 pixels, so roughly 130 million pixel comparisons per aligned channel); however, the larger TIF-format images would take far too long to iterate through (around 3500x3500, or 12,250,000 pixels, so over 13 billion comparisons per aligned channel).
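The single-scale search itself is a pair of nested loops; here is a sketch using SSD as the score (np.roll wraps the shifted channel around the edges, which is tolerable because the borders get cropped out of the score later on):

```python
import numpy as np

def align_exhaustive(channel, reference, window=16):
    # Try every displacement in [-window, window] on both axes and keep
    # the one with the lowest SSD against the reference channel.
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift
```

Since the results below report green and red shifts, the blue channel serves as the fixed reference that the other two are aligned against.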
For the larger photos, an image pyramid algorithm was necessary. The idea behind image pyramids is to downscale the image, run the exhaustive search at that much smaller resolution, and then propagate the estimated shift back up toward the original resolution, refining it at each step.
For example, consider 4 rescaling operations, which I’ll call a 4-level pyramid. First scale the image by (1/2)^4 = 1/16 in each dimension; at 1/16 resolution the image is small enough for the exhaustive search. We then rescale back up to 1/8. The important thing to remember is that upscaling by 2 doubles the resolution, so a shift of (1, 1) pixels at the coarser level becomes a shift of (2, 2) pixels at the finer one. After doubling the coarse shift, only a small local exhaustive search of a couple of pixels in each direction is needed to refine it. We repeat this at 1/4, 1/2, and finally the original resolution.
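A recursive sketch of this pyramid, reusing the align_exhaustive function from the earlier sketch; skimage.transform.rescale handles the downscaling, and the refinement window at each level is only a couple of pixels:

```python
import numpy as np
from skimage.transform import rescale

def align_pyramid(channel, reference, levels=4):
    # Base case: the image has been downscaled enough for a full search.
    if levels == 0:
        return align_exhaustive(channel, reference, window=16)

    # Recurse on half-resolution copies of both channels.
    coarse_dy, coarse_dx = align_pyramid(
        rescale(channel, 0.5, anti_aliasing=True),
        rescale(reference, 0.5, anti_aliasing=True),
        levels - 1,
    )

    # A 1-pixel shift at the coarser level is a 2-pixel shift here,
    # so double the coarse estimate and refine it with a small local search.
    dy, dx = 2 * coarse_dy, 2 * coarse_dx
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    fine_dy, fine_dx = align_exhaustive(shifted, reference, window=2)
    return dy + fine_dy, dx + fine_dx
```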
This algorithm works as long as we do not downscale too aggressively. Intuitively, when we look at an image we do not process every single pixel; averaging a window of pixels into a lower-resolution image still looks similar to our eyes. The main problem I found is that edges with sharp color changes do not survive this averaging well. That is why we iteratively rescale and run an exhaustive search over a much smaller set of possible shifts at each level, so we can recover the information lost to the averaging.
One problem I encountered was that the images have borders that are nearly a single color, usually black. When the loss function is computed over that region, it scores well across many different shifts, since the border looks the same in every channel regardless of alignment. This skews the total loss and in some cases makes the images turn out poorly.
My solution was to crop out that region. I noticed that the borders tended to be similar in size across plates, perhaps due to the camera being used. I eyeballed it and settled on discarding 5% of the image on each edge when computing the loss.
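The crop itself is a small helper applied to both channels before scoring; a sketch with the 5% margin as a parameter:

```python
def crop_interior(image, fraction=0.05):
    # Drop a fixed fraction of the image on every edge so the dark plate
    # borders do not dominate the alignment score.
    h, w = image.shape[:2]
    dh, dw = int(h * fraction), int(w * fraction)
    return image[dh:h - dh, dw:w - dw]
```

Inside the search loop, the score then becomes ssd(crop_interior(shifted), crop_interior(reference)).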
Here are the aligned small images (JPG) using just the exhaustive search, with no image pyramid. Each photo is labeled in the same format: green shift (x, y), red shift (x, y), filename.
Here are the aligned large images (TIF) using the image pyramid. Again, each photo is labeled in the same format: green shift (x, y), red shift (x, y), filename.
Here are the aligned large images (TIF) that I chose myself from the dataset. Again, each photo is labeled in the same format: green shift (x, y), red shift (x, y), filename.
I also implemented an edge detection algorithm. There are many different edge detectors to choose from; I decided to use the Sobel operator, also called the Sobel filter. The operator uses two 3×3 kernels which are convolved with the original image to approximate the derivatives: one for horizontal changes and one for vertical changes.
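A sketch of the Sobel step using scipy's 2-D convolution; Gx and Gy are the standard 3×3 kernels, and the gradient magnitude gives the edge map shown below:

```python
import numpy as np
from scipy.signal import convolve2d

# Standard Sobel kernels: GX approximates the horizontal derivative,
# GY the vertical one.
GX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=np.float64)
GY = GX.T

def sobel_edges(channel):
    # Convolve with both kernels and combine into a gradient magnitude map.
    dx = convolve2d(channel, GX, mode="same", boundary="symm")
    dy = convolve2d(channel, GY, mode="same", boundary="symm")
    return np.hypot(dx, dy)
```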
Here is the melons.tif photo with edge-detection.