Hello World! Photomosaic Project

Hi! This is my first blog post on this website, which for now is dedicated to my programming projects. I’ll be posting about the process of making them, and I’ll regularly update them in repositories on my GitHub account.

The project that I’m starting this blog off with is a program to take an image and make a photomosaic, like so:

I got the idea to do this from Robert Heaton’s Programming Projects for Advanced Beginners post.

That example mosaic is much better than what I ended up with as a minimum viable product, but I made progress.

I wrote my version in Python (as most of my projects probably will be), relying heavily on Pillow and numpy. Pillow made it extremely easy to load images, and it worked very well with numpy: all I had to do to get the array of RGB values was pass the image object into numpy’s array function.
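As a rough illustration of that step, here is a minimal sketch and not necessarily the code in my repository. The helper name `image_to_rgb_array` and the filename in the comment are made up for this example, and the `convert("RGB")` call is an extra precaution I am assuming here so the array always has three channels.

```python
import numpy as np
from PIL import Image


def image_to_rgb_array(image):
    """Return a (height, width, 3) uint8 array of RGB values."""
    # convert("RGB") is an assumption: it drops alpha channels and palettes
    # so every pixel ends up as exactly three values.
    return np.array(image.convert("RGB"))


# Stand-in image so the sketch runs without a file on disk; in practice
# this would be something like Image.open("earth.jpg") (hypothetical name).
demo = Image.new("RGB", (4, 2), (10, 20, 30))
pixels = image_to_rgb_array(demo)
print(pixels.shape)  # (2, 4, 3): rows, columns, channels
```

Note that the array is indexed as row, then column, so its shape is (height, width, 3) even though Pillow reports sizes as (width, height).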

However, loading my first image was really stressful because I kept getting an IOError saying that there was a “broken data stream when reading image file”, and even at the time of writing this I have no idea why it happened. When I made the scraper to get a dataset of images later on, it would occasionally hit the same error on seemingly random images, so I ended up wrapping the load in a try statement that logs the failure and skips the image.
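The log-and-skip idea could look roughly like the sketch below. The function name `load_or_skip` and the logger name are hypothetical. In Python 3, `IOError` is an alias of `OSError`, so catching `OSError` covers the error described above, along with missing or unreadable files.

```python
import logging

import numpy as np
from PIL import Image

logger = logging.getLogger("mosaic")  # hypothetical logger name


def load_or_skip(path):
    """Return the image at `path` as an RGB array, or None if it can't be read."""
    try:
        with Image.open(path) as img:
            # convert() forces Pillow to decode the pixel data, so a broken
            # file fails here rather than somewhere later in the program.
            return np.array(img.convert("RGB"))
    except OSError as err:
        logger.warning("Skipping %s: %s", path, err)
        return None
```

A caller can then filter out the failures, for example by keeping only the results that are not `None`.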

After that, I defined a variable for how large I wanted the smaller images to be (50 x 50 pixels), trimmed the original image so that boxes of that size would fit evenly into it, and iterated through the image in 50 x 50 subrectangles, loading each one into a numpy array to calculate its average RGB value. Each average was stored in a second numpy array with one entry per subrectangle, which would later be used to pick the corresponding images.
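Here is a minimal sketch of the trimming and averaging, assuming the image is already a (height, width, 3) numpy array. The names `TILE` and `tile_averages` are made up for this example, and the sketch trims the extra pixels from the right and bottom edges, which is only one possible choice.

```python
import numpy as np

TILE = 50  # side length of each subrectangle, in pixels


def tile_averages(pixels, tile=TILE):
    """Return a (rows, cols, 3) array holding the average RGB of each tile."""
    height, width = pixels.shape[:2]
    rows, cols = height // tile, width // tile

    # Trim so that a whole number of tiles fits in each direction.
    trimmed = pixels[: rows * tile, : cols * tile, :3]

    averages = np.zeros((rows, cols, 3))
    for r in range(rows):
        for c in range(cols):
            block = trimmed[r * tile : (r + 1) * tile, c * tile : (c + 1) * tile]
            # Flatten the block to a list of pixels, then average per channel.
            averages[r, c] = block.reshape(-1, 3).mean(axis=0)
    return averages
```

For a 170 x 120 pixel image, this gives a 2 x 3 grid of averages, and the leftover 20 pixels on each trimmed edge are ignored.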

To make a photomosaic, I obviously needed a set of images, so I wrote a separate script to collect some. Using the requests and lxml modules, I pointed it at pexels.com and gave it a list of a few search terms, like “flowers” and “nature”, to work through until it had about 100 images. (Later, I can probably change that so it uses the “related searches” mechanism to keep finding images until it hits a user-specified limit, but that worked for a start.) In the same script, after downloading all of the images, I also wrote code to crop them to squares and build a JSON index holding each image’s filename and average RGB value.
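The cropping and indexing part might be sketched like this. The function names, the `pattern` parameter, and the choice to crop around the center and overwrite each file in place are all assumptions for this example. The index layout (filename mapped to a list of three averages) is also just one reasonable option.

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image


def crop_to_square(image):
    """Crop the largest centered square out of a Pillow image."""
    side = min(image.size)
    left = (image.width - side) // 2
    top = (image.height - side) // 2
    return image.crop((left, top, left + side, top + side))


def build_index(image_dir, index_path, pattern="*.jpg"):
    """Square every matching image in place and write a JSON index of averages."""
    index = {}
    for path in sorted(Path(image_dir).glob(pattern)):
        with Image.open(path) as img:
            square = crop_to_square(img.convert("RGB"))
        square.save(path)  # overwrites the original with the square version
        average = np.asarray(square).reshape(-1, 3).mean(axis=0)
        index[path.name] = [float(v) for v in average]
    with open(index_path, "w") as fh:
        json.dump(index, fh, indent=2)
    return index
```

Computing the averages once and storing them means the mosaic script only has to read the JSON file instead of reopening every image on each run.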

Finally, in the original script I iterated over each element of the second numpy array, which held the average RGB values for the subrectangles. For each one, I found the image whose average RGB value (handily stored in my JSON index) was closest to that subrectangle’s average and saved the image’s filename in a third array. Heaton’s post suggests using “Pythagoras’ theorem in three dimensions”, which I used without the square root to make calculations faster; the square root doesn’t change which distance is smallest, so the closest image is the same either way. After saving the corresponding filenames, I iterated over that array to get each image and pasted it into the corresponding place over the original image (technically a copy of the original image), generating my photomosaic.
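The matching and pasting steps can be sketched as follows. This is an illustration under assumptions and not the repository code: `closest_filename` and `paste_tiles` are hypothetical names, `index` is a dict mapping filenames to average RGB values, and `tiles` is a dict of already-loaded Pillow images keyed by filename so the example stays self-contained.

```python
import numpy as np
from PIL import Image


def closest_filename(target_rgb, index):
    """Return the filename whose average RGB is nearest to target_rgb."""
    target = np.asarray(target_rgb, dtype=float)
    best_name, best_dist = None, float("inf")
    for name, rgb in index.items():
        diff = np.asarray(rgb, dtype=float) - target
        # Squared distance: dr^2 + dg^2 + db^2. Skipping the square root
        # keeps the same ordering, so the winner is unchanged.
        dist = float(diff @ diff)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


def paste_tiles(canvas, filenames, tiles, tile=50):
    """Paste the chosen image for each grid cell onto canvas."""
    for r, row in enumerate(filenames):
        for c, name in enumerate(row):
            small = tiles[name].resize((tile, tile))
            # Pillow positions are (x, y), so column comes first.
            canvas.paste(small, (c * tile, r * tile))
    return canvas
```

Because the index is a plain dict, the search is a linear scan over every image for every subrectangle, which is fine for around 100 images but would be the first thing to speed up with a bigger dataset.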

This was the program running on a picture of the Earth, with a dataset of 110 images. The most immediate improvement would probably come from getting a larger dataset with more potential average RGB values so that Australia and North America don’t get eaten by an army of pink flowers.

Source code here.