NumPy Image Processing Tips Every Data Scientist Should Know

Article by Bharath Raj | August 06, 2020

In this blog, we’ll explore how to solve image processing problems with NumPy by leveraging concepts such as broadcasting, vectorization, and advanced indexing. We’ll also show that using the right NumPy functions for the right tasks yields speed-ups that are dramatically better than naive, loop-based solutions.

NumPy is a great library that provides optimized functionality for a large number of array-based operations. Though it’s fairly straightforward to leverage the full power of NumPy for simple tasks, it is not always readily evident how to implement certain functionality for some applications optimally.

Though we will not be providing an introduction to the concepts listed above, we will be demonstrating why they are important and how they can be used practically. For that purpose, we have designed a few toy problems which we will solve below. The optimal techniques discussed for these toy problems can be easily extended to issues that you might face while processing data for practical computer vision applications such as semantic segmentation and object detection.

And don’t worry if you are unsure about the concepts mentioned above; we’ve included an abundance of visual and textual explanations to help. If you’re after a refresher, there are plenty of resources available.

Alright, let’s get started by defining our toy problems.

 

The Toy Problems for NumPy Image Processing

We will be looking at two toy problems in our exploration of NumPy for image processing. They are:

 

Problem #1: Full Replacement

Given the input image in Figure 1, convert all the pixels with color [255, 0, 0] to the color [255, 255, 255].

Note: All colors in this blog are represented as [R, G, B] values.

Figure 1: Problem #1; Full Replacement

 

Problem #2: Conditional Replacement

Given the input image in Figure 2, convert all the pixels within the specified boxes and with color [255, 0, 0] to the color [255, 255, 255]. The boxes are visualized in Figure 2 (middle) with stars representing the top-left and bottom-right coordinates of the boxes.

Figure 2: Problem #2; Conditional Replacement

As you can see, problem #2 is an extension of problem #1.

At first glance the problems may seem simple. But while the core problems are simple, the implementation details will show us why it is typically recommended to leverage the power of vectorization using NumPy rather than iterating over pixels using python loops.

In the following sections we’ll explore a few solutions to these problems. One solution uses explicit python loops to iterate over pixels to solve the problem. The other solution(s) use NumPy functions. Following these solutions, we will compare how long it takes for each method to solve our problems.

We will also measure the time taken by each method for different image sizes (or scales). This will show us how the speed-up obtained from an optimal method varies across multiple scales. The input image for both problems is available in five scales (or sizes), and the height of each image is half its width. For this reason, we’ll refer to the image in each scale solely by its width. The width of the images in the five scales are: 2000px, 1000px, 500px, 250px and 100px.

Note: The image was originally created with width 2000px and height 1000px and was resized to the other scales while maintaining the aspect ratio.

Please note that all the time comparisons and plots presented here are based on measurements taken on this Colab notebook. You can replicate the experiment on your own device by cloning and following the instructions in the notebook or my GitHub repository.

Disclaimer: The measurements were taken on 2nd August 2020 on the Colab notebook mentioned in the GitHub repository. The exact timings and speed-ups may vary based on several factors such as software/hardware versions and so on. They may also be affected by extraneous factors such as CPU/memory usage. Hence, refer to these measurements only to get an approximate sense of the differences in performance. Refer to the Colab notebook for more information about the versions of some of the software used.

Now that the problems are set, let’s begin.

 

Solution — Problem #1: Full Replacement

Let’s analyze the input image from Figure 1 in more detail. Figure 3 below includes some helpful annotations to help us understand the problem better.

The input image is represented in the form of a 3D NumPy array named img which has the shape [H, W, 3]. Let’s focus on just the first two dimensions (height and width).

Let’s denote row-indices by r and column-indices by c. From Figure 3, we can see that r can take any integer value in the range [0, H-1] (inclusive at both ends) and c can take any integer value in the range [0, W-1] (inclusive at both ends). We can represent one pixel in the image by its coordinate (r, c). Moreover, img[r, c, :] will give the RGB value of the image at (r, c).
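As a quick sketch of the shape and indexing described above (using a hypothetical tiny 4×6 image rather than the actual input):

```python
import numpy as np

# A hypothetical tiny image: H=4 rows, W=6 columns, 3 channels (RGB),
# filled entirely with the color [255, 0, 0].
img = np.full((4, 6, 3), [255, 0, 0], dtype=np.uint8)

H, W = img.shape[:2]
print(H, W)          # 4 6
print(img[0, 0, :])  # [255   0   0] -- the RGB value at coordinate (r=0, c=0)
```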

Figure 3: Annotated Input Image for Problem #1

 

This should give you some ideas as to what variables and tricks you can leverage to solve this problem. Now let’s look at a couple of solutions.

 

Method I: Simple Python For Loops

Naturally, a simple for-loop based solution can be applied to this problem. Assuming that we loaded the image to the NumPy array img, the below code-block provides a solution to the problem:

 

import numpy as np

output = img.copy()
H, W = img.shape[:2]
for r in range(H):
    for c in range(W):
        if np.all(img[r, c, :] == [255, 0, 0]):
            output[r, c, :] = [255, 255, 255]

 

In the above code we simply looped over every single pixel and checked whether its RGB value was equal to [255, 0, 0]. If so, we overwrote the value at that pixel location in the output NumPy array with [255, 255, 255].

Pretty simple and concise, right? However, as we will soon observe, this method is inefficient.

 

Method II: Using NumPy Image Processing Functions

So the question is, how do we make this more efficient? Let’s try to build a solution that leverages NumPy tricks and methods to harness the power of vectorization.

As before, we’ll assume that we loaded the image to the NumPy array img. The following code-block provides an optimized solution:

 

output = img.copy()

valid = np.all(img == [255, 0, 0], axis = -1)
rs, cs = valid.nonzero()
output[rs, cs, :] = [255, 255, 255]

 

The code still looks very concise, but is also a bit more complex. If you are familiar with how broadcasting and advanced indexing works, you’ll be able to see what’s going on here. If you aren’t, here’s an explanation:

  • Instead of manually looping over every pixel and checking for equality, we create a “mask” for the entire image using NumPy functions.
  • The “mask” (represented by valid) is a NumPy array with the same height and width as the original image. It holds a boolean value at each coordinate: if the value is True, that coordinate in the image has the color [255, 0, 0]; if the value is False, it does not.
  • We can now use valid.nonzero() to get the indices of the elements of valid that are non-zero (here, not False). Figure 4 provides a neat visual example of how it works. You can also refer to its documentation for more information and examples.
Figure 4: Demonstration of how arr.nonzero() works. The variable arr is a boolean NumPy array of shape [4, 3] represented on the left side of the figure. White squares represent the value True, whereas black boxes represent the value False. In our actual code, we use the variable named valid.
  • Now, we can directly use rs and cs to access the pixels with value [255, 0, 0] of the output array and assign them the value [255, 255, 255].
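To make the nonzero() step concrete, here is a small runnable sketch using the same [4, 3] mask shape as Figure 4 (the True positions are made up for illustration):

```python
import numpy as np

# A small boolean mask, analogous to `valid`, with shape [4, 3]
arr = np.array([
    [True,  False, False],
    [False, True,  False],
    [False, False, True ],
    [True,  False, False],
])

# nonzero() returns the row and column indices of all True entries,
# scanned in row-major order
rs, cs = arr.nonzero()
print(rs)  # [0 1 2 3]
print(cs)  # [0 1 2 0]
```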

 

Figure 5 below shows an approximate depiction of the full workflow.

Figure 5: Approximate Workflow of Method II, Problem #1

 

By leveraging the concepts of advanced indexing and broadcasting, we can remove the need for explicit python loops. This enables us to harness the power of vectorization.
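As an aside, the explicit nonzero() step is not strictly required: NumPy also supports indexing with the boolean mask directly, which is equivalent for this problem. A minimal sketch on a hypothetical 2×2 image:

```python
import numpy as np

# A hypothetical 2x2 image with a single red pixel at (0, 0)
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = [255, 0, 0]

output = img.copy()
valid = np.all(img == [255, 0, 0], axis=-1)

# Boolean-mask indexing: assigns to every pixel where valid is True
output[valid] = [255, 255, 255]
print(output[0, 0])  # [255 255 255]
print(output[1, 1])  # [0 0 0]
```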

But was this worth the additional effort? Let’s find out.

 

Performance Comparison

Let’s apply Methods I and II to the image with width 2000px (and height 1000px) to solve Problem #1. We’ll use the timeit module from Python to measure the time taken to solve the problem.
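A minimal sketch of how such a timing can be set up with timeit (the image here is a small random placeholder, not the actual 2000px input):

```python
import timeit
import numpy as np

# A small placeholder image instead of the real 2000px input
img = np.random.randint(0, 256, size=(100, 200, 3), dtype=np.uint8)

def method_ii(img):
    output = img.copy()
    valid = np.all(img == [255, 0, 0], axis=-1)
    rs, cs = valid.nonzero()
    output[rs, cs, :] = [255, 255, 255]
    return output

# Average over several runs to reduce measurement noise
t = timeit.timeit(lambda: method_ii(img), number=10) / 10
print(f"Method II: {t:.6f} s per run")
```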

Figure 6: Time Taken to solve Problem #1

 

From the image above we can see how much more efficient Method II is compared to Method I.

The time taken to solve the problem using both methods in all the five scales is presented in the bar plots below. The plot on the right is a zoomed in version of the plot on the left. The zoomed in version was included to highlight the performance difference.

The zoomed in version also includes the speedup obtained using Method II (For Method II, speedup is the time taken by Method I divided by time taken by Method II).

Figure 7: Comparison of time taken to solve Problem #1 by Method I and Method II at multiple image scales.

 

Solution — Problem #2: Conditional Replacement

Let’s analyze the input image from Figure 2 in more detail. Figure 8 below has some helpful annotations added to the input image to help us better understand the problem.

The axis, indices, height and width parameters of the image are the same as Problem #1. The only difference is that we have to perform replacement only inside the dotted boxes. The dotted boxes are represented by the top-left and bottom-right coordinates (marked by stars in the figure below). Note that the boxes and stars are imaginary (i.e. they don’t exist on the input image) and are only displayed for visualization purposes.

Figure 8: Annotated Input Image for Problem #2

 

We are given the top-left and bottom-right coordinates of all the boxes in a NumPy array called boxes. The coordinates have been cast to integers to simplify our problem. In the code-block, w refers to 1/10th of the width of the image and h refers to 1/10th of the height of the image.

 

height = width // 2
h, w = height / 10, width / 10

boxes = np.array([
    [[h, w], [9*h, 2*w]],
    [[6.5*h, 2.5*w], [9*h, 8*w]],
    [[h, 2.5*w], [5.5*h, 6*w]],
    [[h, 6.5*w], [5.5*h, 9*w]],
    [[6.5*h, 8.5*w], [9*h, 9*w]]
]).astype(np.int32)

 

Since w and h both depend on the image dimensions, the sizes of our boxes are proportional to the size of the image. This makes it easier for us to analyze the performance at different scales. Now let’s look at a couple of solutions.

 

Method I: Simple Python For Loops

The simple Python loop based solution is very straightforward. Essentially, it has the same structure as Method I in Problem #1, but some minor changes were made so we only traverse the pixels within the imaginary boxes.
The code-block below displays the solution. Since it is similar to Method I of Problem #1, we will not analyze this further.

 

output = img.copy()
H, W = img.shape[:2]

## Looping over all boxes
for box in boxes:

    ## Getting the box coordinates
    top_h, top_w = box[0]
    bot_h, bot_w = box[1]

    ## Similar to Method I of Problem #1
    for r in range(top_h, bot_h + 1):
        for c in range(top_w, bot_w + 1):
            if np.all(img[r, c, :] == [255, 0, 0]):
                output[r, c, :] = [255, 255, 255]

 

Method II: NumPy Image Processing Functions (Type 1)

The NumPy based solution can be approached in a variety of ways. One simple solution is to extract the portion of the image within each imaginary box into sub-images and repeat the same solution that we used for Method II of Problem #1. You can see that in the code-block below:

 

output = img.copy()

## Looping over all boxes
for box in boxes:

    ## Getting the box coordinates
    top_h, top_w = box[0]
    bot_h, bot_w = box[1]

    ## Extracting sub-images
    sub_img_inp = img[top_h: bot_h + 1, top_w: bot_w + 1]
    sub_img_out = output[top_h: bot_h + 1, top_w: bot_w + 1]

    ## Similar to Method II of Problem #1
    valid = np.all(sub_img_inp == [255, 0, 0], axis = -1)
    rs, cs = valid.nonzero()
    sub_img_out[rs, cs, :] = [255, 255, 255]

 

In the above code-block, we are looping over the boxes array. The variable box represents one box. We can directly use the coordinate information available in box to slice the NumPy arrays img and output to get sub_img_inp and sub_img_out respectively.

Both the newly created NumPy arrays (sub_img_inp and sub_img_out) are views of the original NumPy arrays (img and output respectively). This means that if we modify values in the new arrays, the change will be reflected in the respective old arrays.
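The view behavior described above can be verified with a small self-contained sketch:

```python
import numpy as np

output = np.zeros((4, 4, 3), dtype=np.uint8)

# Basic slicing returns a view, not a copy
sub = output[1:3, 1:3]
sub[:] = [255, 255, 255]

# The modification through the view is reflected in the original array
print(output[1, 1])  # [255 255 255]
print(output[0, 0])  # [0 0 0]
```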

This trick makes our problem much simpler. Now we can just use Method II of Problem #1 on sub_img_inp and sub_img_out for each box. The diagram below provides an approximate visual representation of the workflow.

Figure 9: Approximate Workflow of Method II, Problem #2

 

Please note that we did use a Python loop to iterate over boxes. However, it will not have a significant effect on performance because there are only five boxes and the code executed at each iteration is fast.

Similar to Method II of Problem #1, we can expect this method to be much faster than using Python loops to traverse every pixel.

 

Method III: NumPy Image Processing Functions (Type 2)

Let’s look at one more approach that can be used to solve this problem. It’s a little more complex than the previous method and also a little less efficient, but I want to include it to demonstrate a few tricks that could come in handy for other problems, which I’ll give examples of further down.

The following code-block demonstrates the method. I’ll explain it in greater detail below.

 

output = img.copy()

## Getting valid coordinates of color [255,0,0]
valid = np.all(img == [255, 0, 0], axis = -1)
rs, cs = valid.nonzero()

## Initializing rcs mask.
all_valid_rcs = np.full(rs.shape, False)

## Looping over all boxes
for box in boxes:

    ## Getting the box coordinates
    top_h, top_w = box[0]
    bot_h, bot_w = box[1]

    ## Finding valid coordinates that are also
    ## inside the current box.
    cur_valid_rs = ((rs >= top_h) & (rs <= bot_h))
    cur_valid_cs = ((cs >= top_w) & (cs <= bot_w))
    cur_valid_rcs = cur_valid_rs & cur_valid_cs

    ## Updating the rcs mask with valid coordinates
    ## that are inside the current box.
    all_valid_rcs |= cur_valid_rcs

## Selecting indices that belong to any box
rs = rs[all_valid_rcs]
cs = cs[all_valid_rcs]

output[rs, cs, :] = [255, 255, 255]

 

Here’s an explanation as to how it works:

  • Similar to Method II of Problem #1, we create the mask array valid and the indices rs and cs for the whole image.
  • Now, we want to select those indices from rs and cs that lie within any of the five boxes.
  • To do this, we create an array named all_valid_rcs which has the same shape as rs (Note: rs and cs have the same shape) but is filled with the value False.
  • Now, if any coordinate [r, c] lies within a box, then its corresponding index in all_valid_rcs is set to True.
  • For instance, if rs[i] and cs[i] are both inside a box, then all_valid_rcs[i] will be set to True. The code within the for-loop of the above code-block performs the vectorized version of this logic.
  • Once this process has been completed for all elements in rs and cs (and for all the boxes), we can use all_valid_rcs to select those indices from rs and cs which lie inside any box. The below diagram gives an approximate representation of this process.
Figure 10: Visual representation of selecting valid coordinates which lie inside a box.
  • Once we are done selecting the rs and cs values that lie within any box, we can use them to change the color to [255, 255, 255]. The below diagram gives an approximate representation of the full method.
Figure 11: Approximate Workflow of Method III, Problem #2
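The vectorized in-box test at the heart of this method can be sketched on a few made-up coordinates (the box bounds here are hypothetical):

```python
import numpy as np

# Hypothetical candidate coordinates, as produced by valid.nonzero()
rs = np.array([0, 2, 5, 7])
cs = np.array([1, 3, 4, 9])

# One box: rows 1..6 and columns 2..5 (inclusive at both ends)
top_h, bot_h = 1, 6
top_w, bot_w = 2, 5

# Element-wise test: True where (rs[i], cs[i]) lies inside the box
inside = (rs >= top_h) & (rs <= bot_h) & (cs >= top_w) & (cs <= bot_w)
print(inside)      # [False  True  True False]
print(rs[inside])  # [2 5]
print(cs[inside])  # [3 4]
```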

 

This method shows that, with a few NumPy image processing tricks, we can select the correct indices and perform the replacement just once rather than performing it separately for each box (as in Method II).

While this workflow may not offer any benefit for our particular problem, it may be useful in other practical settings. For instance, instead of a simple “replacement” you might be dealing with some costly operation that takes a lot of time to execute; in such a case this method might be more useful.

In the case of overlapping boxes, techniques from this method can also be used to ensure that coordinates belonging to multiple boxes are not operated on multiple times. This may not be a concern for all problems, but it is good to know that you can handle such a problem if it arises.

 

Performance Comparison

As before, let’s apply Methods I, II and III to the image with width 2000px (and height 1000px) to solve Problem #2 and measure the time.

Figure 12: Time Taken to solve Problem #2

 

From the image above we see that Method I is not as efficient as Methods II and III. We can also see that Method III is slightly slower than Method II.

The time taken to solve the problem using all three methods in all the five scales is presented in the bar plots below. The plot on the right is once again a zoomed-in version of the plot on the left. The zoomed-in version is there to highlight the performance difference.

The zoomed-in version also includes the speedup obtained by using Method II and Method III. For Method II, the speedup is the time taken by Method I divided by time taken by Method II. Similarly, for Method III, the speedup is the time taken by Method I divided by time taken by Method III.

Figure 13: Comparison of time taken to solve Problem #2 by Methods I, II and III at multiple image scales.

 

Conclusion

With the help of a few toy problems, we explored some NumPy image processing tricks and tips that can greatly improve the efficiency of your code. By timing each solution, we were also able to verify those efficiency gains.

As mentioned earlier, though we focused only on toy problems, the tricks demonstrated here can easily be used in practical applications as well. For instance, dealing with bounding boxes and pixels of a certain color are scenarios you typically face in object detection and semantic segmentation problems.

The solutions presented here were written so the code would be relatively easy to read and follow. However, there may be more elegant or optimal solutions outside of those presented above; if you’re interested in trying out your own solution to these problems, please feel free to check out my GitHub repository and colab notebook.


The Author
Bharath Raj

Bharath is a curious learner who explores the depths of Computer Vision and Machine Learning. He often writes code to build hobby projects and jots down his findings and musings as blogs. You can check out more of his work on Medium and GitHub @thatbrguy.
