Anyone who has dabbled in computer vision or image processing in Python is familiar with OpenCV, NumPy, or other libraries for image manipulation. Each library has its own features and trade-offs, but most importantly, libraries may differ in how they handle, manipulate, and process images.
One important concept to be aware of when using such libraries is the coordinate system in which the points in an image are defined. This knowledge comes in handy when creating and using algorithms for image manipulation. For instance, it can help you build and better understand components of some feature extraction and data augmentation techniques.
These concepts are also useful if you have to use multiple libraries and each has a different coordinate system for manipulating images. For example, imagine a scenario where you have to parse the bounding box coordinates output by a deep learning model. The model may output the coordinates with respect to a different coordinate system than the one your target library accepts. In this case, you need to manually transform the bounding box coordinates to your target coordinate system appropriately.
In this blog we explore how images can be represented using coordinate systems in OpenCV and NumPy. We will also explore how to move points from one of the above image coordinate systems into another.
This section provides a quick refresher about the various terminology involved, and how images can be represented using coordinate systems.
Coordinate systems can be used to define the location of a point with respect to some reference. In the 2D Cartesian Coordinate System, each point is represented by an ordered set of 2 coordinates: (x, y). Here, x and y are signed distances to two perpendicular axes (commonly labelled the X-Axis and the Y-Axis). It is called an ordered set because the order matters (i.e. (y, x) refers to a different point). The point where the axes intersect is called the origin and it has the coordinates (0, 0).
Similarly, the 3D Cartesian Coordinate System has three mutually perpendicular axes (X-Axis, Y-Axis and the Z-Axis). Here, each point is represented by an ordered set of 3 coordinates: (x, y, z). For more information, you can refer to this resource.
One common way of representing images is by using multi-dimensional arrays. In this representation, the smallest individual element of an image is a pixel. Here, the image has two spatial dimensions where one axis is along the width of the image and another axis is along the height of the image. Clearly, both axes are perpendicular to each other.
We can also view the above two mutually perpendicular axes as the axes of a 2D Cartesian Coordinate System. Here, each pixel would correspond to a point represented in 2D Cartesian Coordinates. We refer to the above coordinate system as the image coordinate system. Different libraries may define certain components of their image coordinate system (such as the location of the origin, order of coordinates, orientation of axes etc.) differently.
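As a minimal sketch of this representation, the snippet below builds a blank color image with NumPy. It assumes a height-by-width-by-channels array layout; the 25 x 50 size and the variable names are illustrative choices, not something prescribed by either library.

```python
import numpy as np

# A blank color image: 25 pixels tall, 50 pixels wide, 3 channels.
# The array shape is ordered (height, width, channels).
img = np.zeros((25, 50, 3), dtype=np.uint8)

height, width = img.shape[:2]
print(height, width)  # 25 50

# Set the pixel in row 0, column 0 to white.
img[0, 0] = [255, 255, 255]
```

Note that the height comes first in the array shape, which is the first hint that array indexing and (x, y) coordinates use different orders.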
In this blog we’ll focus on the image coordinate systems of OpenCV and NumPy. We will also discuss how to transform NumPy image coordinates to OpenCV image coordinates.
Image Coordinate Systems of OpenCV and NumPy
The origin of the OpenCV image coordinate system is at the top-left corner of the image. The X-Axis is along the width of the image, and the Y-Axis is along the height of the image. Each pixel in the image can be represented by a spatial coordinate (x, y), where x stands for a value along the X-Axis and y stands for a value along the Y-Axis.
The origin of the NumPy image coordinate system is also at the top-left corner of the image. The C-Axis is along the width of the image, and the R-Axis is along the height of the image. Each pixel in the image can be represented by a spatial coordinate (c, r), where c stands for a value along the C-Axis and r stands for a value along the R-Axis.
From figures 2 and 3, it is evident that the X-Axis and Y-Axis of the OpenCV image coordinate system are aligned with the C-Axis and R-Axis of the NumPy image coordinate system respectively. The origins of both coordinate systems also point to the same pixel.
So far so good right? However, there is a subtle concept we must understand if we need to move between the above two coordinate systems.
Let us consider an arbitrary point P in the OpenCV coordinate system whose coordinates are (x, y). The same point in the NumPy image coordinate system is given by the coordinates (c, r). Clearly, when (x, y) == (c, r) (or alternatively, (x == c) && (y == r)), both coordinate systems refer to the same point.
Now let’s say we need to manipulate the image at point P and that the image is stored in the NumPy array img. OpenCV's image manipulation functions typically expect coordinates in the order (x, y). However, to access the element at point P using NumPy, we need to write img[r, c].
Notice how the dimension order for indexing the NumPy array img is r, c and not c, r. Hence, we need to be careful about the order in which the coordinate information is used when we have to work with both libraries.
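The following sketch illustrates the ordering issue with NumPy alone. The image size and the point (40, 10) are made-up values chosen so that swapping the order visibly fails.

```python
import numpy as np

# Blank image: 25 rows (height) by 50 columns (width).
img = np.zeros((25, 50, 3), dtype=np.uint8)

# Point P expressed in OpenCV order: x along the width, y along the height.
x, y = 40, 10

# The same point in NumPy terms: c == x (column), r == y (row).
c, r = x, y

# NumPy indexing takes the row first, then the column.
img[r, c] = [255, 255, 255]

# Swapping the order addresses a different pixel, or fails outright:
# img[x, y] raises IndexError here, because a 25-row image has no row 40.
```

In a square image the swapped index would not raise an error at all; it would silently touch the wrong pixel, which is why this mistake can be hard to spot.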
The above point will become clearer if we explore OpenCV and NumPy through an example. The next section presents a toy problem where we need to use both libraries and move coordinates from one coordinate system to the other.
Moving Coordinates between OpenCV & NumPy: an Example
The toy problem is defined as follows:
Problem Statement: Given the input image shown in the figure below, draw one red square around each white pixel. Each square should have a side length of 6 pixels, and its corresponding white pixel must lie at its centroid.
The dimensions of the input and output images in the above figure are rather small (width = 50 pixels, height = 25 pixels). For ease of visual analysis and explanation, they have been enlarged and displayed. Each “tiny white square” that you see in the input and output images is in fact an individual white pixel.
The following section walks through the solution for the above problem.
Clearly, there are five white pixels, and so we need to draw five boxes. Assuming that the input image is stored in the NumPy array img, we can solve this problem by using the following steps:
- Step 1: Find the coordinates of the white pixels using NumPy. This will give us the centroid of each square.
- Step 2: Calculate the top-left and bottom-right coordinates of each square using the centroid and the side length.
- Step 3: Using the top-left and bottom-right coordinates, draw each square using OpenCV.
Let us begin with step 1. We can get the NumPy coordinates of the white pixels using the below code snippet. For a more detailed explanation of its working, you can refer to my article on image processing with NumPy.
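import numpy as np
mask = np.all(img == [255, 255, 255], axis=-1)  # True where all three channels are 255
rows, cols = mask.nonzero()  # row and column indices of the True entries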
Here, the variables rows and cols contain the coordinate information of all the white pixels. More specifically, rows[i] and cols[i] provide the row and column of the i-th white pixel (where i is one of 0, 1, 2, 3, or 4). Since there are 5 white pixels, there are 5 values stored in each of rows and cols. With this, step 1 is complete.
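To see what rows and cols look like in practice, here is a small sketch on a hypothetical 4 x 6 image with two white pixels (the image and pixel positions are invented for illustration).

```python
import numpy as np

# Hypothetical 4 x 6 black image with two white pixels.
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[1, 4] = [255, 255, 255]  # row 1, column 4
img[3, 2] = [255, 255, 255]  # row 3, column 2

# True where all three channels equal 255.
mask = np.all(img == [255, 255, 255], axis=-1)
rows, cols = mask.nonzero()

print(rows)  # [1 3]
print(cols)  # [4 2]
```

The i-th entries of the two arrays belong together: the first white pixel is at row 1, column 4, and the second is at row 3, column 2.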
Now, let us consider steps 2 and 3 together. We can use OpenCV’s cv2.rectangle to draw a square provided that we know its top-left and bottom-right coordinates. We can calculate the top-left and bottom-right coordinates of each square using its centroid and its side length.
Assuming the centroid is (x, y) and the side length is A, the top-left coordinate is given by (x - A/2, y - A/2) and the bottom-right coordinate is given by (x + A/2, y + A/2). The figure below illustrates this.
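The corner formulas can be sketched as a small function. Note that square_corners is a hypothetical helper written for this illustration; it is not part of OpenCV or NumPy, and the centroid (10, 12) is an arbitrary example value.

```python
def square_corners(x, y, side_length):
    """Return (top_left, bottom_right) for a square centered on (x, y).

    Hypothetical helper for illustration; coordinates are in (x, y) order.
    """
    half = side_length / 2
    top_left = (int(x - half), int(y - half))
    bottom_right = (int(x + half), int(y + half))
    return top_left, bottom_right


print(square_corners(10, 12, 6))  # ((7, 9), (13, 15))
```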
While the above calculations are rather simple, do note that the centroid information is computed using NumPy, and hence is available in NumPy coordinates. However, the function cv2.rectangle expects the coordinates of the top-left and bottom-right points in OpenCV coordinates.
Here is where the concepts that we learned in the previous section come into play. Let us assume that a point in OpenCV coordinates is denoted by (x, y) and a point in NumPy coordinates is denoted by (c, r).
From our earlier discussion, we know that if (x, y) == (c, r), then they both refer to the same point. In other words, all our required x coordinates are stored in the variable cols, and all our required y coordinates are stored in the variable rows. Now that we know how the x and y coordinates are stored, we can easily write code to compute the top-left and bottom-right coordinates.
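As a rough sketch of this mapping, the snippet below pairs up made-up rows and cols arrays into (x, y) points. The sample values and the name points_xy are assumptions for illustration only.

```python
import numpy as np

# Indices in the form NumPy returns them: one array of rows, one of columns.
rows = np.array([1, 3])
cols = np.array([4, 2])

# OpenCV order is (x, y), which corresponds to (column, row).
# int() turns the NumPy integers into plain Python integers.
points_xy = [(int(c), int(r)) for r, c in zip(rows, cols)]
print(points_xy)  # [(4, 1), (2, 3)]
```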
The below code snippet shows how we can perform the above two steps based on our discussion (here, the variable side_length is equal to 6 as per the problem statement):
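import cv2

side_length = 6  # as per the problem statement
for x, y in zip(cols, rows):  # x comes from cols, y comes from rows
    top_left = (int(x - side_length / 2), int(y - side_length / 2))
    bot_right = (int(x + side_length / 2), int(y + side_length / 2))
    # (0, 0, 255) is red in OpenCV's BGR channel order
    img = cv2.rectangle(
        img, top_left, bot_right,
        color=(0, 0, 255), thickness=1
    )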
With that, the toy problem is solved. Though this was only a toy problem, the same approach extends to more practical problems. For instance, let’s say you have an algorithm that detects the midpoint and dimensions of an object of interest (e.g. an apple) in OpenCV image coordinates. If you want to crop out a rectangular region containing the apple using NumPy indexing, you will need to know the corner coordinates in the NumPy coordinate system.
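A minimal sketch of such a crop is shown below. The detection values (center, width, and height) are invented for illustration, and the sketch assumes the box lies fully inside the image, so it does no clamping to the image borders.

```python
import numpy as np

img = np.zeros((25, 50, 3), dtype=np.uint8)

# Hypothetical detection in OpenCV order: center (cx, cy), box width w, height h.
cx, cy, w, h = 30, 12, 10, 6

# Corner coordinates, still expressed in (x, y) terms.
x0, y0 = cx - w // 2, cy - h // 2
x1, y1 = cx + w // 2, cy + h // 2

# NumPy slicing takes rows (y) first, then columns (x).
crop = img[y0:y1, x0:x1]
print(crop.shape)  # (6, 10, 3)
```

The resulting shape is (height, width, channels), so a box that is 10 pixels wide and 6 pixels tall yields a 6 x 10 crop.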
If you would like to interact with the code, you can refer to this colab notebook.
In this blog we explored how images can be represented using the coordinate systems of OpenCV and NumPy. We also explored how to move points from one image coordinate system to another.
As mentioned earlier in the text, these concepts are handy for understanding and implementing image manipulation applications. For instance, if you would like to implement custom image-based data augmentation algorithms from scratch (such as rotating an object around a specific point), knowledge of coordinate systems is important. Moreover, to implement or understand some feature extraction techniques (for instance, the Hough transform), knowledge of coordinate systems is a necessity.
Furthermore, even if you want to use a different library for manipulating images, getting familiar with these concepts will help you think about the nuances of any image coordinate system.
Be sure to check the related resources below for more of Bharath’s technical articles, and sign up to the Lionbridge AI newsletter for interviews and articles delivered directly to your inbox.