When most of us think about linear algebra, the first idea that comes to mind is a system of linear equations to be solved, or its graphical variant: two lines on a Cartesian plane whose intersection point we need to find. What most people probably don't know is that tools like Photoshop and libraries like OpenGL or DirectX make extensive use of linear algebra for most of their functionality.
In this series of posts we will discuss some basic concepts of the application of linear algebra in image processing. In the first post we will focus on its applications and on how an image can be represented as a matrix, in the second and third we will explore matrix operations on an image, and in the last one we will show some practical examples in JavaScript.
Image processing can be defined as the processing of images using mathematical operations. With the introduction of computers, the processing is performed by applying computer graphics algorithms to digital images, which are obtained by digitization or captured directly with a digital device. Using a computer to process digital images is called digital image processing.
Digital image processing is not limited to retouching or resizing pictures taken with a camera; it is widely used today. Some of the major fields are: medicine, remote sensing, data transmission and encoding, robotics, computer vision, pattern recognition, the film industry, microscope imaging, and image sharpening and restoration.
Some of the computer graphics operations that can easily be done with linear algebra are: rotation, skewing, scaling, Bézier curves, reflections, dot and cross products, projections, and vector fields. More complex operations, such as filters, require combining linear algebra with other mathematical tools.
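As a small illustration of why these operations fit linear algebra so well, here is a minimal Python sketch of rotation and scaling written as matrix-vector products. The helper names (rotation_matrix, scaling_matrix, apply) are made up for this example and do not come from any graphics library.

```python
import math


def rotation_matrix(theta):
    """2x2 matrix that rotates a vector counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def scaling_matrix(sx, sy):
    """2x2 matrix that scales the x axis by sx and the y axis by sy."""
    return [[sx, 0], [0, sy]]


def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Rotating the point (1, 0) by 90 degrees moves it (up to rounding) to (0, 1).
rotated = apply(rotation_matrix(math.pi / 2), [1.0, 0.0])

# Scaling the point (1, 1) by 2 horizontally and 3 vertically.
scaled = apply(scaling_matrix(2, 3), [1, 1])
```

Both operations share the same apply step; only the matrix changes, which is what makes them easy to combine.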
Let's consider the following image and its black & white variant.
If we zoom in on the black & white image, we get:
Notice that the image can be represented as a grid of 16 x 16 small pieces, called pixels (a pixel is the smallest graphical element of an image and can take only one color at a time). If we assign a number to each color, the grid of pixels can be represented as a numerical matrix.
If, in the previous image, we assign 1 to white and 0 to black, the image can be represented as a 16 x 16 matrix whose elements are the numbers 0 and 1.
┌ ┐ │ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 │ │ 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 │ │ 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 │ │ 1 1 1 0 1 0 0 0 0 0 0 0 0 1 1 1 │ │ 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 │ │ 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 │ │ 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 │ │ 1 0 1 0 1 1 0 0 0 1 1 1 0 0 0 1 │ │ 1 0 1 0 1 1 1 0 1 1 1 0 0 0 0 1 │ │ 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 1 │ │ 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 │ │ 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 │ │ 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 │ │ 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 │ │ 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 │ │ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 │ └ ┘
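To make the idea concrete, the matrix above can be written as a nested list. The following is a minimal Python sketch; the names ROWS, image and render are chosen for this example only, and drawing black pixels as '#' is just one convenient way to inspect the result in a terminal.

```python
# Each string is one row of the 16 x 16 matrix: 1 = white, 0 = black.
ROWS = [
    "1111111111111111",
    "1111100000011111",
    "1111011111001111",
    "1110100000000111",
    "1101000000000011",
    "1010000000010001",
    "1010000000110001",
    "1010110001110001",
    "1010111011100001",
    "1010011111000001",
    "1000001110000001",
    "1100000000000011",
    "1110000000000111",
    "1111000000001111",
    "1111100000011111",
    "1111111111111111",
]

# Convert the strings into a matrix (list of rows) of integers.
image = [[int(ch) for ch in row] for row in ROWS]


def render(matrix):
    """Return a text drawing: '#' for black (0) and '.' for white (1)."""
    return "\n".join(
        "".join("#" if value == 0 else "." for value in row) for row in matrix
    )


height = len(image)
width = len(image[0])
print(f"{height} x {width} image")
print(render(image))
```

An element image[i][j] is the pixel in row i and column j, counting from the top-left corner and starting at 0.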
Using the same procedure, we can also represent grayscale images as matrices, but in this case there are more than two possible values. Most digital file formats use numbers between 0 (black) and 255 (white) to represent the intensity.
The matrix representation of color images depends on the color system used by the program that is processing the image. For didactic purposes we will use the RGB system (the most popular one), where each pixel specifies the amount of red (R), green (G) and blue (B), and each component can vary from 0 to 255. Thus, in RGB, a pixel can be represented as a three-dimensional vector (r, g, b), where r, g and b are integers from 0 to 255.
Many programs store this three-dimensional vector as a single integer, using the following mapping function:
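As a minimal Python sketch of this idea, a grayscale image is again a matrix, only with a wider range of values. The helper names below are hypothetical, and mapping 1 to 255 and 0 to 0 is simply the natural way to carry the black & white encoding over to the 0 to 255 scale.

```python
def binary_to_grayscale(matrix):
    """Map the 0/1 encoding (black/white) to the 0..255 intensity scale."""
    return [[255 if value == 1 else 0 for value in row] for row in matrix]


def is_valid_grayscale(matrix):
    """Check that every element is an integer intensity between 0 and 255."""
    return all(
        isinstance(value, int) and 0 <= value <= 255
        for row in matrix
        for value in row
    )


# A tiny black & white checkerboard and its grayscale equivalent.
binary = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
gray = binary_to_grayscale(binary)

# A grayscale matrix can also hold intermediate intensities.
gradient = [
    [0, 64, 128, 255],
    [0, 64, 128, 255],
]
```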
v = f(r, g, b) = r*65536 + g*256 + b
Notice that 65536 = 256².
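The mapping function can be sketched in Python as follows. The name pack_rgb and the range check are additions for this example, and the sample color (131, 197, 29) is an arbitrary choice.

```python
def pack_rgb(r, g, b):
    """Store an (r, g, b) vector as one integer: r*65536 + g*256 + b."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("each component must be between 0 and 255")
    return r * 65536 + g * 256 + b


v = pack_rgb(131, 197, 29)
print(v)

# Because 65536 = 256 * 256, the same value can be built with bit shifts.
same_value = (131 << 16) | (197 << 8) | 29
```

Since every component is smaller than 256, the three values never overlap inside the integer, which is what makes the mapping reversible.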
The inverse procedure (recovering the numerical value of each color component from the integer value) can be done using the following formulas:
r = v / 65536
g = (v % 65536) / 256
b = v % 256
where % is the operator that gives the remainder of the integer division and / denotes integer division.
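These formulas translate almost literally into Python, where integer division is written // (a plain / would give a floating-point result). The function name unpack_rgb and the sample value are illustrative only.

```python
def unpack_rgb(v):
    """Recover (r, g, b) from an integer built as r*65536 + g*256 + b."""
    r = v // 65536            # integer division
    g = (v % 65536) // 256    # remainder first, then integer division
    b = v % 256               # remainder of the division by 256
    return r, g, b


# 8635677 = 131*65536 + 197*256 + 29
r, g, b = unpack_rgb(8635677)
print(r, g, b)
```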
Other programs store the vectors as hexadecimal values, concatenating the three components in two-digit hexadecimal notation. This is how colors are written in web pages. For example, the color (131, 197, 29) is represented as 83C51D in this notation.
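A minimal Python sketch of the hexadecimal notation, using the value 83C51D mentioned above; the function names rgb_to_hex and hex_to_rgb are hypothetical.

```python
def rgb_to_hex(r, g, b):
    """Concatenate the three components as two-digit hexadecimal numbers."""
    return f"{r:02X}{g:02X}{b:02X}"


def hex_to_rgb(text):
    """Split a six-digit hexadecimal string back into (r, g, b)."""
    v = int(text, 16)
    return v // 65536, (v % 65536) // 256, v % 256


components = hex_to_rgb("83C51D")
print(components)
print(rgb_to_hex(*components))
```

The hexadecimal string is just the single-integer representation written in base 16, so both encodings carry the same information.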
In the next post we will discuss some matrix operations on the matrix representation of an image, and how they affect the original image.