One common problem in the game industry is how to handle image resolutions across multiple types of screen. Here we will discuss the kinds of scaling that exist, common scaling techniques, some details on how they work, and finally the solution I chose to handle scaling on Kidoteca mobile games.
The first problems in this area came from playing older games on newer screens, which requires scaling upwards. The most obvious solution is to simply take the nearest pixel in some arbitrary direction and replicate it, a technique known as nearest-neighbour scaling.
As can be seen in the previous image (scaled 2x), this has an obvious problem: the result is just a blocky image, not necessarily clearer, and certainly not prettier. The technique also has a less obvious problem: it only works for integer scaling, so you can only make the image two, three, four times bigger, and so on. In some cases this is enough; old console games and early computer games tended to use resolutions close to 320×240, which you can double to 640×480 and still fit a 480p TV screen or a 720-pixel-tall laptop screen. But sometimes you have a game that uses 55% of the screen size, and if you double it, it becomes bigger than the screen.
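The pixel-replication idea can be sketched in a few lines of Python. This is a minimal sketch (the function name is mine) that treats an image as a list of rows of pixel values, and it illustrates the limitation above: the factor must be a whole number.

```python
def nearest_neighbor_scale(image, factor):
    """Scale an image (a list of rows of pixel values) by an integer
    factor, replicating each source pixel into a factor x factor block."""
    out = []
    for row in image:
        new_row = []
        for pixel in row:
            new_row.extend([pixel] * factor)   # replicate horizontally
        for _ in range(factor):                # replicate the row vertically
            out.append(list(new_row))
    return out
```

Each source pixel simply becomes a solid block of copies, which is exactly where the blocky look comes from.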
The easiest way to scale by a non-integer ratio is to use a technique similar to the last one, but combined with filtering.
Filters and Interpolation
Filters, as some people like to call these scaling techniques, are algorithms and mathematical formulas that calculate the value of a given pixel based on the information we can capture from its neighbours. This is easier to understand once you realize that pixels are NOT squares: pixels are just points with a colour. An image on a computer is just a matrix of points with no size, and the fact that modern screens display them as squares or rectangles does not matter when doing the necessary maths (by the way, some TV sets used to display pixels as triangles composed of circles, one for each colour…). The simplest filters are just mathematical formulas with no algorithm around them; those are actually interpolation functions, taking some values and interpolating between them.
Bilinear interpolation is the most common of the simple techniques to upscale an image by a non-integer value; it can also be used to downscale an image. In theory it is simple: it takes the values of the nearest four pixels and applies linear interpolation in 2D. Linear interpolation is what you use when you want to find a value at a certain position between two known values.
An example of linear interpolation: when you have a graph full of points and draw straight lines between each point and the next, the result is the same as if you had used linear interpolation repeatedly to find all the points in between.
But we can use an even simpler example. Suppose you are planning a car trip and want to know the temperature at your destination, but all you manage to find out is that at marker 30 of the road the temperature is -20 degrees Celsius, and at marker 2000 it is 30 degrees Celsius. Your destination is at marker 800, and you have no other temperature information.
The solution is to use linear interpolation. First you figure out at what fraction of the road your destination lies: divide the distance from the first marker to your destination by the length of road between the markers. The formula is “(target – start) / (end – start)”, which for us is (800 – 30) / (2000 – 30), that is 770 / 1970, so our target is about 39% of the way along. Then you take the temperature range between the markers and compute 39% of it: “(end – start) * previous result”, which is (30 – (-20)) * 0.39, that is 19.5. Finally, add that to the starting temperature: -20 + 19.5 = -0.5, so the destination should be around -0.5 degrees Celsius.
You can combine the steps into a single linear interpolation formula: y = y0 + (y1 – y0) * ((x – x0) / (x1 – x0)).
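The combined formula translates directly into code; here is a minimal sketch in Python (the function name is mine), applied to the road-trip example. Note that with exact fractions the answer is about -0.46 degrees, which the rounded 39% figure above approximates as -0.5.

```python
def lerp(x, x0, y0, x1, y1):
    """Linear interpolation: the value at x, given the
    known points (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)   # fractional position between the endpoints
    return y0 + (y1 - y0) * t

# Temperature at marker 800, given -20 C at marker 30
# and 30 C at marker 2000.
temperature = lerp(800, 30, -20, 2000, 30)   # about -0.46 C
```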
Bilinear interpolation is when you do this twice, once for each direction (vertical and horizontal in our case), and then combine the results; each pixel's weight ends up being the product of a horizontal and a vertical weight. Or in simple terms, each resulting pixel is the weighted average of the nearest four pixels of the original image.
The image above is the result of using bilinear interpolation. Notice that it now has some odd defects, looking a bit fuzzy, blurry and slightly aliased. Clearly, although bilinear interpolation is a great start for non-integer scaling, the result is still lacking, or to be frank, ugly. It has one advantage though: it delivers decent quality when you are placing textures on a 3D object, and it is very fast to calculate relative to other methods (obviously, doing no interpolation, that is, using something akin to the nearest-neighbour scaling already shown, is even faster, but the result is so ugly and jarring that it is pointless to attempt).
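The two-step interpolation just described can be sketched for a single output pixel; this is a minimal Python sketch (names are my own), where tx and ty are the fractional position of the target point inside the 2x2 cell of source pixels.

```python
def bilinear(p00, p10, p01, p11, tx, ty):
    """Bilinearly interpolate between four pixel values:
    p00 top-left, p10 top-right, p01 bottom-left, p11 bottom-right;
    tx, ty in [0, 1] locate the target inside that cell."""
    top = p00 + (p10 - p00) * tx      # interpolate along the top edge
    bottom = p01 + (p11 - p01) * tx   # interpolate along the bottom edge
    return top + (bottom - top) * ty  # interpolate between the two results
```

A full scaler just runs this for every output pixel, mapping its coordinates back into the source image to find the surrounding four pixels and the fractions tx, ty.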
Bicubic interpolation is the next step up from bilinear, and it is also used to map 2D textures onto 3D scenes. The principle is similar: you take cubic interpolations in each direction, vertical and horizontal, and combine them. It requires more data than 4 pixels (a 4×4 neighbourhood of 16), because each direction needs more than a start and an end point. If this were a graph, instead of straight lines from point to point you would get smooth curves that take more points into account; where the graph has, say, 3 slightly misaligned points, instead of two straight lines meeting at an oblique angle you would see a smooth arc.
The image above shows bicubic interpolation in action. Yes, it is smoother than bilinear, but it has a problem with halos near edges (look at the point where the black and blue backgrounds meet, which now has a light blue line, or the life bars, which now look a bit glowy). In a way this may even be desirable, increasing the perceived sharpness compared to a bilinear-interpolated image, but sometimes the effect is very blaring and ugly. It is a result of the smoothing between several points mentioned previously: sometimes the curve “overshoots” and generates a sort of wave in the image.
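The one-dimensional cubic step can be sketched with one common choice of cubic curve, the Catmull-Rom spline (an assumption on my part; bicubic implementations vary in which cubic they use). Bicubic scaling applies this along four rows and then once along the column of results; the overshoot mentioned above is visible in how the curve can leave the range of its middle two points.

```python
def cubic(p0, p1, p2, p3, t):
    """Catmull-Rom cubic interpolation between p1 and p2 (t in [0, 1]);
    p0 and p3 are the extra neighbours that shape the curve."""
    return p1 + 0.5 * t * (
        p2 - p0
        + t * (2 * p0 - 5 * p1 + 4 * p2 - p3
               + t * (3 * (p1 - p2) + p3 - p0)))
```

On evenly spaced, collinear points it reproduces straight lines exactly; on sharp steps it overshoots, which is the halo effect.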
The solution beyond interpolation (there are several other interpolation functions we have not covered; we will see them in the downscaling discussion, where they matter more) is to use filters with more complex algorithms, which instead of just taking samples and doing math on them, make decisions based on the available information.
One early filter of that kind was the Eagle scaling filter, made only to double an image's size. It works by turning each pixel into four pixels; then, for each of the four new pixels, it checks whether the three neighbouring source pixels on that corner's side (for the top-left pixel, the ones to the left, top-left and top) all have the same colour, and if they do, the new pixel takes that colour. After doing that for each of the four pixels, it moves on to the next source pixel, repeating until the whole image is doubled.
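The Eagle doubling step can be sketched in Python. This is a minimal sketch (the function name is mine) that treats the image as a list of rows and clamps coordinates at the borders, a detail real implementations may handle differently.

```python
def eagle_2x(image):
    """Double an image with the Eagle filter: each pixel becomes a 2x2
    block, and each corner of the block copies the diagonal neighbour's
    colour when the three source pixels on that side agree."""
    h, w = len(image), len(image[0])

    def px(x, y):
        # clamp coordinates so border pixels reuse their nearest neighbour
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = [[None] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            c = px(x, y)
            up_l, up, up_r = px(x-1, y-1), px(x, y-1), px(x+1, y-1)
            left, right = px(x-1, y), px(x+1, y)
            dn_l, dn, dn_r = px(x-1, y+1), px(x, y+1), px(x+1, y+1)
            out[2*y][2*x]     = up_l if up_l == up == left else c
            out[2*y][2*x+1]   = up_r if up == up_r == right else c
            out[2*y+1][2*x]   = dn_l if left == dn_l == dn else c
            out[2*y+1][2*x+1] = dn_r if right == dn_r == dn else c
    return out
```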
This algorithm was interesting because it produced images that looked very different from interpolated ones, sometimes much nicer. Several other algorithms were later built on the Eagle idea, such as 2xSaI, Super Eagle, Eagle 3x and many others.
The HQx family of filters was inspired by the Eagle filter too. They work by trying to detect lines in the original image; the algorithm is unfortunately too complex to explain in this single post, but it has an official site.
HQx (HQ2x in this case) is clearly great at making simple shapes clearer: the text is much better now, and the arrow near DUKE looks very good. The problem is that complex parts of the screen, like the foliage or the moss on the ground, look blotchy, weird and confusing. Also, though not shown in this picture, HQx tends to turn curved objects into a series of straight lines that look much less curved or round.
Seemingly inspired by the HQx filters, someone made an even better filter, xBR. It works with multiple levels of filtering: it first does something similar to HQx, then scans the image again and tries to improve it. The result is rather interesting, and it also has a very detailed explanation available.
The image above was made with an older version of xBR, because the newer versions only do 4x (too big for this blog post). You can see clearly that it detects round shapes much better than HQ2x, with the background now looking more organic and the ground a bit less messy. The problem is that letters also became too rounded, and the ammo icon became quite distorted, but overall it is still very interesting.
For newer games the problem is not upscaling but downscaling, especially for mobile games, or very high resolution games on a TV or computer. An extreme example: make a game that uses the maximum resolution of a new TV, 7680 × 4320 pixels, and then also make it playable on the laptop I am using now, 1366 × 768. This means making images for the TV, and then figuring out a way to display them downscaled to about 17% of their original size.
A more practical example is the issue I face when working on mobile games: displaying things on the newest iPad, 2048 × 1536, while also supporting old iPhones at 480 × 320. The usual solution in the industry is to ship multiple sets of images with your application and use the most appropriate one (and when none fits perfectly, scale it with some interpolation in real time). At Kidoteca we create the images at iPad size, then make versions at half and one quarter of that resolution. As a naming convention we act as if the smallest image were the original and add @somenumber to the bigger ones: an image called “test.png” will thus have versions named “test@2.png” and “test@4.png”, at double and quadruple the size respectively.
Originally we created the smaller images using whatever filtering came with the image editing software, so they were usually bilinear- or bicubic-interpolated. As we saw previously, those two interpolation functions are really simple, and although they produce acceptable images, they do not produce great ones.
The Gaussian filter is an actual filter: it can be used for interpolation, but it can also be applied to a non-scaled image, just filtering it. It works by applying a mathematical formula to a group of pixels. The formula is f(x) = a * e^(-(x – b)² / (2 * c²)) + d, where e is Euler's number. Its shape on a graph is a bell curve, and you can imagine the filter's effect on an image as a 3D bell over each pixel, with the height of the bell at each point determining how much the pixel below it affects the pixel in the centre.
As you can see in the image above, the result of a Gaussian filter is blurry. In fact, when you DO want to blur an image on purpose, the Gaussian filter, with careful tuning, is really great for that.
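The bell of weights just described can be sketched in Python; this is a minimal 1D version (function names are mine, and the a, b, d terms of the general formula are dropped since the weights are normalized anyway). It relies on the Gaussian being separable: blurring along rows and then along columns gives the full 2D blur.

```python
import math

def gaussian_kernel(radius, sigma):
    """Sample e^(-x^2 / (2 sigma^2)) at integer offsets and normalize
    so the weights sum to 1 (the 'bell' of weights)."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_row(row, kernel):
    """Convolve one row of pixels with the kernel, clamping at the edges."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), len(row) - 1)
            acc += row[j] * weight
        out.append(acc)
    return out
```

The sigma parameter controls how wide the bell is, which is exactly the “careful tuning” knob: small sigma barely blurs, large sigma smears each pixel across many neighbours.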
The quadratic filter works similarly to the Gaussian filter, but with a different equation.
The result is even blurrier than the Gaussian filter's.
The “sinc” filter is named after the mathematical “sinc” function, short for “cardinal sine”. The cardinal sine formula is “sinc(x) = sin(x) / x”, but for signal processing (like scaling images or meddling with audio) another form is used, properly called the normalized cardinal sine, though most people call it “sinc” anyway: “sinc(x) = sin(π * x) / (π * x)”. Its shape looks like a sine wave whose peaks grow taller toward the centre, with the valleys near the centre spaced slightly farther apart.
The sinc filter has two problems. First, it easily causes severe ringing artifacts in the image: as mentioned earlier its shape is a sine wave, and so it can make the entire image “wavy”, like when you throw a stone in a pond. Second, although all the previous math functions have a limited scope of pixels, ranging from 4 pixels in bilinear interpolation to an arbitrary but limited number in the Gaussian filter, the sinc filter can take the whole image as input. Unlike the Gaussian bell shape, which eventually gets so close to zero that you can safely ignore a pixel's effect, the sinc waves take a very long distance from the centre to approach zero. Because of this, the sinc filter is never used in its pure form; instead most people use a “windowed” version, where they choose a window, a region of the image around each pixel, within which to calculate the filter's effects.
The Lanczos filter is a variation of the sinc filter: it uses the same sinc function, but also uses math to decide the window of the filter (and thus the contribution of each pixel to the sinc filter).
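The Lanczos kernel is the sinc function multiplied by a second, stretched sinc that acts as the window, fading the waves to zero at a chosen distance a. Here is a minimal sketch in Python (function names are mine; a = 3 is a common choice of lobe count, an assumption on my part):

```python
import math

def sinc(x):
    """Normalized cardinal sine: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, a=3):
    """Lanczos kernel: sinc multiplied by a wider sinc 'window' that
    fades the waves to zero at distance a, taming the infinite tail."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)
```

To resample, each output pixel is a weighted sum of the source pixels within distance a, with the weights given by this kernel; the negative lobes are what preserve sharpness, and also what causes the residual ringing.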
Lanczos is actually great for downscaling; for upscaling (not shown here), not so much. Its biggest issue is that although it rings less than the plain sinc filter, it can still create ringing. The advantage is that images that are supposed to look sharp end up looking sharp, in contrast to the bilinear, bicubic or Gaussian resampling techniques, which make downscaled images blurry. Lanczos is also slow, not really suitable for anything real-time.
Solution for Mobile Software
As mentioned earlier, mobile software frequently ships with several sets of images, and the technique used at Kidoteca was to simply create the largest image needed and scale it down with interpolation. It was good enough, but since there is an incredible variety of screen shapes and sizes, further interpolation was needed on most screens, making the image quality suffer quite a bit. I downloaded a command-line image processing program named ImageMagick and fooled around downscaling images until I found the best results, and Lanczos won: for our art style and technique (starting big and downsizing), the result is really great, with the smaller images looking very sharp.
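For reference, producing the half- and quarter-size sets with ImageMagick's Lanczos filter looks roughly like this (the file names are examples following the @ naming convention described earlier, not our actual assets):

```shell
# Downscale the full-size art to 50% and 25% with the Lanczos filter.
# ImageMagick 6 syntax; on ImageMagick 7 the command is "magick"
# instead of "convert".
convert test@4.png -filter Lanczos -resize 50% test@2.png
convert test@4.png -filter Lanczos -resize 25% test.png
```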