In the following video, you can find the summary of this article.
Canvas introduction
HTML Canvas is a technique for rendering two-dimensional graphics in web browsers. It lets sites use JavaScript to generate images of 2D shapes or text. The JavaScript code below generates the following image step by step. Each line of code either configures the canvas engine or draws something.
Step 1
Initializing the canvas, and drawing an orange rectangle with the given size, at the given position.
var canvas = document.getElementById("myCanvas");
var context = canvas.getContext("2d");
context.fillStyle = "#f60";
context.fillRect(125,1,62,20);
Step 2
Changing the color to blue, and printing text in a 14px Arial font at the given position.
context.textBaseline = "top";
context.font = "14px 'Arial'";
context.textBaseline = "alphabetic";
context.fillStyle = "#069";
context.fillText("kameleo.io <canvas> research",2,15);
Step 3
Changing the color to a semi-transparent green, and printing the same text over the first one, offset by 2 pixels.
context.fillStyle = "rgba(102, 204, 0, 0.7)";
context.fillText("kameleo.io <canvas> research",4,17);
This feature can be useful for several use cases, such as generating avatar images automatically or creating special graphics for a website.
However, canvas is also used for browser fingerprinting. It was first used to detect multi-accounting, and later to flag bots. We describe this in detail below.
Canvas for fingerprinting
Due to the diverse configurations of desktop PCs and mobile devices, the same HTML canvas image can be rendered differently. This happens because it is not clearly defined how computers should render complex shapes, color gradients, shadows, text in different fonts, and emojis. Although the rendered images may look the same to the human eye, a few differences can show up when we compare them pixel by pixel.
This holds even for very small images, such as the ones containing just a random number that Akamai’s protection system uses to filter traffic coming from bots.
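The pixel-by-pixel comparison described above can be sketched as follows. This is only an illustration: the function name `countDifferingPixels` and the two tiny sample buffers are our own assumptions, not taken from any real detection system.

```javascript
// Count how many pixels differ between two RGBA buffers of equal size.
// Each pixel occupies 4 consecutive bytes: red, green, blue, alpha.
function countDifferingPixels(pixelsA, pixelsB) {
  if (pixelsA.length !== pixelsB.length) {
    throw new Error("Images must have the same dimensions");
  }
  let differing = 0;
  for (let i = 0; i < pixelsA.length; i += 4) {
    if (
      pixelsA[i] !== pixelsB[i] ||
      pixelsA[i + 1] !== pixelsB[i + 1] ||
      pixelsA[i + 2] !== pixelsB[i + 2] ||
      pixelsA[i + 3] !== pixelsB[i + 3]
    ) {
      differing++;
    }
  }
  return differing;
}

// Two made-up 2x1 images standing in for renders from two machines.
// Only the alpha value of the second pixel differs (255 vs 254).
const renderOnMachineA = new Uint8ClampedArray([255, 102, 0, 255, 0, 102, 153, 255]);
const renderOnMachineB = new Uint8ClampedArray([255, 102, 0, 255, 0, 102, 153, 254]);
console.log(countDifferingPixels(renderOnMachineA, renderOnMachineB)); // 1
```

In a browser, a buffer like this can be read from a real canvas with `context.getImageData(0, 0, canvas.width, canvas.height).data`.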
We often see incorrect information about canvas fingerprinting on the web. According to the website of a second-rate anti-detect browser, these 16x16 pixel images shouldn't be altered in any way, because they are only used to check whether Canvas Noise is turned on. This is wrong: they are used for fingerprinting. However, as you can see, they can differ between machines, so they must be altered, just not with noise. Keep reading, and you will find the solution.
The difference between the results of image rendering will be more visible if the configurations of the machines that rendered the image are very divergent, and the image is more complex. This canvas image is used by Google's technology called Picasso for browser fingerprinting purposes. The differences in results are more striking when it is rendered on different operating systems and browsers.
Based on the hash generated from the rendered image, websites can get information about your PC. It may be familiar to you from browserleaks.com.
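How a rendered image turns into a hash can be sketched like this. We assume here that the image is exported as a data URL string and hashed with 32-bit FNV-1a, which we picked only because it is short. Real fingerprinting scripts may use a different hash function, and the name `hashCanvasDataUrl` and the sample strings are hypothetical.

```javascript
// FNV-1a (32-bit) over the UTF-16 code units of a string. Data URLs are
// plain ASCII, so each code unit is one byte here.
function hashCanvasDataUrl(dataUrl) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < dataUrl.length; i++) {
    hash ^= dataUrl.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  // Convert to an unsigned value and print as 8 hex digits
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Placeholder strings standing in for canvas.toDataURL() output
const dataUrlFromDeviceA = "data:image/png;base64,AAAA";
const dataUrlFromDeviceB = "data:image/png;base64,AAAB";
console.log(hashCanvasDataUrl(dataUrlFromDeviceA));
console.log(hashCanvasDataUrl(dataUrlFromDeviceB)); // differs from the line above
```

In a browser, you would pass `canvas.toDataURL()` after drawing the test image.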
We will see how it works, but first, let us explain what Canvas Noise is.
Canvas Noise
If two rendered images differ, their hashes differ as well, which indicates that they were drawn on computers with different configurations.
Several years ago, if you wanted to do multi-accounting, you could exploit this with a method called "noise" in your virtual browser profiles. Anti-detect browsers can modify the rendered image: if they randomly change the color of a single pixel, the human eye won’t notice the difference, but we get a very different canvas image hash.
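The noise idea can be illustrated with a minimal sketch. The helper names (`hashPixelBytes`, `addSinglePixelNoise`) and the sample pixel data are our own assumptions, and FNV-1a is used only as a compact stand-in for whatever hash a website applies. This is not how any particular anti-detect browser implements noise.

```javascript
// FNV-1a (32-bit) over raw pixel bytes
function hashPixelBytes(pixels) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < pixels.length; i++) {
    hash ^= pixels[i];
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Return a copy of an RGBA buffer with one color channel of one
// randomly chosen pixel changed by exactly 1.
function addSinglePixelNoise(pixels, random = Math.random) {
  const noisy = Uint8ClampedArray.from(pixels);
  const pixelCount = noisy.length / 4;
  const pixel = Math.floor(random() * pixelCount);
  const channel = Math.floor(random() * 3); // R, G or B; alpha is untouched
  noisy[pixel * 4 + channel] ^= 1; // flip the lowest bit
  return noisy;
}

// Made-up 2x1 image
const originalPixels = new Uint8ClampedArray([255, 102, 0, 255, 0, 102, 153, 255]);
const noisyPixels = addSinglePixelNoise(originalPixels);
console.log(hashPixelBytes(originalPixels));
console.log(hashPixelBytes(noisyPixels)); // a different hash, although the image looks the same
```

Passing your own `random` function makes the result reproducible, which is handy for testing.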
Nowadays, websites are using machine learning algorithms to detect canvas fingerprint spoofing; this is why this technique isn’t as effective anymore. Let's see how these algorithms work.
Machine learning + Canvas fingerprinting
Websites using advanced browser fingerprinting systems have a database of hashes that show how a canvas image is rendered on existing, real-world devices. These images are stored together with their hashes, and when a new visitor arrives, the site generates the hash in the visitor's browser. It can then look the hash up in the database, and if it finds a match, it knows that this hash has appeared before. It also learns about the device, since the same hash can essentially only be generated on a device with the same configuration.
Once “canvas noise” is applied to an image, the hash won’t match any valid configuration. It will be 100% unique, and it will be obvious that "noise" spoofing is applied.
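The lookup described above can be sketched as a simple table check. Real systems use machine learning on top of much larger datasets; this sketch swaps that for a plain `Map` lookup. The hashes, device entries, and the name `classifyCanvasHash` are made-up placeholders.

```javascript
// Hypothetical database: canvas hash -> configuration it was observed on.
// The hashes below are made-up placeholders, not real fingerprints.
const knownCanvasHashes = new Map([
  ["3f2a9c01", { os: "Windows", browser: "Chrome", gpuVendor: "NVIDIA" }],
  ["b77d10e4", { os: "macOS", browser: "Safari", gpuVendor: "Apple" }],
]);

function classifyCanvasHash(hash, database) {
  const device = database.get(hash);
  if (device === undefined) {
    // Never seen on a real device: likely noise or another alteration
    return { known: false, device: null };
  }
  return { known: true, device };
}

// A hash seen before reveals the configuration behind it
console.log(classifyCanvasHash("b77d10e4", knownCanvasHashes));
// A noise-altered hash matches nothing and stands out
console.log(classifyCanvasHash("0badf00d", knownCanvasHashes));
```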
Anti-detect vendors suggest that you turn canvas spoofing off, but this is still not the proper way to stay under the radar. That's why Kameleo developed Intelligent Canvas Spoofing, which is a possible solution.
Our research on canvas
Our team has analyzed thousands of canvas images used for browser fingerprinting. We have collected data from over 1 million websites. The results are clear: the main parameters that affect the canvas image hash are the
- operating system,
- browser,
- and GPU vendor.
The browser version and the graphics card model may also affect it. However, what we see is that a Windows 10 PC with the same GPU will generate the same canvas image hash with any version of any Chromium-based browser from the past year.
As a result, the previously mentioned algorithms can determine the OS, browser, and GPU vendor of the visitor, based on only the generated canvas image hash. This is how browserleaks.com tells the OS and browser based on the canvas fingerprinting.
Watch our video if you would like to see the research as well.
These derived values must match the values read from the user-agent and other parts of the browser fingerprint to avoid looking suspicious.
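A consistency check of this kind might look like the sketch below. The field names, the sample values, and the function `findFingerprintMismatches` are illustrative assumptions. Real protection systems compare many more signals.

```javascript
// Compare what the browser claims (e.g. via the user-agent) with what
// a detection system derived from the canvas image hash.
function findFingerprintMismatches(claimed, derivedFromCanvas) {
  const mismatches = [];
  for (const key of ["os", "browser"]) {
    if (claimed[key] !== derivedFromCanvas[key]) {
      mismatches.push(key);
    }
  }
  return mismatches;
}

// Illustrative values: a profile claiming macOS with Safari while the
// unspoofed canvas hash points to Windows with Chrome.
const claimedByUserAgent = { os: "macOS", browser: "Safari" };
const derivedWithoutSpoofing = { os: "Windows", browser: "Chrome" };

// Both fields mismatch, so this visitor would look suspicious
console.log(findFingerprintMismatches(claimedByUserAgent, derivedWithoutSpoofing));
```

An empty result means the canvas-derived values agree with the claimed ones.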
On the left side of the following image, you can see the values coming from the user-agent. On the right side, the machine learning algorithm shows that this specific canvas image hash belongs to Windows with Chrome. This happened because we turned canvas spoofing off when we took this screenshot.
Intelligent canvas spoofing
Kameleo's Intelligent Canvas Spoofing is our solution to this problem. After analyzing thousands of canvas images, we developed a method that alters the operation of your HTML canvas, resulting in a modified but non-unique canvas image hash.
This doesn’t mean that each virtual browser profile you start will have a different canvas image hash (as is normal when we check real devices), but it will trick websites, and the derived data will match your spoofed user-agent. As you can see in the following screenshot, I ran a virtual browser profile with a macOS-Safari configuration on my Windows PC with our custom-built Chromium browser, and the website still thinks I’m using macOS with Safari, because my canvas is spoofed but non-unique.
Intelligent Canvas Spoofing for browser automation users
We believe that this is the most modern method of canvas spoofing, and it is recommended for everyone. However, it becomes even more important when performing browser automation.
You may run into difficulties when running automation on Linux servers, as protection systems can detect both the hashes generated there and noise-altered hashes. To avoid detection while running virtual browser profiles with Windows and macOS, you should use Kameleo's Intelligent Canvas Spoofing.
Our cross-platform solution is on the roadmap.