Some websites receive traffic from malicious users who want to shut them down or break into them. Let's look at how sites defend themselves and what tools they use.
The defender lines
Protecting a large company's site is costly and labor-intensive, so most companies outsource the task. There are exceptions, of course, e.g., Google, which has its own protection system, Google Antibot.
Most protection services filter traffic, standing like a shield in front of the site and letting only approved visitors through.
They can block bots, DDoS attacks, and mass hacking attempts. The most common providers are Cloudflare, Akamai, Datadome, Kasada, and Imperva.
These services cannot access the website's database, but they monitor the passing traffic and observe each visitor's:
- IP address and geolocation
- TLS fingerprint
- HTTP headers
They also monitor how frequently each client sends requests.
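The signals above can be sketched as a toy per-visitor monitor. This is a minimal illustration, not any vendor's actual schema: the field names, thresholds, and the JA3-style fingerprint string are all made up for the example.

```python
from collections import defaultdict, deque
import time

class TrafficMonitor:
    """Toy model of the per-visitor signals a protection service records."""

    def __init__(self, max_requests=100, window_seconds=60):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.request_times = defaultdict(deque)  # client IP -> request timestamps

    def observe(self, ip, tls_fingerprint, headers, now=None):
        """Record one request and report the client's signals plus rate status."""
        now = time.time() if now is None else now
        times = self.request_times[ip]
        times.append(now)
        # Drop timestamps that fell outside the sliding window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        return {
            "ip": ip,
            "tls_fingerprint": tls_fingerprint,
            "user_agent": headers.get("User-Agent"),
            "requests_in_window": len(times),
            "rate_limited": len(times) > self.max_requests,
        }

monitor = TrafficMonitor(max_requests=5, window_seconds=10)
for i in range(7):
    result = monitor.observe("203.0.113.7", "ja3:769,47-53",
                             {"User-Agent": "Mozilla/5.0"}, now=i)
print(result["rate_limited"])  # True: 7 requests exceed the limit of 5
```

Real systems combine many more signals, but the idea is the same: every request enriches a per-client profile that the filter can act on.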
Modern protection systems check whether the captured data (the browser fingerprint) is internally consistent. For example, if the client's user-agent claims the browser is Chrome 100, they check whether the other parts of the fingerprint also point to Chrome 100. To do this, they often compare the results against other visitors' browser fingerprints.
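A rough sketch of such a consistency check might look like the following. The expected-signal table here is purely illustrative (real services derive it from millions of observed fingerprints, and the header values are simplified):

```python
# Illustrative expectations for what a genuine Chrome 100 client sends.
# Real services compare far more attributes against large sample sets.
EXPECTED_SIGNALS = {
    "Chrome 100": {
        "sec_ch_ua_contains": "Chromium",
        "accept_encoding": "gzip, deflate, br",
    },
}

def is_consistent(claimed_browser, fingerprint):
    """Return True if the fingerprint's other fields match the claimed browser."""
    expected = EXPECTED_SIGNALS.get(claimed_browser)
    if expected is None:
        return False  # unknown or implausible browser claim
    return (
        expected["sec_ch_ua_contains"] in fingerprint.get("sec_ch_ua", "")
        and fingerprint.get("accept_encoding") == expected["accept_encoding"]
    )

# A client claiming Chrome 100 with matching client hints passes...
print(is_consistent("Chrome 100", {
    "sec_ch_ua": '" Not A;Brand";v="99", "Chromium";v="100"',
    "accept_encoding": "gzip, deflate, br",
}))  # True

# ...but one sending mismatched headers is flagged.
print(is_consistent("Chrome 100", {
    "sec_ch_ua": "",
    "accept_encoding": "gzip, deflate",
}))  # False
```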
These services have a big advantage: they process several million gigabytes of data every day, so they know exactly what an intruder looks like.
Chink in the armor
There are open-source tools that can solve these protection systems' challenges. They can be really useful, but not in the long term: the companies can change their code at any time, and then the open-source community needs more time to reverse-engineer the new version. Examples:
Possible outcomes from a user perspective
- If the generated browser fingerprint is familiar to the visited website, the user can pass.
- If the browser fingerprint is on the block list, the user cannot browse the website.
- If the browser fingerprint is brand new to the website, it may belong to someone who is up to no good.
Block list (also known as blacklist)
Most cloud servers are on the block list, so you cannot scrape the web from, e.g., Google Cloud, Amazon AWS, or Microsoft Azure. The default fingerprints of automation tools like Selenium are on it out of the box, too. The block lists also contain plenty of individual IP addresses.
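Checking whether a visitor comes from a cloud datacenter can be as simple as matching the IP against known provider ranges. A minimal sketch using Python's standard `ipaddress` module (the two CIDR blocks are examples only; real block lists cover thousands of published datacenter ranges):

```python
import ipaddress

# Illustrative datacenter CIDR blocks; real lists are far larger and
# are refreshed from the cloud providers' published ranges.
DATACENTER_RANGES = [
    ipaddress.ip_network("35.190.0.0/17"),  # example Google Cloud range
    ipaddress.ip_network("52.95.0.0/16"),   # example AWS range
]

def is_datacenter_ip(ip):
    """Return True if the IP falls inside any known datacenter range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)

print(is_datacenter_ip("52.95.1.10"))   # True  -> likely blocked outright
print(is_datacenter_ip("203.0.113.7"))  # False -> looks residential
```

This is why scraping from a cloud VM fails so quickly: the very first request already carries a datacenter IP, before any fingerprinting even happens.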
Solution? Disguise yourself.
If you are not on the block list, you just need to blend into the crowd; then the websites won't detect you.