A huge topic indeed.
My little hosting stack, which has been running fairly well for about twenty years, has been getting shaken up lately...
I came across Anubis, a kind of challenge page that works without Cloudflare. I've never tried it, but it looks pretty good.
https://anubis.techaro.lol ?
To be honest, I have three types of problems:
- Legitimate bots that do whatever they like (Facebook, ClaudeBot…)
- Legitimate bots that used to behave well but are getting harder to handle (Google…)
- Aggressive bots/scrapers clearly aimed at taking the site offline (residential proxies, sometimes with millions of IP addresses) or at blowing up your hosting costs if you’re on a cloud with autoscaling.
In all three cases, CrowdSec is useless for me.
I'll detail the third case, the one where Cloudflare can become indispensable.
Adding Nginx means I no longer have to fear Apache running out of workers when 5,000 new connections arrive within a minute.
I know, it’s been a standard for many years, but why change what works… I’m in the process of migrating almost all my machines.
Next comes the problem of PHP workers and DB queries → CPU exhaustion.
There, I act based on the profile of the most problematic bot: the aggressive one.
The one that throws 5 req/s at a listing page with a bunch of filters applied and the maximum number of products displayed.
Each request comes from a unique IP address that you see again three weeks later… for another request of the same kind.
Of course these are only residential IPs (Orange, SFR, etc.) – otherwise you could just ban the whole ASN.
The user‑agent also changes (obviously, otherwise, again, it wouldn’t be fun).
Example stats for one minute:
ASN    ORGANISATION                     COUNTRY              UNIQUE_IPS  REQUESTS
7018   ATT‑INTERNET4 - AT&T Enterprise  US - United States   429         447
21928  T‑MOBILE‑AS21928 - T‑Mobile…     US - United States   188         193
22773  ASN‑CXA‑ALL‑CCI‑22773‑RDC …      US - United States   130         136
20001  TWC‑20001‑PACWEST - Charter…     US - United States   124         132
20115  CHARTER‑20115 - Charter Comm…    US - United States   124         128
6167   CELLCO‑PART - Verizon Business   US - United States   103         109
5650   FRONTIER‑FRTR - Frontier Co…     US - United States   91          97
11426  TWC‑11426‑CAROLINAS - Chart…     US - United States   70          73
5089   NTL, GB                          GB - United Kingdom  66          72
6128   CABLE‑NET‑1 - Cablevision S…     US - United States   69          71
10796  TWC‑10796‑MIDWEST - Charter…     US - United States   64          65
33363  BHN‑33363 - Charter Communi…     US - United States   62          64
14593  SPACEX‑STARLINK - Space Exp…     US - United States   44          48
2856   BT‑UK‑AS BTnet UK Regional …     GB - United Kingdom  42          43
5607   BSKYB‑BROADBAND‑AS, GB           GB - United Kingdom  38          42
209    CENTURYLINK‑US‑LEGACY‑QWEST…     US - United States   39          39
[...]
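For anyone who wants to produce this kind of per-ASN breakdown themselves, here is a rough Python sketch of the counting step. It is not the tool used for the stats above. The `asn_of` lookup is a hypothetical callable that you would back with whatever IP-to-ASN database you have, and the sample addresses and ASN numbers come from the ranges reserved for documentation.

```python
from collections import defaultdict


def summarize_by_asn(client_ips, asn_of):
    """Count unique IPs and total requests per ASN.

    client_ips: iterable of client IP strings, one entry per request.
    asn_of: hypothetical callable mapping an IP string to an ASN number.
    Returns a list of (asn, unique_ips, requests), busiest ASN first.
    """
    unique_ips = defaultdict(set)
    requests = defaultdict(int)
    for ip in client_ips:
        asn = asn_of(ip)
        unique_ips[asn].add(ip)
        requests[asn] += 1
    rows = [(asn, len(unique_ips[asn]), requests[asn]) for asn in requests]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Toy lookup for illustration only (documentation IPs and ASNs).
def toy_asn_of(ip):
    return 64500 if ip.startswith("192.0.2.") else 64501


sample = ["192.0.2.1", "192.0.2.2", "192.0.2.1", "198.51.100.7"]
for asn, ips, hits in summarize_by_asn(sample, toy_asn_of):
    print(asn, ips, hits)
```

When UNIQUE_IPS is almost equal to REQUESTS, as in the table, per-IP rate limiting has nothing to grab onto, which is the whole problem with residential proxies.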
It’s not ideal, but for now I have two strategies:
Netfilter firewall
The advantage is that the request never reaches Nginx/Apache/PHP/the website.
I switched last year from iptables to nftables: much faster and more capable, a must once you start dealing with large volumes of IPs/CIDRs. It does come with a real learning curve if you’re used to iptables.
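As an illustration of feeding a large IP/CIDR list into nftables, here is a minimal Python sketch that merges adjacent ranges and builds one `add element` command. It assumes a set that already exists and was declared with `type ipv4_addr` and `flags interval`; the names `inet`, `filter` and `blocklist` are placeholders, not anything from my actual ruleset. The resulting line can be written to a file and loaded with `nft -f`.

```python
import ipaddress


def nft_add_elements(entries, family="inet", table="filter", set_name="blocklist"):
    """Build an nftables 'add element' command from IPv4 addresses/CIDRs.

    The family, table and set names are hypothetical placeholders.
    Returns an empty string when there is nothing to add.
    """
    networks = [ipaddress.ip_network(entry, strict=False) for entry in entries]
    # collapse_addresses() refuses mixed IPv4/IPv6 input, so keep IPv4 only.
    ipv4 = [net for net in networks if net.version == 4]
    collapsed = list(ipaddress.collapse_addresses(ipv4))
    if not collapsed:
        return ""
    # Write single hosts as plain addresses, everything else as a prefix.
    elements = ", ".join(
        str(net.network_address) if net.prefixlen == 32 else str(net)
        for net in collapsed
    )
    return f"add element {family} {table} {set_name} {{ {elements} }}"


print(nft_add_elements(["192.0.2.0/25", "192.0.2.128/25", "198.51.100.7"]))
```

Merging ranges before loading keeps the set small, which matters once the list grows into the hundreds of thousands of entries.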
Apache config per bot
Here AI is a huge help in building a sort of bot profile and writing specific Apache configurations so that, under certain conditions (URL, string length, User‑Agent, referrer, etc.), Apache replies with a 429.
The result is very little CPU usage, and most of the time the bot thinks the attack succeeded and backs off.
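To make the "bot profile" idea concrete, here is a small Python sketch of the matching logic only. It is not Apache configuration and not my real rules: the `BotProfile` fields, the `/products` path, the `per_page` parameter and the thresholds are all made up for the example. The real thing would express the same conditions in the web server.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class BotProfile:
    """Hypothetical profile describing one aggressive scraper."""
    path_prefix: str            # listing page the bot hammers
    min_query_length: int       # long filter strings are suspicious
    required_params: frozenset  # parameters the bot always sends
    empty_referrer: bool        # True if the bot never sends a referrer


def status_for(url, referrer, profile):
    """Return 429 if the request matches the profile, else 200."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    matches = (
        parts.path.startswith(profile.path_prefix)
        and len(parts.query) >= profile.min_query_length
        and profile.required_params.issubset(params)
        and (not profile.empty_referrer or not referrer)
    )
    return 429 if matches else 200


profile = BotProfile("/products", 30, frozenset({"per_page"}), True)
bot_url = "/products?color=red&size=xl&brand=acme&per_page=200"
print(status_for(bot_url, "", profile))
print(status_for("/products?per_page=20", "https://example.com/", profile))
```

The User‑Agent is deliberately left out of this sketch, since in my case it changes on every request and is useless as a signal.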
I’m still trying to get to something 100% automated, because I’m tired of receiving alerts. I’ve seen that Nginx offers a lot more possibilities than Apache in many respects (including inserting human‑verification challenges), but I haven’t had time to dig deeper yet.