Bot resistance

Alright, I'll kick off the topic/debate.

How do you handle the ever‑increasing number of bots that scan our sites?

For me: Cloudflare whenever possible, with bot protection, a managed challenge for anything that doesn't come from Europe or the USA, and a whole range of blocked user agents.

Otherwise, Crowdsec, with a scenario that bans a good number of user agents directly.
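The filtering policy above (challenge by origin country, outright block by User-Agent) boils down to a small decision function. Here is a minimal Python sketch of that logic, assuming the visitor's country code has already been resolved upstream (by Cloudflare or a GeoIP database). `ALLOWED_COUNTRIES`, `BLOCKED_UA_PATTERNS` and `decide()` are hypothetical names, and the values are placeholders, not a recommended list.

```python
import re

# Hypothetical placeholders: a partial "Europe + USA" allowlist and a few scraper UAs.
ALLOWED_COUNTRIES = {"US", "FR", "DE", "IT", "ES", "BE", "NL", "GB"}
BLOCKED_UA_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"python-requests", r"\bcurl/", r"scrapy", r"GPTBot", r"Bytespider")
]


def decide(country: str, user_agent: str) -> str:
    """Return 'block', 'challenge' or 'allow' for one request."""
    # User-Agent blocking comes first: a known scraper is not even offered a challenge.
    if any(p.search(user_agent) for p in BLOCKED_UA_PATTERNS):
        return "block"
    # Everything outside the allowed countries gets a (managed) challenge.
    if country.upper() not in ALLOWED_COUNTRIES:
        return "challenge"
    return "allow"
```

For example, `decide("BR", "Mozilla/5.0")` gives `"challenge"` and `decide("US", "python-requests/2.31")` gives `"block"`. Keep in mind that User-Agents are trivially spoofed, so this only filters the lazy bots.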

I've seen Anubis, a kind of challenge that works without Cloudflare; never tested it, but it looks pretty good.

We've also added server-wide rate limiting against AI bots via Nginx (deployed for a single client only).
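Nginx's `limit_req_zone` / `limit_req` directives implement a leaky bucket defined by a rate and a burst. The Python sketch below roughly mimics that behaviour (accept while the accumulated excess stays within `burst`; rejected requests don't count), keyed by a hypothetical AI-bot family rather than by IP, so all requests from one crawler share a single server-wide budget. `LeakyBucket`, `bot_family()` and the token list are illustrative assumptions, not Nginx's actual code.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical list of AI crawler tokens looked for in User-Agent strings.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")


def bot_family(user_agent: str) -> Optional[str]:
    """Return the AI-bot family named in the User-Agent, or None for other traffic."""
    ua = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in ua:
            return token
    return None


@dataclass
class LeakyBucket:
    """Leaky bucket per key, loosely modelled on Nginx limit_req (rate + burst)."""

    rate: float  # requests per second drained from the bucket
    burst: int   # excess requests tolerated before rejecting
    _state: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # key -> (excess, last accepted time)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if key not in self._state:
            self._state[key] = (0.0, now)
            return True
        excess, last = self._state[key]
        # Drain according to the elapsed time, then count this request.
        excess = max(0.0, excess - (now - last) * self.rate + 1.0)
        if excess > self.burst:
            return False  # rejected requests do not update the state
        self._state[key] = (excess, now)
        return True
```

In a request handler, one would compute `key = bot_family(ua)` and answer 429 when `key` is not None and `bucket.allow(key)` returns False; ordinary visitors are never keyed, so they are never limited.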

I try to move my clients to static sites when possible, a very long battle for many who have been raised on WordPress. But WordPress, with the mass of bots that spend all their time scanning the whole site, becomes a real server‑load problem.

And you, how do you handle this?

A huge topic indeed.
It's been shaking up my little hosting stack lately, which had been running fairly well for about twenty years...

I've seen Anubis come up, a kind of challenge without Cloudflare; never tried it, but it looks pretty good.

https://anubis.techaro.lol ?

To be honest, I have several types of problems:

  • Legitimate bots that do whatever they like (Facebook, ClaudeBot…)
  • Legitimate bots that used to behave well but are getting harder to handle (Google…)
  • Aggressive bots/scrapers clearly aimed at taking the site offline (residential proxies sometimes with millions of IP addresses) or exploding hosting costs if you’re on a cloud with autoscaling.

In these three cases, Crowdsec is useless for me.

I'll detail the third case, the one where Cloudflare can become indispensable.

Putting Nginx in front means I no longer fear Apache worker exhaustion when 5,000 new connections arrive within a minute.
I know, it's been standard for many years, but why change what works… I'm now migrating almost all my machines.

Next comes the problem of PHP workers and DB queries → CPU resource exhaustion.
And there, I act based on the profile of the most problematic bot, the aggressive one.

The one that fires 5 req/s at a listing page with a bunch of filters applied and the maximum number of products per page.
Each request comes from a unique IP address that you see again three weeks later… for another request of the same kind.
Of course these are only residential IPs (Orange, SFR, etc.) – otherwise you could just ban the whole ASN.

The user‑agent also changes (obviously, otherwise, again, it wouldn’t be fun).

Example stats for one minute:

ASN    ORGANISATION                     COUNTRY              UNIQUE_IPS  REQUESTS
7018   ATT-INTERNET4 - AT&T Enterprise  US - United States   429         447
21928  T-MOBILE-AS21928 - T-Mobile…     US - United States   188         193
22773  ASN-CXA-ALL-CCI-22773-RDC …      US - United States   130         136
20001  TWC-20001-PACWEST - Charter…     US - United States   124         132
20115  CHARTER-20115 - Charter Comm…    US - United States   124         128
6167   CELLCO-PART - Verizon Business   US - United States   103         109
5650   FRONTIER-FRTR - Frontier Co…     US - United States   91          97
11426  TWC-11426-CAROLINAS - Chart…     US - United States   70          73
5089   NTL, GB                          GB - United Kingdom  66          72
6128   CABLE-NET-1 - Cablevision S…     US - United States   69          71
10796  TWC-10796-MIDWEST - Charter…     US - United States   64          65
33363  BHN-33363 - Charter Communi…     US - United States   62          64
14593  SPACEX-STARLINK - Space Exp…     US - United States   44          48
2856   BT-UK-AS BTnet UK Regional …     GB - United Kingdom  42          43
5607   BSKYB-BROADBAND-AS, GB           GB - United Kingdom  38          42
209    CENTURYLINK-US-LEGACY-QWEST…     US - United States   39          39
[...]
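Tables like this are easy to produce once each log line has been enriched with its ASN. A minimal Python sketch of the aggregation, assuming the access log has already been parsed and each IP looked up in an IP-to-ASN database (that enrichment step is not shown); `Hit` and `asn_stats()` are hypothetical names. It also builds the kind of one-line summary (unique IPs, distinct ASes, requests) used elsewhere in this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    ip: str
    asn: int
    org: str
    country: str


Row = Tuple[int, str, str, int, int]  # asn, org, country, unique_ips, requests


def asn_stats(hits: Iterable[Hit]) -> Tuple[List[Row], str]:
    """Aggregate one time window of hits into per-ASN rows plus a summary line."""
    ips_per_asn: Dict[int, Set[str]] = defaultdict(set)
    requests_per_asn: Dict[int, int] = defaultdict(int)
    meta: Dict[int, Tuple[str, str]] = {}
    all_ips: Set[str] = set()
    total = 0
    for hit in hits:
        ips_per_asn[hit.asn].add(hit.ip)
        requests_per_asn[hit.asn] += 1
        meta[hit.asn] = (hit.org, hit.country)
        all_ips.add(hit.ip)
        total += 1
    rows = [
        (asn, meta[asn][0], meta[asn][1], len(ips), requests_per_asn[asn])
        for asn, ips in ips_per_asn.items()
    ]
    # Most requests first, like the table above.
    rows.sort(key=lambda r: (r[4], r[3]), reverse=True)
    summary = f"{len(all_ips)} unique IPs, {len(ips_per_asn)} distinct ASes, {total} requests"
    return rows, summary
```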

It’s not ideal, but for now I have two strategies:

Netfilter firewall
The advantage is that the request never reaches Nginx/Apache/PHP/web site.
Last year I switched from iptables to nftables: much faster and more capable, a must once you start dealing with large volumes of IPs/CIDRs. It does involve a real learning curve if you're used to iptables.
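A large part of why nftables scales better here is its native interval sets: a single rule matches against thousands of prefixes. The Python sketch below renders such a set from a list of CIDRs, first merging overlapping and adjacent prefixes with `ipaddress.collapse_addresses`. The table, set and chain names are hypothetical, only IPv4 is handled, and the output is meant to be reviewed (and checked against your existing ruleset) before loading it with `nft -f`.

```python
import ipaddress
from typing import Iterable


def nft_blocklist(cidrs: Iterable[str], table: str = "botblock", set_name: str = "blocked_v4") -> str:
    """Render an nftables ruleset that drops traffic from the given IPv4 prefixes."""
    nets = [ipaddress.ip_network(c.strip(), strict=False) for c in cidrs]
    # collapse_addresses merges overlapping/adjacent prefixes; it needs a single IP version.
    merged = list(ipaddress.collapse_addresses(n for n in nets if n.version == 4))
    lines = [
        f"table inet {table} {{",
        f"  set {set_name} {{",
        "    type ipv4_addr",
        "    flags interval",
    ]
    if merged:  # skip the elements line entirely when there is nothing to add
        lines.append("    elements = { " + ", ".join(map(str, merged)) + " }")
    lines += [
        "  }",
        "  chain input {",
        "    type filter hook input priority 0; policy accept;",
        f"    ip saddr @{set_name} drop",
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

Using a dedicated table keeps the generated rules separate from the rest of the firewall, so the whole table can be regenerated and reloaded without touching anything else.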

Apache config per bot
Here, AI is a huge help for building a kind of bot profile and writing specific Apache configurations so that, under certain conditions (URL, string length, User-Agent, referrer, etc.), Apache replies with a 429.
The result is very little CPU usage and most of the time the bot thinks the attack succeeded and backs off.
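Such a bot profile is essentially a set of conditions that must all match before answering 429. Here is a Python sketch of that matching logic (the real rules live in the Apache configuration, which is not shown); `BotProfile`, `status_for()` and the example profile values are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BotProfile:
    """Hypothetical fingerprint of one abusive bot; every set condition must match."""

    name: str
    path: re.Pattern[str]                          # URL path pattern
    min_query_len: int = 0                         # e.g. very long filter query strings
    user_agent: Optional[re.Pattern[str]] = None   # optional UA pattern
    require_empty_referer: bool = False

    def matches(self, path: str, query: str, user_agent: str, referer: str) -> bool:
        if not self.path.search(path):
            return False
        if len(query) < self.min_query_len:
            return False
        if self.user_agent is not None and not self.user_agent.search(user_agent):
            return False
        if self.require_empty_referer and referer:
            return False
        return True


def status_for(profiles: Iterable[BotProfile], path: str, query: str,
               user_agent: str, referer: str) -> int:
    """429 if any profile matches (cheap refusal, no PHP/DB work), else 200 to pass through."""
    return 429 if any(p.matches(path, query, user_agent, referer) for p in profiles) else 200


# Hypothetical profile: filtered listing pages, huge query string, no referrer.
LISTING_SCRAPER = BotProfile(
    name="listing-filters",
    path=re.compile(r"^/\d+-[\w-]+$"),  # assumed category URL shape
    min_query_len=120,
    require_empty_referer=True,
)
```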

I'm still looking for a 100% automated solution, because I'm tired of receiving alerts. I've seen that Nginx offers many more possibilities than Apache in many respects (including inserting human-verification challenges), but I haven't had time to dig deeper yet.

And while I was drafting this, it's happening again on a client's site:

Stats for the last minute:

3106 unique IPs, 761 distinct ASes, 3163 requests

Every minute, more than 3,000 new IPs each fire off a single request.
Across 761 different ASes… it’s really annoying.

Well, there are people who don’t like you!

Doesn't CrowdSec, even with the community lists, at least limit the problem?
Did you manage to identify a pattern in the requests? Or is it genuinely legitimate traffic, just massive?
Otherwise, I don't know how your sites work, but have you tried putting an Nginx reverse proxy in front that caches the responses? That takes a lot of load off Apache. But you need to provide a cache-purge mechanism where necessary and, of course, all the appropriate exclusions.
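The two delicate parts of that setup are deciding what may be cached and being able to purge it. A minimal Python sketch of both, assuming a short-TTL micro-cache keyed by the request URL; the excluded paths, `cacheable()` and `MicroCache` are hypothetical, and treating any cookie as a sign of personalised content is a deliberate simplification.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical exclusions: pages that must always hit the backend.
EXCLUDED_PREFIXES = ("/cart", "/order", "/my-account", "/admin")


def cacheable(method: str, path: str, cookies: Dict[str, str]) -> bool:
    """Decide whether a response may be served from (or stored in) the shared cache."""
    if method not in ("GET", "HEAD"):
        return False
    if path.startswith(EXCLUDED_PREFIXES):
        return False
    # Simplification: any cookie is treated as a sign of personalised content.
    return not cookies


@dataclass
class MicroCache:
    """Short-TTL cache keyed by URL, with prefix-based purge."""

    ttl: float
    _store: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)  # key -> (expires_at, body)

    def get(self, key: str, now: Optional[float] = None) -> Optional[bytes]:
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        return entry[1] if entry and entry[0] > now else None

    def put(self, key: str, body: bytes, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._store[key] = (now + self.ttl, body)

    def purge(self, prefix: str = "") -> None:
        """Drop every entry whose key starts with `prefix` (everything by default)."""
        for key in [k for k in self._store if k.startswith(prefix)]:
            del self._store[key]
```

Even a TTL of a few seconds helps against floods on the same listing URL, while a prefix purge covers the "product updated" case.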

At this point, apart from Cloudflare to block upstream (bot protection, managed challenge) or possibly Anubis to filter before reaching Apache, I don’t really see what you can do.

The main objective being to hit Apache as little as possible.

You have people who don’t like you!

Yeah, you saw them, right :smiley: :smiley:
It’s a Prestashop site that sells weapons… I also have a client who sells alcohol and is in quite a pickle too…

This machine already has Nginx as a reverse proxy (but I think I can improve it; I’m still at the beginning with it).
I'm mitigating with the firewall and by request type… For this wave, the attacker apparently figured out that I was responding with 429s and adapted…

I tuned the firewall a bit aggressively. It'll hold for tonight; I'll fine-tune it tomorrow.

857 unique IPs, 349 distinct ASes, 901 requests

The server can easily handle this kind of load.

But the job just becomes annoying, actually.

Here, concretely you need to block upstream...
So either Cloudflare + CrowdSec with the CrowdSec bouncer.
Or directly CrowdSec locally + community lists that block on the firewall, possibly test Anubis (I’ve never tested it myself, but that’s basically what you need if you’re not using CF).

With PrestaShop, you can hardly cache anything at the CDN/reverse‑proxy level, since that idiot throws cookies at everyone. And if each IP only "ping"s once, CrowdSec won’t perform miracles anyway, except to help a bit via the community lists that can "pre‑ban" 30k IPs easily.
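For that kind of pre-ban list, the lookup has to stay cheap even with tens of thousands of prefixes. A Python sketch of one classic approach, assuming the list is a plain collection of CIDRs: merge the prefixes, store their start/end addresses as sorted integers, and binary-search with `bisect`. `PrefixBlocklist` is a hypothetical name; CrowdSec and nftables do this matching for you, so this only illustrates the data structure (for example for an application-level check).

```python
import bisect
import ipaddress
from typing import Dict, Iterable, List


class PrefixBlocklist:
    """Membership test for a large IPv4/IPv6 prefix list in O(log n) per lookup."""

    def __init__(self, cidrs: Iterable[str]):
        nets = [ipaddress.ip_network(c.strip(), strict=False) for c in cidrs]
        self._starts: Dict[int, List[int]] = {}
        self._ends: Dict[int, List[int]] = {}
        for version in (4, 6):
            # Merge overlaps so the ranges are disjoint, then sort them by start address.
            merged = sorted(ipaddress.collapse_addresses(n for n in nets if n.version == version))
            self._starts[version] = [int(n.network_address) for n in merged]
            self._ends[version] = [int(n.broadcast_address) for n in merged]

    def __contains__(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        value = int(addr)
        starts, ends = self._starts[addr.version], self._ends[addr.version]
        # Index of the last range starting at or before this address.
        i = bisect.bisect_right(starts, value) - 1
        return i >= 0 and value <= ends[i]
```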

I’d like to avoid CF as much as possible; the client was complaining about it.

Are you talking about https://anubis.techaro.lol ?

Yes, that's the one.

I’ve never tested it personally, but it could help you.

Test it on another server first to get the hang of it.

https://github.com/TecharoHQ/anubis

Indeed, 20K stars. You can apply Proof of Work to certain pages only; it really looks cool, I'm going to check it out!
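The proof-of-work idea itself is simple: the server hands out a random challenge, the client burns CPU to find a nonce whose hash meets a difficulty target, and the server verifies the answer with a single hash. Below is a generic hashcash-style Python sketch, restricted to a hypothetical set of protected path prefixes; it is not Anubis's actual protocol or code. A real deployment also has to bind each challenge to the client and expire it, and the solving runs in the visitor's browser, not on the server.

```python
import hashlib
import secrets
from itertools import count

# Hypothetical: only these paths get a challenge (e.g. search and filtered listings).
PROTECTED_PREFIXES = ("/search", "/12-")


def needs_challenge(path: str) -> bool:
    return path.startswith(PROTECTED_PREFIXES)


def new_challenge() -> str:
    return secrets.token_hex(16)


def _digest(challenge: str, nonce: int) -> str:
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce whose SHA-256 hex digest starts with `difficulty` zeros."""
    target = "0" * difficulty
    for nonce in count():
        if _digest(challenge, nonce).startswith(target):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks work that cost the client ~16**difficulty hashes."""
    return _digest(challenge, nonce).startswith("0" * difficulty)
```

Each extra leading hex zero multiplies the average client cost by 16 while the server cost stays at one hash, which is what makes the approach attractive against request floods.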

Will you give me feedback? I've never tried it, since I usually use Cloudflare.

Thanks for this topic.

In case it interests anyone, here's my case: a small hobby blog, formerly on WordPress, that I recently migrated to MediaWiki. AI bots caused me a lot of trouble there, blowing up the request count by crawling all the technical links, especially by requesting every diff between every previous revision of every page!

After a fair amount of struggle, I think I finally managed to get it working with:

  • Cloudflare free (managed challenge on all special pages)
  • Extension:CrawlerProtection (blocking most "technical" pages for anonymous users)
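The underlying rule (content pages for everyone, "technical" pages only for logged-in users) can be sketched as a URL classifier. This Python sketch is a hypothetical approximation, not the CrawlerProtection extension's actual logic. It assumes the common `/wiki/$1` article path and treats `Special:` pages, `action=history/edit/raw/info` and the `diff`/`oldid`/`curid` parameters as technical; localised wikis may also use a translated namespace name (e.g. `Spécial:`), which is not handled here.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical rules approximating "technical" MediaWiki URLs.
TECHNICAL_ACTIONS = {"history", "edit", "raw", "info"}
TECHNICAL_PARAMS = {"diff", "oldid", "curid"}
ARTICLE_PATH = "/wiki/"  # assumption: $wgArticlePath = "/wiki/$1"


def page_title(url: str) -> str:
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    if "title" in params:
        return params["title"][0]
    if parts.path.startswith(ARTICLE_PATH):
        return unquote(parts.path[len(ARTICLE_PATH):])
    return ""


def is_technical(url: str) -> bool:
    params = parse_qs(urlsplit(url).query)
    if page_title(url).startswith("Special:"):
        return True
    if params.get("action", [""])[0] in TECHNICAL_ACTIONS:
        return True
    # Revision diffs and old revisions are what crawlers multiply endlessly.
    return any(p in params for p in TECHNICAL_PARAMS)


def should_block(url: str, logged_in: bool) -> bool:
    """Anonymous visitors only get content pages; logged-in users are never blocked."""
    return not logged_in and is_technical(url)
```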

Just my two cents in case it might be useful to someone.

Yes, Cloudflare does an excellent job and it's free.
But some sites can't use CF because of their line of business, and others don't want to (GDPR constraints, or simply because it's a US solution).
Personally, I try to stick to European solutions, but Cloudflare is really too good.

Yes, OK, but sometimes you still have to pay (which is fair):

  • If you want detailed logs of what's happening on the inbound side
  • If you want to set more than 5 rules
  • ...

I had a case where CF was throwing random incomprehensible 502s; they also have incidents.
Proof of work or Under Attack mode still add friction for an e-commerce site, for instance.
There are also a whole bunch of good bots that can end up being blocked.

When your client is behind Cloudflare, your server's firewall becomes useless (it only sees CF's IPs, which are whitelisted), so you can get nasty surprises, since CrowdSec / Fail2ban / etc. no longer serve any purpose.

Yes, absolutely. That said, there's a Cloudflare bouncer for CrowdSec, which detects the IPs and blocks them directly on Cloudflare (something that's impossible with OVH's Infra CDN).
And if your server only sees Cloudflare IPs, it’s badly configured :wink:

And if your server only sees Cloudflare IPs, it’s because it’s misconfigured

You've caught my interest there: I do have the real IPs in the Apache or Nginx logs, but netfilter doesn't see them, and my understanding was that this couldn't be otherwise.
Something murky about layer 7 (where CF forwards the source IP in a header) that the kernel can't access (that part isn't my forte).

Ah, but you won't be able to filter at the firewall level; that's impossible.
But you do need to make sure the Apache/Nginx logs show the right IPs.
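The reason is that netfilter only sees the TCP source address, which is a Cloudflare edge IP; the visitor's address travels inside the HTTP request (in the `CF-Connecting-IP` header), which only the web server parses. That is the job of Nginx's `ngx_http_realip_module` (`set_real_ip_from`, `real_ip_header`) or Apache's `mod_remoteip`. A Python sketch of the same logic, assuming `CLOUDFLARE_RANGES` is loaded from Cloudflare's published IP list (the two entries below are only an excerpt) and that header names have already been normalised; `client_ip()` is a hypothetical helper.

```python
from __future__ import annotations

import ipaddress

# Excerpt only: load the full, current list from Cloudflare's published IP ranges.
CLOUDFLARE_RANGES = [ipaddress.ip_network(c) for c in ("173.245.48.0/20", "103.21.244.0/22")]


def client_ip(peer_ip: str, headers: dict[str, str]) -> str:
    """Return the visitor's IP, trusting CF-Connecting-IP only from a Cloudflare peer."""
    peer = ipaddress.ip_address(peer_ip)
    if any(peer in net for net in CLOUDFLARE_RANGES):
        forwarded = headers.get("CF-Connecting-IP")
        if forwarded:
            # Validate the header value; ip_address() raises ValueError if it is malformed.
            return str(ipaddress.ip_address(forwarded.strip()))
    # Direct hit on the origin, or a non-Cloudflare peer: the header could be forged.
    return str(peer)
```

The trust check is the important part: without it, anyone connecting directly to the origin could put any address they like in the header.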

Then, if you want to block directly on Cloudflare with CrowdSec, you have to use the CrowdSec bouncer, or write one yourself.
Here’s that bouncer: https://docs.crowdsec.net/u/bouncers/cloudflare-workers/

There's also an Nginx bouncer, to block at the Nginx level, which should work: https://docs.crowdsec.net/u/bouncers/nginx/

Otherwise the Apache bouncer, which I’ve never tested: https://docs.crowdsec.net/u/bouncers/apache_bouncer/

So either you block on Cloudflare directly, or you block via Nginx/Apache.
But you’ll never be able to block at the firewall level.

Still for my arms-dealer client, stats from midnight to 11:00 am:

196,280 unique IPs, 4,269 distinct AS, 507,216 requests

over 4,000 different ASes... no kidding, guys :downcast_face_with_sweat:
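A quick back-of-the-envelope check on these figures, assuming the window is exactly 11 hours (660 minutes), shows why per-IP limits are useless here:

```latex
\[
\frac{507\,216\ \text{requests}}{660\ \text{min}} \approx 769\ \text{requests/min} \approx 12.8\ \text{requests/s}
\]
\[
\frac{507\,216\ \text{requests}}{196\,280\ \text{IPs}} \approx 2.6\ \text{requests per IP},
\qquad
\frac{196\,280\ \text{IPs}}{4\,269\ \text{ASes}} \approx 46\ \text{IPs per AS}
\]
```

The aggregate rate is sizeable, but each IP averages fewer than three requests over the whole morning, so even a limit of one request per minute per IP would practically never trigger.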

And it could even be worse because:

List of countries DROPped by the firewall:

AE United Arab Emirates
AR Argentina
AZ Azerbaijan
BD Bangladesh
BO Bolivia
BR Brazil
BY Belarus
CL Chile
CO Colombia
DZ Algeria
EG Egypt
ET Ethiopia
HU Hungary
IQ Iraq
IR Iran
JO Jordan
KE Kenya
KP North Korea
KZ Kazakhstan
MA Morocco
MX Mexico
OM Oman
PA Panama
PK Pakistan
PS Palestine
PY Paraguay
SC Seychelles
SN Senegal
SY Syria
UA Ukraine
UY Uruguay
UZ Uzbekistan
VE Venezuela
VN Vietnam
ZA South Africa

List of ASes DROPped by the firewall:

AS701 601 IPv4 12 IPv6 UUNET - Verizon Business, US (US)
AS4837 553 IPv4 295 IPv6 CHINA169-BACKBONE CHINA UNICOM China169 Backbone, CN (CN)
AS5089 101 IPv4 2 IPv6 NTL, GB (GB)
AS5607 34 IPv4 6 IPv6 BSKYB-BROADBAND-AS, GB (GB)
AS5650 568 IPv4 7 IPv6 FRONTIER-FRTR - Frontier Communications of America, Inc., US (US)
AS6128 45 IPv4 2 IPv6 CABLE-NET-1 - Cablevision Systems Corp., US (US)
AS6167 556 IPv4 887 IPv6 CELLCO-PART - Verizon Business, US (US)
AS7018 546 IPv4 15 IPv6 ATT-INTERNET4 - AT&T Enterprises, LLC, US (US)
AS7922 273 IPv4 5 IPv6 COMCAST-7922 - Comcast Cable Communications, LLC, US (US)
AS7979 336 IPv4 17 IPv6 SERVERS-COM - Servers.com, Inc., US (US)
AS10796 304 IPv4 11 IPv6 TWC-10796-MIDWEST - Charter Communications Inc, US (US)
AS11426 191 IPv4 8 IPv6 TWC-11426-CAROLINAS - Charter Communications Inc, US (US)
AS11798 137 IPv4 1 IPv6 ACEDATACENTERS-AS-1 - Ace Data Centers, Inc., US (US)
AS14593 229 IPv4 222 IPv6 SPACEX-STARLINK - Space Exploration Technologies Corporation, US (US)
AS20001 406 IPv4 12 IPv6 TWC-20001-PACWEST - Charter Communications Inc, US (US)
AS20115 638 IPv4 109 IPv6 CHARTER-20115 - Charter Communications LLC, US (US)
AS21928 16 IPv4 3 IPv6 T-MOBILE-AS21928 - T-Mobile USA, Inc., US (US)
AS22773 365 IPv4 16 IPv6 ASN-CXA-ALL-CCI-22773-RDC - Cox Communications Inc., US (US)
AS33363 206 IPv4 19 IPv6 BHN-33363 - Charter Communications, Inc, US (US)
AS36352 916 IPv4 16 IPv6 AS-COLOCROSSING - HostPapa, US (US)
AS36924 25 IPv4 14 IPv6 GVA-Canalbox, CI (CI)
AS39798 9 IPv4 5 IPv6 MIVOCLOUD, MD (MD)
AS41564 141 IPv4 7 IPv6 AS41564, GB (GB)
AS44382 7 IPv4 3 IPv6 WHITELABEL, US (US)
AS45102 183 IPv4 27 IPv6 ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN (CN)
AS46635 4 IPv4 ? IPv6 NET3-AI - Contact Consumers, US (US)
AS55286 232 IPv4 4 IPv6 SERVER-MANIA - B2 Net Solutions Inc., CA (CA)
AS60729 3 IPv4 3 IPv6 TORSERVERS-NET, DE (DE)
AS62874 59 IPv4 6 IPv6 WEB2OBJECTS - Web2Objects LLC, US (US)
AS64267 196 IPv4 1 IPv6 AS-SPRIOUS - Sprious LLC, US (US)
AS134450 26 IPv4 2 IPv6 HOSTROYALETECHNOLOGIES-AS-AP HostRoyale Technologies Pvt Ltd, IN (IN)
AS136907 180 IPv4 33 IPv6 HWCLOUDS-AS-AP HUAWEI CLOUDS, HK (HK)
AS137409 264 IPv4 505 IPv6 GSLNETWORKS-AS-AP GSL Networks Pty LTD, AU (AU)
AS152194 117 IPv4 2 IPv6 CTGSERVERLIMITED-AS-AP CTG Server Limited, HK (HK)
AS198953 3 IPv4 ? IPv6 PROTON66, RU (RU)
AS200593 3 IPv4 ? IPv6 PROSPERO-AS, RU (RU)
AS200651 17 IPv4 3 IPv6 FLOKINET, IS (IS)
AS203020 830 IPv4 52 IPv6 HOSTROYALE, IN (IN)
AS204957 123 IPv4 13 IPv6 GREENFLOID-AS, US (US)
AS206092 107 IPv4 ? IPv6 SECFIREWALLAS, CY (CY)
AS208091 ? IPv4 ? IPv6 WOLFSEC-AS3, CH (CH)
AS208323 1 IPv4 1 IPv6 APPLIEDPRIVACY-AS, AT (AT)
AS209605 6 IPv4 ? IPv6 HOSTBALTIC, LT (LT)
AS210644 155 IPv4 21 IPv6 AEZA-AS, RU (RU)
AS210906 63 IPv4 ? IPv6 BITE-US, LT (LT)
AS211298 4 IPv4 2 IPv6 DRIFTNET, GB (GB)
AS212286 25 IPv4 1 IPv6 LONCONNECT, GB (GB)
AS213412 5 IPv4 ? IPv6 ONYPHE, FR (FR)
AS213790 5 IPv4 ? IPv6 LIMITEDNETWORK-AS, GB (GB)
AS214940 2 IPv4 1 IPv6 KPRONET, US (US)
AS215125 1 IPv4 1 IPv6 CYBEROLOGY-AS, NL (NL)
AS216071 75 IPv4 4 IPv6 VDSINA, AE (AE)
AS329166 3 IPv4 1 IPv6 Absolute-Hosting-PTY-LTD-AS, ZA (ZA)
AS396319 29 IPv4 1 IPv6 US-INTERNET-396319 - Oxylabs, US (US)
AS396356 455 IPv4 113 IPv6 LATITUDE-SH - Latitude.sh, US (US)
AS398324 13 IPv4 4 IPv6 CENSYS-ARIN-01 - Censys, Inc., US (US)
AS398705 2 IPv4 2 IPv6 CENSYS-ARIN-02 - Censys, Inc., US (US)
AS398722 1 IPv4 2 IPv6 CENSYS-ARIN-03 - Censys, Inc., US (US)
AS400463 3 IPv4 ? IPv6 DYNANODE-ASN-01 - DynaNode LLC, US (US)
AS401152 114 IPv4 ? IPv6 ADCIL-ASN-01 - Ace Data Centers II, L.L.C., US (US)
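To automate firewall updates from a list like this, it first has to become data. A Python sketch that parses these lines, assuming the format stays exactly `AS<number> <n> IPv4 <n> IPv6 <name>` and that `?` means "unknown". The listing doesn't say what the counts are (presumably prefixes per address family), and `AsEntry`, `parse_as_line()` and `totals()` are hypothetical names.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple


class AsEntry(NamedTuple):
    asn: int
    v4: Optional[int]  # None when the listing shows "?"
    v6: Optional[int]
    name: str


LINE = re.compile(r"^AS(\d+)\s+(\d+|\?)\s+IPv4\s+(\d+|\?)\s+IPv6\s+(.+)$")


def _num(s: str) -> Optional[int]:
    return None if s == "?" else int(s)


def parse_as_line(line: str) -> AsEntry:
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    asn, v4, v6, name = m.groups()
    return AsEntry(int(asn), _num(v4), _num(v6), name)


def totals(entries: Iterable[AsEntry]) -> Tuple[int, int]:
    """Sum the known IPv4 and IPv6 counts, ignoring '?' (unknown) values."""
    entries = list(entries)
    return (
        sum(e.v4 for e in entries if e.v4 is not None),
        sum(e.v6 for e in entries if e.v6 is not None),
    )
```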

This wave of attacks has been going on since June 2025.

I've never seen anything like it; I'm really up against someone very motivated.

Goodness, indeed, someone doesn't like your client!

I know you don't want Cloudflare, but I don't really see how you'd manage without it in these conditions.
Maybe deploy Anubis, but I wouldn't promise it'll do the job.

For now, I'm coping with raw CPU and RAM... I haven't had time yet to look seriously at Anubis.

Otherwise, have you looked into imposing a "challenge" on all visitors (tricky on a shop)?
Something like hCaptcha. Be careful to configure it to let "legit" bots through and not to block the payment provider's callback (ping back).
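For the "legit bots" part, the usual check is a reverse DNS lookup followed by a forward confirmation, since a PTR record on its own can be set to anything. A Python sketch combining that with an exemption for the payment callback, assuming the hostname suffixes below (check each search engine's own verification documentation) and a hypothetical callback path. The resolvers can be injected so the logic can be tested offline, and `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
import socket
from typing import Callable, List, Optional

# Assumed suffixes for verified crawlers; confirm against each vendor's documentation.
VERIFIED_BOT_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")
# Hypothetical payment callback endpoint that must never be challenged.
PAYMENT_CALLBACK_PATHS = ("/module/payment/validation",)


def is_verified_bot(ip: str,
                    reverse: Optional[Callable[[str], str]] = None,
                    forward: Optional[Callable[[str], List[str]]] = None) -> bool:
    reverse = reverse or (lambda a: socket.gethostbyaddr(a)[0])
    forward = forward or (lambda h: socket.gethostbyname_ex(h)[2])
    try:
        host = reverse(ip)
        if not host.endswith(VERIFIED_BOT_SUFFIXES):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in forward(host)
    except OSError:  # socket.herror / socket.gaierror: no PTR or no A record
        return False


def needs_captcha(ip: str, path: str,
                  reverse: Optional[Callable[[str], str]] = None,
                  forward: Optional[Callable[[str], List[str]]] = None) -> bool:
    if path.startswith(PAYMENT_CALLBACK_PATHS):
        return False
    return not is_verified_bot(ip, reverse, forward)
```

In production the DNS results should be cached, otherwise every challenged request costs two lookups.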