503 Backend fetch failed

Hello everyone,

I opened a ticket and I’m not convinced by support’s response.
I’m therefore turning to you for additional opinions/experiences.

The issue

For periods ranging from a few minutes (5 min) to an hour and a half, all pages of my website become inaccessible (static resources are not affected).
During these periods I can still interact with the database via phpMyAdmin, with no slowdown or blocking on the DB side.

When a page loads, the site stalls for several seconds/minutes and then displays:
Error 503 Backend fetch failed

In the Apache access logs there are no page requests during the outage periods: there is a gap in the logs.

In the Apache error logs, many lines like:
AH10141: FastCGI: comm with server "..." aborted: idle timeout (160 sec)
AH10149: FastCGI: incomplete headers (0 bytes) received from server "..."
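For anyone who wants to pin down the outage windows from the access logs, here is a minimal Python sketch. It assumes the timestamp format of the access-log lines quoted later in this thread, and the list of static file extensions is a guess to adapt to the site. It reports every stretch during which no dynamic page was logged.

```python
import re
from datetime import datetime, timedelta

# Timestamp format as seen in the access-log lines quoted in this thread,
# e.g. [27/May/2026:00:03:24 +0200]
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")
REQ_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+)')
# Assumption: these extensions identify static resources on the site.
STATIC_EXT = (".css", ".js", ".png", ".jpg", ".gif", ".ico", ".svg", ".woff2")


def find_gaps(lines, min_gap=timedelta(minutes=5)):
    """Return (start, end) pairs with no dynamic request for at least min_gap."""
    gaps = []
    previous = None
    for line in lines:
        ts = TS_RE.search(line)
        req = REQ_RE.search(line)
        if not ts or not req:
            continue  # skip lines that do not look like access-log entries
        path = req.group(1).split("?", 1)[0].lower()
        if path.endswith(STATIC_EXT):
            continue  # static files are still served during the outages
        current = datetime.strptime(ts.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Feeding it the lines of one day's log (for example `find_gaps(open("access.log"))`, where the file name is hypothetical) gives a list of windows that can be compared with the timestamps of the AH10141 errors.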

I rule out overload because the outages sometimes occur during low‑traffic periods.

I haven’t touched the site’s code during those periods: the site becomes inaccessible and then accessible again without any action on my part.
So I dismiss the bug hypothesis, especially since I have a twin site (same code, same DB structure, same cluster) that shows no anomalies.
It’s custom code, no plugins.

Below are screenshots showing a few days.

2026-05-01: everything is fine (screenshot)

2026-05-10: ~2 h of interruption (screenshot)

For your information:

  • hosting offer: Performance
  • cluster110

Support’s reply

Regarding the 503 Backend fetch failed error (Varnish cache server), we regret the situation and the delayed response.

The 503 error you are seeing is not caused by our infrastructure.
In short, the CDN was enabled, and when it tries to contact your site it does not receive a “200” status but an error.
For example, the website might return a 500 Internal Server Error and the CDN does not know how to interpret it, so it returns a 503.
Usually this type of error can be caused by several factors:

  • A bug in the site’s PHP or .htaccess code.
  • A broken script, plugin, or code file.
  • When you put a site into maintenance mode.
  • Plugins manipulating the site’s state.

You can enable the website’s debug mode and/or PHP development mode to let the browser show the internal error.

To enable PHP developer mode, follow these steps:
OVHcloud client area → Web Cloud section → Hosting → [hosting service name on the left] → Configuration → PHP version → click (…) → Edit configuration → Edit current configuration → set the "mode" entry to "development"

The change is not immediate; you’ll need to wait about 10–15 minutes, after which you can access your website again using a private/incognito browser window or by clearing your current browser’s cache and cookies before visiting the site.

If the error returns no details, make sure the domain no longer points to the CDN but directly to the hosting.
To find the base IP address, first check the current IP in the DNS zone, then open this guide:

https://help.ovhcloud.com/csm/fr-web-hosting-clusters-ip-addresses?id=kb_article_view&sysparm_article=KB0052378

For example, if your current IP is 46.105.204.8, scroll up a little in the guide and find the line for the same cluster that says “France”; you will see 213.186.33.2, which is the base IP for cluster102.
This is only an example; you should verify in the guide which IP to set for your own cluster.

Then you can modify the A record of your domain to point to the original IP of your hosting.
Here is our guide on DNS zone configuration:

https://help.ovhcloud.com/csm/fr-dns-edit-dns-zone?id=kb_article_view&sysparm_article=KB0051684

Changing the A record can take up to 24 hours to propagate.

Finally, since OVHcloud does not provide technical support for the configuration and/or content of your hosting, if you need further assistance with this issue I recommend contacting one of our partners using the link below:

https://partner.ovhcloud.com/fr/directory/

What do you think?

Thank you!

Yes, I see the same symptoms, but with other 5xx errors. Support's response boils down to: "It's not us, move along!" Yet from time to time (in my case) OVH displays its own branded "hosting in maintenance" error page, and they are currently unable to tell where it comes from (someone reopened the "502" ticket to try to find out). Worse, they shift the blame onto us!

OVH, it's high time to investigate further.

My 502 thread: Erreur 502 Bad Gateway, pas d'accès au site depuis 8 heures (cluster024) - numéro 31

My 504 thread: 502 puis maintenant c'est 504 Gateway Time-out, où va t'on chez OVH? - #5 par SteveT

And what do the raw access logs say? Excessive bot traffic?

@TTY during outage periods, the "access" logs are almost empty (only static resources). However, the "error" logs are full of

AH10141: FastCGI: comm with server "..." aborted: idle timeout (160 sec)
AH10149: FastCGI: incomplete headers (0 bytes) received from server "..."

This does not seem related to traffic: the graphs show that it crashes even during periods when traffic is below the max.

For bots, I have blocked entire IP ranges via .htaccess and also block user‑agents (also via .htaccess) and finally I return a 403 as soon as the request comes from a bot on a page disallowed by robots.txt
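To illustrate the last rule (a 403 for bots on pages disallowed by robots.txt), here is a minimal Python sketch of the decision. It is not the site's actual code: the robots.txt content, the `BOT_MARKERS` list and the `status_for` helper are all hypothetical, and the real rule lives in .htaccess/PHP.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

# Hypothetical substrings used to recognise self-declared bots.
BOT_MARKERS = ("bot", "spider", "crawl")


def status_for(user_agent, path, robots):
    """Return 403 for a declared bot on a disallowed path, else 200."""
    ua = user_agent.lower()
    is_bot = any(marker in ua for marker in BOT_MARKERS)
    if is_bot and not robots.can_fetch(user_agent, path):
        return 403
    return 200


robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())
```

This only catches bots that announce themselves in the User-Agent; a crawler sending a browser-like User-Agent passes straight through.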

I think your PHP pool is saturated:

  • during the downtime Apache is idle and only serves static files
  • idle timeout 160 sec + incomplete headers (0 bytes)

Apache forwarded the request to the FPM socket, but the PHP pool no longer has any workers available to handle the request, resulting in a timeout on the fcgi proxy side; FPM closes the connection without sending a header.
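A back-of-the-envelope way to see why this can happen even at low traffic: the number of busy PHP workers is roughly the request rate multiplied by the time each request takes (Little's law). The sketch below uses made-up numbers; the pool size of 8 is purely hypothetical, as the real limits of the plan are not stated in this thread.

```python
def busy_workers(requests_per_second, seconds_per_request):
    """Average number of workers occupied at once (Little's law)."""
    return requests_per_second * seconds_per_request


def is_saturated(requests_per_second, seconds_per_request, pool_size):
    """True when the average demand meets or exceeds the pool size."""
    return busy_workers(requests_per_second, seconds_per_request) >= pool_size


POOL_SIZE = 8  # hypothetical, not an OVH figure

# 2 req/s at 0.2 s each keeps well under one worker busy on average.
print(is_saturated(2, 0.2, POOL_SIZE))
# The same 2 req/s at 5 s each (slow or hanging scripts) needs 10 workers.
print(is_saturated(2, 5, POOL_SIZE))
```

So a pool can fill up without any traffic peak, as soon as individual requests become slow enough.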

Unfortunately this is becoming more and more common; I see it every day on my hosts. .htaccess rules have their limits now that this kind of practice has become widespread (see an extreme case here: Résistance aux bots - #18 par Sich). At OVH or elsewhere, it will be the same.
It will be hard to defend yourself against this cheaply while staying on shared hosting. Cloudflare, provided you invest the time to understand how it works, is certainly your best option.

@Sich [conspiracy mode]I wonder if it's not CF that organizes such bot traffic to capture customers[/conspiracy mode]

It's a whole set of things.

The explosion of bots by “legitimate” AI actors – they’re extremely aggressive, scan frequently, and are numerous.
On top of that, more and more “aggressive” bots are looking for vulnerabilities. With AI, it’s getting easier and easier to spawn a bot of this kind, and likewise easier to search for and exploit flaws.

My take on the issue? Go back to the good old static site whenever possible (obviously not an option for an online store). For example, my professional site and my personal blog are completely static (using Cecil).
There are plenty of tools to publish fully‑HTML sites by writing articles in Markdown and then building the site. You just have to transfer the data afterward.

A static site means zero server load; it can be deployed on any shared hosting for €5 /month, it can even be hosted for free on Cloudflare Pages, and behind a CDN it can handle tens of thousands of simultaneous visitors at no extra cost.

For those who can’t do without dynamic sites (honestly, this only concerns a small number of small‑scale operators), first look at SaaS solutions (e.g., an online shop) to outsource the site’s maintenance and security as well as the underlying infrastructure. Self‑hosting should only be considered if you have the skills and/or budget to do it.

The web of the 2000s is dead, and I miss it terribly. But there’s no point in lamenting; we have to adapt to this new reality.

As for Cloudflare, they’re simply capitalizing on this new goldmine: given the flood of bots crawling the internet today, running your own servers securely has become virtually impossible.

Regarding the original topic, two things need to be distinguished:

  • the OVH infrastructure issue – a noisy neighbor, in short, a problem unrelated to the site itself but affecting the broader infrastructure. This kind of thing can happen on shared hosting. It’s supposed to be limited, but it does occur.
  • The other problem can indeed stem from the site itself consuming too many resources, typically due to aggressive bot scanning that overwhelms the modest resources of a shared host.

In the first case, there’s nothing to do except make the site less dependent on dynamic components (static caching, CDN‑level caching, static site). In the second case, likewise, make the site “lighter” for the host. This can involve htaccess rules that block certain bots, implementing caching (local or at the infra level), or, again, stripping out the dynamic part in favor of plain HTML.

And in any case, you can’t perform miracles with a shared‑hosting budget of just a few euros per month.

@TTY thanks for your answer!

What troubles me is that I’m paying for a "Performance" hosting plan, which is supposed to come with guaranteed resources, and the load on my site is low at the moment.

For your info, I changed the DNS records so as not to go through the CDN (that’s the support’s advice to see the source of the error). However, now it’s a "504 Gateway Time-out".

Hello @ChristopheM5

I couldn't find the address of your site anywhere in the comment.

Could you give it to us?

Also, do most of your site's visitors come from France, Europe, or worldwide?

If the majority of your visitors are from France, the CDN option isn’t useful at all and could even disrupt access to your site.

I suggest checking your site's configuration using pages A through J of my guide:

https://wordetweb.com/word-et-web/WORDPRESS-guide-installation-de-WordPress-premier-domaine-chez-OVH-FR.htm

Thanks @Gaston. The address is chat‑perdu.org and it’s not based on WordPress.

And I suppose that by changing the DNS records, I’ve effectively disabled the CDN.

The site works for me but the performance isn’t great :frowning:
Then again, it’s not a "standard" site: lots of pages means a lot of surface for bots.
Did you make any code changes recently?

EDIT: the performance is spot‑on now, so yes, it’s very variable.
Share an hour of access logs to see (pastebin or other).

No @ChristopheM5, you also need to configure it in:

Manager OVH > Web Cloud > Hosting > YourDomain > General information

Is there a lot of activity on the site?

Since there is quite a lot of data, I assume there is activity.
No wonder a shared hosting plan can sometimes struggle.

The Performance plan does not guarantee huge resources.

First of all, via Cloudflare, we should make sure an unauthenticated user receives cached content without hitting the server. And for that, we should avoid sending a session cookie just for browsing the site without logging in.

When the site is updated, initially purge all CF cache via the API, and ideally only remove from the cache the pages that changed, to keep a long‑term cache on CF (1 week, maybe more).
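As a rough idea of what the selective purge could look like, here is a Python sketch that prepares calls to Cloudflare's `purge_cache` endpoint for a list of changed URLs. The zone ID, token and URLs are placeholders, and the batch size of 30 is an assumption to check against Cloudflare's current limits for your plan. The requests are only built here, not sent.

```python
import json
import urllib.request

API = "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache"
BATCH_SIZE = 30  # assumption: verify the per-call limit in Cloudflare's docs


def build_purge_requests(zone_id, token, urls):
    """Build one POST request per batch of changed URLs (nothing is sent)."""
    requests = []
    for start in range(0, len(urls), BATCH_SIZE):
        body = json.dumps({"files": urls[start:start + BATCH_SIZE]})
        requests.append(urllib.request.Request(
            API.format(zone_id=zone_id),
            data=body.encode("utf-8"),
            method="POST",
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
        ))
    return requests
```

Each prepared request can then be sent with `urllib.request.urlopen(request)`. Purging only the pages touched by an update is what makes a long cache lifetime on the CF side workable.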

Maybe here it is simply that there is too much activity for a shared‑hosting/performance plan.
So either the infrastructure needs to be revisited (see my Cloudflare suggestion), or move to a higher‑performance hosting.

Is there a lot of server‑side cache? Or is everything recalculated each time?

@TTY I hadn't made any major changes. And especially, the anomaly would appear/disappear without any intervention on my part, in the middle of the night or in broad daylight (low traffic vs high traffic).

@Sich I don't use Cloudflare. I already generate a local static cache, but there is a lot of activity: over 200 posts and 1,000 comments per day, each of which requires purging the static cache of the affected pages. This reduces its effectiveness somewhat.

I made an update yesterday that seems to block bots much better than before.
In short, I’m being harassed by Bytespider (TikTok), which is particularly nasty.

  1. it doesn’t always identify itself in the User-Agent
  2. it of course doesn’t respect robots.txt
  3. and it mainly uses a trick that is especially hard to block.

The trick is as follows:
31.126.220.178 www.chat-perdu.org - [27/May/2026:00:03:24 +0200] "GET /en-ng/921941 HTTP/1.1" 200 859 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.6848.1794 Mobile Safari/537.36"
151.241.251.87 www.chat-perdu.org - [27/May/2026:00:03:26 +0200] "POST /challenge.php HTTP/1.1" 404 30369 "https://www.chat-perdu.org/en-ng/921941" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.8329.1633 Safari/537.36"

These two requests follow each other, and the second even carries the URL of the first as its referrer. But they come from two different IPs.

And I have thousands of identical cases where the IP is used for only one request, occasionally a tiny bit more.
So it’s impossible to block the IP at the htaccess level (even though I already block entire, clearly identified ranges).
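To put a number on this, here is a small Python sketch that measures how many client IPs appear only once in an access log. It assumes, as in the two lines above, that the client IP is the first space-separated field of each line.

```python
from collections import Counter


def single_use_ratio(lines):
    """Fraction of distinct client IPs that made exactly one request."""
    counts = Counter(line.split(" ", 1)[0] for line in lines if line.strip())
    if not counts:
        return 0.0
    single = sum(1 for hits in counts.values() if hits == 1)
    return single / len(counts)
```

A ratio close to 1 over a busy period is a strong hint that the traffic comes from a large rotating pool of addresses rather than from ordinary visitors, who usually load several pages from the same IP.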

From now on, when a sensitive page loads, there’s a JS challenge that requires:

  • executing the JS
  • accepting cookies and sending a cookie that matches the IP

So far, the result is: when a page loads without a valid cookie, Bytespider sees the challenge (it’s just a tiny piece of HTML/JS, so the server load is minimal).
It executes the JS but makes the POST request to the challenge from a different IP. Hence the challenge fails.
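For readers wondering how a cookie can "match the IP", one possible way (a sketch, not necessarily how it is done on this site) is to issue an HMAC of the client IP as the cookie value and recompute it on each request. The secret and the function names below are hypothetical, and the example is in Python rather than PHP purely for illustration.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it must never be sent to the client.
SECRET = b"change-me"


def make_token(ip):
    """Cookie value issued once the challenge is passed from this IP."""
    return hmac.new(SECRET, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid(ip, cookie):
    """True only if the cookie was issued for the IP making the request."""
    expected = make_token(ip).encode("utf-8")
    return hmac.compare_digest(expected, cookie.encode("utf-8"))


# A token issued to the first IP is rejected when replayed from the second.
token = make_token("31.126.220.178")
print(is_valid("31.126.220.178", token))
print(is_valid("151.241.251.87", token))
```

The drawback of binding to the IP is that a legitimate visitor whose address changes (mobile networks, for example) has to pass the challenge again.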

Fingers crossed this trick holds up a bit.

According to the latest news, Bytespider is using about 12,000 IPv4 addresses.

Your approach looks pretty good to me, kudos already :slight_smile:
But a few remarks:

  • Bytespider, even if it doesn’t follow many rules (it’s not the only one, hello Facebook), doesn’t lie about its UA.
  • 31.126.220.178 is a residential IP (British Telecom)
  • 151.241.251.87 is a datacenter IP
  • That doesn’t stop the bot from making requests and therefore using PHP resources (even if it’s better)

From what you say (multiple unique IPs) it’s possible that your site is being "targeted" by a rotating‑proxy bot. If that’s the case, it will be hard to defend yourself if it ramps up its requests per second.