GPU and CPU idle during training

Hello,
I've been using AI Training for several large training jobs, and every time the training takes abnormally long (and is billed accordingly). A job that should take no more than 5 days takes 30 days.
The Job Monitoring interface shows that both CPU and GPU are idle most of the time.


I use a custom image based on python:3.9 (which is based on Debian buster).
The data is located in a mounted object storage volume (RW:cache). Note that the problem was the same without the cache. Outputs are stored in the same storage.
I suspect an I/O or journaling problem, or a problem related to the use/sync of object storage, but I cannot investigate it because I/O monitoring is not available.
Am I doing something wrong?

Hello @DamienL38,

If the problem is still occurring, please provide more details and/or describe the tests you have run so that the community can give you feedback.
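
As one example of such a test, since I/O monitoring is not available in the interface, read throughput could be measured from inside the job itself. The following is only a rough sketch: the function name measure_read_throughput and the path /workspace/data are hypothetical placeholders, not part of AI Training, and the mount path of the actual volume would have to be substituted.

```python
import os
import time


def measure_read_throughput(root, max_bytes=1 << 30, chunk_size=1 << 20):
    """Read files under `root` sequentially, stopping near `max_bytes`.

    Returns (bytes_read, elapsed_seconds, mib_per_second).
    """
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                # Read in fixed-size chunks so large files do not fill memory.
                while total < max_bytes:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
            if total >= max_bytes:
                break
        if total >= max_bytes:
            break
    elapsed = time.perf_counter() - start
    rate = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


# Example call (hypothetical mount path, adjust to the job's volume):
# n, secs, mib_s = measure_read_throughput("/workspace/data")
# print(f"read {n} bytes in {secs:.2f} s ({mib_s:.1f} MiB/s)")
```

Running it once on the mounted volume and once on a local copy of the same files would indicate whether reads from the object storage mount are the bottleneck. A second run over the same files may be served from the operating system's page cache, so the first run is the more representative one.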

^FabL