Data Platform - Why does my Python script run perfectly locally but fail when deployed as a Custom Function in OVH Data Platform?
Question

Why does my Python script run perfectly locally but fail when deployed as a Custom Function in OVH Data Platform?

By piv
Created on 2025-06-19 10:07:45 in Data Platform

Hello,

I'm currently deploying a Custom Function on the OVH Data Platform, and I'm encountering issues when trying to run my Python script in that environment.


What my script does

This script performs an automated evaluation of PDF files using OpenAI's GPT-4o model. The steps include:

  1. Loading PDFs from a mounted bucket (via the environment variable BUCKET_PATH)

  2. Extracting text from the PDFs using PyMuPDF

  3. Converting pages to images using pdf2image

  4. Creating prompts (text + image) from templates stored in other files (via PROMPT_TEXTE_PATH and PROMPT_VISION_PATH)

  5. Sending requests to OpenAI using the openai Python SDK

  6. Extracting numeric scores from the model's response

  7. Saving the results in .xlsx and .txt format to a result bucket (OUTPUT_PATH)
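
Steps 1 and 7 depend on environment variables that exist in both the local and the deployed environment. A minimal sketch of how they could be validated up front, so that a missing variable produces a clear error before any PDF is processed. The four variable names come from the list above; `load_paths` and `list_pdfs` are hypothetical helper names, not part of the original script.

```python
import os
from pathlib import Path

# Variable names taken from the post; the helpers below are illustrative only.
REQUIRED_VARS = ("BUCKET_PATH", "PROMPT_TEXTE_PATH", "PROMPT_VISION_PATH", "OUTPUT_PATH")


def load_paths(environ=None):
    """Return the configured paths, failing early if a variable is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: Path(environ[name]) for name in REQUIRED_VARS}


def list_pdfs(bucket_path):
    """Return the PDF files found directly under the mounted bucket, sorted by path."""
    return sorted(p for p in Path(bucket_path).iterdir() if p.suffix.lower() == ".pdf")
```

Calling `load_paths()` at the very start of the function makes configuration problems show up in the platform logs as a single explicit message.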

 

Problem

The exact same code works perfectly when run locally in Python 3.11.

However, when I deploy it as a Custom Function on the OVH Data Platform, it fails, even though all variables and dependencies are correctly configured and I’m not using the event object for input.


Logs

Here is the detailed log of the failure:

2025-06-19 07:18:59 [NOTICE] "New custom" PROVISIONING
2025-06-19 07:18:59 [NOTICE] "New custom" QUEUED
2025-06-19 07:18:59 [NOTICE] "New custom" SUBMITTED
2025-06-19 07:19:25 [INFO] Waiting for exchange init: Operator - True, Worker - True
2025-06-19 07:19:25 [NOTICE] "New custom" CONFIGURING ACTION
2025-06-19 07:19:25 [NOTICE] "New custom" BEGIN action
2025-06-19 07:19:25 [NOTICE] BEGIN action "New custom with 1 task(s)'
2025-06-19 07:19:25 [NOTICE] "New custom" BEGIN (custom) action.
Traceback (most recent call last):
File "/opt/dpe/forepaas/actions/custom/fpcustom/fpcustom.py", line 37, in custom
return func(event)
^^^^^^^^^^^
File "/opt/dpe/funcs/684c128435731a25829efc74/test_ovh.py", line 27, in extract_text
assert isinstance(pdf_path, (str, Path)), f"❌ pdf_path doit être un chemin, pas {type(pdf_path)}"
AssertionError: ❌ pdf_path doit être un chemin, pas <class 'forepaas.worker.event.event.Event'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/eventlet/hubs/hub.py", line 471, in fire_timers
timer()
File "/usr/local/lib/python3.11/site-packages/eventlet/hubs/timer.py", line 59, in __call__
cb(*args, **kw)
File "/usr/local/lib/python3.11/site-packages/eventlet/semaphore.py", line 147, in _do_acquire
waiter.switch()
File "/usr/local/lib/python3.11/site-packages/eventlet/greenthread.py", line 265, in main
result = function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/dpe/forepaas/actions/custom/fpcustom/fpcustom.py", line 44, in custom
raise Exception(msg)
Exception: ----Error in custom: ❌ pdf_path doit être un chemin, pas <class 'forepaas.worker.event.event.Event'> L: 37
2025-06-19 07:19:26 [CRITICAL] "New custom" ----Error in custom: ❌ pdf_path doit être un chemin, pas <class 'forepaas.worker.event.event.Event'> L: 37
2025-06-19 07:19:26 [NOTICE] "New custom" END (custom) action. Duration: 0.811 sec
2025-06-19 07:19:26 [NOTICE] "New custom" BEGIN action "Flush dataplant cache with 1 task(s)'
2025-06-19 07:19:26 [NOTICE] "Flush dataplant cache" BEGIN (forepaas-flushall) action.
2025-06-19 07:19:26 [INFO] "Flush dataplant cache" Start update metas
2025-06-19 07:19:39 [WARNING] "Flush dataplant cache" SELECT COUNT(*) as nb FROM default_dataset.pdf_example
2025-06-19 07:19:40 [INFO] "Flush dataplant cache" End update metas with status: SUCCESS, took 13.5 seconds
2025-06-19 07:19:43 [CRITICAL] "Flush dataplant cache" Critical error cause : critical in flush cache QB HTTPConnectionPool(host='127.0.0.1', port=9000): Max retries exceeded with url: /flushcache (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f67a8169850>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')) in /usr/local/lib/python3.11/site-packages/forepaas/worker/cache/flushcacheqb.py line 29 in /opt/dpe/services/worker/providers/worker.py line 163
2025-06-19 07:19:43 [NOTICE] "Flush dataplant cache" END (forepaas-flushall) action. Duration: 16.516 sec
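
Reading the traceback above: `fpcustom.py` calls `func(event)`, and the assertion in `extract_text` (whose message translates to "pdf_path must be a path, not ...") reports that `pdf_path` is a `forepaas.worker.event.event.Event`. One plausible reading is that the platform always passes its event object as the first argument to the function it invokes, and that this argument reaches `extract_text` where a path is expected. A minimal sketch of a wrapper that accepts the event, ignores it, and builds the paths from `BUCKET_PATH` instead. The name `main` is hypothetical, `extract_text` here is a stand-in for the script's own function, and how the entry function is selected in the Custom Function settings is not shown in the post.

```python
import os
from pathlib import Path


def extract_text(pdf_path):
    """Stand-in for the script's own extract_text (which uses PyMuPDF)."""
    assert isinstance(pdf_path, (str, Path)), f"pdf_path must be a path, not {type(pdf_path)}"
    return f"text of {Path(pdf_path).name}"


def main(event=None):
    """Hypothetical entry point: takes the platform's event and does not use it."""
    bucket = Path(os.environ["BUCKET_PATH"])  # variable name taken from the post
    results = {}
    for pdf_path in sorted(bucket.glob("*.pdf")):
        # Each call receives a real path, never the event object.
        results[pdf_path.name] = extract_text(pdf_path)
    return results
```

With this shape, the same `main` can be called locally as `main()` and by the platform as `main(event)`, provided `main` is the function configured as the entry point.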

Can you please help me understand why this same script fails inside the platform even though it runs locally with no issues?

Thanks a lot for your support,

Best regards,


1 Reply (latest reply on 2025-06-24 08:01:29 by ^FabL)

Hello @piv 

If the problem persists, please add more details and describe any tests you have performed since creating your post.

If not, please feel free to share the solution you've found so that as many people as possible can benefit from it.

^FabL