@dawidkurdyla
This pull request introduces first‑class support for pulling input data from S3/MinIO and pushing output data back, while preserving backward‑compatible behaviour for existing HyperFlow jobs.

Key points:

  • Data stager. New data‑stager.js and storage/s3Adapter.js modules implement:

    • Concurrent downloads from S3 prefixes/keys into the executor’s input directory. Include/exclude glob patterns and optional recursion are supported.
    • Concurrent uploads of output files back to S3, with configurable overwrite and layout (e.g. {stem}.ext to derive object names from input stems).
    • Concurrency and retries are controlled via HF_S3_CONCURRENCY and HF_S3_RETRIES; both have defaults, so neither has to be set.
    • Optional local cleanup (HF_TASK_CLEANUP_LOCAL=1) to remove downloaded inputs and uploaded outputs after job completion.
    • Uses AWS SDK v3 (@aws-sdk/client-s3 and @aws-sdk/lib-storage). Endpoint, path‑style and region settings are read from HF_S3_ENDPOINT, HF_S3_FORCE_PATH_STYLE and AWS_REGION or AWS_DEFAULT_REGION, so the same code works against both MinIO and AWS.
  • Connector tweaks. The RemoteJobConnector now uses a keys object to refer to wf::tasksPendingCompletionHandling and wf::completedTasks, making it easier to mark tasks as completed or ready for completion handling.

  • Environment variables & defaults. New variables:

    • HF_VAR_USE_S3_IO – enable S3 downloads/uploads; off by default to maintain old behaviour.
    • HF_S3_ENDPOINT, HF_S3_FORCE_PATH_STYLE, AWS_REGION/AWS_DEFAULT_REGION – S3/MinIO config.
    • HF_S3_CONCURRENCY, HF_S3_RETRIES – concurrency and retry controls.
    • HF_TASK_CLEANUP_LOCAL – remove local data after successful upload.
    • Existing workflows without these variables continue to run as before.
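
For reviewers who want a concrete picture of how these variables might be turned into a configuration object, here is a minimal sketch. It is not code from this PR: the function names (readS3Config, parseBool, parseIntAtLeast), the accepted truthy strings, the preference of AWS_REGION over AWS_DEFAULT_REGION, and the numeric defaults (4 and 3) are all assumptions made for illustration.

```javascript
// Hypothetical helpers: the real parsing and default values in this PR may differ.

// Treat "1", "true" and "yes" (any case) as enabled; everything else is off.
function parseBool(value) {
  return ['1', 'true', 'yes'].includes(String(value ?? '').toLowerCase());
}

// Parse an integer >= min, falling back when the value is missing or invalid.
function parseIntAtLeast(value, min, fallback) {
  const n = Number.parseInt(value, 10);
  return Number.isInteger(n) && n >= min ? n : fallback;
}

function readS3Config(env = process.env) {
  return {
    // Off unless explicitly enabled, so existing jobs are unaffected.
    useS3: parseBool(env.HF_VAR_USE_S3_IO),
    // A custom endpoint (e.g. MinIO); undefined lets the SDK target AWS.
    endpoint: env.HF_S3_ENDPOINT || undefined,
    forcePathStyle: parseBool(env.HF_S3_FORCE_PATH_STYLE),
    // Assumed precedence: AWS_REGION first, then AWS_DEFAULT_REGION.
    region: env.AWS_REGION || env.AWS_DEFAULT_REGION || undefined,
    concurrency: parseIntAtLeast(env.HF_S3_CONCURRENCY, 1, 4), // assumed default
    retries: parseIntAtLeast(env.HF_S3_RETRIES, 0, 3), // assumed default
    cleanupLocal: parseBool(env.HF_TASK_CLEANUP_LOCAL),
  };
}
```

With an empty environment this returns useS3: false, which matches the backward‑compatible behaviour described above. The endpoint, region and forcePathStyle fields use the same names as the corresponding S3Client constructor options in AWS SDK v3.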

Dependencies. Adds @aws-sdk/client-s3, @aws-sdk/lib-storage and minimatch, and updates amqplib to the latest version while retaining the callback‑based AMQP API for backward compatibility.
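
The concurrent transfers with retries described under the data stager can be pictured as a small worker pool. The sketch below is illustrative only and is not taken from data‑stager.js: withRetries, runPool, the exponential backoff policy and the baseDelayMs parameter are hypothetical. It assumes the concurrency and retries values come from HF_S3_CONCURRENCY and HF_S3_RETRIES.

```javascript
// Hypothetical helpers; names and backoff policy are illustrative only.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run task(); on failure retry up to `retries` more times with
// exponential backoff, then rethrow the last error.
async function withRetries(task, retries, baseDelayMs = 200) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Process items with at most `concurrency` workers in flight.
// Results are returned in the same order as the input items.
async function runPool(items, worker, { concurrency, retries, baseDelayMs = 200 }) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next;
      next += 1;
      results[index] = await withRetries(() => worker(items[index]), retries, baseDelayMs);
    }
  }

  const lanes = Array.from({ length: Math.min(concurrency, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}
```

In a stager built this way, the worker would wrap a single object download or upload (for example an Upload from @aws-sdk/lib-storage), and items would be the list of S3 keys or local output files. Note that in this sketch a transfer that exhausts its retries rejects the whole pool, while lanes that are already running finish their current item.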
