From 35c8b2fda8b1b27bc95524ba26d72490b997d1a2 Mon Sep 17 00:00:00 2001
From: Dilum Aluthge
Date: Sun, 9 Feb 2025 18:06:48 -0500
Subject: [PATCH 1/3] Docs: Move SGE docs from README to separate file

---
 README.md   | 64 ------------------------------------------------
 docs/sge.md | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 64 deletions(-)
 create mode 100644 docs/sge.md

diff --git a/README.md b/README.md
index dd76f54..e87c03e 100755
--- a/README.md
+++ b/README.md
@@ -39,70 +39,6 @@ You can also write your own custom cluster manager; see the instructions in the
 
 For Slurm, please see the [SlurmClusterManager.jl](https://github.com/JuliaParallel/SlurmClusterManager.jl) package.
 
-### SGE - a simple interactive example
-
-```julia
-julia> using ClusterManagers
-
-julia> ClusterManagers.addprocs_sge(5; qsub_flags=`-q queue_name`)
-job id is 961, waiting for job to start .
-5-element Array{Any,1}:
-2
-3
-4
-5
-6
-
-julia> @parallel for i=1:5
- run(`hostname`)
- end
-
-julia> From worker 2: compute-6
- From worker 4: compute-6
- From worker 5: compute-6
- From worker 6: compute-6
- From worker 3: compute-6
-```
-
-Some clusters require the user to specify a list of required resources.
-For example, it may be necessary to specify how much memory will be needed by the job - see this [issue](https://github.com/JuliaLang/julia/issues/10390).
-The keyword `qsub_flags` can be used to specify these and other options.
-Additionally the keyword `wd` can be used to specify the working directory (which defaults to `ENV["HOME"]`).
-
-```julia
-julia> using Distributed, ClusterManagers
-
-julia> addprocs_sge(5; qsub_flags=`-q queue_name -l h_vmem=4G,tmem=4G`, wd=mktempdir())
-Job 5672349 in queue.
-Running.
-5-element Array{Int64,1}:
- 2
- 3
- 4
- 5
- 6
-
-julia> pmap(x->run(`hostname`),workers());
-
-julia> From worker 26: lum-7-2.local
- From worker 23: pace-6-10.local
- From worker 22: chong-207-10.local
- From worker 24: pace-6-11.local
- From worker 25: cheech-207-16.local
-
-julia> rmprocs(workers())
-Task (done)
-```
-
-### SGE via qrsh
-
-`SGEManager` uses SGE's `qsub` command to launch workers, which communicate the
-TCP/IP host:port info back to the master via the filesystem. On filesystems
-that are tuned to make heavy use of caching to increase throughput, launching
-Julia workers can frequently timeout waiting for the standard output files to appear.
-In this case, it's better to use the `QRSHManager`, which uses SGE's `qrsh`
-command to bypass the filesystem and captures STDOUT directly.
-
 ### Using `LocalAffinityManager` (for pinning local workers to specific cores)
 
 - Linux only feature.
diff --git a/docs/sge.md b/docs/sge.md
new file mode 100644
index 0000000..8a74b6b
--- /dev/null
+++ b/docs/sge.md
@@ -0,0 +1,70 @@
+# Sun Grid Engine (SGE)
+
+> [!WARNING]
+> The SGE functionality is not currently being maintained.
+>
+> We are currently seeking a new maintainer for the SGE functionality. If you are an active user of SGE and are interested in being a maintainer, please open a GitHub issue - say that you are interested in being a maintainer for the SGE functionality.
+
+## SGE via `qsub`: Use `ClusterManagers.addprocs_sge` (or `ClusterManagers.SGEManager`)
+
+```julia
+julia> using ClusterManagers
+
+julia> ClusterManagers.addprocs_sge(5; qsub_flags=`-q queue_name`)
+job id is 961, waiting for job to start .
+5-element Array{Any,1}:
+2
+3
+4
+5
+6
+
+julia> @parallel for i=1:5
+ run(`hostname`)
+ end
+
+julia> From worker 2: compute-6
+ From worker 4: compute-6
+ From worker 5: compute-6
+ From worker 6: compute-6
+ From worker 3: compute-6
+```
+
+Some clusters require the user to specify a list of required resources.
+For example, it may be necessary to specify how much memory will be needed by the job - see this [issue](https://github.com/JuliaLang/julia/issues/10390).
+The keyword `qsub_flags` can be used to specify these and other options.
+Additionally the keyword `wd` can be used to specify the working directory (which defaults to `ENV["HOME"]`).
+
+```julia
+julia> using Distributed, ClusterManagers
+
+julia> addprocs_sge(5; qsub_flags=`-q queue_name -l h_vmem=4G,tmem=4G`, wd=mktempdir())
+Job 5672349 in queue.
+Running.
+5-element Array{Int64,1}:
+ 2
+ 3
+ 4
+ 5
+ 6
+
+julia> pmap(x->run(`hostname`),workers());
+
+julia> From worker 26: lum-7-2.local
+ From worker 23: pace-6-10.local
+ From worker 22: chong-207-10.local
+ From worker 24: pace-6-11.local
+ From worker 25: cheech-207-16.local
+
+julia> rmprocs(workers())
+Task (done)
+```
+
+## SGE via `qrsh`: Use `ClusterManagers.addprocs_qrsh` (or `ClusterManagers.QRSHManager`)
+
+`SGEManager` uses SGE's `qsub` command to launch workers, which communicate the
+TCP/IP host:port info back to the master via the filesystem. On filesystems
+that are tuned to make heavy use of caching to increase throughput, launching
+Julia workers can frequently timeout waiting for the standard output files to appear.
+In this case, it's better to use the `QRSHManager`, which uses SGE's `qrsh`
+command to bypass the filesystem and captures STDOUT directly.

From c21f3010bd91544a2da12727b5838b6e765cef05 Mon Sep 17 00:00:00 2001
From: Dilum Aluthge
Date: Sun, 9 Feb 2025 19:14:07 -0500
Subject: [PATCH 2/3] Update README.md

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index e87c03e..1efd3b1 100755
--- a/README.md
+++ b/README.md
@@ -85,3 +85,7 @@ ElasticManager:
 By default, the printed command uses the absolute path to the current Julia executable and activates the same project as the current session.
 You can change either of these defaults by passing `printing_kwargs=(absolute_exename=false, same_project=false))` to the first form of the `ElasticManager` constructor.
 Once workers are connected, you can print the `em` object again to see them added to the list of active workers.
+
+### Sun Grid Engine (SGE)
+
+See [docs/sge.md](docs/sge.md)

From 5c74c82b6d59bf4c61060e7de50481c5309c5340 Mon Sep 17 00:00:00 2001
From: Dilum Aluthge
Date: Sun, 9 Feb 2025 19:14:36 -0500
Subject: [PATCH 3/3] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1efd3b1..3b060d6 100755
--- a/README.md
+++ b/README.md
@@ -88,4 +88,4 @@ Once workers are connected, you can print the `em` object again to see them adde
 
 ### Sun Grid Engine (SGE)
 
-See [docs/sge.md](docs/sge.md)
+See [`docs/sge.md`](docs/sge.md)
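
Note on the moved examples: the first SGE example carried into `docs/sge.md` uses `@parallel`, which was renamed to `@distributed` (in the `Distributed` standard library) as of Julia 0.7/1.0. The patch moves that text unchanged. As a minimal sketch of what the same pattern might look like on a current Julia, assuming local workers from `addprocs(2)` stand in for workers launched by `addprocs_sge` (the names `added`, `total`, and `hosts` are illustrative and not part of ClusterManagers):

```julia
using Distributed

# Assumption: two local workers stand in for SGE workers, so the sketch
# runs without a cluster. On SGE, `addprocs_sge(...)` would take this place.
added = addprocs(2)

# `@distributed` with a reducer blocks until all iterations finish
# and returns the reduced value.
total = @distributed (+) for i in 1:5
    i^2
end

# Ask each worker for its hostname, mirroring the `hostname` calls in the docs.
hosts = pmap(_ -> gethostname(), added)

# Shut the workers down again.
rmprocs(added)
```

Without a reducer, `@distributed` returns immediately, so a loop run only for its side effects is usually wrapped in `@sync`.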