Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 4 additions & 64 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,70 +39,6 @@ You can also write your own custom cluster manager; see the instructions in the

For Slurm, please see the [SlurmClusterManager.jl](https://github.com/JuliaParallel/SlurmClusterManager.jl) package.

### SGE - a simple interactive example

```julia
julia> using ClusterManagers

julia> ClusterManagers.addprocs_sge(5; qsub_flags=`-q queue_name`)
job id is 961, waiting for job to start .
5-element Array{Any,1}:
2
3
4
5
6

julia> @parallel for i=1:5
run(`hostname`)
end

julia> From worker 2: compute-6
From worker 4: compute-6
From worker 5: compute-6
From worker 6: compute-6
From worker 3: compute-6
```

Some clusters require the user to specify a list of required resources.
For example, it may be necessary to specify how much memory will be needed by the job - see this [issue](https://github.com/JuliaLang/julia/issues/10390).
The keyword `qsub_flags` can be used to specify these and other options.
Additionally the keyword `wd` can be used to specify the working directory (which defaults to `ENV["HOME"]`).

```julia
julia> using Distributed, ClusterManagers

julia> addprocs_sge(5; qsub_flags=`-q queue_name -l h_vmem=4G,tmem=4G`, wd=mktempdir())
Job 5672349 in queue.
Running.
5-element Array{Int64,1}:
2
3
4
5
6

julia> pmap(x->run(`hostname`),workers());

julia> From worker 26: lum-7-2.local
From worker 23: pace-6-10.local
From worker 22: chong-207-10.local
From worker 24: pace-6-11.local
From worker 25: cheech-207-16.local

julia> rmprocs(workers())
Task (done)
```

### SGE via qrsh

`SGEManager` uses SGE's `qsub` command to launch workers, which communicate the
TCP/IP host:port info back to the master via the filesystem. On filesystems
that are tuned to make heavy use of caching to increase throughput, launching
Julia workers can frequently timeout waiting for the standard output files to appear.
In this case, it's better to use the `QRSHManager`, which uses SGE's `qrsh`
command to bypass the filesystem and captures STDOUT directly.

### Using `LocalAffinityManager` (for pinning local workers to specific cores)

- Linux only feature.
Expand Down Expand Up @@ -149,3 +85,7 @@ ElasticManager:
By default, the printed command uses the absolute path to the current Julia executable and activates the same project as the current session. You can change either of these defaults by passing `printing_kwargs=(absolute_exename=false, same_project=false))` to the first form of the `ElasticManager` constructor.

Once workers are connected, you can print the `em` object again to see them added to the list of active workers.

### Sun Grid Engine (SGE)

See [`docs/sge.md`](docs/sge.md)
70 changes: 70 additions & 0 deletions docs/sge.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
# Sun Grid Engine (SGE)

> [!WARNING]
> The SGE functionality is not currently being maintained.
>
> We are currently seeking a new maintainer for the SGE functionality. If you are an active user of SGE and are interested in being a maintainer, please open a GitHub issue - say that you are interested in being a maintainer for the SGE functionality.

## SGE via `qsub`: Use `ClusterManagers.addprocs_sge` (or `ClusterManagers.SGEManager`)

```julia
julia> using ClusterManagers

julia> ClusterManagers.addprocs_sge(5; qsub_flags=`-q queue_name`)
job id is 961, waiting for job to start .
5-element Array{Any,1}:
2
3
4
5
6

julia> @parallel for i=1:5
run(`hostname`)
end

julia> From worker 2: compute-6
From worker 4: compute-6
From worker 5: compute-6
From worker 6: compute-6
From worker 3: compute-6
```

Some clusters require the user to specify a list of required resources.
For example, it may be necessary to specify how much memory will be needed by the job - see this [issue](https://github.com/JuliaLang/julia/issues/10390).
The keyword `qsub_flags` can be used to specify these and other options.
Additionally the keyword `wd` can be used to specify the working directory (which defaults to `ENV["HOME"]`).

```julia
julia> using Distributed, ClusterManagers

julia> addprocs_sge(5; qsub_flags=`-q queue_name -l h_vmem=4G,tmem=4G`, wd=mktempdir())
Job 5672349 in queue.
Running.
5-element Array{Int64,1}:
2
3
4
5
6

julia> pmap(x->run(`hostname`),workers());

julia> From worker 26: lum-7-2.local
From worker 23: pace-6-10.local
From worker 22: chong-207-10.local
From worker 24: pace-6-11.local
From worker 25: cheech-207-16.local

julia> rmprocs(workers())
Task (done)
```

## SGE via `qrsh`: Use `ClusterManagers.addprocs_qrsh` (or `ClusterManagers.QRSHManager`)

`SGEManager` uses SGE's `qsub` command to launch workers, which communicate the
TCP/IP host:port info back to the master via the filesystem. On filesystems
that are tuned to make heavy use of caching to increase throughput, launching
Julia workers can frequently timeout waiting for the standard output files to appear.
In this case, it's better to use the `QRSHManager`, which uses SGE's `qrsh`
command to bypass the filesystem and captures STDOUT directly.
Loading