hadoop-submarine/hadoop-submarine-core/src/site/markdown/HowToInstall.md
Lines changed: 12 additions & 12 deletions
Original file line number
Diff line number
Diff line change
@@ -14,23 +14,23 @@
14
14
15
15
# How to Install Dependencies
16
16
17
-
Submarine project uses YARN Service, Docker container, and GPU (when GPU hardware available and properly configured).
17
+
Submarine project uses YARN Service, Docker container and GPU.
18
+
GPU could only be used if a GPU hardware is available and properly configured.
18
19
19
-
That means as an admin, you have to properly setup YARN Service related dependencies, including:
20
+
As an administrator, you have to properly setup YARN Service related dependencies, including:
20
21
- YARN Registry DNS
22
+
- Docker related dependencies, including:
23
+
- Docker binary with expected versions
24
+
- Docker network that allows Docker containers to talk to each other across different nodes
21
25
22
-
Docker related dependencies, including:
23
-
-Docker binary with expected versions.
24
-
-Docker network which allows Docker container can talk to each other across different nodes.
26
+
If you would like to use GPU, you need to set up:
27
+
-GPU Driver
28
+
-Nvidia-docker
25
29
26
-
And when GPU wanna to be used:
27
-
- GPU Driver.
28
-
- Nvidia-docker.
29
-
30
-
For your convenience, we provided installation documents to help you to setup your environment. You can always choose to have them installed in your own way.
30
+
For your convenience, we provided some installation documents to help you setup your environment. You can always choose to have them installed in your own way.
31
31
32
32
Use Submarine installer to install dependencies: [EN](https://github.com/hadoopsubmarine/hadoop-submarine-ecosystem/tree/master/submarine-installer)[CN](https://github.com/hadoopsubmarine/hadoop-submarine-ecosystem/blob/master/submarine-installer/README-CN.md)
33
33
34
-
Alternatively, you can follow manual install dependencies: [EN](InstallationGuide.html)[CN](InstallationGuideChineseVersion.html)
34
+
Alternatively, you can follow this guide to manually install dependencies: [EN](InstallationGuide.html)[CN](InstallationGuideChineseVersion.html)
35
35
36
-
Once you have installed dependencies, please follow following guide to [TestAndTroubleshooting](TestAndTroubleshooting.html).
36
+
Once you have installed all the dependencies, please follow this guide: [TestAndTroubleshooting](TestAndTroubleshooting.html).
hadoop-submarine/hadoop-submarine-core/src/site/markdown/InstallationGuide.md
Lines changed: 47 additions & 32 deletions
Original file line number
Diff line number
Diff line change
@@ -16,20 +16,25 @@
16
16
17
17
## Prerequisites
18
18
19
-
(Please note that all following prerequisites are just an example for you to install. You can always choose to install your own version of kernel, different users, different drivers, etc.).
19
+
Please note that the following prerequisites are just an example for you to install Submarine.
20
+
21
+
You can always choose to install your own version of kernel, different users, different drivers, etc.
20
22
21
23
### Operating System
22
24
23
-
The operating system and kernel versions we have tested are as shown in the following table, which is the recommneded minimum required versions.
25
+
The operating system and kernel versions we have tested against are shown in the following table.
26
+
The versions in the table are the recommended minimum required versions.
24
27
25
-
|Enviroment|Verion|
28
+
|Environment|Version|
26
29
| ------ | ------ |
27
30
| Operating System | centos-release-7-5.1804.el7.centos.x86_64 |
28
-
|Kernal| 3.10.0-862.el7.x86_64 |
31
+
|Kernel| 3.10.0-862.el7.x86_64 |
29
32
30
33
### User & Group
31
34
32
-
As there are some specific users and groups recommended to be created to install hadoop/docker. Please create them if they are missing.
35
+
There are specific users and groups recommended to be created to install Hadoop with Docker.
36
+
37
+
Please create these users if they do not exist.
33
38
34
39
```
35
40
adduser hdfs
@@ -80,7 +85,9 @@ lspci | grep -i nvidia
80
85
81
86
### Nvidia Driver Installation (Only for Nvidia GPU equipped nodes)
82
87
83
-
To make a clean installation, if you have requirements to upgrade GPU drivers. If nvidia driver/cuda has been installed before, They should be uninstalled firstly.
88
+
To make a clean installation, if you have requirements to upgrade GPU drivers.
89
+
90
+
If nvidia driver / CUDA has been installed before, they should be uninstalled as a first step.
-
Add a file, named daemon.json, under the path of /etc/docker/. Please replace the variables of image_registry_ip, etcd_host_ip, localhost_ip, yarn_dns_registry_host_ip, dns_host_ip with specific ips according to your environments.
217
+
Add a file, named daemon.json, under the path of /etc/docker/.
218
+
219
+
Please replace the variables of image_registry_ip, etcd_host_ip, localhost_ip, yarn_dns_registry_host_ip, dns_host_ip with specific IPs according to your environment.
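
A minimal sketch of how the placeholder replacement described above could be double-checked before restarting Docker. It assumes the placeholders appear in daemon.json as the literal names listed in the guide; the helper name `leftover_placeholders` and the sample file content are hypothetical and only for illustration.

```python
# Placeholder names taken from the guide text. How exactly they are written
# inside daemon.json (for example wrapped in ${...}) is an assumption, so this
# sketch uses plain substring matching.
PLACEHOLDERS = (
    "image_registry_ip",
    "etcd_host_ip",
    "localhost_ip",
    "yarn_dns_registry_host_ip",
    "dns_host_ip",
)


def leftover_placeholders(text):
    """Return the placeholder names that still occur in the given file text."""
    return [name for name in PLACEHOLDERS if name in text]


# Hypothetical daemon.json content in which one placeholder was not replaced.
sample = '{"dns": ["${yarn_dns_registry_host_ip}", "10.0.0.2"]}'
print(leftover_placeholders(sample))
```

On a real node the text would be read from /etc/docker/daemon.json instead of the sample string. An empty list suggests every placeholder was replaced, but it does not validate the IP addresses themselves.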
-
There is no need to install CUDNN and CUDA on the servers, because CUDNN and CUDA can be added in the docker images. We can get basic docker images by referring to [Write Dockerfile](WriteDockerfileTF.html).
318
+
There is no need to install CUDNN and CUDA on the servers, because CUDNN and CUDA can be added in the docker images.
319
+
320
+
We can get or build basic docker images by referring to [Write Dockerfile](WriteDockerfileTF.html).
308
321
309
322
### Test tensorflow in a docker container
310
323
311
324
After docker image is built, we can check
312
-
Tensorflow environments before submitting a yarn job.
325
+
Tensorflow environments before submitting a Submarine job.
313
326
314
327
```shell
315
328
$ docker run -it ${docker_image_name} /bin/bash
@@ -336,8 +349,8 @@ If there are some errors, we could check the following configuration.
336
349
337
350
### Etcd Installation
338
351
339
-
etcd is a distributed reliable key-value store for the most critical data of a distributed system, Registration and discovery of services used in containers.
340
-
You can also choose alternatives like zookeeper, Consul.
352
+
etcd is a distributed, reliable key-value store for the most critical data of a distributed system, Registration and discovery of services used in containers.
353
+
You can also choose alternatives like ZooKeeper, Consul or others.
341
354
342
355
To install Etcd on specified servers, we can run Submarine-installer/install.sh
./bin/yarn jar /home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar job run \
@@ -490,11 +505,11 @@ where ${dfs_name_service} is the hdfs name service you use
490
505
```
491
506
492
507
493
-
## Tensorflow Job with GPU
508
+
## TensorFlow Job with GPU
494
509
495
-
### GPU configurations for both resourcemanager and nodemanager
510
+
### GPU configurations for both ResourceManager and NodeManager
496
511
497
-
Add the yarn resource configuration file, named resource-types.xml
512
+
Add the YARN resource configuration file, named resource-types.xml
498
513
499
514
```
500
515
<configuration>
@@ -505,9 +520,9 @@ Add the yarn resource configuration file, named resource-types.xml
505
520
</configuration>
506
521
```
507
522
508
-
#### GPU configurations for resourcemanager
523
+
#### GPU configurations for ResourceManager
509
524
510
-
The scheduler used by resourcemanager must be capacity scheduler, and yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml should be DominantResourceCalculator
525
+
The scheduler used by ResourceManager must be the capacity scheduler, and yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml should be DominantResourceCalculator
511
526
512
527
```
513
528
<configuration>
@@ -518,7 +533,7 @@ The scheduler used by resourcemanager must be capacity scheduler, and yarn.sche
518
533
</configuration>
519
534
```
520
535
521
-
#### GPU configurations for nodemanager
536
+
#### GPU configurations for NodeManager
522
537
523
538
Add configurations in yarn-site.xml
524
539
@@ -536,7 +551,7 @@ Add configurations in yarn-site.xml
536
551
</configuration>
537
552
```
538
553
539
-
Add configurations in container-executor.cfg
554
+
Add configurations to container-executor.cfg
540
555
541
556
```
542
557
[docker]
@@ -560,7 +575,7 @@ Add configurations in container-executor.cfg
560
575
yarn-hierarchy=/hadoop-yarn
561
576
```
562
577
563
-
### Run a distributed tensorflow gpu job
578
+
### Run a distributed TensorFlow GPU job
564
579
565
580
```bash
566
581
./yarn jar /home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar job run \