# All Pairs Shortest Paths sample

`All Pairs Shortest Paths` uses the Floyd-Warshall algorithm to find the shortest paths between pairs of vertices in a graph. It uses a parallel blocked algorithm that enables the application to efficiently offload compute-intensive work to the GPU.

For comprehensive instructions regarding DPC++ Programming, go to https://software.intel.com/en-us/oneapi-programming-guide and search based on relevant terms noted in the comments.

| What you will learn | The All Pairs Shortest Paths sample demonstrates the following using the Intel® oneAPI DPC++/C++ Compiler <ul><li>Offloading compute-intensive parts of the application using a lambda kernel</li><li>Measuring kernel execution time</li></ul>
| Time to complete | 15 minutes

## Purpose

This sample uses the blocked Floyd-Warshall all pairs shortest paths algorithm to compute a matrix that represents the minimum distance from any node to all other nodes in the graph. Using parallel blocked processing, blocks can be calculated simultaneously by distributing task computations to the GPU. For comparison, the application is run both sequentially and in parallel, and the run time for each is displayed in the application's output. The device on which the code is run is also identified.

The parallel implementation of the blocked Floyd-Warshall algorithm has three phases. Once a prior round of these computation phases is complete, phase 1 of the next round is independent; phase 2 can only execute after phase 1 completes; similarly, phase 3 depends on phase 2 and can only execute after phase 2 is complete.

The inner loop of the sequential implementation is:

`g[i][j] = min(g[i][j], g[i][k] + g[k][j])`
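
For context, the surrounding sequential triple loop might look like the following sketch. The function name, the flat `std::vector` storage, and the row-major indexing are illustrative assumptions, not necessarily the sample's actual code:

```cpp
#include <algorithm>
#include <vector>

// Illustrative sequential Floyd-Warshall. g is a nodes x nodes adjacency
// matrix stored row-major; g[i * nodes + j] holds the best-known distance
// from vertex i to vertex j (assumed pre-initialized with edge weights and
// large-but-not-overflowing values for missing edges).
void floyd_warshall_sequential(std::vector<int>& g, int nodes) {
  for (int k = 0; k < nodes; ++k)      // intermediate vertex
    for (int i = 0; i < nodes; ++i)    // source vertex
      for (int j = 0; j < nodes; ++j)  // destination vertex
        g[i * nodes + j] =
            std::min(g[i * nodes + j], g[i * nodes + k] + g[k * nodes + j]);
}
```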

A careful observation shows that for the kth iteration of the outer loop, the computation depends only on cells in the kth column, `g[i][k]`, or in the kth row, `g[k][j]`, of the graph. Phase 1 handles `g[k][k]`, phase 2 handles `g[*][k]` and `g[k][*]`, and phase 3 handles `g[*][*]`, in that sequence. These cell-level observations largely propagate to the blocks as well.

In each phase, computation within a block can proceed independently in parallel, as sketched below.
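
As a rough illustration of that three-phase ordering, a blocked driver could look like the following sketch, assuming the node count is a multiple of the block size. `relax_block` is a hypothetical helper written for this illustration, not a function from the sample:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: applies the Floyd-Warshall update cell-wise within
// tile (bi, bj), reading tiles (bi, bk) and (bk, bj). g is the full
// nodes x nodes matrix (row-major); bs is the tile (block) size.
void relax_block(std::vector<int>& g, int nodes, int bs,
                 int bi, int bj, int bk) {
  for (int k = bk * bs; k < (bk + 1) * bs; ++k)
    for (int i = bi * bs; i < (bi + 1) * bs; ++i)
      for (int j = bj * bs; j < (bj + 1) * bs; ++j)
        g[i * nodes + j] =
            std::min(g[i * nodes + j], g[i * nodes + k] + g[k * nodes + j]);
}

// Three-phase driver; assumes nodes is a multiple of bs.
void blocked_floyd_warshall(std::vector<int>& g, int nodes, int bs) {
  const int nb = nodes / bs;  // tiles per side
  for (int k = 0; k < nb; ++k) {
    // Phase 1: the diagonal tile (k, k) depends only on itself.
    relax_block(g, nodes, bs, k, k, k);
    // Phase 2: tiles in row k and column k depend on phase 1.
    for (int j = 0; j < nb; ++j)
      if (j != k) relax_block(g, nodes, bs, k, j, k);
    for (int i = 0; i < nb; ++i)
      if (i != k) relax_block(g, nodes, bs, i, k, k);
    // Phase 3: all remaining tiles depend on phase 2. Tiles within a phase
    // have no mutual dependencies, so each phase's tiles can run in parallel.
    for (int i = 0; i < nb; ++i)
      for (int j = 0; j < nb; ++j)
        if (i != k && j != k) relax_block(g, nodes, bs, i, j, k);
  }
}
```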

## Key implementation details

The implementation includes a device selector, unified shared memory, kernels, and command groups to implement a solution using the parallel blocked method targeting the GPU.
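
A minimal DPC++ sketch of those pieces, assuming SYCL 2020 and with sizes and names chosen for illustration (this is not the sample's source), might look like:

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  // Device selector: request a GPU; enable profiling so kernel execution
  // time can be read back from the returned event.
  sycl::queue q{sycl::gpu_selector_v,
                sycl::property::queue::enable_profiling{}};
  std::cout << "Running on: "
            << q.get_device().get_info<sycl::info::device::name>() << "\n";

  constexpr int nodes = 256;
  // Unified shared memory: one allocation visible to both host and device.
  int* g = sycl::malloc_shared<int>(nodes * nodes, q);
  for (int i = 0; i < nodes * nodes; ++i)
    g[i] = (i % (nodes + 1) == 0) ? 0 : 1000;  // toy initialization

  const int k = 0;  // one round of the outer loop, for illustration
  // Command group with a lambda kernel: one parallel relaxation round.
  // Within a fixed k, row k and column k are fixed points of the update,
  // so all (i, j) cells can be relaxed in parallel.
  sycl::event e = q.submit([&](sycl::handler& h) {
    h.parallel_for(sycl::range<2>(nodes, nodes), [=](sycl::id<2> idx) {
      size_t i = idx[0], j = idx[1];
      g[i * nodes + j] = sycl::min(g[i * nodes + j],
                                   g[i * nodes + k] + g[k * nodes + j]);
    });
  });
  e.wait();

  // Kernel execution time, measured from the event's profiling info (ns).
  auto start =
      e.get_profiling_info<sycl::info::event_profiling::command_start>();
  auto end =
      e.get_profiling_info<sycl::info::event_profiling::command_end>();
  std::cout << "Kernel time: " << (end - start) * 1e-6 << " ms\n";

  sycl::free(g, q);
  return 0;
}
```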

## License

This code sample is licensed under the MIT license.

## Building the Program for CPU and GPU

Perform the following steps:

* Build the program using VS2017 or VS2019: Right-click the solution file and open it in the VS2017 or VS2019 IDE. Right-click the project in Solution Explorer and select Rebuild. From the top menu, select Debug -> Start without Debugging.
* Build the program using MSBuild: Open "x64 Native Tools Command Prompt for VS2017" or "x64 Native Tools Command Prompt for VS2019". Run `MSBuild all-pairs-shortest-paths.sln /t:Rebuild /p:Configuration="Release"`