Commit 48575d8

Merge pull request #11 from tomlenth/patch-1
Update README.md
2 parents: e1aa6b2 + b10f0ad

File tree

1 file changed: +14 −9 lines changed

  • DirectProgramming/DPC++/GraphAlgorithms/all-pairs-shortest-paths


DirectProgramming/DPC++/GraphAlgorithms/all-pairs-shortest-paths/README.md

Lines changed: 14 additions & 9 deletions
@@ -1,5 +1,5 @@
 # All Pairs Shortest Paths sample
-`All Pairs Shortest Paths` uses the Floyd Warshall algorithm to find the shortest paths between pairs of vertices in a graph. It uses a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.
+`All Pairs Shortest Paths` uses the Floyd-Warshall algorithm to find the shortest paths between pairs of vertices in a graph. It uses a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.
 
 For comprehensive instructions regarding DPC++ Programming, go to https://software.intel.com/en-us/oneapi-programming-guide and search based on relevant terms noted in the comments.
 
@@ -11,23 +11,27 @@ For comprehensive instructions regarding DPC++ Programming, go to https://softwa
 | What you will learn | The All Pairs Shortest Paths sample demonstrates the following using the Intel&reg; oneAPI DPC++/C++ Compiler <ul><li>Offloading compute intensive parts of the application using lambda kernel</li><li>Measuring kernel execution time</li></ul>
 | Time to complete | 15 minutes
 
-## Purpose
-This sample uses blocked Floyd-Warshall all pairs shortest paths algorithm to compute a matrix that represents the minimum distance from any node to all other nodes in the graph. Using parallel blocked processing, blocks can be calculated simultaneously by distributing task computations to the GPU.
 
-## Key implementation details
-The basic DPC++ implementation explained in the code includes device selector, unified shared memory, kernel, and command groups.
+## Purpose
+This sample uses the blocked Floyd-Warshall all pairs shortest paths algorithm to compute a matrix that represents the minimum distance from any node to all other nodes in the graph. Using parallel blocked processing, blocks can be calculated simultaneously by distributing task computations to the GPU. For comparison, the application is run sequentially and in parallel, with run times for each displayed in the application's output. The device where the code is run is also identified.
 
-The parallel implementation of blocked Floyd Warshall algorithm has three phases. Given a prior round of these computation phases are complete, phase 1 is independent; Phase 2 can only execute after phase 1 completes; Similarly phase 3 depends on phase 2 so can only execute after phase 2 is complete.
+The parallel implementation of the blocked Floyd-Warshall algorithm has three phases. Given that a prior round of these computation phases is complete, phase 1 is independent; phase 2 can only execute after phase 1 completes; similarly, phase 3 depends on phase 2 and so can only execute after phase 2 is complete.
 
 The inner loop of the sequential implementation is:
 g[i][j] = min(g[i][j], g[i][k] + g[k][j])
 
-A careful observation shows that for the kth iteration of the outer loop, the computation depends on cells either on the kth column, g[i][k] or on the kth row, g[k][j] of the graph. Phase 1 handles g[k][k], phase 2 handles g[\*][k] and g[k][\*], and phase 3 handles g[\*][\*] in that sequence. This cell level observations largely propagate to the blocks as well.
+A careful observation shows that for the kth iteration of the outer loop, the computation depends on cells either on the kth column, g[i][k], or on the kth row, g[k][j], of the graph. Phase 1 handles g[k][k], phase 2 handles g[\*][k] and g[k][\*], and phase 3 handles g[\*][\*], in that sequence. These cell-level observations largely propagate to the blocks as well.
+
+In each phase, computation within a block can proceed independently in parallel.
+
+
+## Key implementation details
+Includes device selector, unified shared memory, kernel, and command groups in order to implement a solution using the parallel block method targeting the GPU.
 
-In each phase computation within a block can proceed independently in parallel.
 
 ## License
-This code sample is licensed under MIT license
+This code sample is licensed under the MIT license.
+
 
 ## Building the Program for CPU and GPU
 
@@ -61,6 +65,7 @@ Perform the following steps:
 * Build the program using VS2017 or VS2019: Right click on the solution file and open using either VS2017 or VS2019 IDE. Right click on the project in Solution explorer and select Rebuild. From top menu select Debug -> Start without Debugging.
 * Build the program using MSBuild: Open "x64 Native Tools Command Prompt for VS2017" or "x64 Native Tools Command Prompt for VS2019". Run - MSBuild all-pairs-shortest-paths.sln /t:Rebuild /p:Configuration="Release"
 
+
 ## Running the sample
 
 ### Example Output
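
To make the inner loop quoted in the diff above concrete, here is a minimal sequential sketch in plain C++. The 4-vertex adjacency matrix and the `INF` sentinel are illustrative choices, not taken from the sample:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
  constexpr int INF = 1 << 28;  // sentinel for "no direct edge"
  const int n = 4;
  // Illustrative adjacency matrix: g[i][j] is the weight of edge i -> j.
  std::vector<std::vector<int>> g = {
      {0, 3, INF, 7},
      {8, 0, 2, INF},
      {5, INF, 0, 1},
      {2, INF, INF, 0}};

  // Floyd-Warshall: after round k, g[i][j] is the shortest i -> j distance
  // using only intermediate vertices 0..k. Round k reads only row k
  // (g[k][j]) and column k (g[i][k]), which is the dependence that makes
  // the three-phase blocked schedule described in the README possible.
  for (int k = 0; k < n; ++k)
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j)
        g[i][j] = std::min(g[i][j], g[i][k] + g[k][j]);

  std::cout << "shortest 0 -> 2 distance: " << g[0][2] << "\n";  // 5 (0 -> 1 -> 2)
}
```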

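Since the updated "Key implementation details" section only names its ingredients, here is a minimal, hypothetical DPC++ sketch (not the sample's code) showing a device selector, unified shared memory, and a lambda kernel submitted through a command group. It parallelizes one Floyd-Warshall round at a time rather than using the sample's blocked three-phase schedule; the matrix size and edge weights are arbitrary, and a SYCL 2020 toolchain such as Intel's icpx is assumed:

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  constexpr int n = 512;

  // Device selector: pick the default device and report where the code runs.
  sycl::queue q{sycl::default_selector_v};
  std::cout << "Running on: "
            << q.get_device().get_info<sycl::info::device::name>() << "\n";

  // Unified shared memory, accessible from both host and device.
  int *g = sycl::malloc_shared<int>(n * n, q);
  int *rowk = sycl::malloc_shared<int>(n, q);
  int *colk = sycl::malloc_shared<int>(n, q);

  // Illustrative initialization: zero diagonal, small arbitrary weights elsewhere.
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      g[i * n + j] = (i == j) ? 0 : (i + j) % 16 + 1;

  for (int k = 0; k < n; ++k) {
    // Snapshot row k and column k on the host so the kernel never reads a
    // cell that another work-item may be writing in the same round.
    for (int t = 0; t < n; ++t) {
      rowk[t] = g[k * n + t];
      colk[t] = g[t * n + k];
    }
    // Command group with a lambda kernel: one relaxation round, every
    // (i, j) cell updated independently in parallel.
    q.submit([&](sycl::handler &h) {
       h.parallel_for(sycl::range<2>{n, n}, [=](sycl::id<2> idx) {
         int i = idx[0], j = idx[1];
         g[i * n + j] = sycl::min(g[i * n + j], colk[i] + rowk[j]);
       });
     }).wait();  // round k+1 depends on round k
  }

  std::cout << "g[0][n-1] = " << g[n - 1] << "\n";
  sycl::free(g, q);
  sycl::free(rowk, q);
  sycl::free(colk, q);
}
```

The row/column snapshots play roughly the role that phases 1 and 2 play in the blocked version: the cells every other update depends on are finalized before the bulk of the round (phase 3 in the sample) reads them.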