Merged
37 commits
fe34bc7
Merge pull request #1 from oneapi-src/master
JoeOster Aug 11, 2020
632a7a4
Updating License file to no date in the title /*
JoeOster Aug 12, 2020
7a3358d
Merge pull request #2 from oneapi-src/master
JoeOster Aug 19, 2020
0032b0b
Update README.md
JoeOster Aug 19, 2020
dadbdab
Fix FPGA entries
akertesz Aug 19, 2020
11fdd3b
Update README.md
JoeOster Aug 20, 2020
ad94562
Merge pull request #3 from oneapi-src/master
JoeOster Aug 21, 2020
c9f8629
Update README.md
JoeOster Aug 23, 2020
8a450a0
Merge pull request #4 from oneapi-src/master
JoeOster Aug 26, 2020
84ce0f1
removing duplicate samples after transfering to dwarves folders
JoeOster Aug 26, 2020
c5c3880
Update Makefile.win
JoeOster Aug 26, 2020
0ead459
Update Makefile.win
JoeOster Aug 26, 2020
d024a8e
Update Makefile.win.fpga
JoeOster Aug 26, 2020
d5ce16c
Update CMakeLists.txt
JoeOster Aug 27, 2020
a8f34a5
Update CMakeLists.txt
JoeOster Aug 27, 2020
d1d4a6b
Update CMakeLists.txt
JoeOster Aug 27, 2020
2e989df
Merge pull request #5 from oneapi-src/master
JoeOster Aug 27, 2020
ca8ffa4
Merge pull request #6 from oneapi-src/master
JoeOster Sep 15, 2020
828111c
Update README.md
JoeOster Sep 15, 2020
801a485
Update README.md
JoeOster Sep 15, 2020
0bda57a
Merge pull request #7 from oneapi-src/master
JoeOster Sep 29, 2020
357d49b
Update from Legal Approval of 10/05/2020
JoeOster Oct 5, 2020
a989fad
Merge pull request #8 from oneapi-src/master
JoeOster Oct 5, 2020
01f8379
Merge pull request #9 from oneapi-src/master
JoeOster Oct 6, 2020
0f5032f
Create README.md
JoeOster Oct 6, 2020
a643759
Add files via upload
JoeOster Oct 6, 2020
29dd1b9
Merge pull request #10 from oneapi-src/master
JoeOster Oct 7, 2020
b10f0ad
Update README.md
tomlenth Oct 7, 2020
1a68b03
Update sample.json
tomlenth Oct 7, 2020
630bfb4
Update README.md
tomlenth Oct 7, 2020
c51adb5
Update sample.json
tomlenth Oct 7, 2020
e9e91b9
Merge pull request #14 from tomlenth/patch-4
JoeOster Oct 7, 2020
37cb3ea
Merge pull request #13 from tomlenth/patch-3
JoeOster Oct 7, 2020
e1aa6b2
Merge pull request #12 from tomlenth/patch-2
JoeOster Oct 7, 2020
48575d8
Merge pull request #11 from tomlenth/patch-1
JoeOster Oct 7, 2020
7316776
Update README.md
JoeOster Oct 7, 2020
4f422dc
Merge pull request #15 from oneapi-src/master
JoeOster Oct 7, 2020
Binary file added .github/images/FileStructure.png
1 change: 1 addition & 0 deletions .github/images/README.md
@@ -0,0 +1 @@

@@ -1,5 +1,5 @@
# All Pairs Shortest Paths sample
`All Pairs Shortest Paths` uses the Floyd Warshall algorithm to find the shortest paths between pairs of vertices in a graph. It uses a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.
`All Pairs Shortest Paths` uses the Floyd-Warshall algorithm to find the shortest paths between pairs of vertices in a graph. It uses a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.

For comprehensive instructions regarding DPC++ Programming, go to https://software.intel.com/en-us/oneapi-programming-guide and search based on relevant terms noted in the comments.

@@ -11,23 +11,27 @@ For comprehensive instructions regarding DPC++ Programming, go to https://softwa
| What you will learn | The All Pairs Shortest Paths sample demonstrates the following using the Intel&reg; oneAPI DPC++/C++ Compiler <ul><li>Offloading compute intensive parts of the application using lambda kernel</li><li>Measuring kernel execution time</li></ul>
| Time to complete | 15 minutes

## Purpose
This sample uses blocked Floyd-Warshall all pairs shortest paths algorithm to compute a matrix that represents the minimum distance from any node to all other nodes in the graph. Using parallel blocked processing, blocks can be calculated simultaneously by distributing task computations to the GPU.

## Key implementation details
The basic DPC++ implementation explained in the code includes device selector, unified shared memory, kernel, and command groups.
## Purpose
This sample uses the blocked Floyd-Warshall all pairs shortest paths algorithm to compute a matrix that represents the minimum distance from any node to all other nodes in the graph. Using parallel blocked processing, blocks can be calculated simultaneously by distributing task computations to the GPU. For comparison, the application is run both sequentially and in parallel, and the run time of each is displayed in the application's output. The device where the code is run is also identified.

The parallel implementation of blocked Floyd Warshall algorithm has three phases. Given a prior round of these computation phases are complete, phase 1 is independent; Phase 2 can only execute after phase 1 completes; Similarly phase 3 depends on phase 2 so can only execute after phase 2 is complete.
The parallel implementation of the blocked Floyd-Warshall algorithm has three phases. Once the prior round of these phases is complete, phase 1 is independent; phase 2 can execute only after phase 1 completes; similarly, phase 3 depends on phase 2 and can execute only after phase 2 completes.

The inner loop of the sequential implementation is:
g[i][j] = min(g[i][j], g[i][k] + g[k][j])
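
For reference, the inner loop above sits inside the usual triple loop. The following is a minimal sequential sketch, not the sample's actual source: the row-major `std::vector<int>` layout, the `FloydWarshall` name, and the `kInf` sentinel are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "no edge" sentinel, small enough that kInf + kInf fits in an int.
constexpr int kInf = 1 << 29;

// Sequential Floyd-Warshall on a dense n x n matrix stored row-major.
// On return, g[i * n + j] holds the shortest distance from vertex i to vertex j.
void FloydWarshall(std::vector<int>& g, std::size_t n) {
  assert(g.size() == n * n);
  for (std::size_t k = 0; k < n; ++k)        // intermediate vertex
    for (std::size_t i = 0; i < n; ++i)      // source vertex
      for (std::size_t j = 0; j < n; ++j)    // destination vertex
        g[i * n + j] = std::min(g[i * n + j], g[i * n + k] + g[k * n + j]);
}
```

Calling `FloydWarshall(g, n)` on a matrix whose missing edges are set to `kInf` and whose diagonal is zero overwrites `g` in place with the all-pairs distances.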

A careful observation shows that for the kth iteration of the outer loop, the computation depends on cells either on the kth column, g[i][k] or on the kth row, g[k][j] of the graph. Phase 1 handles g[k][k], phase 2 handles g[\*][k] and g[k][\*], and phase 3 handles g[\*][\*] in that sequence. This cell level observations largely propagate to the blocks as well.
A careful observation shows that for the kth iteration of the outer loop, the computation depends on cells either on the kth column, g[i][k], or on the kth row, g[k][j], of the graph. Phase 1 handles g[k][k], phase 2 handles g[\*][k] and g[k][\*], and phase 3 handles g[\*][\*], in that sequence. These cell-level observations largely carry over to blocks as well.

In each phase computation within a block can proceed independently in parallel.
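
The three phases can be sketched on the host as follows. This is a sequential illustration of the blocked scheme only, assuming a row-major matrix whose size `n` is a multiple of a block size `b`; `UpdateBlock` and `BlockedFloydWarshall` are hypothetical names, and the sample itself dispatches the per-block work as DPC++ kernels rather than plain loops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Relaxes every cell of block (bi, bj) through the b intermediate vertices of round r.
void UpdateBlock(std::vector<int>& g, std::size_t n, std::size_t b,
                 std::size_t bi, std::size_t bj, std::size_t r) {
  for (std::size_t k = r * b; k < (r + 1) * b; ++k)
    for (std::size_t i = bi * b; i < (bi + 1) * b; ++i)
      for (std::size_t j = bj * b; j < (bj + 1) * b; ++j)
        g[i * n + j] = std::min(g[i * n + j], g[i * n + k] + g[k * n + j]);
}

// Blocked Floyd-Warshall: each round runs phase 1, then phase 2, then phase 3.
void BlockedFloydWarshall(std::vector<int>& g, std::size_t n, std::size_t b) {
  assert(b > 0 && n % b == 0 && g.size() == n * n);
  const std::size_t blocks = n / b;
  for (std::size_t r = 0; r < blocks; ++r) {
    // Phase 1: the diagonal block depends only on itself.
    UpdateBlock(g, n, b, r, r, r);
    // Phase 2: blocks in row r and column r depend on the diagonal block.
    for (std::size_t x = 0; x < blocks; ++x) {
      if (x == r) continue;
      UpdateBlock(g, n, b, r, x, r);
      UpdateBlock(g, n, b, x, r, r);
    }
    // Phase 3: all remaining blocks depend on the phase 2 results.
    for (std::size_t bi = 0; bi < blocks; ++bi)
      for (std::size_t bj = 0; bj < blocks; ++bj)
        if (bi != r && bj != r) UpdateBlock(g, n, b, bi, bj, r);
  }
}
```

Within phase 2 and within phase 3, the `UpdateBlock` calls touch disjoint blocks and read only data finalized by the preceding phase, which is what allows them to run in parallel on a device.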


## Key implementation details
The sample includes a device selector, unified shared memory, kernels, and command groups to implement a solution using the parallel blocked method targeting the GPU.

In each phase computation within a block can proceed independently in parallel.

## License
This code sample is licensed under MIT license
This code sample is licensed under MIT license.


## Building the Program for CPU and GPU

@@ -61,6 +65,7 @@ Perform the following steps:
* Build the program using VS2017 or VS2019: Right click on the solution file and open using either VS2017 or VS2019 IDE. Right click on the project in Solution explorer and select Rebuild. From top menu select Debug -> Start without Debugging.
* Build the program using MSBuild: Open "x64 Native Tools Command Prompt for VS2017" or "x64 Native Tools Command Prompt for VS2019". Run - MSBuild all-pairs-shortest-paths.sln /t:Rebuild /p:Configuration="Release"


## Running the sample

### Example Output
@@ -2,7 +2,7 @@
"guid": "4F0F5FBD-C237-4A9F-B259-2C56ABFB40D9",
"name": "all-pairs-shortest-paths",
"categories": [ "Toolkit/Intel® oneAPI Base Toolkit/oneAPI DPC++/C++ Compiler/CPU and GPU" ],
"description": "all-pairs-shortest-paths using Intel® oneAPI DPC++ Language",
"description": "All Pairs Shortest Paths uses finds the shortest paths between pairs of vertices in a graph using a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.",
Contributor

"All Pairs Shortest Paths uses finds the shortest paths" -> "All Pairs Shortest Paths finds the shortest paths"

"toolchain": [ "dpcpp" ],
"targetDevice": [ "CPU", "GPU" ],
"languages": [ { "cpp": {} } ],
14 changes: 10 additions & 4 deletions DirectProgramming/DPC++/SparseLinearAlgebra/merge-spmv/README.md
@@ -11,11 +11,11 @@ For comprehensive instructions regarding DPC++ Programming, go to https://softwa
| What you will learn | The Sparse Matrix Vector sample demonstrates the following using the Intel&reg; oneAPI DPC++/C++ Compiler <ul><li>Offloading compute intensive parts of the application using lambda kernel</li><li>Measuring kernel execution time</li></ul>
| Time to complete | 15 minutes


## Purpose
Sparse linear algebra algorithms are common in HPC, in fields as machine learning and computational science. In this sample, a merge based sparse matrix and vector multiplication algorithm is implemented. The input matrix is in compressed sparse row format. Use a parallel merge model enables the application to efficiently offload compute intensive operation to the GPU.
Sparse linear algebra algorithms are common in HPC, in fields such as machine learning and computational science. In this sample, a merge-based sparse matrix and vector multiplication algorithm is implemented. The input matrix is in compressed sparse row format. Using a parallel merge model enables the application to efficiently offload compute intensive operations to the GPU. For comparison, the application is run both sequentially and in parallel, and the run time of each is displayed in the application's output. The device where the code is run is also identified.

## Key implementation details
The basic DPC++ implementation explained in the code includes device selector, unified shared memory, kernel, and command groups.
The workgroup size requirement is 256. If your hardware cannot support this, the application will present an error.

The Compressed Sparse Row (CSR) representation of a sparse matrix has three components:
<ul>
@@ -28,14 +28,19 @@ Both row offsets and values indices can be thought of as sorted arrays. The prog

In the parallel implementation, each thread independently identifies its scope of the merge and then performs only the amount of work that belongs to this thread in the cohort of threads.
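
To make the idea of each thread finding its own scope concrete, here is a host-side sketch of a merge-path style SpMV. It is an illustration under stated assumptions, not the sample's kernel: the `CsrMatrix` struct, `MergePathSearch`, and `MergeSpmv` are hypothetical names, the "threads" are simulated by an ordinary loop, and the merge is taken between the row-end offsets and the nonzero indices 0..nnz-1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CSR container; field names are illustrative.
struct CsrMatrix {
  int num_rows;
  std::vector<int> row_offsets;     // size num_rows + 1
  std::vector<int> column_indices;  // size nnz
  std::vector<float> values;        // size nnz
};

// Binary search along one diagonal of the merge grid.
// Returns {rows fully consumed, nonzeros consumed} at that diagonal.
std::pair<int, int> MergePathSearch(int diagonal, const CsrMatrix& m) {
  const int nnz = static_cast<int>(m.values.size());
  const int* row_end = m.row_offsets.data() + 1;
  int lo = std::max(diagonal - nnz, 0);
  int hi = std::min(diagonal, m.num_rows);
  while (lo < hi) {
    const int mid = (lo + hi) / 2;
    if (row_end[mid] <= diagonal - mid - 1) lo = mid + 1; else hi = mid;
  }
  return {lo, diagonal - lo};
}

std::vector<float> MergeSpmv(const CsrMatrix& m, const std::vector<float>& x,
                             int num_threads) {
  assert(num_threads > 0);
  assert(m.row_offsets.size() == static_cast<std::size_t>(m.num_rows) + 1);
  const int nnz = static_cast<int>(m.values.size());
  const int total = m.num_rows + nnz;  // length of the merge path
  const int per_thread = (total + num_threads - 1) / num_threads;
  const int* row_end = m.row_offsets.data() + 1;
  std::vector<float> y(m.num_rows, 0.0f);
  std::vector<int> carry_row(num_threads);
  std::vector<float> carry_value(num_threads);

  for (int t = 0; t < num_threads; ++t) {  // each iteration stands in for one thread
    const std::pair<int, int> begin = MergePathSearch(std::min(t * per_thread, total), m);
    const std::pair<int, int> end = MergePathSearch(std::min((t + 1) * per_thread, total), m);
    int row = begin.first, nz = begin.second;
    float sum = 0.0f;
    for (; row < end.first; ++row) {       // rows that end inside this thread's scope
      for (; nz < row_end[row]; ++nz) sum += m.values[nz] * x[m.column_indices[nz]];
      y[row] = sum;
      sum = 0.0f;
    }
    for (; nz < end.second; ++nz) sum += m.values[nz] * x[m.column_indices[nz]];
    carry_row[t] = end.first;              // partial row finished by a later thread
    carry_value[t] = sum;
  }
  // Fix-up pass: add each thread's partial sum to the row it left unfinished.
  for (int t = 0; t < num_threads; ++t)
    if (carry_row[t] < m.num_rows) y[carry_row[t]] += carry_value[t];
  return y;
}
```

Because every simulated thread consumes the same number of merge items (rows plus nonzeros), the work stays balanced even when a few rows hold most of the nonzeros, which is the motivation for merging rather than assigning whole rows to threads.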

## Key implementation details
The sample includes a device selector, unified shared memory, kernels, and command groups to implement a solution using a parallel merge method in which each thread independently identifies its scope of the merge and then performs only the amount of work that belongs to that thread.


## License
This code sample is licensed under MIT license
This code sample is licensed under MIT license.

## Building the Program for CPU and GPU

### Include Files
The include folder is located at `%ONEAPI_ROOT%\dev-utilities\latest\include` on your development system.


### Running Samples in DevCloud
If running a sample in the Intel DevCloud, remember that you must specify the compute node (CPU, GPU, FPGA) as well as whether to run in batch or interactive mode. For more information, see the Intel&reg; oneAPI Base Toolkit Get Started Guide (https://devcloud.intel.com/oneapi/get-started/base-toolkit/).

@@ -63,6 +68,7 @@ Perform the following steps:
* Build the program using VS2017 or VS2019: Right click on the solution file and open using either VS2017 or VS2019 IDE. Right click on the project in Solution explorer and select Rebuild. From top menu select Debug -> Start without Debugging.
* Build the program using MSBuild: Open "x64 Native Tools Command Prompt for VS2017" or "x64 Native Tools Command Prompt for VS2019". Run - MSBuild merge-spmv.sln /t:Rebuild /p:Configuration="Release"


## Running the sample

### Example Output
@@ -2,7 +2,7 @@
"guid": "C573751F-C04C-4EE0-A868-941EF460F944",
"name": "merge-spmv",
"categories": [ "Toolkit/Intel® oneAPI Base Toolkit/oneAPI DPC++/C++ Compiler/CPU and GPU" ],
"description": "merge-spmv using Intel® oneAPI DPC++ Language",
"description": "Sparse Matrix Vector sample provides a parallel implementation of a merge based sparse matrix and vector multiplication algorithm using DPC++.",
"toolchain": [ "dpcpp" ],
"targetDevice": [ "CPU", "GPU" ],
"languages": [ { "cpp": {} } ],