oneapi-src · JoeOster · Oct 7, 2020 · Aug 11, 2020 · Aug 12, 2020 · Aug 19, 2020
diff --git a/.github/images/FileStructure.png b/.github/images/FileStructure.png
diff --git a/.github/images/README.md b/.github/images/README.md
@@ -0,0 +1 @@
+
diff --git a/DirectProgramming/DPC++/GraphAlgorithms/all-pairs-shortest-paths/README.md b/DirectProgramming/DPC++/GraphAlgorithms/all-pairs-shortest-paths/README.md
@@ -1,5 +1,5 @@
 # All Pairs Shortest Paths sample
-`All Pairs Shortest Paths` uses the Floyd Warshall algorithm to find the shortest paths between pairs of vertices in a graph. It uses a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.
+`All Pairs Shortest Paths` uses the Floyd-Warshall algorithm to find the shortest paths between pairs of vertices in a graph. It uses a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.
 
 For comprehensive instructions regarding DPC++ Programming, go to https://software.intel.com/en-us/oneapi-programming-guide and search based on relevant terms noted in the comments.
 
@@ -11,23 +11,27 @@ For comprehensive instructions regarding DPC++ Programming, go to https://softwa
 | What you will learn               | The All Pairs Shortest Paths sample demonstrates the following using the Intel&reg; oneAPI DPC++/C++ Compiler <ul><li>Offloading compute intensive parts of the application using lambda kernel</li><li>Measuring kernel execution time</li></ul>
 | Time to complete                  | 15 minutes
 
-## Purpose
-This sample uses blocked Floyd-Warshall all pairs shortest paths algorithm to compute a matrix that represents the minimum distance from any node to all other nodes in the graph. Using parallel blocked processing, blocks can be calculated simultaneously by distributing task computations to the GPU.
 
-## Key implementation details
-The basic DPC++ implementation explained in the code includes device selector, unified shared memory, kernel, and command groups.
+## Purpose
+This sample uses the blocked Floyd-Warshall all pairs shortest paths algorithm to compute a matrix that represents the minimum distance from any node to all other nodes in the graph. Using parallel blocked processing, blocks can be calculated simultaneously by distributing task computations to the GPU. For comparison, the application is run both sequentially and in parallel, and the run time of each is displayed in the application's output. The device where the code is run is also identified.
 
-The parallel implementation of blocked Floyd Warshall algorithm has three phases. Given a prior round of these computation phases are complete, phase 1 is independent; Phase 2 can only execute after phase 1 completes; Similarly phase 3 depends on phase 2 so can only execute after phase 2 is complete.
+The parallel implementation of the blocked Floyd-Warshall algorithm has three phases. Once the prior round of these computation phases is complete, phase 1 is independent; phase 2 can only execute after phase 1 completes; similarly, phase 3 depends on phase 2, so it can only execute after phase 2 is complete.
 
 The inner loop of the sequential implementation is:
   g[i][j] = min(g[i][j], g[i][k] + g[k][j])
 
-A careful observation shows that for the kth iteration of the outer loop, the computation depends on cells either on the kth column, g[i][k] or on the kth row, g[k][j] of the graph. Phase 1 handles g[k][k], phase 2 handles g[\*][k] and g[k][\*], and phase 3 handles g[\*][\*] in that sequence. This cell level observations largely propagate to the blocks as well.
+A careful observation shows that for the kth iteration of the outer loop, the computation depends on cells either on the kth column, g[i][k], or on the kth row, g[k][j], of the graph. Phase 1 handles g[k][k], phase 2 handles g[\*][k] and g[k][\*], and phase 3 handles g[\*][\*], in that sequence. These cell-level observations largely propagate to the blocks as well.
+
+In each phase, computation within a block can proceed independently in parallel.
+
+
+## Key implementation details
+The implementation uses a device selector, unified shared memory, kernels, and command groups to implement the parallel blocked method targeting the GPU.
 
-In each phase computation within a block can proceed independently in parallel.
 
 ## License  
-This code sample is licensed under MIT license 
+This code sample is licensed under the MIT license.
+
 
 ## Building the Program for CPU and GPU
 
@@ -61,6 +65,7 @@ Perform the following steps:
 * Build the program using VS2017 or VS2019: Right click on the solution file and open using either VS2017 or VS2019 IDE. Right click on the project in Solution explorer and select Rebuild. From top menu select Debug -> Start without Debugging.
 * Build the program using MSBuild: Open "x64 Native Tools Command Prompt for VS2017" or "x64 Native Tools Command Prompt for VS2019". Run - MSBuild all-pairs-shortest-paths.sln /t:Rebuild /p:Configuration="Release"
 
+
 ## Running the sample
 
 ### Example Output
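For reference, the sequential inner loop quoted in the README above, `g[i][j] = min(g[i][j], g[i][k] + g[k][j])`, can be sketched in plain C++ as follows. This is a minimal illustration of the sequential baseline only, not the sample's DPC++ code and not the blocked three-phase variant; the names `Graph`, `kInf`, and `FloydWarshall` are hypothetical, and non-negative edge weights are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical names for illustration; the sample's identifiers may differ.
using Graph = std::vector<std::vector<int>>;

// "No edge" marker chosen so that kInf + kInf does not overflow int.
constexpr int kInf = std::numeric_limits<int>::max() / 2;

// Sequential Floyd-Warshall. On return, g[i][j] holds the length of the
// shortest path from vertex i to vertex j (or kInf if none exists).
void FloydWarshall(Graph& g) {
  const std::size_t n = g.size();
  for (const auto& row : g) assert(row.size() == n);  // matrix must be square

  for (std::size_t k = 0; k < n; ++k) {      // intermediate vertex
    for (std::size_t i = 0; i < n; ++i) {    // source vertex
      for (std::size_t j = 0; j < n; ++j) {  // destination vertex
        // Iteration k reads only column k (g[i][k]) and row k (g[k][j]).
        g[i][j] = std::min(g[i][j], g[i][k] + g[k][j]);
      }
    }
  }
}
```

Because iteration `k` reads only row `k` and column `k`, the same dependency pattern carries over to blocks, which is what the three phases described in the README rely on.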

diff --git a/DirectProgramming/DPC++/GraphAlgorithms/all-pairs-shortest-paths/sample.json b/DirectProgramming/DPC++/GraphAlgorithms/all-pairs-shortest-paths/sample.json
@@ -2,7 +2,7 @@
     "guid": "4F0F5FBD-C237-4A9F-B259-2C56ABFB40D9",
     "name": "all-pairs-shortest-paths",
     "categories": [ "Toolkit/Intel® oneAPI Base Toolkit/oneAPI DPC++/C++ Compiler/CPU and GPU" ],
-    "description": "all-pairs-shortest-paths using Intel® oneAPI DPC++ Language",
+    "description": "All Pairs Shortest Paths finds the shortest paths between pairs of vertices in a graph using a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.",
     "toolchain": [ "dpcpp" ],
     "targetDevice": [ "CPU", "GPU" ],
     "languages": [ { "cpp": {} } ],

diff --git a/DirectProgramming/DPC++/SparseLinearAlgebra/merge-spmv/README.md b/DirectProgramming/DPC++/SparseLinearAlgebra/merge-spmv/README.md
@@ -11,11 +11,11 @@ For comprehensive instructions regarding DPC++ Programming, go to https://softwa
 | What you will learn               | The Sparse Matrix Vector sample demonstrates the following using the Intel&reg; oneAPI DPC++/C++ Compiler <ul><li>Offloading compute intensive parts of the application using lambda kernel</li><li>Measuring kernel execution time</li></ul>
 | Time to complete                  | 15 minutes
 
+
 ## Purpose
-Sparse linear algebra algorithms are common in HPC, in fields as machine learning and computational science. In this sample, a merge based sparse matrix and vector multiplication algorithm is implemented. The input matrix is in compressed sparse row format. Use a parallel merge model enables the application to efficiently offload compute intensive operation to the GPU.
+Sparse linear algebra algorithms are common in HPC, in fields such as machine learning and computational science. In this sample, a merge-based sparse matrix and vector multiplication algorithm is implemented. The input matrix is in compressed sparse row format. Using a parallel merge model enables the application to efficiently offload compute intensive operations to the GPU. For comparison, the application is run both sequentially and in parallel, and the run time of each is displayed in the application's output. The device where the code is run is also identified.
 
-## Key implementation details
-The basic DPC++ implementation explained in the code includes device selector, unified shared memory, kernel, and command groups.
+The sample requires a work-group size of 256. If your hardware cannot support this, the application reports an error.
 
 Compressed Sparse Row (CSR) representation for sparse matrix have three components:
 <ul>
@@ -28,14 +28,19 @@ Both row offsets and values indices can be thought of as sorted arrays. The prog
 
 In parallel implementation, each thread independently identifies its scope of the merge and then performs only the amount of work that belongs this thread in the cohort of threads.
 
+## Key implementation details
+The implementation uses a device selector, unified shared memory, kernels, and command groups to implement a parallel merge method in which each thread independently identifies its scope of the merge and then performs only the amount of work that belongs to this thread.
+
+
 ## License  
-This code sample is licensed under MIT license 
+This code sample is licensed under the MIT license.
 
 ## Building the Program for CPU and GPU
 
 ### Include Files
 The include folder is located at `%ONEAPI_ROOT%\dev-utilities\latest\include` on your development system.
 
+
 ### Running Samples in DevCloud
 If running a sample in the Intel DevCloud, remember that you must specify the compute node (CPU, GPU, FPGA) as well whether to run in batch or interactive mode. For more information see the Intel&reg; oneAPI Base Toolkit Get Started Guide (https://devcloud.intel.com/oneapi/get-started/base-toolkit/)
 
@@ -63,6 +68,7 @@ Perform the following steps:
 * Build the program using VS2017 or VS2019: Right click on the solution file and open using either VS2017 or VS2019 IDE. Right click on the project in Solution explorer and select Rebuild. From top menu select Debug -> Start without Debugging.
 * Build the program using MSBuild: Open "x64 Native Tools Command Prompt for VS2017" or "x64 Native Tools Command Prompt for VS2019". Run - MSBuild merge-spmv.sln /t:Rebuild /p:Configuration="Release"
 
+
 ## Running the sample
 
 ### Example Output
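For reference, the CSR layout that the README describes (row offsets, column indices, values) can be exercised with a plain row-wise sequential product. This is a minimal sketch of the sequential baseline only; it does not implement the merge-based partitioning across threads, and the struct and function names (`CsrMatrix`, `SpmvSequential`) are hypothetical rather than taken from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for illustration; field names are assumptions.
struct CsrMatrix {
  std::size_t num_rows = 0;
  std::vector<std::size_t> row_offsets;     // size num_rows + 1
  std::vector<std::size_t> column_indices;  // one entry per nonzero
  std::vector<double> values;               // one entry per nonzero
};

// Computes y = A * x one row at a time. The nonzeros of row r occupy the
// index range [row_offsets[r], row_offsets[r + 1]).
std::vector<double> SpmvSequential(const CsrMatrix& a,
                                   const std::vector<double>& x) {
  assert(a.row_offsets.size() == a.num_rows + 1);
  assert(a.column_indices.size() == a.values.size());

  std::vector<double> y(a.num_rows, 0.0);
  for (std::size_t row = 0; row < a.num_rows; ++row) {
    double sum = 0.0;
    for (std::size_t nz = a.row_offsets[row]; nz < a.row_offsets[row + 1]; ++nz) {
      assert(a.column_indices[nz] < x.size());
      sum += a.values[nz] * x[a.column_indices[nz]];
    }
    y[row] = sum;
  }
  return y;
}
```

A row-wise loop like this gives each row as much work as it has nonzeros, so rows of very different lengths balance poorly across threads; the merge-based approach in the sample addresses this by having each thread take an equal share of the combined row-offsets and nonzero lists.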

diff --git a/DirectProgramming/DPC++/SparseLinearAlgebra/merge-spmv/sample.json b/DirectProgramming/DPC++/SparseLinearAlgebra/merge-spmv/sample.json
@@ -2,7 +2,7 @@
     "guid": "C573751F-C04C-4EE0-A868-941EF460F944",
     "name": "merge-spmv",
     "categories": [ "Toolkit/Intel® oneAPI Base Toolkit/oneAPI DPC++/C++ Compiler/CPU and GPU" ],
-    "description": "merge-spmv using Intel® oneAPI DPC++ Language",
+    "description": "Sparse Matrix Vector sample provides a parallel implementation of a merge based sparse matrix and vector multiplication algorithm using DPC++.",
     "toolchain": [ "dpcpp" ],
     "targetDevice": [ "CPU", "GPU" ],
     "languages": [ { "cpp": {} } ],