Commit 4eccc86

Merge pull request #15 from JohT/feature/open-graph-data-science
Open graph-data-science
2 parents 83cf937 + 5183629 commit 4eccc86

36 files changed

+287
-151
lines changed

.github/workflows/code-reports.yml

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -112,7 +112,7 @@ jobs:
112112
env:
113113
NEO4J_INITIAL_PASSWORD: ${{ secrets.NEO4J_INITIAL_PASSWORD }}
114114
run: |
115-
./../../scripts/analysis/analyze.sh --report All --profile Neo4jv5
115+
./../../scripts/analysis/analyze.sh
116116
117117
- name: Move reports from the temp to the results directory preserving their surrounding directory
118118
working-directory: temp
@@ -128,13 +128,15 @@ jobs:
128128
retention-days: 5
129129

130130
# Upload Database Export
131-
- name: Archive exported database
132-
uses: actions/upload-artifact@v3
133-
with:
134-
name: code-report-database-export-${{ matrix.java }}-python-${{ matrix.python }}-mambaforge-${{ matrix.mambaforge }}
135-
path: ./temp/**/import
136-
if-no-files-found: error
137-
retention-days: 5
131+
# Only possible after an export with "./../../scripts/analysis/analyze.sh --report DatabaseCsvExport"
132+
# Won't be done here because of performance and security concerns
133+
#- name: Archive exported database
134+
# uses: actions/upload-artifact@v3
135+
# with:
136+
# name: code-report-database-export-${{ matrix.java }}-python-${{ matrix.python }}-mambaforge-${{ matrix.mambaforge }}
137+
# path: ./temp/**/import
138+
# if-no-files-found: error
139+
# retention-days: 5
138140

139141
# Commit and push the native image agent results
140142
- name: Display environment variable "github.event_name"

COMMANDS.md

Lines changed: 2 additions & 1 deletion
@@ -51,7 +51,7 @@
5151

5252
The [analyze.sh](./scripts/analysis/analyze.sh) command comes with these command line options:
5353

54-
- `--report Csv` only generates CSV reports. This speeds up the report generation and doesn't depend on Python, Jupyter Notebook or any other related dependencies. The default value is `All` to generate all reports. `Jupiter` will only generate Jupyter Notebook reports.
54+
- `--report Csv` only generates CSV reports. This speeds up the report generation and doesn't depend on Python, Jupyter Notebook or any other related dependencies. The default value is `All` to generate all reports. `Jupiter` will only generate Jupyter Notebook reports. `DatabaseCsvExport` exports the whole graph database as a CSV file (performance-intensive; check for security concerns first).
5555
5656
- `--profile Neo4jv4` uses the older long-term support version v4.4.x of Neo4j (as of June 2023) together with compatible versions of the plugins and JQAssistant. `Neo4jv5` explicitly selects the newest version 5.x of Neo4j (as of June 2023). Without setting
5757
a profile, the newest versions will be used. Profiles are scripts that can be found in the directory [scripts/profiles](./scripts/profiles/).
@@ -123,6 +123,7 @@ to download a Maven artifact into the artifacts directory:
123123
- `-a <maven artifact name>`
124124
- `-v <maven artifact version>`
125125
- `-t <maven artifact type (optional, defaults to jar)>`
126+
- `-d <target directory for the downloaded file (optional, defaults to "artifacts")>`
126127

127128
### Reset the database and scan the java artifacts
128129

renovate.json

Lines changed: 12 additions & 0 deletions
@@ -66,6 +66,18 @@
6666
"depNameTemplate": "neo4j/graph-data-science",
6767
"datasourceTemplate": "github-releases"
6868
},
69+
{
70+
"fileMatch": [
71+
"^scripts\/profiles\/Neo4jv5\\.sh$",
72+
"^scripts\/profiles\/Default\\.sh$",
73+
"^scripts\/[^\/]*\\.sh$"
74+
],
75+
"matchStrings": [
76+
"NEO4J_OPEN_GDS_PLUGIN_VERSION:-\\\"?(?<currentValue>.*?)\\\""
77+
],
78+
"depNameTemplate": "JohT/open-graph-data-science-packaging",
79+
"datasourceTemplate": "github-releases"
80+
},
6981
{
7082
"fileMatch": [
7183
"^scripts\/profiles\/Neo4jv5\\.sh$",

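As a rough illustration of what the new `matchStrings` entry is meant to capture, the sketch below applies a similar pattern in Bash. It assumes a profile line of the form `NEO4J_OPEN_GDS_PLUGIN_VERSION=${NEO4J_OPEN_GDS_PLUGIN_VERSION:-"<version>"}`. The function name `extract_open_gds_version` and the version `2.4.0` are hypothetical. Bash's extended regular expressions support neither the lazy `.*?` nor named groups, so `[^"]*` with a numbered group stands in for `(?<currentValue>.*?)`; this is an approximation, not the expression Renovate evaluates.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the repository): prints the default
# version found in a line that follows the assumed profile convention.
extract_open_gds_version() {
    local line="$1"
    # ERE approximation of the Renovate pattern; [^"]* replaces the lazy .*?
    local pattern='NEO4J_OPEN_GDS_PLUGIN_VERSION:-"?([^"]*)"'
    if [[ ${line} =~ ${pattern} ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Assumed example line; the version number is made up for illustration.
extract_open_gds_version 'NEO4J_OPEN_GDS_PLUGIN_VERSION=${NEO4J_OPEN_GDS_PLUGIN_VERSION:-"2.4.0"}'
# prints 2.4.0
```

Lines without the `NEO4J_OPEN_GDS_PLUGIN_VERSION:-` prefix make the helper return a non-zero status, which mirrors how the `matchStrings` entry would simply not match them.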
scripts/SCRIPTS.md

Lines changed: 3 additions & 1 deletion
@@ -8,8 +8,10 @@ Script | Directory | Description
88
| [analyze.sh](./analysis/analyze.sh) | analysis | Coordinates the end-to-end analysis process, encompassing tool installation, graph generation, and report generation. |
99
| [copyReportsIntoResults.sh](./copyReportsIntoResults.sh) | | Copies the results from the temp directory to the results directory grouped by the analysis name. |
1010
| [detectChangedArtifacts.sh](./detectChangedArtifacts.sh) | | Detect changed files in the artifacts directory with a text file containing the last hash code of the contents. |
11+
| [download.sh](./download.sh) | | Downloads a file into the directory of the environment variable SHARED_DOWNLOADS_DIRECTORY (or default "../downloads"). |
1112
| [downloadMavenArtifact.sh](./downloadMavenArtifact.sh) | | Downloads an artifact from Maven Central (https://mvnrepository.com/repos/central) |
1213
| [downloadAxonFramework.sh](./downloader/downloadAxonFramework.sh) | downloader | Downloads AxonFramework (https://developer.axoniq.io/axon-framework) artifacts from Maven Central. |
14+
| [analyzeAxonFramework.sh](./examples/analyzeAxonFramework.sh) | examples | Example of an analysis of AxonFramework. |
1315
| [executeJupyterNotebook.sh](./executeJupyterNotebook.sh) | | Executes all steps in the given Jupyter Notebook (ipynb), stores it and converts it to Markdown (md) and PDF. |
1416
| [executeQuery.sh](./executeQuery.sh) | | Utilizes Neo4j's HTTP API to execute a Cypher query from an input file and provides the results in CSV format. |
1517
| [executeQueryFunctions.sh](./executeQueryFunctions.sh) | | Provides functions to execute Cypher queries using either "executeQuery.sh" or Neo4j's "cypher-shell". |
@@ -22,7 +24,6 @@ Script | Directory | Description
2224
| [Neo4jv5.sh](./profiles/Neo4jv5.sh) | profiles | Sets all settings variables for an analysis with Neo4j v5.x (newest version as of June 2023). |
2325
| [CentralityCsv.sh](./reports/CentralityCsv.sh) | reports | Looks for centrality using the Graph Data Science Library of Neo4j and creates CSV reports. |
2426
| [CommunityCsv.sh](./reports/CommunityCsv.sh) | reports | Detects communities using the Graph Data Science Library of Neo4j and creates CSV reports. |
25-
| [DatabaseCsvExport.sh](./reports/DatabaseCsvExport.sh) | reports | Exports the whole graph database as a CSV file using the APOC procedure "apoc.export.csv.all" |
2627
| [ExternalDependenciesCsv.sh](./reports/ExternalDependenciesCsv.sh) | reports | Executes "Package_Usage" Cypher queries to get the "external-dependencies-csv" CSV reports. |
2728
| [ExternalDependenciesJupyter.sh](./reports/ExternalDependenciesJupyter.sh) | reports | Creates the "overview" report (ipynb, md, pdf) based on the Jupyter Notebook "Overview.ipynb". |
2829
| [InternalDependenciesCsv.sh](./reports/InternalDependenciesCsv.sh) | reports | Executes "Package_Usage" Cypher queries to get the "internal-dependencies" CSV reports. |
@@ -37,6 +38,7 @@ Script | Directory | Description
3738
| [WordcloudJupyter.sh](./reports/WordcloudJupyter.sh) | reports | Creates the "overview" report (ipynb, md, pdf) based on the Jupyter Notebook "Overview.ipynb". |
3839
| [AllReports.sh](./reports/compilations/AllReports.sh) | compilations | Runs all report scripts. |
3940
| [CsvReports.sh](./reports/compilations/CsvReports.sh) | compilations | Runs all CSV report scripts (no Python and Chromium required). |
41+
| [DatabaseCsvExportReports.sh](./reports/compilations/DatabaseCsvExportReports.sh) | compilations | Exports the whole graph database as a CSV file using the APOC procedure "apoc.export.csv.all" |
4042
| [JupyterReports.sh](./reports/compilations/JupyterReports.sh) | compilations | Runs all Jupyter Notebook report scripts. |
4143
| [resetAndScan.sh](./resetAndScan.sh) | | Deletes all data in the Neo4j graph database and rescans the downloaded artifacts to create a new graph. |
4244
| [resetAndScanChanged.sh](./resetAndScanChanged.sh) | | Executes "resetAndScan.sh" only if "detectChangedArtifacts.sh" returns detected changes. |

scripts/analysis/analyze.sh

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,8 @@
2828
# when it comes to subsequent executions.
2929
# Existing downloads, installations, scans and processes will be detected.
3030

31+
# Requires setupNeo4j.sh, setupJQAssistant.sh, startNeo4j.sh, resetAndScanChanged.sh, prepareAnalysis.sh, stopNeo4j.sh, compilations/*.sh, profiles/*.sh
32+
3133
# Overrideable variables with directory names
3234
REPORTS_SCRIPTS_DIRECTORY=${REPORTS_SCRIPTS_DIRECTORY:-"reports"}
3335
REPORT_COMPILATIONS_SCRIPTS_DIRECTORY=${REPORT_COMPILATIONS_SCRIPTS_DIRECTORY:-"compilations"}
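The two overrideable directory variables above, together with the compilation scripts listed in SCRIPTS.md (`AllReports.sh`, `CsvReports.sh`, `DatabaseCsvExportReports.sh`, `JupyterReports.sh`), suggest that a `--report` value is resolved to a script named `<report>Reports.sh`. The sketch below spells out that assumed convention. The helper name `compilation_script_for_report` is hypothetical, and the mapping is inferred from the file names rather than taken from `analyze.sh` itself.

```shell
#!/usr/bin/env bash

# Same overrideable defaults as in analyze.sh
REPORTS_SCRIPTS_DIRECTORY=${REPORTS_SCRIPTS_DIRECTORY:-"reports"}
REPORT_COMPILATIONS_SCRIPTS_DIRECTORY=${REPORT_COMPILATIONS_SCRIPTS_DIRECTORY:-"compilations"}

# Hypothetical helper: builds the relative path of the compilation script
# for a given --report value (assumed naming: "<report>Reports.sh").
compilation_script_for_report() {
    local report="${1:-All}"
    echo "${REPORTS_SCRIPTS_DIRECTORY}/${REPORT_COMPILATIONS_SCRIPTS_DIRECTORY}/${report}Reports.sh"
}

compilation_script_for_report "DatabaseCsvExport"
```

Under this assumption, running `analyze.sh` without `--report` would resolve to `AllReports.sh`, which matches the documented default value `All`.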

scripts/copyReportsIntoResults.sh

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@
66

77
# Note that this script needs to be executed within the "temp" directory.
88

9+
# Requires generateMarkdownReference.sh
10+
911
## Get this "scripts" directory if not already set
1012
# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution.
1113
# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes.

scripts/download.sh

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
1+
#!/usr/bin/env bash
2+
3+
# Downloads a file into the directory of the environment variable SHARED_DOWNLOADS_DIRECTORY (or default "../downloads").
4+
# Does nothing if the file already exists.
5+
6+
# Command line options:
7+
# --url Download URL (required)
8+
# --filename Target file name with extension without path (optional, default = basename of download URL)
9+
10+
# Function to display script usage
11+
usage() {
12+
echo "Usage: $0 --url https://my.download.url [--filename <download-file-name-without-path.ext> (default=url filename)]"
13+
exit 1
14+
}
15+
16+
# Default values
17+
downloadUrl=""
18+
filename=""
19+
20+
# Parse command line arguments
21+
while [[ $# -gt 0 ]]; do
22+
key="$1"
23+
case $key in
24+
--url)
25+
downloadUrl="$2"
26+
shift
27+
;;
28+
--filename)
29+
filename="$2"
30+
shift
31+
;;
32+
*)
33+
echo "download: Error: Unknown option: ${key}"
34+
usage
35+
;;
36+
esac
37+
shift
38+
done
39+
40+
if [[ -z ${downloadUrl} ]]; then
41+
usage
42+
exit 1
43+
fi
44+
45+
if ! curl --head --fail "${downloadUrl}" >/dev/null 2>&1; then
46+
echo "download: Error: Invalid URL: ${downloadUrl}"
47+
exit 1
48+
fi
49+
50+
if [[ -z ${filename} ]]; then
51+
filename=$(basename -- "${downloadUrl}")
52+
fi
53+
54+
# Get shared download directory and create it if it doesn't exist
55+
SHARED_DOWNLOADS_DIRECTORY="${SHARED_DOWNLOADS_DIRECTORY:-$(dirname "$( pwd )")/downloads}"
56+
if [ ! -d "${SHARED_DOWNLOADS_DIRECTORY}" ] ; then
57+
echo "download: Creating shared downloads directory ${SHARED_DOWNLOADS_DIRECTORY}"
58+
mkdir -p "${SHARED_DOWNLOADS_DIRECTORY}"
59+
fi
60+
61+
# Download the file if it doesn't exist in the shared downloads directory
62+
if [ ! -f "${SHARED_DOWNLOADS_DIRECTORY}/${filename}" ] ; then
63+
echo "download: Downloading ${filename} from ${downloadUrl} into ${SHARED_DOWNLOADS_DIRECTORY}"
64+
65+
# Download the file
66+
if ! curl -L --fail-with-body -o "${SHARED_DOWNLOADS_DIRECTORY}/${filename}" "${downloadUrl}"; then
67+
echo "download: Error: Failed to download ${filename}"
68+
rm -f "${SHARED_DOWNLOADS_DIRECTORY}/${filename}"
69+
exit 1
70+
fi
71+
else
72+
echo "download: ${filename} already downloaded"
73+
fi
74+
75+
# Check downloaded file size to be at least 600 bytes or otherwise delete the invalid file
76+
downloaded_file_size=$(wc -c "${SHARED_DOWNLOADS_DIRECTORY}/${filename}" | awk '{print $1}')
77+
if [[ "${downloaded_file_size}" -lt 600 ]]; then
78+
echo "download: Error: Failed to download ${filename}: Filesize: ${downloaded_file_size} < 600 bytes"
79+
rm -f "${SHARED_DOWNLOADS_DIRECTORY}/${filename}"
80+
exit 1
81+
fi
82+
83+
# Fail if download failed
84+
if [ ! -f "${SHARED_DOWNLOADS_DIRECTORY}/${filename}" ] ; then
85+
echo "download: Error: Failed to download ${filename}"
86+
exit 1
87+
fi
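The way `download.sh` derives its target path can be summarized in a small sketch. It combines the two defaults used above: the file name falls back to the basename of the URL, and the directory falls back to a `downloads` directory next to the current working directory. The function name `resolve_download_target` and the example URL are hypothetical and only serve as illustration; nothing is downloaded here.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the path download.sh would write to,
# given a URL and an optional file name.
resolve_download_target() {
    local url="$1"
    local filename="${2:-}"
    # Same default as in download.sh: "../downloads" relative to the current directory
    local directory="${SHARED_DOWNLOADS_DIRECTORY:-$(dirname "$(pwd)")/downloads}"
    if [[ -z ${filename} ]]; then
        filename=$(basename -- "${url}")
    fi
    echo "${directory}/${filename}"
}

# Assumed example URL; the output depends on the current working directory
# unless SHARED_DOWNLOADS_DIRECTORY is set.
resolve_download_target "https://example.org/files/tool-1.0.jar"
```

Because the default directory depends on `pwd`, callers that run from different working directories would need to export `SHARED_DOWNLOADS_DIRECTORY` to actually share one download cache.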

scripts/downloadMavenArtifact.sh

Lines changed: 46 additions & 29 deletions
@@ -6,66 +6,83 @@
66
# -a Maven Artifact Name
77
# -v Maven Artifact Version
88
# -t Maven Artifact Type (defaults to jar)
9+
# -d Target directory for the downloaded file
910

10-
# Read options
11-
ARTIFACT_TYPE="jar"
11+
# Requires download.sh
12+
13+
# Overrideable constants
14+
ARTIFACTS_DIRECTORY=${ARTIFACTS_DIRECTORY:-"artifacts"}
15+
SHARED_DOWNLOADS_DIRECTORY="${SHARED_DOWNLOADS_DIRECTORY:-$(dirname "$( pwd )")/downloads}"
16+
17+
# Default and initial values for command line options
18+
groupId=""
19+
artifactId=""
20+
version=""
21+
artifactType="jar"
22+
targetDirectory="${ARTIFACTS_DIRECTORY}"
23+
24+
# Read command line options
25+
USAGE="downloadMavenArtifact: Usage: $0 [-g group_id] [-a artifact_id] [-v version] [-t type (default=jar)] [-d targetDirectory (default=${ARTIFACTS_DIRECTORY})]"
1226
OPTIND=1
13-
while getopts "g:a:v:t:" opt; do
27+
while getopts "g:a:v:t:d:" opt; do
1428
case ${opt} in
1529
g )
16-
GROUP_ID=${OPTARG}
30+
groupId=${OPTARG}
1731
;;
1832
a )
19-
ARTIFACT_ID=${OPTARG}
33+
artifactId=${OPTARG}
2034
;;
2135
v )
22-
VERSION=${OPTARG}
36+
version=${OPTARG}
2337
;;
2438
t )
25-
ARTIFACT_TYPE=${OPTARG}
39+
artifactType=${OPTARG}
40+
;;
41+
d )
42+
targetDirectory=${OPTARG}
2643
;;
2744
\? )
28-
echo "Usage: $0 [-g group_id] [-a artifact_id] [-v version] [-t type (default=jar)]"
45+
echo "${USAGE}"
2946
exit 1
3047
;;
3148
esac
3249
done
3350

34-
if [[ -z ${GROUP_ID} || -z ${ARTIFACT_ID} || -z ${VERSION} || -z ${ARTIFACT_TYPE} ]]; then
35-
echo "Usage: $0 [-g group_id] [-a artifact_id] [-v version] [-t type (default=jar)]"
51+
if [[ -z ${groupId} || -z ${artifactId} || -z ${version} || -z ${artifactType} || -z ${targetDirectory} ]]; then
52+
echo "${USAGE}"
3653
exit 1
3754
fi
3855

39-
# Overrideable constants
40-
ARTIFACTS_DIRECTORY=${ARTIFACTS_DIRECTORY:-"artifacts"}
56+
## Get this "scripts" directory if not already set
57+
# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution.
58+
# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes.
59+
# This way non-standard tools like readlink aren't needed.
60+
SCRIPTS_DIR=${SCRIPTS_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )}
4161

4262
# Internal constants
4363
BASE_URL="https://repo1.maven.org/maven2"
44-
ARTIFACT_FILENAME="${ARTIFACT_ID}-${VERSION}.${ARTIFACT_TYPE}"
45-
GROUP_ID_FOR_API="$(echo "${GROUP_ID}" | tr '.' '/')"
46-
DOWNLOAD_URL="${BASE_URL}/${GROUP_ID_FOR_API}/${ARTIFACT_ID}/${VERSION}/${ARTIFACT_FILENAME}"
47-
48-
# Download Maven Artifact into the ARTIFACTS_DIRECTORY
49-
if [ ! -f "${ARTIFACTS_DIRECTORY}/${ARTIFACT_FILENAME}" ] ; then
50-
echo "Downloading ${DOWNLOAD_URL}"
64+
ARTIFACT_FILENAME="${artifactId}-${version}.${artifactType}"
65+
GROUP_ID_FOR_API="$(echo "${groupId}" | tr '.' '/')"
66+
DOWNLOAD_URL="${BASE_URL}/${GROUP_ID_FOR_API}/${artifactId}/${version}/${ARTIFACT_FILENAME}"
5167

52-
# Download Maven Artifact
53-
curl -L --fail-with-body -O "${DOWNLOAD_URL}"
68+
# Download Maven Artifact into the "targetDirectory"
69+
if [ ! -f "./${targetDirectory}/${ARTIFACT_FILENAME}" ] ; then
70+
source "${SCRIPTS_DIR}/download.sh" --url "${DOWNLOAD_URL}" || exit 1
5471

55-
# Create artifacts directory if it doesn't exist
56-
mkdir -p "${ARTIFACTS_DIRECTORY}"
72+
# Create the target directory if it doesn't exist
73+
mkdir -p "./${targetDirectory}" || exit 1
5774

5875
# Delete already existing older versions of the artifact
59-
rm -f "${ARTIFACTS_DIRECTORY}/${ARTIFACT_ID}"*
76+
rm -f "./${targetDirectory}/${artifactId}"* || exit 1
6077

61-
# Move artifact to artifacts directory
62-
mv "${ARTIFACT_FILENAME}" "${ARTIFACTS_DIRECTORY}"
78+
# Copy artifact into artifacts targetDirectory
79+
cp -R "${SHARED_DOWNLOADS_DIRECTORY}/${ARTIFACT_FILENAME}" "./${targetDirectory}" || exit 1
6380
else
64-
echo "${ARTIFACT_FILENAME} already downloaded"
81+
echo "downloadMavenArtifact: ${ARTIFACT_FILENAME} already downloaded into target directory ${targetDirectory}"
6582
fi
6683

6784
# Fail if Maven Download failed
68-
if [ ! -f "${ARTIFACTS_DIRECTORY}/${ARTIFACT_FILENAME}" ] ; then
69-
echo "Failed to download ${ARTIFACT_FILENAME}"
85+
if [ ! -f "${targetDirectory}/${ARTIFACT_FILENAME}" ] ; then
86+
echo "downloadMavenArtifact: Error: Failed to download ${ARTIFACT_FILENAME}"
7087
exit 1
7188
fi
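The URL construction in `downloadMavenArtifact.sh` follows the Maven Central repository layout: dots in the group ID become path separators, followed by the artifact ID, the version, and the file name `<artifactId>-<version>.<type>`. The sketch below isolates that step. The function name `build_maven_download_url` is hypothetical, and the artifact coordinates are only an assumed example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: builds the Maven Central download URL the same way
# downloadMavenArtifact.sh does with BASE_URL, GROUP_ID_FOR_API and ARTIFACT_FILENAME.
build_maven_download_url() {
    local groupId="$1"
    local artifactId="$2"
    local version="$3"
    local artifactType="${4:-jar}"
    local baseUrl="https://repo1.maven.org/maven2"
    local groupPath
    # Convert the dotted group ID into a path, e.g. org.example -> org/example
    groupPath="$(echo "${groupId}" | tr '.' '/')"
    echo "${baseUrl}/${groupPath}/${artifactId}/${version}/${artifactId}-${version}.${artifactType}"
}

# Assumed example coordinates; the version is made up for illustration.
build_maven_download_url "org.axonframework" "axon-messaging" "4.7.5"
# prints https://repo1.maven.org/maven2/org/axonframework/axon-messaging/4.7.5/axon-messaging-4.7.5.jar
```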

scripts/downloader/downloadAxonFramework.sh

Lines changed: 5 additions & 3 deletions
@@ -8,6 +8,8 @@
88

99
# Note: This script is meant to be started within the temporary analysis directory (e.g. "temp/AnalysisName/")
1010

11+
# Requires downloadMavenArtifact.sh
12+
1113
# Get the analysis name from the middle part of the current file name (without prefix "download" and without extension)
1214
SCRIPT_FILE_NAME="$(basename -- "${BASH_SOURCE[0]}")"
1315
SCRIPT_FILE_NAME_WITHOUT_EXTENSION="${SCRIPT_FILE_NAME%%.*}"
@@ -30,11 +32,11 @@ echo "download${ANALYSIS_NAME}: ARTIFACTS_VERSION=${ARTIFACTS_VERSION}"
3032
# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution.
3133
# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes.
3234
# This way non-standard tools like readlink aren't needed.
33-
ANALYSIS_SCRIPT_DIR=${ANALYSIS_SCRIPT_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )}
34-
echo "download${ANALYSIS_NAME}: ANALYSIS_SCRIPT_DIR=${ANALYSIS_SCRIPT_DIR}"
35+
DOWNLOADER_SCRIPTS_DIR=${DOWNLOADER_SCRIPTS_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )}
36+
echo "download${ANALYSIS_NAME}: DOWNLOADER_SCRIPTS_DIR=${DOWNLOADER_SCRIPTS_DIR}"
3537

3638
# Get the "scripts" directory by taking the path of this script and going one directory up.
37-
SCRIPTS_DIR=${SCRIPTS_DIR:-$(dirname -- "${ANALYSIS_SCRIPT_DIR}")}
39+
SCRIPTS_DIR=${SCRIPTS_DIR:-$(dirname -- "${DOWNLOADER_SCRIPTS_DIR}")}
3840
echo "download${ANALYSIS_NAME}: SCRIPTS_DIR=${SCRIPTS_DIR}"
3941

4042
################################################################

scripts/executeQueryFunctions.sh

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
22

33
# Provides functions to execute Cypher queries using either "executeQuery.sh" or Neo4j's "cypher-shell".
44

5+
# Requires executeQuery.sh
6+
57
## Get this "scripts" directory if not already set
68
# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution.
79
# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes.
