29 changes: 14 additions & 15 deletions README.md
@@ -1,8 +1,8 @@
# Code Graph Analysis Pipeline

<img src="./images/DALL-E-Mini-Graph-Pipeline-Logo.png" align="left" hspace="10" width="180">
<img src="./images/DALL-E-Mini-Graph-Pipeline-Logo-2.png" align="left" hspace="10" width="180">

Contained within this repository is a comprehensive and automated code graph analysis pipeline. While initially designed to support Java through the utilization of [jQAssistant](https://jqassistant.org/get-started), its capabilities extend beyond that particular language. The graph database [Neo4j](https://neo4j.com) serves as the foundation for storing and querying the graph, which encompasses all the structural intricacies of the analyzed code. Additionally, Neo4j's [Graph Data Science](https://neo4j.com/product/graph-data-science) integration maximizes the utilization of its features. The generated reports offer flexibility, ranging from simple query results presented as CSV files to more elaborate Jupyter Notebooks converted to Markdown or PDF formats.
Contained within this repository is a comprehensive and automated code graph analysis pipeline. While initially designed to support Java through the utilization of [jQAssistant](https://jqassistant.org/get-started), it is open to extension for further programming languages. The graph database [Neo4j](https://neo4j.com) serves as the foundation for storing and querying the graph, which encompasses all the structural intricacies of the analyzed code. Additionally, Neo4j's [Graph Data Science](https://neo4j.com/product/graph-data-science) provides additional algorithms like community detection to analyze the code structure. The generated reports offer flexibility, ranging from simple query results presented as CSV files to more elaborate Jupyter Notebooks converted to Markdown or PDF formats.

---

@@ -16,12 +16,12 @@ Contained within this repository is a comprehensive and automated code graph ana

### 📖 Jupyter Notebook Reports

- [External Dependencies](./jupyter/ExternalDependencies.ipynb) reports with amongst others the most and least used external packages ([Example](./results/AxonFramework-4.7.5/external-dependencies/ExternalDependencies.md))
- [Object Oriented Design Quality Metrics](./jupyter/ObjectOrientedDesignMetrics.ipynb) report based on [OO Design Quality Metrics by Robert Martin](https://www.semanticscholar.org/paper/OO-Design-Quality-Metrics-Martin-October/18acd7eb21b918c8a5f619157f7e4f6d451d18f8) ([Example](./results/AxonFramework-4.7.5/object-oriented-design-metrics/ObjectOrientedDesignMetrics.md))
- [Overview](./jupyter/Overview.ipynb) reports with the number of Java types and packages, method line count, etc. ([Example](./results/AxonFramework-4.7.5/overview/Overview.md))
- [Package Dependencies](./jupyter/PackageDependencies.ipynb) report based on [Analyze java package metrics in a graph database](https://joht.github.io/johtizen/data/2023/04/21/java-package-metrics-analysis.html) including cyclic dependencies ([Example](./results/AxonFramework-4.7.5/package-dependencies/PackageDependencies.md))
- [Visibility Metrics](./jupyter/VisibilityMetrics.ipynb) reports based on [Visibility Metrics and the Importance of Hiding Things](https://dzone.com/articles/visibility-metrics-and-the-importance-of-hiding-th) ([Example](./results/AxonFramework-4.7.5/visibility-metrics/VisibilityMetrics.md))
- [Wordcloud](./jupyter/Wordcloud.ipynb) with a visual representation of Java package and class names ([Example](./results/AxonFramework-4.7.5/wordcloud/Wordcloud.md))
- [External Dependencies](./jupyter/ExternalDependencies.ipynb) contains the most and least used external packages, etc. ([Example](./results/AxonFramework-4.7.5/external-dependencies/ExternalDependencies.md))
- [Object Oriented Design Quality Metrics](./jupyter/ObjectOrientedDesignMetrics.ipynb) is based on [OO Design Quality Metrics by Robert Martin](https://www.semanticscholar.org/paper/OO-Design-Quality-Metrics-Martin-October/18acd7eb21b918c8a5f619157f7e4f6d451d18f8) ([Example](./results/AxonFramework-4.7.5/object-oriented-design-metrics/ObjectOrientedDesignMetrics.md))
- [Overview](./jupyter/Overview.ipynb) contains the number of types and packages, method line count, cyclomatic complexity, etc. ([Example](./results/AxonFramework-4.7.5/overview/Overview.md))
- [Internal Dependencies](./jupyter/InternalDependencies.ipynb) is based on [Analyze java package metrics in a graph database](https://joht.github.io/johtizen/data/2023/04/21/java-package-metrics-analysis.html) including cyclic dependencies ([Example](./results/AxonFramework-4.7.5/internal-dependencies/InternalDependencies.md))
- [Visibility Metrics](./jupyter/VisibilityMetrics.ipynb) is based on [Visibility Metrics and the Importance of Hiding Things](https://dzone.com/articles/visibility-metrics-and-the-importance-of-hiding-th) ([Example](./results/AxonFramework-4.7.5/visibility-metrics/VisibilityMetrics.md))
- [Wordcloud](./jupyter/Wordcloud.ipynb) contains a visual representation of package and class names ([Example](./results/AxonFramework-4.7.5/wordcloud/Wordcloud.md))

### 📖 Graph Data Science Reports

@@ -36,20 +36,19 @@ Here are some reports that utilize Neo4j's [Graph Data Science Library](https://
- [External Dependencies (CSV)](./scripts/reports/ExternalDependenciesCsv.sh) ([Example](./results/AxonFramework-4.7.5/external-dependencies-csv/External_package_usage_overall.csv))
- [Object Oriented Design Metrics (CSV)](./scripts/reports/ObjectOrientedDesignMetricsCsv.sh) ([Example](./results/AxonFramework-4.7.5/object-oriented-design-metrics-csv/MainSequenceAbstractnessInstabilityDistance.csv))
- [Overview (CSV)](./scripts/reports/OverviewCsv.sh) ([Example](./results/AxonFramework-4.7.5/overview-csv/Cyclomatic_Method_Complexity.csv))
- [Package Dependencies - Cyclic (CSV)](./scripts/reports/PackageDependenciesCsv.sh) ([Example](./results/AxonFramework-4.7.5/package-dependencies-csv/CyclicDependenciesUnwinded.csv))
- [Package Dependencies - Interface Segregation (CSV)](./scripts/reports/PackageDependenciesCsv.sh) ([Example](./results/AxonFramework-4.7.5/package-dependencies-csv/InterfaceSegregationCandidates.csv))
- [Internal Dependencies - Cyclic (CSV)](./scripts/reports/InternalDependenciesCsv.sh) ([Example](./results/AxonFramework-4.7.5/internal-dependencies-csv/CyclicDependenciesUnwinded.csv))
- [Internal Dependencies - Interface Segregation (CSV)](./scripts/reports/InternalDependenciesCsv.sh) ([Example](./results/AxonFramework-4.7.5/internal-dependencies-csv/InterfaceSegregationCandidates.csv))
- [Visibility Metrics (CSV)](./scripts/reports/VisibilityMetricsCsv.sh) ([Example](./results/AxonFramework-4.7.5/visibility-metrics-csv/RelativeVisibilityPerArtifact.csv))

## 🛠 Prerequisites

- Java 11 is required (June 2023 Neo4j 4.x requirement)
- Python with a conda package manager is needed for Jupyter Notebook reports
- Chromium will automatically be downloaded if needed for Jupyter Notebook reports in PDF format.
- Java 17 is required (June 2023 Neo4j 5.x requirement)
- Python and a conda package manager are required for Jupyter Notebook reports
- Chromium will automatically be downloaded if needed for Jupyter Notebook reports in PDF format

## Getting Started

See [Start an analysis](./COMMANDS.md#start-an-analysis) in the [Commands Reference](./COMMANDS.md) on how to start
an analysis on your local machine.
See [Start an analysis](./COMMANDS.md#start-an-analysis) in the [Commands Reference](./COMMANDS.md) on how to start an analysis on your local machine.

## 🏗 Pipeline and Tools

@@ -0,0 +1,36 @@
//Cyclic Dependencies between Artifacts as unwinded List

MATCH (package:Package)-[:CONTAINS]->(forwardSource:Type)-[:DEPENDS_ON]->(forwardTarget:Type)<-[:CONTAINS]-(dependentPackage:Package)
MATCH (dependentPackage)-[:CONTAINS]->(backwardSource:Type)-[:DEPENDS_ON]->(backwardTarget:Type)<-[:CONTAINS]-(package)
MATCH (artifact:Artifact)-[:CONTAINS]->(package)
MATCH (dependentArtifact:Artifact)-[:CONTAINS]->(dependentPackage)
 WITH artifact
     ,dependentArtifact
     ,package
     ,dependentPackage
     ,collect(DISTINCT forwardSource.name + '->' + forwardTarget.name) AS forwardDependencies
     ,collect(DISTINCT backwardTarget.name + '<-' + backwardSource.name) AS backwardDependencies
 WITH artifact
     ,dependentArtifact
     ,package
     ,dependentPackage
     ,forwardDependencies
     ,backwardDependencies
     ,size(forwardDependencies) AS numberOfForwardDependencies
     ,size(backwardDependencies) AS numberOfBackwardDependencies
     ,size(forwardDependencies) + size(backwardDependencies) AS numberOfAllCyclicDependencies
WHERE artifact <> dependentArtifact
  AND package <> dependentPackage
  AND (size(forwardDependencies) > size(backwardDependencies)
   OR (size(forwardDependencies) = size(backwardDependencies)
  AND size(package.fqn) >= size(dependentPackage.fqn)))
UNWIND (backwardDependencies + forwardDependencies) AS dependency
RETURN artifact.fileName AS artifactName
      ,dependentArtifact.fileName AS dependentArtifactName
      ,package.fqn AS packageName
      ,dependentPackage.fqn AS dependentPackageName
      ,dependency
      ,toFloat(ABS(numberOfForwardDependencies - numberOfBackwardDependencies)) / numberOfAllCyclicDependencies AS forwardToBackwardBalance
      ,numberOfForwardDependencies AS numberForward
      ,numberOfBackwardDependencies AS numberBackward
ORDER BY forwardToBackwardBalance DESC, packageName ASC
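
The `forwardToBackwardBalance` column returned by the query above is the absolute difference between the forward and backward dependency counts, divided by their sum. The arithmetic can be sketched as follows; the helper name `forward_to_backward_balance` is hypothetical and not part of the repository, and the sketch assumes both counts are at least 1, which the two `MATCH` clauses of the query guarantee for every returned row.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the repository). It mirrors the expression
#   toFloat(ABS(forward - backward)) / (forward + backward)
# from the Cypher query. Assumes forward + backward > 0.
forward_to_backward_balance() {
    local forward="$1"
    local backward="$2"
    awk -v f="${forward}" -v b="${backward}" 'BEGIN {
        diff = f - b
        if (diff < 0) diff = -diff
        printf "%.2f\n", diff / (f + b)
    }'
}

forward_to_backward_balance 3 1   # prints 0.50: one direction dominates
forward_to_backward_balance 2 2   # prints 0.00: both directions are equally strong
```

A value close to 1 means that almost all dependencies point in one direction, while 0 means the cycle is evenly balanced. Because the query sorts by this value in descending order, the most one-sided cycles appear first.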
Binary file added images/DALL-E-Mini-Graph-Pipeline-Logo-2.png
2 changes: 1 addition & 1 deletion jupyter/ExternalDependencies.ipynb
@@ -6,7 +6,7 @@
"id": "2f0eabc4",
"metadata": {},
"source": [
"# External Dependencies of Java Artifacts with Neo4j\n",
"# External Dependencies\n",
"<br> \n",
"\n",
"### References\n",
@@ -6,7 +6,7 @@
"id": "2f0eabc4",
"metadata": {},
"source": [
"# Package Dependencies for Java with Neo4j\n",
"# Internal Dependencies\n",
"<br> \n",
"\n",
"### References\n",
2 changes: 1 addition & 1 deletion jupyter/ObjectOrientedDesignMetrics.ipynb
@@ -6,7 +6,7 @@
"id": "2f0eabc4",
"metadata": {},
"source": [
"# Object Oriented Design Quality Metrics for Java with Neo4j\n",
"# Object Oriented Design Quality Metrics\n",
"<br> \n",
"\n",
"### References\n",
2 changes: 1 addition & 1 deletion jupyter/Overview.ipynb
@@ -6,7 +6,7 @@
"id": "2f0eabc4",
"metadata": {},
"source": [
"# Overview of Java Artifacts with Neo4j\n",
"# Overview\n",
"<br> \n",
"\n",
"### References\n",
2 changes: 1 addition & 1 deletion jupyter/VisibilityMetrics.ipynb
@@ -6,7 +6,7 @@
"id": "2f0eabc4",
"metadata": {},
"source": [
"# Visibility Metrics for Java with Neo4j\n",
"# Visibility Metrics\n",
"<br> \n",
"\n",
"### References\n",
2 changes: 1 addition & 1 deletion jupyter/Wordcloud.ipynb
@@ -6,7 +6,7 @@
"id": "2f0eabc4",
"metadata": {},
"source": [
"# Overview of Java Artifacts with Neo4j\n",
"# Wordcloud\n",
"<br> \n",
"\n",
"### References\n",
9 changes: 4 additions & 5 deletions scripts/SCRIPTS.md
@@ -23,14 +23,14 @@ Script | Directory | Description
| [CentralityCsv.sh](./reports/CentralityCsv.sh) | reports | Looks for centrality using the Graph Data Science Library of Neo4j and creates CSV reports. |
| [CommunityCsv.sh](./reports/CommunityCsv.sh) | reports | Detects communities using the Graph Data Science Library of Neo4j and creates CSV reports. |
| [DatabaseCsvExport.sh](./reports/DatabaseCsvExport.sh) | reports | Exports the whole graph database as a CSV file using the APOC procedure "apoc.export.csv.all" |
| [ExternalDependenciesCsv.sh](./reports/ExternalDependenciesCsv.sh) | reports | Executes "Package_Usage" Cypher queries to get the "package-dependencies" CSV reports. |
| [ExternalDependenciesCsv.sh](./reports/ExternalDependenciesCsv.sh) | reports | Executes "Package_Usage" Cypher queries to get the "external-dependencies-csv" CSV reports. |
| [ExternalDependenciesJupyter.sh](./reports/ExternalDependenciesJupyter.sh) | reports | Creates the "external-dependencies" report (ipynb, md, pdf) based on the Jupyter Notebook "ExternalDependencies.ipynb". |
| [InternalDependenciesCsv.sh](./reports/InternalDependenciesCsv.sh) | reports | Executes "Package_Usage" Cypher queries to get the "internal-dependencies" CSV reports. |
| [InternalDependenciesJupyter.sh](./reports/InternalDependenciesJupyter.sh) | reports | Creates the "internal-dependencies" report (ipynb, md, pdf) based on the Jupyter Notebook "InternalDependencies.ipynb". |
| [ObjectOrientedDesignMetricsCsv.sh](./reports/ObjectOrientedDesignMetricsCsv.sh) | reports | Executes "Metrics" Cypher queries to get the "object-oriented-design-metrics" CSV reports. |
| [ObjectOrientedDesignMetricsJupyter.sh](./reports/ObjectOrientedDesignMetricsJupyter.sh) | reports | Creates the "object-oriented-design-metrics" report (ipynb, md, pdf) based on the Jupyter Notebook "ObjectOrientedDesignMetrics.ipynb". |
| [OverviewCsv.sh](./reports/OverviewCsv.sh) | reports | Executes "Package_Usage" Cypher queries to get the "package-dependencies" CSV reports. |
| [OverviewCsv.sh](./reports/OverviewCsv.sh) | reports | Executes "Overview" Cypher queries to get the "overview-csv" CSV reports. |
| [OverviewJupyter.sh](./reports/OverviewJupyter.sh) | reports | Creates the "overview" report (ipynb, md, pdf) based on the Jupyter Notebook "Overview.ipynb". |
| [PackageDependenciesCsv.sh](./reports/PackageDependenciesCsv.sh) | reports | Executes "Package_Usage" Cypher queries to get the "package-dependencies" CSV reports. |
| [PackageDependenciesJupyter.sh](./reports/PackageDependenciesJupyter.sh) | reports | Creates the "package-dependencies" report (ipynb, md, pdf) based on the Jupyter Notebook "PackageDependencies.ipynb". |
| [SimilarityCsv.sh](./reports/SimilarityCsv.sh) | reports | Looks for similarity using the Graph Data Science Library of Neo4j and creates CSV reports. |
| [VisibilityMetricsCsv.sh](./reports/VisibilityMetricsCsv.sh) | reports | Executes "Visibility" Cypher queries to get the "visibility-metrics" CSV reports. |
| [VisibilityMetricsJupyter.sh](./reports/VisibilityMetricsJupyter.sh) | reports | Creates the "visibility-metrics" report (ipynb, md, pdf) based on the Jupyter Notebook "VisibilityMetrics.ipynb". |
@@ -40,7 +40,6 @@ Script | Directory | Description
| [JupyterReports.sh](./reports/compilations/JupyterReports.sh) | compilations | Runs all Jupyter Notebook report scripts. |
| [resetAndScan.sh](./resetAndScan.sh) | | Deletes all data in the Neo4j graph database and rescans the downloaded artifacts to create a new graph. |
| [resetAndScanChanged.sh](./resetAndScanChanged.sh) | | Executes "resetAndScan.sh" only if "detectChangedArtifacts.sh" returns detected changes. |
| [resetAndScanMemgraph.sh](./resetAndScanMemgraph.sh) | | Deletes all data in the Neo4j graph database and rescans the downloaded artifacts to create a new graph. |
| [setupJQAssistant.sh](./setupJQAssistant.sh) | | Installs (download and unzip) jQAssistant (https://jqassistant.org/get-started). |
| [setupNeo4j.sh](./setupNeo4j.sh) | | Installs (download, unpack, get plugins, configure) a local Neo4j Graph Database (https://neo4j.com/download-center/#community). |
| [setupNeo4jInitialPassword.sh](./setupNeo4jInitialPassword.sh) | | Sets the initial password for the local Neo4j Graph Database (https://neo4j.com/download-center/#community). |
5 changes: 2 additions & 3 deletions scripts/reports/ExternalDependenciesCsv.sh
@@ -1,8 +1,7 @@
#!/usr/bin/env bash

# Executes "Package_Usage" Cypher queries to get the "package-dependencies" CSV reports.
# It contains lists of e.g. incoming and outgoing package dependencies,
# abstractness, instability and the distance to the so called "main sequence".
# Executes "Package_Usage" Cypher queries to get the "external-dependencies-csv" CSV reports.
# They list external library package usage like how often an external package is called.

# Overrideable Constants (defaults also defined in sub scripts)
REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"}
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

# Executes "Package_Usage" Cypher queries to get the "package-dependencies" CSV reports.
# Executes "Package_Usage" Cypher queries to get the "internal-dependencies" CSV reports.
# It contains lists of e.g. incoming and outgoing package dependencies,
# abstractness, instability and the distance to the so called "main sequence".

@@ -12,21 +12,21 @@ REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"}
# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes.
# This way non-standard tools like readlink aren't needed.
REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )}
echo "PackageDependenciesCsv: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}"
echo "InternalDependenciesCsv: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}"

# Get the "scripts" directory by taking the path of this script and going one directory up.
SCRIPTS_DIR=${SCRIPTS_DIR:-"${REPORTS_SCRIPT_DIR}/.."}
echo "PackageDependenciesCsv SCRIPTS_DIR=${SCRIPTS_DIR}"
echo "InternalDependenciesCsv SCRIPTS_DIR=${SCRIPTS_DIR}"

# Get the "cypher" directory by taking the path of this script and going two directories up and then to "cypher".
CYPHER_DIR=${CYPHER_DIR:-"${REPORTS_SCRIPT_DIR}/../../cypher"}
echo "PackageDependenciesCsv CYPHER_DIR=${CYPHER_DIR}"
echo "InternalDependenciesCsv CYPHER_DIR=${CYPHER_DIR}"

# Define functions to execute cypher queries from within a given file
source "${SCRIPTS_DIR}/executeQueryFunctions.sh"

# Create report directory
REPORT_NAME="package-dependencies-csv"
REPORT_NAME="internal-dependencies-csv"
FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}"
mkdir -p "${FULL_REPORT_DIRECTORY}"

@@ -36,6 +36,7 @@ PACKAGE_USAGE_CYPHER_DIR="${CYPHER_DIR}/Package_Usage"

execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies_as_List.cypher" > "${FULL_REPORT_DIRECTORY}/CyclicDependencies.csv"
execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies_as_unwinded_List.cypher" > "${FULL_REPORT_DIRECTORY}/CyclicDependenciesUnwinded.csv"
execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies_between_Artrifacts_as_unwinded_List.cypher" > "${FULL_REPORT_DIRECTORY}/CyclicArtifactDependenciesUnwinded.csv"
execute_cypher "${CYPHER_DIR}/Candidates_for_Interface_Segregation.cypher" > "${FULL_REPORT_DIRECTORY}/InterfaceSegregationCandidates.csv"
execute_cypher "${PACKAGE_USAGE_CYPHER_DIR}/List_types_that_are_used_by_many_different_packages.cypher" > "${FULL_REPORT_DIRECTORY}/WidelyUsedTypes.csv"
execute_cypher "${PACKAGE_USAGE_CYPHER_DIR}/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher" > "${FULL_REPORT_DIRECTORY}/ArtifactPackageUsage.csv"
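
The script above builds its output location from an overridable base directory (`REPORTS_DIRECTORY`) and a fixed report name, then creates the directory before writing the CSV files. A minimal sketch of that pattern follows; the function `prepare_report_directory` is a hypothetical helper for illustration only, since the script performs these steps inline.

```shell
#!/usr/bin/env bash

# Hypothetical helper for illustration; not part of the repository.
prepare_report_directory() {
    local report_name="$1"
    # The caller's REPORTS_DIRECTORY wins if it is set and non-empty,
    # otherwise the default "reports" is used.
    local reports_directory="${REPORTS_DIRECTORY:-reports}"
    local full_report_directory="${reports_directory}/${report_name}"
    # -p creates missing parent directories and succeeds if the directory already exists.
    mkdir -p "${full_report_directory}"
    echo "${full_report_directory}"
}

# Example: override the base directory for a single call.
REPORTS_DIRECTORY="$(mktemp -d)" prepare_report_directory "internal-dependencies-csv"
```

Using `${VAR:-default}` keeps the script runnable on its own while still letting a calling script or the environment redirect all reports to another location.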