Commit ac4b6f2

Merge pull request #147 from JohT/fix/assure-that-tsne-perplexity-is-lower-than-the-sample-size-for-small-graphs
Assure that t-SNE perplexity parameter is lower than the sample size for small graphs
2 parents 0f3b63c + b7e613c commit ac4b6f2

6 files changed: 68 additions, 12 deletions

COMMANDS.md

Lines changed: 3 additions & 2 deletions
@@ -22,7 +22,7 @@
 - [Setup Neo4j Graph Database](#setup-neo4j-graph-database)
 - [Start Neo4j Graph Database](#start-neo4j-graph-database)
 - [Setup jQAssistant Java Code Analyzer](#setup-jqassistant-java-code-analyzer)
-- [Download Maven Artifacts to Analyze](#download-maven-artifacts-to-analyze)
+- [Download Maven Artifacts to analyze](#download-maven-artifacts-to-analyze)
 - [Reset the database and scan the java artifacts](#reset-the-database-and-scan-the-java-artifacts)
 - [Database Queries](#database-queries)
 - [Cypher Shell](#cypher-shell)
@@ -52,7 +52,8 @@ To run all analysis steps simple execute the following command:
 ./../../scripts/analysis/analyze.sh
 ```

-👉 See [scripts/examples/analyzeAxonFramework.sh](./scripts/examples/analyzeAxonFramework.sh) as an example script that combines all the above steps.
+👉 See [scripts/examples/analyzeAxonFramework.sh](./scripts/examples/analyzeAxonFramework.sh) as an example script that combines all the above steps for a Java Project.
+👉 See [scripts/examples/analyzeReactRouter.sh](./scripts/examples/analyzeReactRouter.sh) as an example script that combines all the above steps for a Typescript Project.
 👉 See [Code Structure Analysis Pipeline](./.github/workflows/java-code-analysis.yml) on how to do this within a GitHub Actions Workflow.

 ### Command Line Options

GETTING_STARTED.md

Lines changed: 2 additions & 1 deletion
@@ -80,5 +80,6 @@ Please read through the [Prerequisites](./README.md#hammer_and_wrench-prerequisi

 Then open your browser and login to your [local Neo4j Web UI](http://localhost:7474/browser) with "neo4j" as user and the initial password you've chosen.

-👉 See [scripts/examples/analyzeAxonFramework.sh](./scripts/examples/analyzeAxonFramework.sh) as an example script that combines all the above steps.
+👉 See [scripts/examples/analyzeAxonFramework.sh](./scripts/examples/analyzeAxonFramework.sh) as an example script that combines all the above steps for a Java Project.
+👉 See [scripts/examples/analyzeReactRouter.sh](./scripts/examples/analyzeReactRouter.sh) as an example script that combines all the above steps for a Typescript Project.
 👉 See [Code Structure Analysis Pipeline](./.github/workflows/java-code-analysis.yml) on how to do this within a GitHub Actions Workflow.

README.md

Lines changed: 7 additions & 7 deletions
@@ -26,13 +26,13 @@ Contained within this repository is a comprehensive and automated code graph ana

 Here is an overview of reports made with [Jupyter Notebooks](https://jupyter.org). For a detailed reference see [Jupyter Notebook Report Reference](#page_with_curl-jupyter-notebook-report-reference

-- [External Dependencies](./results/AxonFramework-4.9.3/external-dependencies/ExternalDependencies.md) contains detailed information about external library usage ([Notebook](./jupyter/ExternalDependenciesJava.ipynb)).
-- [Internal Dependencies](./results/AxonFramework-4.9.3/internal-dependencies/InternalDependencies.md) is based on [Analyze java package metrics in a graph database](https://joht.github.io/johtizen/data/2023/04/21/java-package-metrics-analysis.html) and also includes cyclic dependencies ([Notebook](./jupyter/InternalDependenciesJava.ipynb)).
-- [Method Metrics](./results/AxonFramework-4.9.3/method-metrics/MethodMetrics.ipynb) shows how the effective number of lines of code and the cyclomatic complexity are distributed across the methods in the code ([Notebook](./jupyter/MethodMetricsJava.ipynb)).
-- [Node Embeddings](./results/AxonFramework-4.9.3/node-embeddings/NodeEmbeddings.md) shows how to generate node embeddings and to further reduce their dimensionality to be able to visualize them in a 2D plot ([Notebook](./jupyter/NodeEmbeddingsJava.ipynb)).
-- [Object Oriented Design Quality Metrics](./results/AxonFramework-4.9.3/object-oriented-design-metrics/ObjectOrientedDesignMetrics.md) is based on [OO Design Quality Metrics by Robert Martin](https://api.semanticscholar.org/CorpusID:18246616) ([Notebook](./jupyter/ObjectOrientedDesignMetricsJava.ipynb)).
-- [Overview](./results/AxonFramework-4.9.3/overview/Overview.md) contains overall statistics and details about methods and their complexity. ([Notebook](./jupyter/OverviewJava.ipynb)).
-- [Visibility Metrics](./results/AxonFramework-4.9.3/visibility-metrics/VisibilityMetrics.md) ([Notebook](./jupyter/VisibilityMetricsJava.ipynb)).
+- [External Dependencies](./results/AxonFramework-4.9.3/external-dependencies-java/ExternalDependenciesJava.md) contains detailed information about external library usage ([Notebook](./jupyter/ExternalDependenciesJava.ipynb)).
+- [Internal Dependencies](./results/AxonFramework-4.9.3/internal-dependencies-java/InternalDependenciesJava.md) is based on [Analyze java package metrics in a graph database](https://joht.github.io/johtizen/data/2023/04/21/java-package-metrics-analysis.html) and also includes cyclic dependencies ([Notebook](./jupyter/InternalDependenciesJava.ipynb)).
+- [Method Metrics](./results/AxonFramework-4.9.3/method-metrics-java/MethodMetricsJava.md) shows how the effective number of lines of code and the cyclomatic complexity are distributed across the methods in the code ([Notebook](./jupyter/MethodMetricsJava.ipynb)).
+- [Node Embeddings](./results/AxonFramework-4.9.3/node-embeddings-java/NodeEmbeddingsJava.md) shows how to generate node embeddings and to further reduce their dimensionality to be able to visualize them in a 2D plot ([Notebook](./jupyter/NodeEmbeddingsJava.ipynb)).
+- [Object Oriented Design Quality Metrics](./results/AxonFramework-4.9.3/object-oriented-design-metrics-java/ObjectOrientedDesignMetricsJava.md) is based on [OO Design Quality Metrics by Robert Martin](https://api.semanticscholar.org/CorpusID:18246616) ([Notebook](./jupyter/ObjectOrientedDesignMetricsJava.ipynb)).
+- [Overview](./results/AxonFramework-4.9.3/overview-java/OverviewJava.md) contains overall statistics and details about methods and their complexity. ([Notebook](./jupyter/OverviewJava.ipynb)).
+- [Visibility Metrics](./results/AxonFramework-4.9.3/visibility-metrics-java/VisibilityMetricsJava.md) ([Notebook](./jupyter/VisibilityMetricsJava.ipynb)).
 - [Wordcloud](./results/AxonFramework-4.9.3/wordcloud/Wordcloud.md) contains a visual representation of package and class names ([Notebook](./jupyter/Wordcloud.ipynb)).

 ### :book: Graph Data Science Reports

jupyter/NodeEmbeddingsJava.ipynb

Lines changed: 8 additions & 1 deletion
@@ -258,9 +258,16 @@
 " # See https://bobbyhadz.com/blog/python-attributeerror-list-object-has-no-attribute-shape\n",
 " embeddings_as_numpy_array = np.array(embeddings.embedding.to_list())\n",
 "\n",
+" # The parameter \"perplexity\" needs to be smaller than the sample size\n",
+" # See https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html\n",
+" number_of_nodes=embeddings.shape[0]\n",
+" perplexity = min(number_of_nodes - 1.0, 30.0)\n",
+" print(\"t-SNE: Sample size (Number of nodes)={size}\".format(size = number_of_nodes))\n",
+" print(\"t-SNE: perplexity={perplexity}\".format(perplexity=perplexity))\n",
+"\n",
 " # Use t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality \n",
 " # of the previously calculated node embeddings to 2 dimensions for visualization\n",
-" t_distributed_stochastic_neighbor_embedding = TSNE(n_components=2, verbose=1, random_state=50)\n",
+" t_distributed_stochastic_neighbor_embedding = TSNE(n_components=2, perplexity=perplexity, verbose=1, random_state=50)\n",
 " two_dimension_node_embeddings = t_distributed_stochastic_neighbor_embedding.fit_transform(embeddings_as_numpy_array)\n",
 " display(two_dimension_node_embeddings.shape) # Display the shape of the t-SNE result\n",
 "\n",
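
The perplexity clamp introduced in this notebook change can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the helper name `clamp_tsne_perplexity` and its `default_perplexity` parameter are hypothetical, and the guard for fewer than two samples is an added assumption (with a single node, `min(number_of_nodes - 1.0, 30.0)` evaluates to 0, which is not a usable perplexity).

```python
def clamp_tsne_perplexity(number_of_samples: int, default_perplexity: float = 30.0) -> float:
    """Return a perplexity that stays below the sample size.

    Hypothetical helper for illustration only. It mirrors the expression
    min(number_of_nodes - 1.0, 30.0) from the notebook change.
    """
    if number_of_samples < 2:
        # Assumption: with fewer than two samples there is no positive
        # perplexity that is still smaller than the sample size.
        raise ValueError("t-SNE needs at least 2 samples to choose a perplexity")
    # Small graphs get (sample size - 1); larger graphs keep the default.
    return min(number_of_samples - 1.0, default_perplexity)


if __name__ == "__main__":
    for sample_size in (5, 31, 1000):
        print(sample_size, clamp_tsne_perplexity(sample_size))
```

The returned value would then be passed on as the `perplexity` argument of `TSNE`, as the changed line in the notebook does. For graphs with more than 30 nodes the result stays at 30.0, so larger graphs are treated as before.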

jupyter/NodeEmbeddingsTypescript.ipynb

Lines changed: 8 additions & 1 deletion
@@ -258,9 +258,16 @@
 " # See https://bobbyhadz.com/blog/python-attributeerror-list-object-has-no-attribute-shape\n",
 " embeddings_as_numpy_array = np.array(embeddings.embedding.to_list())\n",
 "\n",
+" # The parameter \"perplexity\" needs to be smaller than the sample size\n",
+" # See https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html\n",
+" number_of_nodes=embeddings.shape[0]\n",
+" perplexity = min(number_of_nodes - 1.0, 30.0)\n",
+" print(\"t-SNE: Sample size (Number of nodes)={size}\".format(size = number_of_nodes))\n",
+" print(\"t-SNE: perplexity={perplexity}\".format(perplexity=perplexity))\n",
+"\n",
 " # Use t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality \n",
 " # of the previously calculated node embeddings to 2 dimensions for visualization\n",
-" t_distributed_stochastic_neighbor_embedding = TSNE(n_components=2, verbose=1, random_state=50)\n",
+" t_distributed_stochastic_neighbor_embedding = TSNE(n_components=2, perplexity=perplexity, verbose=1, random_state=50)\n",
 " two_dimension_node_embeddings = t_distributed_stochastic_neighbor_embedding.fit_transform(embeddings_as_numpy_array)\n",
 " display(two_dimension_node_embeddings.shape) # Display the shape of the t-SNE result\n",
 "\n",
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+
+# This is an example for the analysis of the Typescript project "react-router".
+# It includes the creation of the temporary directory, the working directory, the artifacts download and the analysis itself.
+
+# Note: The first (and only) parameter is the version of "react-router" to analyze.
+# Note: This script is meant to be started in the root directory of this repository.
+
+# Fail on any error ("-o errexit" exits on the first error, "-o pipefail" exits on errors within piped commands)
+set -o errexit -o pipefail
+
+# Read the first input argument containing the version of the artifacts
+if [ "$#" -ne 1 ]; then
+    echo "analyzerReactRouter: Error: Usage: $0 <version>" >&2
+    exit 1
+fi
+projectVersion=$1
+
+# Check if the environment variable NEO4J_INITIAL_PASSWORD is set
+if [ -z "${NEO4J_INITIAL_PASSWORD}" ]; then
+    echo "analyzerReactRouter: Error: Requires environment variable NEO4J_INITIAL_PASSWORD to be set first. Use 'export NEO4J_INITIAL_PASSWORD=<your-own-password>'." >&2
+    exit 1
+fi
+
+# Create the temporary directory for all analysis projects.
+mkdir -p ./temp
+cd ./temp
+
+# Create the working directory for this specific analysis.
+mkdir -p "./react-router-${projectVersion}"
+cd "./react-router-${projectVersion}"
+
+# Create the artifacts directory that will contain the code to be analyzed.
+mkdir -p ./artifacts
+
+# Download the "react-router" artifacts to analyze
+./../../scripts/downloader/downloadReactRouter.sh "${projectVersion}"
+
+# Start the analysis
+./../../scripts/analysis/analyze.sh
