5 changes: 4 additions & 1 deletion .gitignore
@@ -91,4 +91,7 @@ coverage/

# Jupyter Notebook
.ipynb_checkpoints
*.nbconvert*
*.nbconvert*

# Python environments
.conda
2 changes: 1 addition & 1 deletion README.md
@@ -120,7 +120,7 @@ The [Code Structure Analysis Pipeline](./.github/workflows/java-code-analysis.ym
- [pip](https://pip.pypa.io/en/stable)
- [monotonic](https://github.com/atdt/monotonic)
- [Neo4j Python Driver](https://neo4j.com/docs/api/python-driver)
- [sklearn](https://scikit-learn.org)
- [openTSNE](https://github.com/pavlin-policar/openTSNE)
- [wordcloud](https://github.com/amueller/word_cloud)
- [Graph Visualization](./graph-visualization/README.md) uses [node.js](https://nodejs.org/de) and the dependencies listed in [package.json](./graph-visualization/package.json).

27 changes: 11 additions & 16 deletions jupyter/NodeEmbeddingsJava.ipynb
@@ -58,7 +58,7 @@
"import matplotlib.pyplot as plot\n",
"import typing as typ\n",
"import numpy as np\n",
"from sklearn.manifold import TSNE\n",
"from openTSNE.sklearn import TSNE\n",
"from neo4j import GraphDatabase"
]
},
@@ -69,9 +69,9 @@
"metadata": {},
"outputs": [],
"source": [
"import sklearn\n",
"print('The scikit-learn version is {}.'.format(sklearn.__version__))\n",
"print('The pandas version is {}.'.format(pd.__version__))\n"
"from openTSNE import __version__ as openTSNE_version\n",
"print('The openTSNE version is: {}'.format(openTSNE_version))\n",
"print('The pandas version is: {}'.format(pd.__version__))\n"
]
},
{
@@ -231,7 +231,7 @@
"\n",
"> It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data.\n",
"\n",
"(see https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE)"
"(see https://opentsne.readthedocs.io)"
]
},
{
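For context: this notebook's dimensionality reduction (changed in the hunks that follow) now imports `TSNE` from `openTSNE.sklearn`, the scikit-learn-compatible wrapper, instead of `sklearn.manifold`. Below is a minimal, standalone sketch of that call, not the notebook's actual function; the random demo data stands in for the FastRP node embeddings read from Neo4j, and the note about perplexity reflects my understanding of openTSNE rather than anything stated in this pull request.

```python
import numpy as np
from openTSNE.sklearn import TSNE  # scikit-learn compatible wrapper around openTSNE

# Demo data only: 200 "nodes" with 64-dimensional embeddings, standing in for
# the FastRP embeddings that the notebook reads from Neo4j.
demo_embeddings = np.random.default_rng(47).normal(size=(200, 64))

# n_components=2 keeps the 2D visualization. The manual perplexity clamping from the
# previous sklearn-based version is dropped because openTSNE appears to lower a
# perplexity that is too large for the sample size on its own (with a warning).
tsne = TSNE(n_components=2, verbose=1, random_state=47)
two_dimensional_embeddings = tsne.fit_transform(demo_embeddings)

print(two_dimensional_embeddings.shape)  # expected: (200, 2)
```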
@@ -245,7 +245,7 @@
" \"\"\"\n",
" Reduces the dimensionality of the node embeddings (e.g. 64 floating point numbers in an array)\n",
" to two dimensions for 2D visualization.\n",
" see https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE\n",
" see https://opentsne.readthedocs.io\n",
" \"\"\"\n",
"\n",
" if embeddings.empty: \n",
@@ -258,16 +258,9 @@
" # See https://bobbyhadz.com/blog/python-attributeerror-list-object-has-no-attribute-shape\n",
" embeddings_as_numpy_array = np.array(embeddings.embedding.to_list())\n",
"\n",
" # The parameter \"perplexity\" needs to be smaller than the sample size\n",
" # See https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html\n",
" number_of_nodes=embeddings.shape[0]\n",
" perplexity = min(number_of_nodes - 1.0, 30.0)\n",
" print(\"t-SNE: Sample size (Number of nodes)={size}\".format(size = number_of_nodes))\n",
" print(\"t-SNE: perplexity={perplexity}\".format(perplexity=perplexity))\n",
"\n",
" # Use t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality \n",
" # of the previously calculated node embeddings to 2 dimensions for visualization\n",
" t_distributed_stochastic_neighbor_embedding = TSNE(n_components=2, perplexity=perplexity, verbose=1, random_state=50)\n",
" t_distributed_stochastic_neighbor_embedding = TSNE(n_components=2, verbose=1, random_state=47)\n",
" two_dimension_node_embeddings = t_distributed_stochastic_neighbor_embedding.fit_transform(embeddings_as_numpy_array)\n",
" display(two_dimension_node_embeddings.shape) # Display the shape of the t-SNE result\n",
"\n",
@@ -365,7 +358,9 @@
"source": [
"### 1.1 Generate Node Embeddings using Fast Random Projection (Fast RP) for Java Packages\n",
"\n",
"[Fast Random Projection](https://neo4j.com/docs/graph-data-science/current/machine-learning/node-embeddings/fastrp) is used to reduce the dimensionality of the node feature space while preserving most of the distance information. Nodes with similar neighborhoods result in node embeddings with similar vectors."
"[Fast Random Projection](https://neo4j.com/docs/graph-data-science/current/machine-learning/node-embeddings/fastrp) is used to reduce the dimensionality of the node feature space while preserving most of the distance information. Nodes with similar neighborhoods result in node embeddings with similar vectors.\n",
"\n",
"**👉 Hint:** To skip existing node embeddings and always recalculate them based on the parameters below, edit `Node_Embeddings_0a_Query_Calculated` so that it won't return any results."
]
},
{
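The Fast Random Projection paragraph above describes how the node embeddings are generated in Neo4j before t-SNE reduces them for plotting. The sketch below shows roughly what such a FastRP stream call looks like through the Neo4j Python driver; the projected graph name, node property, credentials, and parameters are illustrative assumptions — the notebook's real Cypher lives in the repository's query files (such as the `Node_Embeddings_0a_Query_Calculated` query referenced in the hint above).

```python
from neo4j import GraphDatabase

# Illustrative values only: the graph name, credentials and the "fqn" property are
# assumptions for this sketch, not the project's actual configuration.
FASTRP_STREAM_QUERY = """
CALL gds.fastRP.stream('java-package-dependencies', {
    embeddingDimension: 64,
    randomSeed: 30
})
YIELD nodeId, embedding
RETURN gds.util.asNode(nodeId).fqn AS packageName, embedding
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(FASTRP_STREAM_QUERY):
        print(record["packageName"], record["embedding"][:3], "...")  # first 3 of 64 dimensions
driver.close()
```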
@@ -511,7 +506,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.11.9"
},
"title": "Object Oriented Design Quality Metrics for Java with Neo4j"
},
25 changes: 10 additions & 15 deletions jupyter/NodeEmbeddingsTypescript.ipynb
@@ -58,7 +58,7 @@
"import matplotlib.pyplot as plot\n",
"import typing as typ\n",
"import numpy as np\n",
"from sklearn.manifold import TSNE\n",
"from openTSNE.sklearn import TSNE\n",
"from neo4j import GraphDatabase"
]
},
@@ -69,8 +69,8 @@
"metadata": {},
"outputs": [],
"source": [
"import sklearn\n",
"print('The scikit-learn version is {}.'.format(sklearn.__version__))\n",
"from openTSNE import __version__ as openTSNE_version\n",
"print('The openTSNE version is: {}'.format(openTSNE_version))\n",
"print('The pandas version is {}.'.format(pd.__version__))\n"
]
},
@@ -231,7 +231,7 @@
"\n",
"> It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data.\n",
"\n",
"(see https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE)"
"(see https://opentsne.readthedocs.io)"
]
},
{
@@ -245,7 +245,7 @@
" \"\"\"\n",
" Reduces the dimensionality of the node embeddings (e.g. 32 floating point numbers in an array)\n",
" to two dimensions for 2D visualization.\n",
" see https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE\n",
" see https://opentsne.readthedocs.io\n",
" \"\"\"\n",
"\n",
" if embeddings.empty: \n",
@@ -258,16 +258,9 @@
" # See https://bobbyhadz.com/blog/python-attributeerror-list-object-has-no-attribute-shape\n",
" embeddings_as_numpy_array = np.array(embeddings.embedding.to_list())\n",
"\n",
" # The parameter \"perplexity\" needs to be smaller than the sample size\n",
" # See https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html\n",
" number_of_nodes=embeddings.shape[0]\n",
" perplexity = min(number_of_nodes - 1.0, 30.0)\n",
" print(\"t-SNE: Sample size (Number of nodes)={size}\".format(size = number_of_nodes))\n",
" print(\"t-SNE: perplexity={perplexity}\".format(perplexity=perplexity))\n",
"\n",
" # Use t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality \n",
" # of the previously calculated node embeddings to 2 dimensions for visualization\n",
" t_distributed_stochastic_neighbor_embedding = TSNE(n_components=2, perplexity=perplexity, verbose=1, random_state=50)\n",
" t_distributed_stochastic_neighbor_embedding = TSNE(n_components=2, verbose=1, random_state=47)\n",
" two_dimension_node_embeddings = t_distributed_stochastic_neighbor_embedding.fit_transform(embeddings_as_numpy_array)\n",
" display(two_dimension_node_embeddings.shape) # Display the shape of the t-SNE result\n",
"\n",
@@ -365,7 +358,9 @@
"source": [
"### 1.1 Generate Node Embeddings for Typescript Modules using Fast Random Projection (Fast RP)\n",
"\n",
"[Fast Random Projection](https://neo4j.com/docs/graph-data-science/current/machine-learning/node-embeddings/fastrp) is used to reduce the dimensionality of the node feature space while preserving most of the distance information. Nodes with similar neighborhoods result in node embeddings with similar vectors."
"[Fast Random Projection](https://neo4j.com/docs/graph-data-science/current/machine-learning/node-embeddings/fastrp) is used to reduce the dimensionality of the node feature space while preserving most of the distance information. Nodes with similar neighborhoods result in node embeddings with similar vectors.\n",
"\n",
"**👉 Hint:** To skip existing node embeddings and always recalculate them based on the parameters below, edit `Node_Embeddings_0a_Query_Calculated` so that it won't return any results."
]
},
{
@@ -514,7 +509,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.11.9"
},
"title": "Object Oriented Design Quality Metrics for Java with Neo4j"
},
2 changes: 1 addition & 1 deletion jupyter/environment.yml
@@ -10,7 +10,7 @@ dependencies:
- numpy=1.23.*
- pandas=1.5.*
- pip=22.3.*
- scikit-learn=1.3.* # NodeEmbeddings.ipynb uses sklearn.manifold.TSNE
- opentsne=1.0.* # to visualize node embeddings in 2D (t-SNE dimensionality reduction)
- pip:
- monotonic==1.*
- wordcloud==1.9.*
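The pin above swaps the scikit-learn dependency for openTSNE 1.0.x. As a quick, optional sanity check (assuming the conda environment defined in this file is active), the import below confirms that the notebooks pick up the pinned release:

```python
# Optional sanity check, assuming the conda environment defined above is active.
from openTSNE import __version__ as openTSNE_version

assert openTSNE_version.startswith("1.0"), f"Expected openTSNE 1.0.x, got {openTSNE_version}"
print(f"openTSNE {openTSNE_version} is installed.")
```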