
Commit 3b3a4de

Merge pull request #144 from JohT/feature/optional-data-validation-before-jupyter-execution
Add optional data validation before Jupyter notebook execution.
2 parents b9982d2 + 6223dc3 commit 3b3a4de

35 files changed: 1742 additions and 501 deletions
.github/workflows/check-links-in-documentation.yml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+name: Check links in documentation
+
+on:
+  pull_request:
+    branches:
+      - main
+    # Only watch root level Markdown documentation file changes
+    paths:
+      - 'README.md'
+      - 'COMMANDS.md'
+      - 'GETTING_STARTED.md'
+      - '.github/workflows/check-links-in-documentation.yml' # also run when this file was changed
+  schedule:
+    - cron: "15 6 1 * *" # On the first day of each month at 6:15 o'clock
+
+jobs:
+  reports:
+    runs-on: ubuntu-latest
+    steps:
+    - name: Checkout GIT Repository
+      uses: actions/checkout@v4
+
+    - name: Setup node.js
+      uses: actions/setup-node@v4
+      with:
+        node-version-file: '.nvmrc'
+
+    - name: Check links in top level documentation Markdown files
+      run: npx --yes markdown-link-check --config=markdown-lint-check-config.json README.md COMMANDS.md GETTING_STARTED.md

.github/workflows/java-code-analysis.yml

Lines changed: 3 additions & 4 deletions
@@ -45,7 +45,6 @@ jobs:
             java: 17
             python: 3.11
             mambaforge: 24.3.0-0
-            node: 18

     env:
       CI_COMMIT_MESSAGE: Automated code structure analysis reports (CI)
@@ -66,12 +65,12 @@ jobs:
         distribution: 'adopt'
         java-version: ${{ matrix.java }}

-    - name: Setup node.js ${{ matrix.node }} for Graph Visualization
+    - name: Setup Node.js for Graph Visualization
       uses: actions/setup-node@v4
       with:
-        node-version: ${{ matrix.node }}
+        node-version-file: 'graph-visualization/.nvmrc'

-    - name: Install nodes packages for Graph Visualization
+    - name: Install Node packages for Graph Visualization
       working-directory: graph-visualization
       run: npm ci

.github/workflows/typescript-code-analysis.yml

Lines changed: 3 additions & 4 deletions
@@ -45,7 +45,6 @@ jobs:
             java: 17
             python: 3.11
             mambaforge: 24.3.0-0
-            node: 18

     env:
       CI_COMMIT_MESSAGE: Automated code structure analysis reports (CI)
@@ -66,12 +65,12 @@ jobs:
         distribution: 'adopt'
         java-version: ${{ matrix.java }}

-    - name: Setup node.js ${{ matrix.node }} for Graph Visualization
+    - name: Setup Node.js for Graph Visualization
       uses: actions/setup-node@v4
       with:
-        node-version: ${{ matrix.node }}
+        node-version-file: 'graph-visualization/.nvmrc'

-    - name: Install nodes packages for Graph Visualization
+    - name: Install Node packages for Graph Visualization
       working-directory: graph-visualization
       run: npm ci

.nvmrc

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+v20.12.1

COMMANDS.md

Lines changed: 106 additions & 20 deletions
@@ -1,9 +1,53 @@
 # Code Graph Analysis Pipeline - Commands

-## Start an analysis
+<!-- TOC -->
+
+- [Start an Analysis](#start-an-analysis)
+- [Command Line Options](#command-line-options)
+- [Notes](#notes)
+- [Examples](#examples)
+- [Start an analysis with CSV reports only](#start-an-analysis-with-csv-reports-only)
+- [Start an analysis with Jupyter reports only](#start-an-analysis-with-jupyter-reports-only)
+- [Start an analysis with PDF generation](#start-an-analysis-with-pdf-generation)
+- [Only run setup and explore the Graph manually](#only-run-setup-and-explore-the-graph-manually)
+- [Generate Markdown References](#generate-markdown-references)
+- [Generate Cypher Reference](#generate-cypher-reference)
+- [Generate Script Reference](#generate-script-reference)
+- [Generate CSV Cypher Query Report Reference](#generate-csv-cypher-query-report-reference)
+- [Generate Jupyter Notebook Report Reference](#generate-jupyter-notebook-report-reference)
+- [Generate Image Reference](#generate-image-reference)
+- [Generate Environment Variable Reference](#generate-environment-variable-reference)
+- [Validate Links in Markdown](#validate-links-in-markdown)
+- [Manual Setup](#manual-setup)
+- [Setup Neo4j Graph Database](#setup-neo4j-graph-database)
+- [Start Neo4j Graph Database](#start-neo4j-graph-database)
+- [Setup jQAssistant Java Code Analyzer](#setup-jqassistant-java-code-analyzer)
+- [Download Maven Artifacts to Analyze](#download-maven-artifacts-to-analyze)
+- [Reset the database and scan the java artifacts](#reset-the-database-and-scan-the-java-artifacts)
+- [Database Queries](#database-queries)
+- [Cypher Shell](#cypher-shell)
+- [HTTP API](#http-api)
+- [executeQueryFunctions.sh](#executequeryfunctionssh)
+- [Stop Neo4j](#stop-neo4j)
+- [Jupyter Notebook](#jupyter-notebook)
+- [Create a report with executeJupyterNotebookReport.sh](#create-a-report-with-executejupyternotebookreportsh)
+- [Data Availability Validation](#data-availability-validation)
+- [Execute a Notebook with executeJupyterNotebook.sh](#execute-a-notebook-with-executejupyternotebooksh)
+- [Manually setup the environment using Conda](#manually-setup-the-environment-using-conda)
+- [Executing Jupyter Notebooks with nbconvert](#executing-jupyter-notebooks-with-nbconvert)
+- [References](#references)
+- [Other Commands](#other-commands)
+- [Information about a process that listens to a specific local port](#information-about-a-process-that-listens-to-a-specific-local-port)
+- [Kill process that listens to a specific local port](#kill-process-that-listens-to-a-specific-local-port)
+- [Memory Estimation](#memory-estimation)
+
+<!-- /TOC -->
+
+## Start an Analysis

 An analysis is started with the script [analyze.sh](./scripts/analysis/analyze.sh).
 To run all analysis steps simply execute the following command:
+
 ```shell
 ./../../scripts/analysis/analyze.sh
 ```
@@ -55,7 +99,7 @@ Note: Generating a PDF from a Jupyter notebook using [nbconvert](https://nbconve
 ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION=true ./../../scripts/analysis/analyze.sh
 ```

-#### Setup everything to explore the graph manually
+#### Only run setup and explore the Graph manually

 To prepare everything for analysis including installation, configuration and preparation queries to explore the graph manually
 without report generation use this command:
@@ -123,6 +167,14 @@ Change into the [scripts](./scripts/) directory e.g. with `cd scripts` and then
 ./documentation/generateEnvironmentVariableReference.sh
 ```

+## Validate Links in Markdown
+
+The following command uses [markdown-link-check](https://github.com/tcort/markdown-link-check) to check the links in the top-level Markdown files, for example [README.md](./README.md):
+
+```shell
+npx --yes markdown-link-check --quiet --progress --config=markdown-lint-check-config.json README.md COMMANDS.md GETTING_STARTED.md
+```
+
 ## Manual Setup

 The manual setup is only documented for completeness. It isn't needed since the analysis also covers download, installation and configuration of all needed tools.
@@ -141,7 +193,7 @@ It runs the script with a temporary `NEO4J_HOME` environment variable to not int

 ### Setup jQAssistant Java Code Analyzer

-Use [setupJQAssistant.sh](./scripts/setupJQAssistant.sh) to download [jQAssistant](https://jqassistant.org/get-started).
+Use [setupJQAssistant.sh](./scripts/setupJQAssistant.sh) to download [jQAssistant](https://jqassistant.github.io/jqassistant/doc).

 ### Download Maven Artifacts to analyze

@@ -200,7 +252,7 @@ Query parameters can be added as arguments after the file name. Here is an examp
 ./scripts/executeQuery.sh ./cypher/Get_Graph_Data_Science_Library_Version.cypher a=1
 ```

-### executeQueryFunctions
+### executeQueryFunctions.sh

 The script [executeQueryFunctions.sh](./scripts/executeQueryFunctions.sh) contains functions to simplify the
 call of [executeQuery.sh](./scripts/executeQuery.sh) for different purposes. For example, `execute_cypher_summarized`
@@ -221,7 +273,41 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra

 ## Jupyter Notebook

-### Commands
+### Create a report with executeJupyterNotebookReport.sh
+
+The script [executeJupyterNotebookReport.sh](./scripts/executeJupyterNotebookReport.sh) combines:
+
+- creating a directory within the "reports" directory
+- data availability validation using [executeQueryFunctions.sh](#executequeryfunctionssh)
+- executing and converting the given Notebook using [executeJupyterNotebook.sh](#execute-a-notebook-with-executejupyternotebooksh)
+
+Here is an example of how to run the report [Wordcloud.ipynb](./jupyter/Wordcloud.ipynb):
+
+```shell
+./scripts/executeJupyterNotebookReport.sh --jupyterNotebook Wordcloud.ipynb
+```
+
+#### Data Availability Validation
+
+[Jupyter Notebooks](https://jupyter.org) can have additional custom tags within their [metadata section](https://ipython.readthedocs.io/en/3.x/notebook/nbformat.html#metadata). Opening such a file in a text editor shows this section, typically at the end of the file. Some editors also support editing the metadata directly. Here, the optional metadata property `code_graph_analysis_pipeline_data_validation` is used to specify which data validation query in the [cypher/Validation](./cypher/Validation/) directory should be used. Without this property, the data validation step is skipped. If a validation is specified, it will be executed before the Jupyter Notebook is executed. If the query has at least one result, the validation is seen as successful. Otherwise, the Jupyter Notebook will not be executed.
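
As an illustration that is not part of this commit, the metadata lookup and the run-or-skip decision described above could be sketched in shell roughly as follows. The function names `get_validation_name` and `should_execute_notebook` are hypothetical, and the actual logic in executeJupyterNotebookReport.sh may differ. The sketch uses plain `grep` and `sed` pattern matching instead of a JSON parser, so it assumes that the property and its string value sit on the same line of the notebook file.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the value of the optional notebook metadata
# property, or nothing if the property is absent.
# Assumption: key and string value are on the same line of the .ipynb file.
get_validation_name() {
  local notebook_file="$1"
  grep -o '"code_graph_analysis_pipeline_data_validation"[[:space:]]*:[[:space:]]*"[^"]*"' "$notebook_file" \
    | head -n 1 \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# Hypothetical helper: succeeds (exit code 0) if the notebook should be executed.
# $1 = validation name (may be empty)
# $2 = number of results returned by the validation query
should_execute_notebook() {
  local validation_name="$1"
  local result_count="${2:-0}"
  if [ -z "$validation_name" ]; then
    return 0 # No validation specified: the validation step is skipped.
  fi
  [ "$result_count" -ge 1 ] # At least one result means the validation succeeded.
}
```

A caller would first read the name with `get_validation_name ./jupyter/Wordcloud.ipynb`, run the matching query, and then pass the name and the number of results to `should_execute_notebook`. How the result count is obtained from the query is left open here on purpose.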
+
+This is helpful for Jupyter Notebook reports that are specific to a programming language or have other specific data prerequisites. The Notebook will be skipped if there is no data available, which would otherwise lead to confusing and distracting reports with empty tables and figures.
+
+You can search for the messages `Validation succeeded` or `Validation failed` inside the log to get detailed information about which Notebook was skipped and why.
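
As an illustration that is not part of this commit, such a log search could look like the following sketch. The function name `list_validation_messages` is hypothetical, and the location of the log file depends on how the analysis output was captured. Only the two message texts are taken from the description above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints every validation message found in the given log file.
list_validation_messages() {
  local log_file="$1"
  grep -E 'Validation (succeeded|failed)' "$log_file"
}
```

It could be called with the path of a captured log, for example `list_validation_messages ./analysis.log`, where the file name is an assumption.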
+
+### Execute a Notebook with executeJupyterNotebook.sh
+
+[executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh) executes a Jupyter Notebook on the command line and converts it to other formats like Markdown and, optionally, PDF. It takes care of [setting up the environment](#manually-setup-the-environment-using-conda) and [uses nbconvert](#executing-jupyter-notebooks-with-nbconvert) under the hood to execute the notebook and convert it to other file formats.
+
+Here is an example of how to use [executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh) to run [Wordcloud.ipynb](./jupyter/Wordcloud.ipynb):
+
+```shell
+./scripts/executeJupyterNotebook.sh ./jupyter/Wordcloud.ipynb
+```
+
+### Manually setup the environment using Conda
+
+[Conda](https://conda.io) provides package, dependency, and environment management for any language. Here, it is used to set up the environment for Jupyter Notebooks.

 - Setup environment

@@ -230,10 +316,10 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra
 conda activate codegraph
 ```

-or by using the environment file [codegraph-environment.yml](./jupyter/codegraph-environment.yml):
+or by using the environment file [environment.yml](./jupyter/environment.yml):

 ```shell
-conda env create --file ./jupyter/codegraph-environment.yml
+conda env create --file ./jupyter/environment.yml
 conda activate codegraph
 ```

@@ -246,9 +332,13 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra
 - Export only explicit environment.yml

 ```shell
-conda env export --from-history --name codegraph | grep -v "^prefix: " > codegraph-environment.yml
+conda env export --from-history --name codegraph | grep -v "^prefix: " > explicit-codegraph-environment.yml
 ```

+### Executing Jupyter Notebooks with nbconvert
+
+[nbconvert](https://nbconvert.readthedocs.io) converts Jupyter Notebooks to other static formats including HTML, LaTeX, PDF, Markdown, reStructuredText, and more.
+
 - Install pandoc used by nbconvert for LaTeX support (Mac)

 ```shell
@@ -273,23 +363,19 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra
 jupyter nbconvert --to pdf ./jupyter/first-neo4j-tryout.nbconvert.ipynb
 ```

-- Shell script to execute and convert a Jupyter notebook file
-
-Use [executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh) like this:
-
-```shell
-./scripts/executeJupyterNotebook.sh ./jupyter/first-neo4j-tryout.ipynb
-```
-
 ## References

-- [Managing environments with Conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
+- [Conda](https://conda.io)
+- [jQAssistant](https://jqassistant.github.io/jqassistant/doc)
+- [Jupyter Notebook](https://jupyter.org)
 - [Jupyter Notebook - Using as a command line tool](https://nbconvert.readthedocs.io/en/latest/usage.html)
 - [Jupyter Notebook - Installing TeX for PDF conversion](https://nbconvert.readthedocs.io/en/latest/install.html#installing-tex)
-- [Integrate Neo4j with Jupyter notebook](https://medium.com/@technologydata25/connect-neo4j-to-jupyter-notebook-c178f716d6d5)
+- [Jupyter Notebook Format - Metadata](https://ipython.readthedocs.io/en/3.x/notebook/nbformat.html#metadata)
+- [Integrate Neo4j with Jupyter Notebook](https://medium.com/@technologydata25/connect-neo4j-to-jupyter-notebook-c178f716d6d5)
 - [Hello World](https://nicolewhite.github.io/neo4j-jupyter/hello-world.html)
-- [py2neo](https://pypi.org/project/py2neo/)
-- [The Py2neo Handbook](https://py2neo.org/2021.1/)
+- [Managing environments with Conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
+- [Neo4j - Download](https://neo4j.com/download-center)
+- [Neo4j - HTTP API](https://neo4j.com/docs/http-api/current/query)
 - [How to Use Conda With Github Actions](https://autobencoder.com/2020-08-24-conda-actions)
 - [Older database download link (neo4j community)](https://community.neo4j.com/t/older-database-download-link/43334/9)

GETTING_STARTED.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ For more details on how the commands work in detail see [COMMANDS](./COMMANDS.md

 ## 🛠 Prerequisites

-Please read through the [Prerequisites](./README.md#🛠-prerequisites) in the [README](./README.md) file for what is required to run the scripts.
+Please read through the [Prerequisites](./README.md#hammer_and_wrench-prerequisites) in the [README](./README.md) file for what is required to run the scripts.

 ## Start an analysis

@@ -44,7 +44,7 @@ Please read through the [Prerequisites](./README.md#🛠-prerequisites) in the [
 ./../../scripts/downloader/downloadAxonFramework.sh <version>
 ```

-1. Optionally use a script to download artifacts from Maven ([details](#download-maven-artifacts-to-analyze))
+1. Optionally use a script to download artifacts from Maven ([details](./COMMANDS.md#download-maven-artifacts-to-analyze))

 1. Start the analysis
