
Commit 19a12f6

Document Jupyter Notebook execution and data validation
1 parent 8689f0c commit 19a12f6


2 files changed

+57
-16
lines changed


COMMANDS.md

Lines changed: 51 additions & 14 deletions
@@ -4,6 +4,7 @@
44

55
An analysis is started with the script [analyze.sh](./scripts/analysis/analyze.sh).
66
To run all analysis steps, simply execute the following command:
7+
78
```shell
89
./../../scripts/analysis/analyze.sh
910
```
@@ -55,7 +56,7 @@ Note: Generating a PDF from a Jupyter notebook using [nbconvert](https://nbconve
5556
ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION=true ./../../scripts/analysis/analyze.sh
5657
```
5758

58-
#### Setup everything to explore the graph manually
59+
#### Only run setup and explore the Graph manually
5960

6061
To prepare everything for analysis including installation, configuration and preparation queries to explore the graph manually
6162
without report generation use this command:
@@ -200,7 +201,7 @@ Query parameters can be added as arguments after the file name. Here is an examp
200201
./scripts/executeQuery.sh ./cypher/Get_Graph_Data_Science_Library_Version.cypher a=1
201202
```
202203

203-
### executeQueryFunctions
204+
### [executeQueryFunctions.sh](./scripts/executeQueryFunctions.sh)
204205

205206
The script [executeQueryFunctions.sh](./scripts/executeQueryFunctions.sh) contains functions to simplify the
206207
call of [executeQuery.sh](./scripts/executeQuery.sh) for different purposes. For example, `execute_cypher_summarized`
@@ -221,7 +222,41 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra
221222

222223
## Jupyter Notebook
223224

224-
### Commands
225+
### Create a report simplified with [executeJupyterNotebookReport.sh](./scripts/executeJupyterNotebookReport.sh)
226+
227+
The script [executeJupyterNotebookReport.sh](./scripts/executeJupyterNotebookReport.sh) combines:
228+
229+
- creating a directory within the "reports" directory
230+
- data availability validation using [executeQueryFunctions.sh](#executequeryfunctionssh)
231+
- executing and converting the given Notebook using [executeJupyterNotebook.sh](#execute-a-notebook-simplified-with-executejupyternotebooksh)
232+
233+
Here is an example of how to run the report [Wordcloud.ipynb](./jupyter/Wordcloud.ipynb):
234+
235+
```shell
236+
./scripts/executeJupyterNotebookReport.sh --jupyterNotebook Wordcloud.ipynb
237+
```
238+
239+
#### Data Availability Validation
240+
241+
[Jupyter Notebooks](https://jupyter.org) can carry additional custom tags in their [metadata section](https://ipython.readthedocs.io/en/3.x/notebook/nbformat.html#metadata). Opening a notebook file with a text editor shows this section, typically at the end of the file. Some editors also support editing it directly. Here, the optional metadata property `code_graph_analysis_pipeline_data_validation` specifies which data validation query in the [cypher/Validation](./cypher/Validation/) directory should be used. Without this property, the data validation step is skipped. If a validation is specified, it is executed before the Jupyter Notebook. If the query returns at least one result, the validation is considered successful. Otherwise, the Jupyter Notebook is not executed.
242+
243+
This is helpful for Jupyter Notebook reports that are specific to a programming language or have other specific data prerequisites. A Notebook is skipped if the data it needs is not available, since running it anyway would produce confusing and distracting reports with empty tables and figures.
244+
245+
Search the log for the messages `Validation succeeded` or `Validation failed` to find out which Notebook was skipped and why.
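
The validation flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual content of `executeJupyterNotebookReport.sh`: the function names `get_validation_name` and `should_execute_notebook` are hypothetical, and the sketch assumes that the metadata property and its string value sit on a single line of the notebook file. Running the validation query itself is left out; only the resulting row count is passed in.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not taken from the repository scripts.

# Prints the value of the custom metadata property of the notebook file "$1",
# or nothing if the property is absent.
# Assumption: property name and value appear on the same line.
get_validation_name() {
    sed -n 's/.*"code_graph_analysis_pipeline_data_validation"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Decides whether the notebook should be executed.
# $1 = validation name (empty means: no validation requested)
# $2 = number of rows returned by the validation query
should_execute_notebook() {
    if [ -z "$1" ]; then
        return 0 # no property set: the validation step is skipped
    fi
    if [ "${2:-0}" -ge 1 ]; then
        echo "Validation succeeded"
        return 0
    fi
    echo "Validation failed"
    return 1
}
```

A caller would read the name with `get_validation_name ./jupyter/Wordcloud.ipynb`, run the matching query from `cypher/Validation`, and only execute the notebook if `should_execute_notebook` returns successfully.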
246+
247+
### Execute a Notebook simplified with [executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh)
248+
249+
[executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh) contains everything that is needed to execute a Jupyter Notebook in the command line and convert it to different formats like Markdown and PDF (optionally). It takes care of [setting up the environment](#manually-setup-the-environment-using-conda) and [uses nbconvert](#executing-jupyter-notebooks-with-nbconvert) to execute the notebook and convert it to other file formats under the hood.
250+
251+
Here is an example of how to use [executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh) to run [Wordcloud.ipynb](./jupyter/Wordcloud.ipynb):
252+
253+
```shell
254+
./scripts/executeJupyterNotebook.sh ./jupyter/Wordcloud.ipynb
255+
```
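
To give an idea of what happens under the hood, the following dry-run sketch prints the kind of nbconvert commands such a script might issue. It is an assumption about the general approach, not the script's real content: the function name `print_nbconvert_commands` is hypothetical, and the PDF step is tied to `ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION` being non-empty, as described for `analyze.sh`. The `.nbconvert.ipynb` suffix is the default output name nbconvert uses when executing a notebook with `--to notebook`.

```shell
#!/usr/bin/env bash
# Sketch only: prints commands instead of running them (dry run).

# $1 = path to the notebook, for example ./jupyter/Wordcloud.ipynb
print_nbconvert_commands() {
    notebook="$1"
    # Default name of the executed copy written by "--to notebook --execute"
    executed="${notebook%.ipynb}.nbconvert.ipynb"

    echo "jupyter nbconvert --to notebook --execute ${notebook}"
    echo "jupyter nbconvert --to markdown ${executed}"

    # PDF generation is optional and only enabled by a non-empty variable
    if [ -n "${ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION:-}" ]; then
        echo "jupyter nbconvert --to pdf ${executed}"
    fi
}
```

Calling `print_nbconvert_commands ./jupyter/Wordcloud.ipynb` lists the execute and Markdown steps, plus the PDF step when the environment variable is set.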
256+
257+
### Manually setup the environment using [Conda](https://conda.io)
258+
259+
[Conda](https://conda.io) provides package, dependency, and environment management for any language. Here, it is used to set up the environment for Jupyter Notebooks.
225260

226261
- Setup environment
227262

@@ -249,6 +284,10 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra
249284
conda env export --from-history --name codegraph | grep -v "^prefix: " > codegraph-environment.yml
250285
```
251286

287+
### Executing Jupyter Notebooks with [nbconvert](https://nbconvert.readthedocs.io)
288+
289+
[nbconvert](https://nbconvert.readthedocs.io) converts Jupyter Notebooks to other static formats including HTML, LaTeX, PDF, Markdown, reStructuredText, and more.
290+
252291
- Install pandoc used by nbconvert for LaTeX support (Mac)
253292

254293
```shell
@@ -273,23 +312,21 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra
273312
jupyter nbconvert --to pdf ./jupyter/first-neo4j-tryout.nbconvert.ipynb
274313
```
275314

276-
- Shell script to execute and convert a Jupyter notebook file
277-
278-
Use [executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh) like this:
279-
280-
```shell
281-
./scripts/executeJupyterNotebook.sh ./jupyter/first-neo4j-tryout.ipynb
282-
```
283-
284315
## References
285316

286-
- [Managing environments with Conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
317+
- [Conda](https://conda.io)
318+
- [jQAssistant](https://jqassistant.org/get-started)
319+
- [Jupyter Notebook](https://jupyter.org)
287320
- [Jupyter Notebook - Using as a command line tool](https://nbconvert.readthedocs.io/en/latest/usage.html)
288321
- [Jupyter Notebook - Installing TeX for PDF conversion](https://nbconvert.readthedocs.io/en/latest/install.html#installing-tex)
289-
- [Integrate Neo4j with Jupyter notebook](https://medium.com/@technologydata25/connect-neo4j-to-jupyter-notebook-c178f716d6d5)
322+
- [Jupyter Notebook Format - Metadata](https://ipython.readthedocs.io/en/3.x/notebook/nbformat.html#metadata)
323+
- [Integrate Neo4j with Jupyter Notebook](https://medium.com/@technologydata25/connect-neo4j-to-jupyter-notebook-c178f716d6d5)
290324
- [Hello World](https://nicolewhite.github.io/neo4j-jupyter/hello-world.html)
325+
- [Managing environments with Conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
326+
- [Neo4j - Download](https://neo4j.com/download-center)
327+
- [Neo4j - HTTP API](https://neo4j.com/docs/http-api/current/query)
291328
- [py2neo](https://pypi.org/project/py2neo/)
292-
- [The Py2neo Handbook](https://py2neo.org/2021.1/)
329+
- [The Py2neo Handbook](https://py2neo.org/2021.1)
293330
- [How to Use Conda With Github Actions](https://autobencoder.com/2020-08-24-conda-actions)
294331
- [Older database download link (neo4j community)](https://community.neo4j.com/t/older-database-download-link/43334/9)
295332

README.md

Lines changed: 6 additions & 2 deletions
@@ -179,10 +179,10 @@ The [Code Structure Analysis Pipeline](./.github/workflows/java-code-analysis.ym
179179
👉 Create a new artifacts download script in the [scripts/downloader](./scripts/downloader/) directory. Take for example [downloadAxonFramework.sh](./scripts/downloader/downloadAxonFramework.sh) as a reference.
180180
👉 Run the script separately before executing [analyze.sh](./scripts/analysis/analyze.sh) also in the [pipeline](./.github/workflows/java-code-analysis.yml).
181181

182-
- How can i trigger a full rescan of all artifacts?
182+
- How can I trigger a full re-scan of all artifacts?
183183
👉 Delete the file `artifactsChangeDetectionHash.txt` in the `artifacts` directory.
184184

185-
- How can PDF generation for Jupyter Notebooks be enabled (depends on chromium, takes more time)?
185+
- How can I enable PDF generation for Jupyter Notebooks (depends on chromium, takes more time)?
186186
👉 Set environment variable `ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION` to anything except an empty string. Example:
187187

188188
```shell
@@ -195,6 +195,10 @@ The [Code Structure Analysis Pipeline](./.github/workflows/java-code-analysis.ym
195195
ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION=true ./../../scripts/analysis/analyze.sh
196196
```
197197

198+
- Why are some Jupyter Notebook reports skipped?
199+
👉 The custom Jupyter Notebook metadata property `code_graph_analysis_pipeline_data_validation` can be set to choose a query from [cypher/Validation](./cypher/Validation) that is executed before the notebook. If the query returns at least one result, the validation succeeds and the notebook is run. If the query returns no result, the notebook is skipped.
200+
For more details see [Data Availability Validation](./COMMANDS.md#data-availability-validation).
201+
198202
## 🕸 Web References
199203

200204
- [Graph Data Science 101: Understanding Graphs and Graph Data Science](https://techfirst.medium.com/graph-data-science-101-understanding-graphs-and-graph-data-science-c25055a9db01)
