Commit 947e757

Document Jupyter Notebook execution and data validation
1 parent 8689f0c commit 947e757

1 file changed

+36
-11
lines changed

COMMANDS.md

Lines changed: 36 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -4,6 +4,7 @@
44

55
An analysis is started with the script [analyze.sh](./scripts/analysis/analyze.sh).
66
To run all analysis steps, simply execute the following command:
7+
78
```shell
89
./../../scripts/analysis/analyze.sh
910
```
@@ -55,7 +56,7 @@ Note: Generating a PDF from a Jupyter notebook using [nbconvert](https://nbconve
5556
ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION=true ./../../scripts/analysis/analyze.sh
5657
```
5758

58-
#### Setup everything to explore the graph manually
59+
#### Only run setup and explore the Graph manually
5960

6061
To prepare everything for analysis, including installation, configuration, and preparation queries, so that the graph can be explored manually
6162
without report generation, use this command:
@@ -200,7 +201,7 @@ Query parameters can be added as arguments after the file name. Here is an examp
200201
./scripts/executeQuery.sh ./cypher/Get_Graph_Data_Science_Library_Version.cypher a=1
201202
```
202203

203-
### executeQueryFunctions
204+
### [executeQueryFunctions.sh](./scripts/executeQueryFunctions.sh)
204205

205206
The script [executeQueryFunctions.sh](./scripts/executeQueryFunctions.sh) contains functions to simplify the
206207
call of [executeQuery.sh](./scripts/executeQuery.sh) for different purposes. For example, `execute_cypher_summarized`
@@ -221,7 +222,35 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra
221222

222223
## Jupyter Notebook
223224

224-
### Commands
225+
### Create a report simplified with [executeJupyterNotebookReport.sh](./scripts/executeJupyterNotebookReport.sh)
226+
227+
[executeJupyterNotebookReport.sh](./scripts/executeJupyterNotebookReport.sh) covers every step of creating a report: it creates a directory within the "reports" directory, validates data availability, and executes and converts the given Notebook. This all-in-one script is also used inside the pipeline. Under the hood, it uses [executeJupyterNotebook.sh](#execute-a-notebook-simplified-with-executejupyternotebooksh) to execute the Notebook and [executeQueryFunctions.sh](#executequeryfunctionssh) to query the database for data availability validation.
228+
229+
Here is an example of how to use [executeJupyterNotebookReport.sh](./scripts/executeJupyterNotebookReport.sh) to run the report [Wordcloud.ipynb](./jupyter/Wordcloud.ipynb):
230+
231+
```shell
232+
./scripts/executeJupyterNotebookReport.sh --jupyterNotebook Wordcloud.ipynb
233+
```
234+
235+
#### Data Availability Validation
236+
237+
Jupyter notebooks can have additional custom tags within their "metadata" section. Opening a notebook file with a text editor reveals this section, typically at the end of the file. Some editors also support editing the metadata directly. Here, the optional metadata property `code_graph_analysis_pipeline_data_validation` specifies which data validation query in the [cypher/Validation](./cypher/Validation/) directory should be used. Without this property, the data validation step is skipped. If a validation is specified, it is executed before the Jupyter Notebook. If the query returns at least one result, the validation is considered successful. Otherwise, the Jupyter Notebook will not be executed.
238+
239+
This is helpful for Jupyter Notebook reports that are specific to a programming language or that have other specific data prerequisites. The Notebook is skipped if no data is available, since executing it anyway would lead to confusing and distracting reports with empty tables and figures.
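
The metadata lookup and the success rule described above can be sketched in shell. This is a minimal, text-based sketch and not the pipeline's actual implementation: the function names `get_data_validation_name` and `is_validation_successful` are hypothetical, and the sketch assumes that the property value is a plain string that sits on a single line of the notebook's JSON.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the repository):
# Prints the value of the "code_graph_analysis_pipeline_data_validation"
# metadata property of the given notebook file, or nothing if it is absent.
# Assumption: key and value are on the same line and the value is a string.
get_data_validation_name() {
    local notebook_file="$1"
    grep -o '"code_graph_analysis_pipeline_data_validation"[[:space:]]*:[[:space:]]*"[^"]*"' "${notebook_file}" \
        | head -n 1 \
        | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# Hypothetical helper (not part of the repository):
# Succeeds if the validation query returned at least one result.
# The number of results is passed in as the first argument.
is_validation_successful() {
    local result_count="${1:-0}"
    [ "${result_count}" -ge 1 ]
}
```

For example, `get_data_validation_name ./jupyter/Wordcloud.ipynb` would print the configured validation name, or nothing if the property is missing, in which case the validation step would be skipped. A JSON-aware tool would be more robust than `grep` if the formatting of the notebook file varies.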
240+
241+
### Execute a Notebook simplified with [executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh)
242+
243+
[executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh) contains everything needed to execute a Jupyter Notebook from the command line and convert it to other formats like Markdown and, optionally, PDF. Under the hood, it takes care of [setting up the environment](#manually-setup-the-environment-using-conda) and [uses nbconvert](#executing-jupyter-notebooks-with-nbconvert) to execute the notebook and convert it to other file formats.
244+
245+
Here is an example of how to use [executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh) to run [Wordcloud.ipynb](./jupyter/Wordcloud.ipynb):
246+
247+
```shell
248+
./scripts/executeJupyterNotebook.sh ./jupyter/Wordcloud.ipynb
249+
```
250+
251+
### Manually setup the environment using [Conda](https://conda.io)
252+
253+
[Conda](https://conda.io) provides package, dependency, and environment management for any language. Here, it is used to set up the environment for Jupyter Notebooks.
225254

226255
- Set up environment
227256

@@ -249,6 +278,10 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra
249278
conda env export --from-history --name codegraph | grep -v "^prefix: " > codegraph-environment.yml
250279
```
251280

281+
### Executing Jupyter Notebooks with [nbconvert](https://nbconvert.readthedocs.io)
282+
283+
[nbconvert](https://nbconvert.readthedocs.io) converts Jupyter Notebooks to other static formats including HTML, LaTeX, PDF, Markdown, reStructuredText, and more.
284+
252285
- Install pandoc used by nbconvert for LaTeX support (Mac)
253286

254287
```shell
@@ -273,14 +306,6 @@ Use [stopNeo4j.sh](./scripts/stopNeo4j.sh) to stop the locally running Neo4j Gra
273306
jupyter nbconvert --to pdf ./jupyter/first-neo4j-tryout.nbconvert.ipynb
274307
```
275308

276-
- Shell script to execute and convert a Jupyter notebook file
277-
278-
Use [executeJupyterNotebook.sh](./scripts/executeJupyterNotebook.sh) like this:
279-
280-
```shell
281-
./scripts/executeJupyterNotebook.sh ./jupyter/first-neo4j-tryout.ipynb
282-
```
283-
284309
## References
285310

286311
- [Managing environments with Conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
