|
9 | 9 | - [Start an analysis with CSV reports only](#start-an-analysis-with-csv-reports-only) |
10 | 10 | - [Start an analysis with Jupyter reports only](#start-an-analysis-with-jupyter-reports-only) |
11 | 11 | - [Start an analysis with PDF generation](#start-an-analysis-with-pdf-generation) |
| 12 | + - [Start an analysis without importing git log data](#start-an-analysis-without-importing-git-log-data) |
12 | 13 | - [Only run setup and explore the Graph manually](#only-run-setup-and-explore-the-graph-manually) |
13 | 14 | - [Generate Markdown References](#generate-markdown-references) |
14 | 15 | - [Generate Cypher Reference](#generate-cypher-reference) |
|
24 | 25 | - [Setup jQAssistant Java Code Analyzer](#setup-jqassistant-java-code-analyzer) |
25 | 26 | - [Download Maven Artifacts to analyze](#download-maven-artifacts-to-analyze) |
26 | 27 | - [Reset the database and scan the java artifacts](#reset-the-database-and-scan-the-java-artifacts) |
| 28 | + - [Import git log](#import-git-log) |
| 29 | + - [Parameters](#parameters) |
| 30 | + - [Resolving git files to code files](#resolving-git-files-to-code-files) |
| 31 | + - [Import aggregated git log](#import-aggregated-git-log) |
27 | 32 | - [Database Queries](#database-queries) |
28 | 33 | - [Cypher Shell](#cypher-shell) |
29 | 34 | - [HTTP API](#http-api) |
@@ -100,6 +105,14 @@ Note: Generating a PDF from a Jupyter notebook using [nbconvert](https://nbconve |
100 | 105 | ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION=true ./../../scripts/analysis/analyze.sh |
101 | 106 | ``` |
102 | 107 |
|
| 108 | +#### Start an analysis without importing git log data |
| 109 | + |
| 110 | +To speed up the analysis and reduce the data footprint, you can switch off the git log data import of the "source" directory (if present) with `IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none"` as shown below, or choose `IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated"` to reduce the data size by importing only monthly grouped changes instead of every single commit.
| 111 | + |
| 112 | +```shell |
| 113 | +IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none" ./../../scripts/analysis/analyze.sh |
| 114 | +``` |
| 115 | + |
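Correspondingly, the aggregated variant described above can be selected the same way:

```shell
IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated" ./../../scripts/analysis/analyze.sh
```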
103 | 116 | #### Only run setup and explore the Graph manually |
104 | 117 |
|
105 | 118 | To prepare everything for analysis including installation, configuration and preparation queries to explore the graph manually |
@@ -214,6 +227,35 @@ enhance the data further with relationships between artifacts and packages. |
214 | 227 |
|
215 | 228 | Be aware that this script deletes all previous relationships and nodes in the local Neo4j Graph database. |
216 | 229 |
|
| 230 | +### Import git log |
| 231 | + |
| 232 | +Use [importGitLog.sh](./scripts/importGitLog.sh) to import git log data into the Graph. |
| 233 | +It uses `git log` to extract commits, their authors, and the names of the files changed by them. These are stored in an intermediate CSV file and are then imported into Neo4j with the following schema:
| 234 | + |
| 235 | +```Cypher |
| 236 | +(Git:Log:Author)-[:AUTHORED]->(Git:Log:Commit)-[:CONTAINS]->(Git:Log:File)
| 237 | +``` |
| 238 | + |
| 239 | +👉**Note:** Commit messages containing `[bot]` are filtered out to ignore changes made by bots. |
| 240 | + |
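Purely as an illustration, a query against the schema above could look like the following sketch (the `name` property on author nodes is an assumption and not confirmed by this document):

```Cypher
// Hypothetical sketch: the ten authors with the most commits.
// The "name" property on Author nodes is an assumption.
MATCH (author:Git:Log:Author)-[:AUTHORED]->(commit:Git:Log:Commit)
RETURN author.name AS authorName, count(commit) AS commitCount
ORDER BY commitCount DESC
LIMIT 10
```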
| 241 | +#### Parameters |
| 242 | + |
| 243 | +The optional parameter `--repository directory-path-to-a-git-repository` can be used to select a different directory for the repository. By default, the `source` directory within the analysis workspace directory is used. The command only needs the git history to be present, so a `git clone --bare` is sufficient. If the `source` directory is also used for the analysis itself, a full git clone is of course needed (as for Typescript).
| 244 | + |
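For example, a separate bare clone could serve as the repository to import (the repository URL and target directory below are placeholders, not taken from this document):

```shell
# A bare clone is sufficient because only the git history is needed.
# The URL and the directory name are placeholders.
git clone --bare https://github.com/your-organization/your-project.git ./your-project-history
./scripts/importGitLog.sh --repository ./your-project-history
```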
| 245 | +#### Resolving git files to code files |
| 246 | + |
| 247 | +After the git log data has been imported successfully, [Add_RESOLVES_TO_relationships_to_git_files_for_Java.cypher](./cypher/GitLog/Add_RESOLVES_TO_relationships_to_git_files_for_Java.cypher) is used to try to resolve the imported git file names to code files. This first attempt covers most cases, but not all of them. For example, this approach cannot distinguish identical file names in different Java jars from the git source files of a mono repo.
| 248 | + |
| 249 | +You can use [List_unresolved_git_files.cypher](./cypher/GitLog/List_unresolved_git_files.cypher) to find git files that couldn't be matched to code files and [List_ambiguous_git_files.cypher](./cypher/GitLog/List_ambiguous_git_files.cypher) to find ambiguously resolved git files. If you have an idea on how to improve this, feel free to [open an issue](https://github.com/JohT/code-graph-analysis-pipeline/issues/new).
| 250 | + |
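As a sketch of what such a follow-up query could look like, the following counts git files without a resolved code file (the `RESOLVES_TO` relationship direction and the node labels are assumptions derived from the script names above):

```Cypher
// Hypothetical sketch: count git files that have no RESOLVES_TO relationship yet.
// Relationship direction and node labels are assumptions.
MATCH (gitFile:Git:Log:File)
WHERE NOT (gitFile)-[:RESOLVES_TO]->()
RETURN count(gitFile) AS unresolvedGitFileCount
```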
| 251 | +### Import aggregated git log |
| 252 | + |
| 253 | +Use [importAggregatedGitLog.sh](./scripts/importAggregatedGitLog.sh) to import git log data in an aggregated form into the Graph. It works similarly to the [full git log version above](#import-git-log). The only difference is that not every single commit is imported. Instead, changes are grouped per month, including their commit count. In many cases this is sufficient, and it reduces data size and processing time significantly. Here is the resulting schema:
| 254 | + |
| 255 | +```Cypher |
| 256 | +(Git:Log:Author)-[:AUTHORED]->(Git:Log:ChangeSpan)-[:CONTAINS]->(Git:Log:File) |
| 257 | +``` |
| 258 | + |
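Again purely as an illustration, the aggregated schema could be queried like this (the time-related property names on `ChangeSpan` nodes are assumptions):

```Cypher
// Hypothetical sketch: distinct changed files per change span (month).
// The "year" and "month" property names on ChangeSpan nodes are assumptions.
MATCH (span:Git:Log:ChangeSpan)-[:CONTAINS]->(file:Git:Log:File)
RETURN span.year AS year, span.month AS month, count(DISTINCT file) AS changedFileCount
ORDER BY year, month
```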
217 | 259 | ## Database Queries |
218 | 260 |
|
219 | 261 | ### Cypher Shell |
|