@@ -2,7 +2,8 @@

MATCH (global_git_commit:Git:Commit)
WITH count(global_git_commit) AS globalCommitCount
-MATCH (git_commit:Git:Commit)-[:CONTAINS_CHANGE]->(git_change:Git:Change:Update)-[:UPDATES]->(git_file:Git:File)MATCH (git_repository:Git&Repository)-[:HAS_FILE]->(git_file)
+MATCH (git_commit:Git:Commit)-[:CONTAINS_CHANGE]->(git_change:Git:Change)-[:UPDATES]->(git_file:Git:File)
+MATCH (git_repository:Git&Repository)-[:HAS_FILE]->(git_file)
WHERE git_file.deletedAt IS NULL
// Order files to ensure that pairs of distinct files are grouped together (fileA, fileB) without (fileB, fileA)
ORDER BY git_commit.sha, git_file.relativePath
@@ -2,7 +2,7 @@

MATCH (global_git_commit:Git:Commit)
WITH count(global_git_commit) AS globalCommitCount
-MATCH (git_commit:Git:Commit)-[:CONTAINS_CHANGE]->(git_change:Git:Change:Update)-[:UPDATES]->(git_file:Git:File)MATCH (git_repository:Git&Repository)-[:HAS_FILE]->(git_file)
+MATCH (git_commit:Git:Commit)-[:CONTAINS_CHANGE]->(git_change:Git:Change)-[:UPDATES]->(git_file:Git:File)MATCH (git_repository:Git&Repository)-[:HAS_FILE]->(git_file)
MATCH (git_repository:Git&Repository)-[:HAS_FILE]->(git_file)
WHERE git_file.deletedAt IS NULL
WITH *, git_repository.name + '/' + git_file.relativePath AS filePath
@@ -2,7 +2,7 @@

MATCH (global_git_commit:Git:Commit)
WITH count(global_git_commit) AS globalCommitCount
-MATCH (git_commit:Git:Commit)-[:CONTAINS_CHANGE]->(git_change:Git:Change:Update)-[:UPDATES]->(git_file:Git:File)
+MATCH (git_commit:Git:Commit)-[:CONTAINS_CHANGE]->(git_change:Git:Change)-[:UPDATES]->(git_file:Git:File)
MATCH (git_repository:Git&Repository)-[:HAS_FILE]->(git_file)
WHERE git_file.deletedAt IS NULL
WITH *, git_repository.name + '/' + git_file.relativePath AS filePath
38 changes: 38 additions & 0 deletions cypher/GitLog/List_pairwise_changed_files_with_dependencies.cypher
@@ -0,0 +1,38 @@
// List pairs of files that were changed together and have a declared dependency between them.

MATCH (firstCodeFile:File)-[dependency:DEPENDS_ON]->(secondCodeFile:File)
MATCH (firstCodeFile)-[pairwiseChange:CHANGED_TOGETHER_WITH]-(secondCodeFile)
WHERE elementId(firstCodeFile) < elementId(secondCodeFile)
WITH firstCodeFile.fileName AS firstFileName
,secondCodeFile.fileName AS secondFileName
,coalesce(dependency.weight, dependency.cardinality) AS dependencyWeight
,pairwiseChange.commitCount AS commitCount
,dependency.fileDistanceAsFewestChangeDirectoryCommands AS fileDistanceAsFewestChangeDirectoryCommands
RETURN dependencyWeight
,commitCount
,fileDistanceAsFewestChangeDirectoryCommands
// ,count(*) AS occurrences
// ,collect(firstFileName + ' -> ' + secondFileName)[0..3] AS examples
ORDER BY dependencyWeight, commitCount

// MATCH (firstCodeFile:File)-[dependency:DEPENDS_ON]->(secondCodeFile:File)
// MATCH (firstCodeFile)-[pairwiseChange:CHANGED_TOGETHER_WITH]-(secondCodeFile)
// WHERE elementId(firstCodeFile) < elementId(secondCodeFile)
// RETURN firstCodeFile.fileName AS firstFileName
// ,secondCodeFile.fileName AS secondFileName
// ,dependency.weight AS dependencyWeight
// ,pairwiseChange.commitCount AS commitCount
// ORDER BY dependencyWeight, commitCount

// MATCH (g1:!Git&File)-[relation:CHANGED_TOGETHER_WITH|DEPENDS_ON]-(g2:!Git&File)
// WITH count(DISTINCT relation) AS relatedFilesCount
// ,collect(DISTINCT relation) AS relations
// UNWIND relations AS relation
// WITH relatedFilesCount
// ,coalesce(relation.commitCount, 0) AS commitCount
// ,coalesce(relation.weight, 0) AS dependencyWeight
// ,coalesce(relation.fileDistanceAsFewestChangeDirectoryCommands, 0) AS fileDistanceAsFewestChangeDirectoryCommands
// RETURN dependencyWeight
// ,commitCount
// ,fileDistanceAsFewestChangeDirectoryCommands
// ORDER BY dependencyWeight, commitCount
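
Review note on the query above: it combines a directed `DEPENDS_ON` match with an undirected `CHANGED_TOGETHER_WITH` match and then keeps only rows where `elementId(firstCodeFile) < elementId(secondCodeFile)`. The following is a minimal Python sketch of how such a filter behaves, not the project's implementation. The data, the string identifiers, and the function name `pairs_with_dependency` are hypothetical and only illustrate the idea.

```python
# Hypothetical stand-ins for graph data (not taken from the repository):
# directed dependency edges (source, target) and unordered co-change pairs.
depends_on = [("a", "b"), ("c", "b")]
changed_together = {
    frozenset(("a", "b")): 5,  # files a and b share 5 commits
    frozenset(("b", "c")): 2,  # files b and c share 2 commits
}


def pairs_with_dependency(depends_on, changed_together):
    """Join directed dependencies with undirected co-change counts."""
    rows = []
    for source, target in depends_on:
        # The undirected match: the pair is found regardless of direction.
        commit_count = changed_together.get(frozenset((source, target)))
        if commit_count is None:
            continue
        # Mirrors WHERE elementId(first) < elementId(second):
        # only rows whose dependency source has the smaller identifier remain.
        if source < target:
            rows.append((source, target, commit_count))
    return rows


rows = pairs_with_dependency(depends_on, changed_together)
print(rows)
```

In this sketch the dependency from `c` to `b` is dropped even though both files were changed together, because the dependency source has the larger identifier. Whether that matters for the real query depends on the data; if every dependent pair should appear exactly once, the ordering filter could be dropped, since the directed `DEPENDS_ON` match already fixes the order of each pair.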
95 changes: 95 additions & 0 deletions jupyter/GitHistoryGeneral.ipynb
@@ -1281,6 +1281,101 @@
" figure.show(**plotly_treemap_figure_show_settings)"
]
},
{
"cell_type": "markdown",
"id": "c15669ef",
"metadata": {},
"source": [
"## Pairwise Changed Files vs. Dependency Weight\n",
"\n",
"This section explores the correlation between how often pairs of files are changed together (common commit count) and their dependency weight. Note that these results should be interpreted cautiously, as comparing pairwise changes and dependencies is inherently challenging.\n",
"\n",
"### Considerations\n",
"- **Historical vs. Current State**: Pairwise changes reflect the entire git history, while dependency weight represents the current state of the codebase.\n",
"- **Commit Granularity**: Developers may use different commit strategies, such as squashing changes into a single commit or creating fine-grained commits. Ideally, each commit should represent a single semantic change for accurate analysis.\n",
"- **Dependency Representation**: Some file types (e.g., Java files with import statements) clearly define dependencies, while others (e.g., shell scripts, XML, YAML) lack explicit dependency relationships.\n",
"- **Repository Characteristics**: Repositories with generated code may have many large commits, while stabilized repositories may only update configuration files for dependency changes."
]
},
{
"cell_type": "markdown",
"id": "98a2feea",
"metadata": {},
"source": [
"#### Data Preview"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a067f8e6",
"metadata": {},
"outputs": [],
"source": [
"pairwise_changed_git_files_with_dependencies = query_cypher_to_data_frame(\"../cypher/GitLog/List_pairwise_changed_files_with_dependencies.cypher\")\n",
"pairwise_changed_git_files_with_dependencies.head(20)"
]
},
{
"cell_type": "markdown",
"id": "01db2db9",
"metadata": {},
"source": [
"#### Data Statistics"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9fe48db8",
"metadata": {},
"outputs": [],
"source": [
"display(\"Pairwise changed git files compared to dependency weights - Overall statistics\")\n",
"display(pairwise_changed_git_files_with_dependencies.describe())\n",
"\n",
"display(\"Pairwise changed git files compared to dependency weights - Pearson Correlation\")\n",
"display(pairwise_changed_git_files_with_dependencies.corr(method='pearson'))\n",
"\n",
"display(\"Pairwise changed git files compared to dependency weights - Spearman Correlation\")\n",
"display(pairwise_changed_git_files_with_dependencies.corr(method='spearman'))\n",
"\n",
"from scipy.stats import pearsonr, spearmanr\n",
"\n",
"display(\"Pearson Correlation with p-value for commitCount and dependencyWeight\")\n",
"display(pearsonr(pairwise_changed_git_files_with_dependencies['commitCount'], pairwise_changed_git_files_with_dependencies['dependencyWeight']))\n",
"\n",
"display(\"Spearman Correlation with p-value for commitCount and dependencyWeight\")\n",
"display(spearmanr(pairwise_changed_git_files_with_dependencies['commitCount'], pairwise_changed_git_files_with_dependencies['dependencyWeight']))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "747f9590",
"metadata": {},
"outputs": [],
"source": [
"# Scatter plot of all pairs of files with their commit count on the x axis and dependency weight on the y axis\n",
"\n",
"if pairwise_changed_git_files_with_dependencies.empty:\n",
"    print(\"No data to plot\")\n",
"else:\n",
"    figure = plotly_graph_objects.Figure(plotly_graph_objects.Scatter(\n",
"        x=pairwise_changed_git_files_with_dependencies['commitCount'],\n",
"        y=pairwise_changed_git_files_with_dependencies['dependencyWeight'],\n",
"        mode='markers',\n",
"        # marker=dict(size=pairwise_changed_git_files_with_dependencies['occurrences'] + 8)\n",
"    ))\n",
"    figure.update_layout(\n",
"        **plotly_bar_layout_base_settings,\n",
"        title='Pairwise changed files: Number of changes (commitCount) vs. dependency weight',\n",
"        xaxis_title='commit count',\n",
"        yaxis_title='dependency weight',\n",
"    )\n",
"    figure.show(**plotly_treemap_figure_show_settings)"
]
},
{
"cell_type": "markdown",
"id": "14e87aff",