Merged

`README.md`: 68 changes (49 additions, 19 deletions)
@@ -1,14 +1,14 @@
# Code Graph Analysis Pipeline Examples

This repository provides examples of how to analyze TypeScript code and Java artifacts using a fully automated GitHub Actions workflow pipeline with the [code-graph-analysis-pipeline](https://github.com/JohT/code-graph-analysis-pipeline).

The process involves three steps:

1. **Extract**: Upload TypeScript source code and/or Java artifacts, optionally including their Git history, using [actions/upload-artifact](https://github.com/actions/upload-artifact).

1. **Analyze**: Use the shared workflow [JohT/code-graph-analysis-pipeline/.github/workflows/public-analyze-code-graph.yml](https://github.com/JohT/code-graph-analysis-pipeline/blob/main/.github/workflows/public-analyze-code-graph.yml) to analyze the code and artifacts, then upload the results.

1. **Use**: Download the analysis results with [actions/download-artifact](https://github.com/actions/download-artifact) and consume them as needed.

## Table of Contents
<!-- TOC -->
@@ -37,17 +37,21 @@ The process involves three steps:
- [Clustering coefficient vs. Page Rank](#clustering-coefficient-vs-page-rank)
- [Java Types that are surprisingly central or popular](#java-types-that-are-surprisingly-central-or-popular)
- [Largest Java Type Clusters](#largest-java-type-clusters)
- [Java Type Top 1 Authority](#java-type-top-1-authority)
- [Java Type Top 1 Bottleneck](#java-type-top-1-bottleneck)
- [Java Type Top 1 Bridge](#java-type-top-1-bridge)
- [Java Type Top 1 Hub](#java-type-top-1-hub)
- [Java Type Top 1 Outlier](#java-type-top-1-outlier)

<!-- /TOC -->

## :rocket: TypeScript Code Pipeline

This example demonstrates how to analyze TypeScript code in a GitHub Actions workflow.

1. The first job, [prepare-code-to-analyze](https://github.com/JohT/code-graph-analysis-examples/blob/23143b34d8fc6e0ab7d80102d8de0b6e6a4ec98e/.github/workflows/typescript-code-analysis.yml#L40), in the workflow [typescript-code-analysis.yml](https://github.com/JohT/code-graph-analysis-examples/blob/23143b34d8fc6e0ab7d80102d8de0b6e6a4ec98e/.github/workflows/typescript-code-analysis.yml), shows how to extract TypeScript code from a repository and upload it for analysis.

2. The second job, [analyze-code-graph](https://github.com/JohT/code-graph-analysis-examples/blob/23143b34d8fc6e0ab7d80102d8de0b6e6a4ec98e/.github/workflows/typescript-code-analysis.yml#L89), calls the shared analysis workflow using the uploaded artifacts' names as parameters. Example:

```yaml
name: Analyze Code Graph
```

@@ -64,11 +68,11 @@ This example demonstrates how to analyze TypeScript code in a GitHub Workflows p

Java artifacts are analyzed similarly to TypeScript code. The main difference is that Java artifacts are downloaded from a Maven repository instead of being part of the repository.

To include Git history in the analysis, check out the corresponding source repository and upload it as the source artifact, as in the TypeScript example. The Java source code isn't used in the analysis, so a bare Git clone is sufficient.

The first job, [prepare-code-to-analyze](https://github.com/JohT/code-graph-analysis-examples/blob/23143b34d8fc6e0ab7d80102d8de0b6e6a4ec98e/.github/workflows/java-code-analysis.yml#L40), in the workflow [java-code-analysis.yml](https://github.com/JohT/code-graph-analysis-examples/blob/23143b34d8fc6e0ab7d80102d8de0b6e6a4ec98e/.github/workflows/java-code-analysis.yml), shows how to prepare the Java artifacts and Git history for analysis.

The second and third jobs are the same as in the TypeScript example.

## :bookmark_tabs: CSV Report Reference

@@ -100,7 +104,7 @@ This repository is licensed under the Apache License, Version 2.0. See [LICENSE]

## :bar_chart: Analysis Results

Below are examples drawn from more than a hundred reports produced by the analysis. They illustrate results from analyzing [AxonFramework](https://github.com/AxonFramework/AxonFramework), a Java framework for evolutionary, message-driven microservices on the JVM. For the complete set of reports, see the [analysis-results](./analysis-results) directory.

### External Dependencies of Java Packages

@@ -120,7 +124,7 @@ Here are some examples from over a hundred reports generated by the analysis. Th

### Object-Oriented Design Metrics for Java Packages

<img src="./analysis-results/AxonFramework/latest/object-oriented-design-metrics-java/ObjectOrientedDesignMetricsJava_files/ObjectOrientedDesignMetricsJava_41_0.png" width="600" alt="Object-oriented design metrics for Java packages">

### Effective Line Count of Java Methods

@@ -140,7 +144,7 @@ Here are some examples from over a hundred reports generated by the analysis. Th

### Word Cloud of Git Authors

<img src="./analysis-results/AxonFramework/latest/wordcloud/Wordcloud_files/Wordcloud_16_0.png" width="600" alt="Word cloud of Git authors">

### Number of distinct commit authors

@@ -152,18 +156,44 @@ Here are some examples from over a hundred reports generated by the analysis. Th

### Clustering coefficient vs. Page Rank

The scatter plot below compares the importance of Java types to the density of their connections. The Y axis shows the [PageRank](https://en.wikipedia.org/wiki/PageRank) score (higher values indicate more important and frequently used types). The X axis shows the [clustering coefficient](https://en.wikipedia.org/wiki/Clustering_coefficient) (higher values indicate more densely connected neighborhoods). Important bridge or hub types appear toward the top-left; highly influential nodes in dense communities appear toward the top-right.

<img src="./analysis-results/AxonFramework/latest/anomaly-detection/Java_Type/ClusteringCoefficient_versus_PageRank.svg" width="600" alt="Clustering Coefficient vs. PageRank">
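To make the two axes concrete, here is a minimal Python sketch, not the pipeline's implementation, that computes PageRank by power iteration and the local clustering coefficient for a tiny dependency graph, then labels the quadrant each type falls into. The type names, the undirected treatment of dependencies for the clustering coefficient, and the quadrant thresholds (mean PageRank and 0.5) are assumptions chosen for illustration.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration; the scores sum to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = dict.fromkeys(nodes, (1.0 - damping) / n + damping * dangling / n)
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


def clustering_coefficient(edges):
    """Local clustering coefficient, treating each dependency as an undirected link."""
    neighbors = defaultdict(set)
    for source, target in edges:
        if source != target:
            neighbors[source].add(target)
            neighbors[target].add(source)
    result = {}
    for node, adjacent in neighbors.items():
        k = len(adjacent)
        # Count links between pairs of neighbors, each pair once.
        links = sum(1 for a in adjacent for b in adjacent if a < b and b in neighbors[a])
        result[node] = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
    return result


def quadrant(rank, coefficient, rank_threshold, coefficient_threshold):
    """Scatter-plot quadrant: PageRank on the Y axis, clustering coefficient on the X axis."""
    vertical = "top" if rank >= rank_threshold else "bottom"
    horizontal = "right" if coefficient >= coefficient_threshold else "left"
    return f"{vertical}-{horizontal}"


# Hypothetical dependency edges: (dependent type, type it depends on).
EDGES = [("A", "Core"), ("B", "Core"), ("C", "Core"), ("D", "Core"),
         ("A", "B"), ("B", "C"), ("A", "C")]

ranks = pagerank(EDGES)
coefficients = clustering_coefficient(EDGES)
mean_rank = 1.0 / len(ranks)  # assumed threshold: the mean PageRank
for node in sorted(ranks):
    cc = coefficients.get(node, 0.0)
    print(node, round(ranks[node], 3), round(cc, 2), quadrant(ranks[node], cc, mean_rank, 0.5))
```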

### Java Types that are surprisingly central or popular

<img src="./analysis-results/AxonFramework/latest/anomaly-detection/Java_Type/ClusterNoise_highly_central_and_popular.svg" width="600" alt="Surprisingly central or popular Java Types">

### Largest Java Type Clusters

<img src="./analysis-results/AxonFramework/latest/anomaly-detection/Java_Type/Clusters_largest_size.svg" width="600" alt="Largest Java Type Clusters">

### Java Type Top 1 Authority

An "Authority" is a code unit that many important parts depend on: it has high global importance (PageRank) but low local support (ArticleRank). A large PageRank − ArticleRank gap flags widely used utilities or entry points that are central but not well supported locally.

<img src="./analysis-results/AxonFramework/AxonFramework-4.12.1/anomaly-detection/Java_Type/GraphVisualizations/TopAuthority1.svg" width="600" alt="Top 1 Java Type Authority Graph Visualization">
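The gap can be sketched in a few lines of Python. This minimal example assumes the un-normalized formulation documented for Neo4j Graph Data Science, where each iteration computes `(1 - d) + d * sum(...)` and ArticleRank divides each incoming contribution by the source's out-degree plus the average out-degree instead of the out-degree alone. Converting both scores to shares of their totals before subtracting is an assumption made here; the pipeline may normalize differently. The graph and its type names are hypothetical.

```python
from collections import defaultdict


def rank_scores(edges, article_rank=False, damping=0.85, iterations=50):
    """PageRank (default) or ArticleRank, both in the form (1 - d) + d * sum(...).

    ArticleRank divides each incoming contribution by the source's out-degree
    plus the average out-degree, which weakens links from low-degree sources.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_degree = defaultdict(int)
    incoming = defaultdict(list)
    for source, target in edges:
        out_degree[source] += 1
        incoming[target].append(source)
    # Average out-degree over all nodes, including nodes without dependencies.
    extra = sum(out_degree.values()) / len(nodes) if article_rank else 0.0
    score = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        score = {
            node: (1.0 - damping)
            + damping * sum(score[source] / (out_degree[source] + extra) for source in incoming[node])
            for node in nodes
        }
    return score


def authority_gap(edges):
    """PageRank share minus ArticleRank share per node (shares are an assumed normalization)."""
    page = rank_scores(edges)
    article = rank_scores(edges, article_rank=True)
    page_total, article_total = sum(page.values()), sum(article.values())
    return {node: page[node] / page_total - article[node] / article_total for node in page}


# Hypothetical graph: most types that use "Util" have no other dependency.
EDGES = [("A", "Util"), ("B", "Util"), ("C", "Util"),
         ("D", "Service"), ("D", "Util"), ("Service", "Util")]

gaps = authority_gap(EDGES)
for node in sorted(gaps, key=gaps.get, reverse=True):
    print(f"{node}: {gaps[node]:+.3f}")
```

In this toy graph `Util` gets the largest positive gap, because most of its incoming links come from types with a single dependency, which PageRank rewards more strongly than ArticleRank.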

### Java Type Top 1 Bottleneck

A "Bottleneck" is a code unit with exceptionally high betweenness centrality: it lies on many shortest paths between other nodes, so it mediates a large fraction of dependency flows and is a potential single point of failure or architectural hotspot. It may indicate an unintended concentration of dependencies; if it were removed, communication between modules would break.

<img src="./analysis-results/AxonFramework/AxonFramework-4.12.1/anomaly-detection/Java_Type/GraphVisualizations/TopBottleneck1.svg" width="600" alt="Top 1 Java Type Bottleneck Graph Visualization">
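Betweenness centrality sums, over all pairs of other nodes, the fraction of shortest paths between them that pass through a given node. The following minimal Python sketch uses Brandes' algorithm on an unweighted, directed graph. The example graph and its type names are hypothetical, and the pipeline's own computation (for instance directed versus undirected edges, or normalized scores) may differ.

```python
from collections import defaultdict, deque


def betweenness(edges):
    """Brandes' algorithm: unnormalized betweenness on a directed, unweighted graph."""
    nodes = sorted({node for edge in edges for node in edge})
    successors = defaultdict(list)
    for source, target in edges:
        successors[source].append(target)
    centrality = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s: count shortest paths (sigma), remember predecessors.
        stack = []
        predecessors = defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        distance = dict.fromkeys(nodes, -1)
        distance[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in successors[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    return centrality


# Hypothetical graph: A and B reach X and Y only through Gateway.
EDGES = [("A", "Gateway"), ("B", "Gateway"), ("Gateway", "X"), ("Gateway", "Y")]
print(betweenness(EDGES))
```

Here `Gateway` lies on all four shortest paths from `A` and `B` to `X` and `Y`, so its unnormalized score is 4.0 while every other node scores 0.0.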

### Java Type Top 1 Bridge

A "Bridge" is a code unit that connects different parts of the codebase. It is detected as an anomaly to which node embedding features, which encode the node's structural position in the graph, contribute strongly. It points to code that may integrate various layers or boundaries (e.g., API facades) or violate the architecture (tangled dependencies).

<img src="./analysis-results/AxonFramework/AxonFramework-4.12.1/anomaly-detection/Java_Type/GraphVisualizations/TopBridge1.svg" width="600" alt="Top 1 Java Type Bridge Graph Visualization">
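The exact detection method isn't reproduced here. As a minimal sketch, assume each anomalous node comes with per-feature contribution scores (for example SHAP-style values from an explainable anomaly model) and that embedding features share a common name prefix. A node is then flagged as a bridge candidate when embedding features account for most of its total absolute contribution. The `embedding_` prefix, the feature names, and the 0.5 threshold are hypothetical.

```python
def embedding_contribution_share(contributions, prefix="embedding_"):
    """Fraction of a node's total absolute feature contribution that comes from embedding features."""
    total = sum(abs(value) for value in contributions.values())
    if total == 0:
        return 0.0
    embedding = sum(abs(value) for name, value in contributions.items() if name.startswith(prefix))
    return embedding / total


def is_bridge_candidate(contributions, threshold=0.5, prefix="embedding_"):
    """Hypothetical rule: embedding features dominate the explanation of the anomaly."""
    return embedding_contribution_share(contributions, prefix) >= threshold


# Hypothetical contributions for one anomalous Java type.
CONTRIBUTIONS = {"embedding_0": 0.5, "embedding_1": -0.25,
                 "pageRank": 0.125, "clusteringCoefficient": 0.125}
print(embedding_contribution_share(CONTRIBUTIONS), is_bridge_candidate(CONTRIBUTIONS))
```

Absolute values are used so that features pushing the anomaly score in either direction count toward the share.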

### Java Type Top 1 Hub

A "Hub" is a code unit with a high out-degree (many outgoing dependencies) but a low clustering coefficient (its neighbors are not well connected to each other). Hubs are central connection points in the dependency graph, which makes them potential fragile hotspots in the architecture. The low clustering coefficient indicates that a hub may not be well integrated into the surrounding code, which increases the risk that problems in the hub spread widely.

<img src="./analysis-results/AxonFramework/AxonFramework-4.12.1/anomaly-detection/Java_Type/GraphVisualizations/TopHub1.svg" width="600" alt="Top 1 Java Type Hub Graph Visualization">
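A minimal Python sketch of this rule follows. Both thresholds (at least three outgoing dependencies, a clustering coefficient of at most 0.2) are hypothetical values chosen for illustration, the clustering coefficient is computed on the undirected neighborhood, and the type names are made up; the pipeline's actual detection may combine these features differently.

```python
from collections import defaultdict


def hub_candidates(edges, min_out_degree=3, max_clustering=0.2):
    """Flag nodes with many outgoing dependencies whose neighbors are poorly interconnected."""
    out_degree = defaultdict(int)
    neighbors = defaultdict(set)
    for source, target in edges:
        out_degree[source] += 1
        if source != target:
            neighbors[source].add(target)
            neighbors[target].add(source)
    hubs = []
    for node in sorted(neighbors):
        adjacent = neighbors[node]
        k = len(adjacent)
        # Links between pairs of neighbors, each pair counted once.
        links = sum(1 for a in adjacent for b in adjacent if a < b and b in neighbors[a])
        clustering = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        if out_degree[node] >= min_out_degree and clustering <= max_clustering:
            hubs.append(node)
    return hubs


# Hypothetical graph: "Facade" uses four unrelated types; "X" uses three tightly linked ones.
EDGES = [("Facade", "A"), ("Facade", "B"), ("Facade", "C"), ("Facade", "D"),
         ("X", "P"), ("X", "Q"), ("X", "R"), ("P", "Q"), ("Q", "R"), ("P", "R")]
print(hub_candidates(EDGES))
```

`X` also has three outgoing dependencies, but its neighbors form a triangle (clustering coefficient 1.0), so only `Facade` is flagged.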

### Java Type Top 1 Outlier

An "Outlier" is a code unit that significantly deviates from typical patterns in the codebase. It has a low clustering probability and a high distance to the nearest cluster centroid in the node embedding space. This indicates that the outlier has a unique structural position in the dependency graph, potentially representing specialized functionality or an architectural anomaly.

<img src="./analysis-results/AxonFramework/AxonFramework-4.12.1/anomaly-detection/Java_Type/GraphVisualizations/TopOutlier1.svg" width="600" alt="Top 1 Java Type Outlier Graph Visualization">
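The two criteria can be sketched in Python as follows, assuming each node has an embedding vector, a cluster-membership probability (for example as produced by a density-based clustering algorithm such as HDBSCAN), and a list of cluster centroids in the same embedding space. Both thresholds and the sample vectors are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math


def nearest_centroid_distance(embedding, centroids):
    """Euclidean distance from an embedding to the closest cluster centroid."""
    return min(math.dist(embedding, centroid) for centroid in centroids)


def is_outlier(embedding, cluster_probability, centroids, max_probability=0.1, min_distance=1.0):
    """Hypothetical rule: low membership probability and far from every centroid."""
    return (cluster_probability <= max_probability
            and nearest_centroid_distance(embedding, centroids) >= min_distance)


# Hypothetical 2-dimensional embeddings and centroids.
CENTROIDS = [(0.0, 0.0), (10.0, 0.0)]
print(is_outlier((0.0, 3.0), 0.05, CENTROIDS))  # far from both centroids, low probability
print(is_outlier((0.0, 0.5), 0.05, CENTROIDS))  # close to the first centroid
```

Requiring both conditions avoids flagging nodes that merely sit between two clusters but are still close to one of them.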