Commit 5bb061d

Merge pull request #4 from JohT/feature/refine-external-dependencies
Refine external dependencies report
2 parents e9a3298 + 46b7eee commit 5bb061d

2 files changed

+169
-20
lines changed

jupyter/ExternalDependencies.ipynb

Lines changed: 160 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -78,7 +78,11 @@
7878
"cell_type": "code",
7979
"execution_count": null,
8080
"id": "9deaabce",
81-
"metadata": {},
81+
"metadata": {
82+
"tags": [
83+
"table-css"
84+
]
85+
},
8286
"outputs": [],
8387
"source": [
8488
"%%html\n",
@@ -129,6 +133,10 @@
129133
"source": [
130134
"### Table 1 - Top 20 most used external packages overall\n",
131135
"\n",
136+
"This table shows the external packages that are used by the most different internal types overall.\n",
137+
"Additionally, it shows which types of the external package are actually used. External annotations are also listed.\n",
138+
"\n",
139+
"**Columns:**\n",
132140
"- *externalPackageName* identifies the external package as described above\n",
133141
"- *numberOfExternalTypeCaller* refers to the distinct types that make use of the external package\n",
134142
"- *numberOfExternalTypeCalls* includes every invocation or reference to the types in the external package\n",
@@ -143,33 +151,69 @@
143151
"metadata": {},
144152
"outputs": [],
145153
"source": [
146-
"external_package_useage=query_cypher_to_data_frame(\"../cypher/External_Dependencies/External_package_usage_overall.cypher\")\n",
154+
"external_package_usage=query_cypher_to_data_frame(\"../cypher/External_Dependencies/External_package_usage_overall.cypher\")\n",
147155
"\n",
148156
"# Select columns and only show the first 20 entries (head)\n",
149-
"external_package_useage.head(20)"
157+
"external_package_usage.head(20)"
158+
]
159+
},
160+
{
161+
"attachments": {},
162+
"cell_type": "markdown",
163+
"id": "1143afcb",
164+
"metadata": {},
165+
"source": [
166+
"### Chart 1 - Most called external packages in %\n",
167+
"\n",
168+
"Packages that are used less than 0.7% are grouped into the name \"others\" to get a cleaner chart\n",
169+
"with the most significant external packages and how often they are called in percent."
170+
]
171+
},
172+
{
173+
"cell_type": "code",
174+
"execution_count": null,
175+
"id": "99ef3fad",
176+
"metadata": {},
177+
"outputs": [],
178+
"source": [
179+
"external_package_usage_significant = external_package_usage.copy();\n",
180+
"\n",
181+
"# Add column \"percentOfExternalTypeCalls\" with the percentage of the \"numberOfExternalTypeCalls\".\n",
182+
"external_package_usage_significant['percentOfExternalTypeCalls'] = external_package_usage_significant['numberOfExternalTypeCalls'] / external_package_usage_significant['numberOfExternalTypeCalls'].sum() * 100\n",
183+
"\n",
184+
"# Change the external package name to \"others\" if it is called less than 0.7 percent\n",
185+
"external_package_usage_significant.loc[external_package_usage_significant['percentOfExternalTypeCalls'] < 0.7, 'externalPackageName'] = 'others'\n",
186+
"\n",
187+
"# Group external package name (foremost the new \"others\" entries) and sum their \"percentOfExternalTypeCalls\"\n",
188+
"external_package_usage_significant = external_package_usage_significant.groupby('externalPackageName')['percentOfExternalTypeCalls'].sum()\n",
189+
"\n",
190+
"# Sort by \"percentOfExternalTypeCalls\" descending\n",
191+
"external_package_usage_significant.sort_values(ascending=False, inplace=True)"
150192
]
151193
},
152194
{
153195
"cell_type": "code",
154196
"execution_count": null,
155-
"id": "7767f100",
197+
"id": "688b6d56",
156198
"metadata": {},
157199
"outputs": [],
158200
"source": [
159201
"plot.figure();\n",
160202
"\n",
161203
"# Set the name of the index to artifactName\n",
162-
"external_package_useage_by_name=external_package_useage.set_index('externalPackageName')\n",
204+
"#external_package_usage_significant=external_package_usage_significant.set_index('externalPackageName')\n",
163205
"\n",
164-
"axis = external_package_useage_by_name.head(20).plot(\n",
165-
" y='numberOfExternalTypeCalls', \n",
206+
"axis = external_package_usage_significant.plot(\n",
207+
" #y='numberOfExternalTypeCalls', \n",
166208
" kind='pie',\n",
167-
" title='External Package Usage',\n",
209+
" title='Significant External Package Usage',\n",
168210
" legend=True,\n",
169211
" labeldistance=None,\n",
212+
" autopct='%1.1f%%',\n",
213+
" pctdistance=1.2,\n",
170214
" cmap=main_color_map\n",
171215
")\n",
172-
"axis.legend(bbox_to_anchor=(1, 1), loc='upper left')\n",
216+
"axis.legend(bbox_to_anchor=(1.05, 1), loc='upper left')\n",
173217
"plot.show()"
174218
]
175219
},
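The grouping step added for Chart 1 can be tried in isolation. The following is a minimal sketch, assuming pandas is installed; the sample rows, the package names, and the helper name `group_insignificant` are made up for illustration, while the column names and the 0.7 percent threshold come from the notebook cell above.

```python
import pandas as pd

# Hypothetical sample data; the notebook reads this frame from a Cypher query instead.
usage = pd.DataFrame({
    "externalPackageName": ["org.a", "org.b", "org.c", "org.d"],
    "numberOfExternalTypeCalls": [600, 395, 3, 2],
})


def group_insignificant(frame: pd.DataFrame, threshold_percent: float = 0.7) -> pd.Series:
    """Return call percentages per package, merging packages below the threshold into 'others'."""
    result = frame.copy()
    total_calls = result["numberOfExternalTypeCalls"].sum()
    result["percentOfExternalTypeCalls"] = result["numberOfExternalTypeCalls"] / total_calls * 100

    # Rename every package below the threshold so that groupby merges them into one entry.
    below_threshold = result["percentOfExternalTypeCalls"] < threshold_percent
    result.loc[below_threshold, "externalPackageName"] = "others"

    grouped = result.groupby("externalPackageName")["percentOfExternalTypeCalls"].sum()
    return grouped.sort_values(ascending=False)


significant = group_insignificant(usage)
print(significant)
```

Because the result is a Series indexed by package name, it can be passed to a pie plot directly, which is presumably why the `set_index` call and the `y=` argument are commented out in the plotting cell.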
@@ -181,10 +225,11 @@
181225
"source": [
182226
"### Table 2 - Top 20 least used external packages overall\n",
183227
"\n",
184-
"- *externalPackageName* identifies the external package as described above\n",
185-
"- *numberOfExternalTypeCalls* includes every invocation or reference to the types in the external package\n",
228+
"This table identifies external packages that aren't used very often. This could help to find libraries that aren't actually needed or maybe easily replaceable. Some of them might be used sparsely on purpose for example as an adapter to an external library that is actually important. Thus, decisions need to be made on a case-by-case basis.\n",
186229
"\n",
187-
"This table identifies external packages that aren't used very often. This could help to find libraries that aren't actually needed or maybe easily replaced. Some of them might be used only in very few spots in the code on purpose and can't be replaced. This needs to be decided on a case-by-case basis."
230+
"**Columns:**\n",
231+
"- *externalPackageName* identifies the external package as described above\n",
232+
"- *numberOfExternalTypeCalls* includes every invocation or reference to the types in the external package"
188233
]
189234
},
190235
{
@@ -195,7 +240,7 @@
195240
"outputs": [],
196241
"source": [
197242
"# Sort by number of external type calls\n",
198-
"external_package_least_used=external_package_useage.sort_values(by='numberOfExternalTypeCalls', ascending=True)\n",
243+
"external_package_least_used=external_package_usage.sort_values(by='numberOfExternalTypeCalls', ascending=True)\n",
199244
"\n",
200245
"# Reset index\n",
201246
"external_package_least_used = external_package_least_used.reset_index(drop=True)\n",
@@ -212,6 +257,9 @@
212257
"source": [
213258
"### Table 3 - External usage per artifact\n",
214259
"\n",
260+
"The following table shows the most used external packages separately for each artifact including external annotations. \n",
261+
"\n",
262+
"**Columns:**\n",
215263
"- *artifactName* is used to group the external package usage per artifact for a more detailed analysis.\n",
216264
"- *externalPackageName* identifies the external package as described above\n",
217265
"- *numberOfExternalTypeCaller* refers to the distinct types that make use of the external package\n",
@@ -236,7 +284,19 @@
236284
"id": "4fb87c8a",
237285
"metadata": {},
238286
"source": [
239-
"### Table 4 - External usage per artifact and package"
287+
"### Table 4 - External usage per artifact and package\n",
288+
"\n",
289+
"The next table lists internal packages, together with the artifacts they belong to, that use many different external types of a specific external package. External annotations are not taken into account. Only the first 30 rows are shown.\n",
290+
"\n",
291+
"**Columns:**\n",
292+
"- *artifactName* that contains the type that calls the external package\n",
293+
"- *fullPackageName* is the package within the artifact that contains the type that calls the external package\n",
294+
"- *externalPackageName* identifies the external package as described above\n",
295+
"- *numberOfExternalTypeCaller* refers to the distinct types that make use of the external package\n",
296+
"- *numberOfExternalTypeCalls* includes every invocation or reference to the types in the external package\n",
297+
"- *numberOfTypesInPackage* represents the total count of all types in that package\n",
298+
"- *externalTypeNames* contains a list of actually utilized types of the external package\n",
299+
"- *packageName* contains the name of the package (last part of *fullPackageName*)"
240300
]
241301
},
242302
{
@@ -247,7 +307,7 @@
247307
"outputs": [],
248308
"source": [
249309
"external_package_usage_per_package = query_cypher_to_data_frame(\"../cypher/External_Dependencies/External_package_usage_per_artifact_and_package.cypher\")\n",
250-
"external_package_usage_per_package"
310+
"external_package_usage_per_package.head(30)"
251311
]
252312
},
253313
{
@@ -256,7 +316,21 @@
256316
"id": "a3161e2b",
257317
"metadata": {},
258318
"source": [
259-
"### Table 5 - Top 20 external package usage per type"
319+
"### Table 5 - Top 20 external package usage per type\n",
320+
"\n",
321+
"This table lists the internal types that utilize the most different external types and packages. These have the highest probability of change depending on external libraries. A case-by-case approach is also advisable here because there could for example also be code units that encapsulate an external library and have this high count of external dependencies on purpose.\n",
322+
"\n",
323+
"**Columns:**\n",
324+
"- *artifactName* that contains the type that calls the external package\n",
325+
"- *fullPackageName* is the package within the artifact that contains the type that calls external types\n",
326+
"- *typeName* identifies the internal type within the package and artifact that calls external types\n",
327+
"- *numberOfExternalTypeCaller* and *numberOfExternalTypes* refer to the distinct external types that are used by the internal type\n",
328+
"- *numberOfExternalTypeCalls* includes every invocation or reference to the types in the external package\n",
329+
"- *numberOfTypesInPackage* represents the total count of all types in that package\n",
330+
"- *numberOfExternalPackages* shows how many different external packages are used by the internal type\n",
331+
"- *externalPackageNames* contains the list of names of the different external packages that are used by the internal type\n",
332+
"- *externalTypeNames* contains a list of actually utilized types of the external package\n",
333+
"- *packageName* contains the name of the package (last part of *fullPackageName*)"
260334
]
261335
},
262336
{
@@ -267,7 +341,6 @@
267341
"outputs": [],
268342
"source": [
269343
"external_package_usage_per_type = query_cypher_to_data_frame(\"../cypher/External_Dependencies/External_package_usage_per_type.cypher\")\n",
270-
"\n",
271344
"external_package_usage_per_type.head(20)"
272345
]
273346
},
@@ -279,8 +352,18 @@
279352
"source": [
280353
"### Table 6 - External package usage distribution per type\n",
281354
"\n",
282-
"The table shown here only includes the first 20 rows at most which typically represents the most significant entries.\n",
283-
"Have a look above to find out which types have the highest external package dependency usage."
355+
"The next table shown here only includes the first 20 rows.\n",
356+
"It shows how many types use one external package, how many use two, and so on.\n",
357+
"This gives an overview of the distribution of external package calls and the overall coupling to external libraries. The higher the count of distinct external packages the lower should be the count of types that use them. Dependencies to external annotations are left out here.\n",
358+
"\n",
359+
"Have a look above to find out which types have the highest external package dependency usage.\n",
360+
"\n",
361+
"**Columns:**\n",
362+
"- *artifactName* that contains the type that calls the external package\n",
363+
"- *artifactTypes* the total count of types in the artifact\n",
364+
"- *numberOfExternalPackages* the number of distinct external packages used\n",
365+
"- *numberOfTypes* in the artifact where the *numberOfExternalPackages* applies\n",
366+
"- *numberOfTypesPercentage* in the artifact where the *numberOfExternalPackages* applies in %"
284367
]
285368
},
286369
{
@@ -294,6 +377,17 @@
294377
"external_package_usage_per_type_distribution[['artifactName', 'artifactTypes', 'numberOfExternalPackages', 'numberOfTypes', 'numberOfTypesPercentage']].head(20)"
295378
]
296379
},
380+
{
381+
"attachments": {},
382+
"cell_type": "markdown",
383+
"id": "39c045f6",
384+
"metadata": {},
385+
"source": [
386+
"### Table 7 - External package usage distribution in percentage\n",
387+
"\n",
388+
"The following table uses the same data as Table 6 but has a column per internal artifact and a row for each number of different external packages used. The values are the percentages of types that fulfill both conditions: they belong to the artifact and use exactly that number of different external packages. Dependencies to external annotations are left out here."
389+
]
390+
},
297391
{
298392
"cell_type": "code",
299393
"execution_count": null,
@@ -315,6 +409,17 @@
315409
"external_package_usage_per_type_distribution.head(10)"
316410
]
317411
},
412+
{
413+
"attachments": {},
414+
"cell_type": "markdown",
415+
"id": "121a215f",
416+
"metadata": {},
417+
"source": [
418+
"### Chart 2 - External package usage distribution in percentage\n",
419+
"\n",
420+
"The next chart shows the number of types per artifact that use the given number of different external packages as listed in Table 7. Dependencies to external annotations are left out here."
421+
]
422+
},
318423
{
319424
"cell_type": "code",
320425
"execution_count": null,
@@ -334,6 +439,37 @@
334439
"plot.show()"
335440
]
336441
},
442+
{
443+
"attachments": {},
444+
"cell_type": "markdown",
445+
"id": "e4780292",
446+
"metadata": {},
447+
"source": [
448+
"### Chart 3 - External package usage distribution in percentage stacked per artifact\n",
449+
"\n",
450+
"The following chart shows a stacked bar for each artifact. Every color represents a different count of different external packages used. The y axis then shows what percentage of types (compared to all types of that artifact) use that number of external packages. Stacking them on top of each other makes it easier to compare the artifacts and their external package usage. Dependencies to external annotations are left out here."
451+
]
452+
},
453+
{
454+
"cell_type": "code",
455+
"execution_count": null,
456+
"id": "cd612166",
457+
"metadata": {},
458+
"outputs": [],
459+
"source": [
460+
"plot.figure();\n",
461+
"axes = external_package_usage_per_type_distribution.transpose().plot(\n",
462+
" kind='bar', \n",
463+
" grid=True,\n",
464+
" title='Relative External Package Usage', \n",
465+
" xlabel='artifact',\n",
466+
" ylabel='percentage of types',\n",
467+
" stacked=True,\n",
468+
" cmap=main_color_map,\n",
469+
")\n",
470+
"plot.show()"
471+
]
472+
},
337473
{
338474
"attachments": {},
339475
"cell_type": "markdown",
@@ -342,7 +478,10 @@
342478
"source": [
343479
"## Maven POMs\n",
344480
"\n",
345-
"### Table 7 - Maven POMs and their declared dependencies"
481+
"\n",
482+
"### Table 8 - Maven POMs and their declared dependencies\n",
483+
"\n",
484+
"If Maven is used for package and dependency management and a \".pom\" file is included in the artifact, the following table shows the external dependencies that are declared there."
346485
]
347486
},
348487
{
@@ -362,6 +501,7 @@
362501
"name": "JohT"
363502
}
364503
],
504+
"celltoolbar": "Tags",
365505
"kernelspec": {
366506
"display_name": "Python 3 (ipykernel)",
367507
"language": "python",

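Tables 6 and 7 and Chart 3 describe a reshaping from one row per artifact and package count to one column per artifact. The notebook cell that performs it is not part of this diff, so the following is only a minimal sketch of one way to do it, assuming pandas and `DataFrame.pivot`; the sample rows and variable names are made up, and only the column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical rows in the shape of Table 6 (one row per artifact and package count).
distribution = pd.DataFrame({
    "artifactName": ["a.jar", "a.jar", "b.jar"],
    "numberOfExternalPackages": [1, 2, 1],
    "numberOfTypesPercentage": [75.0, 25.0, 100.0],
})

# Table 7 layout: one row per package count, one column per artifact.
# Combinations that do not occur in the data are filled with 0 percent.
percent_per_artifact = distribution.pivot(
    index="numberOfExternalPackages",
    columns="artifactName",
    values="numberOfTypesPercentage",
).fillna(0.0)

# Chart 3 layout: transposing gives one row (one stacked bar) per artifact.
per_artifact_rows = percent_per_artifact.transpose()

print(percent_per_artifact)
print(per_artifact_rows.sum(axis=1))
```

In this sample every artifact's percentages add up to 100. With real data the total can be lower, for example when types without any external package are not part of the query result.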
scripts/executeJupyterNotebook.sh

Lines changed: 9 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -64,6 +64,9 @@ echo "executeJupyterNotebook: jupyter_notebook_output_file_name=$jupyter_noteboo
6464
jupyter_notebook_output_file="./${jupyter_notebook_file_name}${JUPYTER_OUTPUT_FILE_POSTFIX}.${jupyter_notebook_file_extension}"
6565
echo "executeJupyterNotebook: jupyter_notebook_output_file=$jupyter_notebook_output_file"
6666

67+
jupyter_notebook_markdown_file="./${jupyter_notebook_file_name}${JUPYTER_OUTPUT_FILE_POSTFIX}.md"
68+
echo "executeJupyterNotebook: jupyter_notebook_markdown_file=$jupyter_notebook_markdown_file"
69+
6770
if [ ! -f "${jupyter_notebook_file_path}/.env" ] ; then
6871
echo "executeJupyterNotebook: Creating file ${jupyter_notebook_file_path}/.env ..."
6972
echo "NEO4J_INITIAL_PASSWORD=${NEO4J_INITIAL_PASSWORD}" > "${jupyter_notebook_file_path}/.env"
@@ -114,6 +117,12 @@ jupyter nbconvert --to notebook \
114117
# Convert the Jupyter Notebook to Markdown
115118
jupyter nbconvert --to markdown --no-input "$jupyter_notebook_output_file" || exit 6
116119

120+
# Remove style blocks from Markdown file
121+
# The inplace option -i of sed doesn't seem to work at least on Mac in this case.
122+
# Therefore the temporary file ".nostyle" is created and then moved to overwrite the original markdown file.
123+
sed -E '/<style( scoped)?>/,/<\/style>/d' "${jupyter_notebook_markdown_file}" > "${jupyter_notebook_markdown_file}.nostyle"
124+
mv -f "${jupyter_notebook_markdown_file}.nostyle" "${jupyter_notebook_markdown_file}"
125+
117126
# Convert the Jupyter Notebook to PDF
118127
jupyter nbconvert --to webpdf --no-input --allow-chromium-download --disable-chromium-sandbox "$jupyter_notebook_output_file" || exit 7
119128
