Commit cb9990b

Optimize visibility metrics report
1 parent fabc4e8 commit cb9990b

jupyter/VisibilityMetrics.ipynb

Lines changed: 259 additions & 31 deletions
@@ -13,7 +13,7 @@
1313
"- [Visibility Metrics and the Importance of Hiding Things](https://dzone.com/articles/visibility-metrics-and-the-importance-of-hiding-th)\n",
1414
"- [Calculate metrics](https://101.jqassistant.org/calculate-metrics/index.html)\n",
1515
"- [Controlling Access to Members of a Class](https://docs.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html)\n",
16-
"- [py2neo](https://py2neo.org/2021.1/)"
16+
"- [Neo4j Python Driver](https://neo4j.com/docs/api/python-driver/current)"
1717
]
1818
},
1919
{
@@ -29,6 +29,47 @@
2929
"from neo4j import GraphDatabase"
3030
]
3131
},
32+
{
33+
"cell_type": "code",
34+
"execution_count": null,
35+
"id": "acf605be",
36+
"metadata": {},
37+
"outputs": [],
38+
"source": [
39+
"# The following cell uses the built-in %%html \"magic\" to override the CSS style for tables with a much smaller font size.\n",
40+
"# This is especially needed for the PDF export of tables with multiple columns."
41+
]
42+
},
43+
{
44+
"cell_type": "code",
45+
"execution_count": null,
46+
"id": "3cc19954",
47+
"metadata": {},
48+
"outputs": [],
49+
"source": [
50+
"%%html\n",
51+
"<style>\n",
52+
"/* CSS style for smaller dataframe tables. */\n",
53+
".dataframe th {\n",
54+
" font-size: 8px;\n",
55+
"}\n",
56+
".dataframe td {\n",
57+
" font-size: 8px;\n",
58+
"}\n",
59+
"</style>"
60+
]
61+
},
62+
{
63+
"cell_type": "code",
64+
"execution_count": null,
65+
"id": "33c356d7",
66+
"metadata": {},
67+
"outputs": [],
68+
"source": [
69+
"# Main Colormap\n",
70+
"main_color_map = 'nipy_spectral'"
71+
]
72+
},
3273
{
3374
"cell_type": "code",
3475
"execution_count": null,
@@ -98,29 +139,6 @@
98139
"</style>"
99140
]
100141
},
101-
{
102-
"attachments": {},
103-
"cell_type": "markdown",
104-
"id": "91d80bf7",
105-
"metadata": {},
106-
"source": [
107-
"## Artifacts\n",
108-
"\n",
109-
"### Table 1\n",
110-
"\n",
111-
"- List all the artifacts this notebook is based on"
112-
]
113-
},
114-
{
115-
"cell_type": "code",
116-
"execution_count": null,
117-
"id": "dc682db6",
118-
"metadata": {},
119-
"outputs": [],
120-
"source": [
121-
"query_cypher_to_data_frame(\"../cypher/List_all_existing_artifacts.cypher\")"
122-
]
123-
},
124142
{
125143
"attachments": {},
126144
"cell_type": "markdown",
@@ -141,11 +159,32 @@
141159
"\n",
142160
"The relative visibility is between zero (all types are package protected) and one (all types are public). A value lower than one means that there are types that are declared package protected. The lower the value is, the better implementation details are hidden. \n",
143161
"\n",
144-
"Non public classes can't be accessed from another package so they can be changed without affecting code in other packages. They clearly indicate functionality that only belongs to one package. This also motivates to use more classes and to split up code into smaller pieces with a single responsibility and reason to change.\n",
162+
"Non-public classes can't be accessed from another package, so they can be changed without affecting code in other packages. They clearly indicate functionality that belongs to only one package. This also encourages using more classes and splitting code into smaller pieces, each with a single responsibility and reason to change."
163+
]
164+
},
165+
{
166+
"cell_type": "markdown",
167+
"id": "c9536fd9",
168+
"metadata": {},
169+
"source": [
170+
"### Table 1a - Top 40 artifacts with the highest median relative visibility (lowest encapsulation)\n",
145171
"\n",
146-
"### Table 2\n",
172+
"This table shows the relative visibility statistics aggregated over all packages per artifact. It focuses on artifacts with many packages and hardly any package protected types (highest median relative visibility). More package protected types would improve encapsulation.\n",
147173
"\n",
148-
"- Show relative visibility statistics aggregated for all packages per artifact "
174+
"Only the top 40 entries are shown. The whole table can be found in the following CSV report: \n",
175+
"`Global_relative_visibility_statistics_for_types`"
176+
]
177+
},
178+
{
179+
"cell_type": "code",
180+
"execution_count": null,
181+
"id": "68ed42d0",
182+
"metadata": {},
183+
"outputs": [],
184+
"source": [
185+
"# Query the visibility statistics per artifact (all packages aggregated)\n",
186+
"# The results will be used in multiple tables below.\n",
187+
"relative_visibility_per_artifact_aggregated=query_cypher_to_data_frame(\"../cypher/Visibility/Global_relative_visibility_statistics_for_types.cypher\")"
149188
]
150189
},
151190
{
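The relative visibility described in this hunk (public types divided by all types per package, then aggregated per artifact) can be illustrated with a minimal pandas sketch. The actual values come from the Cypher query, which is not part of this diff, so the input columns `artifact`, `package` and `publicTypes` and the helper `summarize_visibility` are assumptions made for illustration. Only `allTypes`, `relativeVisibility`, `all` and the `percentile` column names appear in the notebook.

```python
import pandas as pd

# Hypothetical per-package input; the real data comes from a Cypher query.
packages = pd.DataFrame({
    "artifact": ["a.jar", "a.jar", "a.jar", "b.jar"],
    "package": ["a.api", "a.impl", "a.util", "b.core"],
    "publicTypes": [4, 1, 0, 5],
    "allTypes": [4, 4, 2, 5],
})

# Relative visibility: 0 means all types are package protected, 1 means all are public.
packages["relativeVisibility"] = packages["publicTypes"] / packages["allTypes"]


def summarize_visibility(frame: pd.DataFrame) -> pd.DataFrame:
    """Aggregate the relative visibility of all packages per artifact."""
    grouped = frame.groupby("artifact")["relativeVisibility"]
    return pd.DataFrame({
        "all": grouped.size(),                    # number of packages in the artifact
        "percentile25": grouped.quantile(0.25),
        "percentile50": grouped.quantile(0.50),   # median
        "percentile75": grouped.quantile(0.75),
    }).reset_index()


print(summarize_visibility(packages))
```

A lower median in this summary means that more packages of the artifact hide their types, which is the reading the tables below rely on.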
@@ -157,7 +196,110 @@
157196
},
158197
"outputs": [],
159198
"source": [
160-
"query_cypher_to_data_frame(\"../cypher/Visibility/Global_relative_visibility_statistics_for_types.cypher\")"
199+
"# Sort by the \"percentile50\" (median) and \"all\" (number of packages in the artifact) descending\n",
200+
"relative_visibility_statistics_highest_median=relative_visibility_per_artifact_aggregated.sort_values(by=['percentile50', 'all'], ascending=[False, False])\n",
201+
"\n",
202+
"# Reset the index (row numbering starting at 0 and increasing by 1)\n",
203+
"relative_visibility_statistics_highest_median=relative_visibility_statistics_highest_median.reset_index(drop=True)\n",
204+
"\n",
205+
"relative_visibility_statistics_highest_median.head(40)"
206+
]
207+
},
208+
{
209+
"cell_type": "markdown",
210+
"id": "1b84fd51",
211+
"metadata": {},
212+
"source": [
213+
"### Table 1b - Top 40 artifacts with the lowest median relative visibility (highest encapsulation)\n",
214+
"\n",
215+
"This table shows the relative visibility statistics aggregated over all packages per artifact. It focuses on artifacts with many packages and many package protected types (lowest median relative visibility). Package protected types help to improve encapsulation.\n",
216+
"\n",
217+
"Only the top 40 entries are shown. The whole table can be found in the following CSV report: \n",
218+
"`Global_relative_visibility_statistics_for_types`"
219+
]
220+
},
221+
{
222+
"cell_type": "code",
223+
"execution_count": null,
224+
"id": "dc59a07d",
225+
"metadata": {},
226+
"outputs": [],
227+
"source": [
228+
"# Sort by the \"percentile50\" (median) ascending and \"all\" (number of packages in the artifact) descending\n",
229+
"relative_visibility_statistics_lowest_median=relative_visibility_per_artifact_aggregated.sort_values(by=['percentile50', 'all'], ascending=[True, False])\n",
230+
"\n",
231+
"# Reset the index (row numbering starting at 0 and increasing by 1)\n",
232+
"relative_visibility_statistics_lowest_median=relative_visibility_statistics_lowest_median.reset_index(drop=True)\n",
233+
"\n",
234+
"relative_visibility_statistics_lowest_median.head(40)"
235+
]
236+
},
237+
{
238+
"cell_type": "markdown",
239+
"id": "5196ecc2",
240+
"metadata": {},
241+
"source": [
242+
"### Table 1 Chart 1 - Relative visibility in artifacts"
243+
]
244+
},
245+
{
246+
"cell_type": "code",
247+
"execution_count": null,
248+
"id": "f467a8dd",
249+
"metadata": {},
250+
"outputs": [],
251+
"source": [
252+
"plot.figure();\n",
253+
"fig, axes = plot.subplots(nrows=3, ncols=1, sharex=True)\n",
254+
"\n",
255+
"number_of_packages_grid_ticks=[1, 2, 5, 10, 20, 50, 100, 200, 500, 1_000, 2_000, 5_000, 10_000]\n",
256+
"\n",
257+
"relative_visibility_per_artifact_aggregated.plot(\n",
258+
" ax=axes[0],\n",
259+
" kind='scatter',\n",
260+
" title='Relative visibility in artifacts (75% percentile)', \n",
261+
" x='percentile75',\n",
262+
" y='all',\n",
263+
" grid=True,\n",
264+
" logy=True,\n",
265+
" yticks=number_of_packages_grid_ticks,\n",
266+
" xlabel='relative visibility',\n",
267+
" ylabel='number of packages',\n",
268+
" cmap=main_color_map,\n",
269+
" figsize=(10,4),\n",
270+
")\n",
271+
"relative_visibility_per_artifact_aggregated.plot(\n",
272+
" ax=axes[1],\n",
273+
" kind='scatter',\n",
274+
" title='Relative visibility in artifacts (50% percentile)', \n",
275+
" x='percentile50',\n",
276+
" y='all',\n",
277+
" grid=True,\n",
278+
" logy=True,\n",
279+
" yticks=number_of_packages_grid_ticks,\n",
280+
" xlabel='relative visibility',\n",
281+
" ylabel='number of packages',\n",
282+
" cmap=main_color_map,\n",
283+
" figsize=(10,4),\n",
284+
")\n",
285+
"relative_visibility_per_artifact_aggregated.plot(\n",
286+
" ax=axes[2],\n",
287+
" kind='scatter',\n",
288+
" title='Relative visibility in artifacts (25% percentile)', \n",
289+
" x='percentile25',\n",
290+
" y='all',\n",
291+
" grid=True,\n",
292+
" logy=True,\n",
293+
" yticks=number_of_packages_grid_ticks,\n",
294+
" xlabel='relative visibility',\n",
295+
" ylabel='number of packages',\n",
296+
" cmap=main_color_map,\n",
297+
" figsize=(10,10),\n",
298+
")\n",
299+
"axes[0].grid(color = 'grey', linestyle = '-', linewidth = 0.2)\n",
300+
"axes[1].grid(color = 'grey', linestyle = '-', linewidth = 0.2)\n",
301+
"axes[2].grid(color = 'grey', linestyle = '-', linewidth = 0.2)\n",
302+
"plot.show()"
161303
]
162304
},
163305
{
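The three scatter plots added in this hunk differ only in the percentile column and the title, so they could also be drawn in a loop. The following is a sketch under assumptions, not the notebook's implementation: it calls matplotlib's `Axes.scatter` directly instead of `DataFrame.plot`, assumes `matplotlib.pyplot` is imported as `plot` as the notebook code suggests, and uses made-up example data. The helper name `plot_percentile_scatter` is hypothetical.

```python
import matplotlib.pyplot as plot
import pandas as pd


def plot_percentile_scatter(statistics: pd.DataFrame):
    """Draw one scatter plot per percentile column, sharing the x axis."""
    percentile_columns = ["percentile75", "percentile50", "percentile25"]
    # subplots() creates the figure itself, so no separate plot.figure() call is needed.
    figure, axes = plot.subplots(
        nrows=len(percentile_columns), ncols=1, sharex=True, figsize=(10, 10)
    )
    for axis, column in zip(axes, percentile_columns):
        axis.scatter(statistics[column], statistics["all"])
        axis.set_yscale("log")  # the number of packages spans several magnitudes
        axis.set_title(f"Relative visibility in artifacts ({column[-2:]}% percentile)")
        axis.set_xlabel("relative visibility")
        axis.set_ylabel("number of packages")
        axis.grid(color="grey", linestyle="-", linewidth=0.2)
    figure.tight_layout()
    return figure, axes


# Made-up example data with the column names used in the notebook.
example_statistics = pd.DataFrame({
    "all": [3, 12, 150],
    "percentile25": [0.2, 0.5, 0.9],
    "percentile50": [0.5, 0.8, 1.0],
    "percentile75": [0.9, 1.0, 1.0],
})
figure, axes = plot_percentile_scatter(example_statistics)
```

In a notebook, `plot.show()` after the call displays the figure. Setting the figure size once in `subplots` avoids the differing `figsize` values that the three separate plot calls pass.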
@@ -166,9 +308,12 @@
166308
"id": "3f59da8d",
167309
"metadata": {},
168310
"source": [
169-
"### Table 3\n",
311+
"### Table 2a - Top 40 packages with the highest visibility and lowest encapsulation\n",
170312
"\n",
171-
"- List the top 40 packages and their artifact with the highest relative visibility"
313+
"This table shows the relative visibility statistics per package and artifact. It focuses on packages with many types, hardly any of them package protected, and therefore the highest relative visibility (lowest encapsulation). More package protected types would improve encapsulation.\n",
314+
"\n",
315+
"Only the top 40 entries are shown. The whole table can be found in the following CSV report: \n",
316+
"`Relative_visibility_public_types_to_all_types_per_package`"
172317
]
173318
},
174319
{
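The "top 40" tables all follow the same pattern: sort by a visibility column and a size column, renumber the rows, and keep the first entries. A minimal sketch of that pattern as a reusable helper is shown here. The helper `top_entries`, the column `packageName` and the example values are hypothetical; `relativeVisibility` and `allTypes` are the column names used in the notebook.

```python
import pandas as pd


def top_entries(frame: pd.DataFrame, by: list, ascending: list, limit: int = 40) -> pd.DataFrame:
    """Sort by the given columns, renumber the rows from 0 and keep the first rows."""
    sorted_frame = frame.sort_values(by=by, ascending=ascending)
    return sorted_frame.reset_index(drop=True).head(limit)


# Made-up example data.
example_packages = pd.DataFrame({
    "packageName": ["a.api", "a.impl", "b.core", "b.util"],
    "relativeVisibility": [1.0, 0.25, 1.0, 0.0],
    "allTypes": [4, 4, 9, 2],
})

# Highest visibility first, larger packages first on ties.
highest = top_entries(
    example_packages, by=["relativeVisibility", "allTypes"], ascending=[False, False], limit=2
)
# Lowest visibility first, larger packages first on ties.
lowest = top_entries(
    example_packages, by=["relativeVisibility", "allTypes"], ascending=[True, False], limit=2
)
print(highest)
print(lowest)
```

Passing one `ascending` flag per sort column is what lets the "lowest visibility" variant still rank larger packages first.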
@@ -180,7 +325,90 @@
180325
},
181326
"outputs": [],
182327
"source": [
183-
"query_cypher_to_data_frame(\"../cypher/Visibility/Relative_visibility_public_types_to_all_types_per_package.cypher\").head(50)"
328+
"# Query the visibility statistics per package and artifact (all types aggregated)\n",
329+
"# The results will be used in multiple tables below.\n",
330+
"relative_visibility_per_package=query_cypher_to_data_frame(\"../cypher/Visibility/Relative_visibility_public_types_to_all_types_per_package.cypher\")"
331+
]
332+
},
333+
{
334+
"cell_type": "code",
335+
"execution_count": null,
336+
"id": "48f7f2d2",
337+
"metadata": {},
338+
"outputs": [],
339+
"source": [
340+
"# Sort by the \"relativeVisibility\" and \"allTypes\" (number of types in the package) descending\n",
341+
"highest_relative_visibility_packages=relative_visibility_per_package.sort_values(by=['relativeVisibility', 'allTypes'], ascending=[False, False])\n",
342+
"\n",
343+
"# Reset the index (row numbering starting at 0 and increasing by 1)\n",
344+
"highest_relative_visibility_packages=highest_relative_visibility_packages.reset_index(drop=True)\n",
345+
"\n",
346+
"highest_relative_visibility_packages.head(40)"
347+
]
348+
},
349+
{
350+
"cell_type": "markdown",
351+
"id": "c6786ef1",
352+
"metadata": {},
353+
"source": [
354+
"### Table 2b - Top 40 packages with the lowest visibility and highest encapsulation\n",
355+
"\n",
356+
"This table shows the relative visibility statistics per package and artifact. It focuses on packages with many types, many of them package protected, and therefore the lowest relative visibility (highest encapsulation). Package protected types help to improve encapsulation. Packages with zero percent visibility, that is, without any publicly visible type, are suspected to be dead code.\n",
357+
"\n",
358+
"Only the top 40 entries are shown. The whole table can be found in the following CSV report: \n",
359+
"`Relative_visibility_public_types_to_all_types_per_package`"
360+
]
361+
},
362+
{
363+
"cell_type": "code",
364+
"execution_count": null,
365+
"id": "48c20ca4",
366+
"metadata": {},
367+
"outputs": [],
368+
"source": [
369+
"# Sort by the \"relativeVisibility\" ascending and \"allTypes\" (number of types in the package) descending\n",
370+
"lowest_relative_visibility_packages=relative_visibility_per_package.sort_values(by=['relativeVisibility', 'allTypes'], ascending=[True, False])\n",
371+
"\n",
372+
"# Reset the index (row numbering starting at 0 and increasing by 1)\n",
373+
"lowest_relative_visibility_packages=lowest_relative_visibility_packages.reset_index(drop=True)\n",
374+
"\n",
375+
"lowest_relative_visibility_packages.head(40)"
376+
]
377+
},
378+
{
379+
"cell_type": "markdown",
380+
"id": "8ff237fd",
381+
"metadata": {},
382+
"source": [
383+
"### Table 2 Chart 1 - Relative visibility of packages"
384+
]
385+
},
386+
{
387+
"cell_type": "code",
388+
"execution_count": null,
389+
"id": "98b12846",
390+
"metadata": {},
391+
"outputs": [],
392+
"source": [
393+
"plot.figure();\n",
394+
"\n",
395+
"number_of_types_grid_ticks=[1, 2, 5, 10, 20, 50, 100, 200, 500, 1_000, 2_000, 5_000, 10_000]\n",
396+
"\n",
397+
"relative_visibility_per_package.plot(\n",
398+
" kind='scatter',\n",
399+
" title='Relative visibility of packages', \n",
400+
" x='relativeVisibility',\n",
401+
" y='allTypes',\n",
402+
" grid=True,\n",
403+
" logy=True,\n",
404+
" yticks=number_of_types_grid_ticks,\n",
405+
" xlabel='relative visibility',\n",
406+
" ylabel='number of types',\n",
407+
" cmap=main_color_map,\n",
408+
" figsize=(10,4),\n",
409+
")\n",
410+
"\n",
411+
"plot.show()"
184412
]
185413
}
186414
],
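The description of Table 2b notes that packages with zero percent visibility are suspected to be dead code. A minimal sketch of how such candidates could be listed separately follows. The helper `packages_without_public_types`, the column `packageName` and the example values are hypothetical, and a match is only a hint: types may still be reached through reflection, for example.

```python
import pandas as pd


def packages_without_public_types(frame: pd.DataFrame) -> pd.DataFrame:
    """Return packages with zero relative visibility, largest packages first."""
    hidden = frame[frame["relativeVisibility"] == 0]
    return hidden.sort_values(by="allTypes", ascending=False).reset_index(drop=True)


# Made-up example data using the notebook's column names for visibility and size.
example_packages = pd.DataFrame({
    "packageName": ["a.api", "a.internal", "b.legacy", "b.core"],
    "relativeVisibility": [1.0, 0.0, 0.0, 0.6],
    "allTypes": [4, 3, 11, 5],
})

candidates = packages_without_public_types(example_packages)
print(candidates)
```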