diff --git a/report/main.md b/report/main.md
new file mode 100644
index 0000000..819bdfd
--- /dev/null
+++ b/report/main.md
@@ -0,0 +1,111 @@
+---
+author: Claudio Maggioni
+title: Information Modelling & Analysis -- Project 1
+---
+
+
+
+# Code Repository
+
+The code and result files part of this submission can be found at:
+
+::: center
+Repository: \url{https://github.com/infoMA2023/project-01-god-classes-maggicl}
+
+Commit ID: **TBD**
+:::
+
+# Data Pre-Processing
+
+## God Classes
+
+::: {#tab:god_classes}
+  ---------------------------------------------- ---------------
+  **Class Name**                                 **\# Methods**
+  org.apache.xerces.dom.CoreDocumentImpl         125
+  org.apache.xerces.impl.xs.traversers.XSDHandler 118
+  org.apache.xerces.xinclude.XIncludeHandler     116
+  org.apache.xerces.impl.dtd.DTDGrammar          101
+  ---------------------------------------------- ---------------
+
+  : Identified God Classes
+:::
+
+The god classes I identified, and their corresponding number of methods
+can be found in Table [1](#tab:god_classes){reference-type="ref"
+reference="tab:god_classes"}.
+
+## Feature Vectors
+
+Table [2](#tab:feat_vec){reference-type="ref" reference="tab:feat_vec"}
+shows aggregate numbers regarding the extracted feature vectors for the
+god classes.
+
+::: {#tab:feat_vec}
+  ---------------- ------------------------ ---------------------
+  **Class Name**   **\# Feature Vectors**   **\# Attributes\***
+  \...             \...                     \...
+  ---------------- ------------------------ ---------------------
+
+  : Feature vector summary (\*= used at least once)
+:::
+
+# Clustering {#sec:clustering}
+
+## Algorithm Configurations
+
+Report/comment the algorithm configurations (distance function, linkage
+rule, etc.). You may do so in any form you feel suited, but a short
+paragraph of text is probably sufficient.
+
+## Testing Various K & Silhouette Scores
+
+\(1\) Report data about the clusters produced by the two algorithms at
+various k (#clusters, size of clusters, silhouette scores). You may use
+any suitable format (table, graph, \...).
+
+\(2\) Briefly comment your results. What is the best configuration, and
+why? Anything else you observed?
+
+# Evaluation
+
+## Ground Truth
+
+I computed the ground truth using the command \.... The generated files
+are checked into the repository with the names \....
+
+Comment briefly on the strengths & weaknesses of our ground truth.
+
+## Precision and Recall
+
+::: {#tab:eval}
+  ---------------- ------------------- -------- ------------- --------
+  **Class Name**   **Agglomerative**            **K-Means**
+                   Prec.               Recall   Prec.         Recall
+  \...             \...                \...     \...          \...
+  ---------------- ------------------- -------- ------------- --------
+
+  : Evaluation Summary
+:::
+
+Precision and Recall, for the optimal configurations found in Section
+[3](#sec:clustering){reference-type="ref" reference="sec:clustering"},
+are reported in Table [3](#tab:eval){reference-type="ref"
+reference="tab:eval"}.
+
+## Practical Usefulness
+
+Discuss the practical usefulness of the obtained code refactoring
+assistant in a realistic setting (1 paragraph).
diff --git a/report/main.pdf b/report/main.pdf
new file mode 100644
index 0000000..c2645e3
Binary files /dev/null and b/report/main.pdf differ