diff --git a/report.pdf b/report.pdf
index 328528e..6642b0b 100644
Binary files a/report.pdf and b/report.pdf differ
diff --git a/report.tex b/report.tex
index 5252269..7463e6e 100644
--- a/report.tex
+++ b/report.tex
@@ -7,6 +7,8 @@
 \usepackage{listings}
 \usepackage{xcolor}
 \usepackage{lmodern}
+\usepackage{booktabs}
+\usepackage{float}
 \usepackage{listings}
 \setlength{\parindent}{0cm}
 \setlength{\parskip}{0.3em}
@@ -45,49 +47,111 @@
 \section{Project selection process}
 \pagenumbering{arabic}
-We need to find a project that is a single unit in terms of compilation
-modules\footnote{A problem for Pattern4J as compiled \textit{.class} files are
-distributed across several directories and would have to be merged manually for
-analyzing them}
-self contained and with as little external dependencies as possible to ease the
-analysis project. Additionally, it would be nice if we choose a project that we
-already know as library clients.
+We have to choose a Java-based project on GitHub that meets the following
+requirements:
-\subsection {Projects Considered}
+\begin{itemize}
+ \item 100 or more stars;
+ \item 100 or more forks;
+ \item 10 or more open issues;
+ \item 50.000 or more lines of code.
+\end{itemize}
-We considered the following GitHub repositories:
+Additionally, we personally added some (less strict) constraints that we thought
+would lead to a more significant and effective analysis:
+
+\begin{itemize}
+ \item There must be evidence that the project follows business-oriented
+ conventions. This excludes amateur or personal projects that, due to
+ their nature, might have fewer design pattern applications.
+ \item Repository data, documentation and comments must be written in the
+ English language.
+ Many repositories that are at the top of the search
+ results provided by the hard requirements are not in English and this
+ drastically hampers our ability to understand the code;
+ \item The artifact the project produces must not rely on external components
+ and must have a streamlined build process, with all code stored in a single
+ Maven/Gradle module. This improves both our ability to tinker with the
+ project and the pattern detection process, which requires all
+ \textit{.class} files related to the project to be stored in a single
+ directory tree.
+\end{itemize}
+
+Additionally, instead of querying GitHub directly for projects we decided to see
+whether libraries we already knew from our Java development experience would
+match both the hard and soft requirements we set for ourselves.
+
+Therefore, we considered the following GitHub repositories:
 \begin{description}
 \item[vavr-io/vavr] a Java library for functional programming, discarded as
- the project is less than 20K LOC and doesn't meet the selection criteria;
+ the project has fewer than 20.000 lines of code and does not meet the hard
+ requirements;
 \item[bitcoin4j/bitcoin4j] a Java implementation of the bitcoin protocol,
- discarded as the project is distributed in several subprojects;
- \item[FasterXML/jackson-core] a Java JSON serialization and
- deserialization library. We chose this library because it meets the
- selection criteria, it doesn't rely on external components for its
- execution, and its project structure uses a single Maven module for its
+ discarded as the project is distributed in several subprojects and therefore
+ the build process is nontrivial;
+ \item[FasterXML/jackson-core] the core ``module'' of a Java JSON serialization
+ and deserialization library. We chose this project because it meets the
+ selection criteria and does not rely on external components for its
+ execution. Finally, the project structure uses a single Maven module for its
 sources and thus easy to analyze.
 \end{description}
-
-\subsection {The Jackson Core Library}
-As already mentioned, \textit{Jackson} is a library that offers serialization
-and deseralization capabilities in JSON format. The library is highly extensible
-and customizable through a robust but flexible API and module suite that allows
-to change the serialization and deserialization rules, or in the case of the
-\textit{jackson-dataformat-xml} module, to allow to target XML instead of JSON.
+\subsection {The Jackson Core Project}
+As already mentioned, Jackson is a library that offers serialization
+and deserialization capabilities in JSON format. It is highly extensible
+and customizable through a robust but flexible API. The library is divided into
+what the Jackson developers call ``modules'', i.e.\ plug-ins that can augment
+the serialization and deserialization process. Some modules, like the
+\textit{jackson-dataformat-xml} module, even allow targeting different encoding
+languages like XML.
 The chosen repository contains only the \textit{core} module of Jackson. The
 \textit{core} module implements the necessary library abstractions and
 interfaces to allow other modules to be plugged-in. Additionally, the
 \textit{core} module implements the tokenizer and low-level abstractions to work
-with the JSON format.
+with the JSON format. We will refer to this module as ``Jackson'' or ``Jackson
+Core'' interchangeably throughout this report.
-We chose to analyze version 2.13.4 of the module (i.e.\ the code
-under the git tag \textit{jackson-core-2.13.4}) because it is the latest stable
+We choose to analyze version 2.13.4 of the module (i.e.\ the code under the
+\textit{git} tag \textit{jackson-core-2.13.4}) because it is the latest stable
 version available at the time of writing.
-\section{Analysis Implementation}
+After verifying that the project meets the hard requirements related to GitHub
+(more than 2000 stars, more than 600 forks, 35 open issues\footnote{as of
+2022-10-19}), we ensured that the project had enough lines of code by using the
+cloc tool, whose output is shown in Figure \ref{fig:cloc}.
+By looking at the results we can finally assert that the project contains 58.787
+lines of Java code and this satisfies all the requirements.
+
+\begin{figure}[H]
+ \centering
+ \begin{tabular}{lrrrr}
+ \toprule
+ Language & Files & Blank & Comment & Code \\
+ \midrule
+ HTML & 4846 & 18473 & 235544 & 1997020\\
+ Java & 285 & 8532 & 20004 & 48783\\
+ CSS & 3 & 18 & 69 & 990\\
+ Logos & 2 & 260 & 212 & 605\\
+ Bourne Shell & 3 & 35 & 62 & 223\\
+ XML & 7 & 5 & 1 & 179\\
+ DOS Batch & 1 & 35 & 0 & 153\\
+ Markdown & 3 & 58 & 0 & 125\\
+ Maven & 1 & 13 & 23 & 112\\
+ YAML & 3 & 1 & 5 & 71\\
+ JavaScript & 1 & 1 & 0 & 29\\
+ JSON & 1 & 0 & 0 & 10\\
+ Properties & 2 & 0 & 16 & 5\\
+ \midrule
+ Total & 5158 & 27431 & 255936 & 2048305\\
+ \bottomrule
+ \end{tabular}
+ \caption{Output of the \textit{cloc} tool for the Jackson Core project at revision
+ \textit{jackson-core-2.13.4}.}
+ \label{fig:cloc}
+\end{figure}
+
+\section{TO REWRITE Analysis Implementation}
 We use
 \href{https://users.encs.concordia.ca/~nikolaos/pattern\_detection.html}{\textit{Pattern4J}}
@@ -95,7 +159,7 @@ as a pattern detection tool. This tool needs compiled \textit{.class} files in
 order to perform analysis. Therefore, as \textit{jackson-core} is a standard
 Maven project, we compile the sources using the command \textit{mvn clean
 compile}. The \textit{pom.xml} of the library specifies Java 1.6 as a
-compilation target, which is not supported by JDK 17 or above. We used JDK 11
+build target, which is not supported by JDK 17 or above. We used JDK 11
 instead, as it is the previous LTS version.
 An XML dump of the \textit{Pattern4j} analysis results are included in the
@@ -103,7 +167,7 @@ submission as the file \textit{analysis.xml}.
 \section{Structural Patterns}
-\subsection{Singleton Pattern}
+\subsection{TO REWRITE Singleton Pattern}
 Lots of false positives for the Singleton pattern. Example,
 com.fasterxml.jackson.core.sym.Name1 has a package private constructor and a
 public static final instance of it, but reading the documentation the class