project selection and presentation rewritten

Claudio Maggioni 2022-10-19 18:52:42 +02:00
parent bbf3df6192
commit c3e409bdc7
2 changed files with 92 additions and 28 deletions

Binary file not shown.


@ -7,6 +7,8 @@
\usepackage{listings}
\usepackage{xcolor}
\usepackage{lmodern}
\usepackage{booktabs}
\usepackage{float}
\usepackage{listings}
\setlength{\parindent}{0cm}
\setlength{\parskip}{0.3em}
@ -45,49 +47,111 @@
\section{Project selection process}
\pagenumbering{arabic}
We need to find a project that is a single unit in terms of compilation
modules\footnote{A problem for Pattern4J as compiled \textit{.class} files are
distributed across several directories and would have to be merged manually for
analyzing them},
self-contained, and with as few external dependencies as possible to ease the
analysis project. Additionally, it would be nice if we choose a project that we
already know as library clients.
We have to choose a Java-based project on GitHub that meets the following
requirements:
\subsection {Projects Considered}
\begin{itemize}
\item at least 100 stars;
\item at least 100 forks;
\item at least 10 open issues;
\item at least 50.000 lines of code.
\end{itemize}
We considered the following GitHub repositories:
Additionally, we personally added some (less strict) constraints that we thought
would lead to a more significant and effective analysis:
\begin{itemize}
\item There must be evidence that the project follows business-oriented
conventions. This excludes amateur or personal projects which, due to
their nature, might apply fewer design patterns;
\item Repository data, documentation and comments must be written in the
English language. Many repositories at the top of the search
results for the hard requirements are not in English, which
drastically hampers our ability to understand the code;
\item The artifact the project produces must not rely on external components
and have a streamlined build process, with all code stored in a single
Maven/Gradle module. This improves both our ability to tinker with the
project more easily and the pattern detection process, which requires all
\textit{.class} files related to the project to be stored in a single
directory tree.
\end{itemize}
Additionally, instead of querying GitHub directly for projects we decided to see
if libraries we knew already in our Java development career would match both the
hard and soft requirements we set for ourselves.
Therefore, we considered the following GitHub repositories:
\begin{description}
\item[vavr-io/vavr] a Java library for functional programming, discarded as
the project is less than 20K LOC and doesn't meet the selection criteria;
the project is less than 20.000 lines of code and does not meet the hard
requirements;
\item[bitcoin4j/bitcoin4j] a Java implementation of the bitcoin protocol,
discarded as the project is distributed in several subprojects;
\item[FasterXML/jackson-core] a Java JSON serialization and
deserialization library. We chose this library because it meets the
selection criteria, it doesn't rely on external components for its
execution, and its project structure uses a single Maven module for its
discarded as the project is distributed in several subprojects and therefore
the build process is nontrivial;
\item[FasterXML/jackson-core] the core ``module'' of a Java JSON serialization
and deserialization library. We chose this project because it meets the
selection criteria and does not rely on external components for its
execution. Finally, the project structure uses a single Maven module for its
sources and is thus easy to analyze.
\end{description}
\subsection {The Jackson Core Library}
As already mentioned, \textit{Jackson} is a library that offers serialization
and deserialization capabilities in JSON format. The library is highly extensible
and customizable through a robust but flexible API and module suite that allows
to change the serialization and deserialization rules, or in the case of the
\textit{jackson-dataformat-xml} module, to allow to target XML instead of JSON.
\subsection {The Jackson Core Project}
As already mentioned, Jackson is a library that offers serialization
and deserialization capabilities in JSON format. It is highly extensible
and customizable through a robust but flexible API. The library is divided into
what the Jackson developers call ``modules'', i.e.\ plug-ins that can augment
the serialization and deserialization process. Some modules, like
\textit{jackson-dataformat-xml}, even allow targeting different encoding
languages such as XML.
The chosen repository contains only the \textit{core} module of Jackson. The
\textit{core} module implements the necessary library abstractions and
interfaces to allow other modules to be plugged in. Additionally, the
\textit{core} module implements the tokenizer and low-level abstractions to work
with the JSON format.
with the JSON format. We will refer to this module as ``Jackson'' or ``Jackson
Core'' interchangeably throughout this report.
We chose to analyze version 2.13.4 of the module (i.e.\ the code
under the git tag \textit{jackson-core-2.13.4}) because it is the latest stable
We chose to analyze version 2.13.4 of the module (i.e.\ the code under the
\textit{git} tag \textit{jackson-core-2.13.4}) because it is the latest stable
version available at the time of writing.
\section{Analysis Implementation}
After verifying that the project meets the hard requirements related to GitHub
(more than 2000 stars, more than 600 forks, 35 open issues\footnote{as of
2022-10-19}), we ensured that the project had enough lines of code by using the
\textit{cloc} tool, whose output is shown in Figure~\ref{fig:cloc}.
By looking at the results we can finally assert that the project contains 58.787
lines of Java code, which satisfies all the requirements.
\begin{figure}[H]
\centering
\begin{tabular}{lrrrr}
\toprule
Language & Files & Blank & Comment & Code \\
\midrule
HTML & 4846 & 18473 & 235544 & 1997020\\
Java & 285 & 8532 & 20004 & 48783\\
CSS & 3 & 18 & 69 & 990\\
Logos & 2 & 260 & 212 & 605\\
Bourne Shell & 3 & 35 & 62 & 223\\
XML & 7 & 5 & 1 & 179\\
DOS Batch & 1 & 35 & 0 & 153\\
Markdown & 3 & 58 & 0 & 125\\
Maven & 1 & 13 & 23 & 112\\
YAML & 3 & 1 & 5 & 71\\
JavaScript & 1 & 1 & 0 & 29\\
JSON & 1 & 0 & 0 & 10\\
Properties & 2 & 0 & 16 & 5\\
\midrule
Total & 5158 & 27431 & 255936 & 2048305\\
\bottomrule
\end{tabular}
\caption{Output of the \textit{cloc} tool for the Jackson Core project at revision
\textit{jackson-core-2.13.4}.}
\label{fig:cloc}
\end{figure}
\section{TO REWRITE Analysis Implementation}
We use
\href{https://users.encs.concordia.ca/~nikolaos/pattern\_detection.html}{\textit{Pattern4J}}
@ -95,7 +159,7 @@ as a pattern detection tool. This tool needs compiled \textit{.class} files in
order to perform analysis. Therefore, as \textit{jackson-core} is a standard
Maven project, we compile the sources using the command \textit{mvn clean
compile}. The \textit{pom.xml} of the library specifies Java 1.6 as a
compilation target, which is not supported by JDK 17 or above. We used JDK 11
build target, which is not supported by JDK 17 or above. We used JDK 11
instead, as it is the previous LTS version.
An XML dump of the \textit{Pattern4J} analysis results is included in the
@ -103,7 +167,7 @@ submission as the file \textit{analysis.xml}.
\section{Structural Patterns}
\subsection{Singleton Pattern}
\subsection{TO REWRITE Singleton Pattern}
There are many false positives for the Singleton pattern. For example,
com.fasterxml.jackson.core.sym.Name1 has a package-private constructor and a
public static final instance of it, but reading the documentation the class