451 lines
19 KiB
TeX
451 lines
19 KiB
TeX
\documentclass[11pt,a4paper]{scrartcl}
|
|
\usepackage{algorithm}
|
|
\usepackage{algpseudocode}
|
|
\usepackage[utf8]{inputenc}
|
|
\usepackage[margin=2.25cm]{geometry}
|
|
\usepackage{hyperref}
|
|
\usepackage{listings}
|
|
\usepackage{xcolor}
|
|
\usepackage{lmodern}
|
|
\usepackage{booktabs}
|
|
\usepackage{multirow}
|
|
\usepackage{graphicx}
|
|
\usepackage{float}
|
|
\usepackage{multicol}
|
|
\usepackage{tikz}
|
|
\usepackage{listings}
|
|
\usepackage{pgfplots}
|
|
\pgfplotsset{compat=1.18}
|
|
\usepackage{subcaption}
|
|
\setlength{\parindent}{0cm}
|
|
\setlength{\parskip}{0.3em}
|
|
\hypersetup{pdfborder={0 0 0}}
|
|
%\usepackage[nomessages]{fp} no easter eggs this time
|
|
\usepackage{amsmath}
|
|
\DeclareMathOperator*{\argmax}{arg\,max}
|
|
\DeclareMathOperator*{\argmin}{arg\,min}
|
|
\usepackage{minted}
|
|
|
|
\definecolor{codegreen}{rgb}{0,0.6,0}
|
|
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
|
|
\definecolor{codepurple}{rgb}{0.58,0,0.82}
|
|
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
|
|
|
|
\lstdefinestyle{mystyle}{
|
|
backgroundcolor=\color{backcolour},
|
|
commentstyle=\color{codegreen},
|
|
keywordstyle=\color{magenta},
|
|
keywordstyle=[2]{\color{olive}},
|
|
numberstyle=\tiny\color{codegray},
|
|
stringstyle=\color{codepurple},
|
|
basicstyle=\ttfamily\footnotesize,
|
|
breakatwhitespace=false,
|
|
breaklines=true,
|
|
captionpos=b,
|
|
keepspaces=true,
|
|
numbers=left,
|
|
numbersep=5pt,
|
|
showspaces=false,
|
|
showstringspaces=false,
|
|
showtabs=false,
|
|
tabsize=2,
|
|
aboveskip=0.8em,
|
|
belowcaptionskip=0.8em
|
|
}
|
|
\lstset{style=mystyle}
|
|
|
|
\geometry{left=2cm,right=2cm,top=2cm,bottom=3cm}
|
|
\title{
|
|
\vspace{-5ex}
|
|
Assignment 3 -- Software Analysis \\\vspace{0.5cm}
|
|
\Large Extended Java Typechecking
|
|
\vspace{-1ex}
|
|
}
|
|
\author{Claudio Maggioni}
|
|
\date{\vspace{-3ex}}
|
|
|
|
\begin{document}
|
|
\maketitle
|
|
|
|
\section{Project selection}
|
|
The assignment description requires to find a project with more than 1000 lines
|
|
of code making significant use of arrays or strings.
|
|
|
|
Given these requirements, I decide to analyze the Apache Commons Text project
|
|
in the GitHub repository
|
|
\href{https://github.com/apache/commons-text}{\textbf{apache/commons-text}}.
|
|
|
|
\subsection{The Apache Commons Text Project}
|
|
The Apache Commons family of libraries is an Apache Software
|
|
Foundation\footnote{\url{https://apache.org/}} sponsored collection of Java
|
|
libraries designed to complement the standard libraries of Java. The Apache
|
|
Commons Text project focuses on text manipulation, encoding and decoding of
|
|
\textit{String}s and \textit{CharSequence}-implementing classes in general.
|
|
|
|
All the source and test classes are contained within in the package
|
|
\textit{org.apache.commons.text} or in a sub-package of that package. For the
|
|
sake of brevity, this prefix is omitted from now on when mentioning file paths
|
|
and classes in the project.
|
|
|
|
I choose to analyze the project at the \textit{git} commit
|
|
\texttt{78fac0f157f74feb804140613e4ffec449070990} as it is the latest commit on
|
|
the \textit{master} branch at the time of writing.
|
|
|
|
To verify that the project satisfies the 1000 lines of code requirement, I run
|
|
the \textit{cloc} tool. Results are shown in table \ref{tab:cloc}. Given the
|
|
project has more than 29,000 lines of Java code, this requirement is satisfied.
|
|
|
|
\begin{table}[H]
|
|
\centering
|
|
\begin{tabular}{lrrrr}
|
|
\toprule
|
|
Language & Files & Blank & Comment & Code \\
|
|
\midrule
|
|
Java & 194 & 5642 & 18704 & 26589 \\
|
|
XML & 16 & 205 & 425 & 1370 \\
|
|
Text & 6 & 194 & 0 & 667 \\
|
|
Maven & 1 & 23 & 24 & 536 \\
|
|
YAML & 6 & 39 & 110 & 160 \\
|
|
Markdown & 4 & 40 & 106 & 109 \\
|
|
Velocity Template Language & 1 & 21 & 31 & 87 \\
|
|
CSV & 1 & 0 & 0 & 5 \\
|
|
Properties & 2 & 2 & 28 & 5 \\
|
|
Bourne Shell & 1 & 0 & 2 & 2 \\
|
|
\midrule
|
|
Total & 232 & 6166 & 19430 & 29530 \\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\caption{Output of the \textit{cloc} tool for the Apache Commons Text project
|
|
at tag \textit{78fac0f1} (before refactoring is carried out).}
|
|
\label{tab:cloc}
|
|
\end{table}
|
|
|
|
\section{Running the CheckerFramework Type Checker}
|
|
|
|
The relevant source code to analyze has been copied to the directory
|
|
\textit{sources} in the assignment repository
|
|
|
|
\begin{center}
|
|
\href{https://gitlab.com/usi-si-teaching/msde/2022-2023/software-analysis/maggioni/assignment-3}{\textit{usi-si-teaching/msde/2022-2023/software-analysis/maggioni/assignment-3}}
|
|
\end{center}
|
|
|
|
on \textit{gitlab.com}. The Maven build specification for the project has been
|
|
modified to run the CheckerFramework extended type checker (version 3.33.0) as
|
|
an annotation processor to be ran on top of the Java compiler. Both source code
|
|
and test code is checked with the tool for violations, which are reported with
|
|
compilation warnings. To run the type checker simply run:
|
|
|
|
\begin{verbatim}
|
|
mvn clean compile
|
|
\end{verbatim}
|
|
|
|
in a suitable environment (i.e. with JDK 1.8 or greater and Maven installed). To
|
|
additionally run the Apache Commons Text test suite and enable \texttt{assert}
|
|
assertions (later useful for CheckerFramework \texttt{@AssumeAssertion(index)}
|
|
assertions) simply run:
|
|
|
|
\begin{verbatim}
|
|
env MAVEN_OPTS="-ea" mvn clean test
|
|
\end{verbatim}
|
|
|
|
Apache Commons Text includes classes that have been deprecated. As changing the
|
|
interface and behaviour of these classes would be useless, as alternatives to
|
|
them exist in the library already, I choose to ignore them while refactoring by
|
|
adding a \textit{@SuppressWarning} annotation in each of them. The state of the
|
|
assignment repository after the deprecated classes are annotated and when the
|
|
type checker was first ran successfully is pinned by the \textit{git} tag
|
|
\textit{before-refactor}. A copy of the CheckerFramework relevant portion of the
|
|
compilation output at that tag is stored in the file
|
|
\textit{before-refactor.txt}.
|
|
|
|
No CheckerFramework checkers other than the index checker is used in this
|
|
analysis as the code in the project mainly manipulates strings and arrays and a
|
|
significant number of warnings are generated even by using this checker only.
|
|
|
|
\section{Refactoring}
|
|
|
|
\begin{table}[!ht]
|
|
\centering
|
|
\begin{tabular}{lrr}
|
|
\toprule
|
|
Warning type & Before refactoring & After refactoring \\ \midrule
|
|
argument & 254 & 241 \\
|
|
array.access.unsafe.high & 130 & 117 \\
|
|
array.access.unsafe.high.constant & 31 & 28 \\
|
|
array.access.unsafe.high.range & 22 & 22 \\
|
|
array.access.unsafe.low & 59 & 58 \\
|
|
array.length.negative & 3 & 3 \\
|
|
cast.unsafe & 2 & 2 \\
|
|
override.return & 12 & 12 \\ \midrule
|
|
Total & 513 & 483 \\ \bottomrule
|
|
\end{tabular}
|
|
\caption{Number of CheckerFramework Type Checker warnings by category before
|
|
and after refactoring, ignoring deprecated classes.}
|
|
\label{tab:check}
|
|
\end{table}
|
|
|
|
Table \ref{tab:check} provides a summary on the extent of the refactoring
|
|
performed in response to index checker warnings across the Apache Commons Text
|
|
project. In total, 513 warnings are found before refactoring, with 30 of them
|
|
later being extinguished by introducing annotations and assertions in the code
|
|
in the following classes:
|
|
|
|
\begin{multicols}{2}
|
|
\begin{itemize}
|
|
\item AlphabetConverter % done
|
|
\item StringSubstitutor % done
|
|
\item similarity.LongestCommonSubsequence
|
|
\item translate.AggregateTranslator
|
|
\item translate.CharSequenceTranslator
|
|
\end{itemize}
|
|
\vfill\null
|
|
\columnbreak
|
|
\begin{itemize}
|
|
\item translate.CodePointTranslator
|
|
\item translate.CsvTranslators
|
|
\item translate.JavaUnicodeEscaper
|
|
\item translate.SinglePassTranslator
|
|
\item translate.UnicodeEscaper
|
|
\end{itemize}
|
|
\end{multicols}
|
|
|
|
The strategy I adopt to perform the refactoring is based on the compiler errors
|
|
thrown on the original code. In every flagged statement I attempt to find the
|
|
root cause of the warning and eliminate it with either extended type qualifier
|
|
annotations or assertions when adding only the former fails.
|
|
|
|
Instead of using \texttt{@SuppressWarning} annotations I choose to use
|
|
\texttt{@AssumeAssertion}-annotated assertions as I aim to use the existing
|
|
Commons Text test suite to aid in finding incorrectly-placed annotations. As
|
|
mentioned before in the report, I run the test suite of the project by enabling
|
|
assertions and I verify that all tests still pass and no
|
|
\textit{AssertionError}s are thrown.
|
|
|
|
In total, the refactor consists in the placement of 16 extended type qualifiers
|
|
and 14 assertions. A more detailed description of salient refactoring decisions
|
|
taken to extinguish the warnings follows.
|
|
|
|
\subsection{Class \textit{AlphabetConverter}}
|
|
|
|
\begin{minted}[linenos,firstnumber=387]{java}
|
|
for (int j = 0; j < encoded.length();) {
|
|
final int i = encoded.codePointAt(j);
|
|
final String s = codePointToString(i);
|
|
|
|
if (s.equals(originalToEncoded.get(i))) {
|
|
result.append(s);
|
|
j++; // because we do not encode in Unicode extended the
|
|
// length of each encoded char is 1
|
|
} else {
|
|
if (j + encodedLetterLength > encoded.length()) {
|
|
throw new UnsupportedEncodingException("Unexpected end "
|
|
+ "of string while decoding " + encoded);
|
|
}
|
|
final String nextGroup = encoded.substring(j,
|
|
j + encodedLetterLength);
|
|
\end{minted}
|
|
|
|
Here the \texttt{substring(\ldots)} call at line 151 is flagged by
|
|
CheckerFramework warning the start and end index may be negative and that the
|
|
start index may be greater than the end index. As the attribute
|
|
\texttt{encodedLetterLength} is positive according to the contract of the class
|
|
constructor and \texttt{j} is only incremented in the for loop or by a factor of
|
|
\texttt{encodedLetterLength}, the code is correct. After introducing a
|
|
\mintinline{java}{@Positive} annotation on the declaration of \texttt{j} and an
|
|
\mintinline{java}{assert encodedLength > 0} after line 395, CheckerFramework
|
|
agrees with my judgement.
|
|
|
|
\subsection{Class \textit{StringSubstitutor}}
|
|
|
|
\begin{minted}[linenos,breaklines,firstnumber=910]{java}
|
|
/** [...]
|
|
* @throws IllegalArgumentException if variable is not found when its allowed to throw exception
|
|
* @throws StringIndexOutOfBoundsException if {@code offset} is not in the
|
|
* range {@code 0 <= offset <= source.length()}
|
|
* @throws StringIndexOutOfBoundsException if {@code length < 0}
|
|
* @throws StringIndexOutOfBoundsException if {@code offset + length > source.length()}
|
|
*/
|
|
public String replace(final String source, final int offset, final int length) {
|
|
if (source == null) {
|
|
return null;
|
|
}
|
|
final TextStringBuilder buf = new TextStringBuilder(length).append(source, offset, length);
|
|
if (!substitute(buf, 0, length)) {
|
|
return source.substring(offset, offset + length);
|
|
}
|
|
return buf.toString();
|
|
}
|
|
\end{minted}
|
|
|
|
The implementation of method \texttt{replace} is flagged by the extended type
|
|
checker as the indices \texttt{offset} and \texttt{length} are not bound checked
|
|
against the string \texttt{source}. As the unsafe behaviour of the method is
|
|
documented in its \textit{javadoc} with appropriate \texttt{@throws} clauses, I
|
|
simply add this implied preconditions to the method's contract by using extended
|
|
type qualifiers:
|
|
|
|
\begin{minted}[breaklines]{java}
|
|
public String replace(final String source,
|
|
final @IndexOrHigh("#1") int offset,
|
|
final @NonNegative @LTLengthOf(value = "#1", offset = "#2 - 1") int length)
|
|
\end{minted}
|
|
|
|
\subsection{Class \textit{translate.CharSequenceTranslator} and implementors}
|
|
|
|
Apache Commons Text provides the aforementioned abstract class implementation as
|
|
a template method of sorts for expressing text encoding and decoding algorithms.
|
|
The class essentially provides facilities to scan UTF-16 code points
|
|
sequentially, and delegating the translation of each code point to the
|
|
implementation of the abstract method:
|
|
|
|
\begin{minted}[breaklines]{java}
|
|
public abstract int translate(CharSequence input, int index, Writer writer) throws IOException;
|
|
\end{minted}
|
|
|
|
CheckerFramework gives some warnings about some of the implementations of this
|
|
method, highlighting that they assume the \texttt{input} \textit{CharSequence}
|
|
is non-empty and the \texttt{index} parameter is a valid index for the string.
|
|
|
|
Even if the method is public, I choose to interpret this hierarchy to mainly be
|
|
a template method pattern, with high coupling between the algorithm in the
|
|
abstract class and each abstract method implementation. Given this, I decide to
|
|
restrict the method's precondition to highlight conditions already provided by
|
|
the caller algorithm, namely the length and index constraints provided by
|
|
CheckerFramework.
|
|
|
|
The new signature of the abstract method is this:
|
|
|
|
\begin{minted}[breaklines]{java}
|
|
public abstract int translate(@MinLen(1) CharSequence input,
|
|
@NonNegative @LTLengthOf("#1") int index,
|
|
Writer writer) throws IOException;
|
|
\end{minted}
|
|
|
|
As some methods have a more forgiving implementation, and a broader child method
|
|
argument type from a more restrictive parent type does not break the rules of
|
|
inheritance (thanks to contravariance), I choose to propagate the extended type
|
|
annotations only when needed and avoid introducing additional preconditions to
|
|
more tolerant implementations of the template method.
|
|
|
|
\subsection{Class \textit{translate.SinglePassTranslator} and implementors}
|
|
|
|
\textit{SinglePassTranslator} is one of the implementor classes of the
|
|
aforementioned \textit{CharSequenceTranslator} template method. However, the
|
|
class is itself a template method pattern ``for processing whole input in single
|
|
pass''\footnote{According to the class \textit{javadoc} description.}, i.e.
|
|
essentially performing an abstraction inversion of the codepoint-by-codepoint
|
|
algorithm in \textit{CharSequenceTranslator} by implementing the encoding or
|
|
decoding process in a single go.
|
|
|
|
The class immediately delegates the implementation of the translation algorithm
|
|
to the abstract package-private method:
|
|
|
|
\begin{minted}[breaklines]{java}
|
|
abstract void translateWhole(CharSequence input, Writer writer) throws IOException;
|
|
\end{minted}
|
|
|
|
and requires callers of the public implementation of \texttt{translate} to call
|
|
it with \texttt{index} equal to 0.
|
|
|
|
I simply propagate the non-empty extended type annotation on \texttt{input}
|
|
(i.e. \mintinline{java}{@MinLen(1)}) on this new abstract method and
|
|
implementors.
|
|
|
|
The \textit{translate.CsvTranslators\$CsvUnescaper} implementation of this new
|
|
template method requires additional attention to extinguish all
|
|
CheckerFramework's index checker warnings.
|
|
|
|
\begin{minted}[breaklines,linenos,firstnumber=60]{java}
|
|
void translateWhole(final @MinLen(1) CharSequence input, final Writer writer) throws IOException {
|
|
// is input not quoted?
|
|
if (input.charAt(0) != CSV_QUOTE || input.charAt(input.length() - 1) != CSV_QUOTE) {
|
|
writer.write(input.toString());
|
|
return;
|
|
}
|
|
|
|
// strip quotes
|
|
final String quoteless = input.subSequence(1, input.length() - 1).toString();
|
|
\end{minted}
|
|
|
|
Here CheckerFramework was unable to deduce that the
|
|
\mintinline{java}{input.length() - 1} indeed results in a safe index for
|
|
\texttt{input} as the \textit{CharSequence} is always non-empty (as specified
|
|
with the propagated type qualifiers from the abstract signature of
|
|
\texttt{translateWhole}). This warning is fixed by precomputing the last index
|
|
of the string and introducing a trivially true assertion on it:
|
|
|
|
\begin{minted}[breaklines,linenos,firstnumber=60]{java}
|
|
void translateWhole(final @MinLen(1) CharSequence input, final Writer writer) throws IOException {
|
|
final int lastIndex = input.length() - 1;
|
|
|
|
assert lastIndex >= 0 : "@AssumeAssertion(index): input.length() is >= 1 by contract";
|
|
|
|
// is input not quoted?
|
|
if (input.charAt(0) != CSV_QUOTE || input.charAt(lastIndex) != CSV_QUOTE) {
|
|
writer.write(input.toString());
|
|
return;
|
|
}
|
|
|
|
// strip quotes
|
|
final String quoteless = input.subSequence(1, lastIndex).toString();
|
|
\end{minted}
|
|
|
|
This assertion is one good example of CheckerFramework's limitations in
|
|
verifying transitively true properties due to the lack of deductive verification
|
|
capabilities. All assertions added in the project are trivial in this way to
|
|
some degree.
|
|
|
|
\section{Conclusions}
|
|
|
|
As evidenced by the Apache Common Text test suite and the previous section of
|
|
this report, no changes in the implementation behaviour were introduced in the
|
|
code by the refactor. Only extended type annotations and assertions (that hold
|
|
when executing the test suite) were added to the code.
|
|
|
|
No implementaton-derived bugs were discovered during refactoring, altough the
|
|
introduction of additional method preconditions via extended type annotations
|
|
was sometimes needed. Some questionable design choices were found in the
|
|
library, namely the \textit{CharSequenceTranslator} template method hierarchy
|
|
with \mintinline{java}{public} implementors and the inversion of abstration by
|
|
refused bequest in the \textit{SinglePassTranslator} partial implementation,
|
|
however I managed to introduce correct extended types without drastic
|
|
alterations (which may have broken clients of the library).
|
|
|
|
It was quite easy to introduce the CheckerFramework index checker to the Maven
|
|
build toolchain of the project. However, compiling the project after this was
|
|
significantly slower than before and lots of spurious warnings (i.e. false
|
|
positives) were produced by the tool due to its lack of deductive verification
|
|
capabilities. Some benefits were gained by using the tool, namely being able to
|
|
better refine the precondition of the methods in the library by documenting them
|
|
in an automatedly parsable fashion. However, the sheer number of trivially true
|
|
assertions introduced to silence some warnings is a significant downside to
|
|
consider when using the index checker.
|
|
|
|
The CheckerFramework extended type checker has a cost and complexity of usage
|
|
greater than the Infer static analysis tool and lower than the Dafny deductive
|
|
verifier. However, with respect to my experience with all the tools in carrying
|
|
out the assignment, this is the tool that has shown to be the less useful.
|
|
|
|
The less powerful Infer static analyzer, even with a great number of false
|
|
positives, was able to highlight some bugs in the project I have chosen for that
|
|
assignment (Apache Commons Lang). As all Apache Commons libraries are quite
|
|
mature and well-maintained it is expected that both tools find relatively few
|
|
things to flag in them. However, no noteworthy errors were found by
|
|
CheckerFramework in this project other than some small imprecisions in the
|
|
precondition specification of some methods.
|
|
|
|
When compared to Dafny and deductive verifiers in general, clearly both Infer
|
|
and CheckerFramework pale in comparison in terms of capabilities. However, the
|
|
increased cost of adoption and of the tool's execution make it suitable only to
|
|
critical portions of project codebases.
|
|
|
|
Finally, I would like to mention the
|
|
\href{https://github.com/JetBrains/java-annotations}{Jetbrains Java Annotations}
|
|
as a much less powerful but effective extended type checking mechanism to handle
|
|
nullable an non-null values in plain Java. Even if its scope of analysis is
|
|
rather narrow and warnings an the type checking process is proprietary (as only
|
|
Jetbrains IDEs interpret the annotations) I have found it an effective
|
|
development aid in my university and work experience.
|
|
\end{document}
|
|
|