diff --git a/report.pdf b/report.pdf index f2bcfb2..a7c98ab 100644 Binary files a/report.pdf and b/report.pdf differ diff --git a/report.tex b/report.tex index 28a3af3..baaf817 100644 --- a/report.tex +++ b/report.tex @@ -148,10 +148,15 @@ assertions) simply run: env MAVEN_OPTS="-ea" mvn clean test \end{verbatim} -The state of the assignment repository when the type checker was first ran -successfully is pinned by the \textit{git} tag \textit{before-refactor}. A copy -of the CheckerFramework relevant portion of the compilation output at that tag -is stored in the file \textit{before-refactor.txt}. +Apache Commons Text includes classes that have been deprecated. As changing the +interface and behaviour of these classes would be useless, as alternatives to +them exist in the library already, I choose to ignore them while refactoring by +adding a \textit{@SuppressWarning} annotation in each of them. The state of the +assignment repository after the deprecated classes are annotated and when the +type checker was first ran successfully is pinned by the \textit{git} tag +\textit{before-refactor}. A copy of the CheckerFramework relevant portion of the +compilation output at that tag is stored in the file +\textit{before-refactor.txt}. No CheckerFramework checkers other than the index checker is used in this analysis as the code in the project mainly manipulates strings and arrays and a @@ -175,7 +180,7 @@ significant number of warnings are generated even by using this checker only.. Total & 513 & 483 \\ \bottomrule \end{tabular} \caption{Number of CheckerFramework Type Checker warnings by category before - and after refactoring.} + and after refactoring, ignoring deprecated classes.} \label{tab:check} \end{table} @@ -187,8 +192,8 @@ in the following classes: \begin{multicols}{2} \begin{itemize} -\item AlphabetConverter -\item StringSubstitutor +\item AlphabetConverter % done +\item StringSubstitutor % done \item similarity.LongestCommonSubsequence \item translate.AggregateTranslator \item translate.CharSequenceTranslator @@ -204,20 +209,147 @@ in the following classes: \end{itemize} \end{multicols} +The strategy I adopt to perform the refactoring is based on the compiler errors +thrown on the original code. In every flagged statement I attempt to find the +root cause of the warning and eliminate it with either extended type qualifier +annotations or assertions when adding only the former fails. +Instead of using \texttt{@SuppressWarning} annotations I choose to use +\texttt{@AssumeAssertion}-annotated assertions as I aim to use the existing +Commons Text test suite to aid in finding incorrectly-placed annotations. As +mentioned before in the report, I run the test suite of the project by enabling +assertions and I verify that all tests still pass and no +\textit{AssertionError}s are thrown. +In total, the refactor consists in the placement of 16 extended type qualifiers +and 14 assertions. A more detailed description of salient refactoring decisions +taken to extinguish the warnings follows. -\begin{listing}[H] -\begin{minted}[linenos,firstnumber=139]{java} -public static int toMillisInt(final Duration duration) { - Objects.requireNonNull(duration, "duration"); - // intValue() does not do a narrowing conversion here - return LONG_TO_INT_RANGE.fit(Long.valueOf(duration.toMillis())).intValue(); +\subsection{Class \textit{AlphabetConverter}} + +\begin{minted}[linenos,firstnumber=387]{java} +for (int j = 0; j < encoded.length();) { + final int i = encoded.codePointAt(j); + final String s = codePointToString(i); + + if (s.equals(originalToEncoded.get(i))) { + result.append(s); + j++; // because we do not encode in Unicode extended the + // length of each encoded char is 1 + } else { + if (j + encodedLetterLength > encoded.length()) { + throw new UnsupportedEncodingException("Unexpected end " + + "of string while decoding " + encoded); + } + final String nextGroup = encoded.substring(j, + j + encodedLetterLength); +\end{minted} + +Here the \texttt{substring(\ldots)} call at line 151 is flagged by +CheckerFramework warning the start and end index may be negative and that the +start index may be greater than the end index. As the attribute +\texttt{encodedLetterLength} is positive according to the contract of the class +constructor and \texttt{j} is only incremented in the for loop or by a factor of +\texttt{encodedLetterLength}, the code is correct. After introducing a +\mintinline{java}{@Positive} annotation on the declaration of \texttt{j} and an +\mintinline{java}{assert encodedLength > 0} after line 395, CheckerFramework +agrees with my judgement. + +\subsection{Class \textit{StringSubstitutor}} + +\begin{minted}[linenos,breaklines,firstnumber=910]{java} +/** [...] + * @throws IllegalArgumentException if variable is not found when its allowed to throw exception + * @throws StringIndexOutOfBoundsException if {@code offset} is not in the + * range {@code 0 <= offset <= source.length()} + * @throws StringIndexOutOfBoundsException if {@code length < 0} + * @throws StringIndexOutOfBoundsException if {@code offset + length > source.length()} + */ +public String replace(final String source, final int offset, final int length) { + if (source == null) { + return null; + } + final TextStringBuilder buf = new TextStringBuilder(length).append(source, offset, length); + if (!substitute(buf, 0, length)) { + return source.substring(offset, offset + length); + } + return buf.toString(); } \end{minted} - \caption{Method \textit{toMillisInt(Duration)} of class - \textit{time.DurationUtils} in Apache Commons Lang 3.12.0.} -\end{listing} + +The implementation of method \texttt{replace} is flagged by the extended type +checker as the indices \texttt{offset} and \texttt{length} are not bound checked +against the string \texttt{source}. As the unsafe behaviour of the method is +documented in its \textit{javadoc} with appropriate \texttt{@throws} clauses, I +simply add this implied preconditions to the method's contract by using extended +type qualifiers: + +\begin{minted}[breaklines]{java} +public String replace(final String source, + final @IndexOrHigh("#1") int offset, + final @NonNegative @LTLengthOf(value = "#1", offset = "#2 - 1") int length) +\end{minted} + +\subsection{Class \textit{translate.CharSequenceTranslator} and implementors} + +Apache Commons Text provides the aforementioned abstract class implementation as +a template method of sorts for expressing text encoding and decoding algorithms. +The class essentially provides facilities to scan UTF-16 code points +sequentially, and delegating the translation of each code point to the +implementation of the abstract method: + +\begin{minted}[breaklines]{java} +public abstract int translate(CharSequence input, int index, Writer writer) throws IOException; +\end{minted} + +CheckerFramework gives some warnings about some of the implementations of this +method, highlighting that they assume the \texttt{input} \textit{CharSequence} +is non-empty and the \texttt{index} parameter is a valid index for the string. + +Even if the method is public, I choose to interpret this hierarchy to mainly be +a template method pattern, with high coupling between the algorithm in the +abstract class and each abstract method implementation. Given this, I decide to +restrict the method's precondition to highlight conditions already provided by +the caller algorithm, namely the length and index constraints provided by +CheckerFramework. + +The new signature of the abstract method is this: + +\begin{minted}[breaklines]{java} +public abstract int translate(@MinLen(1) CharSequence input, + @NonNegative @LTLengthOf("#1") int index, + Writer writer) throws IOException; +\end{minted} + +As some methods have a more forgiving implementation, and a broader child method +argument type from a more restrictive parent type does not break the rules of +inheritance (thanks to contravariance), I choose to propagate the extended type +annotations only when needed and avoid introducing additional preconditions to +more tolerant implementations of the template method. + +\subsection{Class \textit{translate.SinglePassTranslator} and implementors} + +\textit{SinglePassTranslator} is one of the implementor classes of the +aforementioned \textit{CharSequenceTranslator} template method. However, the +class is itself a template method pattern ``for processing whole input in single +pass''\footnote{According to the class \textit{javadoc} description.}, i.e. +essentially performing an abstraction inversion of the codepoint-by-codepoint +algorithm in \textit{CharSequenceTranslator} by implementing the encoding or +decoding process in a single go. + +The class immediately delegates the implementation of the translation algorithm +to the abstract package-private method: + +\begin{minted}[breaklines]{java} +abstract void translateWhole(CharSequence input, Writer writer) throws IOException; +\end{minted} + +and requires callers of the public implementation of \texttt{translate} to call +it with \texttt{index} equal to 0. + +I simply propagate the non-empty extended type annotation on \texttt{input} +(i.e. \mintinline{java}{@MinLen(1)}) on this new abstract method and +implementors. \section{Conclusions}