Claudio Maggioni 2023-12-28 10:43:47 +01:00
parent f77c297ca0
commit 9d6315152e
2 changed files with 65 additions and 14 deletions


@ -47,7 +47,13 @@
\subsection*{Section 1 - Instrumentation}
Report and comment the instrumentation of the code (e.g. number of files, number of functions, number of branches).
The script \textit{instrument.py} in the main directory of the project instruments the code by replacing each
condition node in the Python files of the benchmark suite with a call to \texttt{evaluate\_condition}, which
preserves program behaviour but, as a side effect, computes and stores the condition distance for each traversed branch.
Table~\ref{tab:count1} summarizes the number of Python files, function definition (\textit{FunctionDef}) nodes,
and comparison nodes (\textit{Compare} nodes not in an \texttt{assert} or \texttt{return} statement) found by the
instrumentation script.
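The transformation described above can be sketched with the standard \texttt{ast} module. This is a minimal illustration and not the actual code of \textit{instrument.py}: the class name \texttt{ConditionInstrumenter}, the helper \texttt{instrument}, and the argument layout passed to \texttt{evaluate\_condition} (branch identifier, operator name, left and right operands) are assumptions, as is the choice to skip chained comparisons.

```python
import ast


class ConditionInstrumenter(ast.NodeTransformer):
    """Rewrite comparisons into calls to a distance-tracking helper."""

    def __init__(self):
        self.functions = 0  # FunctionDef nodes seen
        self.branches = 0   # Compare nodes rewritten

    def visit_FunctionDef(self, node):
        self.functions += 1
        return self.generic_visit(node)

    def visit_Assert(self, node):
        return node  # preconditions are left untouched

    def visit_Return(self, node):
        return node  # comparisons in return statements are not counted

    def visit_Compare(self, node):
        if len(node.ops) != 1:
            return node  # assumption: chained comparisons are skipped
        self.branches += 1
        call = ast.Call(
            func=ast.Name(id="evaluate_condition", ctx=ast.Load()),
            args=[
                ast.Constant(self.branches),               # branch identifier
                ast.Constant(type(node.ops[0]).__name__),  # operator, e.g. "Lt"
                node.left,
                node.comparators[0],
            ],
            keywords=[],
        )
        return ast.copy_location(call, node)


def instrument(source):
    """Return (instrumented source, number of functions, number of comparisons)."""
    visitor = ConditionInstrumenter()
    tree = ast.fix_missing_locations(visitor.visit(ast.parse(source)))
    return ast.unparse(tree), visitor.functions, visitor.branches
```

The two counters correspond to the kind of figures reported for each file: function definitions, and comparisons outside \texttt{assert} and \texttt{return} statements.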
\begin{table} [H]
\centering
@ -66,12 +72,57 @@
\subsection*{Section 2: Fuzzer test generator}
Describe and comment the steps to generate test cases using Fuzzer (include any hyper parameter used during the process)
The script \textit{fuzzer.py} loads the instrumented benchmark suite and generates tests at random to maximize branch
coverage.
The implementation submitted with this report slightly extends the required specification, as it
handles an arbitrary number of function parameters, each of which must be type-hinted as either \texttt{str} or
\texttt{int}. The fuzzing process generates a pool of 1000 test case inputs according to the function signature,
using randomly generated integers $\in [-1000, 1000]$ and randomly generated strings of length $\in [0, 10]$ made of
ASCII characters with code $\in [32, 127]$. Note that test cases in the pool may not satisfy the
preconditions (i.e.\ the \texttt{assert} statements on the inputs) of the given function.
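A possible sketch of the pool generation follows. It is not taken from \textit{fuzzer.py}: the names \texttt{random\_value} and \texttt{generate\_pool} are hypothetical, and all interval bounds are assumed to be inclusive.

```python
import inspect
import random
from typing import get_type_hints


def random_value(tp, rng):
    """Draw one random argument for a parameter hinted as int or str."""
    if tp is int:
        return rng.randint(-1000, 1000)  # inclusive bounds (assumption)
    if tp is str:
        length = rng.randint(0, 10)
        return "".join(chr(rng.randint(32, 127)) for _ in range(length))
    raise TypeError(f"unsupported parameter type: {tp!r}")


def generate_pool(func, size=1000, rng=None):
    """Build `size` argument tuples matching the signature of `func`."""
    rng = rng or random.Random()
    hints = get_type_hints(func)
    # Follow the declared parameter order, ignoring the return annotation.
    types = [hints[name] for name in inspect.signature(func).parameters]
    return [tuple(random_value(tp, rng) for tp in types) for _ in range(size)]
```

Each test case is a tuple of arguments, so it is hashable and can be checked for membership in the pool.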
Then, 250 test cases are extracted from the pool. For each extracted test case, one of the following actions is chosen with equal probability ($p=1/3$ each):
\begin{itemize}
\item The extracted test case may be kept as is;
\item The extracted test case may be randomly mutated using the \textit{mutate} function. An argument will be
chosen at random, and if of type \texttt{str} a random position in the string will be replaced with a
random character. If the argument is of type \texttt{int}, a random value $\in [-10, 10]$ will be added to
the argument. If the resulting test case is not present in the pool, it will be added to the pool;
\item The extracted test case may be randomly combined with another randomly extracted test case using the
\textit{crossover} function. The function will choose an argument at random, and if it is of type \texttt{int} it will
swap the values assigned to it in the two test cases. If the argument is of type \texttt{str}, the strings from the two test
cases will each be split into two substrings at a random position, and they will be joined by combining the ``head'' substring from
one test case with the ``tail'' substring from the other. If the two resulting test cases are new, they will be
added to the pool.
\end{itemize}
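The two operators in the list above can be sketched as follows. This is an illustration under stated assumptions rather than the submitted code: test cases are tuples with at least one argument, an empty string is left unchanged by \texttt{mutate}, replacement characters are drawn from the same code range $[32, 127]$, and each string gets its own random cut point in \texttt{crossover}.

```python
import random


def mutate(test, rng):
    """Return a copy of `test` with one randomly chosen argument perturbed."""
    args = list(test)
    i = rng.randrange(len(args))
    if isinstance(args[i], int):
        args[i] += rng.randint(-10, 10)
    elif args[i]:  # assumption: empty strings are left as they are
        pos = rng.randrange(len(args[i]))
        new_char = chr(rng.randint(32, 127))
        args[i] = args[i][:pos] + new_char + args[i][pos + 1:]
    return tuple(args)


def crossover(test1, test2, rng):
    """Combine two test cases on one randomly chosen argument."""
    a, b = list(test1), list(test2)
    i = rng.randrange(len(a))
    if isinstance(a[i], int):
        a[i], b[i] = b[i], a[i]  # swap the integer values
    else:
        cut_a = rng.randint(0, len(a[i]))
        cut_b = rng.randint(0, len(b[i]))
        # head of one string joined with the tail of the other
        a[i], b[i] = a[i][:cut_a] + b[i][cut_b:], b[i][:cut_b] + a[i][cut_a:]
    return tuple(a), tuple(b)
```

Both functions return new tuples instead of modifying their inputs, which keeps the test cases stored in the pool intact.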
If a resulting test case satisfies the function precondition and its execution covers branches
that have not been covered by other test cases, it is added to the test suite. The resulting test suite is
then saved as a \textit{unittest} file, comprising one test class per function present in the benchmark test file.
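The overall selection loop could look like the sketch below. The function \texttt{fuzz} and its parameters are hypothetical: \texttt{run} is assumed to execute the instrumented function on a test case and return the set of branches it covers, or \texttt{None} when the precondition fails, while \texttt{mutate} and \texttt{crossover} are passed in as callables. Extraction with replacement and adding each new crossover result to the pool individually are also assumptions.

```python
import random


def fuzz(pool, run, mutate, crossover, rng, n_extractions=250):
    """Greedily build a test suite that increases branch coverage."""
    pool = list(pool)
    seen = set(pool)   # fast membership check for the pool
    covered = set()    # branches covered by the suite so far
    suite = []
    for _ in range(n_extractions):
        test = rng.choice(pool)
        action = rng.choice(("keep", "mutate", "crossover"))  # p = 1/3 each
        if action == "keep":
            candidates = [test]
        elif action == "mutate":
            candidates = [mutate(test, rng)]
        else:
            candidates = list(crossover(test, rng.choice(pool), rng))
        for cand in candidates:
            if cand not in seen:  # new test cases enlarge the pool
                seen.add(cand)
                pool.append(cand)
            branches = run(cand)
            # keep only valid tests that reach at least one uncovered branch
            if branches is not None and not branches <= covered:
                covered |= branches
                suite.append(cand)
    return suite, covered
```

Because a test case enters the suite only when it covers at least one new branch, the suite never contains more test cases than there are covered branches.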
\subsection*{Section 3: Genetic Algorithm test generator}
Describe and comment the steps to generate test cases using Genetic Algorithm (include any hyper parameter used during the process)
The script \textit{genetic.py} loads the instrumented benchmark suite and generates tests using a genetic algorithm
to maximize branch coverage and minimize distance to condition boundary values.
The genetic algorithm is implemented via the library \textit{deap} using the \textit{eaSimple} procedure.
The algorithm is initialized with 200 individuals extracted from a pool generated as described in the previous
section. The algorithm runs for 20 generations, and it implements the \textit{mate} and \textit{mutate} operators
using, respectively, the \textit{crossover} and \textit{mutate} functions described in the previous section.
The fitness function returns $\infty$ if the test case does not satisfy the function precondition,
$1000000$ if the test case does not cover any new branches,
or otherwise the sum of the normalized ($1 / (x + 1)$) distances of the branches that are not yet covered by other test cases.
A penalty of $2$ is added to the fitness value for every branch that is already covered. The fitness function is
minimized by the genetic algorithm.
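The fitness computation can be sketched as below. This is a reading of the description and not the code of \textit{genetic.py}: the function \texttt{fitness} and the shape of its input (a dictionary mapping each traversed branch to its distance, or \texttt{None} when the precondition fails) are hypothetical, and the penalty is assumed to apply to branches traversed by the test case that are already covered.

```python
import math

NO_NEW_BRANCH = 1_000_000  # value for tests that cover nothing new
COVERED_PENALTY = 2        # added once per already covered branch


def normalize(x):
    """Normalization as written in the report: 1 / (x + 1)."""
    return 1 / (x + 1)


def fitness(distances, covered):
    """Fitness of one test case; lower is better."""
    if distances is None:  # precondition not satisfied
        return math.inf
    new = {b: d for b, d in distances.items() if b not in covered}
    if not new:
        return NO_NEW_BRANCH
    penalty = COVERED_PENALTY * (len(distances) - len(new))
    return sum(normalize(d) for d in new.values()) + penalty
```

For example, a test case that traverses one uncovered branch with distance $1$ and one already covered branch gets a fitness of $1/2 + 2 = 2.5$.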
The genetic algorithm is run 10 times. At the end of each execution, the best individuals (sorted by increasing
fitness) are selected if they cover at least one branch that has not yet been covered. This is the only point in the
procedure where the set of covered branches is updated\footnote{This differs from the reference implementation of
\texttt{sb\_cgi\_decode.py}, which performs the update directly in the fitness function.}.
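The outer loop around the genetic algorithm can be sketched as follows. The names are hypothetical: \texttt{run\_ga} stands for one complete execution of the genetic algorithm and is assumed to return the final population as (fitness, individual) pairs, and \texttt{coverage\_of} is assumed to return the set of branches covered by an individual.

```python
def generate_suite(run_ga, coverage_of, runs=10):
    """Repeat the genetic algorithm, updating coverage only between runs."""
    covered = set()
    suite = []
    for _ in range(runs):
        # The covered set is frozen for the whole run of the algorithm.
        population = run_ga(frozenset(covered))
        for _fit, individual in sorted(population, key=lambda pair: pair[0]):
            branches = coverage_of(individual)
            if branches - covered:  # at least one branch not yet covered
                suite.append(individual)
                covered |= branches
    return suite, covered
```

Passing a frozen copy of the covered set to each run reflects the point made above: the fitness function only reads the set, and the update happens once the run is over.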
\subsection*{Section 4: Statistical comparison of test generators}
@ -112,18 +163,18 @@
\centering
\begin{tabular}{lrrp{3.5cm}r}
\toprule
\textbf{File} & \textbf{$E(\text{Fuzzer})$} & \textbf{$E(\text{Genetic})$} & \textbf{Cohen's $|d|$} & \textbf{Wilcoxon $p$} \\
\midrule
check\_armstrong & 58.07 & 93.50 & 2.0757 \hfill Huge & 0.0020 \\
railfence\_cipher & 88.41 & 87.44 & 0.8844 \hfill Very large & 0.1011 \\
longest\_substring & 77.41 & 76.98 & 0.0771 \hfill Small & 0.7589 \\
common\_divisor\_count & 76.17 & 72.76 & 0.7471 \hfill Large & 0.1258 \\
zellers\_birthday & 68.09 & 71.75 & 1.4701 \hfill Huge & 0.0039 \\
exponentiation & 69.44 & 67.14 & 0.3342 \hfill Medium & 0.7108 \\
caesar\_cipher & 60.59 & 61.20 & 0.3549 \hfill Medium & 0.2955 \\
gcd & 59.15 & 55.66 & 0.5016 \hfill Large & 0.1627 \\
rabin\_karp & 27.90 & 47.55 & 2.3688 \hfill Huge & 0.0078 \\
anagram\_check & 23.10 & 7.70 & $\infty$ \hfill Huge & 0.0020 \\
\bottomrule
\end{tabular}
\caption{Statistical comparison between fuzzer and genetic algorithm test case generation in terms of mutation