report

2023-12-28 10:43:47 +01:00 · 2023-12-28 10:43:47 +01:00 · 9d6315152e
commit 9d6315152e
parent f77c297ca0
2 changed files with 65 additions and 14 deletions
--- a/report/main.pdf
+++ b/report/main.pdf
--- a/report/main.tex
+++ b/report/main.tex
@ -47,7 +47,13 @@

    \subsection*{Section 1 - Instrumentation}

-    Report and comment the instrumentation of the code (e.g. number of files, number of functions, number of branches).
+    The script \textit{instrument.py} in the main directory of the project instruments the code by replacing each
+    condition node in the Python files of the benchmark suite with a call to \texttt{evaluate\_condition}. This call
+    preserves the program behaviour but, as a side effect, computes and stores the condition distance for each traversed branch.
+
+    Table~\ref{tab:count1} summarizes the number of Python files, function definition (\textit{FunctionDef}) nodes,
+    and comparison nodes (\textit{Compare} nodes not in an \texttt{assert} or \texttt{return} statement) found by the
+    instrumentation script.
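The replacement described above can be sketched with Python's standard `ast` module. This is a minimal illustration, not the code of instrument.py. The signature `evaluate_condition(num, op, lhs, rhs)`, the per-condition numbering and the distance formulas are assumptions: only integer comparisons are handled, using the usual branch-distance definitions. The transformer also counts FunctionDef nodes and instrumented Compare nodes, and it leaves comparisons inside `assert` and `return` statements untouched.

```python
import ast

# Minimum branch distances observed so far, keyed by condition number.
distances_true = {}
distances_false = {}


def evaluate_condition(num, op, lhs, rhs):
    """Evaluate `lhs <op> rhs` on ints and record the distance to each outcome.

    Hypothetical helper: only integer comparisons are handled in this sketch.
    """
    if op == "Eq":
        result = lhs == rhs
        d_true, d_false = abs(lhs - rhs), (1 if result else 0)
    elif op == "NotEq":
        result = lhs != rhs
        d_true, d_false = (0 if result else 1), abs(lhs - rhs)
    elif op == "Lt":
        result = lhs < rhs
        d_true, d_false = (0 if result else lhs - rhs + 1), (rhs - lhs if result else 0)
    elif op == "LtE":
        result = lhs <= rhs
        d_true, d_false = (0 if result else lhs - rhs), (rhs - lhs + 1 if result else 0)
    elif op == "Gt":
        result = lhs > rhs
        d_true, d_false = (0 if result else rhs - lhs + 1), (lhs - rhs if result else 0)
    elif op == "GtE":
        result = lhs >= rhs
        d_true, d_false = (0 if result else rhs - lhs), (lhs - rhs + 1 if result else 0)
    else:
        raise NotImplementedError(op)
    # Keep the smallest distance seen for each outcome of condition `num`.
    distances_true[num] = min(distances_true.get(num, d_true), d_true)
    distances_false[num] = min(distances_false.get(num, d_false), d_false)
    return result


class BranchTransformer(ast.NodeTransformer):
    """Replace single-operator Compare nodes with evaluate_condition(...) calls."""

    def __init__(self):
        self.branch_num = 0      # number of instrumented Compare nodes
        self.function_defs = 0   # number of FunctionDef nodes seen

    def visit_FunctionDef(self, node):
        self.function_defs += 1
        self.generic_visit(node)
        return node

    def visit_Assert(self, node):
        return node  # preconditions are not instrumented (children not visited)

    def visit_Return(self, node):
        return node  # comparisons in return statements are not instrumented

    def visit_Compare(self, node):
        self.generic_visit(node)
        if len(node.ops) != 1:
            return node  # chained comparisons are left as-is in this sketch
        self.branch_num += 1
        call = ast.Call(
            func=ast.Name(id="evaluate_condition", ctx=ast.Load()),
            args=[ast.Constant(self.branch_num), ast.Constant(type(node.ops[0]).__name__),
                  node.left, node.comparators[0]],
            keywords=[],
        )
        return ast.copy_location(call, node)


source = '''
def sign(x: int) -> int:
    assert x > -1000
    if x < 0:
        return -1
    if x == 0:
        return 0
    return 1
'''
transformer = BranchTransformer()
tree = ast.fix_missing_locations(transformer.visit(ast.parse(source)))
namespace = {"evaluate_condition": evaluate_condition}
exec(compile(tree, "<instrumented>", "exec"), namespace)
print(transformer.function_defs, transformer.branch_num)  # 1 2
print(namespace["sign"](5), distances_true)  # 1 {1: 6, 2: 5}
```

The instrumented function returns the same values as the original, while `distances_true` and `distances_false` accumulate the distances as a side effect.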

    \begin{table} [H]
        \centering
@ -66,12 +72,57 @@

    \subsection*{Section 2: Fuzzer test generator}

-    Describe and comment the steps to generate test cases using Fuzzer (include any hyper parameter used during the process)
+    The script \textit{fuzzer.py} loads the instrumented benchmark suite and generates tests at random to maximize branch
+    coverage.

+    The implementation submitted with this report slightly improves on the required specification, as it can handle
+    an arbitrary number of function parameters, each of which must be type-hinted as either \texttt{str} or
+    \texttt{int}. The fuzzing process generates a pool of 1000 test case inputs according to the function signature,
+    using random integers $\in [-1000, 1000]$ and random strings of length $\in [0, 10]$ made of
+    ASCII characters with codes $\in [32, 127]$. Note that the test cases in the pool may not satisfy the
+    preconditions (i.e.\ the \texttt{assert} statements on the inputs) of the given function.
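The pool generation can be sketched using the parameters stated above: 1000 inputs, integers in [-1000, 1000], and strings of length 0 to 10 with character codes 32 to 127. This is a sketch under assumptions. A test case is represented here as a tuple of arguments, and the parameter types are read through `typing.get_type_hints`. The helper names and the `example` function are hypothetical.

```python
import inspect
import random
import typing

POOL_SIZE = 1000
INT_RANGE = (-1000, 1000)        # inclusive bounds for random integers
STR_LEN_RANGE = (0, 10)          # inclusive bounds for string length
CHAR_CODE_RANGE = (32, 127)      # inclusive bounds for character codes


def random_value(param_type):
    """Draw one random argument for a parameter annotated as int or str."""
    if param_type is int:
        return random.randint(*INT_RANGE)
    if param_type is str:
        length = random.randint(*STR_LEN_RANGE)
        return "".join(chr(random.randint(*CHAR_CODE_RANGE)) for _ in range(length))
    raise TypeError(f"unsupported parameter type: {param_type!r}")


def parameter_types(func):
    """Return the type hints of func's parameters, in declaration order."""
    hints = typing.get_type_hints(func)
    return [hints[name] for name in inspect.signature(func).parameters]


def generate_pool(func, size=POOL_SIZE):
    """Build a pool of random test inputs (tuples of arguments) for func."""
    types = parameter_types(func)
    return [tuple(random_value(t) for t in types) for _ in range(size)]


def example(n: int, s: str) -> bool:  # hypothetical benchmark function
    assert n >= 0                      # precondition that pool entries may violate
    return len(s) > n


random.seed(42)
pool = generate_pool(example)
print(len(pool), parameter_types(example))  # 1000 [<class 'int'>, <class 'str'>]
```

Because the values are drawn without looking at the `assert` statements, many entries (here, every tuple with a negative `n`) violate the precondition, as noted above.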
+
+    The fuzzer then extracts 250 test cases from the pool. Each extracted test case is processed in one of the
+    following three ways, each chosen with probability $p = 1/3$:
+
+    \begin{itemize}
+        \item The test case is kept as is;
+        \item The test case is randomly mutated using the \textit{mutate} function: an argument is chosen at random
+        and, if it is of type \texttt{str}, the character at a random position in the string is replaced with a random
+        character, whereas if it is of type \texttt{int}, a random value $\in [-10, 10]$ is added to it. If the
+        resulting test case is not already in the pool, it is added to the pool;
+        \item The test case is combined with another test case randomly extracted from the pool using the
+        \textit{crossover} function: an argument is chosen at random and, if it is of type \texttt{int}, its values in
+        the two test cases are swapped, whereas if it is of type \texttt{str}, each of the two strings is split into two
+        substrings at a random position, and the ``head'' substring of each string is joined with the ``tail'' substring
+        of the other. If the resulting test cases are not already in the pool, they are added to it.
+    \end{itemize}
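The three options, together with the mutate and crossover operators, might look as follows. This sketch operates on tuples of arguments paired with a list of parameter types, and `fuzz_step` is a hypothetical name. The text leaves two details open, and this sketch makes assumptions for both, flagged in the comments: mutation leaves an empty string unchanged, and crossover splits each string at its own random position.

```python
import random

CHAR_CODE_RANGE = (32, 127)  # assumption: same character range as the initial pool


def mutate(test, types):
    """Return a copy of `test` with one randomly chosen argument perturbed."""
    args = list(test)
    i = random.randrange(len(args))
    if types[i] is str:
        s = args[i]
        if s:  # assumption: an empty string has no position to replace, so it is left as-is
            pos = random.randrange(len(s))
            s = s[:pos] + chr(random.randint(*CHAR_CODE_RANGE)) + s[pos + 1:]
        args[i] = s
    else:  # int argument: add a random offset in [-10, 10]
        args[i] += random.randint(-10, 10)
    return tuple(args)


def crossover(test1, test2, types):
    """Recombine two test cases on one randomly chosen argument; return two children."""
    a, b = list(test1), list(test2)
    i = random.randrange(len(a))
    if types[i] is int:
        a[i], b[i] = b[i], a[i]  # swap the two integer values
    else:
        # assumption: each string is split at its own random position, then tails are exchanged
        p, q = random.randint(0, len(a[i])), random.randint(0, len(b[i]))
        a[i], b[i] = a[i][:p] + b[i][q:], b[i][:q] + a[i][p:]
    return tuple(a), tuple(b)


def fuzz_step(pool, types):
    """Extract one test case and keep, mutate or cross it over (p = 1/3 each)."""
    test = random.choice(pool)
    action = random.choice(("keep", "mutate", "crossover"))
    if action == "keep":
        return [test]
    if action == "mutate":
        produced = [mutate(test, types)]
    else:
        produced = list(crossover(test, random.choice(pool), types))
    for t in produced:
        if t not in pool:  # only new test cases enter the pool
            pool.append(t)
    return produced


random.seed(7)
types = [int, str]
pool = [(5, "hello"), (-7, "ab"), (0, "")]
candidates = [t for _ in range(250) for t in fuzz_step(pool, types)]
```

Each call returns the candidate(s) produced in that step. Only these candidates are then checked against the precondition and the coverage criterion.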
+
+    Each resulting test case is added to the test suite if it satisfies the function precondition and its execution
+    covers at least one branch not yet covered by other test cases. The resulting test suite is then saved as a
+    \textit{unittest} file containing one test class per function defined in the benchmark file.
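The acceptance rule and the unittest export can be sketched as follows. The `run` callable, which executes the instrumented function and reports `(result, covered branches)`, is hypothetical, as are the module name and the generated test names. Because the preconditions are `assert` statements, this sketch treats an `AssertionError` as a precondition violation. The toy `run_sign` treats each possible return value as a branch, only to keep the example self-contained.

```python
def maybe_add(test, run, suite, covered):
    """Append (test, result) to suite if test meets the precondition and covers a new branch.

    `run` is a hypothetical callable executing the instrumented function on
    `test`; it returns (result, covered_branches) and raises AssertionError
    when a precondition is violated.
    """
    try:
        result, branches = run(test)
    except AssertionError:
        return False
    if branches <= covered:  # nothing new: discard
        return False
    covered |= branches
    suite.append((test, result))
    return True


def to_unittest(module, func_name, suite):
    """Render the selected test cases as one unittest.TestCase class."""
    lines = ["from unittest import TestCase", f"from {module} import {func_name}", "", "",
             f"class Test_{func_name}(TestCase):"]
    if not suite:
        lines.append("    pass")  # keep the class body syntactically valid
    for k, (args, expected) in enumerate(suite):
        call = f"{func_name}({', '.join(map(repr, args))})"
        lines += [f"    def test_{func_name}_{k}(self):",
                  f"        self.assertEqual({call}, {expected!r})", ""]
    return "\n".join(lines)


def sign(x: int) -> int:  # hypothetical benchmark function
    assert x > -1000
    return -1 if x < 0 else (0 if x == 0 else 1)


def run_sign(args):  # toy stand-in: one "branch" per possible result
    result = sign(*args)
    return result, {result}


suite, covered = [], set()
for test in [(5,), (7,), (-3,), (-2000,), (0,)]:
    maybe_add(test, run_sign, suite, covered)
print(suite)  # [((5,), 1), ((-3,), -1), ((0,), 0)]
print(to_unittest("benchmark", "sign", suite))
```

The input `(7,)` is discarded because it covers nothing new, and `(-2000,)` is discarded because it violates the precondition.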

    \subsection*{Section 3: Genetic Algorithm test generator}

-    Describe and comment the steps to generated test cases using Genetic Algorithm (include any hyper parameter used during the process)
+    The script \textit{genetic.py} loads the instrumented benchmark suite and generates tests using a genetic algorithm
+    to maximize branch coverage and minimize the distance to condition boundary values.
+
+    The genetic algorithm is implemented with the \textit{deap} library, using its \textit{eaSimple} procedure.
+    The algorithm is initialized with a population of 200 individuals extracted from a pool generated as described in
+    the previous section. The algorithm runs for 20 generations, and its \textit{mate} and \textit{mutate} operators
+    are implemented by the \textit{crossover} and \textit{mutate} functions, respectively, described in the previous section.
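For readers unfamiliar with deap, the following dependency-free sketch reproduces the structure of one eaSimple run. Each generation selects a new population, applies crossover to consecutive pairs and mutation to single individuals, and then replaces the old population entirely (there is no elitism). The population size of 200 and the 20 generations come from the text. Tournament selection with `tournsize=3`, `cxpb=0.5` and `mutpb=0.2` are assumed values, since the report does not state the selection operator or the probabilities. The toy `evaluate`, `mate` and `mutate` are hypothetical stand-ins.

```python
import random


def select_tournament(population, fitnesses, k, tournsize=3):
    """Pick k individuals, each the fittest (lowest value) of `tournsize` random aspirants."""
    chosen = []
    for _ in range(k):
        aspirants = random.sample(range(len(population)), tournsize)
        chosen.append(population[min(aspirants, key=fitnesses.__getitem__)])
    return chosen


def ea_simple(population, evaluate, mate, mutate, cxpb=0.5, mutpb=0.2, ngen=20):
    """Generational loop with the same structure as deap's eaSimple (minimization)."""
    for _ in range(ngen):
        fitnesses = [evaluate(ind) for ind in population]
        offspring = select_tournament(population, fitnesses, len(population))
        # crossover on consecutive pairs, each pair with probability cxpb
        for i in range(1, len(offspring), 2):
            if random.random() < cxpb:
                offspring[i - 1], offspring[i] = mate(offspring[i - 1], offspring[i])
        # mutation of each offspring with probability mutpb
        for i in range(len(offspring)):
            if random.random() < mutpb:
                offspring[i] = mutate(offspring[i])
        population = offspring  # full replacement, no elitism
    return population


# Toy problem: one-argument test cases, minimize the distance of x from 42.
random.seed(0)
initial = [(random.randint(-1000, 1000),) for _ in range(200)]
final = ea_simple(
    initial,
    evaluate=lambda ind: abs(ind[0] - 42),
    mate=lambda a, b: (b, a),  # with a single int argument, crossover swaps the values
    mutate=lambda ind: (ind[0] + random.randint(-10, 10),),
)
```

Individuals are immutable tuples, and the operators return new tuples. As a result, an individual selected several times can be replaced in place without the cloning step that deap performs.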
+
+    The fitness function returns $\infty$ if the test case does not satisfy the function precondition and
+    $1000000$ if the test case does not cover any new branch. Otherwise, it returns the sum of the normalized
+    distances of the branches not yet covered by other test cases, where a distance $x$ is normalized as $1/(x + 1)$.
+    A penalty of $2$ is added to the fitness value for every branch that is already covered. The fitness function is
+    minimized by the genetic algorithm.
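The fitness rule in the paragraph above can be written down as follows. This sketches the rule as stated in the text, not the actual genetic.py code. Two details are assumptions of this sketch, flagged in the comments: how distances are keyed per branch, and which branches incur the penalty of 2 (here, branches this test case takes that were already covered).

```python
import math

PRECONDITION_VIOLATED = math.inf
NO_NEW_BRANCH = 1_000_000
COVERED_PENALTY = 2


def normalize(x):
    """Normalization of a distance x as stated in the text: 1 / (x + 1)."""
    return 1 / (x + 1)


def fitness(satisfies_precondition, distances, covered):
    """Fitness (to be minimized) of one test case.

    distances: branch -> minimum distance observed while running the test
               (0 means the test took that branch); keys such as
               (condition_number, outcome) are an assumption of this sketch.
    covered:   set of branches already covered by previously selected tests.
    """
    if not satisfies_precondition:
        return PRECONDITION_VIOLATED
    reached = {branch for branch, d in distances.items() if d == 0}
    if not reached - covered:
        return NO_NEW_BRANCH
    value = sum(normalize(d) for branch, d in distances.items() if branch not in covered)
    # assumption: the penalty counts branches this test takes that were already covered
    value += COVERED_PENALTY * len(reached & covered)
    return value


distances = {(1, True): 0, (1, False): 3, (2, True): 0, (2, False): 1}
print(fitness(True, distances, covered={(2, True)}))  # 3.75
```

When this function is registered with deap, the evaluation function must return a tuple, e.g. `(fitness(...),)`. A fitness class created with `weights=(-1.0,)` makes the algorithm minimize the value.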
+
+    The genetic algorithm is run 10 times. At the end of each run, the best individuals, considered in order of
+    increasing fitness, are selected if they cover at least one branch that has not yet been covered. This is the only
+    point in the procedure where the set of covered branches is updated\footnote{This differs from the reference
+    implementation in \texttt{sb\_cgi\_decode.py}, which updates the set directly in the fitness function.}.
+
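The outer loop of 10 runs and the end-of-run selection can be sketched as follows. Here `run_ga` stands in for one eaSimple run whose fitness reads the current covered set, and `fitness_of` and `branches_of` are hypothetical accessors. The toy values only illustrate that the covered set changes between runs, never during one.

```python
def update_archive(final_population, fitness_of, branches_of, covered, suite):
    """Scan one run's individuals best-first and keep those covering a new branch."""
    for ind in sorted(final_population, key=fitness_of):  # increasing fitness = best first
        new_branches = branches_of(ind) - covered
        if new_branches:
            covered |= new_branches  # the only place where `covered` changes
            suite.append(ind)


def run_repeatedly(run_ga, fitness_of, branches_of, runs=10):
    """Run the GA `runs` times, updating the archive only between runs."""
    covered, suite = set(), []
    for _ in range(runs):
        final_population = run_ga(covered)  # hypothetical: one eaSimple run reading `covered`
        update_archive(final_population, fitness_of, branches_of, covered, suite)
    return suite, covered


# Toy stand-ins: individuals are ints, branches are the sign of the value.
suite, covered = run_repeatedly(
    run_ga=lambda covered: [3, -1, 5, -4],
    fitness_of=abs,
    branches_of=lambda x: {"neg"} if x < 0 else {"nonneg"},
)
print(suite, sorted(covered))  # [-1, 3] ['neg', 'nonneg']
```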

    \subsection*{Section 4: Statistical comparison of test generators}

@ -112,18 +163,18 @@
        \centering
        \begin{tabular}{lrrp{3.5cm}r}
            \toprule
-            \textbf{File} & \textbf{$E(\text{Fuzzer})$} & \textbf{$E(\text{Genetic})$} & \textbf{Cohen's $|d|$} & \textbf{Wilcoxon $p$} \\
+            \textbf{File}          & \textbf{$E(\text{Fuzzer})$} & \textbf{$E(\text{Genetic})$} & \textbf{Cohen's $|d|$} & \textbf{Wilcoxon $p$} \\
            \midrule
-            check\_armstrong       & 58.07 & 93.50 & 2.0757  \hfill Huge       & 0.0020  \\
-            railfence\_cipher      & 88.41 & 87.44 & 0.8844 \hfill Very large & 0.1011 \\
-            longest\_substring     & 77.41 & 76.98 & 0.0771 \hfill Small      & 0.7589 \\
-            common\_divisor\_count & 76.17 & 72.76 & 0.7471 \hfill Large      & 0.1258 \\
-            zellers\_birthday      & 68.09 & 71.75 & 1.4701  \hfill Huge       & 0.0039 \\
-            exponentiation         & 69.44 & 67.14 & 0.3342 \hfill Medium     & 0.7108 \\
-            caesar\_cipher         & 60.59 & 61.20 & 0.3549  \hfill Medium     & 0.2955 \\
-            gcd                    & 59.15 & 55.66 & 0.5016 \hfill Large      & 0.1627 \\
-            rabin\_karp            & 27.90 & 47.55 & 2.3688  \hfill Huge       & 0.0078 \\
-            anagram\_check         & 23.10 & 7.70  & $\infty$  \hfill Huge       & 0.0020  \\
+            check\_armstrong       & 58.07                       & 93.50                        & 2.0757 \hfill Huge       & 0.0020                \\
+            railfence\_cipher      & 88.41                       & 87.44                        & 0.8844 \hfill Very large & 0.1011                \\
+            longest\_substring     & 77.41                       & 76.98                        & 0.0771 \hfill Small      & 0.7589                \\
+            common\_divisor\_count & 76.17                       & 72.76                        & 0.7471 \hfill Large      & 0.1258                \\
+            zellers\_birthday      & 68.09                       & 71.75                        & 1.4701 \hfill Huge       & 0.0039                \\
+            exponentiation         & 69.44                       & 67.14                        & 0.3342 \hfill Medium     & 0.7108                \\
+            caesar\_cipher         & 60.59                       & 61.20                        & 0.3549 \hfill Medium     & 0.2955                \\
+            gcd                    & 59.15                       & 55.66                        & 0.5016 \hfill Large      & 0.1627                \\
+            rabin\_karp            & 27.90                       & 47.55                        & 2.3688 \hfill Huge       & 0.0078                \\
+            anagram\_check         & 23.10                       & 7.70                         & $\infty$ \hfill Huge     & 0.0020                \\
            \bottomrule
        \end{tabular}
        \caption{Statistical comparison between fuzzer and genetic algorithm test case generation in terms of mutation