diff --git a/report/main.pdf b/report/main.pdf index 8f4de5d..d9f34e4 100644 Binary files a/report/main.pdf and b/report/main.pdf differ diff --git a/report/main.tex b/report/main.tex index 78f0415..daaa092 100644 --- a/report/main.tex +++ b/report/main.tex @@ -47,7 +47,13 @@ \subsection*{Section 1 - Instrumentation} - Report and comment the instrumentation of the code (e.g. number of files, number of functions, number of branches). + The script \textit{instrument.py} in the main directory of the project performs instrumentation to replace each + condition node in the Python files present benchmark suite with a call to \texttt{evaluate\_condition}, which will + preserve program behaviour but as a side effect will compute and store condition distance for each traversed branch. + + Table~\ref{tab:count1} summarizes the number of Python files, function definition (\textit{FunctionDef}) nodes, + and comparison nodes (\textit{Compare} nodes not in an \texttt{assert} or \texttt{return} statement) found by the + instrumentation script. \begin{table} [H] \centering @@ -66,12 +72,57 @@ \subsection*{Section 2: Fuzzer test generator} - Describe and comment the steps to generate test cases using Fuzzer (include any hyper parameter used during the process) + The script \textit{fuzzer.py} loads the instrumented benchmark suite and generates tests at random to maximize branch + coverage. + The implementation submitted with this report slightly improves on the specification required as it is + able to deal with an arbitrary number of function parameters, which must be type-hinted as either \texttt{str} or + \texttt{int}. The fuzzing process generates a pool of 1000 test case inputs according to the function signature, + using randomly generated integers $\in [-1000, 1000]$, and randomly generated string of length $\in [0, 10]$ with + ASCII characters with code $\in [32, 127]$. Note that test cases generated in the pool may not satisfy the + preconditions (i.e.\ the \texttt{assert} statements on the inputs) for the given function. + + 250 test cases are extracted from the pool following this procedure. With equal probabilities (each with $p=1/3$): + + \begin{itemize} + \item The extracted test case may be kept as is; + \item The extracted test case may be randomly mutated using the \textit{mutate} function. An argument will be + chosen at random, and if of type \texttt{str} a random position in the string will be replaced with a + random character. If the argument is of type \texttt{int}, a random value $\in [-10, 10]$ will be added to + the argument. If the resulting test case is not present in the pool, it will be added to the pool; + \item The extracted test case may be randomly combined with another randomly extracted test using the + \textit{crossover} function. The function will choose at random an argument, and if of type \texttt{int} it will + swap the values assigned to the two tests. If the argument is of type \texttt{str}, the strings from the two test + cases will be split in two substrings at random and they will be joined by combining the ``head'' substring from + one test case with the ``tail'' substring from the other. If the two resulting test cases are new, they will be + added to the pool. + \end{itemize} + + If the resulting test case (or test cases) satisfy the function precondition, and if their execution covers branches + that have not been covered by other test cases, they will be added to the test suite. The resulting test suite is + then saved as a \textit{unittest} file, comprising of one test class per function present in the benchmark test file. \subsection*{Section 3: Genetic Algorithm test generator} - Describe and comment the steps to generated test cases using Genetic Algorithm (include any hyper parameter used during the process) + The script \textit{genetic.py} loads the instrumented benchmark suite and generates tests using a genetic algorithm + to maximize branch coverage and minimize distance to condition boundary values. + + The genetic algorithm is implemented via the library \textit{deap} using the \textit{eaSimple} procedure. + The algorithm is initialized with 200 individuals extracted from a pool generated in the same way as the previous + section. The algorithm runs for 20 generations, and it implements the \textit{mate} and \textit{mutate} operators + using the \textit{crossover} and \textit{mutate} functions respectively as described in the previous section. + + The fitness function used returns a value of $\infty$ if the test case does not satisfy the function precondition, + a value of $1000000$ if the test case does not cover any new branches, + or the sum of normalized ($1 / (x + 1)$) sum of distances for branches that are not yet covered by other test cases. + A penalty of $2$ is summed to the fitness value for every branch that is already covered. The fitness function is + minimized by the genetic algorithm. + + The genetic algorithm is ran 10 times. At the end of each execution the best individuals (sorted by increasing + fitness) are selected if they cover at least one branch that has not been covered. This is the only point in the + procedure where the set of covered branches is updated\footnote{This differs from the reference implementation of + \texttt{sb\_cgi\_decode.py}, which performs the update directly in the fitness function}. + \subsection*{Section 4: Statistical comparison of test generators} @@ -112,18 +163,18 @@ \centering \begin{tabular}{lrrp{3.5cm}r} \toprule - \textbf{File} & \textbf{$E(\text{Fuzzer})$} & \textbf{$E(\text{Genetic})$} & \textbf{Cohen's $|d|$} & \textbf{Wilcoxon $p$} \\ + \textbf{File} & \textbf{$E(\text{Fuzzer})$} & \textbf{$E(\text{Genetic})$} & \textbf{Cohen's $|d|$} & \textbf{Wilcoxon $p$} \\ \midrule - check\_armstrong & 58.07 & 93.50 & 2.0757 \hfill Huge & 0.0020 \\ - railfence\_cipher & 88.41 & 87.44 & 0.8844 \hfill Very large & 0.1011 \\ - longest\_substring & 77.41 & 76.98 & 0.0771 \hfill Small & 0.7589 \\ - common\_divisor\_count & 76.17 & 72.76 & 0.7471 \hfill Large & 0.1258 \\ - zellers\_birthday & 68.09 & 71.75 & 1.4701 \hfill Huge & 0.0039 \\ - exponentiation & 69.44 & 67.14 & 0.3342 \hfill Medium & 0.7108 \\ - caesar\_cipher & 60.59 & 61.20 & 0.3549 \hfill Medium & 0.2955 \\ - gcd & 59.15 & 55.66 & 0.5016 \hfill Large & 0.1627 \\ - rabin\_karp & 27.90 & 47.55 & 2.3688 \hfill Huge & 0.0078 \\ - anagram\_check & 23.10 & 7.70 & $\infty$ \hfill Huge & 0.0020 \\ + check\_armstrong & 58.07 & 93.50 & 2.0757 \hfill Huge & 0.0020 \\ + railfence\_cipher & 88.41 & 87.44 & 0.8844 \hfill Very large & 0.1011 \\ + longest\_substring & 77.41 & 76.98 & 0.0771 \hfill Small & 0.7589 \\ + common\_divisor\_count & 76.17 & 72.76 & 0.7471 \hfill Large & 0.1258 \\ + zellers\_birthday & 68.09 & 71.75 & 1.4701 \hfill Huge & 0.0039 \\ + exponentiation & 69.44 & 67.14 & 0.3342 \hfill Medium & 0.7108 \\ + caesar\_cipher & 60.59 & 61.20 & 0.3549 \hfill Medium & 0.2955 \\ + gcd & 59.15 & 55.66 & 0.5016 \hfill Large & 0.1627 \\ + rabin\_karp & 27.90 & 47.55 & 2.3688 \hfill Huge & 0.0078 \\ + anagram\_check & 23.10 & 7.70 & $\infty$ \hfill Huge & 0.0020 \\ \bottomrule \end{tabular} \caption{Statistical comparison between fuzzer and genetic algorithm test case generation in terms of mutation