diff --git a/report/Claudio_Maggioni_report.pdf b/report/Claudio_Maggioni_report.pdf
index d201f6fc..9c5e626d 100644
Binary files a/report/Claudio_Maggioni_report.pdf and b/report/Claudio_Maggioni_report.pdf differ
diff --git a/report/Claudio_Maggioni_report.tex b/report/Claudio_Maggioni_report.tex
index 9d86ba90..9a20d546 100644
--- a/report/Claudio_Maggioni_report.tex
+++ b/report/Claudio_Maggioni_report.tex
@@ -130,7 +130,7 @@ following values:
 Figure~\ref{fig:eventTypes} shows the expected transitions between event
 types.
 
-\begin{figure}
+\begin{figure}[h]
 \centering
 \resizebox{\textwidth}{!}{%
 \includegraphics{./figures/event_types.png}}
@@ -311,17 +311,83 @@ and performing the desired computation on the obtained
 chronological event log. Sometimes intermediate results are saved in Spark's
 parquet format in order to compute and save intermediate results beforehand.
 
-\hypertarget{general-query-script-design}{%
-\subsection{General Query script
-design}\label{general-query-script-design}}
+\subsection{Query script design}
 
-\begin{figure}
+In this section we illustrate the general complexity behind the implementation
+of the query scripts by explaining some sampled scripts in detail, so as to
+better appreciate their behaviour.
+
+\subsubsection{The ``task slowdown'' query script}
+
+One example of an analysis script with average complexity and a fairly
+straightforward structure is the pair of scripts \texttt{task\_slowdown.py} and
+\texttt{task\_slowdown\_table.py} used to compute the ``task slowdown'' tables
+(namely the tables in figure~\ref{fig:taskslowdown}).
+
+``Slowdown'' is a task-wise measure of wasted execution time for tasks with a
+\texttt{FINISH} termination type. It is computed as the total execution time of
+the task divided by the execution time actually needed to complete the task
+(i.e. the total time of the last execution attempt, successful by definition).
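The slowdown definition above can be sketched in plain Python; the attempt
durations below are hypothetical values, not taken from the traces:

```python
def slowdown(attempt_durations):
    # Slowdown of a FINISHed task: total execution time across all
    # attempts divided by the time of the last (successful) attempt.
    return sum(attempt_durations) / attempt_durations[-1]

# Hypothetical task: two failed attempts (10s, 20s), then success in 30s.
print(slowdown([10, 20, 30]))  # → 2.0
```

A task that succeeds on its first attempt therefore has a slowdown of exactly 1.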
+
+The analysis requires computing the mean task slowdown for each task priority
+value and, additionally, the percentage of tasks with successful terminations
+per priority. The query therefore needs to compute the execution time of each
+execution attempt for each task, determine whether each task terminated
+successfully, and finally combine this data to compute slowdown, mean slowdown
+and ultimately the final table found in figure~\ref{fig:taskslowdown}.
+
+\begin{figure}[h]
 \centering
 \includegraphics[width=.75\textwidth]{figures/task_slowdown_query.png}
-\caption{Diagram of the query scripts used for the ``task slowdown'' query}
+\caption{Diagram of the script used for the ``task slowdown''
+  query.}\label{fig:taskslowdownquery}
 \end{figure}
 
-\textbf{TBD}
+Figure~\ref{fig:taskslowdownquery} shows a schematic representation of the
+query structure.
+
+The query starts by reading the \texttt{instance\_events} table, which
+contains (among other data) all task event logs with their properties, event
+types and timestamps. As already explained in the previous section, the logical
+table file is actually stored as several Gzip-compressed JSONL shards. This is
+very useful for processing purposes, since Spark is able to parse and load in
+memory each shard in parallel, i.e. using all processing cores on the server
+used to run the queries.
+
+After loading the data, a selection and a projection operation are performed in
+the preparation phase so as to ``clean up'' the records and fields that are not
+needed, leaving only useful information to feed into the ``group by'' phase. In
+this query, the selection phase removes all records that do not represent task
+events or that contain an unknown task ID or a null event timestamp. In the
+2019 traces it is quite common to find incomplete records, since the log
+process is unable to capture the sheer amount of events generated by all jobs
+in an exact and deterministic fashion.
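In the actual script this selection is a Spark filter over the loaded shards;
the predicate can be sketched in plain Python as follows (the field names here
are illustrative assumptions, not the real trace schema):

```python
def keep_record(record):
    # Selection phase: keep only task events that carry a known task ID
    # and a non-null timestamp; incomplete records are dropped.
    return (record.get("kind") == "task"
            and record.get("task_id") is not None
            and record.get("time") is not None)

events = [
    {"kind": "task", "task_id": 1, "time": 100},    # kept
    {"kind": "job", "task_id": 2, "time": 105},     # not a task event
    {"kind": "task", "task_id": None, "time": 110}, # unknown task ID
]
print([e for e in events if keep_record(e)])
```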
+
+Then, after the preparation stage is complete, the task event records are
+grouped into several bins, one per task ID\@. Through this operation the
+collection of unsorted task events is rearranged to form groups of task events
+all relating to a single task.
+
+The obtained collections of task events are then sorted by timestamp and
+processed to compute intermediate data relating to execution attempt times and
+task termination counts. After the task events are sorted, the script iterates
+over the events in chronological order, storing each execution attempt time and
+registering all execution termination types by checking the event type field.
+The task termination is then equal to the last execution termination type,
+following the definition originally given in the 2015 Ros\'a et al. DSN paper.
+
+If the task termination is determined to be unsuccessful, the tally counter of
+task terminations for the matching task property is increased. Otherwise, all
+the execution attempt time deltas are returned. Tallies and time deltas are
+saved in an intermediate file for fine-grained processing.
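The group-sort-iterate logic described above can be sketched in plain Python
(in the real script the binning is done by Spark's group-by; the event type
names and record fields here are simplified assumptions):

```python
from collections import defaultdict

def attempts_by_task(events):
    """Group events per task ID, sort them chronologically, and collect the
    duration of each execution attempt plus the last termination type."""
    bins = defaultdict(list)
    for e in events:
        bins[e["task_id"]].append(e)
    result = {}
    for task_id, evts in bins.items():
        evts.sort(key=lambda e: e["time"])  # chronological order
        deltas, start, last_term = [], None, None
        for e in evts:
            if e["event"] == "SCHEDULE":
                start = e["time"]
            elif e["event"] in ("FINISH", "FAIL", "KILL") and start is not None:
                deltas.append(e["time"] - start)  # attempt execution time
                last_term = e["event"]            # termination of last attempt
                start = None
        result[task_id] = (deltas, last_term)
    return result

# Events arrive unsorted; one task with a failed and a successful attempt.
events = [
    {"task_id": 1, "time": 42, "event": "FINISH"},
    {"task_id": 1, "time": 0, "event": "SCHEDULE"},
    {"task_id": 1, "time": 12, "event": "SCHEDULE"},
    {"task_id": 1, "time": 10, "event": "FAIL"},
]
print(attempts_by_task(events))  # → {1: ([10, 30], 'FINISH')}
```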
+
+Finally, \texttt{task\_slowdown\_table.py} processes these intermediate
+results to compute the percentage of successful tasks per priority and the
+slowdown values given the previously computed execution attempt time deltas.
+The mean of the computed slowdown values is then computed, resulting in the
+clear and concise tables found in figure~\ref{fig:taskslowdown}.
+
 \hypertarget{ad-hoc-presentation-of-some-analysis-scripts}{%
 \subsection{Ad-Hoc presentation of some analysis
@@ -599,3 +665,4 @@ developments}\label{conclusions-and-future-work-or-possible-developments}}
 \textbf{TBD}
 
 \end{document}
+% vim: set ts=2 sw=2 et tw=80:
diff --git a/report/figures/task_slowdown_query.odg b/report/figures/task_slowdown_query.odg
index 22eaae57..fdd5b376 100644
Binary files a/report/figures/task_slowdown_query.odg and b/report/figures/task_slowdown_query.odg differ
diff --git a/report/figures/task_slowdown_query.png b/report/figures/task_slowdown_query.png
index 3d877a27..5b1694f4 100644
Binary files a/report/figures/task_slowdown_query.png and b/report/figures/task_slowdown_query.png differ
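The final aggregation step performed by the table script can be sketched in
plain Python; the tuple layout standing in for the intermediate file is a
hypothetical simplification:

```python
from statistics import mean

def slowdown_table(tasks):
    """tasks: (priority, finished, slowdowns) tuples, one per task --
    a hypothetical stand-in for the intermediate results file."""
    table = {}
    for prio in sorted({t[0] for t in tasks}):
        rows = [t for t in tasks if t[0] == prio]
        finished = [t for t in rows if t[1]]
        values = [s for t in finished for s in t[2]]
        table[prio] = {
            # mean slowdown over successfully terminated tasks
            "mean_slowdown": mean(values) if values else None,
            # percentage of tasks with a successful termination
            "finished_pct": 100.0 * len(finished) / len(rows),
        }
    return table

tasks = [(0, True, [2.0]), (0, False, []), (1, True, [1.5])]
print(slowdown_table(tasks))
```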