report: done explanation for task slowdown

Claudio Maggioni 2021-05-18 17:37:42 +02:00
parent 4d3b711ce0
commit f5eb1f30dd
4 changed files with 74 additions and 7 deletions


@@ -130,7 +130,7 @@ following values:
Figure~\ref{fig:eventTypes} shows the expected transitions between event
types.
\begin{figure}[h]
\centering
\resizebox{\textwidth}{!}{%
\includegraphics{./figures/event_types.png}}
@@ -311,17 +311,83 @@ and performing the desired computation on the obtained chronological event log.
Sometimes intermediate results are saved in Spark's Parquet format, so that
expensive computations can be performed once and their results reused by later
processing stages.
\subsection{Query script design}

In this section we aim to show the general complexity behind the query script
implementations by explaining some sampled scripts in detail, so that their
behaviour can be better appreciated.

\subsubsection{The ``task slowdown'' query script}
One example of an analysis script of average complexity and fairly
straightforward structure is the pair of scripts \texttt{task\_slowdown.py} and
\texttt{task\_slowdown\_table.py}, used to compute the ``task slowdown'' tables
(namely the tables in figure~\ref{fig:taskslowdown}).
``Slowdown'' is a task-wise measure of wasted execution time for tasks with a
\texttt{FINISH} termination type. It is computed as the total execution time of
the task divided by the execution time actually needed to complete the task
(i.e. the total time of the last execution attempt, successful by definition).
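Expressed as a formula, for a task whose $n$ execution attempts have durations
$t_1, \dots, t_n$, where the $n$-th attempt is the successful one:
\[
\mathit{slowdown} = \frac{\sum_{i=1}^{n} t_i}{t_n}
\]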
The analysis requires computing the mean task slowdown for each task priority
value, as well as the percentage of tasks with successful terminations per
priority. The query therefore needs to compute the execution time of each
execution attempt of each task, determine whether each task terminated
successfully or not, and finally combine these data to compute the slowdown,
the mean slowdown, and ultimately the final table found in
figure~\ref{fig:taskslowdown}.

\begin{figure}[h]
\centering
\includegraphics[width=.75\textwidth]{figures/task_slowdown_query.png}
\caption{Diagram of the script used for the ``task slowdown''
query.}\label{fig:taskslowdownquery}
\end{figure}
Figure~\ref{fig:taskslowdownquery} shows a schematic representation of the query
structure.
The query starts by reading the \texttt{instance\_events} table, which
contains (among other data) all task event logs with their properties, event
types and timestamps. As already explained in the previous section, the logical
table is actually stored as several Gzip-compressed JSONL shards. This is very
useful for processing purposes, since Spark is able to parse and load each
shard into memory in parallel, i.e.\ using all processing cores on the server
used to run the queries.
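As an illustration, this loading step could be expressed in PySpark as in the
following minimal sketch (the shard path pattern is an assumption, not the
actual layout used by the scripts):
\begin{verbatim}
# Minimal sketch of the loading phase; the path glob is hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("task_slowdown").getOrCreate()

# Spark parses the Gzip-compressed JSONL shards in parallel, mapping
# each shard to a partition and using all available cores.
events = spark.read.json("instance_events/*.json.gz")
\end{verbatim}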
After loading the data, a selection and a projection operation are performed in
the preparation phase so as to ``clean up'' the records and fields that are not
needed, leaving only useful information to feed into the ``group by'' phase. In
this query, the selection phase removes all records that do not represent task
events or that contain an unknown task ID or a null event timestamp. In the
2019 traces it is quite common to find incomplete records, since the logging
process is unable to capture the sheer amount of events generated by all jobs
in an exact and deterministic fashion.
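Continuing the sketch above, the preparation phase could look as follows; the
column names are illustrative assumptions and do not necessarily match the
actual trace schema:
\begin{verbatim}
# Selection: drop records with an unknown task ID, a null timestamp
# or a missing event type. Projection: keep only the fields needed
# downstream. (Column names are assumed for illustration.)
task_events = (events
    .filter(events["collection_id"].isNotNull())
    .filter(events["instance_index"].isNotNull())
    .filter(events["time"].isNotNull())
    .filter(events["type"].isNotNull())
    .select("collection_id", "instance_index", "time",
            "type", "priority"))
\end{verbatim}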
Then, after the preparation stage is complete, the task event records are
grouped in several bins, one per task ID\@. Through this operation, the
collection of unsorted task events is rearranged into groups of events each
relating to a single task.
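Continuing the sketch, such a grouping could be performed on the underlying
RDD, for example by keying each event on its task ID (here assumed to be the
pair of \texttt{collection\_id} and \texttt{instance\_index}):
\begin{verbatim}
# Group events into one bin per task (sketch; the key choice is an
# assumption about how tasks are identified in the trace).
task_bins = (task_events.rdd
    .map(lambda e: ((e["collection_id"], e["instance_index"]), e))
    .groupByKey())
\end{verbatim}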
The resulting collections of task events are then sorted by timestamp and
processed to compute intermediate data about execution attempt times and task
termination counts. After the task events are sorted, the script iterates over
them in chronological order, storing the duration of each execution attempt and
registering all execution termination types by checking the event type field.
The task termination is then taken to be the termination type of the last
execution, following the definition originally given in the 2015 Ros\'a et al.
DSN paper.

If the task termination is determined to be unsuccessful, the tally counter of
task terminations for the matching task priority is increased. Otherwise, all
the execution attempt time deltas for the task are returned. Tallies and time
deltas are then saved in an intermediate file for fine-grained processing.
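A possible shape of this per-task computation is sketched below; the helper
function, its field names and the exact set of termination event types are
assumptions made for illustration:
\begin{verbatim}
# Per-task processing (sketch). Events arrive as one unsorted bin per
# task; we sort them chronologically, measure the duration of each
# execution attempt, and take the termination type of the last
# attempt as the task's termination.
TERMINATIONS = {"FINISH", "FAIL", "KILL", "EVICT"}

def process_task(events):
    events = sorted(events, key=lambda e: e["time"])
    deltas, start, termination = [], None, None
    for e in events:
        if e["type"] == "SCHEDULE":      # an execution attempt starts
            start = e["time"]
        elif e["type"] in TERMINATIONS:  # an execution attempt ends
            if start is not None:
                deltas.append(e["time"] - start)
                start = None
            termination = e["type"]      # the last one wins
    return termination, deltas
\end{verbatim}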
Finally, \texttt{task\_slowdown\_table.py} processes these intermediate
results, computing the percentage of successful tasks per priority and the
slowdown values given the previously computed execution attempt time deltas.
The mean of the computed slowdown values is then taken, resulting in the clear
and concise tables found in figure~\ref{fig:taskslowdown}.
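As an illustration of this aggregation step, the slowdown computation could
look like the following plain-Python sketch (not the actual script code):
\begin{verbatim}
# Slowdown of one task: total time across all execution attempts
# divided by the duration of the last (successful) attempt.
def mean_slowdown(attempt_deltas):
    slowdowns = [sum(d) / d[-1] for d in attempt_deltas
                 if d and d[-1] > 0]
    return sum(slowdowns) / len(slowdowns) if slowdowns else float("nan")
\end{verbatim}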
\hypertarget{ad-hoc-presentation-of-some-analysis-scripts}{%
\subsection{Ad-Hoc presentation of some analysis
@@ -599,3 +665,4 @@ developments}\label{conclusions-and-future-work-or-possible-developments}}
\textbf{TBD}
\end{document}
% vim: set ts=2 sw=2 et tw=80:
