report: done explanation for task slowdown
This commit is contained in:
parent
f4dff80388
commit
18ce409cde
4 changed files with 74 additions and 7 deletions

@@ -130,7 +130,7 @@ following values:
Figure~\ref{fig:eventTypes} shows the expected transitions between event
types.

\begin{figure}[h]
\centering
\resizebox{\textwidth}{!}{%
\includegraphics{./figures/event_types.png}}

@@ -311,17 +311,83 @@ and performing the desired computation on the obtained chronological event log.

Sometimes intermediate results are saved in Spark's Parquet format, so that
expensive intermediate computations can be performed once beforehand and their
results reused by later queries.
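
As an illustration, the snippet below sketches how such an intermediate result
could be cached and reloaded with PySpark. The DataFrame contents, the column
names and the output path are hypothetical and not taken from the actual query
scripts:

\begin{verbatim}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-cache-sketch").getOrCreate()

# Stand-in for an intermediate result computed by one of the query scripts.
prepared_events = spark.createDataFrame(
    [(1, "FINISH", 1000), (2, "KILL", 2000)],
    ["task_id", "type", "time"])

# Persist the intermediate result once...
prepared_events.write.mode("overwrite").parquet("intermediate/events.parquet")

# ...and reload it from later queries without recomputing it.
cached = spark.read.parquet("intermediate/events.parquet")
\end{verbatim}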

\subsection{Query script design}

In this section we illustrate the general complexity behind the implementation
of the query scripts by explaining some sample scripts in detail, so that their
behaviour can be better appreciated.

\subsubsection{The ``task slowdown'' query script}

One example of an analysis script with average complexity and a fairly
straightforward structure is the pair of scripts \texttt{task\_slowdown.py} and
\texttt{task\_slowdown\_table.py}, which are used to compute the ``task
slowdown'' tables (namely the tables in figure~\ref{fig:taskslowdown}).

``Slowdown'' is a task-wise measure of wasted execution time for tasks with a
\texttt{FINISH} termination type. It is computed as the total execution time of
the task divided by the execution time actually needed to complete the task
(i.e. the total time of the last execution attempt, successful by definition).
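
In symbols, denoting by $t_i$ the duration of the $i$-th of $n$ execution
attempts of a task (notation introduced here only for convenience), the
slowdown of a successfully finished task can be written as
\[
  \mathit{slowdown} = \frac{\sum_{i=1}^{n} t_i}{t_n},
\]
since the last attempt, of duration $t_n$, is the successful one by definition.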

The analysis requires computing the mean task slowdown for each task priority
value, as well as the percentage of tasks with a successful termination per
priority. The query therefore needs to compute the execution time of each
execution attempt of each task, determine whether each task terminated
successfully, and finally combine this data to compute the slowdown values, the
mean slowdown per priority and ultimately the final table found in
figure~\ref{fig:taskslowdown}.

\begin{figure}[h]
\centering
\includegraphics[width=.75\textwidth]{figures/task_slowdown_query.png}
\caption{Diagram of the script used for the ``task slowdown''
query.}\label{fig:taskslowdownquery}
\end{figure}

Figure~\ref{fig:taskslowdownquery} shows a schematic representation of the query
structure.

The query starts by reading the \texttt{instance\_events} table, which contains
(among other data) all task event logs with their properties, event types and
timestamps. As already explained in the previous section, the logical table is
actually stored as several Gzip-compressed JSONL shards. This is very useful
for processing purposes, since Spark is able to parse and load each shard into
memory in parallel, i.e. using all processing cores on the server used to run
the queries.
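
The read itself could look like the following sketch; the
\texttt{instance\_events/} path is a placeholder, since the actual scripts read
the shards as laid out in the 2019 trace distribution.

\begin{verbatim}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("task-slowdown-sketch").getOrCreate()

# Spark expands the glob to all Gzip-compressed JSONL shards and parses
# them in parallel, one or more shards per available core.
events = spark.read.json("instance_events/*.json.gz")

# Each record carries (among other fields) an event type, a timestamp
# and identifiers of the task it refers to.
events.printSchema()
\end{verbatim}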

After loading the data, a selection and a projection are performed in the
preparation phase so as to ``clean up'' the records and fields that are not
needed, leaving only the useful information to feed into the ``group by''
phase. In this query, the selection step removes all records that do not
represent task events or that contain an unknown task ID or a null event
timestamp. In the 2019 traces it is quite common to find incomplete records,
since the logging process is unable to capture the sheer amount of events
generated by all jobs in an exact and deterministic fashion.
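
A hedged sketch of this preparation phase, building on the \texttt{events}
DataFrame from the previous sketch, is shown below. The column names
(\texttt{collection\_id}, \texttt{instance\_index}, \texttt{type},
\texttt{time}, \texttt{priority}) are assumptions made for illustration and may
not match the actual field names used in the scripts.

\begin{verbatim}
from pyspark.sql import functions as F

# Selection: keep only well-formed task events.
task_events = events.filter(
    F.col("time").isNotNull() &
    F.col("collection_id").isNotNull() &
    F.col("instance_index").isNotNull())

# Projection: keep only the fields needed by the "group by" phase.
prepared = task_events.select(
    "collection_id", "instance_index", "type", "time", "priority")
\end{verbatim}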

Then, after the preparation stage is complete, the task event records are
grouped into several bins, one per task ID\@. Through this operation, the
collection of unsorted task events is rearranged into groups of task events
that all relate to a single task.
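
With the RDD API this grouping step could be sketched as follows, continuing
from the hypothetical \texttt{prepared} DataFrame of the previous sketch:

\begin{verbatim}
# Key each event by its task identity and collect the events of each task
# into one group, regardless of their original order.
task_groups = (prepared.rdd
    .map(lambda e: ((e.collection_id, e.instance_index), e))
    .groupByKey())
\end{verbatim}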

The obtained collections of task events are then sorted by timestamp and
processed to compute intermediate data about execution attempt times and task
termination counts. After the task events are sorted, the script iterates over
the events in chronological order, storing the duration of each execution
attempt and registering every execution termination type by checking the event
type field. The task termination is then taken to be the last execution
termination type, following the definition originally given in the 2015
Ros\'a et al. DSN paper.
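
The per-task processing can be pictured with the following simplified sketch.
The event-type names (\texttt{SCHEDULE}, \texttt{EVICT}, \texttt{FAIL},
\texttt{KILL}, \texttt{FINISH}) and the assumption that an attempt runs from a
\texttt{SCHEDULE} event to the next termination event are illustrative and do
not necessarily match the actual implementation.

\begin{verbatim}
TERMINATIONS = {"EVICT", "FAIL", "KILL", "FINISH"}

def process_task(events):
    """Given all events of one task, return (last_termination, attempt_times)."""
    events = sorted(events, key=lambda e: e.time)
    attempt_times, last_termination, start = [], None, None
    for e in events:
        if e.type == "SCHEDULE":
            start = e.time                      # an execution attempt begins
        elif e.type in TERMINATIONS:
            if start is not None:
                attempt_times.append(e.time - start)
                start = None
            last_termination = e.type           # termination of the latest attempt
    return last_termination, attempt_times
\end{verbatim}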

If the task termination is determined to be unsuccessful, the tally counter of
unsuccessful terminations for the matching task priority is increased.
Otherwise, all the execution attempt time deltas of the task are returned.
Tallies and time deltas are then saved in an intermediate file for fine-grained
processing.

Finally, \texttt{task\_slowdown\_table.py} processes these intermediate results
to compute the percentage of successfully terminated tasks per priority and the
slowdown values given the previously computed execution attempt time deltas.
The mean of the computed slowdown values is then taken per priority, resulting
in the clear and concise tables found in figure~\ref{fig:taskslowdown}.
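
A minimal sketch of this final aggregation step is given below, assuming the
intermediate file has already been parsed into per-priority lists of
attempt-time lists and per-priority tallies of unsuccessful tasks; all names
are illustrative.

\begin{verbatim}
def slowdown(attempt_times):
    """Total execution time over the time of the last (successful) attempt."""
    return sum(attempt_times) / attempt_times[-1]

def summarize(per_priority_attempts, per_priority_tallies):
    """Return {priority: (mean slowdown, % of successful tasks)}."""
    table = {}
    for prio, attempts in per_priority_attempts.items():
        slowdowns = [slowdown(a) for a in attempts if a]
        finished = len(slowdowns)
        total = finished + per_priority_tallies.get(prio, 0)
        mean_slowdown = sum(slowdowns) / finished if finished else float("nan")
        table[prio] = (mean_slowdown,
                       100.0 * finished / total if total else 0.0)
    return table
\end{verbatim}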

\hypertarget{ad-hoc-presentation-of-some-analysis-scripts}{%
\subsection{Ad-Hoc presentation of some analysis

@@ -599,3 +665,4 @@ developments}\label{conclusions-and-future-work-or-possible-developments}}
\textbf{TBD}

\end{document}

% vim: set ts=2 sw=2 et tw=80: