Figure~\ref{fig:eventTypes} shows the expected transitions between event
types.

\begin{figure}[h]
\centering
\resizebox{\textwidth}{!}{%
\includegraphics{./figures/event_types.png}}
Sometimes intermediate results are saved in Spark's Parquet format, so that
expensive computations can be performed once beforehand and their results
reused by later queries.

\subsection{Query script design}

In this section we illustrate the general complexity behind the implementation
of the query scripts by explaining some sample scripts in detail, so as to
better appreciate their behaviour.

\subsubsection{The ``task slowdown'' query script}

One example of an analysis script of average complexity and fairly
straightforward structure is the pair of scripts \texttt{task\_slowdown.py} and
\texttt{task\_slowdown\_table.py}, used to compute the ``task slowdown'' tables
(namely the tables in figure~\ref{fig:taskslowdown}).

``Slowdown'' is a task-wise measure of wasted execution time for tasks with a
\texttt{FINISH} termination type. It is computed as the total execution time of
the task divided by the execution time actually needed to complete it
(i.e.\ the duration of the last execution attempt, which is successful by
definition).
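As a minimal sketch (illustrative only, not code from the actual scripts), the
slowdown of a single task given the durations of its execution attempts is:

```python
def task_slowdown(attempt_durations):
    """Slowdown of a successfully terminated task: total execution time
    across all attempts divided by the duration of the last attempt,
    which is the successful one by definition."""
    return sum(attempt_durations) / attempt_durations[-1]

# A task whose first two attempts failed after 40s and 20s and whose
# final attempt succeeded in 60s wasted half of its execution time:
task_slowdown([40, 20, 60])  # (40 + 20 + 60) / 60 = 2.0
```

A slowdown of 1 therefore means no execution time was wasted at all.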

The analysis requires computing the mean task slowdown for each task priority
value, as well as the percentage of successfully terminated tasks per priority.
The query therefore needs to compute the execution time of every execution
attempt of each task, determine whether each task terminated successfully, and
finally combine these data to compute the slowdown values, their means, and
ultimately the final table found in figure~\ref{fig:taskslowdown}.

\begin{figure}[h]
\centering
\includegraphics[width=.75\textwidth]{figures/task_slowdown_query.png}
\caption{Diagram of the script used for the ``task slowdown''
query.}\label{fig:taskslowdownquery}
\end{figure}

Figure~\ref{fig:taskslowdownquery} shows a schematic representation of the
query structure.

The query starts by reading the \texttt{instance\_events} table, which contains
(among other data) all task event logs with their properties, event types and
timestamps. As already explained in the previous section, the logical table is
actually stored as several Gzip-compressed JSONL shards. This is very useful
for processing purposes, since Spark is able to parse and load each shard into
memory in parallel, i.e.\ using all processing cores on the server used to run
the queries.
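The sharded layout can be illustrated with a plain-Python sketch (Spark
distributes one shard per executor core; threads are used here only to keep
the example self-contained, and the file pattern is hypothetical):

```python
import glob
import gzip
import json
from concurrent.futures import ThreadPoolExecutor

def parse_shard(path):
    """Parse one Gzip-compressed JSONL shard: one JSON record per line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def load_table(pattern):
    """Parse all shards of a logical table concurrently and concatenate
    the results, mimicking how Spark assigns one shard per worker."""
    with ThreadPoolExecutor() as pool:
        shards = pool.map(parse_shard, sorted(glob.glob(pattern)))
    return [record for shard in shards for record in shard]
```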

After loading the data, a selection and a projection are performed in the
preparation phase so as to ``clean up'' the records and drop the fields that
are not needed, leaving only useful information to feed into the ``group by''
phase. In this query, the selection phase removes all records that do not
represent task events, that contain an unknown task ID, or that have a null
event timestamp. In the 2019 traces it is quite common to find incomplete
records, since the logging process is unable to capture the sheer amount of
events generated by all jobs in an exact and deterministic fashion.
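A sketch of the preparation phase, assuming hypothetical field names rather
than the trace's exact schema:

```python
def prepare(records):
    """Selection and projection: drop records with an unknown task ID or
    a null timestamp, and keep only the fields used by the later phases.
    Field names here are illustrative assumptions, not the real schema."""
    for r in records:
        if r.get("task_id") is None or r.get("time") is None:
            continue  # incomplete record: discard it
        yield {"task_id": r["task_id"], "time": r["time"],
               "type": r["type"], "priority": r.get("priority")}
```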

Then, after the preparation stage is complete, the task event records are
grouped into bins, one per task ID\@. This operation rearranges the collection
of unsorted task events into groups, each containing all the events relating
to a single task.
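The binning step amounts to a group-by on the task ID, sketched here with a
hypothetical record layout:

```python
from collections import defaultdict

def group_by_task(events):
    """Rearrange the unsorted stream of task events into bins, one per
    task ID, so that each bin holds all events of a single task."""
    bins = defaultdict(list)
    for e in events:
        bins[e["task_id"]].append(e)
    return bins
```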

The obtained collections of task events are then sorted by timestamp and
processed to compute intermediate data on execution attempt times and task
termination counts. After the task events are sorted, the script iterates over
them in chronological order, storing the duration of each execution attempt
and registering every execution termination type by checking the event type
field. The task termination type is then taken to be the termination type of
the last execution attempt, following the definition originally given in the
2015 Ros\'a et al.\ DSN paper.
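The chronological scan can be sketched as follows; the event-type names
(SCHEDULE, FINISH, FAIL, KILL, EVICT) are assumptions modelled on the trace's
vocabulary, not necessarily the exact set handled by the script:

```python
def attempt_times(task_events):
    """Sort one task's events chronologically, measure every execution
    attempt (from its SCHEDULE event to its termination event) and
    collect the termination types; the task's overall termination is
    that of the last attempt.  Event-type names are assumptions."""
    deltas, terminations, start = [], [], None
    for e in sorted(task_events, key=lambda e: e["time"]):
        if e["type"] == "SCHEDULE":
            start = e["time"]
        elif e["type"] in {"FINISH", "FAIL", "KILL", "EVICT"} and start is not None:
            deltas.append(e["time"] - start)
            terminations.append(e["type"])
            start = None
    last_termination = terminations[-1] if terminations else None
    return deltas, last_termination
```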

If the task termination is determined to be unsuccessful, the tally counter of
task terminations for the matching task priority is increased. Otherwise, all
the task's execution attempt time deltas are returned. Tallies and time deltas
are then saved in an intermediate file for later fine-grained processing.

Finally, \texttt{task\_slowdown\_table.py} processes these intermediate results
to compute the percentage of successful tasks per priority and the slowdown
values from the previously computed execution attempt time deltas. The mean of
the computed slowdown values is then taken, resulting in the clear and concise
tables found in figure~\ref{fig:taskslowdown}.
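A sketch of this final aggregation; the intermediate layout (a priority mapped
to success and failure counts plus slowdown values) is a hypothetical
simplification of the actual intermediate file:

```python
from statistics import mean

def slowdown_table(per_priority):
    """Build the final table rows: `per_priority` maps a priority to
    (successful_count, unsuccessful_count, slowdown_values).  This
    layout is an assumption, not the scripts' actual file format."""
    table = {}
    for priority, (ok, ko, slowdowns) in per_priority.items():
        table[priority] = {
            "success_pct": 100.0 * ok / (ok + ko),
            "mean_slowdown": mean(slowdowns) if slowdowns else None,
        }
    return table
```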

\hypertarget{ad-hoc-presentation-of-some-analysis-scripts}{%
\subsection{Ad-Hoc presentation of some analysis
\textbf{TBD}

\end{document}
% vim: set ts=2 sw=2 et tw=80: