report: done explanation for task slowdown
This commit is contained in:
parent
f4dff80388
commit
18ce409cde
4 changed files with 74 additions and 7 deletions

@@ -130,7 +130,7 @@ following values:
Figure~\ref{fig:eventTypes} shows the expected transitions between event
types.

\begin{figure}[h]
\centering
\resizebox{\textwidth}{!}{%
\includegraphics{./figures/event_types.png}}

@@ -311,17 +311,83 @@ and performing the desired computation on the obtained chronological event log.

Sometimes intermediate results are saved in Spark's Parquet format, so that
expensive intermediate computations can be performed once beforehand and their
results reused by later queries.
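
As an illustration, the snippet below sketches how such an intermediate result
could be cached and reloaded with PySpark. The DataFrame contents, the column
names and the output path are hypothetical and not taken from the actual query
scripts:

\begin{verbatim}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-cache-sketch").getOrCreate()

# Stand-in for an intermediate result computed by one of the query scripts.
prepared_events = spark.createDataFrame(
    [(1, "FINISH", 1000), (2, "KILL", 2000)],
    ["task_id", "type", "time"])

# Persist the intermediate result once...
prepared_events.write.mode("overwrite").parquet("intermediate/events.parquet")

# ...and reload it from later queries without recomputing it.
cached = spark.read.parquet("intermediate/events.parquet")
\end{verbatim}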

\subsection{Query script design}

In this section we illustrate the general complexity behind the implementation
of the query scripts by explaining some sample scripts in detail, so that their
behaviour can be better appreciated.

\subsubsection{The ``task slowdown'' query script}

One example of an analysis script with average complexity and a fairly
straightforward structure is the pair of scripts \texttt{task\_slowdown.py} and
\texttt{task\_slowdown\_table.py}, which are used to compute the ``task
slowdown'' tables (namely the tables in figure~\ref{fig:taskslowdown}).

``Slowdown'' is a task-wise measure of wasted execution time for tasks with a
\texttt{FINISH} termination type. It is computed as the total execution time of
the task divided by the execution time actually needed to complete the task
(i.e. the total time of the last execution attempt, successful by definition).
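
In symbols, denoting by $t_i$ the duration of the $i$-th of $n$ execution
attempts of a task (notation introduced here only for convenience), the
slowdown of a successfully finished task can be written as
\[
  \mathit{slowdown} = \frac{\sum_{i=1}^{n} t_i}{t_n},
\]
since the last attempt, of duration $t_n$, is the successful one by definition.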

The analysis requires computing the mean task slowdown for each task priority
value, as well as the percentage of tasks with a successful termination per
priority. The query therefore needs to compute the execution time of each
execution attempt of each task, determine whether each task terminated
successfully, and finally combine this data to compute the slowdown values, the
mean slowdown per priority and ultimately the final table found in
figure~\ref{fig:taskslowdown}.

\begin{figure}[h]
\centering
\includegraphics[width=.75\textwidth]{figures/task_slowdown_query.png}
\caption{Diagram of the script used for the ``task slowdown''
query.}\label{fig:taskslowdownquery}
\end{figure}

Figure~\ref{fig:taskslowdownquery} shows a schematic representation of the query
structure.

The query starts by reading the \texttt{instance\_events} table, which contains
(among other data) all task event logs with their properties, event types and
timestamps. As already explained in the previous section, the logical table is
actually stored as several Gzip-compressed JSONL shards. This is very useful
for processing purposes, since Spark is able to parse and load each shard into
memory in parallel, i.e. using all processing cores on the server used to run
the queries.
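
The read itself could look like the following sketch; the
\texttt{instance\_events/} path is a placeholder, since the actual scripts read
the shards as laid out in the 2019 trace distribution.

\begin{verbatim}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("task-slowdown-sketch").getOrCreate()

# Spark expands the glob to all Gzip-compressed JSONL shards and parses
# them in parallel, one or more shards per available core.
events = spark.read.json("instance_events/*.json.gz")

# Each record carries (among other fields) an event type, a timestamp
# and identifiers of the task it refers to.
events.printSchema()
\end{verbatim}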

After loading the data, a selection and a projection are performed in the
preparation phase so as to ``clean up'' the records and fields that are not
needed, leaving only the useful information to feed into the ``group by''
phase. In this query, the selection step removes all records that do not
represent task events or that contain an unknown task ID or a null event
timestamp. In the 2019 traces it is quite common to find incomplete records,
since the logging process is unable to capture the sheer amount of events
generated by all jobs in an exact and deterministic fashion.
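
A hedged sketch of this preparation phase, building on the \texttt{events}
DataFrame from the previous sketch, is shown below. The column names
(\texttt{collection\_id}, \texttt{instance\_index}, \texttt{type},
\texttt{time}, \texttt{priority}) are assumptions made for illustration and may
not match the actual field names used in the scripts.

\begin{verbatim}
from pyspark.sql import functions as F

# Selection: keep only well-formed task events.
task_events = events.filter(
    F.col("time").isNotNull() &
    F.col("collection_id").isNotNull() &
    F.col("instance_index").isNotNull())

# Projection: keep only the fields needed by the "group by" phase.
prepared = task_events.select(
    "collection_id", "instance_index", "type", "time", "priority")
\end{verbatim}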

Then, after the preparation stage is complete, the task event records are
grouped into several bins, one per task ID\@. Through this operation, the
collection of unsorted task events is rearranged into groups of task events
that all relate to a single task.
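
With the RDD API this grouping step could be sketched as follows, continuing
from the hypothetical \texttt{prepared} DataFrame of the previous sketch:

\begin{verbatim}
# Key each event by its task identity and collect the events of each task
# into one group, regardless of their original order.
task_groups = (prepared.rdd
    .map(lambda e: ((e.collection_id, e.instance_index), e))
    .groupByKey())
\end{verbatim}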

The obtained collections of task events are then sorted by timestamp and
processed to compute intermediate data about execution attempt times and task
termination counts. After the task events are sorted, the script iterates over
the events in chronological order, storing the duration of each execution
attempt and registering every execution termination type by checking the event
type field. The task termination is then taken to be the last execution
termination type, following the definition originally given in the 2015
Ros\'a et al. DSN paper.
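
The per-task processing can be pictured with the following simplified sketch.
The event-type names (\texttt{SCHEDULE}, \texttt{EVICT}, \texttt{FAIL},
\texttt{KILL}, \texttt{FINISH}) and the assumption that an attempt runs from a
\texttt{SCHEDULE} event to the next termination event are illustrative and do
not necessarily match the actual implementation.

\begin{verbatim}
TERMINATIONS = {"EVICT", "FAIL", "KILL", "FINISH"}

def process_task(events):
    """Given all events of one task, return (last_termination, attempt_times)."""
    events = sorted(events, key=lambda e: e.time)
    attempt_times, last_termination, start = [], None, None
    for e in events:
        if e.type == "SCHEDULE":
            start = e.time                      # an execution attempt begins
        elif e.type in TERMINATIONS:
            if start is not None:
                attempt_times.append(e.time - start)
                start = None
            last_termination = e.type           # termination of the latest attempt
    return last_termination, attempt_times
\end{verbatim}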

If the task termination is determined to be unsuccessful, the tally counter of
unsuccessful terminations for the matching task priority is increased.
Otherwise, all the execution attempt time deltas of the task are returned.
Tallies and time deltas are then saved in an intermediate file for fine-grained
processing.

Finally, \texttt{task\_slowdown\_table.py} processes these intermediate results
to compute the percentage of successfully terminated tasks per priority and the
slowdown values given the previously computed execution attempt time deltas.
The mean of the computed slowdown values is then taken per priority, resulting
in the clear and concise tables found in figure~\ref{fig:taskslowdown}.
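
A minimal sketch of this final aggregation step is given below, assuming the
intermediate file has already been parsed into per-priority lists of
attempt-time lists and per-priority tallies of unsuccessful tasks; all names
are illustrative.

\begin{verbatim}
def slowdown(attempt_times):
    """Total execution time over the time of the last (successful) attempt."""
    return sum(attempt_times) / attempt_times[-1]

def summarize(per_priority_attempts, per_priority_tallies):
    """Return {priority: (mean slowdown, % of successful tasks)}."""
    table = {}
    for prio, attempts in per_priority_attempts.items():
        slowdowns = [slowdown(a) for a in attempts if a]
        finished = len(slowdowns)
        total = finished + per_priority_tallies.get(prio, 0)
        mean_slowdown = sum(slowdowns) / finished if finished else float("nan")
        table[prio] = (mean_slowdown,
                       100.0 * finished / total if total else 0.0)
    return table
\end{verbatim}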

\hypertarget{ad-hoc-presentation-of-some-analysis-scripts}{%
\subsection{Ad-Hoc presentation of some analysis

@@ -599,3 +665,4 @@ developments}\label{conclusions-and-future-work-or-possible-developments}}
\textbf{TBD}

\end{document}

% vim: set ts=2 sw=2 et tw=80: