report work
parent cfd02f26f3
commit 53ce702157
4 changed files with 138 additions and 118 deletions
This section aims to use some of the techniques used in section IV of
the Ros\`a et al.\ paper\cite{dsn-paper} to find patterns and interdependencies
between task and job events by gathering statistics on those events.

\subsection{Unsuccessful Task Event Patterns}\label{tabIII-section}

\input{figures/table_iii}

In this analysis we compute the distribution of task-level termination events
by type, namely \texttt{EVICT}, \texttt{FAIL}, \texttt{FINISH} and \texttt{KILL}
events, and the conditional probability of a task successfully terminating
given the number of events of each type that occurred during its execution.

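The per-type statistics summarised in figure~\ref{fig:tableIII} can be written
down explicitly. The following is only a sketch under our own assumptions: tasks
are grouped by the type $u$ of their final termination event, $T_u$ denotes the
set of such tasks, and $e_t(\tau)$ is the number of termination events of type
$t \in \{\mathtt{EVICT}, \mathtt{FAIL}, \mathtt{FINISH}, \mathtt{KILL}\}$
recorded for task $\tau$ (the notation $T_u$, $e_t$ and $E$ is ours). The
per-type means and the overall mean of a row would then be
\[
  \bar{e}_t^{(u)} = \frac{1}{|T_u|} \sum_{\tau \in T_u} e_t(\tau),
  \qquad
  \bar{e}^{(u)} = \frac{1}{|T_u|} \sum_{\tau \in T_u} E(\tau)
  = \sum_{t} \bar{e}_t^{(u)},
  \qquad
  E(\tau) = \sum_{t} e_t(\tau),
\]
with the 95th percentile computed over the totals $E(\tau)$, $\tau \in T_u$.
Percentile definitions differ between tools (nearest-rank versus interpolated),
so the exact convention behind the tables is not specified by this sketch.
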
A comparison of the termination event distribution between the 2011 and 2019
traces is shown in figure~\ref{fig:tableIII}. Additionally, a cluster-by-cluster

corresponding task to terminate in an unsuccessful way: a task with no
\texttt{KILL} events have 0.02\%, 0.20\%, 0.44\%, 0.04\%, and
0.07\% probabilities of success respectively. The same effect can be observed,
albeit in a less drastic fashion, for the \texttt{EVICT} and \texttt{FAIL}
curves. The \texttt{EVICT} curve has 19.70\%, 15.94\%, 1.94\%, 1.67\%, 0.35\%
and 0.00\% success probabilities for tasks with 0 to 5 \texttt{EVICT} events,
respectively. The \texttt{FAIL} curve instead has 18.55\%, 1.79\%, 14.49\%,
2.08\%, 2.40\%, and 1.29\% success probabilities over the same range. The
aggregated success probability curves are shown in figure~\ref{fig:figureV}.

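One way to formalise the success probabilities discussed above is as an
empirical conditional probability over the set $S$ of all tasks. This is a
sketch under our own assumptions: a task is taken to be successful when its
last termination event is \texttt{FINISH}, and $S$, $e_t(\tau)$ (the number of
events of type $t$ recorded for task $\tau$) and $\mathrm{last}(\tau)$ are
notation introduced only here:
\[
  P\bigl(\mathrm{success} \mid e_t = k\bigr)
  = \frac{\#\{\tau \in S : e_t(\tau) = k \wedge \mathrm{last}(\tau) = \mathtt{FINISH}\}}
         {\#\{\tau \in S : e_t(\tau) = k\}}.
\]
Under this reading, each curve fixes the event type $t$ and varies $k$; for
instance, the first value of the \texttt{EVICT} curve (19.70\%) would
correspond to $t = \mathtt{EVICT}$ and $k = 0$.
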
Considering cluster-by-cluster behaviour in the 2019 traces (as shown in
figure~\ref{fig:figureV-csts}), some clusters (namely A, F, and H) behave quite
similarly to the aggregated plot, while others show strongly oscillating
success probability curves for \texttt{EVICT} and \texttt{FINISH} events.
\texttt{KILL} behaviour is instead homogeneous even on a single-cluster basis.

\textbf{Observations}:

\begin{itemize}
\item
Behaviour varies considerably from cluster to cluster.
\item
Unlike in 2011, no simple conclusion can be drawn on the correlation
between success probability and the number of events of a specific type.
\item
Clusters B, C and D in particular have very irregular curves, whose values
change considerably between adjacent event counts. This may be due to an
uneven distribution of the number of events in the traces.
\end{itemize}
\subsection{Unsuccessful Job Event Patterns}
\input{figures/table_iv}

This analysis uses techniques very similar to the ones used in
section~\ref{tabIII-section}, but focuses on the job level instead. The aim is
to better understand the relationship between tasks and jobs, and how
task-level termination events can influence the termination state of a job.

\textbf{Observations}:

\begin{itemize}
\item
The mean number of tasks per job is generally lower than in the 2011
traces, with \texttt{FINISH}ed jobs being the only exception
\item
Cluster A has no evicted jobs
\item
The mean number of task termination events per job is lower than in the
2011 traces
\end{itemize}

A comparison of the analyzed parameters between the 2011 and 2019
traces is shown in figure~\ref{fig:tableIV}. Additionally, a cluster-by-cluster
breakdown of the same data for the 2019 traces is shown in
figure~\ref{fig:tableIV-csts}.

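At the job level, the entries of figure~\ref{fig:tableIV} can be read as
follows. This is a sketch with notation of our own, assuming that each job is
assigned to the row matching its final termination event: let $J_u$ be the set
of jobs terminated by an event of type $u$, $n(j)$ the number of tasks of job
$j$ (writing $\tau \in j$ for those tasks), and $e_t(\tau)$ the number of
termination events of type $t$ recorded for task $\tau$. The first column then
reports $\bar{n}^{(u)}$, with the 95th percentile of $n(j)$ over $J_u$ in
parentheses, and the remaining columns report $\bar{e}_t^{(u)}$ for each type
$t$:
\[
  \bar{n}^{(u)} = \frac{1}{|J_u|} \sum_{j \in J_u} n(j),
  \qquad
  \bar{e}_t^{(u)} = \frac{1}{|J_u|} \sum_{j \in J_u} \sum_{\tau \in j} e_t(\tau).
\]
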
Considering the distribution of the number of tasks in a job, the 2019 traces
show a decrease in the mean (e.g.\ for \texttt{FAIL}ed jobs, from a mean of
60.5 tasks per job in 2011 to 43.126 in 2019) and a mixed trend for the 95th
percentile (e.g.\ for \texttt{FAIL}ed jobs it rose from 110 to 200, while for
\texttt{KILL}ed jobs it decreased from 400 to 178).

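As a worked example based only on the aggregated values reported above, the
relative changes between 2011 and 2019 are
\[
  \frac{43.126 - 60.5}{60.5} \approx -28.7\%, \qquad
  \frac{200 - 110}{110} \approx +81.8\%, \qquad
  \frac{178 - 400}{400} = -55.5\%
\]
for the mean number of tasks in \texttt{FAIL}ed jobs, the 95th percentile for
\texttt{FAIL}ed jobs, and the 95th percentile for \texttt{KILL}ed jobs
respectively. The percentile thus moves in opposite directions depending on the
job type, whereas the mean decreases in both cases (for \texttt{KILL}ed jobs,
from 86.8 to 53.919, i.e.\ about $-37.9\%$).
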
Considering instead the distribution of the number of task termination events,
the 2019 traces show values generally one or two orders of magnitude below the
2011 ones. While the behaviour of \texttt{EVICT}ed jobs stays the same,
\texttt{FAIL}ed and \texttt{KILL}ed jobs show a dramatic change in the event
distribution, with \texttt{KILL} becoming the most common task event type, at
a mean of 12.833 and 11.337 task events per job respectively. Finally,
\texttt{FINISH}ed jobs show a different event distribution too, with
\texttt{FINISH} task events being the most common at 1.778 events per job in
the 2019 traces.

The cluster-by-cluster comparison in figure~\ref{fig:tableIV-csts} shows that
the number of tasks per job is generally distributed similarly to the
aggregated data, with only cluster H having remarkably low means and 95th
percentiles overall. Event-wise, the distributions for \texttt{EVICT}ed,
\texttt{FINISH}ed, and \texttt{KILL}ed jobs are again similar to the aggregated
one. For some clusters (namely B, C, and D), the mean numbers of \texttt{FAIL}
and \texttt{KILL} task events for \texttt{FINISH}ed jobs are almost the same.

\section{Analysis: Potential Causes of Unsuccessful Executions}

overall mean accompanied by the 95th percentile of all termination
events, followed by the mean number of events for each termination event
type.}\label{fig:tableIII-csts}
\end{figure}

92
report/figures/table_iv.tex
Normal file
\begin{figure}[p]
\begin{subfigure}{\textwidth}
\centering
\begin{tabular}{lrrrrr}
\toprule
\tableIVh%
\midrule
EVICT & 1 (1) & 1 & 0 & 0 & 0 \\
FAIL & 60.5 (110) & $139.0$ & $788.5$ & $49.2$ & $9.5$ \\
FINISH & 2.7 (1) & $0.4$ & $0.1$ & $5 \cdot 10^{-4}$ & $2.7$ \\
KILL & 86.8 (400) & $13.3$ & $20.9$ & $26.9$ & $62.7$ \\
\bottomrule
\end{tabular}
\caption{2011 data}
\vspace{0.5cm}
\end{subfigure}
\begin{subfigure}{\textwidth}
\centering
\begin{tabular}{lrrrrr}
\toprule
\tableIVh%
\midrule
EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
FAIL & 43.126 (200) & 0.114 & 2.300 & 0.981 & 12.833 \\
FINISH & 3.074 (2) & 0.005 & 0.153 & 1.778 & 0.014 \\
KILL & 53.919 (178) & 0.235 & 0.103 & 0.288 & 11.337 \\
\bottomrule
\end{tabular}
\caption{2019 data}
\end{subfigure}
\caption{Mean number of tasks and event distribution per job type in the 2011
and 2019 traces (all clusters aggregated). The tables show the mean and, in
parentheses, the 95th percentile of the number of tasks in a job, followed by
the mean job-wise total of task termination events of each type.}\label{fig:tableIV}
\end{figure}

\begin{figure}[p]
\tableIV{A}{
EVICT & -- & -- & -- & -- & -- \\
FAIL & 90.793 (499) & 0.695 & 0.684 & 0.086 & 1.850 \\
FINISH & 1.187 (1) & 0.005 & 0.001 & 1.073 & 0.024 \\
KILL & 16.533 (10) & 1.045 & 0.074 & 0.461 & 1.189 \\
}
\tableIV{B}{
EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
FAIL & 74.368 (374) & 2.003 & 1.994 & 0.267 & 4.944 \\
FINISH & 6.304 (10) & 0.022 & 0.008 & 2.349 & 0.013 \\
KILL & 69.853 (234) & 1.696 & 0.158 & 0.614 & 3.009 \\
}
\tableIV{C}{
EVICT & 1.000 (1) & 1.001 & 0.000 & 0.000 & 0.000 \\
FAIL & 41.982 (200) & 3.484 & 0.998 & 0.376 & 3.998 \\
FINISH & 1.991 (1) & 0.022 & 0.017 & 1.565 & 0.017 \\
KILL & 110.681 (652) & 0.627 & 0.059 & 0.656 & 2.267 \\
}
\tableIV{D}{
EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
FAIL & 43.356 (250) & 6.112 & 0.949 & 0.531 & 6.498 \\
FINISH & 2.109 (2) & 0.268 & 0.013 & 1.723 & 0.019 \\
KILL & 89.648 (283) & 1.013 & 0.054 & 0.283 & 3.256 \\
}
\tableIV{E}{
EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
FAIL & 23.081 (25) & 0.247 & 0.666 & 0.717 & 1.588 \\
FINISH & 7.776 (2) & 0.019 & 0.029 & 1.934 & 0.021 \\
KILL & 88.790 (309) & 0.706 & 0.029 & 0.461 & 7.572 \\
}
\tableIV{F}{
EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
FAIL & 17.161 (8) & 0.621 & 0.546 & 0.426 & 7.559 \\
FINISH & 2.941 (2) & 0.015 & 0.051 & 1.670 & 0.162 \\
KILL & 103.889 (361) & 0.183 & 0.064 & 0.417 & 5.824 \\
}
\tableIV{G}{
EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
FAIL & 51.835 (250) & 0.556 & 3.335 & 0.608 & 20.352 \\
FINISH & 8.519 (36) & 0.002 & 0.630 & 1.760 & 0.005 \\
KILL & 37.055 (100) & 5.687 & 0.065 & 0.080 & 19.166 \\
}
\tableIV{H}{
EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
FAIL & 20.504 (1) & 0.114 & 2.300 & 0.981 & 12.833 \\
FINISH & 4.278 (14) & 0.005 & 0.153 & 1.778 & 0.014 \\
KILL & 11.023 (3) & 0.235 & 0.103 & 0.288 & 11.337 \\
}
\caption{Mean number of tasks and event distribution per job type for each
cluster in the 2019 traces. The tables show the mean and, in parentheses, the
95th percentile of the number of tasks in a job, followed by the mean job-wise
total of task termination events of each type.}\label{fig:tableIV-csts}
\end{figure}