diff --git a/report/Claudio_Maggioni_report.pdf b/report/Claudio_Maggioni_report.pdf
index f029243c..c2a05749 100644
Binary files a/report/Claudio_Maggioni_report.pdf and b/report/Claudio_Maggioni_report.pdf differ
diff --git a/report/Claudio_Maggioni_report.tex b/report/Claudio_Maggioni_report.tex
index 936246a7..3386baec 100644
--- a/report/Claudio_Maggioni_report.tex
+++ b/report/Claudio_Maggioni_report.tex
@@ -620,13 +620,12 @@ This section aims to use some of the techniques used in section IV of the Ros\'a
 et al.\ paper\cite{dsn-paper} to find patterns and interdependencies between task
 and job events by gathering event statistics at those events.
 
-\subsection{Unsuccessful Task Event Patterns}
-\input{figures/table_iii} % has table III and table IV in it
+\subsection{Unsuccessful Task Event Patterns}\label{tabIII-section}
+\input{figures/table_iii}
 
 In this analysis we compute the distribution of termination events by type at
-the task-level events and the conditional probability of a task succesfully
-terminating given a number of \texttt{EVICT}, \texttt{FAIL} and \texttt{FINISH}
-termination events during the task execution.
+the task-level events, namely \texttt{EVICT}, \texttt{FAIL}, \texttt{FINISH}
+and \texttt{KILL} termination events.
 
 A comparison of the termination event distribution between the 2011 and 2019
 traces is shown in figure~\ref{fig:tableIII}. Additionally, a cluster-by-cluster
@@ -688,38 +687,54 @@ corresponding task to terminate in an unsuccessful way: a task with no
 \texttt{KILL} events have 0.02\%, 0.20\%, 0.44\%, 0.04\%, and 0.07\%
 probabilities of success respectively. The same effect can be observed, albeit
 in a less drastic fashion, for the \texttt{EVICT} and \texttt{FAIL}
-curves. The \texttt{EVICT} curve has for 0 to 5
+curves. The \texttt{EVICT} curve has for tasks with 0 to 5 evict events 19.70\%,
+15.94\%, 1.94\%, 1.67\%, 0.35\% and 0.00\% success probabilities respectively.
+The \texttt{FAIL} probability curve has instead 18.55\%, 1.79\%, 14.49\%,
+2.08\%, 2.40\%, and 1.29\% success probabilities for the same range.
 
-Refer to figure \ref{fig:figureV}.
+Considering cluster-to-cluster behaviour in the 2019 traces (as shown in
+figure~\ref{fig:figureV-csts}), some clusters show quite similar behaviour to
+the aggregated plot (namely clusters A, F, and H), while some other clusters
+show strongly oscillating probability distribution function curves for
+\texttt{EVICT} and \texttt{FINISH} events. \texttt{KILL} behaviour is instead
+homogeneous even on a single-cluster basis.
 
-\textbf{Observations}:
-
-\begin{itemize}
-\item
- Behaviour is very different from cluster to cluster
-\item
- There is no easy conclusion, unlike in 2011, on the correlation
- between succesful probability and \# of events of a specific type.
-\item
- Clusters B, C and D in particular have very unsmooth lines that vary a
- lot for small \# evts differences. This may be due to an uneven
- distribution of \# evts in the traces.
-\end{itemize}
 
 \subsection{Unsuccessful Job Event Patterns}
+\input{figures/table_iv}
 
-\textbf{Observations}:
+This analysis uses very similar techniques to the ones used in
+section~\ref{tabIII-section}, but focuses on the job level instead. The aim is
+to better understand the task-job level relationship and to understand how
+task-level termination events can influence the termination state of a job.
 
-\begin{itemize}
-\item
- Again the mean number of tasks is significantly higher than the 2011
- traces, indicating a higher complexity of workloads
-\item
- Cluster A has no evicted jobs
-\item
- The number of events is however lower than the event means in the 2011
- traces
-\end{itemize}
+A comparison of the analyzed parameters between the 2011 and 2019
+traces is shown in figure~\ref{fig:tableIV}. Additionally, a cluster-by-cluster
+breakdown of the same data for the 2019 traces is shown in
+figure~\ref{fig:tableIV-csts}.
+Considering the distribution of the number of tasks in a job, the 2019 traces
+show a decrease for the mean figure (e.g. for \texttt{FAIL}ed jobs, with a mean
+of 60.5 tasks per job in 2011 and a mean of 43.126 tasks per job in 2019) and a
+fluctuation of the 95-th percentile figure (e.g. for \texttt{FAIL}ed jobs it
+rose from 110 to 200, but for \texttt{KILL}ed jobs it decreased from 400 to 178).
+
+Considering the distribution of the number of task-wise termination events
+instead, the 2019 traces show values generally one or two orders of magnitude
+below the ones in 2011. While the behaviour of \texttt{EVICT}ed jobs stays the
+same, \texttt{FAIL}ed and \texttt{KILL}ed jobs show a dramatic difference in
+the event distribution, with \texttt{KILL} becoming the most frequent event
+task-wise with means of 12.833 and 11.337 task events per job respectively.
+Finally, the \texttt{FINISH}ed job category has a new event distribution too,
+with \texttt{FINISH} task events being the most frequent at 1.778 events per job
+in the 2019 traces.
+
+The cluster-by-cluster comparison in figure~\ref{fig:tableIV-csts} shows that
+the number of tasks per job is generally distributed similarly to the
+aggregated data, with only cluster H having a remarkably low mean and 95-th
+percentile overall. Event-wise, for \texttt{EVICT}ed, \texttt{FINISH}ed,
+and \texttt{KILL}ed jobs again the distributions are similar to the aggregated
+one. For some clusters (namely B, C, and D), the mean number of \texttt{FAIL} and
+\texttt{KILL} task events for \texttt{FINISH}ed jobs is almost the same.
 
 \section{Analysis: Potential Causes of Unsuccessful Executions}
 
diff --git a/report/figures/table_iii.tex b/report/figures/table_iii.tex
index d3403c0c..5b6a14fb 100644
--- a/report/figures/table_iii.tex
+++ b/report/figures/table_iii.tex
@@ -121,90 +121,3 @@ overall mean accompanied by the 95-th percentile of all termination events,
 followed by a mean of events per event type of each termination
 event.}\label{fig:tableIII-csts}
 \end{figure}
-
-\begin{figure}[p]
-\begin{subfigure}{\textwidth}
-\centering
-\begin{tabular}{lrrrrr}
-\toprule
-\tableIVh%
-\midrule
- EVICT & 0.989 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
- FAIL & 43.126 (200) & 0.114 & 2.300 & 0.981 & 12.833 \\
-FINISH & 3.074 (2) & 0.005 & 0.153 & 1.778 & 0.014 \\
- KILL & 53.919 (178) & 0.235 & 0.103 & 0.288 & 11.337 \\
-\bottomrule
-\end{tabular}
-\caption{2011 data}
-\vspace{0.5cm}
-\end{subfigure}
-\begin{subfigure}{\textwidth}
-\centering
-\begin{tabular}{lrrrrr}
-\toprule
-\tableIVh%
-\midrule
- EVICT & 0.989 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
- FAIL & 43.126 (200) & 0.114 & 2.300 & 0.981 & 12.833 \\
-FINISH & 3.074 (2) & 0.005 & 0.153 & 1.778 & 0.014 \\
- KILL & 53.919 (178) & 0.235 & 0.103 & 0.288 & 11.337 \\
-\bottomrule
-\end{tabular}
-\caption{2019 data}
-\end{subfigure}
-\caption{tbd}
-\end{figure}
-
-\begin{figure}[p]
-\tableIV{A}{
-EVICT & -- & -- & -- & -- & -- \\
- FAIL & 90.793 (499) & 0.695 & 0.684 & 0.086 & 1.850 \\
-FINISH & 1.187 (1) & 0.005 & 0.001 & 1.073 & 0.024 \\
- KILL & 16.533 (10) & 1.045 & 0.074 & 0.461 & 1.189 \\
-}
-\tableIV{B}{
- EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
- FAIL & 74.368 (374) & 2.003 & 1.994 & 0.267 & 4.944 \\
-FINISH & 6.304 (10) & 0.022 & 0.008 & 2.349 & 0.013 \\
- KILL & 69.853 (234) & 1.696 & 0.158 & 0.614 & 3.009 \\
-}
-\tableIV{C}{
- EVICT & 1.000 (1) & 1.001 & 0.000 & 0.000 & 0.000 \\
- FAIL & 41.982 (200) & 3.484 & 0.998 & 0.376 & 3.998 \\
-FINISH & 1.991 (1) & 0.022 & 0.017 & 1.565 & 0.017 \\
- KILL & 110.681 (652) & 0.627 & 0.059 & 0.656 & 2.267 \\
-}
-\tableIV{D}{
- EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
- FAIL & 43.356 (250) & 6.112 & 0.949 & 0.531 & 6.498 \\
-FINISH & 2.109 (2) & 0.268 & 0.013 & 1.723 & 0.019 \\
- KILL & 89.648 (283) & 1.013 & 0.054 & 0.283 & 3.256 \\
-}
-\tableIV{E}{
- EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
- FAIL & 23.081 (25) & 0.247 & 0.666 & 0.717 & 1.588 \\
-FINISH & 7.776 (2) & 0.019 & 0.029 & 1.934 & 0.021 \\
- KILL & 88.790 (309) & 0.706 & 0.029 & 0.461 & 7.572 \\
-}
-\tableIV{F}{
- EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
- FAIL & 17.161 (8) & 0.621 & 0.546 & 0.426 & 7.559 \\
-FINISH & 2.941 (2) & 0.015 & 0.051 & 1.670 & 0.162 \\
- KILL & 103.889 (361) & 0.183 & 0.064 & 0.417 & 5.824 \\
-}
-\tableIV{G}{
- EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
- FAIL & 51.835 (250) & 0.556 & 3.335 & 0.608 & 20.352 \\
-FINISH & 8.519 (36) & 0.002 & 0.630 & 1.760 & 0.005 \\
- KILL & 37.055 (100) & 5.687 & 0.065 & 0.080 & 19.166 \\
-}
-\tableIV{H}{
- EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
- FAIL & 20.504 (1) & 0.114 & 2.300 & 0.981 & 12.833 \\
-FINISH & 4.278 (14) & 0.005 & 0.153 & 1.778 & 0.014 \\
- KILL & 11.023 (3) & 0.235 & 0.103 & 0.288 & 11.337 \\
-}
- \caption{tbd}
-\end{figure}
-
-
diff --git a/report/figures/table_iv.tex b/report/figures/table_iv.tex
new file mode 100644
index 00000000..4f6669c8
--- /dev/null
+++ b/report/figures/table_iv.tex
@@ -0,0 +1,92 @@
+\begin{figure}[p]
+\begin{subfigure}{\textwidth}
+\centering
+\begin{tabular}{lrrrrr}
+\toprule
+\tableIVh%
+\midrule
+EVICT & 1 (1) & 1 & 0 & 0 & 0 \\
+FAIL & 60.5 (110) & $139.0$ & $788.5$ & $49.2$ & $9.5$ \\
+FINISH & 2.7 (1) & $0.4$ & $0.1$ & $5 \cdot 10^{-4}$ & $2.7$ \\
+KILL & 86.8 (400) & $13.3$ & $20.9$ & $26.9$ & $62.7$ \\
+\bottomrule
+\end{tabular}
+\caption{2011 data}
+\vspace{0.5cm}
+\end{subfigure}
+\begin{subfigure}{\textwidth}
+\centering
+\begin{tabular}{lrrrrr}
+\toprule
+\tableIVh%
+\midrule
+ EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
+ FAIL & 43.126 (200) & 0.114 & 2.300 & 0.981 & 12.833 \\
+FINISH & 3.074 (2) & 0.005 & 0.153 & 1.778 & 0.014 \\
+ KILL & 53.919 (178) & 0.235 & 0.103 & 0.288 & 11.337 \\
+\bottomrule
+\end{tabular}
+\caption{2019 data}
+\end{subfigure}
+\caption{Mean number of tasks and event distribution per job type for the
+ 2011 and 2019 (all clusters aggregated) traces. The tables show the
+ mean and 95-th percentile for the number of tasks in a job, and
+ additionally show the mean of the job-wise totals of task termination events.}\label{fig:tableIV}
+\end{figure}
+
+\begin{figure}[p]
+\tableIV{A}{
+EVICT & -- & -- & -- & -- & -- \\
+ FAIL & 90.793 (499) & 0.695 & 0.684 & 0.086 & 1.850 \\
+FINISH & 1.187 (1) & 0.005 & 0.001 & 1.073 & 0.024 \\
+ KILL & 16.533 (10) & 1.045 & 0.074 & 0.461 & 1.189 \\
+}
+\tableIV{B}{
+ EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
+ FAIL & 74.368 (374) & 2.003 & 1.994 & 0.267 & 4.944 \\
+FINISH & 6.304 (10) & 0.022 & 0.008 & 2.349 & 0.013 \\
+ KILL & 69.853 (234) & 1.696 & 0.158 & 0.614 & 3.009 \\
+}
+\tableIV{C}{
+ EVICT & 1.000 (1) & 1.001 & 0.000 & 0.000 & 0.000 \\
+ FAIL & 41.982 (200) & 3.484 & 0.998 & 0.376 & 3.998 \\
+FINISH & 1.991 (1) & 0.022 & 0.017 & 1.565 & 0.017 \\
+ KILL & 110.681 (652) & 0.627 & 0.059 & 0.656 & 2.267 \\
+}
+\tableIV{D}{
+ EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
+ FAIL & 43.356 (250) & 6.112 & 0.949 & 0.531 & 6.498 \\
+FINISH & 2.109 (2) & 0.268 & 0.013 & 1.723 & 0.019 \\
+ KILL & 89.648 (283) & 1.013 & 0.054 & 0.283 & 3.256 \\
+}
+\tableIV{E}{
+ EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
+ FAIL & 23.081 (25) & 0.247 & 0.666 & 0.717 & 1.588 \\
+FINISH & 7.776 (2) & 0.019 & 0.029 & 1.934 & 0.021 \\
+ KILL & 88.790 (309) & 0.706 & 0.029 & 0.461 & 7.572 \\
+}
+\tableIV{F}{
+ EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
+ FAIL & 17.161 (8) & 0.621 & 0.546 & 0.426 & 7.559 \\
+FINISH & 2.941 (2) & 0.015 & 0.051 & 1.670 & 0.162 \\
+ KILL & 103.889 (361) & 0.183 & 0.064 & 0.417 & 5.824 \\
+}
+\tableIV{G}{
+ EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
+ FAIL & 51.835 (250) & 0.556 & 3.335 & 0.608 & 20.352 \\
+FINISH & 8.519 (36) & 0.002 & 0.630 & 1.760 & 0.005 \\
+ KILL & 37.055 (100) & 5.687 & 0.065 & 0.080 & 19.166 \\
+}
+\tableIV{H}{
+ EVICT & 1.000 (1) & 1.000 & 0.000 & 0.000 & 0.000 \\
+ FAIL & 20.504 (1) & 0.114 & 2.300 & 0.981 & 12.833 \\
+FINISH & 4.278 (14) & 0.005 & 0.153 & 1.778 & 0.014 \\
+ KILL & 11.023 (3) & 0.235 & 0.103 & 0.288 & 11.337 \\
+}
+\caption{Mean number of tasks and event distribution per job type for each
+ cluster in the 2019 traces. The tables show the
+ mean and 95-th percentile for the number of tasks in a job, and
+ additionally show the mean of the job-wise totals of task termination events.}\label{fig:tableIV-csts}
+\end{figure}
+
+