diff --git a/report/Claudio_Maggioni_report.pdf b/report/Claudio_Maggioni_report.pdf index 1ddcd899..c2804306 100644 Binary files a/report/Claudio_Maggioni_report.pdf and b/report/Claudio_Maggioni_report.pdf differ diff --git a/report/Claudio_Maggioni_report.tex b/report/Claudio_Maggioni_report.tex index 56e239c4..27802b6a 100644 --- a/report/Claudio_Maggioni_report.tex +++ b/report/Claudio_Maggioni_report.tex @@ -37,7 +37,7 @@ \advisor[Universit\`a della Svizzera Italiana, Switzerland]{Prof.}{Walter}{Binder} \assistant[Universit\`a della Svizzera Italiana, -Switzerland]{Dr.}{Andrea}{Ros\'a} +Switzerland]{Dr.}{Andrea}{Ros\`a} \end{committee} \abstract{The thesis aims at comparing two different traces coming from large @@ -65,7 +65,7 @@ avoid wasting resources and avoid failures. In 2011 Google released a month long data trace of their own cluster management system~\cite{google-marso-11} \textit{Borg}, containing a lot of data regarding scheduling, priority management, and failures of a real production workload. -This data was the foundation of the 2015 Ros\'a et al.\ paper +This data was the foundation of the 2015 Ros\`a et al.\ paper \textit{Understanding the Dark Side of Big Data Clusters: An Analysis beyond Failures}~\cite{dsn-paper}, which in its many conclusions highlighted the need for better cluster management highlighting the high amount of failures found in @@ -116,7 +116,7 @@ exploiting the power of parallel computing, following most of the time a MapReduce-like structure. %\subsection{Contribution} -This project aims to repeat the analysis performed in 2015 DSN Ros\'a et al.\ +This project aims to repeat the analysis performed in 2015 DSN Ros\`a et al.\ paper~\cite{dsn-paper} to highlight similarities and differences in Google Borg workload and the behaviour and patterns of executions within it. Thanks to this analysis, we aim to understand even better the causes of failures and how to @@ -207,7 +207,7 @@ bugs~\cite{9}~\cite{10}~\cite{11}~\cite{12}. 
However, the community has not yet performed any research on the new Borg traces analysing unsuccessful executions, their possible causes, and the relationships between tasks and jobs. Therefore, the only current research in -this field is this very report, providing and update to the the 2015 Ros\'a et +this field is this very report, providing an update to the 2015 Ros\`a et al.\ paper~\cite{dsn-paper} focusing on the new trace. \section{Background}\label{sec3} @@ -517,7 +517,7 @@ task termination counts. After the task events are sorted, the script iterates over the events in chronological order, storing each execution attempt time and registering all execution termination types by checking the event type field. The task termination is then equal to the last execution termination type, -following the definition originally given in the 2015 Ros\'a et al. DSN paper. +following the definition originally given in the 2015 Ros\`a et al.\ DSN paper. If the task termination is determined to be unsuccessful, the tally counter of task terminations for the matching task property is increased. Otherwise, all @@ -533,7 +533,7 @@ in the clear and coincise tables found in Figure~\ref{fig:taskslowdown}. \section{Analysis: Performance Input of Unsuccessful Executions}\label{sec5} Our first investigation focuses on replicating the analysis done by the paper of -Ros\'a et al.\ paper~\cite{dsn-paper} regarding usage of machine time +Ros\`a et al.\ paper~\cite{dsn-paper} regarding usage of machine time and resources. In this section we perform several analyses focusing on how machine time and @@ -639,7 +639,7 @@ Refer to Figure~\ref{fig:taskslowdown} for a comparison between the 2011 and means are computed on a cluster-by-cluster basis for 2019 data in Figure~\ref{fig:taskslowdown-csts}.
-In 2015 Ros\'a et al.~\cite{dsn-paper} measured mean task slowdown per each task +In 2015 Ros\`a et al.~\cite{dsn-paper} measured mean task slowdown for each task priority value, which at the time were numeric values between 0 and 11. However, in 2019 traces, task priorities are given as a numeric value between 0 and 500. Therefore, to allow an easier comparison, mean task slowdown values are computed @@ -740,7 +740,7 @@ traces. \section{Analysis: Patterns of Task and Job Events}\label{sec6} This section aims to use some of the tecniques used in section IV of -the Ros\'a et al.\ paper~\cite{dsn-paper} to find patterns and interpendencies +the Ros\`a et al.\ paper~\cite{dsn-paper} to find patterns and interdependencies between task and job events by gathering event statistics at those events. In particular, Section~\ref{tabIII-section} explores how the success of a task is inter-correlated with its own event patterns, which @@ -873,15 +873,16 @@ Additionally, it is noteworthy that cluster A has no \texttt{EVICT}ed jobs. \section{Analysis: Potential Causes of Unsuccessful Executions}\label{sec7} -This section re-applies the tecniques used in Section V of the Ros\'a et al.\ -paper~\cite{dsn-paper} to find patterns and interpendencies -between task and job events by gathering event statistics at those events. In -particular, Section~\ref{tabIII-section} explores how tasks of the success of a -task is inter-correlated with its own event patterns, which -Section~\ref{figV-section} explores even further by computing task success -probabilities based on the number of task termination events of a specific type. -Finally, Section~\ref{tabIV-section} aims to find similar correlations, but at -the job level.
+This section re-applies the techniques used in Section V of the Ros\`a et al.\ +paper~\cite{dsn-paper} to find causes for unsuccessful executions related to +task-level parameters (analyzed in Section~\ref{fig7-section}), +usage of machine resources by tasks (analyzed in Section~\ref{fig8-section}), +and job-level parameters (analyzed in Section~\ref{fig9-section}). In all the +analyses we use the ``event rate'' metric, which represents the relative +percentage of termination type events over a certain task/job parameter +configuration. We compute this metric for all the possible terminations (i.e.\ +\texttt{EVICT}, \texttt{FAIL}, \texttt{FINISH} and \texttt{KILL}) in order to +find correlations with the various trace parameters. \subsection{Task Event Rates vs.\ Task Priority, Event Execution Time, and Machine Concurrency.}\label{fig7-section} \input{figures/figure_7} @@ -911,7 +912,7 @@ From this analysis we can make the following observations: Figure~\ref{fig:figureVII-b-csts}) for the 2019 traces are quite different than 2011 ones, here it seems there is a good correlation between short task execution times - and finish event rates, instead of the ``U shape'' curve found in the Ros\'a + and finish event rates, instead of the ``U shape'' curve found in the Ros\`a et al.\ 2015 DSN paper~\cite{dsn-paper}; \item The behaviour among different clusters for the event execution time diff --git a/report/usiinfbachelorproject.cls b/report/usiinfbachelorproject.cls index 821a930a..6fab459c 100644 --- a/report/usiinfbachelorproject.cls +++ b/report/usiinfbachelorproject.cls @@ -229,7 +229,7 @@ {\newpage } {\textwidth 5cm} -%%% put ToC, LoF, LoT and Index entries in the ToC use of \phantomsection is required for dealing with the hyperref package and depends on the nohyper option +%%% put ToC, LoF, LoT and Index entries in the ToC use of \phantomsection is required for dealing with the hyperref package and depends on the nohyper option %%% other useful packages @@ -241,7 +241,8
@@ \RequirePackage{amsmath} %%% switch on hyperref support \ifthenelse{\boolean{@hypermode}}{% -\RequirePackage[unicode,plainpages=false,pdfpagelabels,breaklinks]{hyperref} +\RequirePackage[svgnames]{xcolor} +\RequirePackage[colorlinks=true,linkcolor=Maroon,allcolors=Maroon,unicode,plainpages=false,pdfpagelabels,breaklinks]{hyperref} \RequirePackage[all]{hypcap} }{} @@ -256,7 +257,7 @@ \textsf{Advisor's approval}{} (\DTLforeach*[\DTLiseq{\type}{r}]{committee}% {\actitle=title,\first=first,\last=last,\type=type}{% - \DTLiffirstrow{}{, }\textsf{\print@blank{\actitle}\first \ \last}, \textsf{Dr. Andrea Ros\'a}):% + \DTLiffirstrow{}{, }\textsf{\print@blank{\actitle}\first \ \last}, \textsf{Dr. Andrea Ros\`a}):% \hspace{4cm} & \textsf{Date: } }