Merge branch 'master' of tea.maggioni.xyz:maggicl/bachelorThesis

report
2021-06-17 16:02:59 +02:00 · 2021-06-17 15:59:53 +02:00
2 changed files with 95 additions and 70 deletions
--- a/report/Claudio_Maggioni_report.pdf
+++ b/report/Claudio_Maggioni_report.pdf
--- a/report/Claudio_Maggioni_report.tex
+++ b/report/Claudio_Maggioni_report.tex
@@ -89,7 +89,20 @@ old analysis to understand even better the causes of failures and how to prevent
 them. Additionally, this report provides an overview of the data engineering
 techniques used to perform the queries and analyses on the 2019 traces.
-\section{State of the art}
+\subsection{Outline}
 The report is structured as follows. Section~\ref{sec2} contains information about the
 current state of the art for Google Borg cluster traces. Section~\ref{sec3}
 provides an overview including technical background information on the data to
 analyze and its storage format. Section~\ref{sec4} discusses the
 project requirements and the data science methods used to perform the analysis.
 Section~\ref{sec5}, Section~\ref{sec6} and Section~\ref{sec7} show the results
 obtained while analyzing, respectively, the performance input of
 unsuccessful executions, the patterns of task and job events, and the potential
 causes of unsuccessful executions. Finally, Section~\ref{sec8} contains the
 conclusions.
 \section{State of the art}\label{sec2}
 \begin{figure}[t]
 \begin{center}
 \begin{tabular}{cc}
@@ -142,7 +155,7 @@ machines.
 \input{figures/machine_configs}
-\section{Background information}
+\section{Background information}\label{sec3}
  \textit{Borg} is Google's own cluster management software able to run
  thousands of different jobs. Among the various cluster management services it
@@ -275,7 +288,7 @@ science technologies like Apache Spark were used to achieve efficient
 and parallelized computations. This approach is discussed in further
 detail in the following section.
-\section{Project Requirements and Analysis Methodology}
+\section{Project Requirements and Analysis Methodology}\label{sec4}
 The aim of this project is to repeat the analysis performed in 2015 on the
 dataset Google released in 2019 in order to find similarities and
@@ -460,7 +473,7 @@ computing slowdown values given the previously computed execution attempt time
 deltas. Finally, the mean of the slowdown values is calculated, resulting
 in the concise tables found in Figure~\ref{fig:taskslowdown}.
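The slowdown computation described above can be illustrated with a minimal pure-Python sketch; it is not the Spark code used for this report. It assumes that each task is given as a chronological list of execution-attempt durations (the time deltas) and that the slowdown of a task is the total time over all of its attempts divided by the duration of its last attempt. Both the input layout and this definition are assumptions made for illustration, and the function names \texttt{task\_slowdown} and \texttt{mean\_slowdown} are hypothetical.

```python
from statistics import mean


def task_slowdown(attempt_durations):
    """Slowdown of a single task.

    Assumed (hypothetical) definition: total time over all execution
    attempts divided by the duration of the last attempt.
    """
    if not attempt_durations:
        raise ValueError("task has no execution attempts")
    last = attempt_durations[-1]
    if last <= 0:
        raise ValueError("the last attempt must have a positive duration")
    return sum(attempt_durations) / last


def mean_slowdown(tasks):
    """Mean slowdown over tasks, each a chronological list of attempt durations."""
    return mean(task_slowdown(durations) for durations in tasks)


# Hypothetical data: one task that finished at its first attempt, and one
# task whose final attempt was preceded by two unsuccessful ones.
tasks = [[10.0], [5.0, 5.0, 10.0]]
print(mean_slowdown(tasks))  # 1.5
```

Under this assumed definition, a task that succeeds at its first attempt has a slowdown of 1, and every earlier unsuccessful attempt increases it.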
-\section{Analysis: Performance Input of Unsuccessful Executions}
+\section{Analysis: Performance Input of Unsuccessful Executions}\label{sec5}
 Our first investigation focuses on replicating the analysis done by
 Ros\'a et al.~\cite{dsn-paper} regarding usage of machine time
@@ -667,7 +680,7 @@ With more than 98\% of both CPU and memory resources used by
 non-successful tasks, it is clear that spatial resource waste is high in the 2019
 traces.
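The share of resources used by non-successful tasks reported above is a ratio of sums: the resources consumed by tasks that did not \texttt{FINISH} over the resources consumed by all tasks. A minimal sketch of that ratio in plain Python, assuming a hypothetical per-task record (\texttt{TaskUsage}) that already holds the task's final termination event and its aggregated CPU and memory usage; the record layout, the function name \texttt{unsuccessful\_share} and the example values are illustrative and not taken from the traces.

```python
from dataclasses import dataclass


@dataclass
class TaskUsage:
    # Hypothetical per-task record: final termination event and the
    # aggregated resource usage over all of the task's executions.
    final_event: str  # e.g. "FINISH", "FAIL", "KILL", "EVICT"
    cpu: float
    memory: float


def unsuccessful_share(tasks):
    """Return (CPU share, memory share) used by tasks that did not FINISH."""
    total_cpu = sum(t.cpu for t in tasks)
    total_mem = sum(t.memory for t in tasks)
    lost_cpu = sum(t.cpu for t in tasks if t.final_event != "FINISH")
    lost_mem = sum(t.memory for t in tasks if t.final_event != "FINISH")
    return lost_cpu / total_cpu, lost_mem / total_mem


# Illustrative values only.
tasks = [
    TaskUsage("FINISH", cpu=1.0, memory=2.0),
    TaskUsage("KILL", cpu=3.0, memory=2.0),
    TaskUsage("FAIL", cpu=4.0, memory=4.0),
]
print(unsuccessful_share(tasks))  # (0.875, 0.75)
```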
-\section{Analysis: Patterns of Task and Job Events}
+\section{Analysis: Patterns of Task and Job Events}\label{sec6}
 This section aims to apply some of the techniques used in Section IV of
 the Ros\'a et al.\ paper~\cite{dsn-paper} to find patterns and interdependencies
@@ -801,84 +814,96 @@ one. For some clusters (namely B, C, and D), the mean number of \texttt{FAIL} a
 \texttt{KILL} task events for \texttt{FINISH}ed jobs is almost the same.
 Additionally, it is noteworthy that cluster A has no \texttt{EVICT}ed jobs.
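The per-cluster means discussed above average, for every job termination type, the number of task events of each type. That grouping can be sketched in plain Python, assuming a hypothetical input in which each job is a pair made of its final termination event and the list of its tasks' termination events; the function name \texttt{mean\_task\_events\_by\_job\_outcome} and the example data are illustrative only.

```python
from collections import Counter, defaultdict

EVENT_TYPES = ("FAIL", "KILL", "EVICT", "FINISH")


def mean_task_events_by_job_outcome(jobs):
    """jobs: iterable of (job_final_event, [task_event, ...]) pairs (hypothetical layout).

    Returns {job_final_event: {task_event_type: mean count per job}}.
    """
    sums = defaultdict(Counter)  # per job outcome, total task events by type
    job_counts = Counter()       # number of jobs per job outcome
    for job_final, task_events in jobs:
        job_counts[job_final] += 1
        sums[job_final].update(task_events)
    # A Counter returns 0 for missing keys, so absent event types get a mean of 0.
    return {
        outcome: {ev: sums[outcome][ev] / n for ev in EVENT_TYPES}
        for outcome, n in job_counts.items()
    }


jobs = [
    ("FINISH", ["FAIL", "KILL", "FINISH"]),
    ("FINISH", ["FINISH", "FINISH"]),
    ("KILL", ["KILL", "KILL"]),
]
result = mean_task_events_by_job_outcome(jobs)
print(result["FINISH"]["FAIL"], result["FINISH"]["KILL"])  # 0.5 0.5
```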
-% \section{Analysis: Potential Causes of Unsuccessful Executions}
+\section{Analysis: Potential Causes of Unsuccessful Executions}\label{sec7}
-% The aim of this section is to analyze several task-level and job-level
+This section re-applies the techniques used in Section V of the Ros\'a et al.\
-% parameters in order to find correlations with the success of an execution. By
+paper~\cite{dsn-paper} to find patterns and interdependencies
-% using the techniques used in Section V of the Ros\'a et al.\
+between task and job events by gathering event statistics at those events. In
-% paper\cite{dsn-paper} we analyze
+particular, Section~\ref{tabIII-section} explores how the success of a
-% task events' metadata, the use of CPU and Memory resources at the task level,
+task is inter-correlated with its own event patterns, which
-% and job metadata respectively in Section~\ref{fig7-section},
+Section~\ref{figV-section} explores even further by computing task success
-% Section~\ref{fig8-section} and Section~\ref{fig9-section}.
+probabilities based on the number of task termination events of a specific type.
 Finally, Section~\ref{tabIV-section} aims to find similar correlations, but at
 the job level.
-% \subsection{Event rates vs.\ task priority, event execution time, and machine
+\section{Analysis: Potential Causes of Unsuccessful Executions}
 % concurrency.}\label{fig7-section}
-% \input{figures/figure_7}
+The aim of this section is to analyze several task-level and job-level
 parameters in order to find correlations with the success of an execution. By
 using the techniques described in Section V of the Ros\'a et al.\
 paper~\cite{dsn-paper}, we analyze
 task events' metadata, the use of CPU and memory resources at the task level,
 and job metadata in Section~\ref{fig7-section},
 Section~\ref{fig8-section} and Section~\ref{fig9-section}, respectively.
-% Refer to figures \ref{fig:figureVII-a}, \ref{fig:figureVII-b}, and
+\subsection{Event Rates vs.\ Task Priority, Event Execution Time, and Machine
-% \ref{fig:figureVII-c}.
+Concurrency}\label{fig7-section}
-% \textbf{Observations}:
+\input{figures/figure_7}
-% \begin{itemize}
+Refer to Figures~\ref{fig:figureVII-a}, \ref{fig:figureVII-b}, and
-% \item
+\ref{fig:figureVII-c}.
 %   No smooth curves in this figure either, unlike 2011 traces
 % \item
 %   The behaviour of curves for 7a (priority) is almost the opposite of
 %   2011, i.e. in-between priorities have higher kill rates while
 %   priorities at the extremum have lower kill rates. This could also be
 %   due to the inherent distribution of job terminations;
 % \item
 %   Event execution time curves are quite different than 2011, here it
 %   seems there is a good correlation between short task execution times
 %   and finish event rates, instead of the U shape curve in 2015 DSN
 % \item
 %   In figure \ref{fig:figureVII-b} cluster behaviour seems quite uniform
 % \item
 %   Machine concurrency seems to play little role in the event termination
 %   distribution, as for all concurrency factors the kill rate is at 90\%.
 % \end{itemize}
-% \subsection{Event Rates vs. Requested Resources, Resource Reservation, and
+\textbf{Observations}:
 % Resource Utilization}\label{fig8-section}
 % \input{figures/figure_8}
-% Refer to Figure~\ref{fig:figureVIII-a}, Figure~\ref{fig:figureVIII-a-csts}
+\begin{itemize}
-% Figure~\ref{fig:figureVIII-b}, Figure~\ref{fig:figureVIII-b-csts}
+\item
-% Figure~\ref{fig:figureVIII-c}, Figure~\ref{fig:figureVIII-c-csts}
+  No smooth curves appear in this figure either, unlike in the 2011 traces.
-% Figure~\ref{fig:figureVIII-d}, Figure~\ref{fig:figureVIII-d-csts}
+\item
-% Figure~\ref{fig:figureVIII-e}, Figure~\ref{fig:figureVIII-e-csts}
+  The behaviour of the curves in Figure~\ref{fig:figureVII-a} (priority) is
-% Figure~\ref{fig:figureVIII-f}, and Figure~\ref{fig:figureVIII-f-csts}.
+  almost the opposite of that in 2011, i.e.\ intermediate priorities have
  higher kill rates while priorities at the extremes have lower kill rates.
  This could also be due to the inherent distribution of job terminations.
 \item
  Event execution time curves are quite different from those of 2011: here
  there seems to be a good correlation between short task execution times
  and finish event rates, instead of the U-shaped curve in the 2015 DSN paper.
 \item
  In Figure~\ref{fig:figureVII-b}, cluster behaviour seems quite uniform.
 \item
  Machine concurrency seems to play a minor role in the event termination
  distribution, as the kill rate is at 90\% for all concurrency factors.
 \end{itemize}
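Each curve discussed above plots, for one bucket of a task-level parameter such as priority, the fraction of termination events of each type. A minimal sketch of that normalisation in plain Python, assuming hypothetical (bucket, event type) pairs as input; the function \texttt{event\_rates\_by\_bucket} and the example data are illustrative and do not reproduce the figures.

```python
from collections import Counter, defaultdict


def event_rates_by_bucket(events):
    """events: iterable of (bucket, event_type) pairs, e.g. (priority, "KILL").

    Returns {bucket: {event_type: fraction of that bucket's events}}.
    """
    per_bucket = defaultdict(Counter)
    for bucket, event_type in events:
        per_bucket[bucket][event_type] += 1
    rates = {}
    for bucket, counts in per_bucket.items():
        total = sum(counts.values())
        # Normalise counts so the rates of each bucket sum to 1.
        rates[bucket] = {ev: c / total for ev, c in counts.items()}
    return rates


# Illustrative (priority, termination event) pairs.
events = [(0, "FINISH"), (0, "KILL"), (100, "KILL"), (100, "KILL"),
          (100, "FAIL"), (100, "FINISH")]
rates = event_rates_by_bucket(events)
print(rates[100]["KILL"])  # 0.5
```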
-% \subsection{Job Rates vs. Job Size, Job Execution Time, and Machine Locality
+\subsection{Event Rates vs.\ Requested Resources, Resource Reservation, and
-% }\label{fig9-section}
+Resource Utilization}\label{fig8-section}
-% \input{figures/figure_9}
+\input{figures/figure_8}
-% Refer to figures \ref{fig:figureIX-a}, \ref{fig:figureIX-b}, and
+Refer to Figure~\ref{fig:figureVIII-a}, Figure~\ref{fig:figureVIII-a-csts},
-% \ref{fig:figureIX-c}.
+Figure~\ref{fig:figureVIII-b}, Figure~\ref{fig:figureVIII-b-csts},
 Figure~\ref{fig:figureVIII-c}, Figure~\ref{fig:figureVIII-c-csts},
 Figure~\ref{fig:figureVIII-d}, Figure~\ref{fig:figureVIII-d-csts},
 Figure~\ref{fig:figureVIII-e}, Figure~\ref{fig:figureVIII-e-csts},
 Figure~\ref{fig:figureVIII-f}, and Figure~\ref{fig:figureVIII-f-csts}.
-% \textbf{Observations}:
+\subsection{Job Rates vs.\ Job Size, Job Execution Time, and Machine Locality
 }\label{fig9-section}
 \input{figures/figure_9}
-% \begin{itemize}
+Refer to Figures~\ref{fig:figureIX-a}, \ref{fig:figureIX-b}, and
-% \item
+\ref{fig:figureIX-c}.
 %   Behaviour varies considerably between clusters
 % \item
 %   There are no ``smooth'' gradients in the various curves unlike in the
 %   2011 traces
 % \item
 %   Killed jobs have higher event rates in general, and overall dominate
 %   all event rates measures
 % \item
 %   There still seems to be a correlation between short execution job
 %   times and successful final termination, and likewise for kills and
 %   higher job terminations
 % \item
 %   Across all clusters, a machine locality factor of 1 seems to lead to
 %   the highest success event rate
 % \end{itemize}
-\section{Conclusions, Future Work and Possible Developments}
+\textbf{Observations}:
 \begin{itemize}
 \item
  Behaviour varies considerably between clusters.
 \item
  There are no ``smooth'' gradients in the various curves, unlike in the
  2011 traces.
 \item
  Killed jobs have higher event rates in general, and overall dominate
  all event rate measures.
 \item
  There still seems to be a correlation between short job execution
  times and successful final termination, and likewise between kills and
  higher job terminations.
 \item
  Across all clusters, a machine locality factor of 1 seems to lead to
  the highest success event rate.
 \end{itemize}
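Observations such as the correlation between short job execution times and successful termination require bucketing a continuous parameter before computing rates. The sketch below groups jobs into power-of-two duration buckets and computes the fraction of \texttt{FINISH}ed jobs per bucket. The bucketing scheme, the input layout (non-negative execution time in seconds paired with the job's final event) and the function name \texttt{success\_rate\_by\_duration} are assumptions for illustration, not necessarily what was used for the figures.

```python
from collections import defaultdict


def success_rate_by_duration(jobs):
    """jobs: iterable of (execution_time_seconds, final_event) pairs (hypothetical layout).

    Bucket k holds durations in [2**(k-1), 2**k) seconds; bucket 0 holds
    durations shorter than one second. Returns {bucket: fraction of FINISHed jobs}.
    """
    totals = defaultdict(int)
    finished = defaultdict(int)
    for duration, final_event in jobs:
        bucket = int(duration).bit_length()  # power-of-two bucket index
        totals[bucket] += 1
        if final_event == "FINISH":
            finished[bucket] += 1
    return {b: finished[b] / totals[b] for b in sorted(totals)}


# Illustrative jobs: (execution time in seconds, final job event).
jobs = [(30, "FINISH"), (45, "FINISH"), (40, "KILL"), (5000, "KILL")]
print(success_rate_by_duration(jobs))  # {5: 1.0, 6: 0.5, 13: 0.0}
```

Power-of-two buckets keep the number of bins small when durations span several orders of magnitude; a different bucketing would change the shape of the resulting curves.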
 \section{Conclusions, Future Work and Possible Developments}\label{sec8}
 \textbf{TBD}
 \newpage
Author	SHA1	Message	Date
Claudio Maggioni	d1ae92f239	Merge branch 'master' of tea.maggioni.xyz:maggicl/bachelorThesis	2021-06-17 16:02:59 +02:00
Claudio Maggioni	b58c2aaa52	report	2021-06-17 15:59:53 +02:00