report work

2021-05-31 16:38:58 +02:00 · 2021-05-31 16:38:58 +02:00 · 0c908dbeca
parent 3616c8d8ba
commit 0c908dbeca
6 changed files with 57 additions and 40 deletions
--- a/report/Claudio_Maggioni_report.pdf
+++ b/report/Claudio_Maggioni_report.pdf
--- a/report/Claudio_Maggioni_report.tex
+++ b/report/Claudio_Maggioni_report.tex
@@ -68,7 +68,7 @@ scheduling, priority management, and failures of a real production workload.
 This data was 2009
 This data was the foundation of the 2015 Ros\'a et al.\ paper
 \textit{Understanding the Dark Side of Big Data Clusters: An Analysis beyond
-Failures}\cite{vino-paper}, which in its many conclusions highlighted the need
+Failures}\cite{dsn-paper}, which in its many conclusions highlighted the need
 for better cluster management highlighting the high amount of failures found in
 the traces.

@@ -103,7 +103,7 @@ techniques used to perform the queries and analyses on the 2019 traces.

 In 2015, Dr.~Andrea Rosà et al.\ published a
 research paper titled \textit{Understanding the Dark Side of Big Data Clusters:
-An Analysis beyond Failures}\cite{vino-paper} in which they performed several
+An Analysis beyond Failures}\cite{dsn-paper} in which they performed several
 analysis on unsuccessful executions in the Google's 2011 Borg cluster traces
 with the aim of identifying their resource waste, their impacts on the
 performance of the application, and any causes that may lie behind such
@@ -145,7 +145,7 @@ In general events can be of two kinds, there are events that are relative to the
 status of the schedule, and there are other events that are relative to the
 status of a task itself.

-\begin{figure}[h]
+\begin{figure}[t]
 \begin{center}
 \begin{tabular}{p{3cm}p{12cm}}
 \toprule
@@ -167,7 +167,7 @@ status of a task itself.
 Figure~\ref{fig:eventTypes} shows the expected transitions between event
 types.

-\begin{figure}[h]
+\begin{figure}[t]
 \centering
 	\resizebox{\textwidth}{!}{%
 		\includegraphics{./figures/event_types.png}}
@@ -253,8 +253,8 @@ comes from 8 Borg cells spanning 8 different datacenters located in different
 geographical positions, all focused on computational oriented workloads. The
 data collection time span matches the entire month of May 2019.

-Due to the inherent complexity in analyzing traces of this size, novel
-bleeding-edge data engineering tecniques were adopted to performed the required
+Due to the inherent complexity in analyzing traces of this size, non-trivial
+data engineering techniques were adopted to perform the required
 computations. We used the framework Apache Spark to perform efficient and
 parallel Map-Reduce computations. In this section, we discuss the technical
 details behind our approach.
@@ -324,9 +324,9 @@ possibility and insert back the omitted record attributes.
 \subsubsection{The queries}

 Most queries use only two or three fields in each trace records, while the
-original table records often are made of a couple of dozen fields. In order to save
-memory during the query, a projection is often applied to the data by the means
-of a \texttt{.map()} operation over the entire trace set, performed using
+original table records often are made of a couple of dozen fields. In order to
+save memory during the query, a projection is often applied to the data by the
+means of a \texttt{.map()} operation over the entire trace set, performed using
 Spark's RDD API.
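The projection-then-aggregate pattern described above can be illustrated without a Spark cluster. The sketch below is a minimal, plain-Python stand-in that uses lists in place of an RDD; the field names (`collection_id`, `instance_index`, `type`, `machine_id`) and the event-type strings are assumptions chosen for illustration, not the exact schema used by the report's scripts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical raw trace row; real rows carry a couple of dozen fields.
Row = Dict[str, object]
TaskKey = Tuple[object, object]

def project(rows: Iterable[Row]) -> List[Tuple[TaskKey, object]]:
    """Map step: keep only the task key and the event type, drop the rest."""
    return [((r["collection_id"], r["instance_index"]), r["type"]) for r in rows]

def count_events_per_task(pairs: Iterable[Tuple[TaskKey, object]]) -> Dict[TaskKey, int]:
    """Reduce step: count events per task key, as reduceByKey would."""
    counts: Dict[TaskKey, int] = defaultdict(int)
    for key, _event_type in pairs:
        counts[key] += 1
    return dict(counts)

# Hypothetical sample rows (extra fields are dropped by the projection).
rows = [
    {"collection_id": 1, "instance_index": 0, "type": "SCHEDULE", "machine_id": 7},
    {"collection_id": 1, "instance_index": 0, "type": "FINISH", "machine_id": 7},
    {"collection_id": 2, "instance_index": 3, "type": "FAIL", "machine_id": 9},
]
print(count_events_per_task(project(rows)))  # {(1, 0): 2, (2, 3): 1}
```

In PySpark the same two steps would roughly correspond to `rdd.map(lambda r: ((r["collection_id"], r["instance_index"]), 1))` followed by `.reduceByKey(lambda a, b: a + b)`; projecting early keeps only the few fields a query needs in memory during the shuffle.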

 Another operation that is often necessary to perform prior to the Map-Reduce
@@ -375,9 +375,9 @@ successful termination or not, and finally combine this data to compute
 slowdown, mean slowdown and ultimately the final table found in
 figure~\ref{fig:taskslowdown}.

-\begin{figure}[h]
-\centering
-\includegraphics[width=.75\textwidth]{figures/task_slowdown_query.png}
+\begin{figure}[t]
+\hspace{-0.075\textwidth}
+\includegraphics[width=1.15\textwidth]{figures/task_slowdown_query.png}
 \caption{Diagram of the script used for the ``task slowdown''
  query.}\label{fig:taskslowdownquery}
 \end{figure}
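The last stage of the task slowdown query, combining per-task execution data into mean slowdown values, can be sketched in plain Python as follows. This assumes that slowdown is the total time spent across all of a task's executions divided by the duration of its final, successful execution, and that only tasks ending in \texttt{FINISH} contribute; the input layout and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical per-task input: the task's executions in order,
# each given as (duration, termination event type).
Execution = Tuple[float, str]

def task_slowdown(executions: List[Execution]) -> Optional[float]:
    """Total time over all executions divided by the final successful one.

    Returns None for tasks whose last execution did not FINISH, for which
    this assumed slowdown definition is undefined.
    """
    if not executions or executions[-1][1] != "FINISH":
        return None
    last_duration = executions[-1][0]
    if last_duration <= 0:
        return None
    return sum(duration for duration, _ in executions) / last_duration

def mean_slowdown_by_priority(tasks: List[Tuple[int, List[Execution]]]) -> Dict[int, float]:
    """Average the per-task slowdowns of successful tasks for each priority."""
    slowdowns: Dict[int, List[float]] = defaultdict(list)
    for priority, executions in tasks:
        s = task_slowdown(executions)
        if s is not None:
            slowdowns[priority].append(s)
    return {p: mean(v) for p, v in sorted(slowdowns.items())}

tasks = [
    (0, [(10.0, "EVICT"), (20.0, "FINISH")]),                 # slowdown 1.5
    (0, [(5.0, "FINISH")]),                                   # slowdown 1.0
    (200, [(8.0, "FAIL"), (4.0, "FAIL"), (4.0, "FINISH")]),   # slowdown 4.0
    (200, [(3.0, "KILL")]),                                   # never finishes: skipped
]
print(mean_slowdown_by_priority(tasks))  # {0: 1.25, 200: 4.0}
```

Grouping raw priority values into coarser ranges before averaging would be a separate mapping step applied to `priority`; that step is not shown here.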
@@ -429,7 +429,7 @@ in the clear and coincise tables found in figure~\ref{fig:taskslowdown}.
 \section{Analysis: Performance Input of Unsuccessful Executions}

 Our first investigation focuses on replicating the methodologies used in the
-2015 DSN Ros\'a et al.\ paper\cite{vino-paper} regarding usage of machine time
+2015 DSN Ros\`a et al.\ paper\cite{dsn-paper} regarding usage of machine time
 and resources.

 In this section we perform several analyses focusing on how machine time and
@@ -516,7 +516,7 @@ Refer to figure~\ref{fig:taskslowdown} for a comparison between the 2011 and
 means are computed on a cluster-by-cluster basis for 2019 data in
 figure~\ref{fig:taskslowdown-csts}.

-In 2015 Ros\'a et al.\cite{vino-paper} measured mean task slowdown per each task
+In 2015 Ros\`a et al.\cite{dsn-paper} measured mean task slowdown for each task
 priority value, which at the time were $[0,11]$ numeric values. However,
 in 2019 traces, task priorities are given as a $[0,500]$ numeric value.
 Therefore, to allow for an easier comparison, mean task slowdown values are
@@ -614,12 +614,29 @@ With more than 98\% of both CPU and memory resources used by
 non-successful tasks, it is clear the spatial resource waste is high in the 2019
 traces.
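The percentages behind the spatial resource waste figures can be sketched as a simple aggregation: sum each resource per task termination type, then divide by the grand total. This is a minimal sketch assuming each task has already been reduced to a (termination type, CPU amount, RAM amount) triple in consistent units; the function name and input layout are hypothetical, and the same code applies whether the amounts are resources used or resources requested.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def resource_share_by_termination(
    usage: Iterable[Tuple[str, float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Percentage of all CPU and RAM attributed to each termination type.

    Each input item is (termination type, cpu amount, ram amount) for one task.
    """
    cpu: Dict[str, float] = defaultdict(float)
    ram: Dict[str, float] = defaultdict(float)
    for termination, cpu_amount, ram_amount in usage:
        cpu[termination] += cpu_amount
        ram[termination] += ram_amount
    cpu_total = sum(cpu.values())
    ram_total = sum(ram.values())
    if cpu_total == 0 or ram_total == 0:
        raise ValueError("total resource amount must be positive")
    return {
        t: (100.0 * cpu[t] / cpu_total, 100.0 * ram[t] / ram_total)
        for t in cpu
    }

# Hypothetical per-task amounts.
usage = [("FINISH", 1.0, 2.0), ("KILL", 2.0, 4.0), ("FAIL", 1.0, 2.0)]
print(resource_share_by_termination(usage))
```

The share attributable to non-successful tasks is then the sum of the entries for every termination type other than \texttt{FINISH}.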

-\section{Analysis: Pattern and Models for Task and Job Events}
+\section{Analysis: Patterns of Task and Job Events}
+
+This section applies some of the techniques used in section IV of
+the Ros\`a et al.\ paper\cite{dsn-paper} to find patterns and interdependencies
+between task and job events by gathering statistics on those events.

 \subsection{Unsuccessful Task Event Patterns}
 \input{figures/table_iii} % has table III and table IV in it

-Refer to figure \ref{fig:tableIII}.
+In this analysis we compute the distribution of task-level termination events
+by type, and the conditional probability of a task successfully terminating
+given the number of \texttt{EVICT}, \texttt{FAIL} and \texttt{FINISH}
+termination events observed during its execution.
+
+A comparison of the termination event distribution between the 2011 and 2019
+traces is shown in figure~\ref{fig:tableIII}. Additionally, a cluster-by-cluster
+breakdown of the same data for the 2019 traces is shown in
+figure~\ref{fig:tableIII-csts}.
+
+Each table in these figures shows the mean and the 95th percentile of the
+number of termination events per task, broken down by task termination type.
+In addition, each table shows the mean number of \texttt{EVICT}, \texttt{FAIL},
+\texttt{FINISH}, and \texttt{KILL} events for each task termination type.
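The statistics in these tables can be sketched as a group-by on each task's final termination event. This minimal version assumes every task is represented by the ordered list of its termination events and that the task's termination type is its last event; it uses the nearest-rank definition of the 95th percentile, which is only one of several common variants and is an assumption here. All names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List, Sequence

EVENT_TYPES = ("EVICT", "FAIL", "FINISH", "KILL")

def percentile_nearest_rank(values: Sequence[int], q: int) -> int:
    """q-th percentile using the nearest-rank method (one of several variants)."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceil(q * n / 100)
    return ordered[rank - 1]

def termination_table(tasks: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Summarize termination events per task, grouped by the task's last event."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for events in tasks:
        if events:
            groups[events[-1]].append(events)
    table = {}
    for final, group in groups.items():
        totals = [len(events) for events in group]
        row = {"mean": mean(totals), "p95": percentile_nearest_rank(totals, 95)}
        for event_type in EVENT_TYPES:
            # Mean number of events of this type per task in the group.
            row[event_type] = mean(Counter(events)[event_type] for events in group)
        table[final] = row
    return table

# Hypothetical termination-event sequences, one list per task.
tasks = [["FAIL", "FAIL", "FINISH"], ["FINISH"], ["EVICT", "KILL"]]
print(termination_table(tasks))
```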

 \textbf{Observations}:

@@ -636,22 +653,7 @@ Refer to figure \ref{fig:tableIII}.
  2019 traces.
 \end{itemize}

-\subsection{Unsuccessful Job Event Patterns}
-
-\textbf{Observations}:
-
-\begin{itemize}
-\item
-  Again the mean number of tasks is significantly higher than the 2011
-  traces, indicating a higher complexity of workloads
-\item
-  Cluster A has no evicted jobs
-\item
-  The number of events is however lower than the event means in the 2011
-  traces
-\end{itemize}
-
-\subsection{Conditional Probability of Task Success}
+\subsubsection{Conditional Probability of Task Success}
 \input{figures/figure_5}

 Refer to figure \ref{fig:figureV}.
@@ -669,6 +671,21 @@ Refer to figure \ref{fig:figureV}.
  lot for small \# evts differences. This may be due to an uneven
  distribution of \# evts in the traces.
 \end{itemize}
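The conditional probability of task success can be estimated by bucketing tasks on how many events of a given type they experienced and computing the fraction of each bucket that eventually finished. This is a minimal sketch assuming, as in the previous analysis, that each task is an ordered list of termination events and that a task succeeded when its last event is \texttt{FINISH}; the function name is hypothetical. Buckets containing only a handful of tasks rest on small samples, so their estimates can be noisy.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def success_probability_by_count(tasks: List[List[str]], event_type: str) -> Dict[int, float]:
    """Estimate P(task finishes | task saw exactly k events of event_type)."""
    seen: Dict[int, int] = defaultdict(int)
    finished: Dict[int, int] = defaultdict(int)
    for events in tasks:
        if not events:
            continue
        k = Counter(events)[event_type]
        seen[k] += 1
        if events[-1] == "FINISH":
            finished[k] += 1
    return {k: finished[k] / seen[k] for k in sorted(seen)}

# Hypothetical termination-event sequences, one list per task.
tasks = [
    ["FINISH"],
    ["EVICT", "FINISH"],
    ["EVICT", "KILL"],
    ["EVICT", "EVICT", "FAIL"],
]
print(success_probability_by_count(tasks, "EVICT"))  # {0: 1.0, 1: 0.5, 2: 0.0}
```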
+\subsection{Unsuccessful Job Event Patterns}
+
+\textbf{Observations}:
+
+\begin{itemize}
+\item
+  Again, the mean number of tasks is significantly higher than in the 2011
+  traces, indicating a higher complexity of workloads.
+\item
+  Cluster A has no evicted jobs.
+\item
+  However, the mean number of events is lower than in the 2011
+  traces.
+\end{itemize}
+

 \section{Analysis: Potential Causes of Unsuccessful Executions}

--- a/report/figures/machine_configs.tex
+++ b/report/figures/machine_configs.tex
@@ -231,5 +231,5 @@ Unknown & Unknown & 1720 & 2.933251\% \\
 0.591797 & 0.666992 & 500 & 0.852689\% \\
 0.958984 & 1.000000 & 200 & 0.341076\% \\
 }{\\\\\\\\\\}
-\caption{Overview of machine configurations in terms of CPU and RAM resources for each cluster in the 2019 traces. Refer to figure~\ref{fig:machineconfig} for a column legend.}\label{fig:machineconfigs-csts}
+\caption{Overview of machine configurations in terms of CPU and RAM resources for each cluster in the 2019 traces. Refer to figure~\ref{fig:machineconfigs} for a column legend.}\label{fig:machineconfigs-csts}
 \end{figure}
--- a/report/figures/spatial_resource_waste.tex
+++ b/report/figures/spatial_resource_waste.tex
@@ -9,7 +9,7 @@
 \begin{figure}[p]
 \spatialresourcewaste[0.5\textwidth]{used-2011}
 \spatialresourcewaste[0.5\textwidth]{used-all}
-	\caption{Percentages of CPU and RAM resources used by tasks w.r.t. task termination type in 2011 and 2019 traces (total of clusters A to D). The x axis is the type of resource, y-axis is the percentage of resource used and color represents task termination. Numeric values are displayed below the graph as a table.}\label{fig:spatialresourcewaste-requested}
+	\caption{Percentages of CPU and RAM resources used by tasks w.r.t.\ task termination type in 2011 and 2019 traces (total of clusters A to D). The x-axis is the type of resource, the y-axis is the percentage of resource used, and color represents task termination. Numeric values are displayed below the graph as a table.}\label{fig:spatialresourcewaste-actual}
 \end{figure}

 \begin{figure}[p]
@@ -17,16 +17,16 @@
 \spatialresourcewaste{used-b}
 \spatialresourcewaste{used-c}
 \spatialresourcewaste{used-d}
-	\caption{Percentages of CPU and RAM resources used by tasks w.r.t. task termination type for clusters A to D in 2019 traces. Refer to figure~\ref{fig:spatialresourcewaste-requested} for plot explaination.}\label{fig:spatialresourcewaste-actual-csts}
+	\caption{Percentages of CPU and RAM resources used by tasks w.r.t.\ task termination type for clusters A to D in the 2019 traces. Refer to figure~\ref{fig:spatialresourcewaste-actual} for an explanation of the plot.}\label{fig:spatialresourcewaste-actual-csts}
 \end{figure}

 \begin{figure}[p]
 \spatialresourcewaste[0.5\textwidth]{requested-2011}
 \spatialresourcewaste[0.5\textwidth]{requested-all}
-	\caption{Percentages of CPU and RAM resources requested by tasks w.r.t. task termination type in 2011 and 2019 traces. The x axis is the type of resource, y-axis is the percentage of resource used and color represents task termination. Numeric values are displayed below the graph as a table.}\label{fig:spatialresourcewaste-actual}
+	\caption{Percentages of CPU and RAM resources requested by tasks w.r.t.\ task termination type in 2011 and 2019 traces. The x-axis is the type of resource, the y-axis is the percentage of resource requested, and color represents task termination. Numeric values are displayed below the graph as a table.}\label{fig:spatialresourcewaste-requested}
 \end{figure}

-\begin{figure}[p]
+\begin{figure}
 \spatialresourcewaste{requested-a}
 \spatialresourcewaste{requested-b}
 \spatialresourcewaste{requested-c}
@@ -35,5 +35,5 @@
 \spatialresourcewaste{requested-f}
 \spatialresourcewaste{requested-g}
 \spatialresourcewaste{requested-h}
-	\caption{Percentages of CPU and RAM resources requested by tasks w.r.t. task termination type for in 2019 traces. Refer to figure~\ref{fig:spatialresourcewaste-requested} for plot explaination.}\label{fig:spatialresourcewaste-actual-csts}
+	\caption{Percentages of CPU and RAM resources requested by tasks w.r.t.\ task termination type for each cluster in the 2019 traces. Refer to figure~\ref{fig:spatialresourcewaste-requested} for an explanation of the plot.}\label{fig:spatialresourcewaste-requested-csts}
 \end{figure}
--- a/report/figures/table_iii.tex
+++ b/report/figures/table_iii.tex
@@ -63,7 +63,7 @@ FINISH &    2.962 (2) &  0.022 &  0.012 & 2.915 &  0.013 \\
 	tables show an
 	overall mean accompanied by the 95-th percentile of all termination
 	events, followed by the mean of events per event type of each
-	termination event.}
+	termination event.}\label{fig:tableIII}
 \end{figure}

 \begin{figure}[p]
--- a/report/references.bib
+++ b/report/references.bib
@@ -14,7 +14,7 @@ booktitle	= {EuroSys'20},
 address	= {Heraklion, Crete}
 }

-@INPROCEEDINGS{vino-paper,
+@INPROCEEDINGS{dsn-paper,
  author={Rosà, Andrea and Chen, Lydia Y. and Binder, Walter},
  booktitle={2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks},
  title={Understanding the Dark Side of Big Data Clusters: An Analysis beyond Failures},