report work

2021-05-31 16:38:58 +02:00 · 96db36d8d6
commit 96db36d8d6
parent 2d1b357500
6 changed files with 57 additions and 40 deletions
--- a/report/Claudio_Maggioni_report.pdf
+++ b/report/Claudio_Maggioni_report.pdf
--- a/report/Claudio_Maggioni_report.tex
+++ b/report/Claudio_Maggioni_report.tex
@ -68,7 +68,7 @@ scheduling, priority management, and failures of a real production workload.
 This data was 2009
 This data was the foundation of the 2015 Ros\'a et al.\ paper
 \textit{Understanding the Dark Side of Big Data Clusters: An Analysis beyond
-Failures}\cite{vino-paper}, which in its many conclusions highlighted the need
+Failures}\cite{dsn-paper}, which in its many conclusions highlighted the need
 for better cluster management, given the high number of failures found in
 the traces.
@ -103,7 +103,7 @@ techniques used to perform the queries and analyses on the 2019 traces.
 In 2015, Dr.~Andrea Rosà et al.\ published a
 research paper titled \textit{Understanding the Dark Side of Big Data Clusters:
-An Analysis beyond Failures}\cite{vino-paper} in which they performed several
+An Analysis beyond Failures}\cite{dsn-paper} in which they performed several
 analyses on unsuccessful executions in Google's 2011 Borg cluster traces
 with the aim of identifying their resource waste, their impacts on the
 performance of the application, and any causes that may lie behind such
@ -145,7 +145,7 @@ In general events can be of two kinds, there are events that are relative to the
 status of the schedule, and there are other events that are relative to the
 status of a task itself.
-\begin{figure}[h]
+\begin{figure}[t]
 \begin{center}
 \begin{tabular}{p{3cm}p{12cm}}
 \toprule
@ -167,7 +167,7 @@ status of a task itself.
 Figure~\ref{fig:eventTypes} shows the expected transitions between event
 types.
-\begin{figure}[h]
+\begin{figure}[t]
 \centering
 	\resizebox{\textwidth}{!}{%
 		\includegraphics{./figures/event_types.png}}
@ -253,8 +253,8 @@ comes from 8 Borg cells spanning 8 different datacenters located in different
 geographical positions, all focused on computational oriented workloads. The
 data collection time span matches the entire month of May 2019.
-Due to the inherent complexity in analyzing traces of this size, novel
+Due to the inherent complexity in analyzing traces of this size, non-trivial
-bleeding-edge data engineering tecniques were adopted to performed the required
+data engineering techniques were adopted to perform the required
 computations. We used the framework Apache Spark to perform efficient and
 parallel Map-Reduce computations. In this section, we discuss the technical
 details behind our approach.
@ -324,9 +324,9 @@ possibility and insert back the omitted record attributes.
 \subsubsection{The queries}
 Most queries use only two or three fields in each trace record, while the
-original table records often are made of a couple of dozen fields. In order to save
+original table records often are made of a couple of dozen fields. In order to
-memory during the query, a projection is often applied to the data by the means
+save memory during the query, a projection is often applied to the data by
-of a \texttt{.map()} operation over the entire trace set, performed using
+means of a \texttt{.map()} operation over the entire trace set, performed using
 Spark's RDD API.
 Another operation that is often necessary to perform prior to the Map-Reduce
@ -375,9 +375,9 @@ successful termination or not, and finally combine this data to compute
 slowdown, mean slowdown and ultimately the final table found in
 figure~\ref{fig:taskslowdown}.
-\begin{figure}[h]
+\begin{figure}[t]
-\centering
+\hspace{-0.075\textwidth}
-\includegraphics[width=.75\textwidth]{figures/task_slowdown_query.png}
+\includegraphics[width=1.15\textwidth]{figures/task_slowdown_query.png}
 \caption{Diagram of the script used for the ``task slowdown''
  query.}\label{fig:taskslowdownquery}
 \end{figure}
@ -429,7 +429,7 @@ in the clear and coincise tables found in figure~\ref{fig:taskslowdown}.
 \section{Analysis: Performance Input of Unsuccessful Executions}
 Our first investigation focuses on replicating the methodologies used in the
-2015 DSN Ros\'a et al.\ paper\cite{vino-paper} regarding usage of machine time
+2015 DSN Ros\'a et al.\ paper\cite{dsn-paper} regarding usage of machine time
 and resources.
 In this section we perform several analyses focusing on how machine time and
@ -516,7 +516,7 @@ Refer to figure~\ref{fig:taskslowdown} for a comparison between the 2011 and
 means are computed on a cluster-by-cluster basis for 2019 data in
 figure~\ref{fig:taskslowdown-csts}.
-In 2015 Ros\'a et al.\cite{vino-paper} measured mean task slowdown per each task
+In 2015 Ros\'a et al.\cite{dsn-paper} measured mean task slowdown for each task
 priority value, which at the time was a numeric value in $[0,11]$. However,
 in the 2019 traces, task priorities are given as a numeric value in $[0,500]$.
 Therefore, to allow for an easier comparison, mean task slowdown values are
@ -614,12 +614,29 @@ With more than 98\% of both CPU and memory resources used by
 non-successful tasks, it is clear the spatial resource waste is high in the 2019
 traces.
-\section{Analysis: Pattern and Models for Task and Job Events}
+\section{Analysis: Patterns of Task and Job Events}
 This section aims to apply some of the techniques used in section IV of
 the Ros\'a et al.\ paper\cite{dsn-paper} to find patterns and interdependencies
 between task and job events by gathering event statistics at those events.
 \subsection{Unsuccessful Task Event Patterns}
 \input{figures/table_iii} % has table III and table IV in it
-Refer to figure \ref{fig:tableIII}.
+In this analysis we compute the distribution of termination events by type at
 the task level and the conditional probability of a task successfully
 terminating given a number of \texttt{EVICT}, \texttt{FAIL} and \texttt{FINISH}
 termination events during the task execution.
 A comparison of the termination event distribution between the 2011 and 2019
 traces is shown in figure~\ref{fig:tableIII}. Additionally, a cluster-by-cluster
 breakdown of the same data for the 2019 traces is shown in
 figure~\ref{fig:tableIII-csts}.
 Each table in these figures shows the mean and the 95-th percentile of the
 number of termination events per task, broken down by task termination. In
 addition, the table shows the mean number of \texttt{EVICT}, \texttt{FAIL},
 \texttt{FINISH}, and \texttt{KILL} events for each task termination type.
 \textbf{Observations}:
@ -636,22 +653,7 @@ Refer to figure \ref{fig:tableIII}.
  2019 traces.
 \end{itemize}
-\subsection{Unsuccessful Job Event Patterns}
+\subsubsection{Conditional Probability of Task Success}
 \textbf{Observations}:
 \begin{itemize}
 \item
  Again the mean number of tasks is significantly higher than the 2011
  traces, indicating a higher complexity of workloads
 \item
  Cluster A has no evicted jobs
 \item
  The number of events is however lower than the event means in the 2011
  traces
 \end{itemize}
 \subsection{Conditional Probability of Task Success}
 \input{figures/figure_5}
 Refer to figure \ref{fig:figureV}.
@ -669,6 +671,21 @@ Refer to figure \ref{fig:figureV}.
  lot for small \# evts differences. This may be due to an uneven
  distribution of \# evts in the traces.
 \end{itemize}
 \subsection{Unsuccessful Job Event Patterns}
 \textbf{Observations}:
 \begin{itemize}
 \item
  Again the mean number of tasks is significantly higher than the 2011
  traces, indicating a higher complexity of workloads
 \item
  Cluster A has no evicted jobs
 \item
  The number of events is however lower than the event means in the 2011
  traces
 \end{itemize}
 \section{Analysis: Potential Causes of Unsuccessful Executions}
--- a/report/figures/machine_configs.tex
+++ b/report/figures/machine_configs.tex
@ -231,5 +231,5 @@ Unknown & Unknown & 1720 & 2.933251\% \\
 0.591797 & 0.666992 & 500 & 0.852689\% \\
 0.958984 & 1.000000 & 200 & 0.341076\% \\
 }{\\\\\\\\\\}
-\caption{Overview of machine configurations in terms of CPU and RAM resources for each cluster in the 2019 traces. Refer to figure~\ref{fig:machineconfig} for a column legend.}\label{fig:machineconfigs-csts}
+\caption{Overview of machine configurations in terms of CPU and RAM resources for each cluster in the 2019 traces. Refer to figure~\ref{fig:machineconfigs} for a column legend.}\label{fig:machineconfigs-csts}
 \end{figure}
--- a/report/figures/spatial_resource_waste.tex
+++ b/report/figures/spatial_resource_waste.tex
@ -9,7 +9,7 @@
 \begin{figure}[p]
 \spatialresourcewaste[0.5\textwidth]{used-2011}
 \spatialresourcewaste[0.5\textwidth]{used-all}
-	\caption{Percentages of CPU and RAM resources used by tasks w.r.t. task termination type in 2011 and 2019 traces (total of clusters A to D). The x axis is the type of resource, y-axis is the percentage of resource used and color represents task termination. Numeric values are displayed below the graph as a table.}\label{fig:spatialresourcewaste-requested}
+	\caption{Percentages of CPU and RAM resources used by tasks w.r.t.\ task termination type in 2011 and 2019 traces (total of clusters A to D). The x axis is the type of resource, y-axis is the percentage of resource used and color represents task termination. Numeric values are displayed below the graph as a table.}\label{fig:spatialresourcewaste-actual}
 \end{figure}
 \begin{figure}[p]
@ -17,16 +17,16 @@
 \spatialresourcewaste{used-b}
 \spatialresourcewaste{used-c}
 \spatialresourcewaste{used-d}
-	\caption{Percentages of CPU and RAM resources used by tasks w.r.t. task termination type for clusters A to D in 2019 traces. Refer to figure~\ref{fig:spatialresourcewaste-requested} for plot explaination.}\label{fig:spatialresourcewaste-actual-csts}
+	\caption{Percentages of CPU and RAM resources used by tasks w.r.t.\ task termination type for clusters A to D in 2019 traces. Refer to figure~\ref{fig:spatialresourcewaste-actual} for plot explanation.}\label{fig:spatialresourcewaste-actual-csts}
 \end{figure}
 \begin{figure}[p]
 \spatialresourcewaste[0.5\textwidth]{requested-2011}
 \spatialresourcewaste[0.5\textwidth]{requested-all}
-	\caption{Percentages of CPU and RAM resources requested by tasks w.r.t. task termination type in 2011 and 2019 traces. The x axis is the type of resource, y-axis is the percentage of resource used and color represents task termination. Numeric values are displayed below the graph as a table.}\label{fig:spatialresourcewaste-actual}
+	\caption{Percentages of CPU and RAM resources requested by tasks w.r.t.\ task termination type in 2011 and 2019 traces. The x axis is the type of resource, y-axis is the percentage of resource used and color represents task termination. Numeric values are displayed below the graph as a table.}\label{fig:spatialresourcewaste-requested}
 \end{figure}
-\begin{figure}[p]
+\begin{figure}
 \spatialresourcewaste{requested-a}
 \spatialresourcewaste{requested-b}
 \spatialresourcewaste{requested-c}
@ -35,5 +35,5 @@
 \spatialresourcewaste{requested-f}
 \spatialresourcewaste{requested-g}
 \spatialresourcewaste{requested-h}
-	\caption{Percentages of CPU and RAM resources requested by tasks w.r.t. task termination type for in 2019 traces. Refer to figure~\ref{fig:spatialresourcewaste-requested} for plot explaination.}\label{fig:spatialresourcewaste-actual-csts}
+	\caption{Percentages of CPU and RAM resources requested by tasks w.r.t.\ task termination type for each cluster in 2019 traces. Refer to figure~\ref{fig:spatialresourcewaste-requested} for plot explanation.}\label{fig:spatialresourcewaste-requested-csts}
 \end{figure}
--- a/report/figures/table_iii.tex
+++ b/report/figures/table_iii.tex
@ -63,7 +63,7 @@ FINISH &    2.962 (2) &  0.022 &  0.012 & 2.915 &  0.013 \\
 	tables show an
 	overall mean accompanied by the 95-th percentile of all termination
 	events, followed by the mean of events per event type of each
-	termination event.}
+	termination event.}\label{fig:tableIII}
 \end{figure}
 \begin{figure}[p]
--- a/report/references.bib
+++ b/report/references.bib
@ -14,7 +14,7 @@ booktitle	= {EuroSys'20},
 address	= {Heraklion, Crete}
 }
-@INPROCEEDINGS{vino-paper,
+@INPROCEEDINGS{dsn-paper,
  author={Rosà, Andrea and Chen, Lydia Y. and Binder, Walter},
  booktitle={2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks},
  title={Understanding the Dark Side of Big Data Clusters: An Analysis beyond Failures},