diff --git a/report/Claudio_Maggioni_report.pdf b/report/Claudio_Maggioni_report.pdf index 32d4fa93..92ba70f9 100644 Binary files a/report/Claudio_Maggioni_report.pdf and b/report/Claudio_Maggioni_report.pdf differ diff --git a/report/Claudio_Maggioni_report.tex b/report/Claudio_Maggioni_report.tex index 146eba54..625eee10 100644 --- a/report/Claudio_Maggioni_report.tex +++ b/report/Claudio_Maggioni_report.tex @@ -449,10 +449,7 @@ computing slowdown values given the previously computed execution attempt time deltas. Finally, the mean of the computed slowdown values is computed resulting in the clear and coincise tables found in figure~\ref{fig:taskslowdown}. - - \section{Analysis: Performance Input of Unsuccessful Executions} -\input{figures/machine_time_waste} Our first investigation focuses on replicating the methodologies used in the 2015 DSN Ros\'a et al.\ paper\cite{vino-paper} regarding usage of machine time @@ -465,6 +462,7 @@ from the 2019 traces to the ones that were obtained in 2015 to understand the workload evolution inside Borg between 2011 and 2019. \subsection{Temporal Impact: Machine Time Waste} +\input{figures/machine_time_waste} This analysis explores how machine time is distributed over task events and submissions. By partitioning the collection of all terminating tasks by their @@ -565,35 +563,59 @@ higher machine time spent for unsuccesful executions (as observed in the previous analysis) and increase slowdown rate for this class is not particularly surprising. -\textbf{TBD} - The \% of finishing jobs is relatively low comparing with the 2011 - traces. +The number of non-successful task terminations in the 2019 traces is also rather +high when compared to the 2011 data, as evidenced by the low percentage of +\texttt{FINISH}ed tasks across priority tiers. 
+Another noteworthy difference is in the mean response times for all and last +executions: while the mean response is overall shorter in time in the 2019 +traces by an order of magnitude, the new traces show an overall significantly +higher mean response time than in the 2011 data. + +Across the individual 2019 clusters (see figure~\ref{fig:taskslowdown-csts}), the data +shows mostly uniform behaviour, apart from some noteworthy mean slowdown +spikes. Indeed, cluster A has a mean slowdown of 82.97 in the ``Free'' tier, +cluster G has mean slowdowns of 19.06 and 14.57 in the ``BEB'' and ``Production'' +tiers respectively, and cluster D has a mean slowdown of 12.04 in its ``Free'' tier. + +\subsection{Spatial Impact: Resource Waste} \input{figures/spatial_resource_waste} + +In this analysis we aim to understand how the physical resources of machines +in the Borg cluster are used to complete tasks. In particular, we compare how +CPU and memory resource allocation and usage are distributed among tasks based +on their termination +type. + +Due to the limited computational resources available for the data analysis process, the +resource usage for clusters E to H in the 2019 traces is missing. However, a +comparison between 2011 resource usage and the aggregated resource usage of +clusters A to D in the 2019 traces can be found in +figure~\ref{fig:spatialresourcewaste-actual}. Additionally, a +cluster-by-cluster breakdown for the 2019 data can be found in +figure~\ref{fig:spatialresourcewaste-actual-csts}. + +From these figures it is clear that, compared to the relatively even +distribution of used resources in the 2011 traces, the distribution of resources +in the 2019 Borg clusters became strikingly uneven, with \texttt{KILL}ed tasks accounting for +86.29\% of +CPU usage and 84.86\% of memory usage. 
By contrast, +all other task termination types have a significantly lower resource usage: +\texttt{EVICT}ed, \texttt{FAIL}ed and \texttt{FINISH}ed tasks register respectively +8.53\%, 3.17\% and 2.02\% CPU usage and 9.03\%, 4.45\% and 1.66\% memory usage. +This resource distribution can also be found in the data from individual +clusters in figure~\ref{fig:spatialresourcewaste-actual-csts}, where more +than 80\% of resources are always devoted to \texttt{KILL}ed tasks. + +With roughly 98\% of CPU and memory resources used by ultimately +non-successful tasks, it is clear that spatial resource waste is high in the 2019 +traces. + +\textbf{TBD figure~\ref{fig:spatialresourcewaste-requested}} + \input{figures/table_iii} % has table III and table IV in it \input{figures/figure_5} -\hypertarget{reserved-and-actual-resource-usage-of-tasks}{% -\subsection{Reserved and actual resource usage of -tasks}\label{reserved-and-actual-resource-usage-of-tasks}} - - -Refer to figures \ref{fig:spatialresourcewaste-actual} and -\ref{fig:spatialresourcewaste-requested}. - -\textbf{Observations}: - -\begin{itemize} -\item - Most (mesasured and requested) resources are used by killed job, even - more than in the 2011 traces. -\item - Behaviour is rather homogeneous across datacenters, with the exception - of cluster G where a lot of LOST-terminated tasks acquired 70\% of - both CPU and RAM -\end{itemize} - - Refer to figure \ref{fig:tableIII}. \textbf{Observations}: