report work

2021-05-27 15:20:08 +02:00 · d6780ffa6c
commit d6780ffa6c
parent e200cea3ab
2 changed files with 49 additions and 27 deletions
--- a/report/Claudio_Maggioni_report.pdf
+++ b/report/Claudio_Maggioni_report.pdf
--- a/report/Claudio_Maggioni_report.tex
+++ b/report/Claudio_Maggioni_report.tex
@@ -449,10 +449,7 @@ computing slowdown values given the previously computed execution attempt time
 deltas. Finally, the mean of the resulting slowdown values is taken, producing
 the clear and concise tables found in figure~\ref{fig:taskslowdown}.
 \section{Analysis: Performance Impact of Unsuccessful Executions}
 \input{figures/machine_time_waste}
 Our first investigation focuses on replicating the methodologies used in the
 2015 DSN Ros\'a et al.\ paper\cite{vino-paper} regarding usage of machine time
@@ -465,6 +462,7 @@ from the 2019 traces to the ones that were obtained in 2015 to understand the
 workload evolution inside Borg between 2011 and 2019.
 \subsection{Temporal Impact: Machine Time Waste}
 \input{figures/machine_time_waste}
 This analysis explores how machine time is distributed over task events and
 submissions. By partitioning the collection of all terminating tasks by their
@@ -565,35 +563,59 @@ higher machine time spent for unsuccessful executions (as observed in the
 previous analysis) and increased slowdown rate for this class is not particularly
 surprising.
-\textbf{TBD}
+The amount of non-successful task terminations in the 2019 traces is also rather
-  The \% of finishing jobs is relatively low comparing with the 2011
+high when compared to 2011 data, as evidenced by the low percentage of
-  traces.
+\texttt{FINISH}ed tasks across priority tiers.
 Another noteworthy difference is in the mean response times for all and last
 executions: while the mean response is overall shorter in time in the 2019
 traces by an order of magnitude, the new traces show an overall significantly
 higher mean response time than in the 2011 data.
 Across individual 2019 clusters (as in figure~\ref{fig:taskslowdown-csts}), the
 data shows mostly uniform behaviour, apart from some noteworthy mean slowdown
 spikes. Indeed, cluster A has a mean slowdown of 82.97 in the ``Free'' tier,
 cluster G has mean slowdowns of 19.06 and 14.57 in the ``BEB'' and ``Production''
 tiers respectively, and cluster D has a mean slowdown of 12.04 in its ``Free'' tier.
 \subsection{Spatial Impact: Resource Waste}
 \input{figures/spatial_resource_waste}
 In this analysis we aim to understand how physical resources of machines
 in the Borg cluster are used to complete tasks. In particular, we compare how
 CPU and memory resource allocation and usage are distributed among tasks based
 on their termination type.
 Due to limited computational resources w.r.t.\ the data analysis process, the
 resource usage for clusters E to H in the 2019 traces is missing. However, a
 comparison between 2011 resource usage and the aggregated resource usage of
 clusters A to D in the 2019 traces can be found in
 figure~\ref{fig:spatialresourcewaste-actual}. Additionally, a
 cluster-by-cluster breakdown for the 2019 data can be found in
 figure~\ref{fig:spatialresourcewaste-actual-csts}.
 From these figures it is clear that, compared to the relatively even
 distribution of used resources in the 2011 traces, the distribution of resources
 in the 2019 Borg clusters became strikingly uneven, with \texttt{KILL}ed tasks
 registering a combined 86.29\% of CPU usage and 84.86\% of memory usage. By
 contrast, all other task termination types have a significantly lower resource
 usage: \texttt{EVICT}ed, \texttt{FAIL}ed and \texttt{FINISH}ed tasks register
 respectively 8.53\%, 3.17\% and 2.02\% of CPU usage and 9.03\%, 4.45\% and
 1.66\% of memory usage.
 This resource distribution can also be found in the data from individual
 clusters in figure~\ref{fig:spatialresourcewaste-actual-csts}, where
 \texttt{KILL}ed tasks always account for more than 80\% of resources.
 With roughly 98\% of CPU and memory resources used by ultimately
 non-successful tasks, it is clear that spatial resource waste is high in the
 2019 traces.
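 As a quick check, summing the shares reported above for the three
 non-successful termination types (\texttt{KILL}, \texttt{EVICT} and
 \texttt{FAIL}) gives
 \[
   86.29\% + 8.53\% + 3.17\% = 97.99\%
 \]
 for CPU usage and
 \[
   84.86\% + 9.03\% + 4.45\% = 98.34\%
 \]
 for memory usage. The CPU total falls marginally below 98\% and the memory
 total slightly above it. Since the four reported CPU shares add up to
 100.01\%, the individual values appear to be rounded, so both totals are best
 read as approximately 98\%.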
 \textbf{TBD figure~\ref{fig:spatialresourcewaste-requested}}
 \input{figures/table_iii} % has table III and table IV in it
 \input{figures/figure_5}
 \hypertarget{reserved-and-actual-resource-usage-of-tasks}{%
 \subsection{Reserved and actual resource usage of
 tasks}\label{reserved-and-actual-resource-usage-of-tasks}}
 Refer to figures \ref{fig:spatialresourcewaste-actual} and
 \ref{fig:spatialresourcewaste-requested}.
 \textbf{Observations}:
 \begin{itemize}
 \item
  Most (measured and requested) resources are used by killed jobs, even
  more than in the 2011 traces.
 \item
  Behaviour is rather homogeneous across datacenters, with the exception
  of cluster G where a lot of LOST-terminated tasks acquired 70\% of
  both CPU and RAM.
 \end{itemize}
 Refer to figure \ref{fig:tableIII}.
 \textbf{Observations}: