@ -449,10 +449,7 @@ computing slowdown values given the previously computed execution attempt time
|
|||
deltas. Finally, the mean of these slowdown values is computed, resulting
in the clear and concise tables found in figure~\ref{fig:taskslowdown}.
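
As a sketch of this last step, assume that $s_1, \dots, s_N$ denote the $N$
slowdown values computed for a given group (this notation is ours and is
introduced only for illustration). The reported figure would then be the
arithmetic mean
\[
  \bar{s} = \frac{1}{N} \sum_{i=1}^{N} s_i .
\]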
|
||||
|
||||
|
||||
|
||||
\section{Analysis: Performance Impact of Unsuccessful Executions}
|
||||
\input{figures/machine_time_waste}
|
||||
|
||||
Our first investigation focuses on replicating the methodology used in the
2015 DSN paper by Ros\`a et al.~\cite{vino-paper} regarding usage of machine time
|
||||
|
@ -465,6 +462,7 @@ from the 2019 traces to the ones that were obtained in 2015 to understand the
|
|||
workload evolution inside Borg between 2011 and 2019.
|
||||
|
||||
\subsection{Temporal Impact: Machine Time Waste}
|
||||
\input{figures/machine_time_waste}
|
||||
|
||||
This analysis explores how machine time is distributed over task events and
|
||||
submissions. By partitioning the collection of all terminating tasks by their
|
||||
|
@ -565,35 +563,59 @@ higher machine time spent for unsuccessful executions (as observed in the
previous analysis) and increased slowdown rate for this class is not particularly
surprising.
|
||||
|
||||
\textbf{TBD}
|
||||
The percentage of finishing jobs is relatively low compared with the 2011
traces.
|
||||
The number of non-successful task terminations in the 2019 traces is also rather
high when compared to the 2011 data, as evidenced by the low percentage of
\texttt{FINISH}ed tasks across priority tiers.
|
||||
|
||||
Another noteworthy difference is in the mean response times for all and last
|
||||
executions: while the mean response is overall shorter in time in the 2019
|
||||
traces by an order of magnitude, the new traces show an overall significantly
|
||||
higher mean response time than in the 2011 data.
|
||||
|
||||
Across individual 2019 clusters (see figure~\ref{fig:taskslowdown-csts}), the data
shows mostly uniform behaviour, apart from some noteworthy spikes in mean
slowdown. Indeed, cluster A has a mean slowdown of 82.97 in the ``Free'' tier,
cluster G has mean slowdowns of 19.06 and 14.57 in the ``BEB'' and ``Production''
tiers respectively, and cluster D has a mean slowdown of 12.04 in its ``Free'' tier.
|
||||
|
||||
\subsection{Spatial Impact: Resource Waste}
|
||||
\input{figures/spatial_resource_waste}
|
||||
|
||||
In this analysis we aim to understand how the physical resources of machines
in the Borg cluster are used to complete tasks. In particular, we compare how
CPU and memory resource allocation and usage are distributed among tasks based
on their termination type.
|
||||
|
||||
Due to the limited computational resources available for the data analysis
process, the resource usage for clusters E to H in the 2019 traces is missing.
However, a comparison between 2011 resource usage and the aggregated resource
usage of clusters A to D in the 2019 traces can be found in
figure~\ref{fig:spatialresourcewaste-actual}. Additionally, a
cluster-by-cluster breakdown for the 2019 data can be found in
figure~\ref{fig:spatialresourcewaste-actual-csts}.
|
||||
|
||||
From these figures it is clear that, compared to the relatively even
distribution of used resources in the 2011 traces, the distribution of resources
in the 2019 Borg clusters has become strikingly uneven: \texttt{KILL}ed tasks
account for a combined 86.29\% of CPU usage and 84.86\% of memory usage. By
contrast, all other task termination types have significantly lower resource
usage: \texttt{EVICT}ed, \texttt{FAIL}ed and \texttt{FINISH}ed tasks register
8.53\%, 3.17\% and 2.02\% of CPU usage and 9.03\%, 4.45\% and 1.66\% of memory
usage, respectively. This resource distribution can also be found in the data
from individual clusters in figure~\ref{fig:spatialresourcewaste-actual-csts},
where more than 80\% of resources are always devoted to \texttt{KILL}ed tasks.
|
||||
|
||||
With roughly 98\% of CPU and memory resources used by ultimately
non-successful tasks, it is clear that spatial resource waste is high in the 2019
traces.
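
As a quick check of this figure, the shares reported above for the three
non-successful termination types (\texttt{KILL}, \texttt{EVICT} and
\texttt{FAIL}) can be summed for each resource:
\[
  86.29\% + 8.53\% + 3.17\% = 97.99\% \quad \mbox{(CPU)}, \qquad
  84.86\% + 9.03\% + 4.45\% = 98.34\% \quad \mbox{(memory)}.
\]
The remaining 2.02\% of CPU usage and 1.66\% of memory usage belong to
\texttt{FINISH}ed tasks.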
|
||||
|
||||
\textbf{TBD figure~\ref{fig:spatialresourcewaste-requested}}
|
||||
|
||||
\input{figures/table_iii} % has table III and table IV in it
|
||||
\input{figures/figure_5}
|
||||
|
||||
\hypertarget{reserved-and-actual-resource-usage-of-tasks}{%
\subsection{Reserved and actual resource usage of
tasks}\label{reserved-and-actual-resource-usage-of-tasks}}
|
||||
|
||||
|
||||
Refer to figures~\ref{fig:spatialresourcewaste-actual} and
\ref{fig:spatialresourcewaste-requested}.
|
||||
|
||||
\textbf{Observations}:
|
||||
|
||||
\begin{itemize}
\item
  Most (measured and requested) resources are used by killed jobs, even
  more than in the 2011 traces.
\item
  Behaviour is rather homogeneous across datacenters, with the exception
  of cluster G, where many LOST-terminated tasks acquired 70\% of
  both CPU and RAM.
\end{itemize}
|
||||
|
||||
|
||||
Refer to figure \ref{fig:tableIII}.
|
||||
|
||||
\textbf{Observations}:
|
||||