% Commit d6780ffa6c (parent e200cea3ab): report work
% 2 changed files with 49 additions and 27 deletions
% [...]
computing slowdown values given the previously computed execution attempt time
deltas. Finally, the mean of the computed slowdown values is taken, resulting
in the clear and concise tables found in figure~\ref{fig:taskslowdown}.

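To make the slowdown computation above concrete, the following is a minimal Python sketch, not the pipeline actually used for this report. It assumes that each terminated task is available as a record holding its priority tier, its termination type, the execution time deltas of its attempts and its response time, and it assumes that slowdown is the ratio between the response time and the duration of the last execution attempt. The names \texttt{Task}, \texttt{slowdown} and \texttt{mean\_slowdown\_table} are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    # Hypothetical record layout; field names are not taken from the traces.
    tier: str                    # priority tier, e.g. "Free" or "Production"
    termination: str             # final event type, e.g. "FINISH" or "KILL"
    attempt_deltas: list[float]  # execution time of each attempt, in order
    response_time: float         # time from submission to final termination


def slowdown(task: Task) -> float:
    # Assumed definition: response time over the duration of the last attempt.
    last = task.attempt_deltas[-1]
    if last <= 0:
        raise ValueError("last attempt must have a positive duration")
    return task.response_time / last


def mean_slowdown_table(tasks: list[Task]) -> dict[tuple[str, str], float]:
    # Group slowdown values by (tier, termination type), then average each group.
    groups = defaultdict(list)
    for t in tasks:
        groups[(t.tier, t.termination)].append(slowdown(t))
    return {key: mean(values) for key, values in groups.items()}
```

The grouping key (tier, termination type) is an assumption about how the tables are organised; a per-cluster breakdown would add the cluster name to the key.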
\section{Analysis: Performance Impact of Unsuccessful Executions}

Our first investigation focuses on replicating the methodologies used in the
2015 DSN paper by Ros\`a et al.~\cite{vino-paper} regarding the usage of machine time
% [...]
from the 2019 traces to the ones that were obtained in 2015 to understand the
workload evolution inside Borg between 2011 and 2019.

\subsection{Temporal Impact: Machine Time Waste}

\input{figures/machine_time_waste}

This analysis explores how machine time is distributed over task events and
submissions. By partitioning the collection of all terminating tasks by their
% [...]
higher machine time spent for unsuccessful executions (as observed in the
previous analysis) and an increased slowdown rate for this class is not
particularly surprising.

The number of non-successful task terminations in the 2019 traces is also rather
high when compared to the 2011 data, as evidenced by the low percentage of
\texttt{FINISH}ed tasks across priority tiers.

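The percentage of \texttt{FINISH}ed tasks per priority tier can be obtained with a simple count. The sketch below assumes a hypothetical input format, one (tier, termination type) pair per terminated task, and a hypothetical function name \texttt{finish\_share\_by\_tier}; it only illustrates the computation and is not the code used to produce the report's figures.

```python
from collections import Counter


def finish_share_by_tier(events: list[tuple[str, str]]) -> dict[str, float]:
    """Percentage of terminated tasks per tier whose final event is FINISH.

    `events` is a hypothetical list of (tier, termination_type) pairs,
    one per terminated task.
    """
    totals = Counter()
    finished = Counter()
    for tier, termination in events:
        totals[tier] += 1
        if termination == "FINISH":
            finished[tier] += 1
    # Counter returns 0 for tiers that never reached FINISH.
    return {tier: 100.0 * finished[tier] / n for tier, n in totals.items()}
```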
Another noteworthy difference lies in the mean response times for all and for
last executions: while the mean response time in the 2019 traces is overall
shorter by an order of magnitude, the new traces show an overall significantly
higher mean response time than the 2011 data.

Across individual 2019 clusters (see figure~\ref{fig:taskslowdown-csts}), the
data shows mostly uniform behaviour, except for a few noteworthy mean slowdown
spikes: cluster A has a mean slowdown of 82.97 in the ``Free'' tier, cluster G
has mean slowdowns of 19.06 and 14.57 in the ``BEB'' and ``Production'' tiers
respectively, and cluster D has a mean slowdown of 12.04 in its ``Free'' tier.

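Spotting such spikes amounts to filtering a per-cluster, per-tier mean slowdown table against a cut-off. In the minimal sketch below, the table layout, the function name \texttt{slowdown\_spikes} and the threshold value are assumptions (the report does not state a threshold); the example input reuses the four values quoted above.

```python
def slowdown_spikes(
    table: dict[tuple[str, str], float], threshold: float
) -> list[tuple[str, str, float]]:
    """Return (cluster, tier, mean slowdown) entries whose mean slowdown
    exceeds `threshold`, largest first."""
    spikes = [
        (cluster, tier, value)
        for (cluster, tier), value in table.items()
        if value > threshold
    ]
    return sorted(spikes, key=lambda entry: entry[2], reverse=True)


# The four spikes quoted in the text; a threshold of 10 is an arbitrary choice.
quoted = {
    ("A", "Free"): 82.97,
    ("G", "BEB"): 19.06,
    ("G", "Production"): 14.57,
    ("D", "Free"): 12.04,
}
for cluster, tier, value in slowdown_spikes(quoted, threshold=10.0):
    print(f"cluster {cluster}, tier {tier}: mean slowdown {value}")
```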
|
\subsection{Spatial Impact: Resource Waste}

\input{figures/spatial_resource_waste}

In this analysis we aim to understand how the physical resources of machines
in the Borg clusters are used to complete tasks. In particular, we compare how
CPU and memory allocation and usage are distributed among tasks based on their
termination type.

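The comparison by termination type reduces to summing a resource amount (for instance CPU or memory usage) per termination type and normalising by the grand total. This is a minimal sketch assuming a hypothetical list of (termination type, amount) pairs, one per task, and a hypothetical function name \texttt{share\_by\_termination}; how the amounts are extracted from the Borg traces is not shown.

```python
from collections import defaultdict


def share_by_termination(records: list[tuple[str, float]]) -> dict[str, float]:
    """Percentage of a resource attributed to each termination type.

    `records` holds (termination_type, amount) pairs, one per task; the
    amount may be CPU or memory, requested or actually used.
    """
    totals = defaultdict(float)
    for termination, amount in records:
        totals[termination] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no resource usage recorded")
    return {k: 100.0 * v / grand_total for k, v in totals.items()}
```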
|
Due to the limited computational resources available for the data analysis,
the resource usage for clusters E to H in the 2019 traces is missing. However,
a comparison between the 2011 resource usage and the aggregated resource usage
of clusters A to D in the 2019 traces can be found in
figure~\ref{fig:spatialresourcewaste-actual}. Additionally, a
cluster-by-cluster breakdown for the 2019 data can be found in
figure~\ref{fig:spatialresourcewaste-actual-csts}.

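One way to obtain the aggregated figure for clusters A to D is to add up the absolute per-termination amounts of the selected clusters before converting them to percentages, so that larger clusters weigh more than smaller ones. The report does not state how the aggregation was performed, so the sketch below, including its input layout (a mapping from cluster name to per-termination totals) and the name \texttt{aggregate\_clusters}, is an assumption.

```python
from collections import Counter


def aggregate_clusters(
    per_cluster: dict[str, dict[str, float]], clusters: list[str]
) -> dict[str, float]:
    """Sum the absolute per-termination totals of the selected clusters,
    then express the combined totals as percentages."""
    combined = Counter()
    for name in clusters:
        combined.update(per_cluster[name])  # adds values key by key
    grand_total = sum(combined.values())
    return {k: 100.0 * v / grand_total for k, v in combined.items()}
```

Averaging the per-cluster percentages instead would give every cluster the same weight regardless of its size: a cluster with 90\% \texttt{KILL}ed usage and one with 50\% would average to 70\%, even if the first were much larger.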
|
From these figures it is clear that, compared to the relatively even
distribution of used resources in the 2011 traces, the distribution of resources
in the 2019 Borg clusters became strikingly uneven: \texttt{KILL}ed tasks account
for a combined 86.29\% of CPU usage and 84.86\% of memory usage. In contrast,
all other task termination types have a significantly lower resource usage:
\texttt{EVICT}ed, \texttt{FAIL}ed and \texttt{FINISH}ed tasks register
8.53\%, 3.17\% and 2.02\% of CPU usage and 9.03\%, 4.45\% and 1.66\% of memory
usage, respectively. The same distribution appears in the data from individual
clusters in figure~\ref{fig:spatialresourcewaste-actual-csts}, where
\texttt{KILL}ed tasks always account for more than 80\% of resources.

|
With roughly 98\% of CPU and memory resources used by ultimately
non-successful tasks, the spatial resource waste in the 2019 traces is clearly
high.

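The share of resources used by non-successful tasks follows from summing the \texttt{KILL}ed, \texttt{EVICT}ed and \texttt{FAIL}ed percentages reported above (the four CPU shares add up to 100.01\% because of rounding):
\[
  \mbox{CPU: } 86.29\% + 8.53\% + 3.17\% = 97.99\%,
\]
\[
  \mbox{memory: } 84.86\% + 9.03\% + 4.45\% = 98.34\%.
\]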
\textbf{TBD figure~\ref{fig:spatialresourcewaste-requested}}

\input{figures/table_iii} % has table III and table IV in it
\input{figures/figure_5}

\hypertarget{reserved-and-actual-resource-usage-of-tasks}{%
\subsection{Reserved and actual resource usage of
tasks}\label{reserved-and-actual-resource-usage-of-tasks}}

Refer to figures~\ref{fig:spatialresourcewaste-actual} and
\ref{fig:spatialresourcewaste-requested}.

\textbf{Observations}:

\begin{itemize}
\item
  Most (measured and requested) resources are used by killed jobs, even
  more than in the 2011 traces.
\item
  Behaviour is rather homogeneous across datacenters, with the exception
  of cluster G, where a large number of \texttt{LOST}-terminated tasks
  acquired 70\% of both CPU and RAM.
\end{itemize}

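Comparing requested (reserved) and measured resources per termination type only requires keeping two running totals. The minimal sketch below assumes a hypothetical input of (termination type, requested amount, measured amount) triples, one per task, for a single resource such as CPU or RAM; the name \texttt{requested\_vs\_measured} is also hypothetical.

```python
def requested_vs_measured(
    tasks: list[tuple[str, float, float]],
) -> dict[str, tuple[float, float]]:
    """For each termination type, return its percentage of all requested
    and of all measured amounts of one resource."""
    req: dict[str, float] = {}
    used: dict[str, float] = {}
    for termination, requested, measured in tasks:
        req[termination] = req.get(termination, 0.0) + requested
        used[termination] = used.get(termination, 0.0) + measured
    total_req, total_used = sum(req.values()), sum(used.values())
    return {
        k: (100.0 * req[k] / total_req, 100.0 * used[k] / total_used)
        for k in req
    }
```

Placing the two percentages side by side shows whether a termination type holds a larger share of what was reserved than of what was actually used.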
Refer to figure~\ref{fig:tableIII}.

\textbf{Observations}: