Merge branch 'master' of tea.maggioni.xyz:maggicl/bachelorThesis

Commit 2752ad249f

3 changed files with 154 additions and 112 deletions

Binary file not shown.

@@ -44,10 +44,10 @@ Switzerland]{Dr.}{Andrea}{Ros\'a}

datacenters, focusing in particular on unsuccessful executions of jobs and
tasks submitted by users. The objective of this project is to compare the
resource waste caused by unsuccessful executions, their impact on application
performance, and their root causes. We show the strong negative impact on
CPU and RAM usage and on task slowdown. We analyze patterns of
unsuccessful jobs and tasks, particularly focusing on their interdependency.
Moreover, we uncover their root causes by inspecting key workload and
system attributes such as machine locality and concurrency level.}

\begin{document}

@@ -82,19 +82,11 @@ and stored in JSONL format)\cite{google-drive-marso}, requiring a considerable

amount of computational power to analyze them and the implementation of special
data engineering techniques for analysis of the data.

This project aims to repeat the analysis performed in 2015 to highlight the
similarities and differences in workload that this decade brought, and to expand
the old analysis to understand even better the causes of failures and how to
prevent them. Additionally, this report provides an overview of the data
engineering techniques used to perform the queries and analyses on the 2019
traces.

\subsection{Outline}

@@ -111,7 +103,23 @@ conclusions.

\section{State of the art}\label{sec2}

\begin{figure}[t]
\begin{center}
\begin{tabular}{cc}
\textbf{Cluster} & \textbf{Timezone} \\ \hline
A & America/New York \\
B & America/Chicago \\
C & America/New York \\
D & America/New York \\
E & Europe/Helsinki \\
F & America/Chicago \\
G & Asia/Singapore \\
H & Europe/Brussels \\
\end{tabular}
\end{center}
\caption{Approximate geographical location of each cluster in the 2019 Google
Borg traces, obtained from the datacenter's timezone.}\label{fig:clusters}
\end{figure}

In 2015, Dr.~Andrea Ros\'a et al.\ published a
research paper titled \textit{Understanding the Dark Side of Big Data Clusters:

@@ -123,6 +131,30 @@ failures. The salient conclusion of that research is that actually lots of

computations performed by Google would eventually end in failure, then leading
to large amounts of computational power being wasted.

However, with the release of the new 2019 traces, the results and conclusions
found by that paper could potentially be outdated in the current large-scale
computing world. The new traces not only provide updated data on Borg's
workload, but provide more data as well: the new traces contain data from 8
different Borg ``cells'' (i.e.\ clusters) in datacenters across the world,
from now on referred to as ``Cluster A'' to ``Cluster H''.

The geographical
location of each cluster can be consulted in Figure~\ref{fig:clusters}. The
information in that table was provided by the 2019 traces
documentation\cite{google-drive-marso}.

The new 2019 traces provide richer data even on a cluster-by-cluster basis. For
example, the amount and variety of server configurations per cluster increased
significantly from 2011.
An overview of the machine configurations in the cluster analyzed with the 2011
traces and in the 8 clusters composing the 2019 traces can be found in
Figure~\ref{fig:machineconfigs}. Additionally, in
Figure~\ref{fig:machineconfigs-csts}, the same machine configuration data is
provided for the 2019 traces, giving a cluster-by-cluster distribution of the
machines.

\input{figures/machine_configs}

\section{Background information}\label{sec3}

\textit{Borg} is Google's own cluster management software able to run

@@ -143,7 +175,7 @@ to large amounts of computational power being wasted.

% encoded and stored in the trace as rows of various tables. Among the
% information events provide, the field ``type'' provides information on the
% execution status of the job or task. This field can have several values,
% which are illustrated in Figure~\ref{fig:eventtypes}.

\subsection{Traces}


@@ -173,7 +205,7 @@ status of a task itself.

\bottomrule
\end{tabular}
\end{center}
\caption{Overview of job and task termination event types.}\label{fig:eventtypes}
\end{figure}

Figure~\ref{fig:eventtypes} shows the expected transitions between event

@@ -238,6 +270,7 @@ The scope of this thesis focuses on the tables

\texttt{machine\_configs}, \texttt{instance\_events} and
\texttt{collection\_events}.

\hypertarget{remark-on-traces-size}{%
\subsection{Remark on traces size}\label{remark-on-traces-size}}


@@ -296,22 +329,24 @@ The chosen programming language for writing analysis scripts was Python.

Spark has very powerful native Python bindings in the form of the
\emph{PySpark} API, which were used to implement the various queries.
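
As a minimal sketch of how such a query is set up (the application name and
master URL below are illustrative assumptions, not the exact configuration of
the analysis scripts), every PySpark script starts from a \texttt{SparkSession}:

\begin{verbatim}
from pyspark.sql import SparkSession

# Hypothetical session setup: run locally, using all cores of the server.
spark = (SparkSession.builder
         .appName("borg-traces-analysis")
         .master("local[*]")
         .getOrCreate())
\end{verbatim}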

\hypertarget{query-architecture}{%
\subsection{Query architecture}\label{query-architecture}}

\subsubsection{Overview}

In general, each query written to execute the analysis
follows a Map-Reduce template. Traces are first read, then parsed, and then
filtered by performing selections,
projections and computing new derived fields.

After this preparation phase, the
trace records are often passed through a \texttt{groupby()} operation, which by
choosing one or many record fields sorts all the records into several ``bins''
containing records with matching values for the selected fields. Then, a map
operation is applied to each bin in order to derive some aggregated property
value for each grouping.

Finally, a reduce operation is applied to either
further aggregate those computed properties or to generate an aggregated data
structure for storage purposes.
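
The following PySpark fragment is a minimal sketch of this template. The file
pattern, column names and the derived field are illustrative assumptions and do
not necessarily match the actual trace schema or the real queries:

\begin{verbatim}
from pyspark.sql import functions as F

# `spark` is an existing SparkSession (see the sketch above).
# Read and parse the (hypothetical) task event table.
events = spark.read.json("instance_events/*.json.gz")

prepared = (events
    .where(F.col("type").isNotNull())                 # selection
    .select("collection_id", "instance_index",
            "type", "time")                           # projection
    .withColumn("day",                                # derived field
                F.to_date((F.col("time") / 1e6).cast("timestamp"))))

# groupby(): sort records into bins sharing the same field values,
# then apply a per-bin aggregation (the "map" step).
per_type = prepared.groupBy("type").agg(F.count("*").alias("n_events"))

# Reduce: collect the small aggregated result for storage.
result = per_type.collect()
\end{verbatim}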

One example of an analysis script with average complexity and a fairly
straightforward structure is the pair of scripts \texttt{task\_slowdown.py} and
\texttt{task\_slowdown\_table.py} used to compute the ``task slowdown'' tables
(namely the tables in Figure~\ref{fig:taskslowdown}).

``Slowdown'' is a task-wise measure of wasted execution time for tasks with a
\texttt{FINISH} termination type. It is computed as the total execution time of
the task divided by the execution time actually needed to complete the task
(i.e.\ the total time of the last execution attempt, successful by definition).

The analysis requires computing the mean task slowdown for each task priority
value, and additionally the percentage of tasks with successful

@@ -385,7 +420,7 @@ terminations per priority. The query therefore needs to compute the execution

time of each execution attempt for each task, determine whether each task has a
successful termination or not, and finally combine this data to compute
slowdown, mean slowdown and ultimately the final table found in
Figure~\ref{fig:taskslowdown}.

\begin{figure}[t]
\hspace{-0.075\textwidth}

@@ -402,7 +437,7 @@ contains (among other data) all task event logs containing properties, event

types and timestamps. As already explained in the previous section, the logical
table file is actually stored as several Gzip-compressed JSONL shards. This is
very useful for processing purposes, since Spark is able to parse and load in
memory each shard in parallel, i.e.\ using all processing cores on the server
used to run the queries.
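
As a sketch (the path pattern below is an assumption about how the shards are
laid out on disk), loading the sharded table boils down to a single call, and
Spark typically assigns one partition per Gzip shard, since gzip archives are
not splittable:

\begin{verbatim}
# Each part-*.json.gz file is one JSONL shard of the logical table.
task_events = spark.read.json("instance_events/part-*.json.gz")

# Typically one partition per shard; shards are parsed in parallel across cores.
print(task_events.rdd.getNumPartitions())
\end{verbatim}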

After loading the data, a selection and a projection operation are performed in

@@ -436,18 +471,18 @@ Finally, the \texttt{task\_slowdown\_table.py} processes this intermediate

results to compute the percentage of successful tasks per execution and the
slowdown values given the previously computed execution attempt time
deltas. Then, the mean of the computed slowdown values is computed, resulting
in the clear and concise tables found in Figure~\ref{fig:taskslowdown}.
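
A compact sketch of this final step is shown below, assuming a hypothetical
intermediate layout with one row per task carrying its priority tier, a
\texttt{finished} flag and the per-attempt time totals (the names are
illustrative, not the actual intermediate schema):

\begin{verbatim}
from pyspark.sql import functions as F

# `intermediate` is the hypothetical per-task DataFrame described above.
slowdown = (intermediate
    # Slowdown is only defined for successfully FINISHed tasks.
    .where(F.col("finished"))
    .withColumn("slowdown",
                F.col("total_response_time") / F.col("last_attempt_time")))

mean_slowdown = (slowdown
    .groupBy("priority_tier")
    .agg(F.mean("slowdown").alias("mean_slowdown")))

success_rate = (intermediate
    .groupBy("priority_tier")
    .agg(F.avg(F.col("finished").cast("double")).alias("pct_successful")))
\end{verbatim}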

\section{Analysis: Performance Impact of Unsuccessful Executions}\label{sec5}

Our first investigation focuses on replicating the analysis done in the paper by
Ros\'a et al.\cite{dsn-paper} regarding usage of machine time
and resources.

In this section we perform several analyses focusing on how machine time and
resources are wasted, by means of a temporal vs.\ spatial resource analysis from
the perspective of single tasks as well as jobs. We then compare the results
from the 2019 traces to the ones that were obtained before to understand the
workload evolution inside Borg between 2011 and 2019.

We discover that the spatial and temporal impact of unsuccessful

@@ -458,22 +493,38 @@ termination event.

\subsection{Temporal Impact: Machine Time Waste}
\input{figures/machine_time_waste}

The goal of this analysis is to understand how much time is spent in doing
useless computations by exploring how machine time is distributed over task
events and submissions.

Before delving into the analysis itself, we define three kinds of events in a
task's lifecycle:

\begin{description}
\item[submission:] when a task is added or re-added to the Borg
  system queue, waiting to be scheduled;
\item[scheduling:] when a task is removed from the Borg queue and
  its actual execution of potentially useful computations starts;
\item[termination:] when a task terminates its computations either
  successfully or unsuccessfully.
\end{description}

By partitioning the set of all terminating tasks by their
termination event, the analysis aims to measure the total time spent by tasks in
3 different execution phases:

\begin{description}
\item[resubmission time:] the total of all time intervals between every task
  termination event and the immediately succeeding task submission event, i.e.\
  the total time spent by tasks waiting to be resubmitted in Borg after a
  termination;
\item[queue time:] the total of all time intervals between every task submission
  event and the following task scheduling event, i.e.\ the total time spent by
  tasks queuing before execution;
\item[running time:] the total of all time intervals between every task
  scheduling event and the following task termination event, i.e.\ the total
  time spent by tasks ``executing'' (i.e.\ performing potentially useful
  computations) in the clusters.
\end{description}

In the 2019 traces, an additional ``Unknown'' measure is counted. This measure

@@ -482,17 +533,16 @@ events do not allow to safely assume in which execution phase a task may be.

Unknown measures are mostly caused by faults and missed event writes in the task
event log that was used to generate the traces.
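
As a sketch of how these phase totals can be derived from the raw event log,
the fragment below pairs each task event with the next event for the same task
and classifies the gap in between. The column and event type names are
assumptions for illustration and do not necessarily match the real trace schema
or the actual analysis script:

\begin{verbatim}
from pyspark.sql import Window, functions as F

# `task_events` is the task event DataFrame loaded earlier.
TERMINATIONS = ["EVICT", "FAIL", "FINISH", "KILL"]
w = Window.partitionBy("collection_id", "instance_index").orderBy("time")

deltas = (task_events
    .withColumn("next_type", F.lead("type").over(w))
    .withColumn("delta", F.lead("time").over(w) - F.col("time")))

phase = (F.when(F.col("type").isin(TERMINATIONS) &
                (F.col("next_type") == "SUBMIT"), "resubmission")
          .when((F.col("type") == "SUBMIT") &
                (F.col("next_type") == "SCHEDULE"), "queue")
          .when((F.col("type") == "SCHEDULE") &
                F.col("next_type").isin(TERMINATIONS), "running")
          .otherwise("unknown"))

per_phase = (deltas
    .withColumn("phase", phase)
    .groupBy("phase")
    .agg(F.sum("delta").alias("total_time")))
\end{verbatim}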

The analysis results are depicted in Figure~\ref{fig:machinetimewaste-rel} as a
comparison between the 2011 and 2019 traces, aggregating the data from all
clusters. Additionally, in Figure~\ref{fig:machinetimewaste-rel-csts} a
cluster-by-cluster breakdown is provided for the 2019 traces.

The striking difference between 2011 and 2019 data is in the machine time
distribution per task termination type. In the 2019 traces, 94.38\% of global
machine time is spent on tasks that are eventually \texttt{KILL}ed.
\texttt{FINISH}, \texttt{EVICT} and \texttt{FAIL} tasks respectively register
totals of 4.20\%, 1.18\% and 0.25\% machine time.

Considering instead the distribution between execution phase times, the
comparison shows very similar behaviour between the two traces, having the

@@ -508,32 +558,36 @@ w.r.t.\ of accuracy of task event logging.

Considering instead the behaviour of each single cluster in the 2019 traces, no
significant difference between them can be observed. The only notable difference
lies in the ``Running time''-``Unknown time'' ratio in \texttt{KILL}ed
tasks, which is at its highest in cluster A (at 30.78\% by 58.71\% of global
machine time) and at its lowest in cluster H (at 8.06\% by 84.77\% of global
machine time).

The takeaway from this analysis is that in the 2019 traces a lot of computation
time is wasted in the execution of tasks that are eventually \texttt{KILL}ed,
i.e.\ unsuccessful.

\subsection{Average Slowdown per Task}
\input{figures/task_slowdown}

This analysis aims to measure the average of an ad-hoc defined parameter we call
``slowdown''. We define it as the ratio between the total response time across
all executions of the task and the response time (i.e.\ queue time and running
time) of the last execution of said task. This metric is especially useful to
analyze the impact of unsuccessful executions on each task's total execution time
w.r.t.\ the intrinsic workload (i.e.\ computational time) of tasks.
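
Restating this definition as a formula (the notation is ours): if a task has $N$
execution attempts and $r_i$ denotes the response time (queue time plus running
time) of the $i$-th attempt, with the $N$-th attempt being the final one, then

\[
  \mathrm{slowdown} = \frac{\sum_{i=1}^{N} r_i}{r_N}.
\]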

Refer to Figure~\ref{fig:taskslowdown} for a comparison between the 2011 and
2019 mean task slowdown measures broken down by task priority. Additionally, said
means are computed on a cluster-by-cluster basis for 2019 data in
Figure~\ref{fig:taskslowdown-csts}.

In 2015 Ros\'a et al.\cite{dsn-paper} measured mean task slowdown per each task
priority value, which at the time were numeric values between 0 and 11. However,
in 2019 traces, task priorities are given as a numeric value between 0 and 500.
Therefore, to allow an easier comparison, mean task slowdown values are computed
by task priority tier over the 2019 data. Priority tiers are semantically
relevant priority ranges defined in the Tirmazi et al.\
2020 paper\cite{google-marso-19} that introduced the 2019 traces. Equivalent priority
tiers are also provided next to the 2011 priority values in the table covering
the 2015 analysis.
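
For reference, a sketch of the tier mapping is given below. The boundaries are
our recollection of the ranges described in the traces documentation and in
Tirmazi et al.\cite{google-marso-19}; they should be verified against those
sources before being reused:

\begin{verbatim}
# Approximate priority-tier boundaries (to be verified against the
# 2019 trace documentation); priorities are integers in [0, 500].
def priority_tier(priority: int) -> str:
    if priority <= 99:
        return "Free"
    if priority <= 115:
        return "Best-effort Batch (BEB)"
    if priority <= 119:
        return "Mid-tier"
    if priority <= 359:
        return "Production"
    return "Monitoring"
\end{verbatim}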

@@ -547,9 +601,9 @@ though this column shows the mean response time across all executions.

\textbf{Mean slowdown} instead provides the mean slowdown value for each task
priority/tier.

Comparing the tables in Figure~\ref{fig:taskslowdown} we observe that the
maximum mean slowdown measure for 2019 data (i.e.\ 7.84, for the BEB tier) is
almost double the maximum measure in 2011 data (i.e.\ 3.39, for priority $3$
corresponding to the BEB tier). The ``Best effort batch'' tier, as the name
suggests, is a lower-priority tier where failures are more tolerated. Therefore,
due to the increased concurrency in the 2019 clusters compared to 2011 and the

@@ -566,7 +620,7 @@ executions: while the mean response is overall shorter in time in the 2019

traces by an order of magnitude, the new traces show an overall significantly
higher mean response time than in the 2011 data.

Across 2019 single clusters (as in Figure~\ref{fig:taskslowdown-csts}), the data
shows a mostly uniform behaviour, other than for some noteworthy mean slowdown
spikes. Indeed, cluster A has 82.97 mean slowdown in the ``Free'' tier,
cluster G has 19.06 and 14.57 mean slowdown in the ``BEB'' and ``Production''

@@ -585,9 +639,9 @@ Due to limited computational resources w.r.t.\ the data analysis process, the

resource usage for clusters E to H in the 2019 traces is missing. However, a
comparison between 2011 resource usage and the aggregated resource usage of
clusters A to D in the 2019 traces can be found in
Figure~\ref{fig:spatialresourcewaste-actual}. Additionally, a
cluster-by-cluster breakdown for the 2019 data can be found in
Figure~\ref{fig:spatialresourcewaste-actual-csts}.

From these figures it is clear that, compared to the relatively even
distribution of used resources in the 2011 traces, the distribution of resources

@@ -598,14 +652,14 @@ all other task termination types have a significantly lower resource usage:

\texttt{EVICT}ed, \texttt{FAIL}ed and \texttt{FINISH}ed tasks register respectively
8.53\%, 3.17\% and 2.02\% CPU usage and 9.03\%, 4.45\%, and 1.66\% memory usage.
This resource distribution can also be found in the data from individual
clusters in Figure~\ref{fig:spatialresourcewaste-actual-csts}, with more
than 80\% of resources always devoted to \texttt{KILL}ed tasks.

Considering now requested resources instead of used ones, a comparison between
2011 and the aggregation of all A-H clusters of the 2019 traces can be found in
Figure~\ref{fig:spatialresourcewaste-requested}. Additionally, a
cluster-by-cluster breakdown for single 2019 clusters can be found in
Figure~\ref{fig:spatialresourcewaste-requested-csts}.

Here \texttt{KILL}ed jobs dominate the distribution of resources even more,
reaching a global 97.21\% of CPU allocation and a global 96.89\% of memory

@@ -615,7 +669,7 @@ respective CPU allocation figures of 2.73\%, 0.06\% and 0.0012\% and memory

allocation figures of 3.04\%, 0.06\% and 0.012\%.

Behaviour across clusters (as
shown in Figure~\ref{fig:spatialresourcewaste-requested-csts}) in terms of
requested resources is fairly homogeneous, with the exception of cluster A
having a relatively high 2.85\% CPU and 3.42\% memory resource requests from
\texttt{EVICT}ed tasks and cluster E having a noteworthy 1.67\% CPU and 1.31\%

@@ -651,9 +705,9 @@ the task-level events, namely \texttt{EVICT}, \texttt{FAIL}, \texttt{FINISH}

and \texttt{KILL} termination events.

A comparison of the termination event distribution between the 2011 and 2019
traces is shown in Figure~\ref{fig:tableIII}. Additionally, a cluster-by-cluster
breakdown of the same data for the 2019 traces is shown in
Figure~\ref{fig:tableIII-csts}.

Each table in these figures shows the mean and the 95-th percentile of the
number of termination events per task, broken down by task termination. In

@@ -677,7 +731,7 @@ jobs and their \texttt{EVICT} events (1.876 on average per task with a 8.763

event overall average).

Considering cluster-by-cluster behaviour in the 2019 traces (as reported in
Figure~\ref{fig:tableIII-csts}) the general observations still hold for each
cluster, albeit with event count averages having different magnitudes. Notably,
cluster E registers the highest per-event average, with \texttt{FAIL}ed tasks
experiencing 111.471 \texttt{FAIL} events out of 112.384.

@@ -692,11 +746,11 @@ given number of unsuccessful events could affect the termination of the task it

belongs to.
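
In our notation (an assumption about how the original analysis frames the
conditioning, to be read off the original figures), for an unsuccessful event
type $k$ and an event count $n$, the plotted quantity is the empirical
probability

\[
  P\bigl(\text{success} \mid N_k = n\bigr),
\]

i.e.\ the fraction of tasks experiencing exactly $n$ events of type $k$ that
eventually terminate with a \texttt{FINISH} event, where $N_k$ is the number of
events of type $k$ experienced by the task.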

Conditional probabilities of each unsuccessful event type are shown in the form
of a plot in Figure~\ref{fig:figureV}, comparing the 2011 traces with the
overall data from the 2019 ones, and in Figure~\ref{fig:figureV-csts}, as a
cluster-by-cluster breakdown of the same data for the 2019 traces.

In Figure~\ref{fig:figureV} the 2011 and 2019 plots differ in their x-axis:
for 2011 data conditional probabilities are computed for a maximum event count
of 30, while for 2019 data they are computed for up to 50 events of a specific
kind. Nevertheless, another quite striking difference between the two plots can

@@ -716,7 +770,7 @@ The \texttt{FAIL} probability curve has instead 18.55\%, 1.79\%, 14.49\%,

2.08\%, 2.40\%, and 1.29\% success probabilities for the same range.

Considering cluster-to-cluster behaviour in the 2019 traces (as shown in
Figure~\ref{fig:figureV-csts}), some clusters show quite similar behaviour to
the aggregated plot (namely clusters A, F, and H), while some other clusters
show strongly oscillating probability distribution function curves for
the \texttt{EVICT} and \texttt{FINISH} event types. \texttt{KILL} behaviour is instead

@@ -725,15 +779,15 @@ homogeneous even on a single cluster basis.

\subsection{Unsuccessful Job Event Patterns}\label{tabIV-section}
\input{figures/table_iv}

The analysis uses very similar techniques to the ones used in
Section~\ref{tabIII-section}, but focuses on the job level instead. The aim is
to better understand the relationship between tasks and jobs, and how
task-level termination events can influence the termination state of a job.

A comparison of the analyzed parameters between the 2011 and 2019
traces is shown in Figure~\ref{fig:tableIV}. Additionally, a cluster-by-cluster
breakdown of the same data for the 2019 traces is shown in
Figure~\ref{fig:tableIV-csts}.

Considering the distribution of the number of tasks in a job, the 2019 traces show a
decrease for the mean figure (e.g.\ for \texttt{FAIL}ed jobs, with a mean 60.5

@@ -751,7 +805,7 @@ the \texttt{FINISH}ed job category has a new event distribution too, with

\texttt{FINISH} task events being the most popular at 1.778 events per job in
the 2019 traces.

The cluster-by-cluster comparison in Figure~\ref{fig:tableIV-csts} shows that
the number of tasks per job is generally distributed similarly to the
aggregated data, with only cluster H having remarkably low mean and 95-th
percentiles overall. Event-wise, for \texttt{EVICT}ed, \texttt{FINISH}ed,

@@ -772,31 +826,18 @@ probabilities based on the number of task termination events of a specific type.

Finally, Section~\ref{tabIV-section} aims to find similar correlations, but at
the job level.

\section{Analysis: Potential Causes of Unsuccessful Executions}

The aim of this section is to analyze several task-level and job-level
parameters in order to find correlations with the success of an execution.
Using the techniques from Section V of the Ros\'a et al.\
paper\cite{dsn-paper}, we analyze
task events' metadata, the use of CPU and memory resources at the task level,
and job metadata, respectively in Section~\ref{fig7-section},
Section~\ref{fig8-section} and Section~\ref{fig9-section}.

In the following analysis, we present how different event/job types happen, with
respect to different ranges of attributes. For each type $i$, we compute the
metric of event (job) rate, defined as the number of type $i$ events (jobs)
divided by the total number of events (jobs). Event/job rates are computed for
each range of attributes. For example, one can compute the eviction rate for
priorities in the range $[0,1]$ as the number of eviction events that involved
priorities $[0,1]$ divided by the total number of events for priorities $[0,1]$.
One can also view event/job rates as the probability that events/jobs end with
certain types of outcomes.
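
Written out in our notation, for an event (job) type $i$ and an attribute range
$r$,

\[
  \mathrm{rate}_i(r) = \frac{N_i(r)}{\sum_j N_j(r)},
\]

where $N_i(r)$ is the number of events (jobs) of type $i$ whose attribute value
falls within $r$.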

\subsection{Event rates vs.\ task priority, event execution time, and machine
concurrency}\label{fig7-section}

\input{figures/figure_7}

@@ -825,17 +866,18 @@ Refer to figures \ref{fig:figureVII-a}, \ref{fig:figureVII-b}, and

\end{itemize}

\subsection{Event Rates vs.\ Requested Resources, Resource Reservation, and
Resource Utilization}\label{fig8-section}
\input{figures/figure_8}

Refer to Figure~\ref{fig:figureVIII-a}, Figure~\ref{fig:figureVIII-a-csts},
Figure~\ref{fig:figureVIII-b}, Figure~\ref{fig:figureVIII-b-csts},
Figure~\ref{fig:figureVIII-c}, Figure~\ref{fig:figureVIII-c-csts},
Figure~\ref{fig:figureVIII-d}, Figure~\ref{fig:figureVIII-d-csts},
Figure~\ref{fig:figureVIII-e}, Figure~\ref{fig:figureVIII-e-csts},
Figure~\ref{fig:figureVIII-f}, and Figure~\ref{fig:figureVIII-f-csts}.

\subsection{Job Rates vs.\ Job Size, Job Execution Time, and Machine
Locality}\label{fig9-section}
\input{figures/figure_9}

Refer to figures \ref{fig:figureIX-a}, \ref{fig:figureIX-b}, and

@@ -9,8 +9,8 @@

\begin{figure}[p]
\machinetimewaste[1]{2011 data}{cluster_2011.pgf}
\machinetimewaste[1]{2019 data}{cluster_all.pgf}
\caption{Relative task time spent in each execution phase
w.r.t.\ task termination in 2011 and 2019 (all clusters aggregated) traces. The x-axis shows the task termination type,
the y-axis shows the total time \% spent. Colors break down the time in execution phases. ``Unknown'' execution times are
2019-specific and correspond to event time transitions that are not considered ``typical'' by Google.}\label{fig:machinetimewaste-rel}
\end{figure}

@@ -24,6 +24,6 @@ Y axis shows total time \% spent. Colors break down the time in execution phases

\machinetimewaste{Cluster F}{cluster_f.pgf}
\machinetimewaste{Cluster G}{cluster_g.pgf}
\machinetimewaste{Cluster H}{cluster_h.pgf}
\caption{Relative task time spent in each execution phase w.r.t.\ clusters in the
2019 trace. Refer to Figure~\ref{fig:machinetimewaste-rel} for the axes description.}\label{fig:machinetimewaste-rel-csts}
\end{figure}