table iv mostly done

Claudio Maggioni (maggicl) 2021-06-13 21:59:10 +02:00
parent cd33279754
commit ee48ba87b4
3 changed files with 200 additions and 139 deletions

Binary file not shown.


@ -44,10 +44,10 @@ Switzerland]{Dr.}{Andrea}{Ros\'a}
datacenters, focusing in particular on unsuccessful executions of jobs and
tasks submitted by users. The objective of this project is to compare the
resource waste caused by unsuccessful executions, their impact on application
performance, and their root causes. We show the strong negative impact on
CPU and RAM usage and on task slowdown. We analyze patterns of
unsuccessful jobs and tasks, particularly focusing on their interdependency.
Moreover, we uncover their root causes by inspecting key workload and
system attributes such as machine locality and concurrency level.}

\begin{document}
@ -82,24 +82,31 @@ and stored in JSONL format)\cite{google-drive-marso}, requiring a considerable
amount of computational power to analyze them and the implementation of special
data engineering techniques for analysis of the data.
This project aims to repeat the analysis performed in 2015 to highlight
the similarities and differences in workload that this decade brought, and to
expand the old analysis to understand even better the causes of failures and
how to prevent them. Additionally, this report provides an overview of the
data engineering techniques used to perform the queries and analyses on the
2019 traces.
\section{State of the art}

\begin{figure}[t]
\begin{center}
\begin{tabular}{cc}
\textbf{Cluster} & \textbf{Timezone} \\ \hline
A & America/New York \\
B & America/Chicago \\
C & America/New York \\
D & America/New York \\
E & Europe/Helsinki \\
F & America/Chicago \\
G & Asia/Singapore \\
H & Europe/Brussels \\
\end{tabular}
\end{center}
\caption{Approximate geographical location of each cluster in the 2019 Google
Borg traces, obtained from the datacenter's timezone.}\label{fig:clusters}
\end{figure}
In 2015, Dr.~Andrea Ros\'a et al.\ published a
research paper titled \textit{Understanding the Dark Side of Big Data Clusters:
@ -111,6 +118,30 @@ failures. The salient conclusion of that research is that actually lots of
computations performed by Google would eventually end in failure, leading
to large amounts of computational power being wasted.
However, with the release of the new 2019 traces, the results and conclusions
found by that paper could be potentially outdated in the current large-scale
computing world. The new traces not only provide updated data on Borg's
workload, but provide more data as well: the new traces contain data from 8
different Borg ``cells'' (i.e.\ clusters) in datacenters across the world,
from now on referred to as ``Cluster A'' to ``Cluster H''.
The geographical
location of each cluster can be consulted in Figure~\ref{fig:clusters}. The
information in that table was provided by the 2019 traces
documentation\cite{google-drive-marso}.
The new 2019 traces provide richer data even on a cluster-by-cluster basis. For
example, the amount and variety of server configurations per cluster increased
significantly from 2011.
An overview of the machine configurations in the cluster analyzed with the 2011
traces and in the 8 clusters composing the 2019 traces can be found in
Figure~\ref{fig:machineconfigs}. Additionally, in
Figure~\ref{fig:machineconfigs-csts}, the same machine configuration data is
broken down cluster by cluster for the 2019 traces.
\input{figures/machine_configs}
\section{Background information} \section{Background information}
\textit{Borg} is Google's own cluster management software able to run
@ -131,7 +162,7 @@ to large amounts of computational power being wasted.
% encoded and stored in the trace as rows of various tables. Among the
% information events provide, the field ``type'' provides information on the
% execution status of the job or task. This field can have several values,
% which are illustrated in Figure~\ref{fig:eventtypes}.
\subsection{Traces}
@ -161,7 +192,7 @@ status of a task itself.
\bottomrule
\end{tabular}
\end{center}
\caption{Overview of job and task termination event types.}\label{fig:eventtypes}
\end{figure}
Figure~\ref{fig:eventtypes} shows the expected transitions between event
@ -226,6 +257,7 @@ The scope of this thesis focuses on the tables
\texttt{machine\_configs}, \texttt{instance\_events} and
\texttt{collection\_events}.
\hypertarget{remark-on-traces-size}{%
\subsection{Remark on traces size}\label{remark-on-traces-size}}
@ -284,22 +316,24 @@ The chosen programming language for writing analysis scripts was Python.
Spark has very powerful native Python bindings in the form of the
\emph{PySpark} API, which were used to implement the various queries.
\hypertarget{query-architecture}{%
\subsection{Query architecture}\label{query-architecture}}

\subsubsection{Overview}
In general, each query written to execute the analysis
follows a Map-Reduce template. Traces are first read, then parsed, and then
filtered by performing selections, projections and computing new derived
fields.

After this preparation phase, the
trace records are often passed through a \texttt{groupby()} operation, which by
choosing one or many record fields sorts all the records into several ``bins''
containing records with matching values for the selected fields. Then, a map
operation is applied to each bin in order to derive some aggregated property
value for each grouping.

Finally, a reduce operation is applied to either
further aggregate those computed properties or to generate an aggregated data
structure for storage purposes.
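
To make this template concrete, below is a minimal PySpark sketch of its
structure. The trace path and the field names used here
(\texttt{collection\_id}, \texttt{time}, \texttt{type}) are illustrative
assumptions, not necessarily the schema the actual scripts operate on.

\begin{verbatim}
from pyspark import SparkContext
import json

sc = SparkContext(appName="query-template")

# Read: every line of the Gzip-compressed JSONL shards is one record.
lines = sc.textFile("/path/to/trace/*.json.gz")

# Parse, select and project, computing derived fields on the way.
records = (lines
    .map(json.loads)
    .filter(lambda r: "type" in r)                  # selection
    .map(lambda r: (r["collection_id"],             # projection and
                    int(r["time"]), r["type"])))    # derived fields

# Group the records into "bins" keyed by the chosen field(s).
bins = records.groupBy(lambda r: r[0])

# Map: derive one aggregated property per bin (here, an event count).
per_bin = bins.map(lambda kv: (kv[0], len(list(kv[1]))))

# Reduce: aggregate the per-bin properties into a final result.
total_events = per_bin.map(lambda kv: kv[1]).reduce(lambda a, b: a + b)
\end{verbatim}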
@ -360,12 +394,12 @@ appreciate their behaviour.
One example of an analysis script with average complexity and a fairly
straightforward structure is the pair of scripts \texttt{task\_slowdown.py} and
\texttt{task\_slowdown\_table.py} used to compute the ``task slowdown'' tables
(namely the tables in Figure~\ref{fig:taskslowdown}).
``Slowdown'' is a task-wise measure of wasted execution time for tasks with a
\texttt{FINISH} termination type. It is computed as the total execution time of
the task divided by the execution time actually needed to complete the task
(i.e.\ the total time of the last execution attempt, successful by definition).

The analysis requires computing the mean task slowdown for each task priority
value, and additionally the percentage of tasks with successful
@ -373,7 +407,7 @@ terminations per priority. The query therefore needs to compute the execution
time of each execution attempt for each task, determine whether each task has
a successful termination or not, and finally combine this data to compute
slowdown, mean slowdown and ultimately the final table found in
Figure~\ref{fig:taskslowdown}.
\begin{figure}[t]
\hspace{-0.075\textwidth}
@ -390,7 +424,7 @@ contains (among other data) all task event logs containing properties, event
types and timestamps. As already explained in the previous section, the logical
table file is actually stored as several Gzip-compressed JSONL shards. This is
very useful for processing purposes, since Spark is able to parse and load each
shard into memory in parallel, i.e.\ using all processing cores on the server
used to run the queries.

After loading the data, a selection and a projection operation are performed in
@ -424,18 +458,18 @@ Finally, the \texttt{task\_slowdown\_table.py} processes this intermediate
results to compute the percentage of successful tasks per execution and to
compute slowdown values given the previously computed execution attempt time
deltas. Finally, the mean of the slowdown values is computed, resulting
in the clear and concise tables found in Figure~\ref{fig:taskslowdown}.
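
The core of the slowdown computation itself can be summarized by the sketch
below, which uses plain Python lists instead of the Spark data structures the
scripts actually operate on:

\begin{verbatim}
def slowdown(attempt_durations):
    # attempt_durations: the execution time of each attempt of a
    # FINISHed task, in order; the last attempt is the successful
    # one by definition.
    return sum(attempt_durations) / attempt_durations[-1]

# A task that needed three attempts to finish:
print(slowdown([100, 250, 400]))  # -> 1.875
\end{verbatim}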
\section{Analysis: Performance Impact of Unsuccessful Executions}
Our first investigation focuses on replicating the analysis done in the paper
by Ros\'a et al.\cite{dsn-paper} regarding usage of machine time
and resources.
In this section we perform several analyses focusing on how machine time and
resources are wasted, by means of a temporal vs.\ spatial resource analysis from
the perspective of single tasks as well as jobs. We then compare the results
from the 2019 traces to those obtained in the 2015 analysis to understand the
workload evolution inside Borg between 2011 and 2019.
We discover that the spatial and temporal impact of unsuccessful
@ -446,22 +480,38 @@ termination event.
\subsection{Temporal Impact: Machine Time Waste}
\input{figures/machine_time_waste}
The goal of this analysis is to understand how much time is spent performing
useless computations by exploring how machine time is distributed over task
events and submissions.

Before delving into the analysis itself, we define three kinds of events in a
task's lifecycle:

\begin{description}
\item[submission:] when a task is added or re-added to the Borg
system queue, waiting to be scheduled;
\item[scheduling:] when a task is removed from the Borg queue and
its actual execution of potentially useful computations starts;
\item[termination:] when a task terminates its computations either
successfully or unsuccessfully.
\end{description}

By partitioning the set of all terminating tasks by their
termination event, the analysis aims to measure the total time spent by tasks in
three different execution phases:

\begin{description}
\item[resubmission time:] the total of all time intervals between every task
termination event and the immediately succeeding task submission event, i.e.\
the total time spent by tasks waiting to be resubmitted in Borg after a
termination;
\item[queue time:] the total of all time intervals between every task submission
event and the following task scheduling event, i.e.\ the total time spent by
tasks queuing before execution;
\item[running time:] the total of all time intervals between every task
scheduling event and the following task termination event, i.e.\ the total
time spent by tasks ``executing'' (i.e.\ performing potentially useful
computations) in the clusters.
\end{description}
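
As an illustration, the sketch below accumulates these three phase totals
(plus the ``Unknown'' measure introduced next) from a single task's
time-ordered event log; the event names and the transition table are our own
simplification of the trace format:

\begin{verbatim}
# Maps a pair of consecutive event kinds to an execution phase.
PHASE_OF = {("TERMINATE", "SUBMIT"):    "resubmission",
            ("SUBMIT",    "SCHEDULE"):  "queue",
            ("SCHEDULE",  "TERMINATE"): "running"}

def phase_totals(events):
    # events: (timestamp, kind) tuples sorted by timestamp
    totals = {"resubmission": 0, "queue": 0, "running": 0, "unknown": 0}
    for (t0, k0), (t1, k1) in zip(events, events[1:]):
        phase = PHASE_OF.get((k0, k1), "unknown")  # atypical transition
        totals[phase] += t1 - t0
    return totals

print(phase_totals([(0, "SUBMIT"), (5, "SCHEDULE"), (30, "TERMINATE"),
                    (32, "SUBMIT"), (40, "SCHEDULE"), (90, "TERMINATE")]))
# -> {'resubmission': 2, 'queue': 13, 'running': 75, 'unknown': 0}
\end{verbatim}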
In the 2019 traces, an additional ``Unknown'' measure is counted. This measure
@ -470,17 +520,16 @@ events do not allow to safely assume in which execution phase a task may be.
Unknown measures are mostly caused by faults and missed event writes in the task
event log that was used to generate the traces.
The analysis results are depicted in Figure~\ref{fig:machinetimewaste-rel} as a
comparison between the 2011 and 2019 traces, aggregating the data from all
clusters. Additionally, a cluster-by-cluster breakdown for the 2019 traces is
provided in Figure~\ref{fig:machinetimewaste-rel-csts}.
The striking difference between 2011 and 2019 data is in the machine time
distribution per task termination type. In the 2019 traces, 94.38\% of global
machine time is spent on tasks that are eventually \texttt{KILL}ed.
\texttt{FINISH}, \texttt{EVICT} and \texttt{FAIL} tasks respectively register
totals of 4.20\%, 1.18\% and 0.25\% machine time.
Considering instead the distribution between execution phase times, the
comparison shows very similar behaviour between the two traces, having the
@ -496,32 +545,36 @@ w.r.t.\ of accuracy of task event logging.
Considering instead the behaviour of each single cluster in the 2019 traces, no
significant difference between them can be observed. The only notable difference
lies in the ``Running time''-``Unknown time'' ratio in \texttt{KILL}ed
tasks, which is at its highest in cluster A (at 30.78\% by 58.71\% of global
machine time) and at its lowest in cluster H (at 8.06\% by 84.77\% of global
machine time).
The takeaway from this analysis is that in the 2019 traces a lot of computation
time is wasted in the execution of tasks that are eventually \texttt{KILL}ed,
i.e.\ unsuccessful.
\subsection{Average Slowdown per Task}
\input{figures/task_slowdown}
This analysis aims to measure the average of an ad-hoc defined parameter we call
``slowdown''. We define it as the ratio between the total response time across
all executions of the task and the response time (i.e.\ queue time and running
time) of the last execution of said task. This metric is especially useful to
analyze the impact of unsuccessful executions on each task's total execution
time w.r.t.\ the intrinsic workload (i.e.\ computational time) of tasks.
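
Stated as a formula (the notation is ours, introduced only to make the
definition explicit):
\[
\mathrm{slowdown}(t) = \frac{\sum_{i=1}^{n} R_i(t)}{R_n(t)}
\]
where $R_i(t)$ is the response time (queue time plus running time) of the
$i$-th of the $n$ execution attempts of task $t$, the $n$-th attempt being the
last and, for \texttt{FINISH}ed tasks, the successful one.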
Refer to Figure~\ref{fig:taskslowdown} for a comparison between the 2011 and
2019 mean task slowdown measures broken down by task priority. Additionally, said
means are computed on a cluster-by-cluster basis for 2019 data in
Figure~\ref{fig:taskslowdown-csts}.
In 2015 Ros\'a et al.\cite{dsn-paper} measured mean task slowdown for each task
priority value, which at the time was a numeric value between 0 and 11. However,
in the 2019 traces, task priorities are given as a numeric value between 0 and
500. Therefore, to allow an easier comparison, mean task slowdown values are
computed by task priority tier over the 2019 data. Priority tiers are
semantically relevant priority ranges defined in the Tirmazi et al.\ 2020
paper\cite{google-marso-19} that introduced the 2019 traces. Equivalent priority
tiers are also provided next to the 2011 priority values in the table covering
the 2015 analysis.
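
The sketch below shows the tier mapping we use for the 2019 data; the boundary
values reflect our reading of the tier definitions in Tirmazi et
al.\cite{google-marso-19} and should be double-checked against that paper:

\begin{verbatim}
def priority_tier(priority):
    # 2019 trace priority tiers (boundaries as we read them from
    # Tirmazi et al. 2020).
    if priority <= 99:
        return "Free"
    if priority <= 115:
        return "Best-effort Batch (BEB)"
    if priority <= 119:
        return "Mid-tier"
    if priority <= 359:
        return "Production"
    return "Monitoring"
\end{verbatim}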
@ -535,9 +588,9 @@ though this column shows the mean response time across all executions.
\textbf{Mean slowdown} instead provides the mean slowdown value for each task
priority/tier.
Comparing the tables in Figure~\ref{fig:taskslowdown} we observe that the
maximum mean slowdown measure for 2019 data (i.e.\ 7.84, for the BEB tier) is
almost double the maximum measure in 2011 data (i.e.\ 3.39, for priority $3$
corresponding to the BEB tier). The ``Best effort batch'' tier, as the name
suggests, is a lower priority tier where failures are more tolerated. Therefore,
due to the increased concurrency in the 2019 clusters compared to 2011 and the
@ -554,7 +607,7 @@ executions: while the mean response is overall shorter in time in the 2019
traces by an order of magnitude, the new traces show an overall significantly
higher mean response time than in the 2011 data.
Across 2019 single clusters (as in Figure~\ref{fig:taskslowdown-csts}), the data
shows a mostly uniform behaviour, other than for some noteworthy mean slowdown
spikes. Indeed, cluster A has 82.97 mean slowdown in the ``Free'' tier,
cluster G has 19.06 and 14.57 mean slowdown in the ``BEB'' and ``Production''
@ -573,9 +626,9 @@ Due to limited computational resources w.r.t.\ the data analysis process, the
resource usage for clusters E to H in the 2019 traces is missing. However, a
comparison between 2011 resource usage and the aggregated resource usage of
clusters A to D in the 2019 traces can be found in
Figure~\ref{fig:spatialresourcewaste-actual}. Additionally, a
cluster-by-cluster breakdown for the 2019 data can be found in
Figure~\ref{fig:spatialresourcewaste-actual-csts}.
From these figures it is clear that, compared to the relatively even
distribution of used resources in the 2011 traces, the distribution of resources
@ -586,14 +639,14 @@ all other task termination types have a significantly lower resource usage:
\texttt{EVICT}ed, \texttt{FAIL}ed and \texttt{FINISH}ed tasks register respectively
8.53\%, 3.17\% and 2.02\% CPU usage and 9.03\%, 4.45\%, and 1.66\% memory usage.
This resource distribution can also be found in the data from individual
clusters in Figure~\ref{fig:spatialresourcewaste-actual-csts}, with more than
80\% of resources always devoted to \texttt{KILL}ed tasks.
Considering now requested resources instead of used ones, a comparison between
2011 and the aggregation of all A-H clusters of the 2019 traces can be found in
Figure~\ref{fig:spatialresourcewaste-requested}. Additionally, a
cluster-by-cluster breakdown for single 2019 clusters can be found in
Figure~\ref{fig:spatialresourcewaste-requested-csts}.
Here \texttt{KILL}ed jobs dominate the distribution of resources even more,
reaching a global 97.21\% of CPU allocation and a global 96.89\% of memory
@ -603,7 +656,7 @@ respective CPU allocation figures of 2.73\%, 0.06\% and 0.0012\% and memory
allocation figures of 3.04\%, 0.06\% and 0.012\%.
Behaviour across clusters (as
shown in Figure~\ref{fig:spatialresourcewaste-requested-csts}) in terms of
requested resources is fairly homogeneous, with the exception of cluster A
having a relatively high 2.85\% CPU and 3.42\% memory resource requests from
\texttt{EVICT}ed tasks and cluster E having a noteworthy 1.67\% CPU and 1.31\%
@ -626,7 +679,6 @@ probabilities based on the number of task termination events of a specific type.
Finally, Section~\ref{tabIV-section} aims to find similar correlations, but at
the job level.
The results found in the 2019 traces seldom show the same patterns in terms
of task events and job/task distributions, in particular highlighting again the
overall non-trivial impact of \texttt{KILL} events, no matter the task and job
@ -640,9 +692,9 @@ the task-level events, namely \texttt{EVICT}, \texttt{FAIL}, \texttt{FINISH}
and \texttt{KILL} termination events.
A comparison of the termination event distribution between the 2011 and 2019
traces is shown in Figure~\ref{fig:tableIII}. Additionally, a cluster-by-cluster
breakdown of the same data for the 2019 traces is shown in
Figure~\ref{fig:tableIII-csts}.
Each table in these figures shows the mean and the 95-th percentile of the
number of termination events per task, broken down by task termination. In
@ -666,7 +718,7 @@ jobs and their \texttt{EVICT} events (1.876 on average per task with a 8.763
event overall average).
Considering cluster-by-cluster behaviour in the 2019 traces (as reported in
Figure~\ref{fig:tableIII-csts}) the general observations still hold for each
cluster, albeit with event count averages having different magnitudes. Notably,
cluster E registers the highest per-event average, with \texttt{FAIL}ed tasks
experiencing on average 111.471 \texttt{FAIL} events out of 112.384 total events.
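
The aggregation behind these tables can be sketched in PySpark as follows,
assuming a recent PySpark (3.1+) and illustrative column names for an input
table holding one row per task and event type:

\begin{verbatim}
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("table-iii").getOrCreate()

# One row per (task, event type): the number of events of that type
# the task experienced, plus the task's final termination type.
counts = spark.createDataFrame(
    [("t1", "FAIL", 3, "FAIL"), ("t2", "EVICT", 1, "FINISH")],
    ["task_id", "event_type", "n_events", "termination"])

stats = (counts
    .groupBy("termination", "event_type")
    .agg(F.mean("n_events").alias("mean"),
         F.percentile_approx("n_events", 0.95).alias("p95")))
stats.show()
\end{verbatim}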
@ -681,11 +733,11 @@ given number of unsuccessful events could affect the termination of the task it
belongs to.
Conditional probabilities of each unsuccessful event type are shown in the form
of a plot in Figure~\ref{fig:figureV}, comparing the 2011 traces with the
overall data from the 2019 ones, and in Figure~\ref{fig:figureV-csts}, as a
cluster-by-cluster breakdown of the same data for the 2019 traces.
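
The plotted quantity can be read as follows (the notation is ours; whether the
curves condition on exactly $k$ events or on at least $k$ events follows the
convention of the original analysis):
\[
P(\texttt{FINISH} \mid N_e = k) =
\frac{|\{ t : N_e(t) = k \mbox{ and } t \mbox{ ends with } \texttt{FINISH}\}|}
     {|\{ t : N_e(t) = k \}|}
\]
where $N_e(t)$ is the number of events of unsuccessful type $e$ experienced by
task $t$.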
In Figure~\ref{fig:figureV} the 2011 and 2019 plots differ in their x-axis:
for 2011 data conditional probabilities are computed for a maximum event count
of 30, while for 2019 data they are computed for up to 50 events of a specific
kind. Nevertheless, another quite striking difference between the two plots can
@ -705,7 +757,7 @@ The \texttt{FAIL} probability curve has instead 18.55\%, 1.79\%, 14.49\%,
2.08\%, 2.40\%, and 1.29\% success probabilities for the same range.
Considering cluster-to-cluster behaviour in the 2019 traces (as shown in
Figure~\ref{fig:figureV-csts}), some clusters show quite similar behaviour to
the aggregated plot (namely clusters A, F, and H), while some other clusters
show strongly oscillating probability curves for
\texttt{EVICT} and \texttt{FINISH} events. \texttt{KILL} behaviour is instead
@ -714,15 +766,15 @@ homogeneous even on a single cluster basis.
\subsection{Unsuccessful Job Event Patterns}\label{tabIV-section}
\input{figures/table_iv}
The analysis uses very similar techniques to the ones used in
Section~\ref{tabIII-section}, but focusing on the job level instead. The aim is
to better understand the relationship between tasks and jobs, and how
task-level termination events can influence the termination state of a job.
A comparison of the analyzed parameters between the 2011 and 2019
traces is shown in Figure~\ref{fig:tableIV}. Additionally, a cluster-by-cluster
breakdown of the same data for the 2019 traces is shown in
Figure~\ref{fig:tableIV-csts}.
Considering the distribution of the number of tasks in a job, the 2019 traces
show a decrease for the mean figure (e.g.\ for \texttt{FAIL}ed jobs, with a mean 60.5
@ -740,7 +792,7 @@ the \texttt{FINISH}ed job category has a new event distribution too, with
\texttt{FINISH} task events being the most popular at 1.778 events per job in
the 2019 traces.
The cluster-by-cluster comparison in Figure~\ref{fig:tableIV-csts} shows that
the number of tasks per job is generally distributed similarly to the
aggregated data, with only cluster H having remarkably low mean and 95-th
percentiles overall. Event-wise, for \texttt{EVICT}ed, \texttt{FINISH}ed,
@ -749,73 +801,82 @@ one. For some clusters (namely B, C, and D), the mean number of \texttt{FAIL} a
\texttt{KILL} task events for \texttt{FINISH}ed jobs is almost the same.
Additionally, it is noteworthy that cluster A has no \texttt{EVICT}ed jobs.
% \section{Analysis: Potential Causes of Unsuccessful Executions}

% The aim of this section is to analyze several task-level and job-level
% parameters in order to find correlations with the success of an execution. By
% using the techniques used in Section V of the Ros\'a et al.\
% paper\cite{dsn-paper} we analyze
% task events' metadata, the use of CPU and Memory resources at the task level,
% and job metadata respectively in Section~\ref{fig7-section},
% Section~\ref{fig8-section} and Section~\ref{fig9-section}.

% \subsection{Event rates vs.\ task priority, event execution time, and machine
% concurrency.}\label{fig7-section}

% \input{figures/figure_7}

% Refer to figures \ref{fig:figureVII-a}, \ref{fig:figureVII-b}, and
% \ref{fig:figureVII-c}.

% \textbf{Observations}:

% \begin{itemize}
% \item
% No smooth curves in this figure either, unlike 2011 traces
% \item
% The behaviour of curves for 7a (priority) is almost the opposite of
% 2011, i.e. in-between priorities have higher kill rates while
% priorities at the extremum have lower kill rates. This could also be
% due to the inherent distribution of job terminations;
% \item
% Event execution time curves are quite different than 2011, here it
% seems there is a good correlation between short task execution times
% and finish event rates, instead of the U shape curve in 2015 DSN
% \item
% In figure \ref{fig:figureVII-b} cluster behaviour seems quite uniform
% \item
% Machine concurrency seems to play little role in the event termination
% distribution, as for all concurrency factors the kill rate is at 90\%.
% \end{itemize}

% \subsection{Event Rates vs. Requested Resources, Resource Reservation, and
% Resource Utilization}\label{fig8-section}

% \input{figures/figure_8}

% Refer to Figure~\ref{fig:figureVIII-a}, Figure~\ref{fig:figureVIII-a-csts}
% Figure~\ref{fig:figureVIII-b}, Figure~\ref{fig:figureVIII-b-csts}
% Figure~\ref{fig:figureVIII-c}, Figure~\ref{fig:figureVIII-c-csts}
% Figure~\ref{fig:figureVIII-d}, Figure~\ref{fig:figureVIII-d-csts}
% Figure~\ref{fig:figureVIII-e}, Figure~\ref{fig:figureVIII-e-csts}
% Figure~\ref{fig:figureVIII-f}, and Figure~\ref{fig:figureVIII-f-csts}.

% \subsection{Job Rates vs. Job Size, Job Execution Time, and Machine Locality
% }\label{fig9-section}

% \input{figures/figure_9}

% Refer to figures \ref{fig:figureIX-a}, \ref{fig:figureIX-b}, and
% \ref{fig:figureIX-c}.

% \textbf{Observations}:

% \begin{itemize}
% \item
% Behaviour between clusters varies a lot
% \item
% There are no ``smooth'' gradients in the various curves unlike in the
% 2011 traces
% \item
% Killed jobs have higher event rates in general, and overall dominate
% all event rates measures
% \item
% There still seems to be a correlation between short execution job
% times and successful final termination, and likewise for kills and
% higher job terminations
% \item
% Across all clusters, a machine locality factor of 1 seems to lead to
% the highest success event rate
% \end{itemize}
\section{Conclusions, Future Work and Possible Developments}

\textbf{TBD}


@ -9,8 +9,8 @@
\begin{figure}[p]
\machinetimewaste[1]{2011 data}{cluster_2011.pgf}
\machinetimewaste[1]{2019 data}{cluster_all.pgf}
\caption{Relative task time spent in each execution phase
w.r.t.\ task termination in the 2011 and 2019 (all clusters aggregated) traces.
The x-axis shows the task termination type, the y-axis shows the total time \%
spent. Colors break down the time into execution phases. ``Unknown'' execution
times are 2019-specific and correspond to event time transitions that are not
considered ``typical'' by Google.}\label{fig:machinetimewaste-rel}
\end{figure}
@ -24,6 +24,6 @@ Y axis shows total time \% spent. Colors break down the time in execution phases
\machinetimewaste{Cluster F}{cluster_f.pgf}
\machinetimewaste{Cluster G}{cluster_g.pgf}
\machinetimewaste{Cluster H}{cluster_h.pgf}
\caption{Relative task time spent in each execution phase w.r.t.\ clusters in the
2019 traces. Refer to Figure~\ref{fig:machinetimewaste-rel} for a description
of the axes.}\label{fig:machinetimewaste-rel-csts}
\end{figure}