report

2021-06-18 15:30:50 +02:00 · 2021-06-18 15:30:50 +02:00 · f8045b560c
commit f8045b560c
parent 744a4025a1
2 changed files with 38 additions and 2 deletions
--- a/report/Claudio_Maggioni_report.pdf
+++ b/report/Claudio_Maggioni_report.pdf
--- a/report/Claudio_Maggioni_report.tex
+++ b/report/Claudio_Maggioni_report.tex
@@ -949,8 +949,44 @@ Refer to figures \ref{fig:figureIX-a}, \ref{fig:figureIX-b}, and
  the highest success event rate
 \end{itemize}
-\section{Conclusions, Future Work and Possible Developments}\label{sec8}
+\section{Conclusions, Limitations and Future Work}\label{sec8}
-\textbf{TBD}
+In this report we analyze the Google Borg 2019 traces and compare them with
 their 2011 counterpart from the perspective of failures, their impact on
 resources and their causes. We discover that the impact of non-successful
 executions (especially of \texttt{KILL}ed tasks and jobs) in the new traces is
 still very relevant in terms of machine time and resources, even more so than in
 2011. We also discover that unsuccessful job and task event patterns still play
 a major role in the overall execution success of Borg jobs and tasks. We finally
 discover that unsuccessful job and task event rates dominate the overall
 landscape of Borg's own logs, even when grouping tasks and jobs by parameters
 such as priority, resource request, reservation and utilization, and machine
 locality.
 We can therefore conclude that the performed analyses show several clear trends
 regarding the correlation of execution success with several parameters and
 metadata. These trends can potentially be exploited to build better scheduling
 algorithms and new predictive models
 that could estimate whether an execution has a high probability of failure based on
 its own properties and metadata. The creation of such models could allow for
 computational resources to be saved and used to either increase the throughput
 of higher-priority workloads or to allow for a larger workload altogether.
 The biggest limitation of this project, and the main threat to its validity, is the
 relative lack of information provided by Google on the true meaning of
 unsuccessful terminations. Indeed, given the ``black box'' nature of the traces
 and the scarcity of information in the traces
 documentation\cite{google-drive-marso}, it is not clear if unsuccessful
 executions yield any useful computation result or not. Our assumption in this
 report is that unsuccessful jobs and tasks do not produce any result and are
 therefore just burdens on machine time and resources, but should this assumption
 be incorrect then the interpretation of the analyses might change significantly.
 Given the significant computational time invested in obtaining the results shown
 in this report and due to time and resource limitations, some of the analyses
 were not completed. Our future work will focus on finishing these analyses,
 namely by computing results for the missing clusters and obtaining a true
 overall picture of the 2019 Google Borg cluster traces w.r.t.\ failures and
 their causes.
 \newpage
 \printbibliography%