report

2021-06-18 15:30:50 +02:00 · 2021-06-18 15:30:50 +02:00 · b9c1159307
parent 0fa930ae56
commit b9c1159307
2 changed files with 38 additions and 2 deletions
--- a/report/Claudio_Maggioni_report.pdf
+++ b/report/Claudio_Maggioni_report.pdf
--- a/report/Claudio_Maggioni_report.tex
+++ b/report/Claudio_Maggioni_report.tex
@@ -949,8 +949,44 @@ Refer to figures \ref{fig:figureIX-a}, \ref{fig:figureIX-b}, and
  the highest success event rate
 \end{itemize}

-\section{Conclusions, Future Work and Possible Developments}\label{sec8}
-\textbf{TBD}
+\section{Conclusions, Limitations and Future Work}\label{sec8}
+In this report we analyze the Google Borg 2019 traces and compare them with
+their 2011 counterpart from the perspective of failures, their impact on
+resources and their causes. We discover that the impact of non-successful
+executions (especially of \texttt{KILL}ed tasks and jobs) in the new traces is
+still very relevant in terms of machine time and resources, even more so than in
+2011. We also discover that unsuccessful job and task event patterns still play
+a major role in the overall execution success of Borg jobs and tasks. We finally
+discover that unsuccessful job and task event rates dominate the overall
+landscape of Borg's own logs, even when grouping tasks and jobs by parameters
+such as priority, resource request, reservation and utilization, and machine
+locality.
+
+We can therefore conclude that the performed analyses show clear trends
+regarding the correlation of execution success with several parameters and
+metadata. These trends can potentially be exploited to build better scheduling
+algorithms and new predictive models that could estimate whether an execution
+has a high probability of failure based on its own properties and metadata. The
+creation of such models could allow for computational resources to be saved and
+used either to increase the throughput of higher priority workloads or to allow
+for a larger workload altogether.
+
+The biggest limitation and threat to validity posed to this project is the
+relative lack of information provided by Google on the true meaning of
+unsuccessful terminations. Indeed, given the ``black box'' nature of the traces
+and the scarcity of information in the trace
+documentation\cite{google-drive-marso}, it is not clear if unsuccessful
+executions yield any useful computation result or not. Our assumption in this
+report is that unsuccessful jobs and tasks do not produce any result and are
+therefore just burdens on machine time and resources, but should this assumption
+be incorrect then the interpretation of the analyses might change significantly.
+
+Given the significant computational time invested in obtaining the results shown
+in this report and due to time and resource limitations, some of the analyses
+were not completed. Our future work will focus on finishing these analyses,
+namely by computing results for the missing clusters and obtaining a true
+overall picture of the 2019 Google Borg cluster traces w.r.t.\ failures and
+their causes.

 \newpage
 \printbibliography%