diff --git a/report/Claudio_Maggioni_report.pdf b/report/Claudio_Maggioni_report.pdf
index 7f3217c4..e6356cd5 100644
Binary files a/report/Claudio_Maggioni_report.pdf and b/report/Claudio_Maggioni_report.pdf differ
diff --git a/report/Claudio_Maggioni_report.tex b/report/Claudio_Maggioni_report.tex
index 2c4faec6..191b66f1 100644
--- a/report/Claudio_Maggioni_report.tex
+++ b/report/Claudio_Maggioni_report.tex
@@ -949,8 +949,43 @@ Refer to figures \ref{fig:figureIX-a}, \ref{fig:figureIX-b}, and
   the highest success event rate
 \end{itemize}

-\section{Conclusions, Future Work and Possible Developments}\label{sec8}
-\textbf{TBD}
+\section{Conclusions, Limitations and Future Work}\label{sec8}
+In this report we analyze the Google Borg 2019 traces and compare them with
+their 2011 counterpart from the perspective of failures, their causes, and
+their impact on resources. We find that the impact of unsuccessful executions
+(especially of \texttt{KILL}ed tasks and jobs) in the new traces is still very
+relevant in terms of machine time and resources, even more so than in 2011. We
+also find that unsuccessful job and task event patterns still play a major role
+in the overall execution success of Borg jobs and tasks. Finally, we find that
+unsuccessful job and task event rates dominate the overall landscape of the
+Borg trace logs, even when tasks and jobs are grouped by parameters such as
+priority, resource request, reservation and utilization, and machine locality.
+
+We can therefore conclude that the performed analyses show clear trends in the
+correlation between execution success and several parameters and metadata.
+These trends could potentially be exploited to build better scheduling
+algorithms and new predictive models that estimate whether an execution has a
+high probability of failure based on its properties and metadata. Such models
+could allow computational resources to be saved and reallocated, either to
+increase the throughput of higher-priority workloads or to accommodate a larger
+workload altogether.
+
+The biggest limitation and threat to validity for this project is the relative
+lack of information provided by Google on the true meaning of unsuccessful
+terminations. Indeed, given the ``black box'' nature of the traces and the
+scarcity of information in the trace documentation\cite{google-drive-marso},
+it is not clear whether unsuccessful executions yield any useful computation
+results. Our assumption in this report is that unsuccessful jobs and tasks do
+not produce any useful result and are therefore merely a burden on machine time
+and resources; should this assumption be incorrect, the interpretation of our
+analyses might change significantly.
+
+Given the significant computational time required to obtain the results shown
+in this report, and due to time and resource limitations, some of the analyses
+were not completed. Our future work will focus on completing these analyses,
+namely by computing results for the missing clusters and thus obtaining a
+complete picture of the 2019 Google Borg cluster traces w.r.t.\ failures and
+their causes.

 \newpage
 \printbibliography%