report: added introduction
parent 18ce409cde
commit 87b869b92d
2 changed files with 30 additions and 3 deletions
Binary file not shown.
@@ -43,9 +43,36 @@ system attributes such as machine locality and concurrency level.}
 \tableofcontents
 
 \newpage
 
-\hypertarget{introduction-including-motivation}{%
-\section{Introduction (including
-Motivation)}\label{introduction-including-motivation}}
+\section{Introduction}
+
+In today's world there is an ever-growing demand for efficient, large-scale
+computation. The rising trend of ``big data'' has put the need for efficient
+management of large-scale parallel computing at an all-time high. This trend
+also increases the demand for research in the field of distributed systems,
+in particular on how to schedule computations effectively, avoid wasting
+resources, and avoid failures.
+
+In 2011 Google released a month-long data trace of its own \textit{Borg}
+cluster management system, containing a wealth of data on the scheduling,
+priority management, and failures of a real production workload. This data
+was the foundation of the 2015 Ros\'a et al.\ paper \textit{Understanding
+the Dark Side of Big Data Clusters: An Analysis beyond Failures}, which
+among its many conclusions highlighted the need for better cluster
+management, pointing to the high number of failures found in the traces.
+
+In 2019 Google released an updated version of the \textit{Borg} cluster
+traces, not only containing data from a far bigger workload, due to the sheer
+power of Moore's law, but also providing data from 8 different \textit{Borg}
+cells in datacenters all over the world. These new traces are therefore about
+100 times larger than the old ones, weighing approximately 8~TiB when
+compressed and stored in JSONL format, and require considerable computational
+power and dedicated data engineering techniques to analyze.
+
+This project aims to repeat the 2015 analysis to highlight the similarities
+and differences in workload that this decade brought, and to expand it to
+understand even better the causes of failures and how to prevent them.
+Additionally, this report provides an overview of the data engineering
+techniques used to perform the queries and analyses on the 2019 traces.
 \hypertarget{state-of-the-art}{%
 \section{State of the Art}\label{state-of-the-art}}