report: added introduction

Claudio Maggioni 2021-05-19 14:06:38 +02:00
parent f5eb1f30dd
commit d9e4947789
2 changed files with 30 additions and 3 deletions


@ -43,9 +43,36 @@ system attributes such as machine locality and concurrency level.}
\tableofcontents
\newpage
\hypertarget{introduction-including-motivation}{%
\section{Introduction}\label{introduction-including-motivation}}
In today's world there is an ever-growing demand for efficient, large-scale
computation. The rising trend of ``big data'' has pushed the need for efficient
management of large-scale parallel computing to an all-time high. This, in turn,
increases the demand for research in the field of distributed systems, in
particular on how to schedule computations effectively, avoid wasting resources,
and avoid failures.

In 2011 Google released a month-long data trace of its own \textit{Borg} cluster
management system, containing extensive data on the scheduling, priority
management, and failures of a real production workload. This data was the
foundation of the 2015 Ros\'a et al.\ paper \textit{Understanding the Dark Side
of Big Data Clusters: An Analysis beyond Failures}, which, among its many
conclusions, highlighted the need for better cluster management given the high
number of failures found in the traces.

In 2019 Google released an updated version of the \textit{Borg} cluster traces,
not only containing data from a far larger workload (a consequence of the growth
in computing power predicted by Moore's law), but also covering 8 different
\textit{Borg} cells from datacenters all over the world. These new traces are
therefore about 100 times larger than the old ones, weighing approximately
8~TiB when compressed and stored in JSONL format. Analyzing them thus requires
considerable computational power and the implementation of dedicated data
engineering techniques.

This project aims to repeat the analysis performed in 2015 in order to highlight
the similarities and differences in workload that this decade brought, and to
expand the old analysis to better understand the causes of failures and how to
prevent them. Additionally, this report provides an overview of the data
engineering techniques used to perform the queries and analyses on the 2019
traces.

\hypertarget{state-of-the-art}{%
\section{State of the Art}\label{state-of-the-art}}