report: added introduction
parent 7c3fe3f094
commit c4dd02c6f3
2 changed files with 30 additions and 3 deletions
Binary file not shown.
@@ -43,9 +43,36 @@ system attributes such as machine locality and concurrency level.}
\tableofcontents

\newpage

\hypertarget{introduction-including-motivation}{%
\section{Introduction (including Motivation)}\label{introduction-including-motivation}}

In today's world there is an ever-growing demand for efficient, large-scale
computation. The rising trend of ``big data'' has pushed the need for efficient
management of large-scale parallelized computing to an all-time high. This
trend also increases the demand for research in the field of distributed
systems, in particular on how to schedule computations effectively, avoid
wasting resources, and avoid failures.

In 2011 Google released a month-long data trace of its own \textit{Borg}
cluster management system, containing extensive data on the scheduling,
priority management, and failures of a real production workload. This data was
the foundation of the 2015 Ros\'a et al.\ paper \textit{Understanding the Dark
Side of Big Data Clusters: An Analysis beyond Failures}, which, among its many
conclusions, highlighted the need for better cluster management, pointing to
the high number of failures found in the traces.

In 2019 Google released an updated version of the \textit{Borg} cluster
traces, containing not only data from a far bigger workload, owing in part to
Moore's law, but also data from 8 different \textit{Borg} cells in datacenters
all over the world. These new traces are about 100 times larger than the old
ones, weighing approximately 8~TiB when compressed and stored in JSONL format,
and therefore demand considerable computational power and dedicated data
engineering techniques to analyze.
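
As a purely illustrative sketch (not taken from the project itself), the
snippet below shows one way such traces could be processed, assuming
gzip-compressed JSONL shards; the shard file name and the \texttt{type} field
are hypothetical placeholders.

\begin{verbatim}
import gzip
import json

def iter_trace_records(path):
    # Stream one JSON record per line, so that a multi-TiB
    # dataset never has to fit in memory at once.
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Hypothetical shard and field: count events by their "type".
counts = {}
for record in iter_trace_records("task_events-00000.json.gz"):
    kind = record.get("type", "unknown")
    counts[kind] = counts.get(kind, 0) + 1
print(counts)
\end{verbatim}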

This project aims to repeat the analysis performed in 2015 in order to
highlight the similarities and differences in workload that this decade
brought, and to expand the old analysis to better understand the causes of
failures and how to prevent them. Additionally, this report provides an
overview of the data engineering techniques used to perform the queries and
analyses on the 2019 traces.

\hypertarget{state-of-the-art}{%
\section{State of the Art}\label{state-of-the-art}}