report progress

Claudio Maggioni 2021-05-17 18:50:25 +02:00
parent c5ceae561c
commit d2f896f3ed
2 changed files with 55 additions and 62 deletions

Binary file not shown.


@@ -1,11 +1,11 @@
\documentclass{usiinfbachelorproject}
\title{Understanding and Comparing Unsuccessful Executions in Large Datacenters}
\author{Claudio Maggioni}
\usepackage{parskip}
\setlength{\parskip}{5pt}
\setlength{\parindent}{0pt}
%\usepackage[printfigures]{figcaps}
\usepackage{xcolor}
\usepackage{amsmath}
\usepackage{subcaption}
@@ -93,42 +93,36 @@ are encoded and stored in the trace as rows of various tables. Among the
information provided by events, the ``type'' field indicates the execution
status of the job or task. This field can take the following values:
\begin{center}
\begin{tabular}{p{3cm}p{12cm}}
\toprule
\textbf{Type code} & \textbf{Description} \\
\midrule
\texttt{QUEUE} & The job or task was marked not eligible for scheduling
by Borg's scheduler, and was therefore moved into a long wait queue\\
\texttt{SUBMIT} & The job or task was submitted to Borg for execution\\
\texttt{ENABLE} & The job or task became eligible for scheduling\\
\texttt{SCHEDULE} & The job or task's execution started\\
\texttt{EVICT} & The job or task was terminated in order to free
computational resources for a higher-priority job\\
\texttt{FAIL} & The job or task terminated its execution unsuccessfully
due to a failure\\
\texttt{FINISH} & The job or task terminated successfully\\
\texttt{KILL} & The job or task terminated its execution because of a
manual request to stop it\\
\texttt{LOST} & It is assumed the job or task has been terminated, but
due to missing data there is insufficient information to determine when
or how\\
\texttt{UPDATE\_PENDING} & The metadata (scheduling class, resource
requirements, \ldots) of the job/task was updated while the job was
waiting to be scheduled\\
\texttt{UPDATE\_RUNNING} & The metadata (scheduling class, resource
requirements, \ldots) of the job/task was updated while the job was in
execution\\
\bottomrule
\end{tabular}
\end{center}

Figure~\ref{fig:eventTypes} shows the expected transitions between event
types.
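
Since this thesis is concerned with unsuccessful executions, it is convenient
to partition these type codes by outcome. The following minimal Python sketch
shows one such partition; the set names and the classification itself are an
illustrative convention of this sketch, not part of the trace schema.

\begin{verbatim}
# Illustrative grouping of the event type codes listed above; the set
# names and the outcome classification are conventions of this sketch,
# not part of the trace schema.
TERMINAL_SUCCESS = {"FINISH"}
TERMINAL_UNSUCCESSFUL = {"EVICT", "FAIL", "KILL", "LOST"}
NON_TERMINAL = {"QUEUE", "SUBMIT", "ENABLE", "SCHEDULE",
                "UPDATE_PENDING", "UPDATE_RUNNING"}

def is_unsuccessful_termination(event_type: str) -> bool:
    """True if the event marks an unsuccessful end of execution."""
    return event_type in TERMINAL_UNSUCCESSFUL
\end{verbatim}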

@@ -177,22 +171,16 @@ file segments) where each carriage-return-separated line represents a
single record for that table.

There are five different table ``files'':
\begin{description}
\item[\texttt{machine\_configs},] which is a table containing each physical
machine's configuration and its evolution over time;
\item[\texttt{instance\_events},] which is a table of task events;
\item[\texttt{collection\_events},] which is a table of job events;
\item[\texttt{machine\_attributes},] which is a table containing (obfuscated)
metadata about each physical machine and its evolution over time;
\item[\texttt{instance\_usage},] which contains resource (CPU/RAM) measures
of jobs and tasks running on individual machines.
\end{description}
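
As an illustration of how one of the tables listed above can be loaded for
analysis, the sketch below reads a set of table ``files'' into a Spark RDD.
It assumes a Python/PySpark environment and newline-delimited JSON segments;
the paths and file naming are placeholders rather than the actual dataset
layout.

\begin{verbatim}
# Hypothetical sketch: load one trace table into a Spark RDD.
# Paths, file naming and the JSON assumption are illustrative.
import json
from pyspark import SparkContext

sc = SparkContext(appName="borg-trace-analysis")

# Each compressed segment holds one record per line; Spark decompresses
# gzip-compressed files transparently when reading them as text.
instance_events = sc.textFile("trace/instance_events-*.json.gz") \
                    .map(json.loads)
\end{verbatim}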

This thesis focuses on the tables
\texttt{machine\_configs}, \texttt{instance\_events} and
@@ -224,7 +212,11 @@ analysis}\label{project-requirements-and-analysis}}
\hypertarget{analysis-methodology}{%
\section{Analysis methodology}\label{analysis-methodology}}
Due to the inherent complexity of analyzing traces of this size,
state-of-the-art data engineering techniques were adopted to perform the
required computations. We used the Apache Spark framework to perform
efficient, parallel Map-Reduce computations. In this section, we discuss the
technical details behind our approach.
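
As a minimal example of the kind of Map-Reduce computation meant here, the
sketch below counts how many events of each type appear in the (illustrative)
\texttt{instance\_events} RDD loaded in the earlier sketch; it is a toy
example rather than one of the actual queries.

\begin{verbatim}
# Toy Map-Reduce example, reusing the illustrative instance_events RDD:
# count how many events of each type appear in the table.
events_per_type = (
    instance_events
    .map(lambda e: (e["type"], 1))      # map: emit (type, 1) pairs
    .reduceByKey(lambda a, b: a + b)    # reduce: sum the counts per type
)
print(events_per_type.collect())
\end{verbatim}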
\hypertarget{introduction-on-apache-spark}{%
\subsection{Introduction on Apache
@@ -302,15 +294,16 @@ the presence of incomplete data (i.e.~records which contain fields whose values
are unknown). This filtering is performed using the \texttt{.filter()} operation
of Spark's RDD API.
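
A minimal sketch of such a filtering step is shown below; it assumes records
are Python dictionaries parsed from the trace, and the field names are
illustrative.

\begin{verbatim}
# Hypothetical sketch: keep only records whose required fields are
# present, using the RDD filter() operation. Field names are illustrative.
REQUIRED_FIELDS = ["collection_id", "instance_index", "time", "type"]

def is_complete(record):
    return all(record.get(f) is not None for f in REQUIRED_FIELDS)

complete_events = instance_events.filter(is_complete)
\end{verbatim}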

The core of each query is often a \texttt{groupby()} followed by a
\texttt{map()} operation on the aggregated data. The \texttt{groupby()}
partitions the set of all records into several subsets of records that share
some property. Each of these subsets is then reduced to a single record by a
\texttt{map()} operation. The motivation behind this pattern is that the
analyses in this thesis often need to study the behaviour over time of either
tasks or jobs by looking at their events. These queries are therefore
implemented by \texttt{groupby()}-ing records by task or job, and then
\texttt{map()}-ing each set of event records, sorting them by time and
performing the desired computation on the resulting chronological event log.
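
The sketch below illustrates this \texttt{groupby()}/\texttt{map()} pattern on
the filtered events from the previous sketch, reducing each task's
chronological event log to its final event type; the grouping keys and field
names are illustrative.

\begin{verbatim}
# Hypothetical sketch of the groupBy()/map() pattern described above.
def last_event_type(events):
    # Sort one task's events chronologically and keep the final type.
    ordered = sorted(events, key=lambda e: e["time"])
    return ordered[-1]["type"]

task_terminations = (
    complete_events
    # group all events belonging to the same task
    .groupBy(lambda e: (e["collection_id"], e["instance_index"]))
    # reduce each group's chronological event log to a single record
    .mapValues(last_event_type)
)
\end{verbatim}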

Intermediate results are sometimes saved in Spark's Parquet format, so that
they can be computed once beforehand and then reloaded by later queries.
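
A sketch of this caching step, converting the (illustrative) grouped result
above to a DataFrame and persisting it as Parquet, could look as follows;
paths and column names are placeholders.

\begin{verbatim}
# Hypothetical sketch: persist an intermediate result as Parquet so that
# later queries can reload it instead of recomputing it from raw events.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

rows = task_terminations.map(lambda kv: (kv[0][0], kv[0][1], kv[1]))
df = spark.createDataFrame(
    rows, ["collection_id", "instance_index", "last_type"])
df.write.mode("overwrite").parquet("intermediate/task_terminations.parquet")

# Later queries can start from the saved result:
cached = spark.read.parquet("intermediate/task_terminations.parquet")
\end{verbatim}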