report progress
parent c5ceae561c
commit d2f896f3ed
2 changed files with 55 additions and 62 deletions
Binary file not shown.
@@ -1,11 +1,11 @@
 \documentclass{usiinfbachelorproject}
 \title{Understanding and Comparing Unsuccessful Executions in Large Datacenters}
 \author{Claudio Maggioni}
+\usepackage{enumitem}
-\usepackage[parfill]{parskip}
+\usepackage{parskip}
-\setlength{\parskip}{7pt}
+\setlength{\parskip}{5pt}
 \setlength{\parindent}{0pt}
+%\usepackage[printfigures]{figcaps}
 \usepackage{xcolor}
 \usepackage{amsmath}
 \usepackage{subcaption}
@@ -93,42 +93,36 @@ are encoded and stored in the trace as rows of various tables. Among the
 information events provide, the field ``type'' provides information on
 the execution status of the job or task. This field can have the
 following values:
-\begin{itemize}
-\item
-\textbf{QUEUE}: The job or task was marked not eligible for scheduling
+\begin{center}
+\begin{tabular}{p{3cm}p{12cm}}
+\toprule
+\textbf{Type code} & \textbf{Description} \\
+\midrule
+\texttt{QUEUE} & The job or task was marked not eligible for scheduling
 by Borg's scheduler, and thus Borg will move the job/task into a long
-wait queue;
-\item
-\textbf{SUBMIT}: The job or task was submitted to Borg for execution;
-\item
-\textbf{ENABLE}: The job or task became eligible for scheduling;
-\item
-\textbf{SCHEDULE}: The job or task's execution started;
-\item
-\textbf{EVICT}: The job or task was terminated in order to free
-computational resources for a higher priority job;
-\item
-\textbf{FAIL}: The job or task terminated its execution unsuccessfully
-due to a failure;
-\item
-\textbf{FINISH}: The job or task terminated successfully;
-\item
-\textbf{KILL}: The job or task terminated its execution because of a
-manual request to stop it;
-\item
-\textbf{LOST}: It is assumed the job or task has been terminated, but
+wait queue\\
+\texttt{SUBMIT} & The job or task was submitted to Borg for execution\\
+\texttt{ENABLE} & The job or task became eligible for scheduling\\
+\texttt{SCHEDULE} & The job or task's execution started\\
+\texttt{EVICT} & The job or task was terminated in order to free
+computational resources for a higher priority job\\
+\texttt{FAIL} & The job or task terminated its execution unsuccessfully
+due to a failure\\
+\texttt{FINISH} & The job or task terminated successfully\\
+\texttt{KILL} & The job or task terminated its execution because of a
+manual request to stop it\\
+\texttt{LOST} & It is assumed the job or task has been terminated, but
 due to missing data there is insufficient information to identify when
-or how;
-\item
-\textbf{UPDATE\_PENDING}: The metadata (scheduling class, resource
+or how\\
+\texttt{UPDATE\_PENDING} & The metadata (scheduling class, resource
 requirements, \ldots) of the job/task was updated while the job was
-waiting to be scheduled;
-\item
-\textbf{UPDATE\_RUNNING}: The metadata (scheduling class, resource
+waiting to be scheduled\\
+\texttt{UPDATE\_RUNNING} & The metadata (scheduling class, resource
 requirements, \ldots) of the job/task was updated while the job was in
-execution;
-\end{itemize}
+execution\\
+\bottomrule
+\end{tabular}
+\end{center}
 
 Figure~\ref{fig:eventTypes} shows the expected transitions between event
 types.
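As a small illustration of how the type codes in the table above can be represented in analysis code, the following Python sketch encodes them as an enumeration and groups the terminal events other than FINISH. The grouping into "unsuccessful" terminations is an assumption made here for illustration, not a definition taken from the trace documentation.

```python
from enum import Enum

class EventType(Enum):
    """Event type codes from the table above."""
    QUEUE = "QUEUE"
    SUBMIT = "SUBMIT"
    ENABLE = "ENABLE"
    SCHEDULE = "SCHEDULE"
    EVICT = "EVICT"
    FAIL = "FAIL"
    FINISH = "FINISH"
    KILL = "KILL"
    LOST = "LOST"
    UPDATE_PENDING = "UPDATE_PENDING"
    UPDATE_RUNNING = "UPDATE_RUNNING"

# Assumed grouping: every terminal event except FINISH ends an execution
# unsuccessfully (EVICT, FAIL, KILL, LOST).
UNSUCCESSFUL_TERMINATIONS = {EventType.EVICT, EventType.FAIL,
                             EventType.KILL, EventType.LOST}

def is_unsuccessful(event_type: EventType) -> bool:
    """Return True if the event marks an unsuccessful termination."""
    return event_type in UNSUCCESSFUL_TERMINATIONS
```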
@@ -177,22 +171,16 @@ file segments) where each carriage return separated line represents a
 single record for that table.
 
 There are 5 different table ``files'', namely:
-\begin{itemize}
-\item
-\texttt{machine\_configs}, which is a table containing each physical
+\begin{description}
+\item[\texttt{machine\_configs},] which is a table containing each physical
 machine's configuration and its evolution over time;
-\item
-\texttt{instance\_events}, which is a table of task events;
-\item
-\texttt{collection\_events}, which is a table of job events;
-\item
-\texttt{machine\_attributes}, which is a table containing (obfuscated)
+\item[\texttt{instance\_events},] which is a table of task events;
+\item[\texttt{collection\_events},] which is a table of job events;
+\item[\texttt{machine\_attributes},] which is a table containing (obfuscated)
 metadata about each physical machine and its evolution over time;
-\item
-\texttt{instance\_usage}, which contains resource (CPU/RAM) measures
+\item[\texttt{instance\_usage},] which contains resource (CPU/RAM) measures
 of jobs and tasks running on the single machines.
-\end{itemize}
+\end{description}
 
 The scope of this thesis focuses on the tables
 \texttt{machine\_configs}, \texttt{instance\_events} and
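Since each table is stored as compressed file segments with one record per line, a single segment can be inspected with a few lines of plain Python. This is only a sketch: the segment file name and the JSON encoding of the records are assumptions, not details taken from the trace itself.

```python
import gzip
import json

# Hypothetical segment of the instance_events table: one record per line,
# compressed with gzip (file name and encoding are assumed for illustration).
segment_path = "instance_events-000000000000.json.gz"

with gzip.open(segment_path, "rt") as segment:
    for line in segment:
        record = json.loads(line)       # each newline-separated line is one record
        print(record.get("type"))       # e.g. the event type field discussed earlier
        break                           # peek at the first record only
```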
@@ -224,7 +212,11 @@ analysis}\label{project-requirements-and-analysis}}
 \hypertarget{analysis-methodology}{%
 \section{Analysis methodology}\label{analysis-methodology}}
 
-\textbf{TBD}
+Due to the inherent complexity of analyzing traces of this size, novel
+data engineering techniques were adopted to perform the required
+computations. We used the Apache Spark framework to perform efficient and
+parallel Map-Reduce computations. In this section, we discuss the technical
+details behind our approach.
 
 \hypertarget{introduction-on-apache-spark}{%
 \subsection{Introduction to Apache
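To give a concrete picture of the Map-Reduce style of computation mentioned in the hunk above, the sketch below counts events per event type with PySpark. The input pairs are placeholders invented for the example, not the thesis's actual data or query.

```python
from pyspark import SparkContext

sc = SparkContext(appName="map-reduce-sketch")

# Placeholder (task_id, event_type) pairs; in the real analysis these would be
# parsed from the instance_events table.
events = sc.parallelize([
    ("task-1", "SUBMIT"), ("task-1", "SCHEDULE"), ("task-1", "FINISH"),
    ("task-2", "SUBMIT"), ("task-2", "SCHEDULE"), ("task-2", "FAIL"),
])

# Map each event to a (type, 1) pair, then reduce by key to count events per type.
counts = (events.map(lambda pair: (pair[1], 1))
                .reduceByKey(lambda a, b: a + b))

print(counts.collect())   # e.g. [('SUBMIT', 2), ('SCHEDULE', 2), ('FINISH', 1), ('FAIL', 1)]
```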
@@ -302,15 +294,16 @@ the presence of incomplete data (i.e.~records which contain fields whose values
 are unknown). This filtering is performed using the \texttt{.filter()} operation
 of Spark's RDD API.
 
-The core of each query is often a \texttt{groupby()} followed by a \texttt{map()}
-operation on the aggregated data. The \texttt{groupby()} groups the set of all records
-into several subsets of records each having something in common. Then, each of
-these small clusters is reduced with a \texttt{map()} operation to a single
-record. The motivation behind this computation is often to analyze a time
-series of several different traces of programs. This is implemented by
-\texttt{groupby()}-ing records by program id, and then \texttt{map()}-ing each program
-trace set by sorting the traces by time and computing the desired property in
-the form of a record.
+The core of each query is often a \texttt{groupby()} followed by a
+\texttt{map()} operation on the aggregated data. The \texttt{groupby()} groups
+the set of all records into several subsets of records each having something in
+common. Then, each of these small clusters is reduced with a \texttt{map()}
+operation to a single record. The motivation behind this way of computing data
+is that for the analysis in this thesis it is often necessary to analyze the
+behaviour over time of either tasks or jobs by looking at their events. These
+queries are therefore implemented by \texttt{groupby()}-ing records by task or
+job, and then \texttt{map()}-ing each set of event records, sorting them by time
+and performing the desired computation on the obtained chronological event log.
 
 Sometimes intermediate results are saved in Spark's Parquet format so that they
 can be computed once and then reused by later queries.
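The filter/group/map pattern described in this last hunk, together with the Parquet caching of intermediate results, can be sketched in PySpark roughly as follows. The record layout, field names, and the per-task computation are assumptions chosen only to illustrate the pattern, not the queries actually used in the thesis.

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("event-log-sketch").getOrCreate()
sc = spark.sparkContext

# Placeholder event records; field names are assumed for illustration.
events = sc.parallelize([
    {"task_id": "t1", "time": 10, "type": "SUBMIT"},
    {"task_id": "t1", "time": 20, "type": "SCHEDULE"},
    {"task_id": "t1", "time": 90, "type": "FINISH"},
    {"task_id": "t2", "time": 15, "type": "SUBMIT"},
    {"task_id": "t2", "time": None, "type": "SCHEDULE"},   # incomplete record
    {"task_id": "t2", "time": 70, "type": "KILL"},
])

def final_event(task_id, records):
    """Sort one task's events chronologically and keep only its last event type."""
    log = sorted(records, key=lambda r: r["time"])
    return Row(task_id=task_id, final_type=log[-1]["type"])

final_events = (
    events.filter(lambda r: r["time"] is not None)          # drop incomplete data
          .groupBy(lambda r: r["task_id"])                   # one group of events per task
          .map(lambda kv: final_event(kv[0], list(kv[1])))   # reduce each group to one record
)

# Intermediate results can be written to Parquet and reloaded by later queries.
spark.createDataFrame(final_events).write.mode("overwrite").parquet("final_events.parquet")
```

Grouping by task identifier before sorting keeps each per-task event log small enough to sort in memory on a single executor, which is what makes the per-group \texttt{map()} step cheap.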