report progress
This commit is contained in:
parent c5ceae561c
commit d2f896f3ed
2 changed files with 55 additions and 62 deletions
Binary file not shown.
@@ -1,11 +1,11 @@
\documentclass{usiinfbachelorproject}
\title{Understanding and Comparing Unsuccessful Executions in Large Datacenters}
\author{Claudio Maggioni}

\usepackage{enumitem}
\usepackage{parskip}
\setlength{\parskip}{5pt}
\setlength{\parindent}{0pt}

%\usepackage[printfigures]{figcaps}
\usepackage{xcolor}
\usepackage{amsmath}
\usepackage{subcaption}

@@ -93,42 +93,36 @@ are encoded and stored in the trace as rows of various tables. Among the
information events provide, the field ``type'' indicates
the execution status of the job or task. This field can have the
following values:

\begin{center}
\begin{tabular}{p{3cm}p{12cm}}
\toprule
\textbf{Type code} & \textbf{Description} \\
\midrule
\texttt{QUEUE} & The job or task was marked not eligible for scheduling
by Borg's scheduler, and thus Borg will move the job/task into a long
wait queue\\
\texttt{SUBMIT} & The job or task was submitted to Borg for execution\\
\texttt{ENABLE} & The job or task became eligible for scheduling\\
\texttt{SCHEDULE} & The job or task's execution started\\
\texttt{EVICT} & The job or task was terminated in order to free
computational resources for a higher priority job\\
\texttt{FAIL} & The job or task terminated its execution unsuccessfully
due to a failure\\
\texttt{FINISH} & The job or task terminated successfully\\
\texttt{KILL} & The job or task terminated its execution because of a
manual request to stop it\\
\texttt{LOST} & It is assumed the job or task has been terminated, but
due to missing data there is insufficient information to identify when
or how\\
\texttt{UPDATE\_PENDING} & The metadata (scheduling class, resource
requirements, \ldots) of the job/task was updated while the job was
waiting to be scheduled\\
\texttt{UPDATE\_RUNNING} & The metadata (scheduling class, resource
requirements, \ldots) of the job/task was updated while the job was in
execution\\
\bottomrule
\end{tabular}
\end{center}

Figure~\ref{fig:eventTypes} shows the expected transitions between event
types.

@@ -177,22 +171,16 @@ file segments) where each carriage return separated line represents a
single record for that table.

There are five different table ``files'', namely:

\begin{description}
\item[\texttt{machine\_configs},] which is a table containing each physical
machine's configuration and its evolution over time;
\item[\texttt{instance\_events},] which is a table of task events;
\item[\texttt{collection\_events},] which is a table of job events;
\item[\texttt{machine\_attributes},] which is a table containing (obfuscated)
metadata about each physical machine and its evolution over time;
\item[\texttt{instance\_usage},] which contains resource (CPU/RAM) measures
of jobs and tasks running on individual machines.
\end{description}

The scope of this thesis focuses on the tables
\texttt{machine\_configs}, \texttt{instance\_events} and

@@ -224,7 +212,11 @@ analysis}\label{project-requirements-and-analysis}}
\hypertarget{analysis-methodology}{%
\section{Analysis methodology}\label{analysis-methodology}}

Due to the inherent complexity of analyzing traces of this size, modern data
engineering techniques were adopted to perform the required computations. We
used the Apache Spark framework to perform efficient and parallel Map-Reduce
computations. In this section, we discuss the technical details behind our
approach.

\hypertarget{introduction-on-apache-spark}{%
\subsection{Introduction on Apache

@@ -302,15 +294,16 @@ the presence of incomplete data (i.e.~records which contain fields whose values
is unknown). This filtering is performed using the \texttt{.filter()} operation
of Spark's RDD API.
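
As a purely illustrative sketch of this step, loading and filtering one table
could look roughly as follows. The input path, the assumption that each line
holds one JSON-encoded record, and the field names \texttt{type} and
\texttt{time} are hypothetical and only serve to show the shape of the
\texttt{.filter()} call:

\begin{verbatim}
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trace-analysis").getOrCreate()
sc = spark.sparkContext

# Hypothetical input: one JSON-encoded event record per line.
events = sc.textFile("instance_events-*.json.gz").map(json.loads)

# Keep only records whose relevant fields have known values.
complete_events = events.filter(
    lambda r: r.get("type") is not None and r.get("time") is not None)
\end{verbatim}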

The core of each query is often a \texttt{groupby()} followed by a
\texttt{map()} operation on the aggregated data. The \texttt{groupby()} groups
the set of all records into several subsets of records, each having something
in common. Then, each of these smaller subsets is reduced with a \texttt{map()}
operation to a single record. The motivation behind this way of computing data
is that the analysis in this thesis often requires studying the behaviour over
time of either tasks or jobs by looking at their events. These queries are
therefore implemented by \texttt{groupby()}-ing records by task or job, and
then \texttt{map()}-ing each set of event records, sorting them by time and
performing the desired computation on the obtained chronological event log.
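
A minimal sketch of this \texttt{groupby()}-then-\texttt{map()} pattern,
continuing the hypothetical example above (the grouping key
\texttt{collection\_id} and the summary fields are assumptions for
illustration, not the actual queries used in this thesis), might be:

\begin{verbatim}
# Group events by job, sort each group chronologically, and reduce the
# resulting event log to a single summary record per job.
def summarize(job_id, job_events):
    log = sorted(job_events, key=lambda e: e["time"])
    return {"collection_id": job_id,
            "first_event": log[0]["type"],
            "last_event": log[-1]["type"],
            "n_events": len(log)}

per_job = (complete_events
           .groupBy(lambda e: e["collection_id"])
           .map(lambda kv: summarize(kv[0], kv[1])))
\end{verbatim}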

Sometimes intermediate results are saved in Spark's Parquet format, so that
expensive computations can be performed once beforehand and their results
reused by later queries.
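
For instance, the intermediate table from the sketch above could be persisted
and read back roughly as follows (the file name and the conversion through the
DataFrame API are again assumptions for illustration only):

\begin{verbatim}
from pyspark.sql import Row

# Persist the intermediate result as Parquet so later queries can
# read it back instead of recomputing it.
per_job.map(lambda r: Row(**r)).toDF() \
       .write.mode("overwrite").parquet("per_job_summary.parquet")

# A later query starts from the precomputed table.
cached = spark.read.parquet("per_job_summary.parquet")
\end{verbatim}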