report: added figures

This commit is contained in:
Claudio Maggioni 2021-05-12 14:15:49 +02:00
parent 676da45f25
commit b2d995cfa6
27 changed files with 597 additions and 254 deletions

View file

@ -1 +0,0 @@
,maggicl,Apple2gs.local,10.05.2021 18:02,file:///Users/maggicl/Library/Application%20Support/LibreOffice/4;

File diff suppressed because one or more lines are too long

View file

@ -2,7 +2,6 @@
"cells": [
{
"cell_type": "markdown",
"id": "built-symbol",
"metadata": {},
"source": [
"# Machine configurations\n",
@ -15,7 +14,6 @@
{
"cell_type": "code",
"execution_count": 1,
"id": "stuffed-lightning",
"metadata": {},
"outputs": [],
"source": [
@ -31,7 +29,6 @@
{
"cell_type": "code",
"execution_count": 3,
"id": "upper-lloyd",
"metadata": {},
"outputs": [],
"source": [
@ -58,7 +55,6 @@
{
"cell_type": "code",
"execution_count": 4,
"id": "presidential-farmer",
"metadata": {},
"outputs": [
{
@ -127,7 +123,6 @@
{
"cell_type": "code",
"execution_count": 5,
"id": "informative-vietnam",
"metadata": {},
"outputs": [],
"source": [
@ -150,7 +145,6 @@
{
"cell_type": "code",
"execution_count": 6,
"id": "pretty-taiwan",
"metadata": {},
"outputs": [
{
@ -1562,7 +1556,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "supreme-hepatitis",
"metadata": {},
"outputs": [],
"source": []
@ -1584,7 +1577,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
"version": "3.8.3"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long

View file

@ -2,9 +2,19 @@
documentclass: usiinfbachelorproject
title: Understanding and Comparing Unsuccessful Executions in Large Datacenters
author: Claudio Maggioni
pandoc-options:
- --filter=pandoc-include
- --latex-engine-opt=--shell-escape
- --latex-engine-opt=--enable-write18
header-includes:
- |
```{=latex}
\usepackage{subcaption}
\usepackage{booktabs}
\usepackage{graphicx}
\captionsetup{labelfont={bf}}
%\subtitle{The (optional) subtitle}
@ -29,57 +39,116 @@ header-includes:
```
---
Introduction
============
# Introduction (including Motivation)
General issues
--------------
# State of the Art
Latex is not so complex. If you aren't familiar with it just spend some
time in googling for latex commands (e.g. font formats, tables, figures,
items,...).
- Introduce Ros\'a 2015 DSN paper on analysis
- Describe Google Borg clusters
- Describe Traces contents
- Differences between 2011 and 2019 traces
Getting started
---------------
# Project requirements and analysis
In order to use the bachelor thesis template, be sure that the following
files are present in your working directory:
(describe our objective with this analysis in detail)
- usiinfbachelorproject.cls (The latex template)
# Analysis methodology
- logo-info.pdf (The logo figure)
## Technical overview of traces' file format and schema
- references.bib (The references file)\
## Overview of challenging aspects of the analysis (data size, schema, available computation resources)
Compilation issues
------------------
## Introduction to Apache Spark
If you are not familiar with Tex, I advise you to download TexShop for
Mac OS.\
To include the references and display them in the final pdf, you have
first to typeset this file with *LaTex* (ComboBox upper left, if you use
TexShop), then with *BibTex* and finally again with *LaTex*.\
In order to resolve figures/table/... references you have to run 2 times
the (latex) typeset.
## General description of the Apache Spark workflow
Document structure
------------------
The analysis of the Google 2019 Borg cluster traces was conducted using Apache
Spark and its Python 3 API (pyspark). Spark was used to execute a series of
queries that perform various sums and aggregations over the entire dataset
provided by Google.
Some basic sections:
In general, each query follows a common Map-Reduce template: traces are first
read and parsed, then filtered and transformed by performing selections and
projections and by computing new derived fields. Then, the trace records are
often grouped by one of their fields, clustering related data together before a
reduce or fold operation is applied to each group.
- Introduction (including Motivation)
Most input data is in JSONL format and adheres to a schema Google provided in
the form of a Protocol Buffers specification[^1].
- State of the Art
[^1]: [Google 2019 Borg traces Protocol Buffers specification on GitHub](
https://github.com/google/cluster-data/blob/master/clusterdata_trace_format_v3.proto)
- Project requirements and analysis
One of the main quirks in the traces is that fields that have a "zero" value
(i.e. a value like 0 or the empty string) are often omitted in the JSON object
records. When reading the traces in Apache Spark, it is therefore necessary to
check for this possibility and populate those zero fields when omitted.
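
As a minimal sketch of this step, the plain-Python function below parses one
JSONL line and restores omitted zero-valued fields. The field names and the
`DEFAULTS` table are illustrative assumptions, not the actual trace schema; in
pyspark a function like this could be passed to the RDD `.map()` operation.

```python
import json

# Hypothetical fields and their "zero" values; the real schema is defined
# by the protobuf specification and is not reproduced here.
DEFAULTS = {"time": 0, "type": 0, "collection_id": "", "priority": 0}


def parse_record(line):
    """Parse one JSONL line and fill in fields omitted because they were zero."""
    record = json.loads(line)
    # Values present in the record take precedence over the defaults.
    return {**DEFAULTS, **record}


lines = [
    '{"time": 1200, "collection_id": "42", "priority": 2}',
    '{"collection_id": "43"}',
]
records = [parse_record(line) for line in lines]
```

After parsing, every record exposes the same set of keys, so later stages of a
query do not need to check for missing fields.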
- Project design (top-down)
Most queries use only two or three fields of each trace record, while the
original records are often made of a couple of dozen fields. In order to save
memory during the query, a projection is often applied to the data by means
of a .map() operation over the entire trace set, performed using Spark's RDD
API.
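
The projection can be sketched in plain Python as a function that keeps only
the fields a query needs. The record layout and the `project` helper are
hypothetical; Python's built-in `map()` stands in here for the RDD `.map()`
operation.

```python
def project(record):
    """Keep only the fields needed by the query, dropping everything else."""
    # Hypothetical field names chosen for illustration.
    return (record["collection_id"], record["time"], record["type"])


records = [
    {"collection_id": "42", "time": 1200, "type": 3, "priority": 2, "user": "u1"},
    {"collection_id": "43", "time": 1500, "type": 0, "priority": 0, "user": "u2"},
]

# Each wide record becomes a small tuple, reducing memory per record.
projected = list(map(project, records))
```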
- Implementation issues (bottom-up)
Another operation that is often necessary prior to the Map-Reduce core of
each query is a record filtering process, which is often motivated by the
presence of incomplete data (i.e. records which contain fields whose values are
unknown). This filtering is performed using the .filter() operation of Spark's
RDD API.
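
A possible shape for such a filtering predicate is sketched below in plain
Python. It assumes, purely for illustration, that unknown values are
represented as `None` and that `REQUIRED` lists the fields a query depends on;
the built-in `filter()` plays the role of the RDD `.filter()` operation.

```python
# Hypothetical list of fields a query cannot work without.
REQUIRED = ("collection_id", "time", "type")


def is_complete(record):
    """Return True when every required field has a known (non-None) value."""
    return all(record.get(field) is not None for field in REQUIRED)


records = [
    {"collection_id": "42", "time": 1200, "type": 3},
    {"collection_id": "43", "time": None, "type": 0},  # unknown time
    {"collection_id": "44", "type": 5},                # missing time
]

complete = list(filter(is_complete, records))
```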
- Tests (methodology, results, comments)
The core of each query is often a groupBy() followed by a map() operation on the
aggregated data. The groupBy() partitions the set of all records into several
subsets of records that share a common key. Then, each of these small clusters
is reduced with a .map() operation to a single record. The motivation behind
this computation is often to analyze a time series of several different traces
of programs. This is implemented by groupBy()-ing records by program id, and
then map()-ing each program trace set by sorting the traces by time and
computing the desired property in the form of a record.
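
The group-then-map pattern can be illustrated with a small plain-Python sketch.
The `group_by` and `summarize` helpers, the field names and the computed
properties are assumptions made for this example, not the queries actually used
in the analysis.

```python
from collections import defaultdict


def group_by(records, key):
    """Group records into lists that share the same key value."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return groups


def summarize(program_id, traces):
    """Reduce the traces of one program to a single summary record."""
    ordered = sorted(traces, key=lambda r: r["time"])  # build the time series
    return {
        "id": program_id,
        "first_time": ordered[0]["time"],
        "last_type": ordered[-1]["type"],
        "n_events": len(ordered),
    }


records = [
    {"id": "a", "time": 30, "type": 5},
    {"id": "b", "time": 10, "type": 1},
    {"id": "a", "time": 10, "type": 0},
    {"id": "a", "time": 20, "type": 3},
]

groups = group_by(records, lambda r: r["id"])
summaries = [summarize(pid, traces) for pid, traces in groups.items()]
```

Sorting inside `summarize` matters because grouped records do not arrive in
time order, so any property that depends on event sequence needs an explicit
sort first.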
Sometimes intermediate results are computed beforehand and saved in Parquet
format using Spark.
## General Query script design
## Ad-hoc presentation of some analysis scripts (with diagrams)
# Analysis (with observations)
## machine_configs
\input{figures/machine_configs}
**Observations**:
- machine configurations are definitely more varied than the ones in the 2011
traces
- some clusters have more machine variability
## machine_time_waste
\input{figures/machine_time_waste}
**Observations**:
## task_slowdown
## spatial_resource_waste
## figure_7
## figure_8
## figure_9
## table_iii, table_iv, figure_v
## Potential causes of unsuccessful executions
# Implementation issues -- Analysis limitations
## Discussion on unknown fields
## Limitation on computation resources required for the analysis
## Other limitations ...
# Conclusions and future work or possible developments
- Conclusions and future work or possible developments
Some examples
-------------

Binary file not shown.

View file

@ -0,0 +1,170 @@
\newcommand{\machineconfigs}[3][0.9\textwidth]{
\begin{subfigure}{0.32\textwidth}
\begin{minipage}[c][#1]{\textwidth}%
\resizebox{\textwidth}{!}{
\centering
\begin{tabular}{llll}
\toprule
\textbf{CPU (NCU)} & \textbf{RAM (NMU)} & \textbf{Machine count} &
\textbf{\% Machines} \\
\midrule
#3
\bottomrule
\end{tabular}}
\end{minipage}
\caption{#2}
\end{subfigure}}
\begin{figure}
\centering
\machineconfigs[1.2\textwidth]{All clusters}{
Unknown & Unknown & 8729 & 1.639218\% \\
1.000000 & 0.500000 & 124234 & 23.329891\% \\
0.591797 & 0.333496 & 103013 & 19.344801\% \\
0.259277 & 0.166748 & 78078 & 14.662260\% \\
0.708984 & 0.333496 & 55801 & 10.478864\% \\
0.386719 & 0.333496 & 36237 & 6.804943\% \\
0.958984 & 0.500000 & 31151 & 5.849843\% \\
0.708984 & 0.666992 & 29594 & 5.557454\% \\
0.386719 & 0.166748 & 27011 & 5.072393\% \\
1.000000 & 1.000000 & 12286 & 2.307187\% \\
0.591797 & 0.166748 & 9902 & 1.859496\% \\
1.000000 & 0.250000 & 7550 & 1.417814\% \\
0.958984 & 1.000000 & 3552 & 0.667030\% \\
0.259277 & 0.333496 & 3024 & 0.567877\% \\
0.591797 & 0.666992 & 1000 & 0.187790\% \\
0.259277 & 0.083374 & 634 & 0.119059\% \\
0.958984 & 0.250000 & 600 & 0.112674\% \\
0.500000 & 0.062500 & 54 & 0.010141\% \\
0.500000 & 0.250000 & 34 & 0.006385\% \\
0.479492 & 0.250000 & 12 & 0.002253\% \\
0.708984 & 0.250000 & 6 & 0.001127\% \\
0.591797 & 0.250000 & 4 & 0.000751\% \\
0.708984 & 0.500000 & 2 & 0.000376\% \\
0.479492 & 0.500000 & 2 & 0.000376\% \\
}
\machineconfigs[1.2\textwidth]{Cluster A}{
Unknown & Unknown & 1377 & 1.623170\% \\
0.591797 & 0.333496 & 29487 & 34.758469\% \\
1.000000 & 0.500000 & 13440 & 15.842705\% \\
0.708984 & 0.333496 & 12495 & 14.728764\% \\
0.386719 & 0.333496 & 9057 & 10.676144\% \\
0.386719 & 0.166748 & 5265 & 6.206238\% \\
0.708984 & 0.666992 & 4608 & 5.431784\% \\
1.000000 & 1.000000 & 4446 & 5.240823\% \\
0.591797 & 0.166748 & 2484 & 2.928071\% \\
0.958984 & 0.500000 & 1143 & 1.347337\% \\
0.958984 & 1.000000 & 654 & 0.770917\% \\
1.000000 & 0.250000 & 366 & 0.431431\% \\
0.479492 & 0.250000 & 6 & 0.007073\% \\
0.708984 & 0.250000 & 6 & 0.007073\% \\
}
\machineconfigs[1.2\textwidth]{Cluster B}{
Unknown & Unknown & 134 & 0.264812\% \\
0.591797 & 0.333496 & 16184 & 31.982926\% \\
1.000000 & 0.500000 & 9790 & 19.347061\% \\
0.708984 & 0.333496 & 8448 & 16.694992\% \\
0.958984 & 0.500000 & 5502 & 10.873088\% \\
0.708984 & 0.666992 & 3832 & 7.572823\% \\
1.000000 & 1.000000 & 2214 & 4.375321\% \\
0.591797 & 0.166748 & 2152 & 4.252796\% \\
0.386719 & 0.333496 & 816 & 1.612584\% \\
0.958984 & 1.000000 & 618 & 1.221296\% \\
0.591797 & 0.666992 & 500 & 0.988103\% \\
0.386719 & 0.166748 & 412 & 0.814197\% \\
}
\machineconfigs{Cluster C}{
Unknown & Unknown & 1466 & 2.274208\% \\
0.259277 & 0.166748 & 15754 & 24.439204\% \\
0.386719 & 0.333496 & 11104 & 17.225652\% \\
0.591797 & 0.333496 & 10404 & 16.139741\% \\
0.958984 & 0.500000 & 6634 & 10.291334\% \\
1.000000 & 0.500000 & 5654 & 8.771059\% \\
0.386719 & 0.166748 & 3580 & 5.553660\% \\
0.708984 & 0.666992 & 2900 & 4.498774\% \\
1.000000 & 1.000000 & 2736 & 4.244361\% \\
1.000000 & 0.250000 & 2132 & 3.307375\% \\
0.958984 & 1.000000 & 766 & 1.188297\% \\
0.708984 & 0.333496 & 620 & 0.961807\% \\
0.958984 & 0.250000 & 600 & 0.930781\% \\
0.591797 & 0.166748 & 112 & 0.173746\% \\
}
\machineconfigs{Cluster D}{
Unknown & Unknown & 498 & 0.794309\% \\
0.591797 & 0.333496 & 28394 & 45.288376\% \\
0.386719 & 0.333496 & 8402 & 13.401174\% \\
0.259277 & 0.166748 & 8020 & 12.791885\% \\
0.386719 & 0.166748 & 5806 & 9.260559\% \\
0.708984 & 0.666992 & 4380 & 6.986092\% \\
0.708984 & 0.333496 & 3924 & 6.258772\% \\
0.591797 & 0.166748 & 2548 & 4.064055\% \\
0.259277 & 0.333496 & 426 & 0.679469\% \\
1.000000 & 0.500000 & 292 & 0.465739\% \\
0.591797 & 0.250000 & 4 & 0.006380\% \\
0.708984 & 0.500000 & 2 & 0.003190\% \\
}
\machineconfigs{Cluster E}{
Unknown & Unknown & 536 & 0.671915\% \\
0.259277 & 0.166748 & 38452 & 48.202377\% \\
0.708984 & 0.333496 & 11786 & 14.774608\% \\
0.958984 & 0.500000 & 8646 & 10.838389\% \\
0.708984 & 0.666992 & 7606 & 9.534674\% \\
1.000000 & 0.500000 & 5586 & 7.002457\% \\
0.386719 & 0.166748 & 4470 & 5.603470\% \\
0.259277 & 0.333496 & 1268 & 1.589530\% \\
0.259277 & 0.083374 & 634 & 0.794765\% \\
0.591797 & 0.333496 & 324 & 0.406158\% \\
1.000000 & 0.250000 & 268 & 0.335957\% \\
1.000000 & 1.000000 & 138 & 0.172993\% \\
0.500000 & 0.062500 & 54 & 0.067693\% \\
0.500000 & 0.250000 & 4 & 0.005014\% \\
}
\machineconfigs{Cluster F}{
Unknown & Unknown & 1432 & 2.299958\% \\
1.000000 & 0.500000 & 41340 & 66.396839\% \\
0.708984 & 0.333496 & 6878 & 11.046866\% \\
0.591797 & 0.333496 & 5564 & 8.936430\% \\
0.958984 & 0.500000 & 2172 & 3.488484\% \\
0.386719 & 0.166748 & 1544 & 2.479843\% \\
0.708984 & 0.666992 & 1244 & 1.998008\% \\
1.000000 & 0.250000 & 792 & 1.272044\% \\
0.958984 & 1.000000 & 536 & 0.860878\% \\
0.386719 & 0.333496 & 398 & 0.639234\% \\
1.000000 & 1.000000 & 344 & 0.552504\% \\
0.500000 & 0.250000 & 18 & 0.028910\% \\
}
\machineconfigs{Cluster G}{
Unknown & Unknown & 1566 & 2.261568\% \\
0.259277 & 0.166748 & 15852 & 22.892958\% \\
1.000000 & 0.500000 & 11808 & 17.052741\% \\
0.708984 & 0.333496 & 7968 & 11.507134\% \\
0.591797 & 0.333496 & 7830 & 11.307839\% \\
0.386719 & 0.166748 & 4690 & 6.773150\% \\
0.708984 & 0.666992 & 4258 & 6.149269\% \\
0.958984 & 0.500000 & 4196 & 6.059731\% \\
0.386719 & 0.333496 & 3864 & 5.580267\% \\
0.591797 & 0.166748 & 2606 & 3.763503\% \\
1.000000 & 0.250000 & 2100 & 3.032754\% \\
0.259277 & 0.333496 & 1330 & 1.920744\% \\
0.958984 & 1.000000 & 778 & 1.123563\% \\
1.000000 & 1.000000 & 378 & 0.545896\% \\
0.500000 & 0.250000 & 12 & 0.017330\% \\
0.479492 & 0.250000 & 6 & 0.008665\% \\
0.479492 & 0.500000 & 2 & 0.002888\% \\
}
\machineconfigs{Cluster H}{
Unknown & Unknown & 1720 & 2.933251\% \\
1.000000 & 0.500000 & 36324 & 61.946178\% \\
0.591797 & 0.333496 & 4826 & 8.230158\% \\
0.708984 & 0.333496 & 3682 & 6.279205\% \\
0.958984 & 0.500000 & 2858 & 4.873973\% \\
0.386719 & 0.333496 & 2596 & 4.427163\% \\
1.000000 & 1.000000 & 2030 & 3.461919\% \\
1.000000 & 0.250000 & 1892 & 3.226577\% \\
0.386719 & 0.166748 & 1244 & 2.121491\% \\
0.708984 & 0.666992 & 766 & 1.306320\% \\
0.591797 & 0.666992 & 500 & 0.852689\% \\
0.958984 & 1.000000 & 200 & 0.341076\% \\
}
\caption{Overview of machine configurations in terms of CPU and RAM resources for each cluster}
\end{figure}

View file

@ -0,0 +1,58 @@
\newcommand{\machinetimewaste}[3][0]{
\begin{subfigure}{\ifnum#1=1 0.5\textwidth \else 0.24\textwidth \fi}
\vspace{0.5cm}
%\ifnum#1=1 \hspace{0.25\textwidth} \fi
\begin{minipage}[c]{\textwidth}%
\includegraphics[width=1\textwidth]{figures/machine_time_waste/#3}
\end{minipage}
%\hfill
\caption{#2}
\end{subfigure}}
\newcommand{\machinetimewastelegend}{
\begin{subfigure}{0.5\textwidth}
\vspace{0.5cm}
\centering
\begin{tabular}{cc}
\toprule
\textbf{Color} & \textbf{Execution phase} \\
\midrule
{\color{blue}Blue} & Queued \\
{\color{orange}Orange} & Ended \\
{\color{teal}Green} & Ready \\
{\color{red}Red} & Running \\
{\color{violet}Violet} & Evicted \\
{\color{brown}Brown} & Unknown \\
\bottomrule
\end{tabular}
\caption{Execution state legend for the graphs}
\end{subfigure}
\hfill}
\begin{figure}
\machinetimewastelegend
\machinetimewaste[1]{All clusters}{output_9_16.png}
\machinetimewaste{Cluster A}{output_9_0.png}
\machinetimewaste{Cluster B}{output_9_2.png}
\machinetimewaste{Cluster C}{output_9_4.png}
\machinetimewaste{Cluster D}{output_9_6.png}
\machinetimewaste{Cluster E}{output_9_8.png}
\machinetimewaste{Cluster F}{output_9_10.png}
\machinetimewaste{Cluster G}{output_9_12.png}
\machinetimewaste{Cluster H}{output_9_14.png}
\caption{Total task time (in milliseconds) spent in each execution phase w.r.t. task termination.}
\end{figure}
\begin{figure}
\machinetimewastelegend
\machinetimewaste[1]{All clusters}{output_9_17.png}
\machinetimewaste{Cluster A}{output_9_1.png}
\machinetimewaste{Cluster B}{output_9_3.png}
\machinetimewaste{Cluster C}{output_9_5.png}
\machinetimewaste{Cluster D}{output_9_7.png}
\machinetimewaste{Cluster E}{output_9_9.png}
\machinetimewaste{Cluster F}{output_9_11.png}
\machinetimewaste{Cluster G}{output_9_13.png}
\machinetimewaste{Cluster H}{output_9_15.png}
\caption{Relative task time (in milliseconds) spent in each execution phase w.r.t. task termination.}
\end{figure}

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB


View file

@ -3,7 +3,6 @@
{
"cell_type": "code",
"execution_count": 10,
"id": "black-funeral",
"metadata": {},
"outputs": [],
"source": [
@ -19,7 +18,6 @@
{
"cell_type": "code",
"execution_count": 70,
"id": "hawaiian-cabin",
"metadata": {},
"outputs": [],
"source": [
@ -29,7 +27,6 @@
{
"cell_type": "code",
"execution_count": 84,
"id": "blessed-updating",
"metadata": {},
"outputs": [],
"source": [
@ -68,7 +65,6 @@
{
"cell_type": "code",
"execution_count": 85,
"id": "consistent-toilet",
"metadata": {},
"outputs": [
{
@ -406,9 +402,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"display_name": "Python 3",
"language": "python",
"name": "venv"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@ -420,7 +416,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4"
"version": "3.8.3"
}
},
"nbformat": 4,