---
documentclass: usiinfbachelorproject
title: Understanding and Comparing Unsuccessful Executions in Large Datacenters
author: Claudio Maggioni

pandoc-options:
- --filter=pandoc-include
- --latex-engine-opt=--shell-escape
- --latex-engine-opt=--enable-write18

header-includes:
- |
  ```{=latex}
  \usepackage{subcaption}
  \usepackage{booktabs}
  \usepackage{graphicx}

  \captionsetup{labelfont={bf}}
  %\subtitle{The (optional) subtitle}

  \versiondate{\today}

  \begin{committee}
  \advisor[Universit\`a della Svizzera Italiana,
  Switzerland]{Prof.}{Walter}{Binder}
  \assistant[Universit\`a della Svizzera Italiana,
  Switzerland]{Dr.}{Andrea}{Ros\'a}
  \end{committee}

  \abstract{The project aims at comparing two different traces coming from large
  datacenters, focusing in particular on unsuccessful executions of jobs and
  tasks submitted by users. The objective of this project is to compare the
  resource waste caused by unsuccessful executions, their impact on application
  performance, and their root causes. We will show the strong negative impact on
  CPU and RAM usage and on task slowdown. We will analyze patterns of
  unsuccessful jobs and tasks, particularly focusing on their interdependency.
  Moreover, we will uncover their root causes by inspecting key workload and
  system attributes such as machine locality and concurrency level.}
  ```
---

# Introduction (including Motivation)

# State of the Art

- Introduce Ros\'a 2015 DSN paper on analysis
- Describe Google Borg clusters
- Describe the traces' contents
- Differences between the 2011 and 2019 traces

# Project requirements and analysis

(describe our objective with this analysis in detail)

# Analysis methodology

## Technical overview of the traces' file format and schema

## Overview of challenging aspects of the analysis (data size, schema, available computation resources)

## Introduction to Apache Spark

## General description of the Apache Spark query workflow

The Google 2019 Borg cluster traces analysis was conducted using Apache Spark
and its Python 3 API (pyspark). Spark was used to execute a series of queries
performing various sums and aggregations over the entire dataset provided by
Google.

In general, each query follows a Map-Reduce template, where traces are first
read, parsed, and filtered by performing selections, projections and the
computation of new derived fields. Then, the trace records are often grouped by
one of their fields, clustering related data together before a reduce or fold
operation is applied to each grouping.

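As a minimal sketch of this template (not taken from the actual analysis
scripts), the query below reads a trace, projects and filters it, groups
related records and reduces each group to a single value; the input path and
the field names collection_id, time and type are placeholders standing in for
whichever attributes a concrete query uses.

```python
import json

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("borg-trace-query").getOrCreate()
sc = spark.sparkContext

# Map phase: read the JSONL trace and parse every line into a Python dict.
records = sc.textFile("/path/to/trace-*.json.gz").map(json.loads)

result = (records
          # Projection: keep only the fields the query needs.
          .map(lambda r: (r.get("collection_id"),
                          (r.get("time", 0), r.get("type"))))
          # Filtering: drop records with an unknown grouping key.
          .filter(lambda kv: kv[0] is not None)
          # Grouping: cluster related records (e.g. all events of one job).
          .groupByKey()
          # Reduce phase: collapse every group into a single value.
          .mapValues(lambda events: len(list(events)))
          .collect())
```
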
Most of the input data is in JSONL format and adheres to a schema Google
provided in the form of a protocol buffer specification[^1].

[^1]: [Google 2019 Borg traces protocol buffer specification on GitHub](
https://github.com/google/cluster-data/blob/master/clusterdata_trace_format_v3.proto)

One of the main quirks of the traces is that fields having a "zero" value
(i.e. a value like 0 or the empty string) are often omitted from the JSON
records. When reading the traces in Apache Spark it is therefore necessary to
check for this possibility and to populate those zero fields when they are
omitted.

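A minimal sketch of how the omitted zero-valued fields can be restored while
parsing; the field names and defaults below are hypothetical examples, not the
complete trace schema.

```python
import json

# Hypothetical subset of trace fields together with the "zero" value that is
# omitted from the JSON record when the field holds it.
ZERO_DEFAULTS = {
    "collection_id": 0,
    "priority": 0,
    "user": "",
}

def parse_trace_line(line):
    """Parse one JSONL line and re-insert omitted zero-valued fields."""
    record = json.loads(line)
    for field, zero in ZERO_DEFAULTS.items():
        record.setdefault(field, zero)
    return record

# records = sc.textFile("/path/to/trace-*.json.gz").map(parse_trace_line)
```
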
Most queries use only two or three fields of each trace record, while the
original records often consist of a couple of dozen fields. In order to save
memory during the query, a projection is often applied to the data by means of
a .map() operation over the entire trace set, performed using Spark's RDD API.

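For instance, a projection keeping only a record's identifier and its CPU
request could be written as follows (assuming the records RDD from the earlier
sketch; the nested field names are illustrative):

```python
# Keep only the two fields needed by the query, discarding the rest of each
# record to reduce the memory held by later stages.
projected = records.map(
    lambda r: (r.get("collection_id"),
               r.get("resource_request", {}).get("cpus", 0.0)))
```
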
Another operation that often needs to be performed before the Map-Reduce core
of each query is record filtering, which is usually motivated by the presence
of incomplete data (i.e. records containing fields whose value is unknown).
This filtering is performed using the .filter() operation of Spark's RDD API.

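A sketch of such a filter, again with placeholder field names; records missing
any of the fields required by the query are discarded:

```python
# Fields that this particular query cannot work without.
REQUIRED_FIELDS = ("collection_id", "time", "type")

# Keep only complete records, i.e. those where every required field is known.
complete = records.filter(
    lambda r: all(r.get(field) is not None for field in REQUIRED_FIELDS))
```
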
The core of each query is often a groupBy() followed by a map() operation on
the aggregated data. The groupBy() partitions the set of all records into
several subsets of records that have something in common. Each of these small
clusters is then reduced to a single record by a map() operation. The
motivation behind this computation is often the analysis of a time series built
from the several trace records of a program. This is implemented by
groupBy()-ing records by program id and then map()-ing each per-program set of
traces, sorting the traces by time and computing the desired property in the
form of a single record.

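A sketch of this pattern, continuing the previous snippets with placeholder
field names: each program's records are grouped by its id, sorted by time, and
reduced to a single value, here simply the type of its last recorded event.

```python
def last_event_type(events):
    """Sort one program's events by time and return the type of the last one."""
    ordered = sorted(events, key=lambda e: e.get("time", 0))
    return ordered[-1].get("type")

per_program = (complete
               # Key every record by the id of the program it belongs to.
               .map(lambda r: (r.get("collection_id"), r))
               # Cluster all records of the same program together.
               .groupByKey()
               # Reduce each per-program time series to a single value.
               .mapValues(last_event_type))
```
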
Intermediate results are sometimes saved to disk in the Apache Parquet format,
so that expensive computations can be performed once and their results reused
by later queries.

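For example, an intermediate result can be written once with the DataFrame API
and read back by later scripts (paths and column names are placeholders):

```python
from pyspark.sql import Row

# Turn the per-program RDD from the previous sketch into a DataFrame and
# persist it, so the expensive grouping does not have to be recomputed.
df = spark.createDataFrame(
    per_program.map(lambda kv: Row(collection_id=kv[0], last_type=kv[1])))
df.write.mode("overwrite").parquet("/path/to/intermediate/last_type.parquet")

# Later queries can start directly from the cached intermediate result.
cached = spark.read.parquet("/path/to/intermediate/last_type.parquet")
```
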
## General query script design

## Ad-hoc presentation of some analysis scripts (with diagrams)

# Analysis (with observations)

## machine_configs

\input{figures/machine_configs}

**Observations**:

- machine configurations are more varied than the ones in the 2011 traces
- some clusters show more machine variability than others

## machine_time_waste

\input{figures/machine_time_waste}

**Observations**:

## task_slowdown

## spatial_resource_waste

## figure_7

## figure_8

## figure_9

## table_iii, table_iv, figure_v

## Potential causes of unsuccessful executions

# Implementation issues -- Analysis limitations

## Discussion on unknown fields

## Limitations on the computation resources required for the analysis

## Other limitations ...

# Conclusions and future work or possible developments

<!-- vim: set ts=2 sw=2 et tw=80: -->