diff --git a/.~lock.status.ods# b/.~lock.status.ods#
new file mode 100644
index 00000000..0eeee260
--- /dev/null
+++ b/.~lock.status.ods#
@@ -0,0 +1 @@
+,maggicl,Apple2gs.local,16.05.2021 14:55,file:///Users/maggicl/Library/Application%20Support/LibreOffice/4;
\ No newline at end of file
diff --git a/report/Claudio_Maggioni_report.md b/report/Claudio_Maggioni_report.md
index 72dbb210..0b15d961 100644
--- a/report/Claudio_Maggioni_report.md
+++ b/report/Claudio_Maggioni_report.md
@@ -39,29 +39,141 @@ header-includes:
 ```
 ---

+\tableofcontents
+\newpage
+
 # Introduction (including Motivation)

 # State of the Art

-- Introduce Ros\'a 2015 DSN paper on analysis
-- Describe Google Borg clusters
-- Describe Traces contents
-- Differences between 2011 and 2019 traces
+## Introduction
+
+**TBD**
+
+## Rosà et al. 2015 DSN paper
+
+**TBD**
+
+## Google Borg
+
+Borg is Google's own cluster management software. Among the various cluster
+management services it provides, the main ones are: job queuing, scheduling,
+allocation, and deallocation due to higher-priority computations.
+
+The data this thesis is based on is from 8 Borg "cells" (i.e. clusters) spanning
+8 different datacenters, all focused on "compute" (i.e. computation-oriented)
+workloads. The data collection timespan matches the entire month of May 2019.
+
+In Google's lingo a "job" is a large unit of computational workload made up of
+several "tasks", i.e. a number of executions of single executables running on a
+single machine. A job may run tasks sequentially or in parallel, and the
+condition for a job's successful termination is nontrivial.
+
+Both task and job lifecycles are represented by several events, which are
+encoded and stored in the trace as rows of various tables. Among the information
+that events provide, the field "type" describes the execution status of the job
+or task.
+This field can have the following values:
+
+- **QUEUE**: The job or task was marked not eligible for scheduling by Borg's
+  scheduler, and thus Borg moved the job/task into a long wait queue;
+- **SUBMIT**: The job or task was submitted to Borg for execution;
+- **ENABLE**: The job or task became eligible for scheduling;
+- **SCHEDULE**: The job or task's execution started;
+- **EVICT**: The job or task was terminated in order to free computational
+  resources for a higher-priority job;
+- **FAIL**: The job or task terminated its execution unsuccessfully due to a
+  failure;
+- **FINISH**: The job or task terminated successfully;
+- **KILL**: The job or task terminated its execution because of a manual request
+  to stop it;
+- **LOST**: It is assumed that the job or task has been terminated, but due to
+  missing data there is insufficient information to identify when or how;
+- **UPDATE_PENDING**: The metadata (scheduling class, resource requirements,
+  ...) of the job/task was updated while the job was waiting to be scheduled;
+- **UPDATE_RUNNING**: The metadata (scheduling class, resource requirements,
+  ...) of the job/task was updated while the job was in execution.
+
+Figure \ref{fig:eventTypes} shows the expected transitions between event types.
+
+![Typical transitions between task/job event types according to Google
+\label{fig:eventTypes}](./figures/event_types.png)
+
+## Traces contents
+
+The traces provided by Google contain mainly a collection of job and task events
+spanning a month of execution of the 8 different clusters. In addition to this
+data, the traces include data on the machines' configuration in terms of
+resources (i.e. amount of CPU and RAM) and additional machine-related metadata.
+
+Due to Google's policy, most identification-related data (like job/task IDs,
+raw resource amounts and other text values) were obfuscated prior to the release
+of the traces.
+One obfuscation that is noteworthy in the scope of this thesis is
+related to CPU and RAM amounts, which are expressed respectively in NCUs
+(_Normalized Compute Units_) and NMUs (_Normalized Memory Units_).
+
+NCUs and NMUs are defined based on the raw machine resource distributions of the
+machines within the 8 clusters. A machine having 1 NCU CPU power and 1 NMU
+memory size has the maximum amount of raw CPU power and raw RAM size found in
+the clusters. While RAM size is measured in bytes for normalization purposes,
+CPU power was measured in GCU (_Google Compute Units_), a proprietary CPU power
+measurement unit used by Google that combines several parameters like number of
+processors and cores, clock frequency, and architecture (i.e. ISA).
+
+## Overview of traces' format
+
+The traces have a collective size of approximately 8 TiB and are stored in a
+Gzip-compressed JSONL (JSON lines) format, which means that each table is
+represented by a single logical "file" (stored in several file segments) where
+each newline-separated line represents a single record for that table.
+
+There are 5 different table "files":
+
+- `machine_configs`, which is a table containing each physical machine's
+  configuration and its evolution over time;
+- `instance_events`, which is a table of task events;
+- `collection_events`, which is a table of job events;
+- `machine_attributes`, which is a table containing (obfuscated) metadata about
+  each physical machine and its evolution over time;
+- `instance_usage`, which contains resource (CPU/RAM) measures of jobs and tasks
+  running on the single machines.
+
+The scope of this thesis focuses on the tables `machine_configs`,
+`instance_events` and `collection_events`.
+
+## Remark on traces size
+
+While the 2011 Google Borg traces were relatively small, with a total size in
+the order of the tens of gigabytes, the 2019 traces are quite challenging to
+analyze due to their sheer size.
+As stated before, the traces have a total size
+of 8 TiB when stored in the format provided by Google. Even when broken down to
+table "files", unitary sizes still reach the single tebibyte mark (namely for
+`machine_configs`, the largest table in the trace).
+
+Due to these constraints, a careful data-engineering-based approach was used when
+reproducing the 2015 DSN paper analysis. Bleeding-edge data science technologies
+like Apache Spark were used to achieve efficient and parallelized computations.
+This approach is discussed in further detail in the following section.

 # Project requirements and analysis

-(describe our objective with this analysis in detail)
+**TBD** (describe our objective with this analysis in detail)

 # Analysis methodology

-## Technical overview of traces' file format and schema
+**TBD**

 ## Overview on challenging aspects of analysis (data size, schema, available computation resources)

-## Introduction on apache spark
+**TBD**
+
+## Introduction on Apache Spark
+
+**TBD**

 ## General workflow description of Apache Spark workflow

+**TBD** (extract from the notes sent to Filippo shown below)
+
 The Google 2019 Borg cluster traces analysis was conducted by using Apache
 Spark and its Python 3 API (pyspark). Spark was used to execute a series of
 queries to perform various sums and aggregations over the entire dataset
@@ -110,7 +222,11 @@ compute and save intermediate results beforehand.

 ## General Query script design

-## Ad-Hoc presentation of some analysis scripts (w diagrams)
+**TBD**
+
+## Ad-Hoc presentation of some analysis scripts
+
+**TBD** (with diagrams)

 # Analysis and observations

@@ -271,14 +387,24 @@ Refer to figure \ref{fig:figureV}.

 ## Potential causes of unsuccessful executions

+**TBD**
+
 # Implementation issues -- Analysis limitations

 ## Discussion on unknown fields

+**TBD**
+
 ## Limitation on computation resources required for the analysis

+**TBD**
+
 ## Other limitations ...
+**TBD**
+
 # Conclusions and future work or possible developments

+**TBD**
+
diff --git a/report/Claudio_Maggioni_report.pdf b/report/Claudio_Maggioni_report.pdf
index 2a36b56f..9077899f 100644
Binary files a/report/Claudio_Maggioni_report.pdf and b/report/Claudio_Maggioni_report.pdf differ
diff --git a/report/figures/event_types.png b/report/figures/event_types.png
new file mode 100644
index 00000000..92190832
Binary files /dev/null and b/report/figures/event_types.png differ
diff --git a/status.ods b/status.ods
index 19d30ff3..56b4e7aa 100644
Binary files a/status.ods and b/status.ods differ