diff --git a/Assignment2_part2/report.pdf b/Assignment2_part2/report.pdf new file mode 100644 index 0000000..38b7a06 Binary files /dev/null and b/Assignment2_part2/report.pdf differ diff --git a/Assignment2_part2/report/.gitignore b/Assignment2_part2/report/.gitignore new file mode 100644 index 0000000..5935864 --- /dev/null +++ b/Assignment2_part2/report/.gitignore @@ -0,0 +1 @@ +_tmp.md \ No newline at end of file diff --git a/Assignment2_part2/report/build.sh b/Assignment2_part2/report/build.sh new file mode 100755 index 0000000..bff4f80 --- /dev/null +++ b/Assignment2_part2/report/build.sh @@ -0,0 +1,9 @@ +#!/bin/bash + +set -e + +SCRIPT_DIR=$(cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd) + +cd "$SCRIPT_DIR" +m4 -I"$SCRIPT_DIR" main.md > _tmp.md +pandoc _tmp.md -o ../report.pdf diff --git a/Assignment2_part2/report/canvas_any.png b/Assignment2_part2/report/canvas_any.png new file mode 100644 index 0000000..9f61484 Binary files /dev/null and b/Assignment2_part2/report/canvas_any.png differ diff --git a/Assignment2_part2/report/canvas_city.png b/Assignment2_part2/report/canvas_city.png new file mode 100644 index 0000000..918dc95 Binary files /dev/null and b/Assignment2_part2/report/canvas_city.png differ diff --git a/Assignment2_part2/report/dashboard.png b/Assignment2_part2/report/dashboard.png new file mode 100644 index 0000000..2cab8a0 Binary files /dev/null and b/Assignment2_part2/report/dashboard.png differ diff --git a/Assignment2_part2/report/main.md b/Assignment2_part2/report/main.md new file mode 100644 index 0000000..0f17b05 --- /dev/null +++ b/Assignment2_part2/report/main.md @@ -0,0 +1,84 @@ +--- +author: Claudio Maggioni +title: Visual Analytics -- Assignment 2 -- Part 2 +geometry: margin=2cm,bottom=3cm +--- + +changequote(`{{', `}}') + +# Indexing + +Similarly to part 1 of the assignment, the first step of indexing is to convert +the newly given CSV dataset (stored in `data/restaurants_extended.csv`) into a +JSON-lines file which 
can be directly used as the HTTP request body of +Elasticsearch document insertion requests. + +The conversion is performed by the script `./convert.sh`. The converted file +is stored in the JSON-lines file `data/restaurants_extended.jsonl`. + +The sources of `./convert.sh` are listed below: + +```shell +include({{../convert.sh}}) +``` + +The only change in the script is the way the field containing the restaurant +location is parsed. In the extended dataset, city, country and continent are in +this field and separated by `/`. The script maps the three values into separate +fields and additionally maps the entire string to an additional `cityRaw` field +which is used in the generation of the runtime field for part 2. + +The sources of the updated upload script, which loads the new index, are listed +below: + +```shell +include({{../upload.sh}}) +``` + +Mappings are stored in `mappings.json` and are identical to the ones in Part 1 +except for the new location fields and their `.keyword` counterparts, +which are generated in the same way as the old `city` field. + +9499 documents are imported. + +# Data Visualization + +The Dashboard, Canvas, and requested dependencies (like scripted fields and +stored searches) are stored in the JSON object export file `export.ndjson`. +Screenshots of the Dashboard and Canvas can be found below. + +The scripted field `continent_scripted` has been generated with the following +Painless expression: + +```java +doc['cityRaw.keyword'].value.substring(doc['cityRaw.keyword'].value.lastIndexOf("/") + 1) +``` + +The expression extracts the last portion of the `cityRaw` field, i.e. the +portion of text between the last `/` and the end of the field, which contains +the continent. 
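The behaviour of the Painless expression above can be reproduced in plain Java, since Painless exposes the same `String.substring` and `String.lastIndexOf` methods. The following is a minimal sketch only: the class name `ContinentSketch`, the method name `continentOf`, and the sample value `Lugano/Switzerland/Europe` are assumptions made for illustration and do not come from the dataset or from the exported objects.

```java
// Illustrative sketch, not part of the assignment sources.
final class ContinentSketch {

    // Mirrors the Painless expression: keep everything after the last '/'.
    // If the value contains no '/', lastIndexOf returns -1 and the whole
    // string is returned unchanged.
    static String continentOf(String cityRaw) {
        return cityRaw.substring(cityRaw.lastIndexOf("/") + 1);
    }

    public static void main(String[] args) {
        // Hypothetical cityRaw value in the city/country/continent layout.
        System.out.println(continentOf("Lugano/Switzerland/Europe")); // prints "Europe"
    }
}
```

Note that the sketch takes the string directly, whereas the scripted field reads it from `doc['cityRaw.keyword'].value`.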
+ + +![Part 2 Dashboard](dashboard.png) + +![Part 2 Canvas with no city selected](canvas_any.png) + +![Part 2 Canvas with a city selected](canvas_city.png) + +# Ingestion Plugin + +Sources for the ingestion plugin can be found in the GitLab repository: + +[_usi-si-teaching/msde/2022-2023/visual-analytics-atelier/elasticsearch-plugin/ingest-lookup-maggicl_](https://gitlab.com/usi-si-teaching/msde/2022-2023/visual-analytics-atelier/elasticsearch-plugin/ingest-lookup-maggicl). + +The plugin can be built and installed on Elasticsearch with the script +`./install-on-ec.sh` included in the repository, after setting the variable +`ES_LOCATION` to the path of the local installation of Elasticsearch. + +The plugin works as illustrated in the `README.md` file in the repository, and +it has been tested with a unit test suite included in its sources. + +The plugin lookup procedure works by splitting the indicated field into words +(non-empty sequences of non-space characters -- according to the PCRE regular +expression specification) and matching each word against the given +substitution map, performing substitutions when needed.
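
The lookup procedure described above can be sketched in Java as follows. This is not the plugin's actual code, which lives in the linked repository: the class name `LookupSketch`, the method name `lookup`, and the sample substitution map are hypothetical, and the sketch assumes that the whitespace between words is preserved in the output, which the description does not state.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch, not the plugin implementation.
final class LookupSketch {

    // A word is a non-empty run of non-whitespace characters.
    private static final Pattern WORD = Pattern.compile("\\S+");

    // Replaces every word that appears as a key in the substitution map
    // with the mapped value; other words and all whitespace are kept.
    static String lookup(String text, Map<String, String> substitutions) {
        Matcher matcher = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String word = matcher.group();
            String replacement = substitutions.getOrDefault(word, word);
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical substitution map and field value.
        Map<String, String> map = Map.of("C001", "pizza");
        System.out.println(lookup("ate C001 and C002", map)); // prints "ate pizza and C002"
    }
}
```

Matching whole words rather than substrings means that a key such as `C001` is not replaced inside a longer token like `XC001`.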