% vim: set ts=2 sw=2 et tw=80:

\documentclass{scrartcl}
\usepackage{hyperref}
\usepackage{parskip}
\usepackage{minted}
\usepackage[utf8]{inputenc}

\setlength{\parindent}{0pt}

\usepackage[margin=2.5cm]{geometry}

\title{\textit{Image Search IR System} \\\vspace{0.3cm}
\Large{WS2020-21 Information Retrieval Project}}
\author{Claudio Maggioni}

\begin{document}
\maketitle
\tableofcontents
\newpage

\section{Introduction}
This report summarizes the work I have done to create the ``Image Search IR
System'', a proof-of-concept IR system built for the ``Image Search Engine''
project (project \#13).

The project is built on a simple
\textit{Scrapy}-\textit{Solr}-\textit{HTML5+CSS+JS} stack. The following
sections contain the installation instructions, an in-depth look at the project
components for scraping, indexing, and displaying the results, and finally the
user evaluation report.

\section{Installation instructions}

\subsection{Project repository}
The project Git repository is located here:
\url{https://git.maggioni.xyz/maggicl/IRProject}.

\subsection{Solr installation}
The installation of the project and the population of the test collection with
the scraped documents are automated by a single script. The script requires
\textit{Solr} version 8.6.2 downloaded as a ZIP file, i.e.\ the same
\textit{Solr} ZIP we had to download during the lab lectures. Should you need
to download a copy of the ZIP file, you can find it here:
\url{https://maggioni.xyz/solr-8.6.2.zip}.
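
Before running the installation script, the downloaded file can optionally be
checked from the shell. The following is a minimal sketch that is not part of
the project's scripts: the function name \texttt{check\_solr\_zip} and its
return codes are hypothetical, and it only verifies that the file exists,
starts with the ZIP signature, and carries the expected file name.

\begin{minted}[frame=lines,framesep=2mm]{bash}
# Hypothetical pre-flight check for the Solr ZIP (not part of the project).
# Returns 0 if the file looks usable, 1 if it is missing or unreadable,
# and 2 if it does not start with the ZIP signature.
check_solr_zip() {
  local zip="$1"
  if [ ! -f "$zip" ] || [ ! -r "$zip" ]; then
    echo "error: '$zip' does not exist or is not readable" >&2
    return 1
  fi
  # Regular ZIP archives begin with the two bytes "PK".
  if [ "$(head -c 2 "$zip")" != "PK" ]; then
    echo "error: '$zip' is not a ZIP archive" >&2
    return 2
  fi
  # This report targets Solr 8.6.2: only warn if the file name differs.
  if [ "$(basename "$zip")" != "solr-8.6.2.zip" ]; then
    echo "warning: expected a file named solr-8.6.2.zip" >&2
  fi
  return 0
}
\end{minted}

For example, \texttt{check\_solr\_zip solr-8.6.2.zip} prints nothing and
returns 0 for a genuine download stored in the current directory.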

Clone the project's Git repository and open a shell in the project's root
directory. Then execute this command:

% linenos
\begin{minted}[frame=lines,framesep=2mm]{bash}
./solr_install.sh {ZIP path}
\end{minted}

where \texttt{\{ZIP path\}} is the path of the ZIP file mentioned earlier. This
command installs and starts \textit{Solr}, and then populates it with the test
collection.
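
Once the script has finished, a quick way to confirm that \textit{Solr} is up is
to query its CoreAdmin API, which lists the available cores without requiring a
core name. The sketch below assumes that \textit{Solr} listens on its default
port 8983 on \texttt{localhost}; the function \texttt{solr\_is\_up} and the
\texttt{SOLR\_HOST}/\texttt{SOLR\_PORT} overrides are hypothetical and not part
of the project.

\begin{minted}[frame=lines,framesep=2mm]{bash}
# Hypothetical helper: succeeds (status 0) if Solr answers over HTTP.
# Assumes Solr's default port 8983 unless SOLR_HOST/SOLR_PORT are set.
solr_is_up() {
  local host="${SOLR_HOST:-localhost}"
  local port="${SOLR_PORT:-8983}"
  # CoreAdmin STATUS lists all cores; -f turns HTTP errors into a non-zero
  # exit status and --max-time bounds how long curl waits for an answer.
  curl -fsS --max-time 5 \
    "http://${host}:${port}/solr/admin/cores?action=STATUS&wt=json" \
    > /dev/null
}
\end{minted}

For instance, \texttt{solr\_is\_up \&\& echo running} prints \texttt{running}
when the instance is reachable; under the same default-port assumption, the
\textit{Solr} admin UI should then be available at
\url{http://localhost:8983/solr/}.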

\subsection{UI installation}
To start the UI, open the file \texttt{ui/index.html} with your browser of
choice. To use the UI, it is necessary to bypass the browser's Cross-Origin
Resource Sharing (CORS) security checks by installing and enabling a ``CORS
Everywhere'' extension. I suggest
\href{https://addons.mozilla.org/en-US/firefox/addon/cors-everywhere/}{this one}
for Mozilla Firefox and its derivatives.
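
The page can also be opened from a shell in the repository root. The sketch
below is a hypothetical convenience function, \texttt{open\_ui}, that prefers
Firefox (the browser for which the extension above is suggested) and otherwise
falls back to the platform's default opener, assuming \texttt{open} on macOS and
\texttt{xdg-open} on other Unix-like systems.

\begin{minted}[frame=lines,framesep=2mm]{bash}
# Hypothetical helper: open the search UI from the repository root.
open_ui() {
  local page="${1:-ui/index.html}"
  if [ ! -f "$page" ]; then
    echo "error: $page not found; run this from the repository root" >&2
    return 1
  fi
  if command -v firefox > /dev/null 2>&1; then
    firefox "$page" &            # background, so the shell stays usable
  elif [ "$(uname -s)" = "Darwin" ]; then
    open "$page"                 # macOS default opener
  else
    xdg-open "$page"             # freedesktop default opener
  fi
}
\end{minted}

Opening the page this way does not change the CORS requirement: the extension
still has to be enabled in the browser.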

\subsection{Run the website scrapers}
A prerequisite to run the Flickr crawler is a working Scrapy Splash instance
listening on \texttt{localhost:8050}. Should a Docker installation be
available, this can be achieved by executing this command:

\begin{minted}[frame=lines,framesep=2mm]{bash}
docker run -p 8050:8050 scrapinghub/splash
\end{minted}
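
Splash may need a few seconds after the container starts before it accepts
requests. A minimal sketch for waiting until it is ready is shown below; it
assumes Splash's \texttt{/\_ping} endpoint and \texttt{curl}, and the function
name \texttt{wait\_for\_url} is hypothetical. The commented usage runs the
container in the background with \texttt{-d} and removes it on exit with
\texttt{-{}-rm}.

\begin{minted}[frame=lines,framesep=2mm]{bash}
# Hypothetical helper: poll a URL until it answers, sleeping one second
# between attempts; give up after the given number of failed attempts
# (default 30) and return 1.
wait_for_url() {
  local url="$1" attempts="${2:-30}" i=0
  while [ "$i" -lt "$attempts" ]; do
    curl -fsS --max-time 2 "$url" > /dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep 1
  done
  echo "error: $url not reachable after $attempts attempts" >&2
  return 1
}

# Usage (commented out so that loading this snippet has no side effects):
# docker run -d --rm -p 8050:8050 scrapinghub/splash
# wait_for_url http://localhost:8050/_ping 60 && ./scrape.sh
\end{minted}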

To run all the website scrapers, execute the script \texttt{./scrape.sh} with
no arguments.
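
Since the scrapers rely on external tools, it can help to check that they are
installed before starting a long crawl. The helper below is a hypothetical
sketch, not part of \texttt{scrape.sh}, that reports which of the given commands
are not on the \texttt{PATH}; the tool list in the usage comment
(\texttt{scrapy}, \texttt{docker}, \texttt{curl}) is an assumption about what
the scraping step needs.

\begin{minted}[frame=lines,framesep=2mm]{bash}
# Hypothetical helper: report on stderr every argument that is not an
# available command; return 1 if at least one is missing, 0 otherwise.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (commented out; the tool list is an assumption):
# require_cmds scrapy docker curl && ./scrape.sh
\end{minted}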

\section{Scraping}

\section{Indexing and \textit{Solr} configuration}

\section{User interface}

\section{User evaluation}
\end{document}