IRProject/report/report.tex

% vim: set ts=2 sw=2 et tw=80:

\documentclass{scrartcl}
\usepackage{hyperref}
\usepackage{parskip}
\usepackage{minted}
\usepackage[utf8]{inputenc}

\setlength{\parindent}{0pt}

\usepackage[margin=2.5cm]{geometry}

\title{\textit{Image Search IR System} \\\vspace{0.3cm}
\Large{WS2020-21 Information Retrieval Project}}
\author{Claudio Maggioni}

\begin{document}
\maketitle
\tableofcontents
\newpage

\section{Introduction}
This report is a summary of the work I have done to create the ``Image Search IR
system'', a proof-of-concept IR system implementation implementing the ``Image
Search Engine'' project (project \#13).

The project is built on a simple
\textit{Scrapy}-\textit{Solr}-\textit{HTML5+CSS+JS} stack. Installation
instructions, an in-depth look to the project components for scraping, indexing,
and displaying the results, and finally the user evaluation report, can all be
found in the following sections.

\section{Installation instructions}

\subsection{Project repository}
The project Git repository is located here:
\url{https://git.maggioni.xyz/maggicl/IRProject}.

\subsection{Solr installation}
The installation of the project and population of the test collection with the
scraped documents is automated by a single script. The script requires you have
downloaded \textit{Solr} version 8.6.2. as a ZIP file, i.e.\  the same
\textit{Solr} ZIP we had to download during lab lectures. Should you need to
download a copy of the ZIP file, you can find it here (on USI's onedrive
hosting): \url{http://to-do.com/file}.

Clone the project's git repository and position yourself with a shell on the
project's root directory. Then execute this command:

% linenos
\begin{minted}[frame=lines,framesep=2mm]{bash}
./solr_install.sh {ZIP path}
\end{minted}

where \texttt{<ZIP path>} is the path of the ZIP file mentioned earlier. This
will install, start, and update \textit{Solr} with the test collection.

\subsection{UI installation}
In order to start the UI, open with your browser of choice the file
\texttt{ui/index.html}. In order to use the UI, it is necessary to bypass
\texttt{Cross Origin Resource Sharing} security checks by downloading and
enabling a ``CORS everywhere'' extension. I suggest
\href{https://addons.mozilla.org/en-US/firefox/addon/cors-everywhere/}{this one} for
Mozilla Firefox and derivatives.

\subsection{Run the website scrapers}
A prerequisite to run the Flickr crawler is to have a working Scrapy Splash
instance listening on port \texttt{localhost:8050}. This can be achieved by
executing this Docker command, should a Docker installation be available:

\begin{minted}[frame=lines,framesep=2mm]{bash}
docker run -p 8050:8050 scrapinghub/scrapy
\end{minted}

In order to all the website scrapers, run the script \texttt{./scrape.sh} with
no arguments.

\section{Scraping}

\section{Indexing and \textit{Solr} configuration}

\section{User interface}

\section{User evaluation}
\end{document}