% vim: set ts=2 sw=2 et tw=80:
\documentclass{scrartcl}
\usepackage{hyperref}
\usepackage{parskip}
\usepackage{minted}
\usepackage[utf8]{inputenc}
\setlength{\parindent}{0pt}
\usepackage[margin=2.5cm]{geometry}

\title{\textit{Image Search IR System} \\\vspace{0.3cm}
  \Large{WS2020-21 Information Retrieval Project}}
\author{Claudio Maggioni}

\begin{document}
\maketitle
\tableofcontents
\newpage

\section{Introduction}

This report summarizes the work done to create the ``Image Search IR
System'', a proof-of-concept IR system implementing the ``Image Search
Engine'' project (project \#13). The project is built on a simple
\textit{Scrapy}--\textit{Solr}--\textit{HTML5+CSS+JS} stack. The following
sections cover the installation instructions, an in-depth look at the
project components for scraping, indexing, and displaying the results, and
finally the user evaluation report.

\section{Installation instructions}

\subsection{Project repository}

The project Git repository is located at
\url{https://git.maggioni.xyz/maggicl/IRProject}.

\subsection{Solr installation}

The installation of the project and the population of the test collection
with the scraped documents are automated by a single script. The script
requires you to have downloaded \textit{Solr} version 8.6.2 as a ZIP file,
i.e.\ the same \textit{Solr} ZIP we downloaded during the lab lectures.
Should you need to download a copy of the ZIP file, you can find it at
\url{https://maggioni.xyz/solr-8.6.2.zip}.

Clone the project's Git repository and position yourself with a shell in the
project's root directory. Then execute this command:

\begin{minted}[frame=lines,framesep=2mm]{bash}
./solr_install.sh {ZIP path}
\end{minted}

where \texttt{\{ZIP path\}} is the path of the ZIP file mentioned earlier.
This will install, start, and update \textit{Solr} with the test collection.
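Once \textit{Solr} is running, it can be queried over plain HTTP, which is
also how the UI retrieves its results. The following \textit{Python} sketch
shows how such a query URL can be assembled; note that the collection name
\texttt{images} and the default \textit{Solr} port \texttt{8983} are
assumptions for illustration, not values taken from the install script:

\begin{minted}[frame=lines,framesep=2mm]{python}
# Hypothetical sketch: build the URL of a basic Solr /select query.
# The collection name "images" and the default Solr port 8983 are
# assumptions, not values taken from the project's install script.
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"

def solr_select_url(collection, query, rows=10):
    """Return the URL of a Solr /select query that returns JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return "{}/{}/select?{}".format(SOLR_BASE, collection, params)

print(solr_select_url("images", "sunset"))
# http://localhost:8983/solr/images/select?q=sunset&rows=10&wt=json
\end{minted}

Opening the printed URL in a browser should return a JSON document listing
the matching results, which is a quick way to verify the installation.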
\subsection{UI installation}

To start the UI, open the file \texttt{ui/index.html} with your browser of
choice. In order to use the UI, it is necessary to bypass Cross-Origin
Resource Sharing (CORS) security checks by downloading and enabling a
``CORS Everywhere'' extension. I suggest
\href{https://addons.mozilla.org/en-US/firefox/addon/cors-everywhere/}{this
one} for Mozilla Firefox and its derivatives.

\subsection{Run the website scrapers}

A prerequisite to running the Flickr crawler is a working \textit{Scrapy
Splash} instance listening on \texttt{localhost:8050}. Should a Docker
installation be available, this can be achieved by executing this command:

\begin{minted}[frame=lines,framesep=2mm]{bash}
docker run -p 8050:8050 scrapinghub/splash
\end{minted}

To run all the website scrapers, execute the script \texttt{./scrape.sh}
with no arguments.

\section{Scraping}

\section{Indexing and \textit{Solr} configuration}

\section{User interface}

\section{User evaluation}

\end{document}