Image Search IR System

WS2020-21 Information Retrieval Project

Claudio Maggioni

Introduction

This report summarizes the work I have done to create the "Image Search IR system", a proof-of-concept IR system implementing the "Image Search Engine" project (project #13). The project is built on a simple Scrapy-Solr-HTML5+CSS+JS stack. The following sections contain the installation instructions, an in-depth look at the project components for scraping, indexing, and displaying the results, and finally the user evaluation report.

Installation instructions

Project repository

The project Git repository is located here: https://git.maggioni.xyz/maggicl/IRProject.

Solr installation

The installation of the project and the population of the test collection with the scraped documents are automated by a single script. The script requires that you have downloaded Solr version 8.6.2 as a ZIP file, i.e. the same Solr ZIP file we had to download during the lab lectures. Should you need a copy of the ZIP file, you can find it here: https://maggioni.xyz/solr-8.6.2.zip. Clone the project's Git repository and open a shell in the project's root directory. Then execute this command:

./solr_install.sh {ZIP path}

where {ZIP path} is the path of the ZIP file mentioned earlier. This installs and starts Solr, then populates it with the test collection.
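Once the script has finished, one way to check that Solr is running and the collection is populated is to query the standard /select handler and read the numFound field of the response. The following is a minimal sketch using only the Python standard library. It assumes Solr listens on its default port 8983 and uses a hypothetical core name, irproject; the actual core name is defined by solr_install.sh and is not stated here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr listens on its default port.
SOLR_URL = "http://localhost:8983/solr"
# Hypothetical core name: replace it with the core created by solr_install.sh.
CORE = "irproject"


def select_url(core: str, query: str = "*:*", rows: int = 0) -> str:
    """Build a /select URL; rows=0 asks Solr for the hit count only, no documents."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_URL}/{core}/select?{params}"


def num_found(response_body: str) -> int:
    """Extract the total number of matching documents from a JSON /select response."""
    return json.loads(response_body)["response"]["numFound"]


def count_documents(core: str = CORE) -> int:
    """Query the running Solr instance and return how many documents are indexed."""
    with urlopen(select_url(core), timeout=10) as resp:
        return num_found(resp.read().decode("utf-8"))
```

After a successful installation, count_documents() should return a positive number; a connection error usually means Solr is not running.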

UI installation

To start the UI, open the file ui/index.html with your browser of choice. To use the UI, it is necessary to bypass the browser's Cross-Origin Resource Sharing (CORS) security checks by installing and enabling a "CORS Everywhere" extension. I suggest this one for Mozilla Firefox and derivatives: https://addons.mozilla.org/en-US/firefox/addon/cors-everywhere/
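The extension is needed because the page is opened from a file:// URL while Solr is served from a different origin, so the browser lets the page's JavaScript read Solr's responses only if they carry a matching Access-Control-Allow-Origin header, which Solr presumably does not send in this setup. The Python function below is a simplified model of that check for simple, non-credentialed requests; it illustrates the rule and is not the browser's full algorithm. It assumes that a page opened from file:// has an opaque origin serialized as "null".

```python
from typing import Mapping

# Assumption: a page opened from file:// has an opaque origin, serialized as "null".
FILE_PAGE_ORIGIN = "null"


def cors_allows_read(response_headers: Mapping[str, str], page_origin: str) -> bool:
    """Simplified CORS check for a simple, non-credentialed cross-origin request.

    The browser exposes the response to the page's script only if the
    Access-Control-Allow-Origin header is "*" or exactly matches the page origin.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        return False  # no CORS header at all: the browser blocks the read
    allowed = allowed.strip()
    return allowed == "*" or allowed == page_origin
```

A "CORS Everywhere"-style extension effectively turns the blocked case into an allowed one by adding such a header to the responses the browser receives.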

Run the website scrapers

Running the Flickr crawler requires a working Splash instance (the rendering service used by Scrapy Splash) listening on localhost:8050. If Docker is installed, such an instance can be started with this command:

docker run -p 8050:8050 scrapinghub/splash
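Splash renders pages, including their JavaScript, before returning the HTML. A quick way to check that the container works, independently of Scrapy, is Splash's HTTP API: the render.html endpoint takes the target url and an optional wait time in seconds, and returns the rendered HTML. The following is a minimal stdlib-only sketch, assuming the port mapping of the command above; the Flickr URL used in the usage note is only illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash runs locally on the port mapped by the docker command above.
SPLASH_URL = "http://localhost:8050"


def render_url(target: str, wait: float = 1.0) -> str:
    """Build a Splash render.html request that waits `wait` seconds after page load."""
    return f"{SPLASH_URL}/render.html?" + urlencode({"url": target, "wait": wait})


def render(target: str, wait: float = 1.0, timeout: float = 60.0) -> str:
    """Ask Splash to render `target` (executing its JavaScript) and return the HTML."""
    with urlopen(render_url(target, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, render("https://www.flickr.com/") should return a non-empty HTML string while the container is running.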

To run all the website scrapers, execute the script ./scrape.sh with no arguments.
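The content of scrape.sh is not reproduced in this report. Conceptually, running every spider of a Scrapy project amounts to one scrapy crawl invocation per spider name, and scrapy list prints those names. The following Python sketch is a hypothetical equivalent, not the actual script. It assumes it is run from the Scrapy project directory and that each spider's items are appended to a JSON Lines file named after the spider; this output naming is an assumption.

```python
import subprocess
from typing import List


def list_spiders() -> List[str]:
    """Return the spider names reported by `scrapy list` (one per line)."""
    out = subprocess.run(["scrapy", "list"], check=True, capture_output=True, text=True)
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]


def crawl_command(spider: str) -> List[str]:
    """Build the command that runs one spider; the output file name is hypothetical."""
    # -o appends scraped items to the feed file; the .jl extension selects JSON Lines.
    return ["scrapy", "crawl", spider, "-o", f"{spider}.jl"]


def run_all_spiders() -> None:
    """Run every spider of the project sequentially, stopping at the first failure."""
    for spider in list_spiders():
        subprocess.run(crawl_command(spider), check=True)
```

Running the spiders one after the other keeps the load on each website and on the local Splash instance low, at the cost of a longer total scraping time.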

Scraping

Indexing and Solr configuration

User interface

User evaluation