AICup/Lectures/Student_lecture 1.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## First Lab\n",
    "\n",
    "What we are going to do today:\n",
    "- read TSP data\n",
    "- define a Euclidean distance function\n",
    "- define a `ProblemInstance` Python class\n",
    "- store the nodes in an instance of that class\n",
    "- plot the raw data\n",
    "- generate a naive solution\n",
    "- check that the solution is valid\n",
    "- evaluate the solution\n",
    "\n",
    "NOTE: all the code that you will have to fill in is marked with a `# TODO` comment\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell below imports the libraries we will need later"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob\n",
    "import numpy as np\n",
    "from matplotlib import pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read TSP data\n",
    "In this Cup you will have to deal with a predefined set of problems. These problems are located in the `problems` folder.\n",
    "\n",
    "First, let's list them"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ch130.tsp\n",
      "d198.tsp\n",
      "eil76.tsp\n",
      "fl1577.tsp\n",
      "kroA100.tsp\n",
      "lin318.tsp\n",
      "pcb442.tsp\n",
      "pr439.tsp\n",
      "rat783.tsp\n",
      "u1060.tsp\n"
     ]
    }
   ],
   "source": [
    "problems = glob.glob('../problems/*.tsp')\n",
    "# example_problem = [\"../problems/eil76.tsp\"]\n",
    "for prob in problems:\n",
    "    print(prob[12:])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Checking by hand that all 10 problems are in the folder would be a waste of time, so we can write a line of code to check that they are all there"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "# compare as sets, so that a missing file makes the check fail\n",
    "expected = {'fl1577.tsp', 'pr439.tsp', 'ch130.tsp', 'rat783.tsp', 'd198.tsp', 'kroA100.tsp', 'u1060.tsp', 'lin318.tsp', 'eil76.tsp', 'pcb442.tsp'}\n",
    "print({n[12:] for n in problems} == expected)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### File format\n",
    "All the problems are stored in `.tsp` files (these are actually renamed `.txt` files, so you can open them with your favorite text editor).\n",
    "\n",
    "As we will see in a bit, all the problem files are composed of different sections:\n",
    "* `NAME`: the shortened name of the problem\n",
    "* `COMMENT`: a comment area that can contain the full name of the problem\n",
    "* `TYPE`: this defines the type of problem at hand, in our case it is always TSP\n",
    "* `DIMENSION`: this states the problem dimension, i.e. the number of points\n",
    "* `EDGE_WEIGHT_TYPE`: this section states the type of weights applied to the edges; in our case it is always EUC_2D, i.e. the weights are given by the Euclidean distance in 2 dimensions\n",
    "* `BEST_KNOWN`: this states the best known result obtained; note that, as the Prof said, it is unlikely that you will get a better result than this\n",
    "* `NODE_COORD_SECTION`: finally we have the section that lists the triplets that define the problem's points. These triplets are (point_number, x, y).\n",
    "\n",
    "Now that we know all of that, let's print the content of a single problem"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['NAME : eil76', 'COMMENT : 76-city problem (Christofides/Eilon)', 'TYPE : TSP', 'DIMENSION : 76', 'EDGE_WEIGHT_TYPE : EUC_2D', 'BEST_KNOWN : 538', 'NODE_COORD_SECTION', '1 22 22', '2 36 26', '3 21 45', '4 45 35', '5 55 20', '6 33 34', '7 50 50', '8 55 45', '9 26 59', '10 40 66', '11 55 65', '12 35 51', '13 62 35', '14 62 57', '15 62 24', '16 21 36', '17 33 44', '18 9 56', '19 62 48', '20 66 14', '21 44 13', '22 26 13', '23 11 28', '24 7 43', '25 17 64', '26 41 46', '27 55 34', '28 35 16', '29 52 26', '30 43 26', '31 31 76', '32 22 53', '33 26 29', '34 50 40', '35 55 50', '36 54 10', '37 60 15', '38 47 66', '39 30 60', '40 30 50', '41 12 17', '42 15 14', '43 16 19', '44 21 48', '45 50 30', '46 51 42', '47 50 15', '48 48 21', '49 12 38', '50 15 56', '51 29 39', '52 54 38', '53 55 57', '54 67 41', '55 10 70', '56 6 25', '57 65 27', '58 40 60', '59 70 64', '60 64 4', '61 36 6', '62 30 20', '63 20 30', '64 15 5', '65 50 70', '66 57 72', '67 45 42', '68 38 33', '69 50 4', '70 66 8', '71 59 5', '72 35 60', '73 27 24', '74 40 20', '75 40 37', '76 40 40', 'EOF']\n"
     ]
    }
   ],
   "source": [
    "example_problem = \"../problems/eil76.tsp\"\n",
    "with open(example_problem,\"r\") as exprob:\n",
    "    print(exprob.read().splitlines())"
   ]
  },
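  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The header layout described in the file format section can also be parsed by key rather than by line position. The following is only a minimal sketch, not the approach used in the rest of this lab: `parse_tsp_header` and `sample` are hypothetical names introduced for illustration, and the sketch assumes that every header line has the form `KEY : value`.\n",
    "\n",
    "```python\n",
    "def parse_tsp_header(lines):\n",
    "    # Collect 'KEY : value' pairs until the coordinate section starts.\n",
    "    header = {}\n",
    "    for line in lines:\n",
    "        if line.strip() == 'NODE_COORD_SECTION':\n",
    "            break\n",
    "        # partition splits on the first ':' only, so values may contain colons\n",
    "        key, _, value = line.partition(':')\n",
    "        header[key.strip()] = value.strip()\n",
    "    return header\n",
    "\n",
    "# A shortened, hand-written sample in the same layout as eil76.tsp\n",
    "sample = ['NAME : eil76', 'TYPE : TSP', 'DIMENSION : 76', 'BEST_KNOWN : 538', 'NODE_COORD_SECTION', '1 22 22']\n",
    "header = parse_tsp_header(sample)\n",
    "print(header['NAME'], int(header['DIMENSION']), float(header['BEST_KNOWN']))\n",
    "```\n",
    "\n",
    "Looking fields up by name keeps working even if the header lines appear in a different order."
   ]
  },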
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Euclidean Distance\n",
    "Since all of our problems use the Euclidean distance between points as edge weights, we will now define a function that computes it.\n",
    "This function will also be used to build the distance matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def distance_euc(point_i, point_j): # TODO\n",
    "    rounding = 0\n",
    "    x_i = point_i[0]\n",
    "    y_i = point_i[1]\n",
    "    x_j, y_j = point_j[0], point_j[1]\n",
    "    distance = np.sqrt((x_i - x_j) ** 2 + (y_i- y_j) ** 2)\n",
    "    return round(distance, rounding)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's test it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4.0"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "point_1 = (2, 2)\n",
    "point_2 = (5, 5)\n",
    "distance_euc(point_1, point_2)\n",
    "# Expected output is 4.0: sqrt(18) is about 4.24, which becomes 4.0 when rounding to 0 decimals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading and storing the data\n",
    "We will now define a class called `ProblemInstance`.\n",
    "\n",
    "In the constructor of the class (the `__init__()` method of a class in Python) you will have to implement the code for:\n",
    "* reading the raw data\n",
    "* storing the metadata\n",
    "* reading all the points and storing them\n",
    "* coding the method that creates the distance matrix between points\n",
    "* \\[optional\\] checking if the loaded problem has a known optimal solution and, in that case, storing it\n",
    "* \\[optional\\] coding the plotting method\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from src.utils import distance_euc\n",
    "\n",
    "class ProblemInstance:\n",
    "\n",
    "    def __init__(self, name_tsp):\n",
    "        self.exist_opt = False\n",
    "        self.optimal_tour = None\n",
    "        self.dist_matrix = None\n",
    "        \n",
    "        # read raw data  \n",
    "        # TODO\n",
    "        with open(name_tsp) as f_o:\n",
    "            data= f_o.read()\n",
    "            self.lines = data.splitlines()\n",
    "        \n",
    "#         file_object = open(name_tsp)\n",
    "#         data = file_object.read()\n",
    "#         file_object.close()\n",
    "#         self.lines = data.splitlines()\n",
    "\n",
    "        # store metadata set information \n",
    "        # TODO\n",
    "        self.name = self.lines[0].split(' ')[2]\n",
    "        # here we expect the name of the problem\n",
    "        self.nPoints =  np.int(self.lines[3].split(' ')[2])\n",
    "        self.best_sol = np.float(self.lines[5].split(' ')[2])\n",
    "            # here the lenght of the best solution\n",
    "        \n",
    "        # read all data points and store them \n",
    "        # TODO\n",
    "        self.points = np.zeros((self.nPoints, 3)) # this is the structure where we will store the pts data \n",
    "        for i in range(self.nPoints):\n",
    "            line_i = self.line[7 + i].split(' ')\n",
    "            self.points[i, 0] = int(line_i[0])\n",
    "            self.points[i, 1] = line_i[1]\n",
    "            self.points[i, 2] = line_i[2]\n",
    "        \n",
    "        self.create_dist_matrix()\n",
    "        \n",
    "        # TODO [optional]\n",
    "        # if the problem is one with a optimal solution, that solution is loaded\n",
    "        if name_tsp in [\"../problems/eil76.tsp\", \"../problems/kroA100.tsp\"]:\n",
    "            self.exist_opt = True\n",
    "            file_object = open(name_tsp.replace(\".tsp\", \".opt.tour\"))\n",
    "            data = file_object.read()\n",
    "            file_object.close()\n",
    "            lines = data.splitlines()\n",
    "\n",
    "            # read all data points and store them\n",
    "            self.optimal_tour = np.zeros(self.nPoints, dtype=np.int)\n",
    "            for i in range(self.nPoints):\n",
    "                line_i = lines[5 + i].split(' ')\n",
    "                self.optimal_tour[i] = int(line_i[0]) - 1\n",
    "\n",
    "    def print_info(self):\n",
    "        print(\"\\n#############################\\n\")\n",
    "        print('name: ' + self.name)\n",
    "        print('nPoints: ' + str(self.nPoints))\n",
    "        print('best_sol: ' + str(self.best_sol))\n",
    "        print('exist optimal: ' + str(self.exist_opt))\n",
    "\n",
    "    def plot_data(self,show_numbers=False): # todo [optional]\n",
    "        plt.figure(figsize=(8, 8))\n",
    "        plt.title(self.name)\n",
    "        plt.scatter(self.points[:, 1], self.points[:, 2])\n",
    "        if show_numbers:\n",
    "            for i, txt in enumerate(np.arange(self.nPoints)):  # tour_found[:-1]\n",
    "                plt.annotate(txt, (self.points[i, 1], self.points[i, 2]))\n",
    "        plt.show()\n",
    "\n",
    "    def create_dist_matrix(self): # TODO\n",
    "        self.dist_matrix = np.zeros((self.nPoints, self.nPoints))\n",
    "        \n",
    "        for i in range(self.nPoints):\n",
    "            for j in range(i, self.nPoints):\n",
    "                self.dist_matrix[i, j] = distance_euc(self.points[i][1:3], self.points[j][1:3])\n",
    "        self.dist_matrix += self.dist_matrix.T\n"
   ]
  },
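  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The double loop in `create_dist_matrix` can also be written with NumPy broadcasting. This is a minimal sketch under stated assumptions, not the implementation expected in this lab: `distance_matrix` and `coords` are hypothetical names, the coordinates are assumed to be available as an `(n, 2)` array, and the rounding mirrors `distance_euc` with `rounding = 0`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def distance_matrix(coords):\n",
    "    # coords has shape (n, 2); broadcasting yields all pairwise differences, shape (n, n, 2)\n",
    "    diff = coords[:, None, :] - coords[None, :, :]\n",
    "    # Euclidean norm over the last axis, rounded to 0 decimals\n",
    "    return np.round(np.sqrt((diff ** 2).sum(axis=-1)))\n",
    "\n",
    "# Three collinear points forming 3-4-5 triangles with the axes\n",
    "coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])\n",
    "print(distance_matrix(coords))\n",
    "```\n",
    "\n",
    "The result is symmetric with a zero diagonal, like the matrix built by the loop version."
   ]
  },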
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------------------------\n",
    "Now we can test our Class with an example problem"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "example_problem = \"../problems/eil76.tsp\"\n",
    "p_inst=ProblemInstance(example_problem)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p_inst.print_info()\n",
    "p_inst.plot_data()\n",
    "#Expected output\n",
    "\"\"\"\n",
    "#############################\n",
    "\n",
    "name: eil76\n",
    "nPoints: 76\n",
    "best_sol: 538.0\n",
    "exist optimal: True\n",
    "\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "p_inst.plot_data(show_numbers=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-------------\n",
    "### Random solver \n",
    "Now we will code the random solver and test it with a class called `SolverTSP`, which takes the name of a solver and a problem instance and acts as a framework that computes the solution and gives us some additional information.\n",
    "We will also need to code the `evaluate_solution` method of the `SolverTSP` class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_method(instance_): # TODO\n",
    "    return solution\n",
    "available_methods = {\"random\": random_method} # this is here because the SolverTSP will check for the available methods"
   ]
  },
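  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to build a random solution is to draw a random permutation of the node indices. This is only a sketch: `random_tour` and its `seed` parameter are hypothetical names, and the function takes the number of points directly instead of a `ProblemInstance`. The start node is not repeated at the end, because the validity check in `SolverTSP` requires every node to appear exactly once.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def random_tour(n_points, seed=None):\n",
    "    # A random permutation visits every node index 0 .. n_points - 1 exactly once\n",
    "    rng = np.random.default_rng(seed)\n",
    "    return rng.permutation(n_points)\n",
    "\n",
    "tour = random_tour(5, seed=0)\n",
    "print(tour)\n",
    "```\n",
    "\n",
    "Passing a fixed `seed` makes the tour reproducible between runs, which helps when comparing solvers."
   ]
  },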
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from time import time as t\n",
    "\n",
    "class SolverTSP:\n",
    "    def __init__(self, algorithm_name, problem_instance):\n",
    "        self.duration = np.inf\n",
    "        self.found_length = np.inf\n",
    "        self.algorithm_name = algorithm_name\n",
    "        self.name_method = \"initialized with \" + algorithm_name\n",
    "        self.solved = False\n",
    "        self.problem_instance = problem_instance\n",
    "        self.solution = None\n",
    "\n",
    "    def compute_solution(self, verbose=True, return_value=True):\n",
    "        self.solved = False\n",
    "        if verbose:\n",
    "            print(f\"###  solving with {self.algorithm_name}  ####\")\n",
    "        start_time = t()\n",
    "        self.solution = available_methods[self.algorithm_name](self.problem_instance)\n",
    "        assert self.check_if_solution_is_valid(self.solution), \"Error the solution is not valid\"\n",
    "        end_time = t()\n",
    "        self.duration = np.around(end_time - start_time, 3)\n",
    "        if verbose:\n",
    "            print(f\"###  solved  ####\")\n",
    "        self.solved = True\n",
    "        self.evaluate_solution()\n",
    "        self._gap()\n",
    "        if return_value:\n",
    "            return self.solution\n",
    "\n",
    "    def plot_solution(self):\n",
    "        assert self.solved, \"You can't plot the solution, you need to compute it first!\"\n",
    "        plt.figure(figsize=(8, 8))\n",
    "        self._gap()\n",
    "        plt.title(f\"{self.problem_instance.name} solved with {self.name_method} solver, gap {self.gap}\")\n",
    "        ordered_points = self.problem_instance.points[self.solution]\n",
    "        plt.plot(ordered_points[:, 1], ordered_points[:, 2], 'b-')\n",
    "        plt.show()\n",
    "\n",
    "    def check_if_solution_is_valid(self, solution):\n",
    "        rights_values = np.sum([self.check_validation(i, solution) for i in np.arange(self.problem_instance.nPoints)])\n",
    "        if  rights_values == self.problem_instance.nPoints:\n",
    "            return True\n",
    "        else:\n",
    "            return False \n",
    "    def check_validation(self, node , solution):\n",
    "         if np.sum(solution == node) == 1:\n",
    "            return 1\n",
    "         else:\n",
    "            return 0\n",
    "\n",
    "    def evaluate_solution(self, return_value=False):\n",
    "        total_length = 0\n",
    "        from_node = self.solution[0] # starting_node\n",
    "        # TODO\n",
    "        # [...] compute total_lenght of the solution \n",
    "        self.found_length = total_length\n",
    "        if return_value:\n",
    "            return total_length\n",
    "\n",
    "    def _gap(self):\n",
    "        self.evaluate_solution(return_value=False)\n",
    "        self.gap = np.round(\n",
    "            ((self.found_length - self.problem_instance.best_sol) / self.problem_instance.best_sol) * 100, 2)\n"
   ]
  },
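  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The length of a tour and its validity can be illustrated on a tiny hand-made distance matrix. This is a hedged sketch rather than the expected solution of the exercise: `tour_length`, `is_valid_tour` and `dist` are hypothetical names, and the sketch assumes that the tour lists each node index once and that the salesman returns to the starting node at the end.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def tour_length(tour, dist_matrix):\n",
    "    # Sum every edge tour[k] -> tour[k + 1]; np.roll pairs the last node with the first one\n",
    "    tour = np.asarray(tour)\n",
    "    return dist_matrix[tour, np.roll(tour, -1)].sum()\n",
    "\n",
    "def is_valid_tour(tour, n_points):\n",
    "    # Valid when the tour is a permutation of 0 .. n_points - 1\n",
    "    return np.array_equal(np.sort(tour), np.arange(n_points))\n",
    "\n",
    "dist = np.array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])\n",
    "print(tour_length([0, 1, 2], dist))  # 1 + 3 + 2 = 6\n",
    "print(is_valid_tour([2, 0, 1], 3))\n",
    "```\n",
    "\n",
    "Including the closing edge matters: without it, the length of a cyclic tour would depend on which node is listed first."
   ]
  },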
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----------------------------\n",
    "Now we will test our code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "solver_name=\"random\"\n",
    "# here I'm repeating these two lines just to remind you which problem we are using\n",
    "example_problem = \"../problems/eil76.tsp\"\n",
    "p_inst = ProblemInstance(example_problem)\n",
    "\n",
    "# TODO\n",
    "# create an instance of SolverTSP\n",
    "# compute a solution\n",
    "# print the information shown in the expected output below\n",
    "# plot the solution\n",
    "\n",
    "# this is the expected output, followed by the plot of the solution\n",
    "\"\"\"\n",
    "###  solving with random  ####\n",
    "###  solved  ####\n",
    "the total length for the solution found is 2424.0\n",
    "while the optimal length is 538.0\n",
    "the gap is 350.56%\n",
    "the solution is found in 0.0 seconds\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "--------------------\n",
    "Finally, since our example problem has a known optimal solution, we can plot it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "solver = SolverTSP(\"optimal\", p_inst)\n",
    "solver.solved = True\n",
    "solver.solution = np.concatenate([p_inst.optimal_tour, [p_inst.optimal_tour[0]]])\n",
    "solver.plot_solution()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "PyCharm (AI2020BsC)",
   "language": "python",
   "name": "pycharm-61970693"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}