mp1: done 2 and 4

Claudio Maggioni 2020-09-20 17:10:30 +02:00
parent db29a75788
commit d24c1f252d
19 changed files with 271 additions and 1 deletions

22
mp1/files_data/ex2.m Normal file

@ -0,0 +1,22 @@
%[U,G] = surfer('https://www.usi.ch',500);
% pagerank(U,G);
% Transition matrix for p = 0.85, i.e. delta = (1-p)/6 = 1/40 (kept for reference):
% A = (1/40) * [1 1 1 35 1 1;
% 18 1 1 1 1 1;
% 18 18 1 1 1 1;
% 1 18 35 1 1 1;
% 1 1 1 1 1 35;
% 1 1 1 1 35 1];
% Limit case p = 1: delta = 0, so A reduces to G*D.
A = (1/40) * [
0 0 0 40 0 0;
20 0 0 0 0 0;
20 20 0 0 0 0;
0 20 40 0 0 0;
0 0 0 0 0 40;
0 0 0 0 40 0];
% Columns of v are the eigenvectors; the diagonal of d holds the matching eigenvalues.
[v,d] = eig(A);
display(d);
display(v);

BIN
mp1/files_data/run1.mat Normal file

Binary file not shown.

BIN
mp1/files_data/run1rank.fig Normal file

Binary file not shown.


@ -0,0 +1,32 @@
page-rank in out url
360 0.0869 31 1 https://creativecommons.org/licenses/by-sa/3.0
204 0.0406 8 1 https://forum.gitlab.com
82 0.0189 117 18 https://www.mediawiki.org
81 0.0188 117 4 https://wikimediafoundation.org
87 0.0150 6 1 https://docs.gitea.io
78 0.0145 114 9 https://www.mediawiki.org/wiki/Special:MyLanguage/How_to_contribute
77 0.0132 77 13 https://foundation.wikimedia.org/wiki/Privacy_policy
217 0.0127 40 8 https://bugs.archlinux.org
80 0.0115 107 6 https://foundation.wikimedia.org/wiki/Cookie_statement
215 0.0114 38 5 https://bbs.archlinux.org
216 0.0114 38 8 https://wiki.archlinux.org
218 0.0114 38 6 https://security.archlinux.org
428 0.0107 9 1 https://www.dnb.de/kataloghilfe
219 0.0102 38 7 https://aur.archlinux.org
359 0.0098 9 1 https://creativecommons.org/publicdomain/zero/1.0
366 0.0092 27 21 https://archive.org
446 0.0089 24 5 https://foundation.wikimedia.org/wiki/Terms_of_Use
83 0.0079 78 0 https:\/\/schema.org
85 0.0074 77 0 https:\/\/www.wikimedia.org\/static\/images\/wmf-hor-googpub.png
181 0.0066 8 2 https://gitlab.com
95 0.0062 2 1 https://www.enable-javascript.com
113 0.0061 13 1 https://www.britannica.com/topic/polenta
417 0.0058 8 1 https://www.dnb.de/DE/Home/home_node.html
429 0.0058 8 1 https://www.dnb.de/EN/Home/home_node.html
432 0.0058 8 1 https://www.dnb.de/expertensuche
379 0.0057 24 2 https://blog.archive.org
213 0.0057 32 7 https://www.archlinux.org
99 0.0056 3 1 https://www.usi.ch/it
19 0.0051 4 1 https://creativecommons.org/licenses/by-nc-sa/4.0
214 0.0050 31 5 https://www.archlinux.org/packages
220 0.0050 31 6 https://www.archlinux.org/download

BIN
mp1/files_data/run2.mat Normal file

Binary file not shown.

BIN
mp1/files_data/run2rank.fig Normal file

Binary file not shown.


@ -0,0 +1,23 @@
page-rank in out url
411 0.0249 42 1 https://twitter.com/mozilla
63 0.0248 145 1 https://twitter.com/firefox
68 0.0203 142 1 https://www.instagram.com/firefox
412 0.0164 37 1 https://www.instagram.com/mozilla
62 0.0080 21 1 https://github.com/mozilla/kitsune
81 0.0070 110 2 https://www.apple.com
384 0.0064 5 1 https://www.xfinity.com/privacy/policy/dns
4 0.0064 32 0 https:
377 0.0059 19 1 https://abouthome-snippets-service.readthedocs.io/en/latest/data_collection.html 1
393 0.0059 19 1 https://www.adjust.com/terms/privacy-policy
410 0.0057 16 1 https://wiki.mozilla.org/Firefox/Data_Collection
400 0.0057 15 1 https://yandex.ru/legal/confidential
396 0.0057 15 1 https://github.com/mozilla-mobile/firefox-ios/blob/master/Docs/MMA.md
5 0.0056 31 0 https://ssl
3 0.0054 36 0 https://www.iisbadoni.edu.it/sites/default/files/favicon.ico
6 0.0054 36 0 https://www.iisbadoni.edu.it/sites/default/files/logo.png
208 0.0054 159 0 https://schema.org
74 0.0052 178 5 https://foundation.mozilla.org
72 0.0052 33 32 https://www.mozilla.org/privacy/websites/#cookies
23 0.0051 2 1 https://www.iisbadoni.edu.it/mad
300 0.0051 157 0 https://accounts.firefox.com

BIN
mp1/files_data/run3.mat Normal file

Binary file not shown.

BIN
mp1/files_data/run3rank.fig Normal file

Binary file not shown.


@ -0,0 +1,43 @@
page-rank in out url
55 0.0741 354 1 https://www.instagram.com/usiuniversity
53 0.0324 366 3 https://www.facebook.com/usiuniversity
299 0.0248 6 1 https://twitter.com/usi_en
329 0.0243 8 1 https://www.facebook.com/USIeLab
308 0.0156 7 3 https://www.facebook.com/USIFinancialCommunication
60 0.0155 316 2 https://www.swissuniversities.ch
424 0.0144 96 1 https://it.bul.sbu.usi.ch
330 0.0123 6 4 https://www.facebook.com/USI.ITDxC
320 0.0122 7 1 https://www.facebook.com/usiimeg
56 0.0107 320 0 https://www.youtube.com/usiuniversity
5 0.0096 317 71 https://usi.ch
62 0.0090 319 18 https://search.usi.ch
337 0.0087 7 1 https://twitter.com/usisoftware
63 0.0080 303 19 https://desk.usi.ch
130 0.0077 25 0 https://www.swissuniversities.ch/it
54 0.0072 208 0 https://twitter.com/USI_university
323 0.0066 9 5 https://www.facebook.com/usiorientamento
150 0.0062 12 1 https://www.innosuisse.ch/inno/it/home.html
248 0.0061 10 1 https://www.facebook.com/usimdfc
106 0.0060 132 8 https://newsletter.usi.ch/archive/en
135 0.0057 201 0 https://schema.org
326 0.0057 6 1 https://www.facebook.com/usialloggimendrisio
322 0.0055 6 1 https://www.facebook.com/USImem
366 0.0054 6 1 https://www.instagram.com/usi_ics_lugano
212 0.0054 12 3 https://www.facebook.com/usimt
7 0.0051 211 32 https://search.usi.ch/it
6 0.0051 204 0 https://www.usi.ch/sites/all/themes/usiclean/img/bollino-usi.svg
14 0.0051 204 62 https://www.usi.ch/originalnode/342
15 0.0051 204 57 https://www.usi.ch/originalnode/358
16 0.0051 204 62 https://www.usi.ch/originalnode/343
17 0.0051 204 57 https://www.usi.ch/originalnode/344
18 0.0051 204 58 https://www.usi.ch/en/originalnode/12174
20 0.0051 204 60 https://www.usi.ch/originalnode/349
21 0.0051 204 62 https://www.usi.ch/originalnode/8996
22 0.0051 204 60 https://www.usi.ch/originalnode/348
23 0.0051 204 59 https://www.usi.ch/originalnode/351
24 0.0051 204 58 https://www.usi.ch/originalnode/350
25 0.0051 204 61 https://www.usi.ch/originalnode/353
27 0.0051 204 59 https://www.usi.ch/originalnode/8014
26 0.0051 204 58 https://www.usi.ch/en/originalnode/354
61 0.0051 204 0 https://www.usi.ch/sites/all/themes/usiclean/img/swissuniversities.svg
57 0.0050 188 9 https://newsletter.usi.ch/archive

BIN
mp1/files_data/run3spy.fig Normal file

Binary file not shown.

BIN
mp1/run1rank.pdf Normal file

Binary file not shown.

BIN
mp1/run1spy.pdf Normal file

Binary file not shown.

BIN
mp1/run2rank.pdf Normal file

Binary file not shown.

BIN
mp1/run2spy.pdf Normal file

Binary file not shown.

BIN
mp1/run3rank.pdf Normal file

Binary file not shown.

BIN
mp1/run3spy.pdf Normal file

Binary file not shown.

Binary file not shown.


@ -1,5 +1,7 @@
\documentclass[unicode,11pt,a4paper,oneside,numbers=endperiod,openany]{scrartcl}
\usepackage{graphicx}
\usepackage{subcaption}
\usepackage{amsmath}
\input{assignment.sty}
\begin{document}
@ -57,10 +59,158 @@ is the corresponding eigenvalue, while if $x$ is an eigenvector approximation, f
\subsection{Other webgraphs [10 points]}
The provided PageRank MATLAB implementation was run three times, starting from the websites \texttt{http://atelier.inf.usi.ch/\textasciitilde{}maggicl}, \texttt{https://www.iisbadoni.edu.it}, and \texttt{https://www.usi.ch}. The results are shown in Figure~\ref{fig:run1}, Figure~\ref{fig:run2} and Figure~\ref{fig:run3} respectively.
One pattern that emerges in the spy plots of the first and third executions is the presence of 1s on the main diagonal. This indicates that several of the pages found link to themselves. Another interesting pattern, observable in all executions, is the presence of contiguous rectangular regions filled with 1s, especially along the main diagonal. This is likely due to pages belonging to the same website, which share a common layout and therefore link to a common set of internal pages (when the region is close to the main diagonal) or external pages.
\begin{figure}[h]
\centering
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{run1spy}
\caption{Spy plot of connectivity matrix}
\end{subfigure}
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{run1rank}
\caption{PageRank bar graph}
\end{subfigure}
\begin{subfigure}{\textwidth}
\begin{verbatim}
360 0.0869 31 1 https://creativecommons.org/licenses/by-sa/3.0
204 0.0406 8 1 https://forum.gitlab.com
82 0.0189 117 18 https://www.mediawiki.org
81 0.0188 117 4 https://wikimediafoundation.org
87 0.0150 6 1 https://docs.gitea.io
78 0.0145 114 9 https://www.mediawiki.org/wiki/Special:MyLanguage/
How_to_contribute
77 0.0132 77 13 https://foundation.wikimedia.org/wiki/Privacy_policy
217 0.0127 40 8 https://bugs.archlinux.org
80 0.0115 107 6 https://foundation.wikimedia.org/wiki/Cookie_statement
215 0.0114 38 5 https://bbs.archlinux.org
\end{verbatim}
\caption{Top 10 webpages with highest PageRank}
\end{subfigure}
\caption{Results of first PageRank calculation (for starting website \texttt{http://atelier.inf.usi.ch/\textasciitilde{}maggicl/})}
\label{fig:run1}
\end{figure}
\begin{figure}[h]
\centering
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{run2spy}
\caption{Spy plot of connectivity matrix}
\end{subfigure}
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{run2rank}
\caption{PageRank bar graph}
\end{subfigure}
\begin{subfigure}{\textwidth}
\begin{verbatim}
411 0.0249 42 1 https://twitter.com/mozilla
63 0.0248 145 1 https://twitter.com/firefox
68 0.0203 142 1 https://www.instagram.com/firefox
412 0.0164 37 1 https://www.instagram.com/mozilla
62 0.0080 21 1 https://github.com/mozilla/kitsune
81 0.0070 110 2 https://www.apple.com
384 0.0064 5 1 https://www.xfinity.com/privacy/policy/dns
4 0.0064 32 0 https:
377 0.0059 19 1 https://abouthome-snippets-service.readthedocs.io/en/
latest/data_collection.html
393 0.0059 19 1 https://www.adjust.com/terms/privacy-policy
410 0.0057 16 1 https://wiki.mozilla.org/Firefox/Data_Collection
\end{verbatim}
\caption{Top 10 webpages with highest PageRank}
\end{subfigure}
\caption{Results of second PageRank calculation (for starting website \texttt{https://www.iisbadoni.edu.it/})}
\label{fig:run2}
\end{figure}
\begin{figure}[h]
\centering
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{run3spy}
\caption{Spy plot of connectivity matrix}
\end{subfigure}
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{run3rank}
\caption{PageRank bar graph}
\end{subfigure}
\begin{subfigure}{\textwidth}
\begin{verbatim}
55 0.0741 354 1 https://www.instagram.com/usiuniversity
53 0.0324 366 3 https://www.facebook.com/usiuniversity
299 0.0248 6 1 https://twitter.com/usi_en
329 0.0243 8 1 https://www.facebook.com/USIeLab
308 0.0156 7 3 https://www.facebook.com/USIFinancialCommunication
60 0.0155 316 2 https://www.swissuniversities.ch
424 0.0144 96 1 https://it.bul.sbu.usi.ch
330 0.0123 6 4 https://www.facebook.com/USI.ITDxC
320 0.0122 7 1 https://www.facebook.com/usiimeg
56 0.0107 320 0 https://www.youtube.com/usiuniversity
\end{verbatim}
\caption{Top 10 webpages with highest PageRank}
\end{subfigure}
\caption{Results of third PageRank calculation (for starting website \texttt{https://www.usi.ch/})}
\label{fig:run3}
\end{figure}
\subsection{Connectivity matrix and subcliques [10 points]}
\subsection{Connectivity matrix and disjoint subgraphs [10 points]}
\subsubsection{What is the connectivity matrix G (w.r.t figure 5)?}
The connectivity matrix $G$, with $U$ defined as $\{\text{alpha}, \text{beta}, \text{gamma}, \text{delta}, \text{rho}, \text{sigma}\}$, is:
\[G = \begin{bmatrix}
0&0&0&1&0&0\\
1&0&0&0&0&0\\
1&1&0&0&0&0\\
0&1&1&0&0&0\\
0&0&0&0&0&1\\
0&0&0&0&1&0\\
\end{bmatrix}\]
\subsubsection{What are the PageRanks if the hyperlink transition probability $p$ is the default value 0.85?}
First we compute the matrix $A$, finding:
\[A = \frac1{40} \begin{bmatrix}
1 &1 &1 &35&1 &1 \\
18&1 &1 &1 &1 &1 \\
18&18&1 &1 &1 &1 \\
1 &18&35&1 &1 &1 \\
1 &1 &1 &1 &1 &35 \\
1 &1 &1 &1 &35&1 \\
\end{bmatrix}\]
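The entries above can be checked by hand. The following is a sketch that assumes the standard PageRank formulation, which is not restated in this section: $n = 6$ is the number of pages, $c_j$ is the number of outgoing links of page $j$ (the $j$-th column sum of $G$), $D$ is the diagonal matrix with entries $1/c_j$, $e$ is the vector of all ones and $\delta = (1-p)/n$. Since every page in this graph has at least one outgoing link:
\[A = pGD + \delta e e^T, \qquad \delta = \frac{1 - 0.85}{6} = \frac{1}{40},\]
\[a_{ij} = \begin{cases}
\dfrac{p}{c_j} + \delta & \text{if } g_{ij} = 1,\\[1ex]
\delta & \text{otherwise.}
\end{cases}\]
Here $c = (2, 2, 1, 1, 1, 1)$, so a link in a column with $c_j = 2$ gives $\frac{17}{40} + \frac{1}{40} = \frac{18}{40}$, a link in a column with $c_j = 1$ gives $\frac{34}{40} + \frac{1}{40} = \frac{35}{40}$, and all remaining entries are $\frac{1}{40}$.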
We then compute the eigenvalues and eigenvectors of $A$ with MATLAB. The eigenvector associated with the eigenvalue 1, i.e. the solution of $A x = x$, is:
\[x\approx\begin{bmatrix}
0.4771\\
0.2630\\
0.3747\\
0.4905\\
0.4013\\
0.4013\\
\end{bmatrix}\]
Thus the PageRanks are given by the components of the vector $x$, in the order given in $U$, once $x$ is rescaled so that its components sum to 1.
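The vector $x$ reported above appears to have unit Euclidean norm (its squared components sum to approximately 1), which is the scaling MATLAB's \texttt{eig} returns. Assuming the usual convention that PageRanks form a probability distribution, a rescaling sketch based on the rounded values above is:
\[\sum_{i=1}^{6} x_i \approx 2.4079, \qquad
\frac{x}{\sum_i x_i} \approx \begin{bmatrix}
0.1981\\
0.1092\\
0.1556\\
0.2037\\
0.1667\\
0.1667\\
\end{bmatrix}\]
As a sanity check, rho and sigma only link to each other, so under the same assumed formulation their rank $r$ satisfies $r = p\,r + \delta$, giving $r = \frac{1/40}{0.15} = \frac16 \approx 0.1667$, in agreement with the last two components.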
\subsubsection{Describe what happens with this example to both the definition of PageRank and the computation done by pagerank in the limit $p \to 1$.}
As $p$ approaches 1, the probability that a web user jumps to a random page instead of following a link decreases, so the links between pages carry more weight in the computation of PageRank.
In the computation, increasing $p$ decreases $\delta$ (which represents the probability of a user jumping to a particular page at random), which eventually becomes 0 when $p$ is 1.
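To make the limit concrete for this example, the following sketch assumes $A = pGD + \delta e e^T$ with $\delta = (1-p)/6$ and $D$ the diagonal matrix of reciprocal out-degrees (the standard formulation, not restated in this section). For $p = 1$ the matrix $A$ reduces to $GD$:
\[\lim_{p \to 1} A = GD = \begin{bmatrix}
0&0&0&1&0&0\\
\tfrac12&0&0&0&0&0\\
\tfrac12&\tfrac12&0&0&0&0\\
0&\tfrac12&1&0&0&0\\
0&0&0&0&0&1\\
0&0&0&0&1&0\\
\end{bmatrix}\]
This matrix is block diagonal, because the pages $\{\text{alpha}, \text{beta}, \text{gamma}, \text{delta}\}$ and the pages $\{\text{rho}, \text{sigma}\}$ never link to each other. Each block contributes its own eigenvector for the eigenvalue 1:
\[x^{(1)} = \tfrac{1}{13}\begin{bmatrix}4&2&3&4&0&0\end{bmatrix}^T, \qquad
x^{(2)} = \tfrac{1}{2}\begin{bmatrix}0&0&0&0&1&1\end{bmatrix}^T\]
Every combination $t\,x^{(1)} + (1-t)\,x^{(2)}$ with $0 \le t \le 1$ satisfies $Ax = x$ and has components summing to 1, so the PageRank is no longer uniquely defined. If the provided \texttt{pagerank} function solves the linear system $(I - pGD)x = e$ (an assumption about its implementation), the coefficient matrix $I - GD$ is singular at $p = 1$, so the computation breaks down as well.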
\subsection{PageRanks by solving a sparse linear system [50 points]}