diff --git a/mp1/files_data/ex2.m b/mp1/files_data/ex2.m
new file mode 100644
index 0000000..64b3d02
--- /dev/null
+++ b/mp1/files_data/ex2.m
@@ -0,0 +1,22 @@
+%[U,G] = surfer('https://www.usi.ch',500);
+% pagerank(U,G);
+
+% A = (1/40) * [1 1 1 35 1 1;
+%     18 1 1 1 1 1;
+%     18 18 1 1 1 1;
+%     1 18 35 1 1 1;
+%     1 1 1 1 1 35;
+%     1 1 1 1 35 1];
+A = (1/40) * [
+    0 0 0 40 0 0;
+    20 0 0 0 0 0;
+    20 20 0 0 0 0;
+    0 20 40 0 0 0;
+    0 0 0 0 0 40;
+    0 0 0 0 40 0];
+
+[v,d] = eig(A);
+
+display(d);
+display(v);
+
diff --git a/mp1/files_data/run1.mat b/mp1/files_data/run1.mat
new file mode 100644
index 0000000..da0d413
Binary files /dev/null and b/mp1/files_data/run1.mat differ
diff --git a/mp1/files_data/run1rank.fig b/mp1/files_data/run1rank.fig
new file mode 100644
index 0000000..3ba6de6
Binary files /dev/null and b/mp1/files_data/run1rank.fig differ
diff --git a/mp1/files_data/run1rank.txt b/mp1/files_data/run1rank.txt
new file mode 100644
index 0000000..593bc17
--- /dev/null
+++ b/mp1/files_data/run1rank.txt
@@ -0,0 +1,32 @@
+page-rank in out url
+360 0.0869 31 1 https://creativecommons.org/licenses/by-sa/3.0
+204 0.0406 8 1 https://forum.gitlab.com
+82 0.0189 117 18 https://www.mediawiki.org
+81 0.0188 117 4 https://wikimediafoundation.org
+87 0.0150 6 1 https://docs.gitea.io
+78 0.0145 114 9 https://www.mediawiki.org/wiki/Special:MyLanguage/How_to_contribute
+77 0.0132 77 13 https://foundation.wikimedia.org/wiki/Privacy_policy
+217 0.0127 40 8 https://bugs.archlinux.org
+80 0.0115 107 6 https://foundation.wikimedia.org/wiki/Cookie_statement
+215 0.0114 38 5 https://bbs.archlinux.org
+216 0.0114 38 8 https://wiki.archlinux.org
+218 0.0114 38 6 https://security.archlinux.org
+428 0.0107 9 1 https://www.dnb.de/kataloghilfe
+219 0.0102 38 7 https://aur.archlinux.org
+359 0.0098 9 1 https://creativecommons.org/publicdomain/zero/1.0
+366 0.0092 27 21 https://archive.org
+446 0.0089 24 5 https://foundation.wikimedia.org/wiki/Terms_of_Use
+83 0.0079 78 0 https:\/\/schema.org
+85 0.0074 77 0 https:\/\/www.wikimedia.org\/static\/images\/wmf-hor-googpub.png
+181 0.0066 8 2 https://gitlab.com
+95 0.0062 2 1 https://www.enable-javascript.com
+113 0.0061 13 1 https://www.britannica.com/topic/polenta
+417 0.0058 8 1 https://www.dnb.de/DE/Home/home_node.html
+429 0.0058 8 1 https://www.dnb.de/EN/Home/home_node.html
+432 0.0058 8 1 https://www.dnb.de/expertensuche
+379 0.0057 24 2 https://blog.archive.org
+213 0.0057 32 7 https://www.archlinux.org
+99 0.0056 3 1 https://www.usi.ch/it
+19 0.0051 4 1 https://creativecommons.org/licenses/by-nc-sa/4.0
+214 0.0050 31 5 https://www.archlinux.org/packages
+220 0.0050 31 6 https://www.archlinux.org/download
diff --git a/mp1/files_data/run2.mat b/mp1/files_data/run2.mat
new file mode 100644
index 0000000..8066d07
Binary files /dev/null and b/mp1/files_data/run2.mat differ
diff --git a/mp1/files_data/run2rank.fig b/mp1/files_data/run2rank.fig
new file mode 100644
index 0000000..d2598d6
Binary files /dev/null and b/mp1/files_data/run2rank.fig differ
diff --git a/mp1/files_data/run2rank.txt b/mp1/files_data/run2rank.txt
new file mode 100644
index 0000000..73bcea5
--- /dev/null
+++ b/mp1/files_data/run2rank.txt
@@ -0,0 +1,23 @@
+ page-rank in out url
+411 0.0249 42 1 https://twitter.com/mozilla
+63 0.0248 145 1 https://twitter.com/firefox
+68 0.0203 142 1 https://www.instagram.com/firefox
+412 0.0164 37 1 https://www.instagram.com/mozilla
+62 0.0080 21 1 https://github.com/mozilla/kitsune
+81 0.0070 110 2 https://www.apple.com
+384 0.0064 5 1 https://www.xfinity.com/privacy/policy/dns
+4 0.0064 32 0 https:
+377 0.0059 19 1 https://abouthome-snippets-service.readthedocs.io/en/latest/data_collection.html 1
+393 0.0059 19 1 https://www.adjust.com/terms/privacy-policy
+410 0.0057 16 1 https://wiki.mozilla.org/Firefox/Data_Collection
+400 0.0057 15 1 https://yandex.ru/legal/confidential
+396 0.0057 15 1 https://github.com/mozilla-mobile/firefox-ios/blob/master/Docs/MMA.md
+5 0.0056 31 0 https://ssl
+3 0.0054 36 0 https://www.iisbadoni.edu.it/sites/default/files/favicon.ico
+6 0.0054 36 0 https://www.iisbadoni.edu.it/sites/default/files/logo.png
+208 0.0054 159 0 https://schema.org
+74 0.0052 178 5 https://foundation.mozilla.org
+72 0.0052 33 32 https://www.mozilla.org/privacy/websites/#cookies
+23 0.0051 2 1 https://www.iisbadoni.edu.it/mad
+300 0.0051 157 0 https://accounts.firefox.com
+
diff --git a/mp1/files_data/run3.mat b/mp1/files_data/run3.mat
new file mode 100644
index 0000000..0a63dd8
Binary files /dev/null and b/mp1/files_data/run3.mat differ
diff --git a/mp1/files_data/run3rank.fig b/mp1/files_data/run3rank.fig
new file mode 100644
index 0000000..815e55a
Binary files /dev/null and b/mp1/files_data/run3rank.fig differ
diff --git a/mp1/files_data/run3rank.txt b/mp1/files_data/run3rank.txt
new file mode 100644
index 0000000..2d419eb
--- /dev/null
+++ b/mp1/files_data/run3rank.txt
@@ -0,0 +1,43 @@
+page-rank in out url
+55 0.0741 354 1 https://www.instagram.com/usiuniversity
+53 0.0324 366 3 https://www.facebook.com/usiuniversity
+299 0.0248 6 1 https://twitter.com/usi_en
+329 0.0243 8 1 https://www.facebook.com/USIeLab
+308 0.0156 7 3 https://www.facebook.com/USIFinancialCommunication
+60 0.0155 316 2 https://www.swissuniversities.ch
+424 0.0144 96 1 https://it.bul.sbu.usi.ch
+330 0.0123 6 4 https://www.facebook.com/USI.ITDxC
+320 0.0122 7 1 https://www.facebook.com/usiimeg
+56 0.0107 320 0 https://www.youtube.com/usiuniversity
+5 0.0096 317 71 https://usi.ch
+62 0.0090 319 18 https://search.usi.ch
+337 0.0087 7 1 https://twitter.com/usisoftware
+63 0.0080 303 19 https://desk.usi.ch
+130 0.0077 25 0 https://www.swissuniversities.ch/it
+54 0.0072 208 0 https://twitter.com/USI_university
+323 0.0066 9 5 https://www.facebook.com/usiorientamento
+150 0.0062 12 1 https://www.innosuisse.ch/inno/it/home.html
+248 0.0061 10 1 https://www.facebook.com/usimdfc
+106 0.0060 132 8 https://newsletter.usi.ch/archive/en
+135 0.0057 201 0 https://schema.org
+326 0.0057 6 1 https://www.facebook.com/usialloggimendrisio
+322 0.0055 6 1 https://www.facebook.com/USImem
+366 0.0054 6 1 https://www.instagram.com/usi_ics_lugano
+212 0.0054 12 3 https://www.facebook.com/usimt
+7 0.0051 211 32 https://search.usi.ch/it
+6 0.0051 204 0 https://www.usi.ch/sites/all/themes/usiclean/img/bollino-usi.svg
+14 0.0051 204 62 https://www.usi.ch/originalnode/342
+15 0.0051 204 57 https://www.usi.ch/originalnode/358
+16 0.0051 204 62 https://www.usi.ch/originalnode/343
+17 0.0051 204 57 https://www.usi.ch/originalnode/344
+18 0.0051 204 58 https://www.usi.ch/en/originalnode/12174
+20 0.0051 204 60 https://www.usi.ch/originalnode/349
+21 0.0051 204 62 https://www.usi.ch/originalnode/8996
+22 0.0051 204 60 https://www.usi.ch/originalnode/348
+23 0.0051 204 59 https://www.usi.ch/originalnode/351
+24 0.0051 204 58 https://www.usi.ch/originalnode/350
+25 0.0051 204 61 https://www.usi.ch/originalnode/353
+27 0.0051 204 59 https://www.usi.ch/originalnode/8014
+26 0.0051 204 58 https://www.usi.ch/en/originalnode/354
+61 0.0051 204 0 https://www.usi.ch/sites/all/themes/usiclean/img/swissuniversities.svg
+57 0.0050 188 9 https://newsletter.usi.ch/archive
diff --git a/mp1/files_data/run3spy.fig b/mp1/files_data/run3spy.fig
new file mode 100644
index 0000000..e4d5394
Binary files /dev/null and b/mp1/files_data/run3spy.fig differ
diff --git a/mp1/run1rank.pdf b/mp1/run1rank.pdf
new file mode 100644
index 0000000..946a66c
Binary files /dev/null and b/mp1/run1rank.pdf differ
diff --git a/mp1/run1spy.pdf b/mp1/run1spy.pdf
new file mode 100644
index 0000000..c856cdc
Binary files /dev/null and b/mp1/run1spy.pdf differ
diff --git a/mp1/run2rank.pdf b/mp1/run2rank.pdf
new file mode 100644
index 0000000..08bcdc9
Binary files /dev/null and b/mp1/run2rank.pdf differ
diff --git a/mp1/run2spy.pdf b/mp1/run2spy.pdf
new file mode 100644
index 0000000..f15442a
Binary files /dev/null and b/mp1/run2spy.pdf differ
diff --git a/mp1/run3rank.pdf b/mp1/run3rank.pdf
new file mode 100644
index 0000000..2526fe4
Binary files /dev/null and b/mp1/run3rank.pdf differ
diff --git a/mp1/run3spy.pdf b/mp1/run3spy.pdf
new file mode 100644
index 0000000..6c5e4ed
Binary files /dev/null and b/mp1/run3spy.pdf differ
diff --git a/mp1/template.pdf b/mp1/template.pdf
index 51e12fd..c37cfa2 100644
Binary files a/mp1/template.pdf and b/mp1/template.pdf differ
diff --git a/mp1/template.tex b/mp1/template.tex
index 389f539..9519b43 100644
--- a/mp1/template.tex
+++ b/mp1/template.tex
@@ -1,5 +1,7 @@
 \documentclass[unicode,11pt,a4paper,oneside,numbers=endperiod,openany]{scrartcl}
-
+\usepackage{graphicx}
+\usepackage{subcaption}
+\usepackage{amsmath}
 \input{assignment.sty}
 \begin{document}
@@ -57,10 +59,158 @@ is the corresponding eigenvalue, while if $x$ is an eigenvector approximation, f
 \subsection{Other webgraphs [10 points]}
+The provided PageRank MATLAB implementation was run three times, starting from the websites \texttt{http://atelier.inf.usi.ch/\textasciitilde{}maggicl}, \texttt{https://www.iisbadoni.edu.it}, and \texttt{https://www.usi.ch}. The results are shown in Figure \ref{fig:run1}, Figure \ref{fig:run2} and Figure \ref{fig:run3} respectively.
+
+One pattern that emerges in the first and third executions is the presence of 1s on the main diagonal of the connectivity matrix. This indicates that several of the crawled pages link to themselves. Another pattern, observable in all three executions, is the presence of contiguous rectangular regions filled with 1s, especially along the main diagonal. These regions are probably due to pages belonging to the same website, which share a common layout and therefore link to a common set of pages: internal pages when the region lies near the main diagonal, external pages otherwise.
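+
+The self-link observation can also be checked numerically rather than read off the spy plot. The following is only a minimal sketch: it assumes that \texttt{G} is the sparse connectivity matrix returned by \texttt{surfer} for one of the runs, and the variable name \texttt{selfLinks} is purely illustrative.
+
+\begin{verbatim}
+% Assumption: G was obtained with a call such as
+%   [U,G] = surfer('https://www.usi.ch',500);
+% A nonzero diagonal entry G(i,i) means page i links to itself.
+selfLinks = full(diag(G)) ~= 0;
+fprintf('%d of %d pages link to themselves\n', ...
+        nnz(selfLinks), size(G,1));
+\end{verbatim}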
+
+\begin{figure}[h]
+\centering
+\begin{subfigure}{0.49\textwidth}
+\centering
+\includegraphics[width = \textwidth]{run1spy}
+\caption{Spy plot of connectivity matrix}
+\end{subfigure}
+\begin{subfigure}{0.49\textwidth}
+\centering
+\includegraphics[width = \textwidth]{run1rank}
+\caption{PageRank bar graph}
+\end{subfigure}
+\begin{subfigure}{\textwidth}
+\begin{verbatim}
+
+360 0.0869  31  1 https://creativecommons.org/licenses/by-sa/3.0
+204 0.0406   8  1 https://forum.gitlab.com
+ 82 0.0189 117 18 https://www.mediawiki.org
+ 81 0.0188 117  4 https://wikimediafoundation.org
+ 87 0.0150   6  1 https://docs.gitea.io
+ 78 0.0145 114  9 https://www.mediawiki.org/wiki/Special:MyLanguage/
+                  How_to_contribute
+ 77 0.0132  77 13 https://foundation.wikimedia.org/wiki/Privacy_policy
+217 0.0127  40  8 https://bugs.archlinux.org
+ 80 0.0115 107  6 https://foundation.wikimedia.org/wiki/Cookie_statement
+215 0.0114  38  5 https://bbs.archlinux.org
+\end{verbatim}
+\caption{Top 10 webpages with highest PageRank}
+\end{subfigure}
+\caption{Results of first PageRank calculation (for starting website \texttt{http://atelier.inf.usi.ch/\textasciitilde{}maggicl/})}
+\label{fig:run1}
+\end{figure}
+
+\begin{figure}[h]
+\centering
+\begin{subfigure}{0.49\textwidth}
+\centering
+\includegraphics[width = \textwidth]{run2spy}
+\caption{Spy plot of connectivity matrix}
+\end{subfigure}
+\begin{subfigure}{0.49\textwidth}
+\centering
+\includegraphics[width = \textwidth]{run2rank}
+\caption{PageRank bar graph}
+\end{subfigure}
+\begin{subfigure}{\textwidth}
+\begin{verbatim}
+
+411 0.0249  42  1 https://twitter.com/mozilla
+ 63 0.0248 145  1 https://twitter.com/firefox
+ 68 0.0203 142  1 https://www.instagram.com/firefox
+412 0.0164  37  1 https://www.instagram.com/mozilla
+ 62 0.0080  21  1 https://github.com/mozilla/kitsune
+ 81 0.0070 110  2 https://www.apple.com
+384 0.0064   5  1 https://www.xfinity.com/privacy/policy/dns
+  4 0.0064  32  0 https:
+377 0.0059  19  1 https://abouthome-snippets-service.readthedocs.io/en/
+                  latest/data_collection.html
+393 0.0059  19  1 https://www.adjust.com/terms/privacy-policy
+410 0.0057  16  1 https://wiki.mozilla.org/Firefox/Data_Collection
+\end{verbatim}
+\caption{Top 11 webpages with highest PageRank}
+\end{subfigure}
+\caption{Results of second PageRank calculation (for starting website \texttt{https://www.iisbadoni.edu.it/})}
+\label{fig:run2}
+\end{figure}
+
+\begin{figure}[h]
+\centering
+\begin{subfigure}{0.49\textwidth}
+\centering
+\includegraphics[width = \textwidth]{run3spy}
+\caption{Spy plot of connectivity matrix}
+\end{subfigure}
+\begin{subfigure}{0.49\textwidth}
+\centering
+\includegraphics[width = \textwidth]{run3rank}
+\caption{PageRank bar graph}
+\end{subfigure}
+\begin{subfigure}{\textwidth}
+\begin{verbatim}
+
+ 55 0.0741 354  1 https://www.instagram.com/usiuniversity
+ 53 0.0324 366  3 https://www.facebook.com/usiuniversity
+299 0.0248   6  1 https://twitter.com/usi_en
+329 0.0243   8  1 https://www.facebook.com/USIeLab
+308 0.0156   7  3 https://www.facebook.com/USIFinancialCommunication
+ 60 0.0155 316  2 https://www.swissuniversities.ch
+424 0.0144  96  1 https://it.bul.sbu.usi.ch
+330 0.0123   6  4 https://www.facebook.com/USI.ITDxC
+320 0.0122   7  1 https://www.facebook.com/usiimeg
+ 56 0.0107 320  0 https://www.youtube.com/usiuniversity
+\end{verbatim}
+\caption{Top 10 webpages with highest PageRank}
+\end{subfigure}
+\caption{Results of third PageRank calculation (for starting website \texttt{https://www.usi.ch/})}
+\label{fig:run3}
+\end{figure}
+
 \subsection{Connectivity matrix and subcliques [10 points]}
 \subsection{Connectivity matrix and disjoint subgraphs [10 points]}
+\subsubsection{What is the connectivity matrix $G$ (w.r.t.\ Figure 5)?}
+
+The connectivity matrix $G$, with $U = \{\text{alpha}, \text{beta}, \text{gamma}, \text{delta}, \text{rho}, \text{sigma}\}$, is:
+
+\[G = \begin{bmatrix}
+ 0&0&0&1&0&0\\
+ 1&0&0&0&0&0\\
+ 1&1&0&0&0&0\\
+ 0&1&1&0&0&0\\
+ 0&0&0&0&0&1\\
+ 0&0&0&0&1&0\\
+\end{bmatrix}\]
+
+\subsubsection{What are the PageRanks if the hyperlink transition probability $p$ is the default value 0.85?}
+
+First we compute the matrix $A$. With $p = 0.85$ and $n = 6$ pages, every entry receives $\delta = (1 - p)/n = 1/40$, and each link from page $j$ additionally contributes $p / c_j$, where $c_j$ is the number of outgoing links of page $j$:
+
+\[A = \frac1{40} \begin{bmatrix}
+ 1 &1 &1 &35&1 &1 \\
+ 18&1 &1 &1 &1 &1 \\
+ 18&18&1 &1 &1 &1 \\
+ 1 &18&35&1 &1 &1 \\
+ 1 &1 &1 &1 &1 &35 \\
+ 1 &1 &1 &1 &35&1 \\
+\end{bmatrix}\]
+
+We then compute the eigenvalues and eigenvectors of $A$ with MATLAB. The eigenvector for the eigenvalue 1, i.e. the solution of $A x = x$, is returned by \texttt{eig} normalized to unit Euclidean norm:
+
+\[x\approx\begin{bmatrix}
+ 0.4771\\
+ 0.2630\\
+ 0.3747\\
+ 0.4905\\
+ 0.4013\\
+ 0.4013\\
+\end{bmatrix}\]
+
+Since PageRanks must sum to 1, we divide $x$ by the sum of its components (about 2.4079):
+
+\[\frac{x}{\sum_i x_i}\approx\begin{bmatrix}
+ 0.1981\\
+ 0.1092\\
+ 0.1556\\
+ 0.2037\\
+ 0.1667\\
+ 0.1667\\
+\end{bmatrix}\]
+
+The PageRanks are the components of this rescaled vector, in the order given in $U$.
+
+\subsubsection{Describe what happens with this example to both the definition of PageRank and the computation done by pagerank in the limit $p \to 1$.}
+
+As $p$ approaches 1, the probability that a user jumps to a random page instead of following a link decreases, so the links between pages carry more weight in the computation of PageRank.
+
+In the computation, increasing $p$ decreases $\delta = (1 - p)/n$ (the probability of jumping to any given page at random), which becomes 0 when $p = 1$. In this example $A$ then becomes block diagonal, because the graph consists of two disjoint subgraphs, $\{\text{alpha}, \text{beta}, \text{gamma}, \text{delta}\}$ and $\{\text{rho}, \text{sigma}\}$. Each block has its own eigenvector for the eigenvalue 1, so the solution of $A x = x$ is no longer unique and the PageRank is not well defined.
+
 \subsection{PageRanks by solving a sparse linear system [50 points]}