\documentclass[unicode,11pt,a4paper,oneside,numbers=endperiod,openany]{scrartcl}
\usepackage{graphicx}
\usepackage{subcaption}
\usepackage{amsmath}
\input{assignment.sty}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\usetikzlibrary{plotmarks}
\usetikzlibrary{arrows.meta}
\usepgfplotslibrary{patchplots}
\usepackage{grffile}
\hyphenation{PageRank}
\hyphenation{PageRanks}
\begin{document}
\setassignment
\setduedate{Wednesday, 14 October 2020, 11:55 PM}
\serieheader{Numerical Computing}{2020}{Student: Claudio Maggioni}{Discussed with: FULL NAME}{Solution for Project 2}{}
\newline
\assignmentpolicy
The purpose of this assignment\footnote{This document is originally
based on a blog post by Cleve Moler about the Lake Arrowhead graph and on the work of John
Gilbert, who initially created the coauthor graph from the 1993 Householder Meeting. You can find more information
at \url{http://blogs.mathworks.com/cleve/2013/06/10/lake-arrowhead-coauthor-graph/}. Most of this assignment is derived
from this archived work.} is to learn the importance of sparse linear algebra algorithms for solving fundamental
questions in social network analysis.
We will use the coauthor graph from the Householder Meeting and the social network of friendships from Zachary's karate club~\cite{karate}.
These two graphs are among the first examples where matrix methods were used in computational social network analysis.
\section{The Reverse Cuthill McKee Ordering [10 points]}
The Reverse Cuthill McKee ordering of matrix \texttt{A\_SymPosDef} is computed with MATLAB's \texttt{symrcm(\ldots)} and
the matrix is rearranged accordingly. The spy plots of the two matrices are shown in figure~\ref{fig:1}.
\begin{figure}[h]
\centering
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{1_spy_a}
\caption{Spy plot of \texttt{A\_SymPosDef}}
\end{subfigure}
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{1_spy_rcm}
\caption{Spy plot of the \texttt{symrcm(\ldots)} rearranged version of \texttt{A\_SymPosDef}}
\end{subfigure}
\caption{Spy plots of the two matrices}
\label{fig:1}
\end{figure}
The spy plots of the corresponding Cholesky factors are shown in figure~\ref{fig:1chol}.
\begin{figure}[h]
\centering
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{1_spy_chol_a}
\caption{Spy plot of \texttt{chol(A\_SymPosDef)}}
\end{subfigure}
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{1_spy_chol_rcm}
\caption{Spy plot of \texttt{chol(A\_SymPosDef(symrcm(A\_SymPosDef), symrcm(A\_SymPosDef)))}}
\end{subfigure}
\caption{Spy plots of the two Cholesky factors}
\label{fig:1chol}
\end{figure}
The Cholesky factor of the RCM-reordered matrix has significantly fewer nonzero
elements (roughly a tenth) than the Cholesky factor of the original matrix.
The respective nonzero counts can be found in figure~\ref{fig:1chol}.
\section{Sparse Matrix Factorization [10 points]}
\subsection{Show that $A \in R^{n \times n}$ has exactly $5n - 6$ nonzero elements.}
The given description of $A$ says that the only nonzero elements of $A$ are
the elements on the edges of the matrix (rows and columns 1 and $n$) and the
elements on the main diagonal. These cells can be counted as
the 4 corner cells of the matrix plus 5 segments of length $n - 2$,
corresponding to the four edges and the main diagonal without their corner cells. Therefore:
\[4 + 5 \cdot (n - 2) = 5n - 6\]
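As a quick sanity check of this count, here is a worked instance for $n = 5$ (the size used for the spy plot in the next subsection), together with an alternative way of counting that is added here only as a cross-check: the first and last rows contribute $n$ entries each, the first and last columns add $n - 2$ further entries each, and the diagonal adds the remaining $n - 2$.
\begin{gather*}
4 + 5 \cdot (5 - 2) = 4 + 15 = 19 = 5 \cdot 5 - 6 \\
2n + 2(n - 2) + (n - 2) = 5n - 6
\end{gather*}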
\subsection{Write a short Matlab script to construct this matrix and visualize
its non-zero structure (you can use, e.g., the command \texttt{spy()}).}
The MATLAB script can be found in file \texttt{ex3.m}.
Here is a spy plot of the nonzero values of $A$, for $n = 5$:
\begin{center}
\input{ex2_2_spy.tex}
\end{center}
The matrix $A \in R^{n \times n}$ looks like this (zero entries are represented as
blanks):
\[
A := \begin{bmatrix}
n & 1 & 1 & \hdots & 1 \\
1 & n + 1 & && 1 \\
1 & & n + 2 && 1 \\
\vdots & & & \ddots & \vdots \\
1 & 1 & 1 & \hdots & 2n - 1
\end{bmatrix}
\]
\subsection{Using again the \texttt{spy()} command, visualize side by side the
original matrix $A$ and the result of the Cholesky factorization (\texttt{chol()}
in Matlab). Then explain why for $n = 100000$ using Matlab's \texttt{chol(\ldots)}
to solve $Ax = b$
for a given right-hand side vector would be problematic.}
Here are the plots of \texttt{spy(A)} (on the left) and \texttt{spy(chol(A))} (on
the right).
\begin{center}
\input{ex2_3_spy.tex}
\end{center}
Solving $Ax = b$ this way would be a costly operation, since the Cholesky
decomposition of matrix $A$ (performed using MATLAB's \texttt{chol(\ldots)})
would drastically reduce the number of zero elements in the matrix in the very
first iteration. This is due to the fact that the first row, by definition, is
made only of nonzero elements (the diagonal entry $n$ followed by 1s) and, by
subtracting a multiple of the first row from
every other row (as effectively happens in the first iteration of the
Cholesky decomposition of $A$), the zero elements become (negative) nonzero
elements. This fill-in leaves all columns but the first almost empty of zeros.
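To give a rough sense of scale, the fill-in described above can be sketched as follows. This is an estimate under stated assumptions, not a measured result: the factor is assumed to become a completely dense triangle, each entry is assumed to be stored as an 8-byte double, and the index overhead of sparse storage is ignored. The symbols $A_{22}$ (the trailing $(n-1) \times (n-1)$ block of $A$) and $\mathbf{1}$ (the vector of all ones) are introduced only for this illustration. After the first elimination step the trailing block becomes
\[
A_{22} - \frac{1}{n} \mathbf{1} \mathbf{1}^T ,
\]
whose off-diagonal entries are either $-\frac{1}{n}$ or $1 - \frac{1}{n}$ and therefore nonzero. For $n = 100000$ a dense triangular factor would hold
\begin{gather*}
\frac{n(n + 1)}{2} = \frac{100000 \cdot 100001}{2} = 5\,000\,050\,000 \approx 5 \cdot 10^{9} \text{ entries,} \\
5 \cdot 10^{9} \cdot 8 \text{ bytes} \approx 40 \text{ GB,}
\end{gather*}
compared with the $5n - 6 = 499994$ nonzero elements of $A$ itself.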
\section{Degree Centrality [10 points]}
Assuming that the degree of a node in the Householder graph is the number of coauthors of
the corresponding author and that an author is not a coauthor of themselves, the degree
centralities of all authors, sorted in descending order, are listed below.
This output has been obtained by running \texttt{ex3.m}.
\begin{verbatim}
Author Centrality: Coauthors...
Golub 31: Wilkinson TChan Varah Overton Ernst VanLoan Saunders Bojanczyk
Dubrulle George Nachtigal Kahan Varga Kagstrom Widlund
OLeary Bjorck Eisenstat Zha VanDooren Tang Reichel Luk Fischer
Gutknecht Heath Plemmons Berry Sameh Meyer Gill
Demmel 15: Edelman VanLoan Bai Schreiber Kahan Kagstrom Barlow
NHigham Arioli Duff Hammarling Bunch Heath Greenbaum Gragg
Plemmons 13: Golub Nagy Harrod Pan Funderlic Bojanczyk George Barlow Heath
Berry Sameh Meyer Nichols
Heath 12: Golub TChan Funderlic George Gilbert Eisenstat Ng Liu Laub Plemmons
Paige Demmel
Schreiber 12: TChan VanLoan Moler Gilbert Pothen NTrefethen Bjorstad NHigham
Eisenstat Tang Elden Demmel
Hammarling 10: Wilkinson Kaufman Bai Bjorck VanHuffel VanDooren Duff Greenbaum
Gill Demmel
VanDooren 10: Golub Boley Bojanczyk Kagstrom VanHuffel Luk Hammarling Laub
Nichols Paige
TChan 10: Golub Saied Ong Kuo Tong Schreiber Arioli Duff Heath Hansen
Gragg 9: Borges Kaufman Harrod Reichel Stewart BunseGerstner Ammar Warner Demmel
Moler 8: Wilkinson VanLoan Gilbert Schreiber Henrici Stewart Bunch Laub
VanLoan 8: Golub Moler Schreiber Kagstrom Luk Bunch Paige Demmel
Paige 7: Anjos VanLoan Saunders Bjorck VanDooren Laub Heath
Gutknecht 7: Golub Ashby Boley NTrefethen Nachtigal Varga Hochbruck
Luk 7: Golub Overton Boley VanLoan Bojanczyk Park VanDooren
Eisenstat 7: Golub Gu George Schreiber Liu Heath Ipsen
George 7: Golub Eisenstat Ng Liu Tang Heath Plemmons
Meyer 6: Golub Benzi Funderlic Stewart Ipsen Plemmons
Bunch 6: LeBorne Fierro VanLoan Moler Stewart Demmel
Stewart 6: Moler Bunch Gragg Meyer Gill Mathias
Reichel 6: Golub NTrefethen Nachtigal Fischer Gragg Ammar
Bjorck 6: Golub Park Duff Hammarling Elden Paige
NTrefethen 6: Schreiber Nachtigal Reichel Gutknecht Greenbaum ATrefethen
Nichols 5: Byers Barlow VanDooren Plemmons BunseGerstner
Greenbaum 5: Cullum Strakos NTrefethen Hammarling Demmel
Ipsen 5: Chandrasekaran Barlow Eisenstat Meyer Jessup
Laub 5: Kenney Moler VanDooren Heath Paige
Duff 5: TChan Bjorck Arioli Hammarling Demmel
Liu 5: George Gilbert Eisenstat Ng Heath
Park 5: Boley Bjorck VanHuffel Luk Elden
Zha 5: Golub Bai Barlow VanHuffel Hansen
Widlund 5: Golub Bjorstad OLeary Smith Szyld
Barlow 5: Zha Ipsen Plemmons Nichols Demmel
Kagstrom 5: Golub VanLoan VanDooren Ruhe Demmel
Varga 5: Golub Marek Young Gutknecht Starke
Gilbert 5: Moler Schreiber Ng Liu Heath
Gill 4: Golub Saunders Hammarling Stewart
Sameh 4: Golub Harrod Plemmons Berry
Berry 4: Golub Harrod Plemmons Sameh
BunseGerstner 4: He Byers Gragg Nichols
Hansen 4: TChan Fierro OLeary Zha
Ng 4: George Gilbert Liu Heath
Arioli 4: TChan MuntheKaas Duff Demmel
VanHuffel 4: Zha Park VanDooren Hammarling
Nachtigal 4: Golub NTrefethen Reichel Gutknecht
Bojanczyk 4: Golub VanDooren Luk Plemmons
Harrod 4: Plemmons Gragg Berry Sameh
Boley 4: Park VanDooren Luk Gutknecht
Wilkinson 4: Golub Dubrulle Moler Hammarling
Ammar 3: He Reichel Gragg
Elden 3: Schreiber Bjorck Park
Fischer 3: Golub Modersitzki Reichel
Tang 3: Golub George Schreiber
NHigham 3: Schreiber Pothen Demmel
OLeary 3: Golub Widlund Hansen
Bjorstad 3: Schreiber Widlund Boman
Kahan 3: Golub Davis Demmel
Bai 3: Zha Hammarling Demmel
Saunders 3: Golub Paige Gill
Funderlic 3: Heath Plemmons Meyer
Kaufman 3: Hammarling Gragg Warner
Starke 2: Varga Hochbruck
Hochbruck 2: Gutknecht Starke
Jessup 2: Crevelli Ipsen
Warner 2: Kaufman Gragg
Ruhe 2: Wold Kagstrom
Szyld 2: Marek Widlund
Young 2: Kincaid Varga
Pothen 2: Schreiber NHigham
Tong 2: TChan Kuo
Kuo 2: TChan Tong
Marek 2: Varga Szyld
Dubrulle 2: Golub Wilkinson
Fierro 2: Bunch Hansen
Byers 2: BunseGerstner Nichols
Overton 2: Golub Luk
He 2: BunseGerstner Ammar
Mathias 1: Stewart
Davis 1: Kahan
ATrefethen 1: NTrefethen
Henrici 1: Moler
Smith 1: Widlund
MuntheKaas 1: Arioli
Boman 1: Bjorstad
Chandrasekaran 1: Ipsen
Wold 1: Ruhe
Ong 1: TChan
Saied 1: TChan
Strakos 1: Greenbaum
Cullum 1: Greenbaum
Edelman 1: Demmel
Pan 1: Plemmons
Nagy 1: Plemmons
Gu 1: Eisenstat
Benzi 1: Meyer
Anjos 1: Paige
Crevelli 1: Jessup
Kincaid 1: Young
Borges 1: Gragg
Ernst 1: Golub
Modersitzki 1: Fischer
LeBorne 1: Bunch
Ashby 1: Gutknecht
Kenney 1: Laub
Varah 1: Golub
\end{verbatim}
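The degrees listed above can be written compactly as follows. This is a sketch under the assumption that $A$ is the symmetric 0/1 adjacency matrix of the coauthor graph and that $n$ is the number of authors; the symbol $d_i$ is introduced only for this illustration.
\[
d_i = \sum_{\substack{j = 1 \\ j \neq i}}^{n} A_{ij} = \Big( \sum_{j = 1}^{n} A_{ij} \Big) - A_{ii}
\]
For instance, the degree of 31 listed for Golub would be consistent with the in-degree of 32 reported in the PageRank table of a later section, presumably because the diagonal entry $A_{ii} = 1$ is counted there and excluded here.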
\section{The Connectivity of the Coauthors [10 points]}
The indices of the coauthors shared by the author at index $i$ and the
author at index $j$ can be computed by listing the indices of the nonzero
elements in the Schur product (or element-wise product) of $A_{:,i}$ and
$A_{:,j}$ (respectively the $i$-th and $j$-th column vectors of $A$). Therefore the set $C$ of the indices of the common coauthors can be defined
as:
\[C = \{k \in N_0 \;|\; (A_{:,i} \odot A_{:,j})_k = 1\}\]
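As a small illustration of this element-wise product, consider two made-up columns for a graph with four authors. They are hypothetical and not taken from the Householder data.
\[
A_{:,1} = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \qquad
A_{:,2} = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \qquad
A_{:,1} \odot A_{:,2} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]
The only position where the product equals 1 is the third one, so in this toy example $C = \{3\}$: author 3 is the only coauthor shared by authors 1 and 2.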
The results below were computed using the script \texttt{ex4.m}.
The common coauthors of Golub and Moler are Wilkinson and Van Loan.
The common coauthors of Golub and Saunders are Golub, Saunders and Gill.
The common coauthors of TChan and Demmel are Schreiber, Arioli, Duff and
Heath.
\section{PageRank of the Coauthor Graph [10 points]}
The PageRank values for all authors were computed using the scripts
\texttt{ex5.m} and \texttt{pagerank.m}, the latter being essentially identical to
\texttt{pagerank.m} from Mini Project 1. The output is shown below.
\begin{verbatim}
page-rank in out author
1 0.0511 32 32 Golub
104 0.0261 16 16 Demmel
86 0.0229 14 14 Plemmons
44 0.0212 13 13 Schreiber
3 0.0201 11 11 TChan
81 0.0198 13 13 Heath
90 0.0181 10 10 Gragg
74 0.0177 11 11 Hammarling
66 0.0171 11 11 VanDooren
42 0.0152 9 9 Moler
79 0.0151 8 8 Gutknecht
32 0.0142 9 9 VanLoan
59 0.0135 8 8 Eisenstat
98 0.0133 8 8 Paige
46 0.0130 7 7 NTrefethen
49 0.0129 6 6 Varga
96 0.0128 7 7 Meyer
77 0.0128 7 7 Stewart
73 0.0127 8 8 Luk
78 0.0127 7 7 Bunch
53 0.0127 6 6 Widlund
72 0.0125 7 7 Reichel
41 0.0125 8 8 George
82 0.0124 6 6 Ipsen
83 0.0122 6 6 Greenbaum
58 0.0113 7 7 Bjorck
97 0.0107 6 6 Nichols
51 0.0106 6 6 Kagstrom
80 0.0106 6 6 Laub
52 0.0104 6 6 Barlow
60 0.0103 6 6 Zha
69 0.0102 6 6 Duff
62 0.0100 6 6 Park
89 0.0099 5 5 BunseGerstner
63 0.0098 5 5 Arioli
43 0.0097 6 6 Gilbert
67 0.0096 6 6 Liu
87 0.0096 5 5 Hansen
47 0.0090 5 5 Nachtigal
54 0.0090 4 4 Bjorstad
2 0.0088 5 5 Wilkinson
23 0.0088 5 5 Harrod
99 0.0087 5 5 Gill
92 0.0086 5 5 Sameh
91 0.0086 5 5 Berry
15 0.0086 5 5 Boley
76 0.0085 4 4 Fischer
50 0.0085 3 3 Young
61 0.0084 5 5 VanHuffel
100 0.0084 3 3 Jessup
48 0.0083 4 4 Kahan
35 0.0083 5 5 Bojanczyk
65 0.0082 5 5 Ng
93 0.0082 4 4 Ammar
55 0.0079 4 4 OLeary
84 0.0079 3 3 Ruhe
19 0.0078 4 4 Kaufman
56 0.0076 4 4 NHigham
37 0.0075 3 3 Marek
75 0.0075 3 3 Szyld
103 0.0074 3 3 Starke
34 0.0072 4 4 Saunders
25 0.0072 4 4 Funderlic
39 0.0072 4 4 Bai
102 0.0072 3 3 Hochbruck
88 0.0071 4 4 Elden
71 0.0070 4 4 Tang
38 0.0069 3 3 Kuo
40 0.0069 3 3 Tong
4 0.0068 3 3 He
13 0.0067 2 2 Kincaid
14 0.0067 2 2 Crevelli
94 0.0065 3 3 Warner
17 0.0065 3 3 Byers
21 0.0064 3 3 Fierro
31 0.0064 2 2 Wold
45 0.0062 3 3 Pothen
36 0.0060 3 3 Dubrulle
57 0.0058 2 2 Boman
10 0.0058 3 3 Overton
9 0.0057 2 2 Modersitzki
68 0.0056 2 2 Smith
95 0.0056 2 2 Davis
33 0.0056 2 2 Chandrasekaran
27 0.0055 2 2 Cullum
28 0.0055 2 2 Strakos
64 0.0054 2 2 MuntheKaas
7 0.0053 2 2 Ashby
85 0.0053 2 2 ATrefethen
29 0.0052 2 2 Saied
30 0.0052 2 2 Ong
18 0.0052 2 2 Benzi
101 0.0052 2 2 Mathias
8 0.0052 2 2 LeBorne
12 0.0052 2 2 Borges
6 0.0051 2 2 Kenney
70 0.0050 2 2 Henrici
\end{verbatim}
\section{Zachary's karate club: social network of friendships between 34 members [50 points]}
\subsection{Write Matlab code that ranks the five nodes with the largest
degree centrality. What are their degrees?}
The results in this subsection can be computed using the file \texttt{ex6.m}.
The top 5 nodes by degree centrality, with their degrees and their
neighbours, are listed below:
\begin{verbatim}
Node Degree: Neighbours...
34 16: 9, 10, 14, 15, 16, 19, 20, 21, 23, 24, 27, 28, 29, 30, 31, 32, 33,
1 15: 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 20, 22, 32,
33 11: 3, 9, 15, 16, 19, 21, 23, 24, 30, 31, 32, 34,
3 9: 1, 2, 4, 8, 9, 10, 14, 28, 29, 33,
2 8: 1, 3, 4, 8, 14, 18, 20, 22, 31,
\end{verbatim}
\subsection{Rank the five nodes with the largest eigenvector centrality. What are
their (properly normalized) eigenvector centralities?}
The results in this subsection can be computed using the file \texttt{ex6.m}.
The top 5 nodes by eigenvector centrality (page-rank column) are
listed below:
\begin{verbatim}
page-rank in out author
34 0.1009 17 17 34
1 0.0970 16 16 1
33 0.0717 12 12 33
3 0.0571 10 10 3
2 0.0529 9 9 2
\end{verbatim}
\subsection{Are the rankings in (a) and (b) identical? Give a brief verbal
explanation of the similarities and differences.}
The rankings found are identical, even though, if we rescale the degree
centralities so that the largest one equals the greatest eigenvector centrality, we find slightly different
values ($[0.1009, 0.0946, 0.0694, 0.0568, 0.0505]$) w.r.t.\ the actual eigenvector
centralities.
The identical rankings may be explained by the fact that, by computing the
eigenvector centrality, we are effectively applying PageRank to a symmetric
matrix, i.e.\ to a graph with bidirectional links. Since the links are
bidirectional, we effectively make all the nodes in the graph of the same
``importance'' in the eyes of PageRank, thus avoiding a case where a node has
a high PageRank thanks to connections with few, but very ``important'', nodes.
Therefore PageRank is simply reduced to a prioritization of nodes with many
edges, i.e.\ the degree centrality ranking.
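The rescaled values quoted above appear to be consistent with the following normalization. It is sketched here under the assumption that the degrees $[16, 15, 11, 9, 8]$ from the degree centrality listing and the largest eigenvector centrality $0.1009$ are used; the symbols $d_i$, $d_{\max}$, $c_{\max}$ and $\tilde{d}_i$ are introduced only for this illustration, with $i$ denoting the position in the ranking.
\[
\tilde{d}_i = c_{\max} \cdot \frac{d_i}{d_{\max}}, \qquad
\tilde{d}_2 = 0.1009 \cdot \frac{15}{16} \approx 0.0946, \qquad
\tilde{d}_3 = 0.1009 \cdot \frac{11}{16} \approx 0.0694
\]
These can be compared with the eigenvector centralities $0.0970$ and $0.0717$ reported for nodes 1 and 33, respectively.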
\subsection{Use spectral graph partitioning to find a near-optimal split of the
network into two groups of 16 and 18 nodes, respectively. List the nodes in the
two groups. How does spectral bisection compare to the real split observed by
Zachary?}
The spectral bisection of the matrix $A$ into two groups of 16 and 18 members,
respectively, is identical to the real split observed by Zachary. To compute the
split, the script \texttt{ex6.m} was used.
Here are the two (sorted) groups found:
\begin{gather*}
G_1 = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 18, 20, 22] \\
G_2 = [9, 10, 15, 16, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]
\end{gather*}
Here are the spy plots of the original matrix $A$ (on the left) and of the
matrix permuted according to the spectral bisection (on the right):
\begin{center}
\input{ex6_6_spy.tex}
\end{center}
Here is a plot of the sorted elements of the second eigenvector $v_2$:
\begin{center}
\input{ex6_6_ev.tex}
\end{center}
and here are the actual (sorted) values of $v_2$:
\begin{align*}
\mathrm{sort}(v_2) = [-0.4228, -0.3237, -0.3237, -0.2846,
-0.2846, -0.2110, -0.1121, -0.1095, -0.1002, \\ -0.1002, -0.0555,
-0.0526, -0.0413, -0.0147, -0.0136, 0.0232, 0.0516, 0.0735, \\
0.0928, 0.0952, 0.0988, 0.1189, 0.1277, 0.1303, 0.1530,
0.1557, 0.1610, \\ 0.1628, 0.1628, 0.1628, 0.1628, 0.1628,
0.1677, 0.1871]^T
\end{align*}
As can be seen above, there are only 15 negative values out of the 16 we would
need to obtain a 16/18 partition. We therefore add the index corresponding to
the smallest positive value of $v_2$ to the set of indices of group 1.
This seems to be a good approximation, since we indeed obtain the same partitioning
as the one observed by Zachary.
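The adjustment described above can also be phrased as a threshold on the sorted entries. In this sketch $s_1 \le \dots \le s_{34}$ denote the sorted values listed above, $x_i$ denotes the entry of the second eigenvector for node $i$, and $\tau$ is a threshold; all three symbols are introduced only for this illustration. Any threshold lying strictly between the 16th and the 17th sorted value yields groups of 16 and 18 nodes:
\begin{gather*}
s_{16} = 0.0232 < \tau < 0.0516 = s_{17} \\
G_1 = \{ i \;|\; x_i < \tau \}, \qquad G_2 = \{ i \;|\; x_i \geq \tau \}
\end{gather*}
With $\tau = 0$, i.e.\ splitting purely by sign, $G_1$ would instead contain only the 15 nodes with negative entries.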
\begin{thebibliography}{99}
\bibitem{karate} The social network of a karate club at a US university, M.~E.~J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004),
pp. 219-229.
\end{thebibliography}
\end{document}