\documentclass[unicode,11pt,a4paper,oneside,numbers=endperiod,openany]{scrartcl}
\usepackage{graphicx}
\usepackage{subcaption}
\usepackage{amsmath}
\input{assignment.sty}


\hyphenation{PageRank}
\hyphenation{PageRanks}


\begin{document}


\setassignment
\setduedate{Wednesday, 14 October 2020, 11:55 PM}

\serieheader{Numerical Computing}{2020}{Student: Claudio Maggioni}{Discussed with: FULL NAME}{Solution for Project 2}{}
\newline

\assignmentpolicy


The purpose of this assignment\footnote{This document is originally
based on a blog post by Cleve Moler about the Lake Arrowhead graph, and on the work of John
Gilbert, who initially created the coauthor graph from the 1993 Householder Meeting. You can find more information
at \url{http://blogs.mathworks.com/cleve/2013/06/10/lake-arrowhead-coauthor-graph/}. Most of this assignment is derived
from this archived work.} is to learn the importance of sparse linear algebra algorithms for solving fundamental
questions in social network analysis.
We will use the coauthor graph from the Householder Meeting and the social network of friendships from Zachary's karate club~\cite{karate}.
These two graphs are among the first examples where matrix methods were used in computational social network analysis.


\section{The Reverse Cuthill McKee Ordering [10 points]}

The Reverse Cuthill--McKee (RCM) ordering of the matrix \texttt{A\_SymPosDef} is computed with MATLAB's \texttt{symrcm(\ldots)}, and
the rows and columns of the matrix are permuted accordingly. Figure~\ref{fig:1} shows the spy plots of the original and the reordered matrix.

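To make the reordering step concrete, here is a minimal formulation; the symbols $p$, $B$ and $\beta$ are notation introduced here for illustration and do not appear in the original scripts. If $p$ is the permutation vector returned by \texttt{symrcm}, the reordered matrix is the symmetric permutation
\[
B = A(p, p), \qquad B_{ij} = A_{p_i p_j},
\]
and the ordering heuristically tries to reduce the bandwidth
\[
\beta(B) = \max \{\, |i - j| \;:\; B_{ij} \neq 0 \,\}.
\]
Since the Cholesky factor $R$ with $B = R^T R$ has no nonzero entries outside the band of $B$, a small bandwidth bounds the fill-in: for an $n \times n$ matrix, $R$ has at most $n \, (\beta(B) + 1)$ nonzero elements.
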
\begin{figure}[h]
\centering
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{1_spy_a}
\caption{Spy plot of \texttt{A\_SymPosDef}}
\end{subfigure}
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{1_spy_rcm}
\caption{Spy plot of \texttt{A\_SymPosDef} reordered with \texttt{symrcm(\ldots)}}
\end{subfigure}
\caption{Spy plots of the two matrices}
\label{fig:1}
\end{figure}

The spy plots of the corresponding Cholesky factors are shown in figure~\ref{fig:1chol}.

\begin{figure}[h]
\centering
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{1_spy_chol_a}
\caption{Spy plot of \texttt{chol(A\_SymPosDef)}}
\end{subfigure}
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width = \textwidth]{1_spy_chol_rcm}
\caption{Spy plot of \texttt{chol(A\_SymPosDef(symrcm(A\_SymPosDef), symrcm(A\_SymPosDef)))}}
\end{subfigure}
\caption{Spy plots of the two Cholesky factors}
\label{fig:1chol}
\end{figure}

The number of nonzero elements in the Cholesky factor of the RCM-reordered matrix is significantly lower (roughly one tenth) than in the Cholesky factor of the original matrix. The respective nonzero counts can be found in figure~\ref{fig:1chol}.

\section{Sparse Matrix Factorization [10 points]}

\subsection{Show that $A \in R^{n \times n}$ has exactly $5n - 6$ nonzero elements.}

According to the given description, the only nonzero elements of $A$ are those on the
border of the matrix (rows and columns $1$ and $n$) and those on the main diagonal.
These cells can be counted as the $4$ corner cells of the matrix plus $5$ segments of
length $n - 2$: the interiors of the two border rows, of the two border columns and of
the main diagonal (whose two endpoints are corner cells). Therefore, the number of
nonzero elements is:

\[4 + 5 \cdot (n - 2) = 5n - 6\]

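As a quick sanity check of this count, consider the case $n = 5$, chosen here only as an illustration. Marking each nonzero position with $\times$ and each zero with $0$, the pattern of $A$ is
\[
\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & 0 & 0 & \times \\
\times & 0 & \times & 0 & \times \\
\times & 0 & 0 & \times & \times \\
\times & \times & \times & \times & \times
\end{bmatrix}
\]
which contains $5 + 3 + 3 + 3 + 5 = 19$ nonzero elements, in agreement with $5 \cdot 5 - 6 = 19$.
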
\subsection{Construct this matrix and visualize its non-zero structure.}

The matrix $A \in R^{n \times n}$ has the following structure (zero entries are left
blank):

\[
A := \begin{bmatrix}
n & 1 & 1 & \hdots & 1 \\
1 & n + 1 & & & 1 \\
1 & & n + 2 & & 1 \\
\vdots & & & \ddots & \vdots \\
1 & 1 & 1 & \hdots & 2n - 1
\end{bmatrix}
\]

\subsection{Explain why, for $n = 100000$, using MATLAB's \texttt{chol(\ldots)}
to solve $Ax = b$
for a given right-hand side vector would be problematic.}

Solving $Ax = b$ this way would be very costly, because the Cholesky
decomposition of $A$ (performed with MATLAB's \texttt{chol(\ldots)})
destroys almost all of the sparsity of the matrix in its very first
elimination step. The first row and the first column of $A$ consist, by
definition, only of nonzero elements (the diagonal entry $n$ followed by 1s).
Eliminating the first column subtracts a multiple of the first row from
every other row, so every zero element in the remaining
$(n - 1) \times (n - 1)$ block becomes a (negative) nonzero element. The
Cholesky factor is therefore essentially a dense triangular matrix with about
$n^2 / 2 = 5 \cdot 10^9$ nonzero elements, which needs roughly 40~GB of memory
in double precision, compared to the $5n - 6$ nonzero elements of $A$.

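The fill-in can be made explicit with a short derivation. The block names $v$, $A_{22}$ and $S$ below are notation introduced here for illustration. Partition
\[
A = \begin{bmatrix} n & v^T \\ v & A_{22} \end{bmatrix},
\qquad v = (1, 1, \ldots, 1)^T \in R^{n - 1}.
\]
The first step of the Cholesky factorization leaves the Schur complement
\[
S = A_{22} - \frac{1}{n} \, v v^T,
\qquad S_{ij} = (A_{22})_{ij} - \frac{1}{n},
\]
to be factorized. Every entry of $A_{22}$ is either $0$, $1$ or at least $n + 1$, so no entry of $S$ vanishes: in particular, each zero of $A_{22}$ turns into $-1/n$. Hence $S$ is completely dense, even though $A_{22}$ itself has only $3n - 5$ nonzero elements.
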
\section{Degree Centrality [10 points]}

Assuming that the degree of an author in the Householder graph is the number of
their coauthors, and that an author is not a coauthor of themselves, the degree
centralities of all authors are listed below in descending order.

This output was obtained by running \texttt{ex3.m}.

\begin{verbatim}
Author Centrality: Coauthors...

Golub 31: Wilkinson TChan Varah Overton Ernst VanLoan Saunders Bojanczyk
    Dubrulle George Nachtigal Kahan Varga Kagstrom Widlund
    OLeary Bjorck Eisenstat Zha VanDooren Tang Reichel Luk Fischer
    Gutknecht Heath Plemmons Berry Sameh Meyer Gill
Demmel 15: Edelman VanLoan Bai Schreiber Kahan Kagstrom Barlow
    NHigham Arioli Duff Hammarling Bunch Heath Greenbaum Gragg
Plemmons 13: Golub Nagy Harrod Pan Funderlic Bojanczyk George Barlow Heath
    Berry Sameh Meyer Nichols
Heath 12: Golub TChan Funderlic George Gilbert Eisenstat Ng Liu Laub Plemmons
    Paige Demmel
Schreiber 12: TChan VanLoan Moler Gilbert Pothen NTrefethen Bjorstad NHigham
    Eisenstat Tang Elden Demmel
Hammarling 10: Wilkinson Kaufman Bai Bjorck VanHuffel VanDooren Duff Greenbaum
    Gill Demmel
VanDooren 10: Golub Boley Bojanczyk Kagstrom VanHuffel Luk Hammarling Laub
    Nichols Paige
TChan 10: Golub Saied Ong Kuo Tong Schreiber Arioli Duff Heath Hansen
Gragg 9: Borges Kaufman Harrod Reichel Stewart BunseGerstner Ammar Warner Demmel
Moler 8: Wilkinson VanLoan Gilbert Schreiber Henrici Stewart Bunch Laub
VanLoan 8: Golub Moler Schreiber Kagstrom Luk Bunch Paige Demmel
Paige 7: Anjos VanLoan Saunders Bjorck VanDooren Laub Heath
Gutknecht 7: Golub Ashby Boley NTrefethen Nachtigal Varga Hochbruck
Luk 7: Golub Overton Boley VanLoan Bojanczyk Park VanDooren
Eisenstat 7: Golub Gu George Schreiber Liu Heath Ipsen
George 7: Golub Eisenstat Ng Liu Tang Heath Plemmons
Meyer 6: Golub Benzi Funderlic Stewart Ipsen Plemmons
Bunch 6: LeBorne Fierro VanLoan Moler Stewart Demmel
Stewart 6: Moler Bunch Gragg Meyer Gill Mathias
Reichel 6: Golub NTrefethen Nachtigal Fischer Gragg Ammar
Bjorck 6: Golub Park Duff Hammarling Elden Paige
NTrefethen 6: Schreiber Nachtigal Reichel Gutknecht Greenbaum ATrefethen
Nichols 5: Byers Barlow VanDooren Plemmons BunseGerstner
Greenbaum 5: Cullum Strakos NTrefethen Hammarling Demmel
Ipsen 5: Chandrasekaran Barlow Eisenstat Meyer Jessup
Laub 5: Kenney Moler VanDooren Heath Paige
Duff 5: TChan Bjorck Arioli Hammarling Demmel
Liu 5: George Gilbert Eisenstat Ng Heath
Park 5: Boley Bjorck VanHuffel Luk Elden
Zha 5: Golub Bai Barlow VanHuffel Hansen
Widlund 5: Golub Bjorstad OLeary Smith Szyld
Barlow 5: Zha Ipsen Plemmons Nichols Demmel
Kagstrom 5: Golub VanLoan VanDooren Ruhe Demmel
Varga 5: Golub Marek Young Gutknecht Starke
Gilbert 5: Moler Schreiber Ng Liu Heath
Gill 4: Golub Saunders Hammarling Stewart
Sameh 4: Golub Harrod Plemmons Berry
Berry 4: Golub Harrod Plemmons Sameh
BunseGerstner 4: He Byers Gragg Nichols
Hansen 4: TChan Fierro OLeary Zha
Ng 4: George Gilbert Liu Heath
Arioli 4: TChan MuntheKaas Duff Demmel
VanHuffel 4: Zha Park VanDooren Hammarling
Nachtigal 4: Golub NTrefethen Reichel Gutknecht
Bojanczyk 4: Golub VanDooren Luk Plemmons
Harrod 4: Plemmons Gragg Berry Sameh
Boley 4: Park VanDooren Luk Gutknecht
Wilkinson 4: Golub Dubrulle Moler Hammarling
Ammar 3: He Reichel Gragg
Elden 3: Schreiber Bjorck Park
Fischer 3: Golub Modersitzki Reichel
Tang 3: Golub George Schreiber
NHigham 3: Schreiber Pothen Demmel
OLeary 3: Golub Widlund Hansen
Bjorstad 3: Schreiber Widlund Boman
Kahan 3: Golub Davis Demmel
Bai 3: Zha Hammarling Demmel
Saunders 3: Golub Paige Gill
Funderlic 3: Heath Plemmons Meyer
Kaufman 3: Hammarling Gragg Warner
Starke 2: Varga Hochbruck
Hochbruck 2: Gutknecht Starke
Jessup 2: Crevelli Ipsen
Warner 2: Kaufman Gragg
Ruhe 2: Wold Kagstrom
Szyld 2: Marek Widlund
Young 2: Kincaid Varga
Pothen 2: Schreiber NHigham
Tong 2: TChan Kuo
Kuo 2: TChan Tong
Marek 2: Varga Szyld
Dubrulle 2: Golub Wilkinson
Fierro 2: Bunch Hansen
Byers 2: BunseGerstner Nichols
Overton 2: Golub Luk
He 2: BunseGerstner Ammar
Mathias 1: Stewart
Davis 1: Kahan
ATrefethen 1: NTrefethen
Henrici 1: Moler
Smith 1: Widlund
MuntheKaas 1: Arioli
Boman 1: Bjorstad
Chandrasekaran 1: Ipsen
Wold 1: Ruhe
Ong 1: TChan
Saied 1: TChan
Strakos 1: Greenbaum
Cullum 1: Greenbaum
Edelman 1: Demmel
Pan 1: Plemmons
Nagy 1: Plemmons
Gu 1: Eisenstat
Benzi 1: Meyer
Anjos 1: Paige
Crevelli 1: Jessup
Kincaid 1: Young
Borges 1: Gragg
Ernst 1: Golub
Modersitzki 1: Fischer
LeBorne 1: Bunch
Ashby 1: Gutknecht
Kenney 1: Laub
Varah 1: Golub
\end{verbatim}

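In matrix terms, the degree centralities listed above can be written as follows. This is a minimal formulation, assuming that $A$ is the symmetric $0/1$ adjacency matrix of the coauthor graph with $n$ authors; the symbol $d_i$ is notation introduced here for illustration.
\[
d_i = \sum_{\substack{j = 1 \\ j \neq i}}^{n} A_{ij} = \sum_{j = 1}^{n} A_{ij} - A_{ii}
\]
Subtracting $A_{ii}$ discards a possible self-loop on the diagonal, so that an author is never counted as their own coauthor.
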
\section{The Connectivity of the Coauthors [10 points]}

The indices of the coauthors shared by the author at index $i$ and the
author at index $j$ can be computed by listing the indices of the nonzero
elements of the Schur product (or element-wise product) of $A_{:,i}$ and
$A_{:,j}$ (respectively the $i$-th and $j$-th column vectors of $A$). The set $C$ of indices of the common coauthors can therefore be defined
as:

\[C = \{k \in \{1, \ldots, n\} \;|\; (A_{:,i} \odot A_{:,j})_k = 1\}\]

where $n$ is the number of authors.

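To illustrate this definition, consider a toy graph with four authors; the matrix below is made up for illustration and is not part of the Householder data.
\[
A = \begin{bmatrix}
0 & 1 & 1 & 1 \\
1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 0 & 0
\end{bmatrix},
\qquad
A_{:,1} \odot A_{:,2} =
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\odot
\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]
The only nonzero entry of the product is at index $3$, so for $i = 1$ and $j = 2$ the set of common coauthors is $C = \{3\}$: author $3$ is the only one linked to both author $1$ and author $2$.
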
The results below were computed using the script \texttt{ex4.m}.

The common coauthors of Golub and Moler are Wilkinson and Van Loan.

The common coauthors of Golub and Saunders are Golub, Saunders and Gill.
Golub and Saunders themselves appear in this list most likely because they are
coauthors of each other and the adjacency matrix has ones on its diagonal; Gill
is the only third author they share.

The common coauthors of TChan and Demmel are Schreiber, Arioli, Duff and
Heath.

\section{PageRank of the Coauthor Graph [10 points]}

The PageRank values of all authors were computed using the scripts
\texttt{ex5.m} and \texttt{pagerank.m}, the latter being an essentially unchanged version of
\texttt{pagerank.m} from Mini Project 1. The output is shown below.

\begin{verbatim}
     page-rank   in  out  author
  1     0.0511   32   32  Golub
104     0.0261   16   16  Demmel
 86     0.0229   14   14  Plemmons
 44     0.0212   13   13  Schreiber
  3     0.0201   11   11  TChan
 81     0.0198   13   13  Heath
 90     0.0181   10   10  Gragg
 74     0.0177   11   11  Hammarling
 66     0.0171   11   11  VanDooren
 42     0.0152    9    9  Moler
 79     0.0151    8    8  Gutknecht
 32     0.0142    9    9  VanLoan
 59     0.0135    8    8  Eisenstat
 98     0.0133    8    8  Paige
 46     0.0130    7    7  NTrefethen
 49     0.0129    6    6  Varga
 96     0.0128    7    7  Meyer
 77     0.0128    7    7  Stewart
 73     0.0127    8    8  Luk
 78     0.0127    7    7  Bunch
 53     0.0127    6    6  Widlund
 72     0.0125    7    7  Reichel
 41     0.0125    8    8  George
 82     0.0124    6    6  Ipsen
 83     0.0122    6    6  Greenbaum
 58     0.0113    7    7  Bjorck
 97     0.0107    6    6  Nichols
 51     0.0106    6    6  Kagstrom
 80     0.0106    6    6  Laub
 52     0.0104    6    6  Barlow
 60     0.0103    6    6  Zha
 69     0.0102    6    6  Duff
 62     0.0100    6    6  Park
 89     0.0099    5    5  BunseGerstner
 63     0.0098    5    5  Arioli
 43     0.0097    6    6  Gilbert
 67     0.0096    6    6  Liu
 87     0.0096    5    5  Hansen
 47     0.0090    5    5  Nachtigal
 54     0.0090    4    4  Bjorstad
  2     0.0088    5    5  Wilkinson
 23     0.0088    5    5  Harrod
 99     0.0087    5    5  Gill
 92     0.0086    5    5  Sameh
 91     0.0086    5    5  Berry
 15     0.0086    5    5  Boley
 76     0.0085    4    4  Fischer
 50     0.0085    3    3  Young
 61     0.0084    5    5  VanHuffel
100     0.0084    3    3  Jessup
 48     0.0083    4    4  Kahan
 35     0.0083    5    5  Bojanczyk
 65     0.0082    5    5  Ng
 93     0.0082    4    4  Ammar
 55     0.0079    4    4  OLeary
 84     0.0079    3    3  Ruhe
 19     0.0078    4    4  Kaufman
 56     0.0076    4    4  NHigham
 37     0.0075    3    3  Marek
 75     0.0075    3    3  Szyld
103     0.0074    3    3  Starke
 34     0.0072    4    4  Saunders
 25     0.0072    4    4  Funderlic
 39     0.0072    4    4  Bai
102     0.0072    3    3  Hochbruck
 88     0.0071    4    4  Elden
 71     0.0070    4    4  Tang
 38     0.0069    3    3  Kuo
 40     0.0069    3    3  Tong
  4     0.0068    3    3  He
 13     0.0067    2    2  Kincaid
 14     0.0067    2    2  Crevelli
 94     0.0065    3    3  Warner
 17     0.0065    3    3  Byers
 21     0.0064    3    3  Fierro
 31     0.0064    2    2  Wold
 45     0.0062    3    3  Pothen
 36     0.0060    3    3  Dubrulle
 57     0.0058    2    2  Boman
 10     0.0058    3    3  Overton
  9     0.0057    2    2  Modersitzki
 68     0.0056    2    2  Smith
 95     0.0056    2    2  Davis
 33     0.0056    2    2  Chandrasekaran
 27     0.0055    2    2  Cullum
 28     0.0055    2    2  Strakos
 64     0.0054    2    2  MuntheKaas
  7     0.0053    2    2  Ashby
 85     0.0053    2    2  ATrefethen
 29     0.0052    2    2  Saied
 30     0.0052    2    2  Ong
 18     0.0052    2    2  Benzi
101     0.0052    2    2  Mathias
  8     0.0052    2    2  LeBorne
 12     0.0052    2    2  Borges
  6     0.0051    2    2  Kenney
 70     0.0050    2    2  Henrici
\end{verbatim}

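For context, the PageRank values tabulated above can be characterised as the solution of a sparse linear system. This is only a sketch, under the assumption that \texttt{pagerank.m} follows the usual formulation with a damping factor $p$ (commonly $p = 0.85$; the value actually used is not stated in this report). Let $G$ be the connectivity matrix of the graph with $n$ nodes, $c_j = \sum_i G_{ij}$ the out-degree of node $j$, $D = \mathrm{diag}(1 / c_j)$ and $e = (1, \ldots, 1)^T$. If every node has at least one outgoing link, as is the case in the table above, the PageRank vector $x$ with $e^T x = 1$ satisfies
\[
x = p \, G D x + \frac{1 - p}{n} \, e
\quad \Longleftrightarrow \quad
(I - p \, G D) \, x = \frac{1 - p}{n} \, e
\]
The matrix $I - p \, G D$ has the same sparsity pattern as $G$ plus the diagonal, which is why sparse linear algebra is the natural tool for this computation.
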
\section{Zachary's karate club: social network of friendships between 34 members [50 points]}

\subsection{Write a MATLAB code that ranks the five nodes with the largest
degree centrality. What are their degrees?}

The results in this section can be reproduced by running \texttt{ex6.m}.

The top 5 nodes by degree centrality are listed below, together with their degree and their
neighbours:

\begin{verbatim}
Node Degree: Neighbours...
34 16: 9, 10, 14, 15, 16, 19, 20, 21, 23, 24, 27, 28, 29, 30, 31, 32, 33,
1 15: 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 20, 22, 32,
33 11: 3, 9, 15, 16, 19, 21, 23, 24, 30, 31, 32, 34,
3 9: 1, 2, 4, 8, 9, 10, 14, 28, 29, 33,
2 8: 1, 3, 4, 8, 14, 18, 20, 22, 31,
\end{verbatim}

\subsection{Rank the five nodes with the largest eigenvector centrality. What are
their (properly normalized) eigenvector centralities?}

The results in this section can be reproduced by running \texttt{ex6.m}.

The top 5 nodes by eigenvector centrality, computed here as PageRank values (\texttt{page-rank} column),
are listed below:

\begin{verbatim}
     page-rank   in  out  author
 34     0.1009   17   17  34
  1     0.0970   16   16  1
 33     0.0717   12   12  33
  3     0.0571   10   10  3
  2     0.0529    9    9  2
\end{verbatim}

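For reference, eigenvector centrality in the strict sense is defined through the dominant eigenpair of the adjacency matrix. The symbols $x$ and $\lambda_{\max}$ below are notation introduced here for illustration, and $A$ is assumed to be the symmetric adjacency matrix of the connected karate club graph with $n = 34$ nodes:
\[
A x = \lambda_{\max} \, x, \qquad x_i \geq 0, \qquad \sum_{i = 1}^{n} x_i = 1
\]
Assuming the \texttt{page-rank} column above was produced by the same \texttt{pagerank.m} script used for the coauthor graph, the reported values are instead the dominant eigenvector of the damped, column-normalised matrix. They should therefore be read as a closely related variant of this quantity rather than as the eigenvector of $A$ itself.
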
\subsection{Are the rankings in (a) and (b) identical? Give a brief verbal
explanation of the similarities and differences.}

The two rankings are identical. However, if we rescale the degree
centralities so that the largest one equals the largest eigenvector centrality, we obtain slightly different
values ($[0.1009, 0.0946, 0.0694, 0.0568, 0.0505]$) w.r.t.\ the actual eigenvector
centralities.

The identical rankings may be explained by the fact that, by computing the
eigenvector centrality, we are effectively applying PageRank to a symmetric
matrix, i.e.\ to a graph with bidirectional links. Since every link is
bidirectional, each node passes importance along exactly the edges through
which it receives it. This makes it unlikely for a node to obtain a
high PageRank thanks to connections with few, but very ``important'', nodes.
For an undirected graph, the undamped random walk underlying PageRank has a
stationary distribution that is exactly proportional to the node degrees, so
PageRank essentially reduces to a prioritization of nodes with many
edges, i.e.\ to the degree centrality ranking. The small differences in the
values come from the damping term.

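The link between PageRank and degree on an undirected graph can be made precise in the undamped case. As a sketch, with notation introduced here for illustration: let $A$ be the symmetric adjacency matrix without self-loops, $d_i = \sum_j A_{ij} > 0$ the degree of node $i$, $m$ the number of edges (so that $\sum_i d_i = 2m$) and $D = \mathrm{diag}(1 / d_j)$. The vector $\pi$ with entries $\pi_i = d_i / (2m)$ then satisfies
\[
(A D \pi)_i = \sum_{j} A_{ij} \, \frac{1}{d_j} \, \frac{d_j}{2m}
= \frac{1}{2m} \sum_{j} A_{ij}
= \frac{d_i}{2m}
= \pi_i
\]
so $\pi$ is a fixed point of the undamped iteration and sums to one. Without damping, the ranking by PageRank and the ranking by degree therefore coincide exactly; with damping, the values are perturbed but the ordering of well-separated nodes is typically preserved.
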
\subsection{Use spectral graph partitioning to find a near-optimal split of the
network into two groups of 16 and 18 nodes, respectively. List the nodes in the
two groups. How does spectral bisection compare to the real split observed by
Zachary?}

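As a sketch of the standard spectral bisection construction, under the assumption that $A$ is the symmetric adjacency matrix of the karate club graph with node degrees $d_1, \ldots, d_n$ (the symbols $\Delta$, $L$, $\lambda_k$ and $v_2$ are notation introduced here for illustration), the graph Laplacian is
\[
L = \Delta - A, \qquad \Delta = \mathrm{diag}(d_1, \ldots, d_n)
\]
with eigenvalues $0 = \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n$. The eigenvector $v_2$ belonging to $\lambda_2$ (the Fiedler vector) satisfies
\[
L v_2 = \lambda_2 v_2, \qquad e^T v_2 = 0, \qquad e = (1, \ldots, 1)^T
\]
Sorting the entries of $v_2$ and assigning the nodes with the $16$ smallest entries to one group and the remaining $18$ nodes to the other yields a split of the requested sizes.
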
\begin{thebibliography}{99}
\bibitem{karate} The social network of a karate club at a US university, M.~E.~J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004),
pp. 219-229.
\end{thebibliography}


\end{document}