250 lines
10 KiB
TeX
Executable file
250 lines
10 KiB
TeX
Executable file
\documentclass{scrartcl}
|
|
\usepackage[utf8]{inputenc}
|
|
\usepackage{float}
|
|
\usepackage{graphicx}
|
|
\usepackage[ruled,vlined]{algorithm2e}
|
|
\usepackage{subcaption}
|
|
\usepackage{amsmath}
|
|
\usepackage{pgfplots}
|
|
\pgfplotsset{compat=newest}
|
|
\usetikzlibrary{plotmarks}
|
|
\usetikzlibrary{arrows.meta}
|
|
\usepgfplotslibrary{patchplots}
|
|
\usepackage{grffile}
|
|
\usepackage{amsmath}
|
|
\usepackage{subcaption}
|
|
\usepgfplotslibrary{external}
|
|
\tikzexternalize
|
|
\usepackage[margin=2.5cm]{geometry}
|
|
|
|
% To compile:
|
|
% sed -i 's#title style={font=\\bfseries#title style={yshift=1ex, font=\\tiny\\bfseries#' *.tex
|
|
% luatex -enable-write18 -shellescape main.tex
|
|
|
|
\pgfplotsset{every x tick label/.append style={font=\tiny, yshift=0.5ex}}
|
|
\pgfplotsset{every title/.append style={font=\tiny, align=center}}
|
|
\pgfplotsset{every y tick label/.append style={font=\tiny, xshift=0.5ex}}
|
|
\pgfplotsset{every z tick label/.append style={font=\tiny, xshift=0.5ex}}
|
|
|
|
\setlength{\parindent}{0cm}
|
|
\setlength{\parskip}{0.5\baselineskip}
|
|
|
|
\title{Optimization methods -- Homework 2}
|
|
\author{Claudio Maggioni}
|
|
|
|
\begin{document}
|
|
|
|
\maketitle
|
|
|
|
\section{Exercise 1}
|
|
|
|
\subsection{Implement the matrix $A$ and the vector $b$, for the moment, without taking into consideration the
|
|
boundary conditions. As you can see, the matrix $A$ is not symmetric. Does an energy function of
|
|
the problem exist? Consider $N = 4$ and show your answer, explaining why it can or cannot exist.}
|
|
|
|
The implementation of a function that generates $A$ w.r.t. the parameter $N$ can be found in the MATLAB script
|
|
\texttt{main.m} under Section 1.1.
|
|
|
|
The matrix $A$ and the vector $b$ appear in the following form:
|
|
|
|
\[A = \begin{bmatrix}
|
|
1 & 0 & 0 & 0 & \ldots & 0 \\
|
|
-1 & 2 & -1 & 0 & \ldots & 0 \\
|
|
0 & -1 & 2 & -1 & \ldots & 0 \\
|
|
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
|
|
0 & 0 & 0 & 0 & \ldots & 1 \\
|
|
\end{bmatrix}\;\;\;b = \begin{bmatrix}0\\h^2\\h^2\\\vdots\\0\end{bmatrix}\]
|
|
|
|
For $N = 4$, we have the following $A$ and $b$:
|
|
|
|
\[A = \begin{bmatrix}
|
|
1 & 0 & 0 & 0 \\
|
|
-1 & 2 & -1 & 0 \\
|
|
0 & -1 & 2 & -1 \\
|
|
0 & 0 & 0 & 1 \\
|
|
\end{bmatrix}\;\;\;b = \begin{bmatrix}0\\\frac19\\\frac19\\0\end{bmatrix}\]
|
|
|
|
In order to solve the minimization problem we need to minimize the energy function:
|
|
|
|
\[\phi(x) = \frac12 x^T A x - b^T x\]
|
|
|
|
Computing the gradient of the minimizer, and considering that $A$ is clearly not symmetric as shown above, we find:
|
|
|
|
\[\Delta \phi(x) = \frac12A^T x + \frac12Ax - b\]
|
|
|
|
If $A$ would be symmetric, $\Delta \phi(x)$ is equal to $Ax - b$ and therefore the semantic equivalence between this energy function and the solution of $Ax = b$ is straightforward. However, if $A$ is not symmetric then this does not hold and an enegry function therefore does not exist.
|
|
|
|
\subsection{Once the new matrix has been derived, write the energy function related to the new problem
|
|
and the corresponding gradient and Hessian.}
|
|
|
|
As by the definitions of $A$ and $b$ given in the assignment, we already enforce $x_1 = x_n = 0$, since the first and the last term of the matrix-vector product $Ax$ are $x_1$ and $x_n$ respectively and the system of equations represented by $Ax = b$ would indeed include the equalities $x_1 = b_1 = 0$ and $x_n = b_n = 0$. Therefore, we can simply alter the matrix $A$ without any need to perform any other transformation in the system.
|
|
|
|
We therefore define $\overline{A}$ as a copy of $A$ where $\overline{A}_{2,1} = \overline{A}_{n-1, n} = 0$.
|
|
|
|
The objective function then becomes $\phi(x) = \frac12 x^T \overline{A} x - b^T x$.
|
|
And since the objective is a standard quadratic form, the gradient is $\overline{A}x - b$ while the Hessian is $\overline{A}$.
|
|
|
|
\subsection{Write the Conjugate Gradient algorithm in the pdf and implement it Matlab code in a function
|
|
called \texttt{CGSolve}.}
|
|
|
|
The Conjugate Gradient algorithm is the following:
|
|
|
|
\begin{algorithm}[H]
|
|
\SetAlgoLined
|
|
Set $r_0 \gets Ax_0 - b$, $p_0 \gets r_0$, $k \gets 0$\;
|
|
\While{$r_k \neq 0$}{%
|
|
$\alpha_k \gets \frac{r^T_kr_k}{p^T_kAp_k}$\;
|
|
$x_{k+1} \gets x_k + \alpha_k p_k$\;
|
|
$r_{k+1} \gets r_k + \alpha_k A p_k$\;
|
|
$\beta_{k+1} \gets \frac{r^T_{k+1}r_{k+1}}{r^T_kr_k}$\;
|
|
$p_{k+1} \gets -r_{k+1} + \beta_{k+1}p_k$\;
|
|
$k \gets k + 1$\;
|
|
}
|
|
\caption{Conjugate Gradient algorithm}
|
|
\end{algorithm}
|
|
|
|
The MATLAB solution of this task can be found in Section 1.3 of the script \texttt{main.m} under the function \texttt{CGSolve}.
|
|
|
|
\subsection{Solve the Poisson problem.}
|
|
|
|
The solution of this task can be found in Section 1.4 of the script \texttt{main.m}.
|
|
Due to space constraints, the $R^{1000}$ column vector for the solution of $x$ is not shown here but can be easily derived by running the script and checking the variable \texttt{x} after the script execution.
|
|
|
|
\subsection{Plot the value of energy function and the norm of the gradient (here,
|
|
use semilogy) as functions of the iterations.}
|
|
|
|
The code to generate the plots below can be found in Section 1.5 of the script \texttt{main.m}.
|
|
|
|
\begin{figure}[H]
|
|
\begin{subfigure}{0.5\textwidth}
|
|
\resizebox{\textwidth}{\textwidth}{\input{obvalues}}
|
|
\caption{Objective function values w.r.t. iteration number}
|
|
\end{subfigure}
|
|
\begin{subfigure}{0.5\textwidth}
|
|
\resizebox{\textwidth}{\textwidth}{\input{gnorms}}
|
|
\caption{Norm of the gradient w.r.t. iteration number \\ (y-axis is log scaled)}
|
|
\end{subfigure}
|
|
\caption{Plots for Exercise 1.4.}
|
|
\end{figure}
|
|
|
|
\subsection{Finally, explain why the Conjugate Gradient method is a Krylov subspace method.}
|
|
|
|
We can say that the Conjugate Gradient method is a Krylov subspace method since we can say that both all residuals $r_i$ and all search directions $p_i$ are contained in the span of $0$ to $k$ repeated applications of matrix transformer $A$ onto the initial search direction $p_0$. Broadly speaking, this property is directly connected with the fact that the CG methods performs up to $n$ steps exactly for a $n$ by $n$ matrix, walking steps that are all $A$-orthogonal with each other (i.e. for all couples of search steps $p_i$, $p_j$ where $i \neq j$, $\langle Ap_i, p_j \rangle = 0$).
|
|
To sum up our claim, we can say CG is indeed a Krylov subspace method because:
|
|
|
|
\[\text{span}\{r_0, r_1, \ldots, r_n\} = \text{span}\{r_0, A r_0, \ldots, A^k r_0\}\]
|
|
\[\text{span}\{p_0, p_1, \ldots, p_n\} = \text{span}\{r_0, A r_0, \ldots, A^k r_0\}\]
|
|
|
|
These statements have been already proven in class and the proof can be found in Theorem 5.3 of Nocedal's book.
|
|
|
|
\section{Exercise 2}
|
|
|
|
Consider the linear system $Ax = b$, where the matrix $A$ is constructed in three different ways:
|
|
|
|
\begin{itemize}
|
|
\item $A_1 =$ diag([1:10])
|
|
\item $A_2 =$ diag(ones(1,10))
|
|
\item $A_3 =$ diag([1, 1, 1, 3, 4, 5, 5, 5, 10, 10])
|
|
\item $A_4 =$ diag([1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0])
|
|
\end{itemize}
|
|
|
|
\subsection{How many distinct eigenvalues has each matrix?}
|
|
|
|
Each matrix has a distinct number of eigenvalues equal to the number of distinct
|
|
elements on its diagonal. So, in order, each A has respectively 10, 1, 5, and 10 distinct eigenvalues.
|
|
|
|
\subsection{Construct a right-hand side $b=$rand(10,1) and apply the
|
|
Conjugate Gradient method to solve the system for each $A$.}
|
|
|
|
The solution of this task can be found in section 2.2 of the \texttt{main.m} MATLAB script.
|
|
Below are the chosen $b$ vector (which is unchanged between executions due to fixing the random generator seed in the script) and the solutions of $x$ for each matrix respectively.
|
|
|
|
\[b = \begin{bmatrix}0.814723686393179\\
|
|
0.905791937075619\\
|
|
0.126986816293506\\
|
|
0.913375856139019\\
|
|
0.632359246225410\\
|
|
0.0975404049994095\\
|
|
0.278498218867048\\
|
|
0.546881519204984\\
|
|
0.957506835434298\\
|
|
0.964888535199277\end{bmatrix}\;
|
|
x_1 = \begin{bmatrix}0.814723686393179\\
|
|
0.452895968537810\\
|
|
0.0423289387645020\\
|
|
0.228343964034755\\
|
|
0.126471849245082\\
|
|
0.0162567341665680\\
|
|
0.0397854598381500\\
|
|
0.0683601899006230\\
|
|
0.106389648381589\\
|
|
0.0964888535199280\end{bmatrix}\;x_2 = \begin{bmatrix}0.814723686393179\\
|
|
0.905791937075619\\
|
|
0.126986816293506\\
|
|
0.913375856139019\\
|
|
0.632359246225410\\
|
|
0.0975404049994095\\
|
|
0.278498218867048\\
|
|
0.546881519204984\\
|
|
0.957506835434298\\
|
|
0.964888535199277\end{bmatrix}\]\[x_3 = \begin{bmatrix}0.814723686393179\\
|
|
0.905791937075619\\
|
|
0.126986816293506\\
|
|
0.304458618713007\\
|
|
0.158089811556352\\
|
|
0.0195080809998819\\
|
|
0.0556996437734097\\
|
|
0.109376303840997\\
|
|
0.0957506835434296\\
|
|
0.0964888535199280\end{bmatrix}\;x_4 = \begin{bmatrix}0.740657896716011\\
|
|
0.754826614267239\\
|
|
0.0976821653904972\\
|
|
0.652411326111541\\
|
|
0.421572830214428\\
|
|
0.0609627567866090\\
|
|
0.163822480881755\\
|
|
0.303823066390868\\
|
|
0.503950965995613\\
|
|
0.482444267601989\end{bmatrix}\]
|
|
|
|
\subsection{Compute the logarithm energy norm of the error for each matrix
|
|
and plot it with respect to the number of iteration.}
|
|
|
|
The code to generate the plots below and to compute the logarithm energy norm of the error can be found in section 2.3 of the \texttt{main.m} MATLAB script.
|
|
|
|
\begin{figure}[H]
|
|
\begin{subfigure}{0.5\textwidth}
|
|
\resizebox{\textwidth}{!}{\input{a1}}
|
|
\caption{First matrix}
|
|
\end{subfigure}
|
|
\begin{subfigure}{0.5\textwidth}
|
|
\resizebox{\textwidth}{!}{\input{a2}}
|
|
\caption{Second matrix}
|
|
\end{subfigure}
|
|
\begin{subfigure}{0.5\textwidth}
|
|
\resizebox{\textwidth}{!}{\input{a3}}
|
|
\caption{Third matrix}
|
|
\end{subfigure}
|
|
\begin{subfigure}{0.5\textwidth}
|
|
\resizebox{\textwidth}{!}{\input{a4}}
|
|
\caption{Fourth matrix}
|
|
\end{subfigure}
|
|
\caption{Plots of logarithm energy norm of the error per iteration. Minus infinity logarithms not shown in the plot.}
|
|
\end{figure}
|
|
|
|
\subsection{Comment on the convergence of the method for the different matrices. What can you say observing
|
|
the number of iterations obtained and the number of clusters of the eigenvalues of the related
|
|
matrix?}
|
|
|
|
The method converges quickly for each matrix. The fastest convergence surely happens for $A2$, which is
|
|
the identity matrix and therefore makes the $Ax = b$ problem trivial.
|
|
|
|
For all the other matrices, we observe the energy norm of the error decreasing exponentially as the iterations
|
|
increase, eventually reaching $0$ for the cases where the method converges exactly (namely on matrices $A1$ and $A3$).
|
|
|
|
Other than for the fourth matrix, the number of iterations is exactly equal
|
|
to the number of distinct eigenvalues for the matrix. That exception on the fourth matrix is simply due to the
|
|
tolerance termination condition holding true for an earlier iteration, i.e. we terminate early since we find an
|
|
approximation of $x$ with residual norm below $10^{-8}$.
|
|
|
|
\end{document}
|