\documentclass{scrartcl}
\usepackage{pdfpages}
\usepackage[utf8]{inputenc}
\usepackage{float}
\usepackage{graphicx}
\usepackage[ruled,vlined]{algorithm2e}
\usepackage{subcaption}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\usetikzlibrary{plotmarks}
\usetikzlibrary{arrows.meta}
\usepgfplotslibrary{patchplots}
\usepackage{grffile}
\usepgfplotslibrary{external}
\tikzexternalize
\usepackage[margin=2.5cm]{geometry}

% To compile:
% sed -i 's#title style={font=\\bfseries#title style={yshift=1ex, font=\\tiny\\bfseries#' *.tex
% luatex -enable-write18 -shellescape main.tex

\pgfplotsset{every x tick label/.append style={font=\tiny, yshift=0.5ex}}
\pgfplotsset{every title/.append style={font=\tiny, align=center}}
\pgfplotsset{every y tick label/.append style={font=\tiny, xshift=0.5ex}}
\pgfplotsset{every z tick label/.append style={font=\tiny, xshift=0.5ex}}

\setlength{\parindent}{0cm}
\setlength{\parskip}{0.5\baselineskip}

\title{Optimization methods -- Homework 3}
\author{Claudio Maggioni}

\begin{document}

\maketitle

\section{Exercise 1}

\subsection{Exercise 1.1}

Please consult the MATLAB implementation in the files \texttt{Newton.m},
\texttt{GD.m}, and \texttt{backtracking.m}. Please note that, for this and
subsequent exercises, the gradient descent method without backtracking
uses a fixed step size $\alpha=1$, despite the indications on the assignment
sheet. This was done in order to comply with the forum post on iCorsi found
here: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.
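
For reference, the three routines can be summarised by the following standard
textbook update rules. This is only a sketch: the symbols $p_k$, $c$ and
$\tau$ are notation assumed here, and the exact constants and stopping
criteria used in the MATLAB files are not restated in this report.
\begin{align*}
  \text{Gradient descent:} \quad & p_k = -\nabla f(x_k), \\
  \text{Newton:} \quad & \nabla^2 f(x_k)\, p_k = -\nabla f(x_k), \\
  \text{Update:} \quad & x_{k+1} = x_k + \alpha_k p_k.
\end{align*}
Without backtracking, $\alpha_k = 1$ is used at every iteration. With
backtracking, $\alpha_k$ is typically initialised to $1$ and multiplied by a
factor $\tau \in (0,1)$ until the sufficient decrease (Armijo) condition
\[
  f(x_k + \alpha_k p_k) \leq f(x_k) + c\,\alpha_k \nabla f(x_k)^T p_k,
  \qquad c \in (0,1),
\]
holds.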

\subsection{Exercise 1.2}

Please consult the MATLAB implementation in the file \texttt{main.m} in section
1.2.

\subsection{Exercise 1.3}

Please find the requested plots in figure \ref{fig:1}. The code used to
generate these plots can be found in section 1.3 of \texttt{main.m}.

\begin{figure}[h]
  \begin{subfigure}{0.5\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{ex1-3.jpg}}
    \caption{Zoomed plot on $x \in (-1,1)$ and $y \in (-1,1)$}
  \end{subfigure}
  \begin{subfigure}{0.5\textwidth}
    \resizebox{\textwidth}{\textwidth}{\input{ex1-3-gd}}
    \caption{Complete plot}
  \end{subfigure}
  \caption{Steps in the energy landscape for Newton and GD methods}\label{fig:1}
\end{figure}

\subsection{Exercise 1.4}

Please find the requested plots in figure \ref{fig:gsppn}. The code used to
generate these plots can be found in section 1.4 of \texttt{main.m}.

\begin{figure}[h]
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-grad-nonlog.jpg}}
    \caption{Gradient norms \\(zoomed, y axis is linear for this plot)}
  \end{subfigure}
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys-nonlog.jpg}}
    \caption{Objective function values \\(zoomed, y axis is linear for this plot)}
  \end{subfigure}
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-grad.jpg}}
    \caption{Gradient norms (zoomed)}
  \end{subfigure}
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys.jpg}}
    \caption{Objective function values (zoomed)}
  \end{subfigure}
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-grad-large.jpg}}
    \caption{Gradient norms}
  \end{subfigure}
  \begin{subfigure}{0.45\textwidth}
    \centering
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys-large.jpg}}
    \caption{Objective function values}
  \end{subfigure}
  \caption{Gradient norms and objective function values (y-axes) w.r.t.\
    iteration numbers (x-axis) for Newton and GD methods (y-axis is log
    scaled unless stated otherwise, points at $y=0$ not shown due to log
    scale)}\label{fig:gsppn}
\end{figure}

\subsection{Exercise 1.5}

The best performing method for this particular set of input data is the Newton
method without backtracking, since it converges in only 2 iterations. The
second best is the Newton method with backtracking, which converges in 12
iterations. Gradient descent with backtracking converges slowly, requiring
21102 iterations, while with a fixed $\alpha=1$ the method diverges within
about a dozen iterations, resulting in catastrophic numerical instability that
leads to \texttt{x\_k = [NaN; NaN]}.

Analyzing the movement in the energy landscape (figure \ref{fig:1}), the most
``coordinated'' method in terms of the direction of its iteration steps appears
to be the Newton method with backtracking. Apart from performing more
iterations than the classical variant, the method keeps its iteration
directions at approximately a 45 degree angle to the $x$ axis, roughly always
pointing at the minimizer $x^*$. However, this movement strategy is clearly
inefficient compared with the 2-step convergence achieved by Newton without
backtracking, which follows a conjugate-gradient-like path, finding one
component of the true minimizer in each step. GD with backtracking instead
follows an inefficient zig-zagging pattern, with iterates in the vicinity of
those of Newton with backtracking. Finally, GD without backtracking quickly
degenerates, as can be seen in the enlarged plot.

When looking at gradient norms and objective function values (figure
\ref{fig:gsppn}) over the iterations, the degeneration of GD without
backtracking and the inefficiency of GD with backtracking can clearly be seen.
Newton with backtracking offers fairly smooth gradient norm and objective value
curves, with an exponentially decreasing slope in both for the last 5--10
iterations. Newton without backtracking instead shoots up at the first
iteration to a gradient norm of $\|\nabla f(x_1)\| \approx 450$ and an
objective value of $f(x_1) \approx 100$, but both values quickly decrease to 0
at its second iteration, achieving convergence.
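
The two Newton steps can be reproduced by hand under an assumption that is not
restated in this report: that the objective is the Rosenbrock function
$f(x,y) = (1-x)^2 + 100(y-x^2)^2$ and that the starting point is
$x_0 = (0,0)^T$. Both are guesses, chosen only because they are consistent
with the values quoted above. Under this assumption,
\[
  \nabla f(x,y) =
  \begin{pmatrix} -2(1-x) - 400x(y-x^2) \\ 200(y-x^2) \end{pmatrix},
  \qquad
  \nabla^2 f(x,y) =
  \begin{pmatrix} 2 - 400y + 1200x^2 & -400x \\ -400x & 200 \end{pmatrix}.
\]
At $x_0$ we have $\nabla f(x_0) = (-2, 0)^T$ and
$\nabla^2 f(x_0) = \operatorname{diag}(2, 200)$, so the full Newton step is
$p_0 = (1, 0)^T$ and $x_1 = (1, 0)^T$. There $f(x_1) = 100$ and
$\nabla f(x_1) = (400, -200)^T$, so that
$\|\nabla f(x_1)\| = \sqrt{400^2 + 200^2} \approx 447$. The second step solves
\[
  \begin{pmatrix} 1202 & -400 \\ -400 & 200 \end{pmatrix} p_1 =
  \begin{pmatrix} -400 \\ 200 \end{pmatrix}
  \quad \Longrightarrow \quad p_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix},
\]
which gives $x_2 = (1, 1)^T$, where $f(x_2) = 0$ and $\nabla f(x_2) = 0$. Each
step therefore fixes one coordinate of the minimizer. Under the same
assumptions, and assuming the plain update $x_{k+1} = x_k - \alpha \nabla
f(x_k)$, the first gradient descent iterate with $\alpha = 1$ is
$(2, 0)^T$, where $f = 1601$, which illustrates how the fixed step overshoots.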

\section{Exercise 2}

\subsection{Exercise 2.1}

Please consult the MATLAB implementation in the file \texttt{BGFS.m}.

\subsection{Exercise 2.2}

Please consult the MATLAB implementation in the file \texttt{main.m} in section
2.2.

\subsection{Exercise 2.3}

Please find the requested plots in figure \ref{fig:3}. The code used to
generate these plots can be found in section 2.3 of \texttt{main.m}.

\begin{figure}[h]
  \centering
  \resizebox{.6\textwidth}{.6\textwidth}{\input{ex2-3}}
  \caption{Steps in the energy landscape for BFGS method}\label{fig:3}
\end{figure}

\subsection{Exercise 2.4}

Please find the requested plots in figure \ref{fig:4}. The code used to
generate these plots can be found in section 2.4 of \texttt{main.m}.

\begin{figure}[h]
  \begin{subfigure}{0.5\textwidth}
    \resizebox{\textwidth}{\textwidth}{\input{ex2-4-grad}}
    \caption{Gradient norms}
  \end{subfigure}
  \begin{subfigure}{0.5\textwidth}
    \resizebox{\textwidth}{\textwidth}{\input{ex2-4-ys}}
    \caption{Objective function values}
  \end{subfigure}
  \caption{Gradient norms and objective function values (y-axes) w.r.t.\
    iteration numbers (x-axis) for BFGS method (y-axis is log scaled,
    points at $y=0$ not shown due to log scale)}\label{fig:4}
\end{figure}

\subsection{Exercise 2.5}

The following table summarizes the number of iterations required by each method to achieve convergence:

\begin{center}
  \begin{tabular}{c|c|c}
    \textbf{Method} & \textbf{Backtracking} & \textbf{\# of iterations} \\\hline
    Newton & No & 2 \\
    Newton & Yes & 12 \\
    BFGS & Yes & 26 \\
    Gradient descent & Yes & 21102 \\
    Gradient descent & No ($\alpha = 1$) & Diverges after 6 \\
  \end{tabular}
\end{center}

From the table above we can see that the BFGS method is in the same order of
magnitude of performance as the Newton method, although the number of
iterations it requires to converge (26) is more than double that of Newton
with backtracking (12) and more than ten times that of Newton without
backtracking (2).

From the iterates plot and the gradient norm and objective function value
plots (figures \ref{fig:3} and \ref{fig:4} respectively) we can see that BFGS
behaves similarly to the Newton method with backtracking, loosely following
its curves. The only noteworthy difference lies in the energy landscape plot,
where BFGS occasionally ``steps back'', performing iterations in the direction
opposite to the minimizer. This behaviour can also be observed in the plots in
figure \ref{fig:4}, where several bumps and spikes are present in the gradient
norm plot and small plateaus can be found in the objective function value plot.
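
For context, the standard textbook form of the BFGS update is sketched below.
The notation ($s_k$, $y_k$, $\rho_k$, $H_k$) is assumed here, and whether the
MATLAB implementation stores the inverse Hessian approximation $H_k$ or the
direct approximation of the Hessian is not stated in this report.
\[
  s_k = x_{k+1} - x_k, \qquad
  y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \qquad
  \rho_k = \frac{1}{y_k^T s_k},
\]
\[
  H_{k+1} = \left(I - \rho_k s_k y_k^T\right) H_k
            \left(I - \rho_k y_k s_k^T\right) + \rho_k s_k s_k^T,
  \qquad p_k = -H_k \nabla f(x_k).
\]
Since $H_k$ is built only from past gradient differences rather than from the
true Hessian, the directions $p_k$ can be poorer than the Newton direction,
especially in early iterations. This is one plausible explanation, not a
verified one, for the bumps and plateaus described above.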
\end{document}