\documentclass{scrartcl}
\usepackage{pdfpages}
\usepackage[utf8]{inputenc}
\usepackage{float}
\usepackage{graphicx}
\usepackage[ruled,vlined]{algorithm2e}
\usepackage{subcaption}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\usetikzlibrary{plotmarks}
\usetikzlibrary{arrows.meta}
\usepgfplotslibrary{patchplots}
\usepackage{grffile}
\usepgfplotslibrary{external}
\tikzexternalize
\usepackage[margin=2.5cm]{geometry}

% To compile:
% sed -i 's#title style={font=\\bfseries#title style={yshift=1ex, font=\\tiny\\bfseries#' *.tex
% luatex -enable-write18 -shellescape main.tex

\pgfplotsset{every x tick label/.append style={font=\tiny, yshift=0.5ex}}
\pgfplotsset{every title/.append style={font=\tiny, align=center}}
\pgfplotsset{every y tick label/.append style={font=\tiny, xshift=0.5ex}}
\pgfplotsset{every z tick label/.append style={font=\tiny, xshift=0.5ex}}

\setlength{\parindent}{0cm}
\setlength{\parskip}{0.5\baselineskip}

\title{Optimization methods -- Homework 3}
\author{Claudio Maggioni}

\begin{document}
\maketitle

\section{Exercise 1}

\subsection{Exercise 1.1}

Please consult the MATLAB implementation in the files \texttt{Newton.m}, \texttt{GD.m}, and \texttt{backtracking.m}. Please note that, for this and the subsequent exercises, the gradient descent method uses a fixed step size $\alpha = 1$ when backtracking is disabled, despite the indication on the assignment sheet. This was done in order to comply with the following forum post on iCorsi: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.

\subsection{Exercise 1.2}

Please consult the MATLAB implementation in the file \texttt{main.m}, section 1.2.

\subsection{Exercise 1.3}

Please find the requested plots in figure \ref{fig:1}. The code used to generate these plots can be found in section 1.3 of \texttt{main.m}.
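As a language-neutral illustration of the scheme implemented in \texttt{GD.m} and \texttt{backtracking.m}, the following Python sketch shows gradient descent with an Armijo backtracking line search. The objective function, the parameter values $\rho = 0.5$ and $c = 10^{-4}$, and all identifiers are assumptions chosen for illustration; they are not taken from the MATLAB sources.

```python
# Hypothetical sketch of gradient descent with Armijo backtracking
# (illustrative only, not the course's MATLAB code). Example objective:
# f(x, y) = (x - 1)^2 + 10 (y + 2)^2, minimized at (1, -2).

def f(x):
    return (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2

def grad_f(x):
    return [2 * (x[0] - 1), 20 * (x[1] + 2)]

def backtracking(f, g, x, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo sufficient-decrease condition holds."""
    gnorm2 = sum(gi * gi for gi in g)
    while f([xi - alpha * gi for xi, gi in zip(x, g)]) > f(x) - c * alpha * gnorm2:
        alpha *= rho
    return alpha

def gd(f, grad_f, x0, tol=1e-8, max_iter=10000, use_backtracking=True):
    x = list(x0)
    for k in range(max_iter):
        g = grad_f(x)
        if max(abs(gi) for gi in g) < tol:   # stop when the gradient vanishes
            return x, k
        alpha = backtracking(f, g, x) if use_backtracking else 1.0
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x, max_iter

x_star, iters = gd(f, grad_f, [0.0, 0.0])
```

Note that on this example objective a fixed $\alpha = 1$ would make the second coordinate update $y_{k+1} = -19 y_k - 40$ diverge, mirroring the divergence of GD without backtracking observed in the assignment.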
\begin{figure}[h]
\begin{subfigure}{0.5\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{ex1-3.jpg}}
\caption{Zoomed plot on $x = (-1,1)$ and $y = (-1,1)$}
\end{subfigure}
\begin{subfigure}{0.5\textwidth}
\resizebox{\textwidth}{!}{\input{ex1-3-gd}}
\caption{Complete plot (the blue line is GD with $\alpha = 1$)}
\end{subfigure}
\caption{Steps in the energy landscape for the Newton and GD methods}\label{fig:1}
\end{figure}

\subsection{Exercise 1.4}

Please find the requested plots in figure \ref{fig:gsppn}. The code used to generate these plots can be found in section 1.4 of \texttt{main.m}.

\begin{figure}[h]
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-grad-nonlog.jpg}}
\caption{Gradient norms \\ (zoomed; y-axis is linear in this plot)}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-ys-nonlog.jpg}}
\caption{Objective function values \\ (zoomed; y-axis is linear in this plot)}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-grad.jpg}}
\caption{Gradient norms (zoomed)}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-ys.jpg}}
\caption{Objective function values (zoomed)}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-grad-large.jpg}}
\caption{Gradient norms}
\end{subfigure}
\hfill
\begin{subfigure}{0.45\textwidth}
\centering
\resizebox{\textwidth}{!}{\includegraphics{1-4-ys-large.png}}
\caption{Objective function values}
\end{subfigure}
\caption{Gradient norms and objective function values (y-axes) w.r.t.
iteration numbers (x-axis) for the Newton and GD methods (y-axis is log scaled; points at $y=0$ are not shown due to the log scale)}\label{fig:gsppn}
\end{figure}

\subsection{Exercise 1.5}

The best performing method for this particular set of input data is the Newton method without backtracking, since it converges in only 2 iterations. The second best is the Newton method with backtracking, which converges in 12 iterations. Gradient descent with backtracking converges slowly, requiring 21102 iterations, while with a fixed $\alpha = 1$ it diverges after only a handful of iterations, the resulting numerical instability eventually producing \texttt{x\_k = [NaN; NaN]}.

Analyzing the movement in the energy landscape (figure \ref{fig:1}), the most ``coordinated'' method in terms of the direction of its iteration steps appears to be the Newton method with backtracking. Apart from performing more iterations than the variant without backtracking, this method keeps its iteration directions at roughly a 45 degree angle from the x axis, always pointing approximately at the minimizer $x^*$. However, this movement strategy is clearly inefficient compared with the 2-step convergence achieved by Newton without backtracking, which follows a conjugate-gradient-like path, recovering in each step one component of the true minimizer. GD with backtracking instead follows an inefficient zig-zagging pattern, with iterates in the vicinity of the Newton + backtracking iterates. Finally, GD without backtracking quickly degenerates, as can be seen in the enlarged plot.

When looking at gradient norms and objective function values over the iterations (figure \ref{fig:gsppn}), the degeneration of GD without backtracking and the inefficiency of GD with backtracking are clearly visible. Newton with backtracking shows fairly smooth gradient norm and objective value curves, with an exponentially decreasing slope in both over the last 5--10 iterations.
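The 2-step convergence of Newton without backtracking is consistent with theory: the full Newton step $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$ minimizes the local quadratic model exactly, so on a quadratic objective it reaches the minimizer in a single iteration from any starting point. The following Python sketch illustrates this on an assumed $2 \times 2$ quadratic (not the assignment's objective):

```python
# Hypothetical illustration: for a quadratic f(x) = 0.5 x'Ax - b'x with
# constant Hessian A, one full Newton step x - A^{-1} grad f(x) lands
# exactly on the minimizer A^{-1} b.

def newton_step_2x2(A, b, x):
    """One Newton step for f(x) = 0.5 x'Ax - b'x (Hessian H = A)."""
    # Gradient: grad f(x) = Ax - b
    g = [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
         A[1][0] * x[0] + A[1][1] * x[1] - b[1]]
    # Solve A d = g with the explicit 2x2 inverse, then step x - d.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    d = [( A[1][1] * g[0] - A[0][1] * g[1]) / det,
         (-A[1][0] * g[0] + A[0][0] * g[1]) / det]
    return [x[0] - d[0], x[1] - d[1]]

A = [[4.0, 1.0], [1.0, 3.0]]    # symmetric positive definite (assumed example)
b = [1.0, 2.0]
x1 = newton_step_2x2(A, b, [10.0, -7.0])   # arbitrary starting point
# x1 equals the exact minimizer A^{-1} b = [1/11, 7/11].
```

On a non-quadratic objective such as the assignment's, the quadratic model is only an approximation, which is why a second iteration is needed to reach the stated convergence tolerance.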
Newton without backtracking instead overshoots at its first iteration, reaching a gradient norm of $\|\nabla f(x_1)\| \approx 450$ and an objective value of $f(x_1) \approx 100$, but both values quickly drop to 0 at the second iteration, achieving convergence.

\section{Exercise 2}

\subsection{Exercise 2.1}

Please consult the MATLAB implementation in the file \texttt{BGFS.m}.

\subsection{Exercise 2.2}

Please consult the MATLAB implementation in the file \texttt{main.m}, section 2.2.

\subsection{Exercise 2.3}

Please find the requested plots in figure \ref{fig:3}. The code used to generate these plots can be found in section 2.3 of \texttt{main.m}.

\begin{figure}[h]
\centering
\resizebox{.6\textwidth}{!}{\input{ex2-3}}
\caption{Steps in the energy landscape for the BFGS method}\label{fig:3}
\end{figure}

\subsection{Exercise 2.4}

Please find the requested plots in figure \ref{fig:4}. The code used to generate these plots can be found in section 2.4 of \texttt{main.m}.

\begin{figure}[h]
\begin{subfigure}{0.5\textwidth}
\resizebox{\textwidth}{!}{\input{ex2-4-grad}}
\caption{Gradient norms}
\end{subfigure}
\begin{subfigure}{0.5\textwidth}
\resizebox{\textwidth}{!}{\input{ex2-4-ys}}
\caption{Objective function values}
\end{subfigure}
\caption{Gradient norms and objective function values (y-axes) w.r.t.
iteration numbers (x-axis) for the BFGS method (y-axis is log scaled; points at $y=0$ are not shown due to the log scale)}\label{fig:4}
\end{figure}

\subsection{Exercise 2.5}

The following table summarizes the number of iterations required by each method to achieve convergence:

\begin{center}
\begin{tabular}{c|c|c}
\textbf{Method} & \textbf{Backtracking} & \textbf{\# of iterations} \\\hline
Newton & No & 2 \\
Newton & Yes & 12 \\
BFGS & Yes & 26 \\
Gradient descent & Yes & 21102 \\
Gradient descent & No ($\alpha = 1$) & Diverges after 6 \\
\end{tabular}
\end{center}

From the table above we can see that the BFGS method is in the same performance order of magnitude as the Newton method, although it requires more than twice the iterations of Newton with backtracking (26 vs.\ 12) and more than ten times those of Newton without backtracking (2).

From the iterates plot and from the gradient norm and objective function value plots (figures \ref{fig:3} and \ref{fig:4} respectively) we can see that BFGS behaves similarly to the Newton method with backtracking, loosely following its curves. The only noteworthy difference lies in the energy landscape plot, where BFGS occasionally ``steps back'', performing iterations in the direction opposite to the minimizer. This behaviour can also be observed in figure \ref{fig:4}, where several bumps and spikes are present in the gradient norm plot and small plateaus appear in the objective function value plot.

\end{document}