\documentclass{scrartcl}
\usepackage{pdfpages}
\usepackage[utf8]{inputenc}
\usepackage{float}
\usepackage{graphicx}
\usepackage[ruled,vlined]{algorithm2e}
\usepackage{subcaption}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\usetikzlibrary{plotmarks}
\usetikzlibrary{arrows.meta}
\usepgfplotslibrary{patchplots}
\usepackage{grffile}
\usepgfplotslibrary{external}
\tikzexternalize
\usepackage[margin=2.5cm]{geometry}

% To compile:
% sed -i 's#title style={font=\\bfseries#title style={yshift=1ex, font=\\tiny\\bfseries#' *.tex
% luatex -enable-write18 -shellescape main.tex

\pgfplotsset{every x tick label/.append style={font=\tiny, yshift=0.5ex}}
\pgfplotsset{every title/.append style={font=\tiny, align=center}}
\pgfplotsset{every y tick label/.append style={font=\tiny, xshift=0.5ex}}
\pgfplotsset{every z tick label/.append style={font=\tiny, xshift=0.5ex}}

\setlength{\parindent}{0cm}
\setlength{\parskip}{0.5\baselineskip}

\title{Optimization methods -- Homework 3}
\author{Claudio Maggioni}

\begin{document}
\maketitle

\section{Exercise 1}

\subsection{Exercise 1.1}

Please consult the MATLAB implementation in the files \texttt{Newton.m}, \texttt{GD.m}, and \texttt{backtracking.m} (an illustrative sketch of the objective and of a backtracked Newton step is also given at the end of Exercise 1.4).

Please note that, for this and the subsequent exercises, the gradient descent method run without backtracking uses a fixed $\alpha = 1$ despite the indications on the assignment sheet. This was done in order to comply with the forum post on iCorsi found here: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.

Here is a 3D plot of the Rosenbrock function, with the starting point $(0,0)$ shown in red and the true minimizer $(1,1)$ shown in black:

\begin{center}
  \resizebox{0.6\textwidth}{0.6\textwidth}{\includegraphics{rosenb.jpg}}
\end{center}

\subsection{Exercise 1.2}

Please consult the MATLAB implementation in the file \texttt{main.m}, section 1.2.

\subsection{Exercise 1.3}

Please find the requested plots in Figure \ref{fig:1}. The code used to generate these plots can be found in section 1.3 of \texttt{main.m}.

\begin{figure}[h]
  \begin{subfigure}{0.5\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{ex1-3.jpg}}
    \caption{Zoomed plot on $x \in (-1,1)$ and $y \in (-1,1)$}
  \end{subfigure}
  \begin{subfigure}{0.5\textwidth}
    \resizebox{\textwidth}{\textwidth}{\input{ex1-3-gd}}
    \caption{Complete plot (the blue line is GD with $\alpha = 1$)}
  \end{subfigure}
  \caption{Steps in the energy landscape for the Newton and GD methods}\label{fig:1}
\end{figure}

\subsection{Exercise 1.4}

Please find the requested plots in Figure \ref{fig:gsppn}. The code used to generate these plots can be found in section 1.4 of \texttt{main.m}.
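As a concrete reference for the implementations mentioned in Exercises 1.1 and 1.2, here is a minimal MATLAB sketch of the Rosenbrock objective, its gradient and Hessian, and a single backtracked Newton step. Variable names and parameter values are illustrative and may differ from the actual code in \texttt{Newton.m}, \texttt{backtracking.m}, and \texttt{main.m}.

\begin{verbatim}
% Illustrative sketch only -- see Newton.m, GD.m and backtracking.m for
% the actual implementations used in this report.
f = @(x) 100 * (x(2) - x(1)^2)^2 + (1 - x(1))^2;       % Rosenbrock objective
g = @(x) [-400 * x(1) * (x(2) - x(1)^2) - 2 * (1 - x(1));
           200 * (x(2) - x(1)^2)];                      % gradient
H = @(x) [1200 * x(1)^2 - 400 * x(2) + 2, -400 * x(1);
          -400 * x(1),                     200];        % Hessian

x = [0; 0];             % starting point used throughout this report
p = -H(x) \ g(x);       % Newton direction at x

% Backtracking line search: shrink alpha until sufficient decrease holds
alpha = 1; rho = 0.5; c1 = 1e-4;
while f(x + alpha * p) > f(x) + c1 * alpha * (g(x)' * p)
    alpha = rho * alpha;
end
x = x + alpha * p;      % one damped Newton step
\end{verbatim}

Note that the full step ($\alpha = 1$) from $x_0 = [0;0]$ lands on $[1;0]$, which is consistent with the values $\|\nabla f(x_1)\| \approx 450$ and $f(x_1) \approx 100$ discussed in Exercise 1.5, while the sufficient decrease condition forces the backtracked variant of Newton to take a much shorter first step.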
\begin{figure}[h]
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-grad-nonlog.jpg}}
    \caption{Gradient norms \\(zoomed, y axis is linear for this plot)}
  \end{subfigure}
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys-nonlog.jpg}}
    \caption{Objective function values \\(zoomed, y axis is linear for this plot)}
  \end{subfigure}
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-grad.jpg}}
    \caption{Gradient norms (zoomed)}
  \end{subfigure}
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys.jpg}}
    \caption{Objective function values (zoomed)}
  \end{subfigure}
  \begin{subfigure}{0.45\textwidth}
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-grad-large.jpg}}
    \caption{Gradient norms}
  \end{subfigure}
  \hfill
  \begin{subfigure}{0.45\textwidth}
    \centering
    \resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys-large.png}}
    \caption{Objective function values}
  \end{subfigure}
  \caption{Gradient norms and objective function values (y-axes) w.r.t.\ iteration number (x-axis) for the Newton and GD methods. The y-axis is log scaled unless noted otherwise; points at $y=0$ are not shown due to the log scale.}\label{fig:gsppn}
\end{figure}

\subsection{Exercise 1.5}

The best performing method for this particular set of input data is the Newton method without backtracking, since it converges in only 2 iterations. The second best performing one is the Newton method with backtracking, which converges in 12 iterations. The gradient descent method with backtracking converges slowly, requiring 21102 iterations, while with a fixed $\alpha = 1$ the method diverges within a dozen iterations, with catastrophic numerical instability leading to \texttt{x\_k = [NaN; NaN]}.

Analyzing the movement in the energy landscape (Figure \ref{fig:1}), the more ``coordinated'' method in terms of the direction of its iteration steps appears to be the Newton method with backtracking. Although it performs more iterations than the classical variant, this method keeps its iteration directions at roughly a 45 degree angle from the $x$ axis, pointing approximately at the minimizer $x^*$ at all times. However, this movement strategy is clearly inefficient compared with the 2-step convergence achieved by Newton without backtracking, which follows a conjugate-gradient-like path, finding one component of the true minimizer at each step. GD with backtracking instead follows an inefficient zig-zagging pattern, with iterates in the vicinity of the Newton + backtracking iterates. Finally, GD without backtracking quickly degenerates, as can be seen in the enlarged plot.

From the initial plot of the Rosenbrock function, we can see why the algorithms using backtracking roughly follow the $(0,0)$--$(1,1)$ diagonal: this region is effectively a ``valley'' in the 3D energy landscape, and thanks to their strict adherence to the sufficient decrease (first Wolfe) condition, these methods avoid wild ``climbs'' (unlike the Newton method without backtracking) and instead find iterates achieving either a sufficient decrease or, at worst, only a modest increase of the objective.

When looking at gradient norms and objective function values over the iterations (Figure \ref{fig:gsppn}), the degeneration of GD without backtracking and the inefficiency of GD with backtracking can clearly be seen. Newton with backtracking shows fairly smooth gradient norm and objective value curves, both decreasing exponentially over the last 5--10 iterations.
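For reference, the two update rules compared throughout this discussion are, in their standard form (with $\alpha_k$ either fixed to $1$ or chosen by the backtracking line search):
\begin{align*}
  \text{GD:} \quad & x_{k+1} = x_k - \alpha_k \nabla f(x_k), \\
  \text{Newton:} \quad & x_{k+1} = x_k - \alpha_k \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k).
\end{align*}
The Newton step rescales the gradient using curvature information from the Hessian, which is what makes both the nearly direct path towards $x^*$ and the fast local convergence discussed below possible.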
Newton without backtracking instead jumps at the first iteration to a gradient norm $\|\nabla f(x_1)\| \approx 450$ and an objective value $f(x_1) \approx 100$, but both values then quickly drop to 0 at its second iteration, achieving convergence.

What has been observed in this assignment matches the theory behind the methods. Since the Newton method converges quadratically and uses second-order information (i.e.\ the Hessian in the computation of the step direction), the number of iterations required to find the minimizer when starting already close to it (as we do with $x_0 = [0;0]$) is significantly smaller than that required by linearly convergent methods such as gradient descent. However, it must be said that for an objective with a high number of dimensions a single iteration of a second-order method is significantly more costly than a single iteration of a first-order method, due to the quadratically growing number of entries in the Hessian matrix, which becomes harder and harder to compute as the number of dimensions increases.

\section{Exercise 2}

\subsection{Exercise 2.1}

Please consult the MATLAB implementation in the file \texttt{BGFS.m}.

\subsection{Exercise 2.2}

Please consult the MATLAB implementation in the file \texttt{main.m}, section 2.2.

\subsection{Exercise 2.3}

Please find the requested plots in Figure \ref{fig:3}. The code used to generate these plots can be found in section 2.3 of \texttt{main.m}.

\begin{figure}[h]
  \centering
  \resizebox{.6\textwidth}{.6\textwidth}{\input{ex2-3}}
  \caption{Steps in the energy landscape for the BFGS method}\label{fig:3}
\end{figure}

\subsection{Exercise 2.4}

Please find the requested plots in Figure \ref{fig:4}. The code used to generate these plots can be found in section 2.4 of \texttt{main.m}.

\begin{figure}[h]
  \begin{subfigure}{0.5\textwidth}
    \resizebox{\textwidth}{\textwidth}{\input{ex2-4-grad}}
    \caption{Gradient norms}
  \end{subfigure}
  \begin{subfigure}{0.5\textwidth}
    \resizebox{\textwidth}{\textwidth}{\input{ex2-4-ys}}
    \caption{Objective function values}
  \end{subfigure}
  \caption{Gradient norms and objective function values (y-axes) w.r.t.\ iteration number (x-axis) for the BFGS method (y-axis is log scaled, points at $y=0$ not shown due to the log scale)}\label{fig:4}
\end{figure}

\subsection{Exercise 2.5}

The following table summarizes the number of iterations required by each method to achieve convergence:

\begin{center}
  \begin{tabular}{c|c|c}
    \textbf{Method} & \textbf{Backtracking} & \textbf{\# of iterations} \\\hline
    Newton & No & 2 \\
    Newton & Yes & 12 \\
    BFGS & Yes & 26 \\
    Gradient descent & Yes & 21102 \\
    Gradient descent & No ($\alpha = 1$) & Diverges after 6 \\
  \end{tabular}
\end{center}

From the table above we can see that the BFGS method is in the same order of magnitude of performance as the Newton method, although the number of iterations it requires to converge (26) is more than double that of Newton with backtracking (12) and more than ten times that of the Newton method without backtracking (2).

From the iterates plot and the gradient norm and objective function value plots (located in Figures \ref{fig:3} and \ref{fig:4}, respectively) we can see that BFGS behaves similarly to the Newton method with backtracking, loosely following its curves. The only noteworthy difference lies in the energy landscape plot, where BFGS occasionally ``steps back'', performing iterations that move in the opposite direction with respect to the minimizer.
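This occasional ``stepping back'' is consistent with the fact that BFGS computes its search direction from an approximate inverse Hessian built up from gradient differences, rather than from the exact Hessian. For reference, a standard form of the inverse-Hessian BFGS update is given below; the actual implementation in \texttt{BGFS.m} may differ in details such as the choice of the initial matrix $H_0$.
\begin{align*}
  p_k &= -H_k \nabla f(x_k), \qquad x_{k+1} = x_k + \alpha_k p_k, \\
  s_k &= x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \qquad \rho_k = \frac{1}{y_k^\top s_k}, \\
  H_{k+1} &= \left(I - \rho_k s_k y_k^\top\right) H_k \left(I - \rho_k y_k s_k^\top\right) + \rho_k s_k s_k^\top,
\end{align*}
where $H_k$ approximates $\left[\nabla^2 f(x_k)\right]^{-1}$ and $\alpha_k$ is chosen by the backtracking line search.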
This ``stepping back'' behaviour can also be observed in the plots in Figure \ref{fig:4}, where several bumps and spikes are present in the gradient norm plot and small plateaus can be found in the objective function value plot.

Comparing these results with the theory behind BFGS, we can say that the obtained results fall within what the theory predicts. Since BFGS converges superlinearly but not quadratically, its ``speed'' in terms of the number of iterations falls between that of linearly convergent methods (like GD) and that of quadratically convergent methods (like Newton).

\end{document}