\documentclass{scrartcl}
\usepackage{pdfpages}
\usepackage[utf8]{inputenc}
\usepackage{float}
\usepackage{graphicx}
\usepackage[ruled,vlined]{algorithm2e}
\usepackage{subcaption}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\usetikzlibrary{plotmarks}
\usetikzlibrary{arrows.meta}
\usepgfplotslibrary{patchplots}
\usepackage{grffile}
\usepgfplotslibrary{external}
\tikzexternalize
\usepackage[margin=2.5cm]{geometry}

% To compile:
% sed -i 's#title style={font=\\bfseries#title style={yshift=1ex, font=\\tiny\\bfseries#' *.tex
% luatex -enable-write18 -shellescape main.tex

\pgfplotsset{every x tick label/.append style={font=\tiny, yshift=0.5ex}}
\pgfplotsset{every title/.append style={font=\tiny, align=center}}
\pgfplotsset{every y tick label/.append style={font=\tiny, xshift=0.5ex}}
\pgfplotsset{every z tick label/.append style={font=\tiny, xshift=0.5ex}}

\setlength{\parindent}{0cm}
\setlength{\parskip}{0.5\baselineskip}

\title{Optimization methods -- Homework 3}
\author{Claudio Maggioni}

\begin{document}
\maketitle

\section{Exercise 1}

\subsection{Exercise 1.1}

Please consult the MATLAB implementation in the files \texttt{Newton.m}, \texttt{GD.m}, and \texttt{backtracking.m}. Please note that, for this and the subsequent exercises, the gradient descent method uses a fixed step size $\alpha = 1$ when backtracking is disabled, despite the indication on the assignment sheet. This was done in order to comply with the following forum post on iCorsi: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.

\subsection{Exercise 1.2}

Please consult the MATLAB implementation in the file \texttt{main.m}, section 1.2.

\subsection{Exercise 1.3}

Please find the requested plots in figure \ref{fig:1}. The code used to generate these plots can be found in section 1.3 of \texttt{main.m}.
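As a language-neutral illustration of the scheme implemented in \texttt{GD.m} and \texttt{backtracking.m}, the following Python sketch shows gradient descent with an Armijo backtracking line search. The objective function, the parameter values $\rho = 0.5$ and $c = 10^{-4}$, and all identifiers are assumptions chosen for illustration; they are not taken from the MATLAB sources.

```python
# Hypothetical sketch of gradient descent with Armijo backtracking
# (illustrative only, not the course's MATLAB code). Example objective:
# f(x, y) = (x - 1)^2 + 10 (y + 2)^2, minimized at (1, -2).

def f(x):
    return (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2

def grad_f(x):
    return [2 * (x[0] - 1), 20 * (x[1] + 2)]

def backtracking(f, g, x, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo sufficient-decrease condition holds."""
    gnorm2 = sum(gi * gi for gi in g)
    while f([xi - alpha * gi for xi, gi in zip(x, g)]) > f(x) - c * alpha * gnorm2:
        alpha *= rho
    return alpha

def gd(f, grad_f, x0, tol=1e-8, max_iter=10000, use_backtracking=True):
    x = list(x0)
    for k in range(max_iter):
        g = grad_f(x)
        if max(abs(gi) for gi in g) < tol:   # stop when the gradient vanishes
            return x, k
        alpha = backtracking(f, g, x) if use_backtracking else 1.0
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x, max_iter

x_star, iters = gd(f, grad_f, [0.0, 0.0])
```

Note that on this example objective a fixed $\alpha = 1$ would make the second coordinate update $y_{k+1} = -19 y_k - 40$ diverge, mirroring the divergence of GD without backtracking observed in the assignment.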
\begin{figure}[h]
\begin{subfigure}{0.5\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{ex1-3.jpg}}
\caption{Zoomed plot on $x = (-1,1)$ and $y = (-1,1)$}
\end{subfigure}
\begin{subfigure}{0.5\textwidth}
\resizebox{\textwidth}{!}{\input{ex1-3-gd}}
\caption{Complete plot (the blue line is GD with $\alpha = 1$)}
\end{subfigure}
\caption{Steps in the energy landscape for the Newton and GD methods}\label{fig:1}
\end{figure}

\subsection{Exercise 1.4}

Please find the requested plots in figure \ref{fig:gsppn}. The code used to generate these plots can be found in section 1.4 of \texttt{main.m}.

\begin{figure}[h]
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-grad-nonlog.jpg}}
\caption{Gradient norms \\ (zoomed; y-axis is linear in this plot)}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-ys-nonlog.jpg}}
\caption{Objective function values \\ (zoomed; y-axis is linear in this plot)}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-grad.jpg}}
\caption{Gradient norms (zoomed)}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-ys.jpg}}
\caption{Objective function values (zoomed)}
\end{subfigure}
\begin{subfigure}{0.45\textwidth}
\resizebox{\textwidth}{!}{\includegraphics{1-4-grad-large.jpg}}
\caption{Gradient norms}
\end{subfigure}
\hfill
\begin{subfigure}{0.45\textwidth}
\centering
\resizebox{\textwidth}{!}{\includegraphics{1-4-ys-large.png}}
\caption{Objective function values}
\end{subfigure}
\caption{Gradient norms and objective function values (y-axes) w.r.t.
iteration numbers (x-axis) for the Newton and GD methods (y-axis is log scaled; points at $y=0$ are not shown due to the log scale)}\label{fig:gsppn}
\end{figure}

\subsection{Exercise 1.5}

The best performing method for this particular set of input data is the Newton method without backtracking, since it converges in only 2 iterations. The second best is the Newton method with backtracking, which converges in 12 iterations. Gradient descent with backtracking converges slowly, requiring 21102 iterations, while with a fixed $\alpha = 1$ it diverges after only a handful of iterations, the resulting numerical instability eventually producing \texttt{x\_k = [NaN; NaN]}.

Analyzing the movement in the energy landscape (figure \ref{fig:1}), the most ``coordinated'' method in terms of the direction of its iteration steps appears to be the Newton method with backtracking. Apart from performing more iterations than the variant without backtracking, this method keeps its iteration directions at roughly a 45 degree angle from the x axis, always pointing approximately at the minimizer $x^*$. However, this movement strategy is clearly inefficient compared with the 2-step convergence achieved by Newton without backtracking, which follows a conjugate-gradient-like path, recovering in each step one component of the true minimizer. GD with backtracking instead follows an inefficient zig-zagging pattern, with iterates in the vicinity of the Newton + backtracking iterates. Finally, GD without backtracking quickly degenerates, as can be seen in the enlarged plot.

When looking at gradient norms and objective function values over the iterations (figure \ref{fig:gsppn}), the degeneration of GD without backtracking and the inefficiency of GD with backtracking are clearly visible. Newton with backtracking shows fairly smooth gradient norm and objective value curves, with an exponentially decreasing slope in both over the last 5--10 iterations.
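The 2-step convergence of Newton without backtracking is consistent with theory: the full Newton step $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$ minimizes the local quadratic model exactly, so on a quadratic objective it reaches the minimizer in a single iteration from any starting point. The following Python sketch illustrates this on an assumed $2 \times 2$ quadratic (not the assignment's objective):

```python
# Hypothetical illustration: for a quadratic f(x) = 0.5 x'Ax - b'x with
# constant Hessian A, one full Newton step x - A^{-1} grad f(x) lands
# exactly on the minimizer A^{-1} b.

def newton_step_2x2(A, b, x):
    """One Newton step for f(x) = 0.5 x'Ax - b'x (Hessian H = A)."""
    # Gradient: grad f(x) = Ax - b
    g = [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
         A[1][0] * x[0] + A[1][1] * x[1] - b[1]]
    # Solve A d = g with the explicit 2x2 inverse, then step x - d.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    d = [( A[1][1] * g[0] - A[0][1] * g[1]) / det,
         (-A[1][0] * g[0] + A[0][0] * g[1]) / det]
    return [x[0] - d[0], x[1] - d[1]]

A = [[4.0, 1.0], [1.0, 3.0]]    # symmetric positive definite (assumed example)
b = [1.0, 2.0]
x1 = newton_step_2x2(A, b, [10.0, -7.0])   # arbitrary starting point
# x1 equals the exact minimizer A^{-1} b = [1/11, 7/11].
```

On a non-quadratic objective such as the assignment's, the quadratic model is only an approximation, which is why a second iteration is needed to reach the stated convergence tolerance.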
Newton without backtracking instead overshoots at its first iteration, reaching a gradient norm of $\|\nabla f(x_1)\| \approx 450$ and an objective value of $f(x_1) \approx 100$, but both values quickly drop to 0 at the second iteration, achieving convergence.

\section{Exercise 2}

\subsection{Exercise 2.1}

Please consult the MATLAB implementation in the file \texttt{BGFS.m}.

\subsection{Exercise 2.2}

Please consult the MATLAB implementation in the file \texttt{main.m}, section 2.2.

\subsection{Exercise 2.3}

Please find the requested plots in figure \ref{fig:3}. The code used to generate these plots can be found in section 2.3 of \texttt{main.m}.

\begin{figure}[h]
\centering
\resizebox{.6\textwidth}{!}{\input{ex2-3}}
\caption{Steps in the energy landscape for the BFGS method}\label{fig:3}
\end{figure}

\subsection{Exercise 2.4}

Please find the requested plots in figure \ref{fig:4}. The code used to generate these plots can be found in section 2.4 of \texttt{main.m}.

\begin{figure}[h]
\begin{subfigure}{0.5\textwidth}
\resizebox{\textwidth}{!}{\input{ex2-4-grad}}
\caption{Gradient norms}
\end{subfigure}
\begin{subfigure}{0.5\textwidth}
\resizebox{\textwidth}{!}{\input{ex2-4-ys}}
\caption{Objective function values}
\end{subfigure}
\caption{Gradient norms and objective function values (y-axes) w.r.t.
iteration numbers (x-axis) for the BFGS method (y-axis is log scaled; points at $y=0$ are not shown due to the log scale)}\label{fig:4}
\end{figure}

\subsection{Exercise 2.5}

The following table summarizes the number of iterations required by each method to achieve convergence:

\begin{center}
\begin{tabular}{c|c|c}
\textbf{Method} & \textbf{Backtracking} & \textbf{\# of iterations} \\\hline
Newton & No & 2 \\
Newton & Yes & 12 \\
BFGS & Yes & 26 \\
Gradient descent & Yes & 21102 \\
Gradient descent & No ($\alpha = 1$) & Diverges after 6 \\
\end{tabular}
\end{center}

From the table above we can see that the BFGS method is in the same performance order of magnitude as the Newton method, although it requires more than twice the iterations of Newton with backtracking (26 vs.\ 12) and more than ten times those of Newton without backtracking (2).

From the iterates plot and from the gradient norm and objective function value plots (figures \ref{fig:3} and \ref{fig:4} respectively) we can see that BFGS behaves similarly to the Newton method with backtracking, loosely following its curves. The only noteworthy difference lies in the energy landscape plot, where BFGS occasionally ``steps back'', performing iterations in the direction opposite to the minimizer. This behaviour can also be observed in figure \ref{fig:4}, where several bumps and spikes are present in the gradient norm plot and small plateaus appear in the objective function value plot.

\end{document}