hw3: ready for submission

2021-04-23 14:51:06 +02:00 · 2021-04-23 14:51:06 +02:00 · a2bf9591df
commit a2bf9591df
parent 421192b7e5
2 changed files with 83 additions and 16 deletions
--- a/Claudio_Maggioni_3/Claudio_Maggioni_3.pdf
+++ b/Claudio_Maggioni_3/Claudio_Maggioni_3.pdf
--- a/Claudio_Maggioni_3/Claudio_Maggioni_3.tex
+++ b/Claudio_Maggioni_3/Claudio_Maggioni_3.tex
@ -31,8 +31,7 @@
 \setlength{\parindent}{0cm}
 \setlength{\parskip}{0.5\baselineskip}

-\title{Optimization methods -- Homework 3}
-\author{Claudio Maggioni}
+\title{Optimization methods -- Homework 3} \author{Claudio Maggioni}

 \begin{document}

@ -42,18 +41,22 @@

 \subsection{Exercise 1.1}

-Please consult the MATLAB implementation in the files \texttt{Newton.m}, \texttt{GD.m}, and \texttt{backtracking.m}.
-Please note that, for this and subsequent exercises, the gradient descent method without backtracking activated uses a
-fixed $\alpha=1$ despite the indications on the assignment sheet. This was done in order to comply with the forum post
-on iCorsi found here: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.
+Please consult the MATLAB implementation in the files \texttt{Newton.m},
+\texttt{GD.m}, and \texttt{backtracking.m}.  Please note that, for this and
+subsequent exercises, the gradient descent method without backtracking
+activated uses a fixed $\alpha=1$ despite the indications on the assignment
+sheet. This was done in order to comply with the forum post on iCorsi found
+here: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.

 \subsection{Exercise 1.2}

-Please consult the MATLAB implementation in the file \texttt{main.m} in section 1.2.
+Please consult the MATLAB implementation in the file \texttt{main.m} in section
+1.2.

 \subsection{Exercise 1.3}

-Please find the requested plots in figure \ref{fig:1}. The code used to generate these plots can be found in section 1.3 of \texttt{main.m}.
+Please find the requested plots in figure \ref{fig:1}. The code used to
+generate these plots can be found in section 1.3 of \texttt{main.m}.

 \begin{figure}[h]
 	\begin{subfigure}{0.5\textwidth}
@ -69,7 +72,8 @@ Please find the requested plots in figure \ref{fig:1}. The code used to generate

 \subsection{Exercise 1.4}

-Please find the requested plots in figure \ref{fig:gsppn}. The code used to generate these plots can be found in section 1.4 of \texttt{main.m}.
+Please find the requested plots in figure \ref{fig:gsppn}. The code used to
+generate these plots can be found in section 1.4 of \texttt{main.m}.

 \begin{figure}[h]
 	\begin{subfigure}{0.45\textwidth}
@ -97,12 +101,44 @@ Please find the requested plots in figure \ref{fig:gsppn}. The code used to gene
 		\resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys-large.jpg}}
 		\caption{Objective function values}
 	\end{subfigure}
-	\caption{Gradient norms and objective function values (y-axes) w.r.t. iteration numbers (x-axis) for Newton and GD methods (y-axis is log scaled, points at $y=0$ not shown due to log scale)}\label{fig:gsppn}
+	\caption{Gradient norms and objective function values (y-axes) w.r.t.
+	iteration numbers (x-axis) for Newton and GD methods (y-axis is log
+	scaled, points at $y=0$ not shown due to log scale)}\label{fig:gsppn}
 \end{figure}

 \subsection{Exercise 1.5}

-TBD
+The best performing method for this particular set of input data is the Newton
+method without backtracking, since it converges in only 2 iterations. The
+second best is the Newton method with backtracking, which converges in 12
+iterations. Gradient descent with backtracking converges slowly, requiring
+21102 iterations, while with a fixed $\alpha=1$ it diverges within a dozen
+iterations, overflowing numerically until
+\texttt{x\_k = [NaN; NaN]}.
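+
+As a rough sanity check of this divergence, assume (this is not stated in
+this section) that the objective is the Rosenbrock function
+$f(x, y) = (1 - x)^2 + 100 (y - x^2)^2$ with starting point $x_0 = (0, 0)^T$,
+a choice that is consistent with the values reported in this section. Under
+this assumption, the first two fixed-step iterations with $\alpha = 1$ would
+be:
+\[ \nabla f(x, y) = \left( -2 (1 - x) - 400 x (y - x^2), \;
+200 (y - x^2) \right)^T \]
+\[ x_1 = x_0 - \nabla f(x_0) = (0, 0)^T - (-2, 0)^T = (2, 0)^T \]
+\[ x_2 = x_1 - \nabla f(x_1) = (2, 0)^T - (3202, -800)^T = (-3200, 800)^T \]
+Since the gradient grows cubically in $x$, each further step would enlarge
+the iterates until they overflow.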
+
+Analyzing the movement in the energy landscape (figure \ref{fig:1}), the most
+``coordinated'' method in terms of the direction of its iteration steps
+appears to be the Newton method with backtracking. Although it performs more
+iterations than the classical variant, it keeps its iteration directions at
+approximately a 45 degree angle from the x axis, almost always pointing
+roughly towards the minimizer $x^*$. However, this movement strategy is
+clearly less efficient than the 2-step convergence achieved by Newton without
+backtracking, which follows a conjugate-gradient-like path, finding in each
+step one component of the true minimizer. GD with backtracking instead follows
+an inefficient zig-zagging pattern, with iterates in the vicinity of the
+Newton + backtracking iterates. Finally, GD without backtracking quickly
+degenerates, as can be seen in the enlarged plot.
+
+When looking at gradient norms and objective function
+values (figure \ref{fig:gsppn}) over the iterations, the degeneration of GD
+without backtracking and the inefficiency of GD with backtracking can clearly
+be seen. Newton with backtracking offers fairly smooth gradient norm and
+objective value curves, with an exponentially decreasing slope in both for the
+last 5-10 iterations. Newton without backtracking instead jumps at the first
+iteration to a gradient norm $\|\nabla f(x_1)\| \approx 450$ and an objective
+value $f(x_1) \approx 100$, but both values then drop to 0 at the second
+iteration, achieving convergence.
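+
+These two values can be reproduced by hand under the assumption (not stated
+in this section) that the objective is the Rosenbrock function
+$f(x, y) = (1 - x)^2 + 100 (y - x^2)^2$ and that $x_0 = (0, 0)^T$. A sketch of
+the two pure Newton steps under this assumption:
+\[ \nabla f(x_0) = (-2, 0)^T, \quad
+\nabla^2 f(x_0) = \left( \begin{array}{cc} 2 & 0 \\ 0 & 200 \end{array}
+\right), \quad
+x_1 = x_0 - \nabla^2 f(x_0)^{-1} \nabla f(x_0) = (1, 0)^T \]
+\[ f(x_1) = 100, \quad \nabla f(x_1) = (400, -200)^T, \quad
+\|\nabla f(x_1)\| = \sqrt{200000} \approx 447 \]
+\[ \nabla^2 f(x_1) = \left( \begin{array}{cc} 1202 & -400 \\ -400 & 200
+\end{array} \right), \quad
+x_2 = x_1 - \nabla^2 f(x_1)^{-1} \nabla f(x_1) = (1, 0)^T - (0, -1)^T
+= (1, 1)^T \]
+The point $(1, 1)^T$ is the minimizer of the assumed function, with
+$f(x_2) = 0$ and $\nabla f(x_2) = 0$, which would match the 2-iteration
+convergence observed.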

 \section{Exercise 2}

@ -112,11 +148,13 @@ Please consult the MATLAB implementation in the file \texttt{BGFS.m}.

 \subsection{Exercise 2.2}

-Please consult the MATLAB implementation in the file \texttt{main.m} in section 2.2.
+Please consult the MATLAB implementation in the file \texttt{main.m} in section
+2.2.

 \subsection{Exercise 2.3}

-Please find the requested plots in figure \ref{fig:3}. The code used to generate these plots can be found in section 2.3 of \texttt{main.m}.
+Please find the requested plots in figure \ref{fig:3}. The code used to
+generate these plots can be found in section 2.3 of \texttt{main.m}.

 \begin{figure}[h]
 	\centering
@ -126,7 +164,8 @@ Please find the requested plots in figure \ref{fig:3}. The code used to generate

 \subsection{Exercise 2.4}

-Please find the requested plots in figure \ref{fig:4}. The code used to generate these plots can be found in section 2.4 of \texttt{main.m}.
+Please find the requested plots in figure \ref{fig:4}. The code used to
+generate these plots can be found in section 2.4 of \texttt{main.m}.

 \begin{figure}[h]
 	\begin{subfigure}{0.5\textwidth}
@ -137,11 +176,39 @@ Please find the requested plots in figure \ref{fig:4}. The code used to generate
 		\resizebox{\textwidth}{\textwidth}{\input{ex2-4-ys}}
 		\caption{Objective function values}
 	\end{subfigure}
-	\caption{Gradient norms and objective function values (y-axes) w.r.t. iteration numbers (x-axis) for BFGS method (y-axis is log scaled, points at $y=0$ not shown due to log scale)}\label{fig:4}
+	\caption{Gradient norms and objective function values (y-axes) w.r.t.
+	iteration numbers (x-axis) for BFGS method (y-axis is log scaled,
+	points at $y=0$ not shown due to log scale)}\label{fig:4}
 \end{figure}

 \subsection{Exercise 2.5}

-TBD
+The following table summarizes the number of iterations required by each method to achieve convergence:

+\begin{center}
+\begin{tabular}{c|c|c}
+	\textbf{Method} & \textbf{Backtracking} & \textbf{\# of iterations} \\\hline
+	Newton & No & 2 \\
+	Newton & Yes & 12 \\
+	BFGS & Yes & 26 \\
+	Gradient descent & Yes & 21102 \\
+	Gradient descent & No ($\alpha = 1$) & Diverges after 6 \\
+\end{tabular}
+\end{center}
+
+From the table above we can see that the BFGS
+method is in the same order of magnitude as the Newton method in terms of
+performance, although it requires more than twice as many iterations to
+converge (26) as Newton with backtracking (12), and more than ten times as
+many as the Newton method without backtracking (2).
+
+From the iterates plot and the gradient norm and objective function value
+plots (figures \ref{fig:3} and \ref{fig:4} respectively) we can see
+that BFGS behaves similarly to the Newton method with backtracking, loosely
+following its curves. The only noteworthy difference lies in the energy
+landscape plot, where BFGS occasionally ``steps back'', performing iterations
+that move away from the minimizer. This behaviour can also be observed in
+the plots in figure \ref{fig:4}, where several bumps and spikes are present in
+the gradient norm plot and small plateaus can be found in the objective
+function value plot.
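+
+For reference, the textbook BFGS update of the Hessian approximation $B_k$ is
+recalled below. The notation ($s_k$, $y_k$, $B_k$) is an assumption made here
+for illustration, and the implementation may equally well update the inverse
+approximation instead:
+\[ s_k = x_{k+1} - x_k, \quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k) \]
+\[ B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}
++ \frac{y_k y_k^T}{y_k^T s_k} \]
+Since $B_k$ only approximates the true Hessian, the resulting search direction
+need not coincide with the Newton direction, which is one possible
+explanation for the occasional steps away from the minimizer.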
 \end{document}