hw3: ready for submission

2021-04-23 14:51:06 +02:00 · 2021-04-23 14:51:06 +02:00 · a2bf9591df
commit a2bf9591df
parent 421192b7e5
2 changed files with 83 additions and 16 deletions
--- a/Claudio_Maggioni_3/Claudio_Maggioni_3.pdf
+++ b/Claudio_Maggioni_3/Claudio_Maggioni_3.pdf
--- a/Claudio_Maggioni_3/Claudio_Maggioni_3.tex
+++ b/Claudio_Maggioni_3/Claudio_Maggioni_3.tex
@@ -31,8 +31,7 @@
 \setlength{\parindent}{0cm}
 \setlength{\parskip}{0.5\baselineskip}
-\title{Optimization methods -- Homework 3}
+\title{Optimization methods -- Homework 3}
 \author{Claudio Maggioni}
 \begin{document}
@@ -42,18 +41,22 @@
 \subsection{Exercise 1.1}
-Please consult the MATLAB implementation in the files \texttt{Newton.m}, \texttt{GD.m}, and \texttt{backtracking.m}.
+Please consult the MATLAB implementation in the files \texttt{Newton.m},
-Please note that, for this and subsequent exercises, the gradient descent method without backtracking activated uses a
+\texttt{GD.m}, and \texttt{backtracking.m}.  Please note that, for this and
-fixed $\alpha=1$ despite the indications on the assignment sheet. This was done in order to comply with the forum post
+subsequent exercises, the gradient descent method without backtracking
-on iCorsi found here: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.
+activated uses a fixed $\alpha=1$ despite the indications on the assignment
 sheet. This was done in order to comply with the forum post on iCorsi found
 here: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.
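GD.m and backtracking.m are not included in this diff. As an illustration only, here is a minimal Python sketch (not the MATLAB implementation) of gradient descent with either a fixed step alpha = 1 or an Armijo backtracking line search. It assumes the objective is the 2-D Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2 started at x0 = (0, 0), which this section does not state. The function names and the parameters rho = 0.5, c = 1e-4 and max_halvings = 60 are hypothetical, not values taken from backtracking.m.

```python
import math

# Assumed objective: 2-D Rosenbrock function (hypothetical choice of test problem).
# Products are used instead of ** so that overflow yields inf/nan, as in MATLAB,
# rather than raising OverflowError.
def f(x):
    a = 1 - x[0]
    b = x[1] - x[0] * x[0]
    return a * a + 100 * b * b


def grad(x):
    b = x[1] - x[0] * x[0]
    return [-2 * (1 - x[0]) - 400 * x[0] * b, 200 * b]


def backtracking(x, p, g, alpha=1.0, rho=0.5, c=1e-4, max_halvings=60):
    """Shrink alpha by rho until the Armijo condition
    f(x + alpha p) <= f(x) + c alpha g^T p holds (hypothetical parameters)."""
    fx = f(x)
    slope = g[0] * p[0] + g[1] * p[1]  # directional derivative, < 0 for descent
    for _ in range(max_halvings):
        if f([x[0] + alpha * p[0], x[1] + alpha * p[1]]) <= fx + c * alpha * slope:
            break
        alpha *= rho
    return alpha


def gd(x, iters, use_backtracking):
    """Run `iters` gradient descent steps from x; fixed alpha = 1 if no backtracking."""
    for _ in range(iters):
        g = grad(x)
        p = [-g[0], -g[1]]  # steepest-descent direction
        alpha = backtracking(x, p, g) if use_backtracking else 1.0
        x = [x[0] + alpha * p[0], x[1] + alpha * p[1]]
    return x


if __name__ == "__main__":
    print(gd([0.0, 0.0], 2, use_backtracking=False))   # [-3200.0, 800.0]
    print(gd([0.0, 0.0], 10, use_backtracking=False))  # overflowed to inf/nan
    print(gd([0.0, 0.0], 200, use_backtracking=True))
```

With the fixed step, the second iterate is already (-3200, 800), and a few steps later the coordinates overflow to inf and then NaN, which mirrors the x_k = [NaN; NaN] outcome reported in Exercise 1.5. With backtracking, every step accepted by the Armijo condition decreases f, at the cost of many small steps.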
 \subsection{Exercise 1.2}
-Please consult the MATLAB implementation in the file \texttt{main.m} in section 1.2.
+Please consult the MATLAB implementation in the file \texttt{main.m} in section
 1.2.
 \subsection{Exercise 1.3}
-Please find the requested plots in figure \ref{fig:1}. The code used to generate these plots can be found in section 1.3 of \texttt{main.m}.
+Please find the requested plots in figure \ref{fig:1}. The code used to
 generate these plots can be found in section 1.3 of \texttt{main.m}.
 \begin{figure}[h]
 	\begin{subfigure}{0.5\textwidth}
@@ -69,7 +72,8 @@ Please find the requested plots in figure \ref{fig:1}. The code used to generate
 \subsection{Exercise 1.4}
-Please find the requested plots in figure \ref{fig:gsppn}. The code used to generate these plots can be found in section 1.4 of \texttt{main.m}.
+Please find the requested plots in figure \ref{fig:gsppn}. The code used to
 generate these plots can be found in section 1.4 of \texttt{main.m}.
 \begin{figure}[h]
 	\begin{subfigure}{0.45\textwidth}
@@ -97,12 +101,44 @@ Please find the requested plots in figure \ref{fig:gsppn}. The code used to gene
 		\resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys-large.jpg}}
 		\caption{Objective function values}
 	\end{subfigure}
-	\caption{Gradient norms and objective function values (y-axes) w.r.t. iteration numbers (x-axis) for Newton and GD methods (y-axis is log scaled, points at $y=0$ not shown due to log scale)}\label{fig:gsppn}
+	\caption{Gradient norms and objective function values (y-axes) w.r.t.
 	iteration numbers (x-axis) for Newton and GD methods (y-axis is log
 	scaled, points at $y=0$ not shown due to log scale)}\label{fig:gsppn}
 \end{figure}
 \subsection{Exercise 1.5}
-TBD
+The best performing method for this particular input is the Newton method
 without backtracking, which converges in only 2 iterations. The second best
 performing one is the Newton method with backtracking, which converges in 12
 iterations. Gradient descent with backtracking converges slowly, requiring
 21102 iterations, while with a fixed $\alpha=1$ it diverges after only a few
 iterations: the iterates blow up numerically until they reach
 \texttt{x\_k = [NaN; NaN]}.
 Analyzing the movement in the energy landscape (figure \ref{fig:1}), the most
 ``coordinated'' method in terms of step directions appears to be the Newton
 method with backtracking. Although it performs more iterations than the
 classical variant, it keeps its step directions at approximately a 45-degree
 angle from the x-axis, almost always pointing towards the minimizer $x^*$.
 However, this movement strategy is clearly inefficient compared with the
 2-step convergence achieved by Newton without backtracking, which follows a
 conjugate-gradient-like path, reaching one component of the true minimizer at
 each step. GD with backtracking instead follows an inefficient zig-zag
 pattern, with iterates in the vicinity of the Newton + backtracking iterates.
 Finally, GD without backtracking quickly degenerates, as can be seen in the
 enlarged plot.
 When looking at gradient norms and objective function
 values (figure \ref{fig:gsppn}) over time, the degeneration of GD without
 backtracking and the inefficiency of GD with backtracking can clearly be seen.
 Newton with backtracking produces fairly smooth gradient norm and objective
 value curves, both decreasing exponentially over the last 5--10 iterations.
 Newton without backtracking instead jumps at the first iteration to a gradient
 norm $\|\nabla f(x_1)\| \approx 450$ and an objective value $f(x_1) \approx 100$,
 but both values quickly drop to 0 at the second iteration, where convergence
 is achieved.
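To make the 2-iteration convergence of Newton without backtracking concrete, here is a minimal Python sketch of the pure Newton iteration (step length 1, no line search). It assumes the objective is the Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2 with x0 = (0, 0). This problem is not named in this section; it is an assumption, chosen because it reproduces the values reported above (f(x_1) = 100, gradient norm about 450, convergence at the second iteration). The function names are hypothetical and the code is not the author's Newton.m.

```python
import math

# Assumed objective: 2-D Rosenbrock function, with analytic gradient and Hessian.
def f(x):
    a = 1 - x[0]
    b = x[1] - x[0] * x[0]
    return a * a + 100 * b * b


def grad(x):
    b = x[1] - x[0] * x[0]
    return [-2 * (1 - x[0]) - 400 * x[0] * b, 200 * b]


def hessian(x):
    return [[2 - 400 * x[1] + 1200 * x[0] * x[0], -400 * x[0]],
            [-400 * x[0], 200.0]]


def newton(x, tol=1e-8, max_iter=50):
    """Pure Newton iteration (step length 1, no line search).
    Returns the list of iterates x_0, x_1, ..."""
    iterates = [x]
    for _ in range(max_iter):
        g = grad(x)
        if math.hypot(g[0], g[1]) < tol:
            break
        (a, b), (c, d) = hessian(x)
        det = a * d - b * c
        # Solve H p = -g for the 2x2 system with Cramer's rule.
        p = [(-g[0] * d + b * g[1]) / det, (-a * g[1] + c * g[0]) / det]
        x = [x[0] + p[0], x[1] + p[1]]
        iterates.append(x)
    return iterates


if __name__ == "__main__":
    for k, xk in enumerate(newton([0.0, 0.0])):
        g = grad(xk)
        print(k, xk, f(xk), math.hypot(g[0], g[1]))
```

Running it gives x_1 = (1, 0), where f(x_1) = 100 and the gradient norm is about 447.2, and then x_2 = (1, 1), the exact minimizer: the first step fixes the x-component and the second the y-component, matching the description above.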
 \section{Exercise 2}
@@ -112,11 +148,13 @@ Please consult the MATLAB implementation in the file \texttt{BGFS.m}.
 \subsection{Exercise 2.2}
-Please consult the MATLAB implementation in the file \texttt{main.m} in section 2.2.
+Please consult the MATLAB implementation in the file \texttt{main.m} in section
 2.2.
 \subsection{Exercise 2.3}
-Please find the requested plots in figure \ref{fig:3}. The code used to generate these plots can be found in section 2.3 of \texttt{main.m}.
+Please find the requested plots in figure \ref{fig:3}. The code used to
 generate these plots can be found in section 2.3 of \texttt{main.m}.
 \begin{figure}[h]
 	\centering
@@ -126,7 +164,8 @@ Please find the requested plots in figure \ref{fig:3}. The code used to generate
 \subsection{Exercise 2.4}
-Please find the requested plots in figure \ref{fig:4}. The code used to generate these plots can be found in section 2.4 of \texttt{main.m}.
+Please find the requested plots in figure \ref{fig:4}. The code used to
 generate these plots can be found in section 2.4 of \texttt{main.m}.
 \begin{figure}[h]
 	\begin{subfigure}{0.5\textwidth}
@@ -137,11 +176,39 @@ Please find the requested plots in figure \ref{fig:4}. The code used to generate
 		\resizebox{\textwidth}{\textwidth}{\input{ex2-4-ys}}
 		\caption{Objective function values}
 	\end{subfigure}
-	\caption{Gradient norms and objective function values (y-axes) w.r.t. iteration numbers (x-axis) for BFGS method (y-axis is log scaled, points at $y=0$ not shown due to log scale)}\label{fig:4}
+	\caption{Gradient norms and objective function values (y-axes) w.r.t.
 	iteration numbers (x-axis) for BFGS method (y-axis is log scaled,
 	points at $y=0$ not shown due to log scale)}\label{fig:4}
 \end{figure}
 \subsection{Exercise 2.5}
-TBD
+The following table summarizes the number of iterations required by each method to achieve convergence:
 \begin{center}
 \begin{tabular}{c|c|c}
 	\textbf{Method} & \textbf{Backtracking} & \textbf{\# of iterations} \\\hline
 	Newton & No & 2 \\
 	Newton & Yes & 12 \\
 	BFGS & Yes & 26 \\
 	Gradient descent & Yes & 21102 \\
 	Gradient descent & No ($\alpha = 1$) & Diverges after 6 \\
 \end{tabular}
 \end{center}
 From the table above we can see that the BFGS
 method is within the same order of magnitude as the Newton method in terms of
 performance, although the number of iterations it requires to converge (26) is
 more than double that of Newton with backtracking (12) and more than ten times
 that of the Newton method without backtracking (2).
 From the iterates plot and the gradient norm and objective function value
 plots (located in figures \ref{fig:3} and \ref{fig:4} respectively) we can see
 that BFGS behaves similarly to the Newton method with backtracking, loosely
 following its curves. The only noteworthy difference lies in the energy
 landscape plot, where BFGS occasionally ``steps back'', taking steps that move
 away from the minimizer. This behaviour can also be observed in
 the plots in figure \ref{fig:4}, where several bumps and spikes are present in
 the gradient norm plot and small plateaus can be found in the objective
 function value plot.
 \end{document}
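For comparison with the BFGS results in Exercise 2.5, here is a minimal Python sketch of BFGS with an inverse-Hessian update and Armijo backtracking. It is not the author's BGFS.m: it assumes the Rosenbrock objective with x0 = (0, 0), an identity initial inverse Hessian, hypothetical line-search parameters (rho = 0.5, c = 1e-4), and a safeguard that skips the update when y^T s is not sufficiently positive, since Armijo backtracking alone does not guarantee the curvature condition. All names are hypothetical.

```python
import math

# Assumed objective: 2-D Rosenbrock function (hypothetical test problem).
def f(x):
    a = 1 - x[0]
    b = x[1] - x[0] * x[0]
    return a * a + 100 * b * b


def grad(x):
    b = x[1] - x[0] * x[0]
    return [-2 * (1 - x[0]) - 400 * x[0] * b, 200 * b]


def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def backtracking(x, p, g, alpha=1.0, rho=0.5, c=1e-4, max_halvings=60):
    """Armijo backtracking with hypothetical parameters rho and c."""
    fx = f(x)
    slope = g[0] * p[0] + g[1] * p[1]
    for _ in range(max_halvings):
        if f([x[0] + alpha * p[0], x[1] + alpha * p[1]]) <= fx + c * alpha * slope:
            break
        alpha *= rho
    return alpha


def bfgs_update(H, s, y):
    """Inverse-Hessian BFGS update
    H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, with rho = 1 / (y^T s)."""
    sy = s[0] * y[0] + s[1] * y[1]
    if sy <= 1e-10 * math.hypot(s[0], s[1]) * math.hypot(y[0], y[1]):
        return H  # skip: curvature condition y^T s > 0 not safely satisfied
    rho = 1.0 / sy
    Hy = matvec(H, y)
    yHy = y[0] * Hy[0] + y[1] * Hy[1]
    # Expanded form of the product above (H is symmetric).
    return [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j])
             + (rho * rho * yHy + rho) * s[i] * s[j]
             for j in range(2)] for i in range(2)]


def bfgs(x, tol=1e-6, max_iter=1000):
    """Return (approximate minimizer, number of iterations performed)."""
    H = [[1.0, 0.0], [0.0, 1.0]]  # initial inverse-Hessian approximation
    g = grad(x)
    for k in range(max_iter):
        if math.hypot(g[0], g[1]) < tol:
            return x, k
        Hg = matvec(H, g)
        p = [-Hg[0], -Hg[1]]  # quasi-Newton search direction
        alpha = backtracking(x, p, g)
        x_new = [x[0] + alpha * p[0], x[1] + alpha * p[1]]
        g_new = grad(x_new)
        s = [x_new[0] - x[0], x_new[1] - x[1]]
        y = [g_new[0] - g[0], g_new[1] - g[1]]
        H = bfgs_update(H, s, y)
        x, g = x_new, g_new
    return x, max_iter


if __name__ == "__main__":
    x_star, iters = bfgs([0.0, 0.0])
    print(x_star, iters)
```

The iteration count printed by this sketch is not expected to match the 26 iterations in the table, since the tolerances and line-search parameters of BGFS.m are not known here. Skipping the update whenever y^T s is not safely positive keeps the inverse-Hessian approximation positive definite, so p is always a descent direction.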