hw3: ready for submission
parent
421192b7e5
commit
a2bf9591df
2 changed files with 83 additions and 16 deletions
Binary file not shown.
|
@ -31,8 +31,7 @@
|
||||||
\setlength{\parindent}{0cm}
|
\setlength{\parindent}{0cm}
|
||||||
\setlength{\parskip}{0.5\baselineskip}
|
\setlength{\parskip}{0.5\baselineskip}
|
||||||
|
|
||||||
\title{Optimization methods -- Homework 3}
|
\title{Optimization methods -- Homework 3} \author{Claudio Maggioni}
|
||||||
\author{Claudio Maggioni}
|
|
||||||
|
|
||||||
\begin{document}
|
\begin{document}
|
||||||
|
|
||||||
|
@ -42,18 +41,22 @@
|
||||||
|
|
||||||
\subsection{Exercise 1.1}
|
\subsection{Exercise 1.1}
|
||||||
|
|
||||||
Please consult the MATLAB implementation in the files \texttt{Newton.m}, \texttt{GD.m}, and \texttt{backtracking.m}.
|
Please consult the MATLAB implementation in the files \texttt{Newton.m},
|
||||||
Please note that, for this and subsequent exercises, the gradient descent method without backtracking activated uses a
|
\texttt{GD.m}, and \texttt{backtracking.m}. Please note that, for this and
|
||||||
fixed $\alpha=1$ despite the indications on the assignment sheet. This was done in order to comply with the forum post
|
subsequent exercises, the gradient descent method without backtracking
|
||||||
on iCorsi found here: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.
|
activated uses a fixed $\alpha=1$ despite the indications on the assignment
|
||||||
|
sheet. This was done in order to comply with the forum post on iCorsi found
|
||||||
|
here: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.
|
||||||
|
|
||||||
\subsection{Exercise 1.2}
|
\subsection{Exercise 1.2}
|
||||||
|
|
||||||
Please consult the MATLAB implementation in the file \texttt{main.m} in section 1.2.
|
Please consult the MATLAB implementation in the file \texttt{main.m} in section
|
||||||
|
1.2.
|
||||||
|
|
||||||
\subsection{Exercise 1.3}
|
\subsection{Exercise 1.3}
|
||||||
|
|
||||||
Please find the requested plots in figure \ref{fig:1}. The code used to generate these plots can be found in section 1.3 of \texttt{main.m}.
|
Please find the requested plots in figure \ref{fig:1}. The code used to
|
||||||
|
generate these plots can be found in section 1.3 of \texttt{main.m}.
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{subfigure}{0.5\textwidth}
|
\begin{subfigure}{0.5\textwidth}
|
||||||
|
@ -69,7 +72,8 @@ Please find the requested plots in figure \ref{fig:1}. The code used to generate
|
||||||
|
|
||||||
\subsection{Exercise 1.4}
|
\subsection{Exercise 1.4}
|
||||||
|
|
||||||
Please find the requested plots in figure \ref{fig:gsppn}. The code used to generate these plots can be found in section 1.4 of \texttt{main.m}.
|
Please find the requested plots in figure \ref{fig:gsppn}. The code used to
|
||||||
|
generate these plots can be found in section 1.4 of \texttt{main.m}.
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{subfigure}{0.45\textwidth}
|
\begin{subfigure}{0.45\textwidth}
|
||||||
|
@ -97,12 +101,44 @@ Please find the requested plots in figure \ref{fig:gsppn}. The code used to gene
|
||||||
\resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys-large.jpg}}
|
\resizebox{\textwidth}{\textwidth}{\includegraphics{1-4-ys-large.jpg}}
|
||||||
\caption{Objective function values}
|
\caption{Objective function values}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\caption{Gradient norms and objective function values (y-axes) w.r.t. iteration numbers (x-axis) for Newton and GD methods (y-axis is log scaled, points at $y=0$ not shown due to log scale)}\label{fig:gsppn}
|
\caption{Gradient norms and objective function values (y-axes) w.r.t.
|
||||||
|
iteration numbers (x-axis) for Newton and GD methods (y-axis is log
|
||||||
|
scaled, points at $y=0$ not shown due to log scale)}\label{fig:gsppn}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\subsection{Exercise 1.5}
|
\subsection{Exercise 1.5}
|
||||||
|
|
||||||
TBD
|
The best performing method for this particular set of input data is the Newton method
|
||||||
|
without backtracking, since it converges in only 2 iterations. The second best
|
||||||
|
performing one is the Newton method with backtracking, which converges in 12
|
||||||
|
iterations. Gradient descent with backtracking converges slowly, requiring
|
||||||
|
21102 iterations, while with a fixed $\alpha=1$ the method diverges within a
|
||||||
|
dozen iterations, resulting in catastrophic numerical instability leading to
|
||||||
|
\texttt{x\_k = [NaN; NaN]}.
|
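The two extremes described above (full-step Newton converging in 2 iterations, fixed-step gradient descent blowing up) can be reproduced with a minimal Python sketch. It assumes, hypothetically, that the objective is the Rosenbrock function and that the starting point is $x_0 = (0, 0)$: neither is stated in this section, but both are consistent with the values $f(x_1) \approx 100$ and gradient norm $\approx 450$ quoted later in this exercise. The function names are illustrative and do not mirror \texttt{Newton.m} or \texttt{GD.m}.

```python
import numpy as np

def f(x):
    """Rosenbrock function (assumed objective, not confirmed by the text)."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def grad(x):
    """Analytic gradient of the assumed objective."""
    return np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])

def hess(x):
    """Analytic Hessian of the assumed objective."""
    return np.array([
        [1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
        [-400.0 * x[0], 200.0],
    ])

def newton_full_step(x0, n_steps):
    """Newton iteration with a fixed unit step (no backtracking)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x + np.linalg.solve(hess(x), -grad(x))
    return x

def gd_fixed_step(x0, alpha=1.0, max_iter=50):
    """Gradient descent with a fixed step size.

    Returns the index of the first non-finite iterate, or None if all
    iterates stay finite within max_iter iterations.
    """
    x = np.asarray(x0, dtype=float)
    with np.errstate(all="ignore"):  # overflow is expected here
        for k in range(1, max_iter + 1):
            x = x - alpha * grad(x)
            if not np.all(np.isfinite(x)):
                return k
    return None

x_newton = newton_full_step([0.0, 0.0], n_steps=2)
k_diverged = gd_fixed_step([0.0, 0.0])
print("Newton after 2 steps:", x_newton)
print("GD (alpha=1) first non-finite iterate:", k_diverged)
```

Under these assumptions the unit-step Newton iterates land on the minimizer after two steps, while the fixed-step gradient iterates overflow to infinity and then to NaN within a handful of iterations.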
||||||
|
|
||||||
|
Analyzing the movement in the energy landscape (figure \ref{fig:1}), the most
|
||||||
|
``coordinated'' method
|
||||||
|
in terms of iteration step directions appears to be the Newton method with
|
||||||
|
backtracking. Although it performs more iterations than
|
||||||
|
the classical variant, the method keeps its iteration directions
|
||||||
|
approximately at a 45 degree angle from the x axis, almost always pointing at
|
||||||
|
the minimizer $x^*$. However, this movement strategy is definitely inefficient
|
||||||
|
compared with the 2-step convergence achieved by Newton without backtracking,
|
||||||
|
which follows a conjugate-gradient-like path, finding in each step one component
|
||||||
|
of the true minimizer. GD with backtracking instead follows an inefficient
|
||||||
|
zig-zagging pattern with iterates in the vicinity of the Newton + backtracking
|
||||||
|
iterates. Finally, GD without backtracking quickly degenerates, as can be
|
||||||
|
seen in the enlarged plot.
|
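The extra iterations taken by Newton with backtracking come from the line search shortening steps that a unit-step Newton iteration would take in full. A minimal Python sketch of an Armijo backtracking line search wrapped around the Newton direction follows. The Rosenbrock objective, the starting point $(0, 0)$, and the parameters \texttt{rho}, \texttt{c} and \texttt{tol} are all assumptions chosen for illustration; they are not taken from \texttt{backtracking.m} or the assignment sheet, so the iteration count need not match the 12 reported here.

```python
import numpy as np

def f(x):
    """Rosenbrock function (assumed objective, not confirmed by the text)."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def grad(x):
    return np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])

def hess(x):
    return np.array([
        [1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
        [-400.0 * x[0], 200.0],
    ])

def backtracking(x, p, g, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo sufficient-decrease condition holds.

    rho and c are assumed values, not the ones used in the homework.
    """
    alpha = alpha0
    fx = f(x)
    slope = g @ p  # directional derivative, negative for a descent direction
    while f(x + alpha * p) > fx + c * alpha * slope and alpha > 1e-16:
        alpha *= rho
    return alpha

def newton_backtracking(x0, tol=1e-8, max_iter=500):
    """Newton direction plus Armijo backtracking; returns all iterates."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(x), -g)
        if g @ p >= 0:  # safeguard: fall back to steepest descent
            p = -g
        x = x + backtracking(x, p, g) * p
        path.append(x.copy())
    return np.array(path)

path = newton_backtracking([0.0, 0.0])
print("iterations:", len(path) - 1)
print("final iterate:", path[-1])
```

In this sketch the very first unit Newton step would increase the objective, so the line search rejects it and accepts a shorter step instead, which is why the backtracking variant cannot reproduce the 2-step path of the unit-step method.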
||||||
|
|
||||||
|
When looking at gradient norms and objective function
|
||||||
|
values (figure \ref{fig:gsppn}) over time, the degeneration of GD without
|
||||||
|
backtracking and the inefficiency of GD with backtracking can clearly be seen.
|
||||||
|
Newton with backtracking offers fairly smooth gradient norm and objective value
|
||||||
|
curves with an exponentially decreasing slope in both for the last 5--10
|
||||||
|
iterations. Newton without backtracking instead jumps at the first iteration
|
||||||
|
to a gradient norm $\|\nabla f(x_1)\| \approx 450$ and objective value $f(x_1) \approx
|
||||||
|
100$, but both values then quickly drop to 0 at its second iteration,
|
||||||
|
achieving convergence.
|
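As a sanity check on the values quoted above, the two unit Newton steps can be worked out by hand. The derivation below assumes the Rosenbrock objective, written in the components $(u, v)$ of $x$, and the starting point $x_0 = (0, 0)^T$. Neither is stated in this section, so both should be read as hypotheses that happen to reproduce the reported figures.

```latex
\begin{align*}
f(u, v) &= 100\,(v - u^2)^2 + (1 - u)^2, \qquad x_0 = (0, 0)^T \\
\nabla f(u, v) &= \begin{pmatrix} -400\,u\,(v - u^2) - 2\,(1 - u) \\ 200\,(v - u^2) \end{pmatrix},
\qquad
\nabla^2 f(u, v) = \begin{pmatrix} 1200\,u^2 - 400\,v + 2 & -400\,u \\ -400\,u & 200 \end{pmatrix} \\
x_1 &= x_0 - \nabla^2 f(x_0)^{-1}\,\nabla f(x_0)
     = -\begin{pmatrix} 2 & 0 \\ 0 & 200 \end{pmatrix}^{-1} \begin{pmatrix} -2 \\ 0 \end{pmatrix}
     = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \\
f(x_1) &= 100\,(0 - 1)^2 + 0 = 100, \qquad
\|\nabla f(x_1)\| = \left\| (400, -200)^T \right\| = \sqrt{200000} \approx 447.2 \\
x_2 &= x_1 - \begin{pmatrix} 1202 & -400 \\ -400 & 200 \end{pmatrix}^{-1} \begin{pmatrix} 400 \\ -200 \end{pmatrix}
     = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}
     = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = x^*
\end{align*}
```

Under these assumptions the first step fixes the first component of the minimizer at the cost of a large objective value, and the second step fixes the remaining component.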
||||||
|
|
||||||
\section{Exercise 2}
|
\section{Exercise 2}
|
||||||
|
|
||||||
|
@ -112,11 +148,13 @@ Please consult the MATLAB implementation in the file \texttt{BGFS.m}.
|
||||||
|
|
||||||
\subsection{Exercise 2.2}
|
\subsection{Exercise 2.2}
|
||||||
|
|
||||||
Please consult the MATLAB implementation in the file \texttt{main.m} in section 2.2.
|
Please consult the MATLAB implementation in the file \texttt{main.m} in section
|
||||||
|
2.2.
|
||||||
|
|
||||||
\subsection{Exercise 2.3}
|
\subsection{Exercise 2.3}
|
||||||
|
|
||||||
Please find the requested plots in figure \ref{fig:3}. The code used to generate these plots can be found in section 2.3 of \texttt{main.m}.
|
Please find the requested plots in figure \ref{fig:3}. The code used to
|
||||||
|
generate these plots can be found in section 2.3 of \texttt{main.m}.
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\centering
|
\centering
|
||||||
|
@ -126,7 +164,8 @@ Please find the requested plots in figure \ref{fig:3}. The code used to generate
|
||||||
|
|
||||||
\subsection{Exercise 2.4}
|
\subsection{Exercise 2.4}
|
||||||
|
|
||||||
Please find the requested plots in figure \ref{fig:4}. The code used to generate these plots can be found in section 2.4 of \texttt{main.m}.
|
Please find the requested plots in figure \ref{fig:4}. The code used to
|
||||||
|
generate these plots can be found in section 2.4 of \texttt{main.m}.
|
||||||
|
|
||||||
\begin{figure}[h]
|
\begin{figure}[h]
|
||||||
\begin{subfigure}{0.5\textwidth}
|
\begin{subfigure}{0.5\textwidth}
|
||||||
|
@ -137,11 +176,39 @@ Please find the requested plots in figure \ref{fig:4}. The code used to generate
|
||||||
\resizebox{\textwidth}{\textwidth}{\input{ex2-4-ys}}
|
\resizebox{\textwidth}{\textwidth}{\input{ex2-4-ys}}
|
||||||
\caption{Objective function values}
|
\caption{Objective function values}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\caption{Gradient norms and objective function values (y-axes) w.r.t. iteration numbers (x-axis) for BFGS method (y-axis is log scaled, points at $y=0$ not shown due to log scale)}\label{fig:4}
|
\caption{Gradient norms and objective function values (y-axes) w.r.t.
|
||||||
|
iteration numbers (x-axis) for BFGS method (y-axis is log scaled,
|
||||||
|
points at $y=0$ not shown due to log scale)}\label{fig:4}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\subsection{Exercise 2.5}
|
\subsection{Exercise 2.5}
|
||||||
|
|
||||||
TBD
|
The following table summarizes the number of iterations required by each method to achieve convergence:
|
||||||
|
|
||||||
|
\begin{center}
|
||||||
|
\begin{tabular}{c|c|c}
|
||||||
|
\textbf{Method} & \textbf{Backtracking} & \textbf{\# of iterations} \\\hline
|
||||||
|
Newton & No & 2 \\
|
||||||
|
Newton & Yes & 12 \\
|
||||||
|
BFGS & Yes & 26 \\
|
||||||
|
Gradient descent & Yes & 21102 \\
|
||||||
|
Gradient descent & No ($\alpha = 1$) & Diverges after 6 \\
|
||||||
|
\end{tabular}
|
||||||
|
\end{center}
|
||||||
|
|
||||||
|
From the table above we can see that the BFGS
|
||||||
|
method is roughly in the same performance order of magnitude as the Newton method,
|
||||||
|
although the number of iterations it requires to converge (26) is more than double
|
||||||
|
that of Newton with backtracking (12), and more than ten times the
|
||||||
|
number required by the Newton method without backtracking (2).
|
||||||
|
|
||||||
|
From the iterates plot and the gradient norm and objective function values
|
||||||
|
plots (located in figures \ref{fig:3} and \ref{fig:4} respectively) we can see
|
||||||
|
that BFGS behaves similarly to the Newton method with backtracking, loosely
|
||||||
|
following its curves. The only noteworthy difference lies in the energy
|
||||||
|
landscape plot, where BFGS occasionally ``steps back'', performing iterations in
|
||||||
|
the direction opposite to the minimizer. This behaviour can also be observed in
|
||||||
|
the plots in figure \ref{fig:4}, where several bumps and spikes are present in
|
||||||
|
the gradient norm plot and small plateaus can be found in the objective
|
||||||
|
function value plot.
|
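For reference, the kind of iteration being compared here can be sketched in a few lines of Python. This is a generic BFGS update of the inverse Hessian approximation with Armijo backtracking, not a transcription of \texttt{BGFS.m}: the Rosenbrock objective, the starting point $(0, 0)$, the identity initial matrix, the line search parameters and the tolerance are all assumptions, so the iteration count it reports need not equal the 26 in the table.

```python
import numpy as np

def f(x):
    """Rosenbrock function (assumed objective, not confirmed by the text)."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def grad(x):
    return np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])

def bfgs(x0, tol=1e-6, max_iter=1000, rho=0.5, c=1e-4):
    """BFGS with Armijo backtracking; returns (final iterate, iterations)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    identity = np.eye(n)
    H = np.eye(n)  # inverse Hessian approximation, assumed to start at I
    g = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            return x, k
        p = -H @ g  # quasi-Newton search direction
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(x + alpha * p) > fx + c * alpha * slope and alpha > 1e-16:
            alpha *= rho
        s = alpha * p
        g_new = grad(x + s)
        y = g_new - g
        sy = s @ y
        # Skip the update when the curvature condition s^T y > 0 fails,
        # so that H stays positive definite.
        if sy > 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
            r = 1.0 / sy
            H = ((identity - r * np.outer(s, y)) @ H
                 @ (identity - r * np.outer(y, s))
                 + r * np.outer(s, s))
        x, g = x + s, g_new
    return x, max_iter

x_bfgs, n_iter = bfgs([0.0, 0.0])
print("BFGS iterations:", n_iter)
print("final iterate:", x_bfgs)
```

Because the curvature information in \texttt{H} is built up from gradient differences rather than computed exactly, early directions can be poorly scaled, which is one plausible reason for the occasional backward steps and gradient norm spikes noted above.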
||||||
\end{document}
|
\end{document}
|
||||||
|
|