midterm: done 1
This commit is contained in: parent 0220738cc6, commit 87746f945f
7 changed files with 137 additions and 182 deletions
Binary file not shown.
@@ -48,6 +48,13 @@ activated uses a fixed $\alpha=1$ despite the indications on the assignment
sheet. This was done in order to comply with the forum post on iCorsi found
here: \url{https://www.icorsi.ch/mod/forum/discuss.php?d=81144}.

Here is a plot of the Rosenbrock function in 3D, with our starting point in red ($(0,0)$)
and the true minimizer in black ($(1,1)$):

\begin{center}
\resizebox{0.6\textwidth}{0.6\textwidth}{\includegraphics{rosenb.jpg}}
\end{center}

\subsection{Exercise 1.2}

Please consult the MATLAB implementation in the file \texttt{main.m} in section
@@ -131,6 +138,12 @@ zig-zagging pattern with iterates in the vicinity of the Newton + backtracking
iterates. Finally, GD without backtracking quickly degenerates, as can be
seen in the enlarged plot.

From the initial plot of the Rosenbrock function, we can see why the algorithms using
backtracking roughly follow the $(0,0) - (1,1)$ diagonal: this region is effectively a
``valley'' in the 3D energy landscape, and due to the nature of the backtracking methods and their
strict adherence to the Wolfe conditions these methods avoid wild ``climbs'' (unlike the Newton method without backtracking)
and instead find iterates with either a sufficient decrease or only a modest increase.
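
As a reference, the following is a minimal sketch of the kind of backtracking rule assumed
in this discussion (step halving until the Armijo sufficient decrease condition holds); the
function name and the constants \texttt{c} and \texttt{rho} are illustrative and not
necessarily those used in \texttt{main.m}:

\begin{verbatim}
function alpha = backtracking(f, g, x, p, c, rho)
% Backtracking line search: shrink alpha until the Armijo
% (sufficient decrease) condition is satisfied.
%   f   - objective handle, f(x) returns a scalar
%   g   - gradient of f at x (column vector)
%   x   - current iterate
%   p   - descent direction (e.g. -g for GD)
%   c   - sufficient decrease constant, e.g. 1e-4
%   rho - shrinking factor, e.g. 0.5
alpha = 1;
while f(x + alpha * p) > f(x) + c * alpha * (g' * p)
    alpha = rho * alpha;
end
end
\end{verbatim}

A plain backtracking search of this kind enforces only the sufficient decrease (first Wolfe)
condition; checking the curvature condition as well would additionally require evaluating the
gradient at each trial point.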

When looking at gradient norms and objective function
values (figure \ref{fig:gsppn}) over time, the degeneration of GD without
backtracking and the inefficiency of GD with backtracking can clearly be seen.
@@ -141,6 +154,16 @@ at gradient $\nabla f(x_1) \approx 450$ and objective value $f(x_1) \approx
100$, but quickly has both values decrease to 0 at its second iteration,
achieving convergence.

What has been observed in this assignment matches the theory behind the methods.
Since the Newton method has quadratic convergence and uses second-order information (i.e.
the Hessian in the step direction calculation), the number of iterations required
to find the minimizer when already close to it (as we are with $x_0 = [0;0]$) is significantly
lower than the number required by linearly convergent methods such as gradient descent. However, it must
be said that for an objective with a high number of dimensions a single iteration of a second-order
method is significantly more costly than a single iteration of a first-order method, due to the
quadratically growing number of entries in the Hessian matrix, which becomes increasingly expensive
to compute as the number of dimensions increases.
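
To illustrate this trade-off, here is a minimal sketch (not part of \texttt{main.m}; the step
size \texttt{alpha} is an arbitrary illustrative value) of the per-iteration work of the two
methods on the Rosenbrock function: the gradient step only needs $\nabla f(x)$, while the
Newton step also builds the Hessian and solves a linear system with it:

\begin{verbatim}
% Sketch: one GD step vs. one Newton step on the Rosenbrock function.
syms x y;
f = (1 - x)^2 + 100 * (y - x^2)^2;
g = matlabFunction(gradient(f, [x; y]));   % gradient handle
H = matlabFunction(hessian(f, [x; y]));    % Hessian handle

x0 = [0; 0];
alpha = 0.01;                              % illustrative fixed step size
x_gd     = x0 - alpha * g(x0(1), x0(2));                % O(n) update
x_newton = x0 - H(x0(1), x0(2)) \ g(x0(1), x0(2));      % O(n^3) solve
\end{verbatim}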

\section{Exercise 2}

\subsection{Exercise 2.1}
@@ -212,4 +235,10 @@ the opposite direction of the minimizer. This behaviour can also be observed in
the plots in figure \ref{fig:4}, where several bumps and spikes are present in
the gradient norm plot and small plateaus can be found in the objective
function value plot.

Comparing these results with the theory behind BFGS, we can say that the results we have
obtained fall within what we expect from the theory. Since BFGS is a superlinearly but not
quadratically convergent method, its ``speed'' in terms of number of iterations falls between
that of linear methods (like GD) and that of quadratic methods (like Newton).
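
For reference, the core of BFGS is the update of the inverse Hessian approximation $H_k$;
the following is a minimal sketch of one such update using the standard formula, which is
not necessarily the exact code inside our \texttt{BGFS} implementation:

\begin{verbatim}
function [H, p] = bfgs_update(H, s, y, grad)
% One BFGS update of the inverse Hessian approximation H, followed
% by the computation of the next search direction p.
%   s    = x_{k+1} - x_k
%   y    = grad f(x_{k+1}) - grad f(x_k)
%   grad = grad f(x_{k+1})
% H stays symmetric positive definite as long as s' * y > 0,
% which is guaranteed by the Wolfe conditions.
rho = 1 / (y' * s);
I   = eye(length(s));
H   = (I - rho * (s * y')) * H * (I - rho * (y * s')) + rho * (s * s');
p   = -H * grad;
end
\end{verbatim}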
\end{document}
@@ -1,176 +0,0 @@
%% Homework 3 - Optimization Methods
% Author: Claudio Maggioni
%
% Note: exercises are not in the right order due to matlab constraints of
% functions inside of scripts.

clc
clear
close all

% Set to non-zero to generate LaTeX for graphs
enable_m2tikz = 0;
if enable_m2tikz
    addpath /home/claudio/git/matlab2tikz/src
else
    matlab2tikz = @(a,b,c) 0;
end

syms x y;
f = (1 - x)^2 + 100 * (y - x^2)^2;
global fl
fl = matlabFunction(f);

%% 1.3 - Newton and GD solutions and energy plot

[x1, xs1, gs1] = Newton(f, [0;0], 50000, 1e-6, true);
plot(xs1(1, :), xs1(2, :), 'Marker', '.');
fprintf("Newton backtracking: it=%d\n", size(xs1, 2)-1);
xlim([-0.01 1.01])
ylim([-0.01 1.01])
hold on;
[x2, xs2, gs2] = Newton(f, [0;0], 50000, 1e-6, false);
plot(xs2(1, :), xs2(2, :), 'Marker', '.');
fprintf("Newton: it=%d\n", size(xs2, 2)-1);

[x3, xs3, gs3] = GD(f, [0;0], 50000, 1e-6, true);
plot(xs3(1, :), xs3(2, :), 'Marker', '.');
fprintf("GD backtracking: it=%d\n", size(xs3, 2)-1);

[x4, xs4, gs4] = GD(f, [0;0], 50000, 1e-6, false);
plot(xs4(1, :), xs4(2, :), 'Marker', '.');
fprintf("GD: it=%d\n", size(xs4, 2)-1);
hold off;
legend('Newton + backtracking', 'Newton', 'GD + backtracking', 'GD (alpha=1)')
sgtitle("Iterations of Newton and Gradient descent methods over 2D energy landscape");
matlab2tikz('showInfo', false, './ex1-3.tex');

figure;
plot(xs4(1, :), xs4(2, :), 'Marker', '.');
legend('GD (alpha=1)')
sgtitle("Iterations of Newton and Gradient descent methods over 2D energy landscape");
matlab2tikz('showInfo', false, './ex1-3-large.tex');
%% 2.3 - BGFS solution and energy plot

figure;
[x5, xs5, gs5] = BGFS(f, [0;0], eye(2), 50000, 1e-6);
plot(xs5(1, :), xs5(2, :), 'Marker', '.');
% set axis limits after plotting so they are not reset by plot()
xlim([-0.01 1.01])
ylim([-0.01 1.01])
fprintf("BGFS backtracking: it=%d\n", size(xs5, 2)-1);

sgtitle("Iterations of BGFS method over 2D energy landscape");
matlab2tikz('showInfo', false, './ex2-3.tex');
%% 1.4 - Newton and GD gradient norm log

figure;
semilogy(0:size(xs1, 2)-1, gs1, 'Marker', '.');
ylim([5e-10, 1e12]);
xlim([-1, 30]);
hold on
semilogy(0:size(xs2, 2)-1, gs2, 'Marker', '.');
semilogy(0:size(xs3, 2)-1, gs3, 'Marker', '.');
semilogy(0:size(xs4, 2)-1, gs4, 'Marker', '.');
hold off

legend('Newton + backtracking', 'Newton', 'GD + backtracking', 'GD (alpha=1)')
sgtitle("Gradient norm w.r.t. iteration number for Newton and GD methods");
matlab2tikz('showInfo', false, './ex1-4-grad.tex');

figure;
plot(0:size(xs1, 2)-1, gs1, 'Marker', '.');
ylim([5e-10, 25]);
xlim([-1, 30]);
hold on
plot(0:size(xs2, 2)-1, gs2, 'Marker', '.');
plot(0:size(xs3, 2)-1, gs3, 'Marker', '.');
plot(0:size(xs4, 2)-1, gs4, 'Marker', '.');
hold off

legend('Newton + backtracking', 'Newton', 'GD + backtracking', 'GD (alpha=1)')
sgtitle("Gradient norm w.r.t. iteration number for Newton and GD methods");
matlab2tikz('showInfo', false, './ex1-4-grad.tex');

figure;
semilogy(0:size(xs3, 2)-1, gs3, 'Marker', '.');
ylim([1e-7, 1e10]);
hold on
semilogy(0:size(xs4, 2)-1, gs4, 'Marker', '.');
hold off

legend('GD + backtracking', 'GD (alpha=1)')
sgtitle("Gradient norm w.r.t. iteration number for Newton and GD methods");
matlab2tikz('showInfo', false, './ex1-4-grad-large.tex');

figure;
ys1 = funvalues(xs1);
ys2 = funvalues(xs2);
ys3 = funvalues(xs3);
ys4 = funvalues(xs4);
ys5 = funvalues(xs5);
semilogy(0:size(xs1, 2)-1, ys1, 'Marker', '.');
ylim([5e-19, 1e12]);
xlim([-1, 30]);
hold on
semilogy(0:size(xs2, 2)-1, ys2, 'Marker', '.');
semilogy(0:size(xs3, 2)-1, ys3, 'Marker', '.');
semilogy(0:size(xs4, 2)-1, ys4, 'Marker', '.');
hold off

legend('Newton + backtracking', 'Newton', 'GD + backtracking', 'GD (alpha=1)')
sgtitle("Objective function value w.r.t. iteration number for Newton and GD methods");
matlab2tikz('showInfo', false, './ex1-4-ys.tex');

figure;
ys1 = funvalues(xs1);
ys2 = funvalues(xs2);
ys3 = funvalues(xs3);
ys4 = funvalues(xs4);
ys5 = funvalues(xs5);
semilogy(0:size(xs1, 2)-1, ys1, 'Marker', '.');
ylim([5e-19, 20]);
xlim([-1, 30]);
hold on
semilogy(0:size(xs2, 2)-1, ys2, 'Marker', '.');
semilogy(0:size(xs3, 2)-1, ys3, 'Marker', '.');
semilogy(0:size(xs4, 2)-1, ys4, 'Marker', '.');
hold off

legend('Newton + backtracking', 'Newton', 'GD + backtracking', 'GD (alpha=1)')
sgtitle("Objective function value w.r.t. iteration number for Newton and GD methods");
matlab2tikz('showInfo', false, './ex1-4-ys.tex');

figure;
semilogy(0:size(xs3, 2)-1, ys3, 'Marker', '.');
hold on
semilogy(0:size(xs4, 2)-1, ys4, 'Marker', '.');
hold off

legend('GD + backtracking', 'GD (alpha=1)')
sgtitle("Objective function value w.r.t. iteration number for Newton and GD methods");
matlab2tikz('showInfo', false, './ex1-4-ys-large.tex');
%% 2.4 - BGFS gradient norms plot

figure;
semilogy(0:size(xs5, 2)-1, gs5, 'Marker', '.');

sgtitle("Gradient norm w.r.t. iteration number for BGFS method");
matlab2tikz('showInfo', false, './ex2-4-grad.tex');

%% 2.4 - BGFS objective values plot

figure;
semilogy(0:size(xs5, 2)-1, ys5, 'Marker', '.');

sgtitle("Objective function value w.r.t. iteration number for BGFS method");
matlab2tikz('showInfo', false, './ex2-4-ys.tex');

function ys = funvalues(xs)
    % Evaluate the Rosenbrock function handle fl at every iterate (column of xs)
    ys = zeros(1, size(xs, 2));
    global fl
    for i = 1:size(xs, 2)
        ys(i) = fl(xs(1,i), xs(2,i));
    end
end
@@ -6,7 +6,6 @@
clc
clear
close all

% Set to non-zero to generate LaTeX for graphs
enable_m2tikz = 0;
@@ -16,11 +15,31 @@ else
    matlab2tikz = @(a,b,c) 0;
end

%% 1.1 - Rosenbrock function definition and surf plot
close all
syms x y;
f = (1 - x)^2 + 100 * (y - x^2)^2;
global fl
fl = matlabFunction(f);

xs = -0.2:0.01:1.2;
Z = zeros(size(xs,2), size(xs,2));

for x = 1:size(xs, 2)
    for y = 1:size(xs, 2)
        % rows of Z index the y coordinate so that surf(xs, xs, Z)
        % shows f(x, y) with the correct orientation
        Z(y, x) = fl(xs(x), xs(y));
    end
end

surf(xs, xs, Z, 'EdgeColor', 'none', 'FaceAlpha', 0.4);
hold on
plot3([0,1],[0,1],[fl(0,0),fl(1,1)],'-r.');
plot3([1],[1],[fl(1,1)],'-k.');
hold off

figure;


%% 1.2 - Minimizing the Rosenbrock function

[x1, xs1, gs1] = Newton(f, [0;0], 50000, 1e-6, true);
BIN  Claudio_Maggioni_3/rosenb.jpg (new file, 56 KiB)
Binary file not shown.
@@ -18,12 +18,12 @@ header-includes:
### Question (a)

As already covered in the course, the gradient of a standard quadratic form at a
point $x_0$ is equal to:

$$ \nabla f(x_0) = A x_0 - b $$
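
For completeness, the quadratic form assumed here (with $A$ symmetric), from which the
gradient identity above follows by direct differentiation, is:

$$ f(x) = \frac{1}{2} x^T A x - b^T x, \qquad \nabla f(x) = A x - b $$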

Plugging in the definition of $x_0$ and knowing that $\nabla f(x_m) = A x_m - b
= 0$ (according to the first necessary condition for a minimizer), we obtain:

$$ \nabla f(x_0) = A (x_m + v) - b = A x_m + A v - b = b + \lambda v - b =
\lambda v $$
@@ -52,7 +52,90 @@ $\alpha \neq \frac{1}{\lambda}$.
Therefore, since $x_1 = x_m$, we have proven SD
converges to the minimizer in one iteration.
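
As a quick numerical sanity check (a sketch, not required by the exercise; the matrix `A`
and vector `b` below are arbitrary example values), one can verify both the identity
$\nabla f(x_0) = \lambda v$ and the one-iteration convergence of SD with an exact line search:

```matlab
% Sanity check: gradient at xm + v is lambda*v, and one exact-line-search
% SD step from x0 = xm + v recovers the minimizer xm.
A  = [3 1; 1 2];            % small SPD example matrix (assumed)
b  = [1; 1];
xm = A \ b;                 % minimizer of f(x) = 0.5*x'*A*x - b'*x
[V, D] = eig(A);
v = V(:, 1); lambda = D(1, 1);
x0 = xm + v;
g0 = A * x0 - b;            % equals lambda * v (up to round-off)
alpha = (g0' * g0) / (g0' * A * g0);   % exact line search step = 1/lambda
x1 = x0 - alpha * g0;       % equals xm (up to round-off)
disp(norm(g0 - lambda * v));
disp(norm(x1 - xm));
```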

## Point 2

The right answer is choice (a), since the energy norm of the error indeed always
decreases monotonically.

To prove that this is true, we first consider a way to express any iterate $x_k$
as a function of the minimizer $x_s$ and of the remaining iterations:

$$x_k = x_s + \sum_{i=k}^{N} \alpha_i A^i p_0$$

This formula makes use of the fact that the step directions in CG are all
A-orthogonal to each other, so the k-th search direction $p_k$ is equal to
$A^k p_0$, where $p_0 = -r_0$ and $r_0$ is the first residual.

Given that definition of the iterates, we are able to express the error $e_k$ after
iteration $k$ in a similar fashion:

$$e_k = x_k - x_s = \sum_{i=k}^{N} \alpha_i A^i p_0$$

We then recall the definition of the energy norm $\|e_k\|_A$:

$$\|e_k\|_A = \sqrt{\langle Ae_k, e_k \rangle}$$

We then want to show that $\|e_k\|_A = \|x_k - x_s\|_A > \|e_{k+1}\|_A$, which
in turn is equivalent to claiming that:

$$\langle Ae_k, e_k \rangle > \langle Ae_{k+1}, e_{k+1} \rangle$$

Knowing that the dot product is linear w.r.t. either of its arguments, we pull
out the sum term related to the $k$-th step (i.e. the first term in the sum that
makes up $e_k$) from both arguments of $\langle Ae_k, e_k \rangle$,
obtaining the following:

$$\langle Ae_{k+1}, e_{k+1} \rangle + \langle \alpha_k A^{k+1} p_0, e_k \rangle
+ \langle Ae_{k+1},\alpha_k A^k p_0 \rangle > \langle Ae_{k+1}, e_{k+1}
\rangle$$

which in turn is equivalent to claiming that:

$$\langle \alpha_k A^{k+1} p_0, e_k \rangle
+ \langle Ae_{k+1},\alpha_k A^k p_0 \rangle > 0$$

From this expression we can collect the term $\alpha_k$ thanks to the linearity of the
dot product:

$$\alpha_k (\langle A^{k+1} p_0, e_k \rangle
+ \langle Ae_{k+1}, A^k p_0 \rangle) > 0$$

and we can further "ignore" the $\alpha_k$ term since we know that all
$\alpha_i$s are positive by definition:

$$\langle A^{k+1} p_0, e_k \rangle
+ \langle Ae_{k+1}, A^k p_0 \rangle > 0$$

Then, we convert the dot products into their equivalent vector-vector product
form, and we plug in the definitions of $e_k$ and $e_{k+1}$:

$$p_0^T (A^{k+1})^T (\sum_{i=k}^{N} \alpha_i A^i p_0) +
p_0^T (A^{k})^T (\sum_{i=k+1}^{N} \alpha_i A^i p_0) > 0$$

We then pull the sum out of the products, thanks to the associativity of the vector
products:

$$\sum_{i=k}^N (p_0^T (A^{k+1})^T A^i p_0) \alpha_i + \sum_{i=k+1}^N
(p_0^T (A^{k})^T A^i p_0) \alpha_i > 0$$

We then, as before, can "ignore" all $\alpha_i$ terms since we know by
definition that they are all strictly positive. We also recall that $A$ is assumed to be
symmetric, so $A^T = A$. In the end we have to show that these two
inequalities are true:

$$p_0^T A^{k+1+i} p_0 > 0 \; \forall i \in [k,N]$$
$$p_0^T A^{k+i} p_0 > 0 \; \forall i \in [k+1,N]$$

To show these inequalities are indeed true, we recall that $A$ is symmetric and
positive definite. We then consider that if a matrix $A$ is SPD, then $A^i$ for
any positive $i$ is also SPD[^1]. Therefore, both inequalities are trivially
true due to the definition of positive definite matrices.

[^1]: source: [Wikipedia - Definite Matrix $\to$ Properties $\to$
Multiplication](
https://en.wikipedia.org/wiki/Definite_matrix#Multiplication)

Thanks to this we have indeed proven that the difference $\|e_k\|_A - \|e_{k+1}\|_A$
is positive, and thus, as $k$ increases, the energy norm of the error
monotonically decreases.
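
As a numerical sanity check (a sketch, not part of the exercise; the problem size and the
SPD matrix below are arbitrary), a basic CG run on a random SPD system shows that the energy
norm of the error never increases:

```matlab
% Verify the monotone decrease of the energy norm of the CG error.
rng(0);
n  = 50;
M  = randn(n);
A  = M' * M + n * eye(n);   % SPD matrix (assumed example)
b  = randn(n, 1);
xs = A \ b;                 % exact solution / minimizer

x = zeros(n, 1);
r = b - A * x;              % initial residual
p = r;                      % first search direction
enorm = zeros(1, n + 1);
enorm(1) = sqrt((x - xs)' * A * (x - xs));
for k = 1:n
    alpha = (r' * r) / (p' * A * p);
    x = x + alpha * p;
    rnew = r - alpha * (A * p);
    beta = (rnew' * rnew) / (r' * r);
    p = rnew + beta * p;
    r = rnew;
    enorm(k + 1) = sqrt((x - xs)' * A * (x - xs));
end
assert(all(diff(enorm) <= 1e-10));   % energy norm never increases
```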
Binary file not shown.