\[\nabla^2 J(x^*)\geq0\Leftrightarrow A \text{ is positive semi-definite}\]
\subsection{Sufficient conditions}
A sufficient condition for $x^*$ to be a minimizer of $J$ is that the first-order necessary condition holds and that:
\[\nabla^2 J(x^*) > 0\Leftrightarrow A \text{ is positive definite}\]
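These optimality conditions can be checked numerically. The following is a small Python sketch (not part of the report's MATLAB code); the symmetric matrix $A$ and vector $b$ are illustrative assumptions, and the quadratic is taken in the standard form $J(x)=\frac{1}{2}\langle Ax,x\rangle-\langle b,x\rangle$, whose Hessian is $A$ everywhere:

```python
import numpy as np

# Quadratic J(x) = 1/2 <Ax, x> - <b, x>; its Hessian is A at every x.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric, illustrative choice
b = np.array([1.0, 1.0])

# Second-order sufficient condition: A positive definite
# <=> all eigenvalues of A are strictly positive.
eigvals = np.linalg.eigvalsh(A)
assert np.all(eigvals > 0)

# First-order necessary condition: grad J(x*) = A x* - b = 0.
x_star = np.linalg.solve(A, b)
grad = A @ x_star - b
assert np.allclose(grad, 0.0)
```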
\subsection{Does $\min_{x \in \mathbb{R}^n} J(x)$ have a unique solution?}
Not in general. If, for example, $A$ and $b$ consist only of zeros, then $J(x)=0$ for all $x \in \mathbb{R}^n$, and thus $J$ has infinitely many minimizers.
However, if $A$ is guaranteed to be s.p.d., the minimizer is unique, because the first-order necessary condition holds for exactly one point $x^*$: the linear system $Ax^*=b$ has one and only one solution, since $A$ is full rank. That solution is indeed the minimizer, because the Hessian $\nabla^2 J = A$ is positive definite everywhere, so $J$ is strictly convex.
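This uniqueness argument can be illustrated numerically. The Python sketch below (with an arbitrary s.p.d.\ $A$ and vector $b$, both illustrative assumptions) solves $Ax^*=b$ and checks that moving away from $x^*$ in any direction strictly increases $J$:

```python
import numpy as np

# With A s.p.d., J(x) = 1/2 <Ax, x> - <b, x> has exactly one minimizer,
# namely the unique solution of A x* = b.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # s.p.d. (illustrative)
b = np.array([1.0, 2.0])

def J(x):
    return 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)   # unique, since A is full rank

# J(x* + d) - J(x*) = 1/2 <Ad, d> > 0 for every d != 0,
# so any perturbation away from x* strictly increases J.
rng = np.random.default_rng(0)
for _ in range(100):
    d = rng.normal(size=2)
    assert J(x_star + d) > J(x_star)
```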
\subsection{Finding the optimal step length $\alpha$}
Taking the search direction $p$ to be the negative of the gradient (as the gradient method prescribes), finding an optimal step size $\alpha$ amounts to minimizing the objective function along the line through $x$ in the direction $p$. This means minimizing the one-dimensional function $l(\alpha)$, where:
\[l(\alpha)=J(x+\alpha p)=\frac{1}{2}\langle A(x +\alpha p), x +\alpha p\rangle-\langle b, x+\alpha p\rangle\]
To minimize it we compute the derivative of $l(\alpha)$ and set it to zero to find the stationary point, which expresses $\alpha$ in terms of $A$, $b$, $x$ and $p$:
\[l'(\alpha)=\langle A (x +\alpha p)-b, p \rangle=\langle Ax-b, p \rangle+\alpha\langle Ap, p \rangle=0
\;\Longrightarrow\;
\alpha=\frac{\langle b-Ax, p \rangle}{\langle Ap, p \rangle}=\frac{\langle p, p \rangle}{\langle Ap, p \rangle},\]
where the last equality uses $p=-\nabla J(x)=b-Ax$. Since $A$ is s.p.d., the second derivative $l''(\alpha)=\langle Ap, p\rangle$ is always positive, so the stationary point found above is the minimizer of $l(\alpha)$, and the formula for $\alpha$ above gives the optimal step length for the gradient method.
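The gradient method with this exact line search can be sketched as follows. This is a minimal Python illustration, not the report's actual implementation (which is the MATLAB file \texttt{ex3.m}); the matrix $A$, vector $b$, and starting point are illustrative assumptions:

```python
import numpy as np

# Gradient method with exact line search for J(x) = 1/2 <Ax,x> - <b,x>,
# A symmetric positive definite. At each step p = -grad J(x) = b - A x,
# and the optimal step is alpha = <p, p> / <Ap, p>, the minimizer of
# l(alpha) = J(x + alpha p).
def gradient_method(A, b, x0, tol=1e-10, max_iter=10_000):
    x = x0.astype(float)
    for k in range(max_iter):
        p = b - A @ x                        # steepest-descent direction
        if np.linalg.norm(p) < tol:          # gradient norm stopping test
            return x, k
        alpha = (p @ p) / (p @ (A @ p))      # exact line search step
        x = x + alpha * p
    return x, max_iter

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
x, iters = gradient_method(A, b, np.array([10.0, 10.0]))
assert np.allclose(A @ x, b, atol=1e-8)      # converged to A x* = b
```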
\subsection{Matlab code for the gradient method and convergence results}
The main MATLAB file to run in order to execute the gradient method is \texttt{ex3.m}. The convergence results and iteration counts are shown below as verbatim program output:
The convergence behaviour of the gradient method discussed above is also visible in the last two sets of plots.
From the objective function plot we can see that the iterations started from $\begin{bmatrix}10&10\end{bmatrix}^T$ (depicted in yellow) take the largest number of iterations to reach the minimizer (or an acceptable approximation of it). The zig-zag behaviour described earlier can also be observed in the contour plots, which show the iteration steps taken for each $\mu$ and each starting point $x_0$.
Finally, the gradient norm plots show plateaus that become increasingly flat as $\mu$ increases.