diff --git a/Project1/project_1_maggioni_claudio.pdf b/Project1/project_1_maggioni_claudio.pdf
index bc86944..bb87295 100644
Binary files a/Project1/project_1_maggioni_claudio.pdf and b/Project1/project_1_maggioni_claudio.pdf differ
diff --git a/Project1/project_1_maggioni_claudio.tex b/Project1/project_1_maggioni_claudio.tex
index 9bf06c0..553047c 100644
--- a/Project1/project_1_maggioni_claudio.tex
+++ b/Project1/project_1_maggioni_claudio.tex
@@ -7,7 +7,6 @@
 \usepackage{fancyvrb}
 \usepackage{tikz}
 
-
 \begin{document}
 
 \setassignment
@@ -221,7 +220,12 @@ implementing the pseudocode, my implementation:
 \end{figure}
 
 The results of the matrix multiplication benchmark for the naive, blocked, and
-BLAS implementations are shown in Figure \ref{fig:bench}.
+BLAS implementations are shown in Figure \ref{fig:bench}. The blocked
+implementation achieves approximately 50\% more FLOPS than the naive
+implementation thanks to the improvements in spatial and temporal cache
+locality described above. However, the blocked implementation achieves less
+than a tenth of the FLOPS of the Intel MKL BLAS-based one, due to the
+microarchitecture-specific optimisations the latter is able to exploit.
 
 \begin{figure}[t]
 \includegraphics[width=\textwidth]{timing.pdf}