hw1: submittable

Claudio Maggioni 2022-10-05 10:30:24 +02:00
parent f6dd2a2d6b
commit b99792f558
2 changed files with 6 additions and 2 deletions


@ -7,7 +7,6 @@
\usepackage{fancyvrb}
\usepackage{tikz}
\begin{document}
\setassignment
@ -221,7 +220,12 @@ implementing the pseudocode, my implementation:
\end{figure}
The results of the matrix multiplication benchmark for the naive, blocked, and
BLAS implementations are shown in Figure \ref{fig:bench}.
BLAS implementations are shown in Figure \ref{fig:bench}. The blocked
implementation achieves approximately 50\% more FLOPS than the naive
implementation thanks to the optimisations in spatial and temporal cache
locality described above. However, the blocked implementation achieves less
than a tenth of the FLOPS of the Intel MKL BLAS-based one, due to the
microarchitecture-specific optimisations that the latter is able to exploit.
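
To relate these FLOPS figures to runtimes, assume that every implementation is
credited with the conventional $2n^3$ floating-point operations for an
$n \times n$ product. This count is an assumption, since the benchmark's own
counter is not shown here, and the symbols $F$ (FLOPS) and $t$ (runtime) are
introduced only for this illustration. Under that assumption the FLOPS ratio
of two implementations is the inverse of their runtime ratio:
\[
\frac{F_{\mathrm{blocked}}}{F_{\mathrm{naive}}}
  = \frac{2n^3 / t_{\mathrm{blocked}}}{2n^3 / t_{\mathrm{naive}}}
  = \frac{t_{\mathrm{naive}}}{t_{\mathrm{blocked}}}
  \approx 1.5 .
\]
The blocked implementation would therefore take roughly two thirds of the
naive runtime, while $F_{\mathrm{blocked}} / F_{\mathrm{MKL}} < 0.1$ would
mean it takes more than ten times as long as the MKL-based one.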
\begin{figure}[t]
\includegraphics[width=\textwidth]{timing.pdf}