hw1: submittable

Claudio Maggioni 2022-10-05 10:30:24 +02:00
parent f6dd2a2d6b
commit b99792f558
2 changed files with 6 additions and 2 deletions

@ -7,7 +7,6 @@
\usepackage{fancyvrb}
\usepackage{tikz}
\begin{document}
\setassignment
@ -221,7 +220,12 @@ implementing the pseudocode, my implementation:
\end{figure}
The results of the matrix multiplication benchmark for the naive, blocked, and
BLAS implementations are shown in Figure \ref{fig:bench}. The blocked
implementation achieves approximately 50\% more FLOPS than the naive
implementation thanks to the spatial and temporal cache locality optimisations
described earlier. However, the blocked implementation achieves less than a
tenth of the FLOPS of the Intel MKL BLAS based implementation, owing to the
microarchitecture-specific optimisations the latter is able to exploit.
\begin{figure}[t]
\includegraphics[width=\textwidth]{timing.pdf}