hw1: submittable
parent f6dd2a2d6b
commit b99792f558
2 changed files with 6 additions and 2 deletions
Binary file not shown.
@@ -7,7 +7,6 @@
\usepackage{fancyvrb}
\usepackage{tikz}
\begin{document}
\setassignment
@@ -221,7 +220,12 @@ implementing the pseudocode, my implementation:
\end{figure}
The results of the matrix multiplication benchmark for the naive, blocked, and
BLAS implementations are shown in Figure \ref{fig:bench}. The blocked
implementation achieves approximately 50\% more FLOPS than the naive
implementation thanks to the optimisations in spatial and temporal cache locality
described. However, the blocked implementation achieves less than a tenth of
the FLOPS of the Intel MKL BLAS-based one, owing to the microarchitectural
optimisations the latter is able to exploit.
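
As a rough guide to reading these rates, and assuming the benchmark counts the
conventional $2n^3$ floating-point operations for a product of two $n \times n$
matrices (the counting rule is not stated here, and the symbols $n$ and $t$ are
introduced only for illustration), the rate for a run taking $t$ seconds is
\[
  \mathrm{FLOPS} = \frac{2n^3}{t}.
\]
Under that assumption, at a fixed $n$ a 50\% higher rate corresponds to a
running time of
\[
  t_{\mathrm{blocked}} = \frac{t_{\mathrm{naive}}}{1.5} \approx 0.67\, t_{\mathrm{naive}},
\]
while a rate below one tenth of the BLAS rate corresponds to a running time
more than ten times longer than that of the BLAS-based implementation.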
\begin{figure}[t]
\includegraphics[width=\textwidth]{timing.pdf}